[ovs-dev] [PATCH v9 0/5] dpcls func ptrs & optimizations

Harry van Haaren harry.van.haaren at intel.com
Wed May 8 15:13:16 UTC 2019

1) As per v7, the function pointer rework for dpcls
2) Last 2 patches include specialized scalar optimizations

v9: use count_1bits(), ALWAYS_INLINE, and rebased.
v8: fixed variable-length array issues.

Running with Eth/IPv4/UDP traffic should show performance improvements,
with EMC/SMC disabled (so just DPCLS traffic), on a simple test case there
is a > 15% speedup. Please test this patchset, and report back numbers!

Patchset details:
The code is split into 5 patches to make the code traceable during
review, as the resulting code is quite different to today's dpcls_lookup.

Checkpatch flags two warnings, which I believe are not sanely fixable
due to the way macros accept arguments.

Running TESTSUITE shows all passing, with ~22 tests being skipped, which
is the same as before this patchset.

I've tried to get sparse running to check locally, however I'm having issues
getting that working. I'll dig in more, however didn't want to delay sending
of this patch-set. As the VLAs have been removed (the only warnings I saw in
the sparse output) I think this should be clean now.

Per patch details:
1) Refactor dpcls_lookup and the subtable for flexibility.
In particular, add a function pointer to the subtable
structure, which enables "plugging-in" a lookup function
at runtime. This enables a number of optimizations in future.
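
To illustrate the shape of this change, here is a minimal sketch. It is not
the code from the patch: the struct, the field names and the single 64-bit
key are all simplifications I made up for illustration. The point is only
that the lookup is called through a pointer stored on the subtable, so a
different implementation can be plugged in per subtable at runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_subtable;   /* hypothetical stand-in for the dpcls subtable */

/* Match 'count' keys against the subtable; bit i of the result is set
 * when keys[i] hits. */
typedef uint32_t (*sketch_lookup_func)(const struct sketch_subtable *st,
                                       const uint64_t *keys, size_t count);

struct sketch_subtable {
    uint64_t mask;                   /* bits this subtable matches on */
    uint64_t value;                  /* expected value under the mask */
    sketch_lookup_func lookup_func;  /* plugged in at runtime */
};

/* Default implementation: plain scalar compare of each key. */
static uint32_t
sketch_lookup_generic(const struct sketch_subtable *st,
                      const uint64_t *keys, size_t count)
{
    uint32_t hits = 0;

    assert(count <= 32);             /* result bitmask is 32 bits wide */
    for (size_t i = 0; i < count; i++) {
        if ((keys[i] & st->mask) == st->value) {
            hits |= UINT32_C(1) << i;
        }
    }
    return hits;
}

static void
sketch_subtable_init(struct sketch_subtable *st, uint64_t mask,
                     uint64_t value)
{
    st->mask = mask;
    st->value = value & mask;
    st->lookup_func = sketch_lookup_generic;  /* may be replaced later */
}

/* The caller never names a concrete implementation. */
static uint32_t
sketch_subtable_lookup(const struct sketch_subtable *st,
                       const uint64_t *keys, size_t count)
{
    return st->lookup_func(st, keys, count);
}
```

The caller only ever goes through sketch_subtable_lookup(), so swapping the
implementation is a single pointer assignment on the subtable.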

2) and 3)
With the function pointer in place, we refactor the existing
dpcls_lookup matching code into its own function, and later its
own file. To split it to its own file requires making various
dpcls data-structures available in the dpif-netdev.h header.

4) Refactor the existing code to favour computation over flat arrays of
miniflows, instead of the macro-based iteration. This simplifies
the code itself, and makes future optimizations possible due to
simplified loop structures, and loop trip counts passed in via
function arguments. See commit message for more details.
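
A rough sketch of what "flat arrays with an explicit trip count" means in
practice. This is a simplification and not the patch code: the struct below
has a single 64-bit map (the real miniflow map has more than one unit), and
all names are hypothetical. popcount64() is a portable stand-in for
count_1bits().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_BLOCKS 64

/* Simplified miniflow: bit N of 'map' set means block N is present, and
 * present blocks are packed into 'values' in ascending bit order. */
struct sketch_miniflow {
    uint64_t map;
    uint64_t values[SKETCH_MAX_BLOCKS];
};

static unsigned
popcount64(uint64_t x)
{
    unsigned n = 0;

    for (; x; x &= x - 1) {   /* clear the lowest set bit each round */
        n++;
    }
    return n;
}

/* Write the 'n_blocks' blocks selected by 'mask' from 'pkt' into the flat
 * array 'out', already ANDed with the mask values.  Blocks the packet does
 * not carry read as zero.  The trip count is a plain argument. */
static void
sketch_flatten_masked(const struct sketch_miniflow *pkt,
                      const struct sketch_miniflow *mask,
                      unsigned n_blocks, uint64_t *out)
{
    uint64_t bits = mask->map;

    assert(n_blocks == popcount64(mask->map));
    for (unsigned i = 0; i < n_blocks; i++) {
        uint64_t lowest = bits & (~bits + 1);        /* isolate lowest bit */
        unsigned idx = popcount64(pkt->map & (lowest - 1));
        uint64_t val = (pkt->map & lowest) ? pkt->values[idx] : 0;

        out[i] = val & mask->values[i];
        bits &= bits - 1;
    }
}

/* Compare two flat arrays without branching inside the loop. */
static int
sketch_blocks_equal(const uint64_t *a, const uint64_t *b, unsigned n_blocks)
{
    uint64_t diff = 0;

    for (unsigned i = 0; i < n_blocks; i++) {
        diff |= a[i] ^ b[i];
    }
    return diff == 0;
}
```

Once the blocks are in a flat array, the compare is a simple counted loop
over n_blocks, which is the kind of loop a compiler can unroll when the
count is known.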

5) This patch implements a select few specialized functions, for handling
miniflows with 5-1, 4-1, and 4-0 miniflow unit bit patterns. More of
these types of functions can (and should) be added to accelerate other
patterns of subtable lookups! See commit message for more details.
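
As a sketch of how such specialization can be structured (again with made-up
names, and a toy accumulate loop standing in for the real hash and compare):
one always-inline implementation takes the per-unit bit counts as arguments,
and thin wrappers pass compile-time constants so the compiler sees fixed
trip counts. The selection step counts the bits set in each map unit and
falls back to the generic function when no specialization matches.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define SKETCH_ALWAYS_INLINE inline
#endif

struct sketch_subtable;
typedef uint64_t (*sketch_lookup_func)(const struct sketch_subtable *st,
                                       const uint64_t *blocks);

struct sketch_subtable {
    uint64_t map[2];            /* two 64-bit map units (assumed layout) */
    uint64_t mask[16];          /* one mask value per set map bit */
    sketch_lookup_func lookup;
};

/* Portable stand-in for count_1bits(). */
static unsigned
sketch_count_1bits(uint64_t x)
{
    unsigned n = 0;

    for (; x; x &= x - 1) {
        n++;
    }
    return n;
}

/* Shared body.  When u0_bits and u1_bits are constants at the call site,
 * the loop bound is known at compile time. */
static SKETCH_ALWAYS_INLINE uint64_t
sketch_lookup_impl(const struct sketch_subtable *st, const uint64_t *blocks,
                   const unsigned u0_bits, const unsigned u1_bits)
{
    uint64_t acc = 0;

    for (unsigned i = 0; i < u0_bits + u1_bits; i++) {
        acc = acc * 31 + (blocks[i] & st->mask[i]);  /* toy accumulate */
    }
    return acc;
}

/* Generic path: bit counts are only known at runtime. */
static uint64_t
sketch_lookup_generic(const struct sketch_subtable *st,
                      const uint64_t *blocks)
{
    return sketch_lookup_impl(st, blocks, sketch_count_1bits(st->map[0]),
                              sketch_count_1bits(st->map[1]));
}

#define SKETCH_SPECIALIZE(U0, U1)                                     \
    static uint64_t                                                   \
    sketch_lookup_u0w##U0##_u1w##U1(const struct sketch_subtable *st, \
                                    const uint64_t *blocks)           \
    {                                                                 \
        return sketch_lookup_impl(st, blocks, U0, U1);                \
    }

SKETCH_SPECIALIZE(5, 1)
SKETCH_SPECIALIZE(4, 1)
SKETCH_SPECIALIZE(4, 0)

/* Pick a specialized function if the bit pattern matches one. */
static void
sketch_select_lookup(struct sketch_subtable *st)
{
    unsigned u0 = sketch_count_1bits(st->map[0]);
    unsigned u1 = sketch_count_1bits(st->map[1]);

    assert(u0 + u1 <= 16);      /* must fit the mask[] array above */
    if (u0 == 5 && u1 == 1) {
        st->lookup = sketch_lookup_u0w5_u1w1;
    } else if (u0 == 4 && u1 == 1) {
        st->lookup = sketch_lookup_u0w4_u1w1;
    } else if (u0 == 4 && u1 == 0) {
        st->lookup = sketch_lookup_u0w4_u1w0;
    } else {
        st->lookup = sketch_lookup_generic;
    }
}
```

Adding another pattern in this scheme is one SKETCH_SPECIALIZE() line plus
one branch in the selection function; the specialized and generic paths
share the same body, so they return the same result for the same input.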

As always: feedback, suggestions, performance numbers all welcome!
Regards -Harry

Harry van Haaren (5):
  dpif-netdev: implement function pointers/subtable
  dpif-netdev: move dpcls lookup structures to .h
  dpif-netdev: split out generic lookup function
  dpif-netdev: refactor generic implementation
  dpif-netdev: add specialized generic scalar functions

 lib/automake.mk                  |   1 +
 lib/dpif-netdev-lookup-generic.c | 298 +++++++++++++++++++++++++++++++
 lib/dpif-netdev.c                | 195 ++++++++++----------
 lib/dpif-netdev.h                |  94 ++++++++++
 4 files changed, 491 insertions(+), 97 deletions(-)
 create mode 100644 lib/dpif-netdev-lookup-generic.c


