[ovs-dev] [PATCH v3 0/7] DPCLS Subtable ISA Optimization
William Tu
u9012063 at gmail.com
Tue Jun 16 15:50:06 UTC 2020
On Wed, Jun 10, 2020 at 3:47 AM Harry van Haaren
<harry.van.haaren at intel.com> wrote:
>
> v3 Changes Summary:
> - Added new "subtable lookup get" command for ease of use
> - Changed set command to include "prio" aligning with other commands
> - Improved output of "subtable lookup prio set" command
> - Added documentation
> - Minor code cleanups, #defines for magic numbers, typos etc
> - Implement fix for hash-mismatch issue (reported by William Tu)
>
Thanks,
I benchmarked v3 with EMC disabled.
The conclusion is similar to v2: with the AVX512 lookup enabled, overall
performance is slower (241.70 vs. 233.21 avg cycles per packet), although
miniflow_lookup itself is faster.
=== Without AVX ===
root@instance-3:~/ovs# ovs-appctl dpif-netdev/pmd-stats-show
pmd thread numa_id 0 core_id 0:
packets received: 213457536
packet recirculations: 0
avg. datapath passes per packet: 1.00
emc hits: 0
smc hits: 0
megaflow hits: 213457119
avg. subtable lookups per megaflow hit: 1.00
miss with success upcall: 1
miss with failed upcall: 416
avg. packets per output batch: 0.00
idle cycles: 0 (0.00%)
processing cycles: 49779856442 (100.00%)
avg cycles per packet: 233.21 (49779856442/213457536)
avg processing cycles per packet: 233.21 (49779856442/213457536)
=== With AVX512 ===
./boot.sh && ./configure \
    CFLAGS="-g -O2 -mpopcnt -msse4.2 -march=native" \
    --enable-Werror --with-dpdk=/usr/src/dpdk/build/ && \
    make -j4 && make install
root@instance-3:~/ovs# ovs-appctl dpif-netdev/pmd-stats-show
pmd thread numa_id 0 core_id 0:
packets received: 130351552
packet recirculations: 0
avg. datapath passes per packet: 1.00
emc hits: 0
smc hits: 0
megaflow hits: 130351071
avg. subtable lookups per megaflow hit: 1.00
miss with success upcall: 1
miss with failed upcall: 480
avg. packets per output batch: 0.00
idle cycles: 0 (0.00%)
processing cycles: 31506266904 (100.00%)
avg cycles per packet: 241.70 (31506266904/130351552)
avg processing cycles per packet: 241.70 (31506266904/130351552)
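
Since the two runs received different packet counts, the per-packet cost is the number to compare. A minimal Python sketch that recomputes it from the raw counters above; the variable and function names are only illustrative, the numbers are copied from the two pmd-stats-show outputs:

```python
# Counters copied from the two pmd-stats-show outputs above.
# All names here are illustrative; only the numbers come from the stats.
RUNS = {
    "without_avx": {"cycles": 49779856442, "packets": 213457536},
    "with_avx512": {"cycles": 31506266904, "packets": 130351552},
}


def cycles_per_packet(run):
    """Processing cycles divided by packets received."""
    return run["cycles"] / run["packets"]


def relative_increase_pct(baseline, candidate):
    """Percentage by which candidate exceeds baseline."""
    return (candidate - baseline) / baseline * 100.0


base = cycles_per_packet(RUNS["without_avx"])
avx = cycles_per_packet(RUNS["with_avx512"])
delta_pct = relative_increase_pct(base, avx)

print(f"without AVX : {base:.2f} cycles/packet")
print(f"with AVX512 : {avx:.2f} cycles/packet")
print(f"difference  : {delta_pct:+.2f}%")
```

That works out to roughly 3.6% more cycles per packet with the AVX512 lookup enabled in this setup.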
> v4 Planned work:
> - Add NEWS section
> - Investigate/fix --enable-shared builds link-time issues
> - Enable autovalidator to run with unit-tests without recompilation
> (Already works now, but requires manual priority change at compile time)
> - Address other feedback on v3
>
>
> This patchset implements the changes as proposed during the
> OVS Conf '19, in the talk "Next steps for SW Datapath".
> Youtube link: https://youtu.be/x0bOpojnpmU
>
> The talk raises 3 main requirements for CPU ISA Optimizations,
> each of which is addressed in some of the patches below.
> - Test & Validation (video @ 2:20)
> - Usability & Debug (video @ 6:00)
> - Package & Deploy (video @ 8:45)
>
> Patch 1/7:
> The test and validation requirements proposed above are implemented,
> with the refactor of the subtable function pointer registration,
> and the autovalidator implementation is added.
>
> Patch 2 & 3 / 7:
> Adds the commands for usability & debug. Now improved with a "get" and
> "set" command. Get returns current priorities and a list of each lookup
> implementation. Set provides feedback to the user as to the number of
> DPCLS ports/subtables that have new lookup functions due to the command
> that was executed.
>
> Patch 4/7:
> Enable CPU ISA detection at runtime, providing information for future
> ISA optimized functions.
>
> Patch 5/7:
> Build system changes to enable the Package & Deploy requirements,
> allowing a single OVS binary to run on all CPUs, but also gain best
> performance from CPU specific ISA optimizations.
>
> Patch 6/7:
> Actual AVX-512 implementation for DPCLS subtable search. This is the
> actual SIMD vector code, which performs DPCLS miniflow iteration in
> parallel.
>
> Patch 7/7:
> Add section in dpdk/bridge.rst on how to use the DPCLS commands, and
> what they can be used for. The autovalidator concept for testing and
> validation is introduced, and the command to set its priority is provided.
>
>
> Thanks for reading, any questions please let me know.
> Regards, -Harry
>
>
> Harry van Haaren (7):
> dpif: implement subtable lookup validation
> dpif-netdev: add subtable lookup set command
> dpif-netdev: add subtable-lookup-get command for usability
> dpcls: enable cpu feature detection
> lib/automake: split build multiple static library
> dpif-lookup: add avx512 gather implementation
> docs/dpdk/bridge: add datapath performance section
>
> Documentation/topics/dpdk/bridge.rst | 63 ++++++
> lib/automake.mk | 69 +++++--
> lib/dpdk-stub.c | 13 ++
> lib/dpdk.c | 27 +++
> lib/dpdk.h | 2 +
> lib/dpif-netdev-lookup-autovalidator.c | 106 ++++++++++
> lib/dpif-netdev-lookup-avx512-gather.c | 265 +++++++++++++++++++++++++
> lib/dpif-netdev-lookup-generic.c | 9 +-
> lib/dpif-netdev-lookup.c | 111 +++++++++++
> lib/dpif-netdev-lookup.h | 82 ++++++++
> lib/dpif-netdev-private.h | 15 --
> lib/dpif-netdev.c | 165 ++++++++++++++-
> 12 files changed, 884 insertions(+), 43 deletions(-)
> create mode 100644 lib/dpif-netdev-lookup-autovalidator.c
> create mode 100644 lib/dpif-netdev-lookup-avx512-gather.c
> create mode 100644 lib/dpif-netdev-lookup.c
> create mode 100644 lib/dpif-netdev-lookup.h
>
> --
> 2.17.1
>
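
For readers following the thread, here is how I understand the priority and autovalidator ideas from the cover letter, as a small Python model. It is only a sketch under my own assumptions: every name in it (LookupImpl, Registry, best, autovalidate, the default priorities) is hypothetical and is not the OVS C API, and the "simd" lookup is a stand-in rather than real AVX-512 code.

```python
# Illustrative model only: all names and default priorities are
# hypothetical and do not correspond to the OVS C API.
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

LookupFn = Callable[[List[int]], List[int]]


def generic_lookup(keys: List[int]) -> List[int]:
    # Stand-in for the scalar reference lookup.
    return [k & 0xFF for k in keys]


def simd_like_lookup(keys: List[int]) -> List[int]:
    # Stand-in for an ISA-optimized lookup; must agree with generic_lookup.
    return [k % 256 for k in keys]


@dataclass
class LookupImpl:
    name: str
    prio: int
    # probe() returns a lookup function, or None if unusable on this CPU.
    probe: Callable[[], Optional[LookupFn]]


class Registry:
    def __init__(self, cpu_has_simd: bool):
        # cpu_has_simd models the result of runtime CPU ISA detection.
        self.impls = {
            "generic": LookupImpl("generic", 1, lambda: generic_lookup),
            "simd": LookupImpl(
                "simd", 0,
                lambda: simd_like_lookup if cpu_has_simd else None),
        }

    def get(self) -> List[Tuple[int, str]]:
        # Models a "get" command: current priorities, highest first.
        return sorted(((i.prio, i.name) for i in self.impls.values()),
                      reverse=True)

    def set_prio(self, name: str, prio: int) -> None:
        # Models a "prio set" command for one implementation.
        self.impls[name].prio = prio

    def best(self) -> Tuple[str, LookupFn]:
        # Pick the highest-priority implementation whose probe succeeds.
        usable = [(i.prio, i.name, fn)
                  for i in self.impls.values()
                  if (fn := i.probe()) is not None]
        _, name, fn = max(usable, key=lambda t: t[0])
        return name, fn


def autovalidate(registry: Registry, keys: List[int]) -> List[int]:
    # Run every usable implementation and compare against the reference.
    expected = generic_lookup(keys)
    for impl in registry.impls.values():
        fn = impl.probe()
        if fn is not None and fn(keys) != expected:
            raise AssertionError(f"{impl.name} disagrees with generic")
    return expected
```

In this model, raising the priority of "simd" only changes the selected lookup when the probe succeeds, which is how a single binary could run on CPUs with and without the optimized ISA.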