[ovs-dev] [PATCH v3 0/7] DPCLS Subtable ISA Optimization

William Tu u9012063 at gmail.com
Tue Jun 16 15:50:06 UTC 2020


On Wed, Jun 10, 2020 at 3:47 AM Harry van Haaren
<harry.van.haaren at intel.com> wrote:
>
> v3 Changes Summary:
> - Added new "subtable lookup get" command for ease of use
> - Changed set command to include "prio" aligning with other commands
> - Improved output of "subtable lookup prio set" command
> - Added documentation
> - Minor code cleanups, #defines for magic numbers, typos etc
> - Implement fix for hash-mismatch issue (reported by William Tu)
>
Thanks,
I benchmarked v3 with EMC disabled.
As with v2, enabling the AVX512 lookup makes overall performance slightly
worse (241.70 vs. 233.21 cycles per packet), although miniflow_lookup
itself is faster.

=== Without AVX ===
root at instance-3:~/ovs# ovs-appctl dpif-netdev/pmd-stats-show
pmd thread numa_id 0 core_id 0:
  packets received: 213457536
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 0
  smc hits: 0
  megaflow hits: 213457119
  avg. subtable lookups per megaflow hit: 1.00
  miss with success upcall: 1
  miss with failed upcall: 416
  avg. packets per output batch: 0.00
  idle cycles: 0 (0.00%)
  processing cycles: 49779856442 (100.00%)
  avg cycles per packet: 233.21 (49779856442/213457536)
  avg processing cycles per packet: 233.21 (49779856442/213457536)

=== With AVX512 ===
./boot.sh && ./configure CFLAGS="-g -O2 -mpopcnt -msse4.2
-march=native" --enable-Werror --with-dpdk=/usr/src/dpdk/build/ &&
make -j4 && make install
root at instance-3:~/ovs# ovs-appctl dpif-netdev/pmd-stats-show
pmd thread numa_id 0 core_id 0:
  packets received: 130351552
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 0
  smc hits: 0
  megaflow hits: 130351071
  avg. subtable lookups per megaflow hit: 1.00
  miss with success upcall: 1
  miss with failed upcall: 480
  avg. packets per output batch: 0.00
  idle cycles: 0 (0.00%)
  processing cycles: 31506266904 (100.00%)
  avg cycles per packet: 241.70 (31506266904/130351552)
  avg processing cycles per packet: 241.70 (31506266904/130351552)
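
The two runs processed different packet counts, so only the per-packet
averages are comparable. As a quick check, the figures can be recomputed
from the raw counters shown above. This is just a small sketch; the helper
names are made up for illustration and are not part of OVS:

```c
#include <assert.h>
#include <stdint.h>

/* Average cycles per packet, as pmd-stats-show reports it:
 * processing cycles divided by packets received. */
static double
cycles_per_packet(uint64_t cycles, uint64_t packets)
{
    assert(packets > 0);
    return (double) cycles / (double) packets;
}

/* Relative change of 'after' versus 'before', in percent. */
static double
percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With the counters above, cycles_per_packet(49779856442, 213457536) gives
about 233.21 and cycles_per_packet(31506266904, 130351552) gives about
241.70, i.e. roughly 3.6% more cycles per packet with AVX512 enabled.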



> v4 Planned work:
> - Add NEWS section
> - Investigate/fix --enable-shared builds link-time issues
> - Enable autovalidator to run with unit-tests without recompilation
>   (Already works now, but requires manual priority change at compile time)
> - Address other feedback on v3
>
>
> This patchset implements the changes as proposed during the
> OVS Conf '19, in the talk "Next steps for SW Datapath".
> Youtube link: https://youtu.be/x0bOpojnpmU
>
> The talk raises 3 main requirements for CPU ISA Optimizations,
> each of which is addressed in some of the patches below.
> - Test & Validation (video @ 2:20)
> - Usability & Debug (video @ 6:00)
> - Package & Deploy (video @ 8:45)
>
> Patch 1/7:
> Implements the test and validation requirements proposed above by
> refactoring the subtable function pointer registration and adding
> the autovalidator implementation.
>
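
For readers following along, the idea of priority-based registration plus
an autovalidator can be sketched roughly as below. This is only an
illustration under stated assumptions: every name here (lookup_func,
impls, lookup_select, and so on), the simplified key matching, and the
default priorities are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature: bit i of the result is set if keys[i] matches. */
typedef uint32_t (*lookup_func)(const uint64_t *keys, size_t n,
                                uint64_t mask, uint64_t value);

struct lookup_impl {
    const char *name;
    uint8_t prio;          /* Highest nonzero priority wins; 0 = disabled. */
    lookup_func func;
};

/* Scalar reference implementation. */
static uint32_t
lookup_generic(const uint64_t *keys, size_t n, uint64_t mask, uint64_t value)
{
    uint32_t hits = 0;

    assert(n <= 32);
    for (size_t i = 0; i < n; i++) {
        if ((keys[i] & mask) == (value & mask)) {
            hits |= UINT32_C(1) << i;
        }
    }
    return hits;
}

/* Stand-in for an ISA-specific version: same result via XOR-and-mask. */
static uint32_t
lookup_optimized(const uint64_t *keys, size_t n, uint64_t mask, uint64_t value)
{
    uint32_t hits = 0;

    assert(n <= 32);
    for (size_t i = 0; i < n; i++) {
        hits |= (uint32_t) (((keys[i] ^ value) & mask) == 0) << i;
    }
    return hits;
}

static uint32_t lookup_autovalidator(const uint64_t *, size_t,
                                     uint64_t, uint64_t);

/* Registration table; the default priorities are an assumption. */
static struct lookup_impl impls[] = {
    { "autovalidator", 0, lookup_autovalidator },
    { "generic",       1, lookup_generic },
    { "optimized",     0, lookup_optimized },
};
#define N_IMPLS (sizeof impls / sizeof impls[0])

static unsigned int validator_mismatches;

/* Runs every other implementation and compares it with the scalar one. */
static uint32_t
lookup_autovalidator(const uint64_t *keys, size_t n,
                     uint64_t mask, uint64_t value)
{
    uint32_t expected = lookup_generic(keys, n, mask, value);

    for (size_t i = 0; i < N_IMPLS; i++) {
        lookup_func f = impls[i].func;

        if (f != lookup_autovalidator && f != lookup_generic
            && f(keys, n, mask, value) != expected) {
            validator_mismatches++;
        }
    }
    return expected;
}

/* Returns 0 on success, -1 if no implementation has that name. */
static int
lookup_set_prio(const char *name, uint8_t prio)
{
    for (size_t i = 0; i < N_IMPLS; i++) {
        if (!strcmp(impls[i].name, name)) {
            impls[i].prio = prio;
            return 0;
        }
    }
    return -1;
}

/* Picks the enabled implementation with the highest priority. */
static lookup_func
lookup_select(void)
{
    const struct lookup_impl *best = NULL;

    for (size_t i = 0; i < N_IMPLS; i++) {
        if (impls[i].prio > 0 && (!best || impls[i].prio > best->prio)) {
            best = &impls[i];
        }
    }
    return best ? best->func : lookup_generic;
}
```

In this sketch, raising the "autovalidator" priority above the others makes
lookup_select() return the validating wrapper, so every lookup is
cross-checked against the scalar result without recompiling.
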
> Patch 2 & 3 / 7:
> Adds the commands for usability & debug. Now improved with a "get" and
> "set" command. Get returns current priorities and a list of each lookup
> implementation. Set provides feedback to the user as to the number of
> DPCLS ports/subtables that have new lookup functions due to the command
> that was executed.
>
> Patch 4/7:
> Enable CPU ISA detection at runtime, providing information for future
> ISA optimized functions.
>
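
As a rough illustration of runtime ISA detection: the diffstat shows
changes to lib/dpdk.c and lib/dpdk-stub.c, so the patch presumably queries
the CPU through DPDK. The sketch below does not do that; it substitutes the
GCC/Clang builtin __builtin_cpu_supports() instead, and the enum and
function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical identifiers for the lookup implementations. */
enum lookup_choice { LOOKUP_GENERIC, LOOKUP_AVX512 };

/* Returns true if the running CPU reports AVX-512 Foundation support.
 * On non-x86 targets or other compilers this conservatively says no. */
static bool
cpu_has_avx512f(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return false;
#endif
}

/* Falls back to the scalar lookup when the ISA is not available, so the
 * same binary can run on any CPU. */
static enum lookup_choice
choose_lookup(bool has_avx512f)
{
    return has_avx512f ? LOOKUP_AVX512 : LOOKUP_GENERIC;
}
```

Calling choose_lookup(cpu_has_avx512f()) once at startup is enough in this
sketch, since the result cannot change while the process is running.
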
> Patch 5/7:
> Build system changes to enable the Package & Deploy requirements,
> allowing a single OVS binary to run on all CPUs, but also gain best
> performance from CPU specific ISA optimizations.
>
> Patch 6/7:
> AVX-512 implementation of the DPCLS subtable search. This is the
> SIMD vector code, which performs DPCLS miniflow iteration in
> parallel.
>
> Patch 7/7:
> Add section in dpdk/bridge.rst on how to use the DPCLS commands, and
> what they can be used for. Introduces the autovalidator concept for
> testing and validation, and provides the command to set its priority.
>
>
> Thanks for reading, any questions please let me know.
> Regards, -Harry
>
>
> Harry van Haaren (7):
>   dpif: implement subtable lookup validation
>   dpif-netdev: add subtable lookup set command
>   dpif-netdev: add subtable-lookup-get command for usability
>   dpcls: enable cpu feature detection
>   lib/automake: split build multiple static library
>   dpif-lookup: add avx512 gather implementation
>   docs/dpdk/bridge: add datapath performance section
>
>  Documentation/topics/dpdk/bridge.rst   |  63 ++++++
>  lib/automake.mk                        |  69 +++++--
>  lib/dpdk-stub.c                        |  13 ++
>  lib/dpdk.c                             |  27 +++
>  lib/dpdk.h                             |   2 +
>  lib/dpif-netdev-lookup-autovalidator.c | 106 ++++++++++
>  lib/dpif-netdev-lookup-avx512-gather.c | 265 +++++++++++++++++++++++++
>  lib/dpif-netdev-lookup-generic.c       |   9 +-
>  lib/dpif-netdev-lookup.c               | 111 +++++++++++
>  lib/dpif-netdev-lookup.h               |  82 ++++++++
>  lib/dpif-netdev-private.h              |  15 --
>  lib/dpif-netdev.c                      | 165 ++++++++++++++-
>  12 files changed, 884 insertions(+), 43 deletions(-)
>  create mode 100644 lib/dpif-netdev-lookup-autovalidator.c
>  create mode 100644 lib/dpif-netdev-lookup-avx512-gather.c
>  create mode 100644 lib/dpif-netdev-lookup.c
>  create mode 100644 lib/dpif-netdev-lookup.h
>
> --
> 2.17.1
>

