[ovs-dev] [PATCH v2 00/28] dpif-netdev: Parallel offload processing
Maxime Coquelin
maxime.coquelin at redhat.com
Fri Apr 23 13:21:00 UTC 2021
Hi Gaetan,
On 4/12/21 5:19 PM, Gaetan Rivet wrote:
> This patch series aims to improve the performance of the management
> of hw-offloads in dpif-netdev. In the current version, some setup
> will experience high memory usage and poor latency between a flow
> decision and its execution regarding hardware offloading.
>
> This series starts by measuring key metrics regarding both issues
> Those patches are introduced first to compare the current status
> with each improvements introduced.
> Offloads enqueued and inserted, as well as the latency
> from queue insertion to hardware insertion is measured. A new
> command 'ovs-appctl dpctl/offload-stats-show' is introduced
> to show the current measure.
>
> In my current performance test setup I am measuring an
> average latency hovering between 1~2 seconds.
> After the optimizations, it is reduced to 500~900 ms.
> Finally when using multiple threads and with proper driver
> support[1], it is measured in the order of 1 ms.
>
> A few modules are introduced:
>
> * An ID pool with reduced capabilities, simplifying its
> operations and allowing better performances in both
> single and multi-thread setup.
>
> * A lockless queue between PMDs / revalidators and
> offload thread(s). As the number of PMDs increases,
> contention can be high on the shared queue.
> This queue is designed to serve as message queue
> between threads.
>
> * A bounded lockless MPMC ring and some helpers for
> calculating moving averages.
>
> * A moving average module for Cumulative and Exponential
> moving averages.
>
> The netdev-offload-dpdk module is made thread-safe.
> Internal maps are made per-netdev instead, and locks are
> taken for shorter critical sections within the module.
>
> CI result: https://github.com/grivet/ovs/actions/runs/554918929
>
> [1]: The rte_flow API was made thread-safe in the 20.11 DPDK
> release. Drivers that do not implement those operations
> concurrently are protected by a lock. Others will
> allow better concurrency, that improve the result
> of this series.
>
> v2:
>
> * Improved the MPSC queue API to simplify usage.
>
> * Moved flush operation from initiator thread to offload
> thread(s). This ensures offload metadata are shared only
> among the offload thread pool.
>
> * Flush operation needs additional thread synchronization.
> The ovs_barrier currently triggers a UAF. Add a unit-test to
> validate its operations and a fix for the UAF.
>
> CI result: https://github.com/grivet/ovs/actions/runs/741430135
> The error comes from a failure to download 'automake' on
> osx, unrelated to any change in this series.
>
> Gaetan Rivet (28):
> netdev: Add flow API uninit function
> netdev-offload-dpdk: Use per-netdev offload metadata
> netdev-offload-dpdk: Implement hw-offload statistics read
> dpctl: Add function to read hardware offload statistics
> dpif-netdev: Rename flow offload thread
> dpif-netdev: Rename offload thread structure
> mov-avg: Add a moving average helper structure
> dpif-netdev: Implement hardware offloads stats query
> ovs-atomic: Expose atomic exchange operation
> mpsc-queue: Module for lock-free message passing
> llring: Add lockless MPMC bounded queue structure
> seq-pool: Module for faster ID generation
> netdev-offload: Add multi-thread API
> dpif-netdev: Quiesce offload thread periodically
> dpif-netdev: Postpone flow offload item freeing
> dpif-netdev: Use seq-pool for mark allocation
> dpif-netdev: Introduce tagged union of offload requests
> tests: Add ovs-barrier unit test
> ovs-thread: Fix barrier use-after-free
> dpif-netdev: Execute flush from offload thread
> netdev-offload-dpdk: Use per-thread HW offload stats
> netdev-offload-dpdk: Lock rte_flow map access
> netdev-offload-dpdk: Protect concurrent offload destroy/query
> dpif-netdev: Use lockless queue to manage offloads
> dpif-netdev: Make megaflow and mark mappings thread objects
> dpif-netdev: Replace port mutex by rwlock
> dpif-netdev: Use one or more offload threads
> netdev-dpdk: Remove rte-flow API access locks
>
> lib/automake.mk | 7 +
> lib/dpctl.c | 36 ++
> lib/dpif-netdev.c | 742 +++++++++++++++++++++++++---------
> lib/dpif-netlink.c | 1 +
> lib/dpif-provider.h | 7 +
> lib/dpif.c | 8 +
> lib/dpif.h | 9 +
> lib/llring.c | 153 +++++++
> lib/llring.h | 76 ++++
> lib/mov-avg.h | 171 ++++++++
> lib/mpsc-queue.c | 251 ++++++++++++
> lib/mpsc-queue.h | 189 +++++++++
> lib/netdev-dpdk.c | 6 -
> lib/netdev-offload-dpdk.c | 277 +++++++++++--
> lib/netdev-offload-provider.h | 4 +
> lib/netdev-offload.c | 92 ++++-
> lib/netdev-offload.h | 21 +
> lib/ovs-atomic-c++.h | 3 +
> lib/ovs-atomic-clang.h | 5 +
> lib/ovs-atomic-gcc4+.h | 5 +
> lib/ovs-atomic-gcc4.7+.h | 5 +
> lib/ovs-atomic-i586.h | 5 +
> lib/ovs-atomic-locked.h | 9 +
> lib/ovs-atomic-msvc.h | 22 +
> lib/ovs-atomic-pthreads.h | 5 +
> lib/ovs-atomic-x86_64.h | 5 +
> lib/ovs-atomic.h | 8 +-
> lib/ovs-thread.c | 61 ++-
> lib/ovs-thread.h | 6 +-
> lib/seq-pool.c | 198 +++++++++
> lib/seq-pool.h | 66 +++
> tests/automake.mk | 3 +
> tests/library.at | 15 +
> tests/test-barrier.c | 264 ++++++++++++
> tests/test-mpsc-queue.c | 727 +++++++++++++++++++++++++++++++++
> tests/test-seq-pool.c | 543 +++++++++++++++++++++++++
> vswitchd/vswitch.xml | 16 +
> 37 files changed, 3788 insertions(+), 233 deletions(-)
> create mode 100644 lib/llring.c
> create mode 100644 lib/llring.h
> create mode 100644 lib/mov-avg.h
> create mode 100644 lib/mpsc-queue.c
> create mode 100644 lib/mpsc-queue.h
> create mode 100644 lib/seq-pool.c
> create mode 100644 lib/seq-pool.h
> create mode 100644 tests/test-barrier.c
> create mode 100644 tests/test-mpsc-queue.c
> create mode 100644 tests/test-seq-pool.c
>
> --
> 2.31.1
>
The series looks almost good to me.
Please note that I haven't had a setup to test it this week, I'll try to
run some tests with it next week.
Thanks,
Maxime
More information about the dev
mailing list