[ovs-dev] [PATCH v2 0/3] Userspace deferral of work

Cian Ferriter cian.ferriter at intel.com
Tue Sep 7 11:17:22 UTC 2021


This patchset adds infrastructure to the userspace datapath to defer, or
postpone, work. At a high level, each PMD thread places work items into
its own per-thread work ring to be done later. The work ring is a FIFO
queue of pointers to work items. Each work item has a "work_func()"
function pointer, which abstracts away what work is actually being
done. More details about the infrastructure can be found in the first
patch and its commit message.
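A minimal sketch of such a ring follows. The names (`struct work_item`, `work_ring_push()`, `work_ring_pop()`, `WORK_RING_SIZE`) and the `bool` return of the work function are hypothetical and are not taken from the patch. Only the idea of a per-thread FIFO of work-item pointers, each carrying a `work_func()` pointer, comes from the description above. Because each ring is owned by a single PMD thread, the sketch uses no locks or atomics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capacity; a power of two so indices can be masked. */
#define WORK_RING_SIZE 8u
static_assert((WORK_RING_SIZE & (WORK_RING_SIZE - 1)) == 0,
              "WORK_RING_SIZE must be a power of two");

struct work_item;

/* Hypothetical callback type: returns true once the work is finished. */
typedef bool (*work_func_t)(struct work_item *item);

/* The ring only stores pointers; what the work is stays opaque to it. */
struct work_item {
    work_func_t work_func;
    void *arg;               /* Context for work_func, e.g. a DMA job. */
};

/* Owned by one PMD thread, so no locks or atomics are needed. */
struct work_ring {
    struct work_item *items[WORK_RING_SIZE];
    unsigned int head;       /* Free-running read counter (oldest item). */
    unsigned int tail;       /* Free-running write counter. */
};

static void
work_ring_init(struct work_ring *ring)
{
    ring->head = ring->tail = 0;
}

/* Unsigned subtraction stays correct when the counters wrap around. */
static unsigned int
work_ring_count(const struct work_ring *ring)
{
    return ring->tail - ring->head;
}

/* Appends an item at the tail; returns false if the ring is full. */
static bool
work_ring_push(struct work_ring *ring, struct work_item *item)
{
    if (work_ring_count(ring) == WORK_RING_SIZE) {
        return false;
    }
    ring->items[ring->tail & (WORK_RING_SIZE - 1)] = item;
    ring->tail++;
    return true;
}

/* Removes and returns the oldest item, or NULL if the ring is empty. */
static struct work_item *
work_ring_pop(struct work_ring *ring)
{
    if (work_ring_count(ring) == 0) {
        return NULL;
    }
    struct work_item *item = ring->items[ring->head & (WORK_RING_SIZE - 1)];
    ring->head++;
    return item;
}
```

Free-running `head`/`tail` counters with a power-of-two size let the slot index be computed with a mask, and reduce the full/empty check to a single subtraction.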

The ability to defer work is necessary for asynchronous use-cases. The
use-case this patchset targets is DMA offload of TX on VHOST ports. In
this use-case, packets are passed to a copy engine rather than being
copied in software. Once the copy has completed, the packets have to be
freed and the VHOST port statistics updated in software. This
completion work needs to be deferred.

There are a number of requirements for an effective defer
infrastructure. They are listed below, along with how each one is met:

1. Allow the thread which kicked off the DMA transfer to keep doing
useful work, rather than waiting or polling for work to be completed.
This is accomplished by deferring the completion work for the DMA
transfer, rather than waiting for the transfer to complete before
moving on to process more packets. The completion work is added to the
work ring to be done later, and more useful work can be done in the
meantime.

2. Allow some time to pass between kicking off a DMA transfer for a
VHOST port and checking for completion of the DMA transfer.
This is accomplished by doing deferred work after processing all RXQs
assigned to a PMD thread.

3. Upon checking for completion of the DMA transfer, allow re-deferral
of work in the case where the DMA transfer has not completed.
This is accomplished by adding checks in the "do_work()" function to
defer the work again when DMA has not completed. This re-deferring of
work helps with requirements 1 and 2.
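The three requirements can be illustrated with a simplified, single-threaded model of one deferred-work pass. This is a sketch under assumptions: `complete_vhost_tx()` and the `polls_until_done` countdown are hypothetical stand-ins for querying a real copy engine, and `do_deferred_work()` is an illustration rather than the patch's "do_work()". Each pass attempts only the items that were queued when it started, so re-deferred work is retried on the next PMD loop iteration instead of being spun on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 16u   /* Hypothetical capacity (power of two). */

struct work_item {
    /* Returns true when the work is done, false to re-defer it. */
    bool (*work_func)(struct work_item *item);
    int polls_until_done;   /* Hypothetical stand-in for DMA state. */
    int *tx_completed;      /* Hypothetical stand-in for vhost TX stats. */
};

struct work_ring {
    struct work_item *items[RING_SIZE];
    unsigned int head, tail;    /* Free-running read/write counters. */
};

static void
ring_push(struct work_ring *r, struct work_item *item)
{
    assert(r->tail - r->head < RING_SIZE);  /* Caller guarantees space. */
    r->items[r->tail++ & (RING_SIZE - 1)] = item;
}

static struct work_item *
ring_pop(struct work_ring *r)
{
    return r->head == r->tail ? NULL : r->items[r->head++ & (RING_SIZE - 1)];
}

/* Hypothetical completion handler: pretend to poll the copy engine. */
static bool
complete_vhost_tx(struct work_item *item)
{
    if (item->polls_until_done > 0) {
        item->polls_until_done--;
        return false;               /* DMA still in flight. */
    }
    (*item->tx_completed)++;        /* "Free packets, update stats." */
    return true;
}

/* Meant to run once per PMD loop, after all RXQs have been processed
 * (requirement 2). Items are tried oldest first; unfinished ones are
 * pushed back for a later pass (requirements 1 and 3). */
static unsigned int
do_deferred_work(struct work_ring *r)
{
    unsigned int pending = r->tail - r->head;
    unsigned int done = 0;

    for (unsigned int i = 0; i < pending; i++) {
        struct work_item *item = ring_pop(r);
        if (item->work_func(item)) {
            done++;
        } else {
            ring_push(r, item);     /* Re-defer: a slot was just freed. */
        }
    }
    return done;
}
```

Snapshotting `pending` at the start of the pass is what keeps a not-yet-complete transfer from monopolising the PMD thread: it is checked at most once per loop iteration.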

A ring buffer is used to queue the pointers to work items because its
FIFO property means the DMA transfers that have been in progress the
longest are checked first, and so have the highest chance of having
completed.

Open TODOs:
- The patchset refers to "work" and "work items" but this infrastructure
  is focused on "netdev async work". The variables and functions could
  be named more appropriately. I'm open to any suggestions here.
- This patchset has been tested manually. Some form of automated testing
  would be better. Since we have stats for lots of different scenarios,
  unit tests should be quite easy to write. I'm open to suggestions for
  other forms of testing.

v2:
- Count cycles spent doing asynchronous work (patch 2/3).
- Add a configurable delay to work deferral (patch 3/3).
- Implement and use a simpler ring buffer in OVS, rather than using the
  DPDK implementation.
- Only print work defer stats if some work has actually been deferred.
- Add a "force" flag to the "process_async()" API to implement an
  attempt limit on the number of times an asynchronous piece of work
  should be attempted.
- Do all outstanding work on a PMD thread before allowing a reload to
  occur.
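The attempt limit and the drain before reload can be sketched as follows. The semantics of "force" are an assumption here: the sketch treats a forced call as one that must finish the work (for example by waiting for the DMA transfer), so the item is never re-deferred again. `MAX_ATTEMPTS`, the `draining` parameter and `drain_before_reload()` are hypothetical names, and `process_async_stub()` is a toy stand-in, not the "process_async()" netdev API from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 16u        /* Hypothetical capacity (power of two). */
#define MAX_ATTEMPTS 3u      /* Hypothetical attempt limit. */

struct async_work {
    unsigned int attempts;           /* How many times it was tried. */
    unsigned int polls_until_done;   /* Stand-in for DMA hardware state. */
    bool done;                       /* Set once completion work ran. */
};

/* Toy stand-in for process_async(); NOT the netdev API from the patch.
 * Assumed semantics: with force set, the call must finish the work. */
static bool
process_async_stub(struct async_work *w, bool force)
{
    if (force) {
        w->polls_until_done = 0;     /* Pretend we waited for the DMA. */
    }
    if (w->polls_until_done > 0) {
        w->polls_until_done--;       /* Still in flight: not done yet. */
        return false;
    }
    w->done = true;                  /* "Free packets, update stats." */
    return true;
}

struct work_ring {
    struct async_work *items[RING_SIZE];
    unsigned int head, tail;         /* Free-running read/write counters. */
};

static void
ring_push(struct work_ring *r, struct async_work *w)
{
    assert(r->tail - r->head < RING_SIZE);  /* Caller guarantees space. */
    r->items[r->tail++ & (RING_SIZE - 1)] = w;
}

static struct async_work *
ring_pop(struct work_ring *r)
{
    return r->head == r->tail ? NULL : r->items[r->head++ & (RING_SIZE - 1)];
}

/* One pass over the work queued so far. An item is forced once it reaches
 * MAX_ATTEMPTS, or unconditionally when draining. */
static void
do_deferred_work(struct work_ring *r, bool draining)
{
    unsigned int pending = r->tail - r->head;

    for (unsigned int i = 0; i < pending; i++) {
        struct async_work *w = ring_pop(r);
        w->attempts++;
        bool force = draining || w->attempts >= MAX_ATTEMPTS;
        if (!process_async_stub(w, force)) {
            ring_push(r, w);         /* Re-defer to a later pass. */
        }
    }
}

/* Before a PMD reload, no work may be left in the ring. */
static void
drain_before_reload(struct work_ring *r)
{
    do_deferred_work(r, true);
    assert(r->head == r->tail);
}
```

With `MAX_ATTEMPTS` set to 3, an item is tried twice without force and completed on its third attempt at the latest, which bounds how long any single transfer can sit in the ring.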


Cian Ferriter (3):
  dpif-netdev: Add a per thread work ring
  dpif-netdev: Count cycles spent doing async work
  dpif-netdev: Add a configurable delay to work deferral

 lib/automake.mk                  |   1 +
 lib/dpif-netdev-perf.c           |  20 ++-
 lib/dpif-netdev-perf.h           |   9 ++
 lib/dpif-netdev-private-defer.h  |  95 ++++++++++++++
 lib/dpif-netdev-private-thread.h |   4 +
 lib/dpif-netdev.c                | 204 ++++++++++++++++++++++++++++++-
 lib/netdev-dpdk.c                |  22 ++--
 lib/netdev-provider.h            |  19 ++-
 lib/netdev.c                     |   3 +-
 9 files changed, 362 insertions(+), 15 deletions(-)
 create mode 100644 lib/dpif-netdev-private-defer.h

-- 
2.32.0