[ovs-dev] [PATCH v3 0/4] XDP offload using flow API provider

Toshiaki Makita toshiaki.makita1 at gmail.com
Mon Jun 29 15:30:29 UTC 2020

This patch set adds an XDP-based flow cache using the OVS netdev-offload
flow API provider.  When an OVS device has XDP offload enabled,
packets are first processed in the XDP flow cache (with parsing and
table lookup implemented in eBPF), and on a hit the action processing
is also done in the context of XDP, which has minimal overhead.
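
The hit/miss behaviour described above can be pictured with a small
userspace model.  This is only a sketch, not the eBPF code from the
patches: the key fields, the action strings and the names Packet, parse
and process are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical flow key: (in_port, eth_type, ipv4_dst).  The real key
# layout is defined by the patch set and is not reproduced here.
FlowKey = Tuple[int, int, str]


@dataclass
class Packet:
    in_port: int
    eth_type: int
    ipv4_dst: str


def parse(pkt: Packet) -> FlowKey:
    """Extract the lookup key from a packet (stands in for the eBPF parser)."""
    return (pkt.in_port, pkt.eth_type, pkt.ipv4_dst)


def process(pkt: Packet, cache: Dict[FlowKey, List[str]]) -> Tuple[str, List[str]]:
    """Return ("xdp", actions) on a cache hit, ("userspace", []) on a miss."""
    actions = cache.get(parse(pkt))
    if actions is None:
        # Miss: the packet is left to the userspace datapath.
        return ("userspace", [])
    # Hit: the whole action list is executed in XDP context.
    return ("xdp", list(actions))


cache = {(1, 0x0800, "10.0.0.2"): ["output:2"]}
hit = process(Packet(1, 0x0800, "10.0.0.2"), cache)
miss = process(Packet(1, 0x0800, "10.0.0.9"), cache)
print(hit, miss)
```

The model only shows which path a packet takes; installing a flow into
the cache after a miss is left out.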

This provider is based on top of William's recently posted patch for
custom XDP load.  When a custom XDP program is loaded, the provider
detects whether the program supports the classifier and, if so, starts
offloading flows to the XDP program.

The patches are derived from xdp_flow[1], which is a mechanism similar to
this but implemented in kernel.

* Motivation

While the userspace datapath using netdev-afxdp or netdev-dpdk shows good
performance, there are use cases where packets are better processed in the
kernel, for example TCP/IP connections or container-to-container
connections.  The current solution is to use a tap device or af_packet, with
extra kernel-to/from-userspace overhead.  With XDP, a better solution
is to steer packets earlier, in the XDP program, and decide there whether to
send them to the userspace datapath or keep them in the kernel.

One problem with the current netdev-afxdp is that it forwards all packets to
userspace.  The first patch from William (netdev-afxdp: Enable loading XDP
program.) only provides the interface to load an XDP program; however, users
usually don't know how to write their own XDP program.

XDP also supports HW-offload so it may be possible to offload flows to
HW through this provider in the future, although not currently.
The reason is that map-in-map is required for our program to support
classifier with subtables in XDP, but map-in-map is not offloadable.
If map-in-map becomes offloadable, HW-offload of our program will also
be doable.
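
To make the "classifier with subtables" and map-in-map remark concrete,
here is a rough model: one outer map keyed by mask, each entry pointing to
an inner exact-match map of masked keys.  This is a sketch under
assumptions, not the layout used by flowtable_afxdp.c: the two-field key,
the class name Classifier and the first-match-wins lookup (no priorities)
are simplifications introduced here.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[int, int]   # hypothetical 2-field key: (in_port, ipv4_dst)
Mask = Tuple[int, int]  # per-field bit masks, same shape as Key


def apply_mask(key: Key, mask: Mask) -> Key:
    """AND every key field with the corresponding mask field."""
    return (key[0] & mask[0], key[1] & mask[1])


class Classifier:
    def __init__(self) -> None:
        # Outer map: mask -> inner map (masked key -> action string).
        # In BPF terms this nesting is what calls for map-in-map.
        self.subtables: Dict[Mask, Dict[Key, str]] = {}

    def insert(self, key: Key, mask: Mask, action: str) -> None:
        self.subtables.setdefault(mask, {})[apply_mask(key, mask)] = action

    def lookup(self, key: Key) -> Optional[str]:
        # Try each subtable in turn; the first match wins in this sketch.
        for mask, table in self.subtables.items():
            action = table.get(apply_mask(key, mask))
            if action is not None:
                return action
        return None


c = Classifier()
# in_port 1, destination 10.0.0.0/24
c.insert((1, 0x0A000000), (0xFFFFFFFF, 0xFFFFFF00), "output:2")
in_subnet = c.lookup((1, 0x0A000005))      # 10.0.0.5 on port 1
other_port = c.lookup((2, 0x0A000005))     # same address, port 2
other_subnet = c.lookup((1, 0x0A000105))   # 10.0.1.5 on port 1
print(in_subnet, other_port, other_subnet)
```

Each distinct mask costs one extra hash lookup per packet, which is why
the number of subtables matters for a datapath like this.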

* How to use

1. Install clang/llvm >= 9, libbpf >= 0.0.6 (included in kernel 5.5), and
   kernel >= 5.3.

2. make with --enable-afxdp --enable-bpf
--enable-bpf will generate XDP program "bpf/flowtable_afxdp.o".  Note that
the BPF object will not be installed anywhere by "make install" at this point. 

3. Load custom XDP program
$ ovs-vsctl add-port ovsbr0 veth0 -- set int veth0 options:xdp-mode=native \
$ ovs-vsctl add-port ovsbr0 veth1 -- set int veth1 options:xdp-mode=native \

If you use veth devices, make sure to load some (possibly dummy) programs
on the peers of veth devices.
Some HW NIC drivers require as many queues as there are cores on the
system. Tweak the number of queues using "ethtool -L".

5. Enable hw-offload 
$ ovs-vsctl set Open_vSwitch . other_config:offload-driver=linux_xdp
$ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
This will start offloading flows to the XDP program.

You should be able to see some maps installed, including "debug_stats".
$ bpftool map

If packets are successfully redirected by the XDP program,
debug_stats[2] will be counted.
$ bpftool map dump id <ID of debug_stats>

Currently only very limited keys and output actions are supported.
For example NORMAL action entry and IP based matching work with current
key support. VLAN actions used by port tag/trunks are also supported.

* Performance

Tested 2 cases. 1) i40e to veth, 2) i40e to i40e.
Test 1 measured the drop rate at the veth interface with a redirect action
from a physical interface (i40e 25G NIC, XXV 710) to veth. The CPU is a Xeon
Silver 4114 (2.20 GHz).
                    +------+                      +-------+    +-------+
 pktgen -- wire --> | eth0 | -- NORMAL ACTION --> | veth0 |----| veth2 |
                    +------+                      +-------+    +-------+

Test 2 uses i40e instead of veth, and measured tx packet rate at output

Single-flow performance test results:

1) i40e-veth

  a) no-zerocopy in i40e

    - xdp   3.7 Mpps
    - afxdp 820 kpps

  b) zerocopy in i40e (veth does not have zc)

    - xdp   1.8 Mpps
    - afxdp 800 kpps

2) i40e-i40e

  a) no-zerocopy

    - xdp   3.0 Mpps
    - afxdp 1.1 Mpps

  b) zerocopy

    - xdp   1.7 Mpps
    - afxdp 4.0 Mpps

** xdp is better when zc is disabled. The reason for the poor performance
   with zc is that xdp_frame requires packet memory allocation and memcpy on
   XDP_REDIRECT to other devices iff zc is enabled.

** afxdp with zc is better than xdp without zc, but afxdp is using 2 cores
   in this case: one for pmd and the other for softirq. When pmd and softirq
   were running on the same core, the performance was extremely poor as
   pmd consumes the cpu.
   When offloading to xdp, xdp only uses softirq while pmd is still
   consuming 100% cpu.  This means we probably need only one pmd for xdp
   even when we want to use more cores for multi-flow.
   Testing afxdp-nonpmd is TBD.
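
For a quick comparison, the xdp/afxdp ratios behind the notes above can
be recomputed from the single-flow numbers.  The figures are copied from
the results in this mail; the dictionary layout and the function name
xdp_over_afxdp are just for illustration.

```python
# (test, zerocopy setting) -> (xdp Mpps, afxdp Mpps), from the results above.
results = {
    ("i40e-veth", "no-zc"): (3.7, 0.82),
    ("i40e-veth", "zc"): (1.8, 0.80),
    ("i40e-i40e", "no-zc"): (3.0, 1.1),
    ("i40e-i40e", "zc"): (1.7, 4.0),
}


def xdp_over_afxdp(results):
    """Return the xdp/afxdp throughput ratio for every test case."""
    return {case: xdp / afxdp for case, (xdp, afxdp) in results.items()}


ratios = xdp_over_afxdp(results)
for (test, zc), ratio in ratios.items():
    print(f"{test:10s} {zc:6s} xdp/afxdp = {ratio:.2f}")
```

Only i40e-i40e with zerocopy comes out below 1, which is the case where
afxdp uses a second core for pmd.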

This patch set is based on top of commit dda80837 ("AUTHORS: Add Sriharsha

[1] https://lwn.net/Articles/802653/

- Use ".ovs_meta" section to inform vswitchd of metadata like supported
- Rewrite action loop logic in bpf to support multiple actions.
- Add missing linux/types.h in acinclude.m4, as per William Tu.
- Fix infinite reconfiguration loop when xsks_map is missing.
- Add vlan-related actions in bpf program.
- Fix CI build error.
- Fix inability to delete subtable entries.

For easy review I left pre-squashed commits from v2 here.

- Add uninit callback of netdev-offload-xdp.
- Introduce "offload-driver" other_config to specify offload driver.
- Add --enable-bpf (HAVE_BPF) config option to build bpf programs.
- Workaround incorrect UINTPTR_MAX in x64 clang bpf build.
- Fix boot.sh autoconf warning.

Toshiaki Makita (3):
  netdev-offload: Add "offload-driver" other_config to specify offload
  netdev-offload: Add xdp flow api provider
  bpf: Add reference XDP program implementation for netdev-offload-xdp

William Tu (1):
  netdev-afxdp: Enable loading XDP program.

 .travis.yml                           |    2 +-
 Documentation/intro/install/afxdp.rst |   59 ++
 Makefile.am                           |    9 +-
 NEWS                                  |    2 +
 acinclude.m4                          |   57 ++
 bpf/.gitignore                        |    4 +
 bpf/Makefile.am                       |   61 ++
 bpf/bpf_compiler.h                    |   25 +
 bpf/bpf_miniflow.h                    |  179 ++++
 bpf/bpf_netlink.h                     |   62 ++
 bpf/bpf_workaround.h                  |   28 +
 bpf/flowtable_afxdp.c                 |  585 +++++++++++
 configure.ac                          |    2 +
 lib/automake.mk                       |    6 +-
 lib/bpf-util.c                        |   38 +
 lib/bpf-util.h                        |   22 +
 lib/netdev-afxdp.c                    |  359 ++++++-
 lib/netdev-afxdp.h                    |    3 +
 lib/netdev-linux-private.h            |    5 +
 lib/netdev-offload-provider.h         |    6 +
 lib/netdev-offload-xdp.c              | 1315 +++++++++++++++++++++++++
 lib/netdev-offload-xdp.h              |   49 +
 lib/netdev-offload.c                  |   43 +-
 23 files changed, 2901 insertions(+), 20 deletions(-)
 create mode 100644 bpf/.gitignore
 create mode 100644 bpf/Makefile.am
 create mode 100644 bpf/bpf_compiler.h
 create mode 100644 bpf/bpf_miniflow.h
 create mode 100644 bpf/bpf_netlink.h
 create mode 100644 bpf/bpf_workaround.h
 create mode 100644 bpf/flowtable_afxdp.c
 create mode 100644 lib/bpf-util.c
 create mode 100644 lib/bpf-util.h
 create mode 100644 lib/netdev-offload-xdp.c
 create mode 100644 lib/netdev-offload-xdp.h

