[ovs-dev] [RFC PATCH 0/5] XDP offload using flow API provider
u9012063 at gmail.com
Wed Mar 11 21:17:26 UTC 2020
Adding a couple of people who might be interested in this feature.
On Tue, Mar 10, 2020 at 8:29 AM Toshiaki Makita
<toshiaki.makita1 at gmail.com> wrote:
> This patch adds an XDP-based flow cache using the OVS netdev-offload
> flow API provider. When XDP offload is enabled on an OVS device,
> packets are first processed in the XDP flow cache (with parsing and
> table lookup implemented in eBPF), and on a hit the action processing
> is also done in the context of XDP, which has minimal overhead.
> This provider is based on top of William's recently posted patch for
> custom XDP loading. When a custom XDP program is loaded, the provider
> detects whether the program supports the classifier, and if so it
> starts offloading flows to the XDP program.
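For illustration, the hit/miss path described above might look like the eBPF skeleton below. This is a hedged sketch, not the actual bpf/flowtable_afxdp.c: the flow_key fields, map names, and sizes are invented for the example, and the real program uses map-in-map for subtables.

```c
/* Sketch of an XDP flow-cache datapath: parse a key, look it up in a
 * flow table map, act on a hit, and fall back to the normal OVS
 * datapath on a miss. All names and sizes are illustrative. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct flow_key {
    __u32 ipv4_src;
    __u32 ipv4_dst;
    /* the real key carries many more fields */
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __type(key, struct flow_key);
    __type(value, __u32);              /* output port index */
    __uint(max_entries, 8192);
} flow_table SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_DEVMAP);
    __type(key, __u32);
    __type(value, __u32);
    __uint(max_entries, 64);
} output_map SEC(".maps");

SEC("xdp")
int flow_cache(struct xdp_md *ctx)
{
    struct flow_key key = {};
    /* ... parse headers from ctx->data into key here ... */
    __u32 *port = bpf_map_lookup_elem(&flow_table, &key);
    if (!port)
        return XDP_PASS;               /* miss: go to the OVS datapath */
    /* hit: act without ever leaving XDP context */
    return bpf_redirect_map(&output_map, *port, 0);
}

char _license[] SEC("license") = "GPL";
```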
> The patches are derived from xdp_flow, a similar mechanism that was
> implemented in the kernel.
> * Motivation
> While the userspace datapath using netdev-afxdp or netdev-dpdk shows
> good performance, there are use cases where packets are better
> processed in the kernel, for example TCP/IP connections or
> container-to-container connections. The current solution is to use a
> tap device or af_packet, with extra kernel-to/from-userspace overhead.
> With XDP, a better solution is to steer packets earlier in the XDP
> program and decide whether to send them to the userspace datapath or
> keep them in the kernel.
> One problem with the current netdev-afxdp is that it forwards all
> packets to userspace. The first patch from William (netdev-afxdp:
> Enable loading XDP program.) only provides the interface to load an
> XDP program; however, users usually don't know how to write their own
> XDP program.
> XDP also supports HW offload, so it may be possible to offload flows
> to HW through this provider in the future, although not currently.
> The reason is that map-in-map is required for our program to support a
> classifier with subtables in XDP, but map-in-map is not offloadable.
> If map-in-map becomes offloadable, HW offload of our program will also
> be doable.
> * How to use
> 1. Install clang/llvm >= 9, libbpf >= 0.0.4, and kernel >= 5.3.
> 2. make with --enable-afxdp
> It will generate XDP program "bpf/flowtable_afxdp.o". Note that the BPF
> object will not be installed anywhere by "make install" at this point.
> 3. Load custom XDP program
> $ ovs-vsctl add-port ovsbr0 veth0 -- set int veth0 options:xdp-mode=native \
>     options:xdp-obj="path/to/ovs/bpf/flowtable_afxdp.o"
> $ ovs-vsctl add-port ovsbr0 veth1 -- set int veth1 options:xdp-mode=native \
>     options:xdp-obj="path/to/ovs/bpf/flowtable_afxdp.o"
> 4. Enable XDP_REDIRECT
> If you use veth devices, make sure to load some (possibly dummy) XDP
> programs on the peers of the veth devices.
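A dummy program for the veth peers, as step 4 suggests, can be as small as the following sketch; any program that simply returns XDP_PASS should do, since what matters is that the peer has an XDP program attached so it can receive redirected frames.

```c
/* Minimal pass-through XDP program for the veth peer devices.
 * It accepts every packet unchanged. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_pass(struct xdp_md *ctx)
{
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
```

It can be built with `clang -O2 -target bpf -c xdp_pass.c -o xdp_pass.o` and attached with iproute2, e.g. `ip link set dev <peer> xdp obj xdp_pass.o sec xdp` (file and section names here are illustrative).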
> 5. Enable hw-offload
> $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> This will start offloading flows to the XDP program.
> You should be able to see some maps installed, including "debug_stats".
> $ bpftool map
> If packets are successfully redirected by the XDP program,
> debug_stats will be counted.
> $ bpftool map dump id <ID of debug_stats>
> Currently only a very limited set of keys and output actions is
> supported. For example, a NORMAL action entry and IP-based matching
> work with the current key support.
> * Performance
> Tested 2 cases: 1) i40e to veth, 2) i40e to i40e.
> Test 1 measured the drop rate at the veth interface, with a redirect
> action from the physical interface (i40e 25G NIC, XXV710) to veth.
> The CPU is a Xeon Silver 4114 (2.20 GHz).
>
>                     +------+                      +-------+    +-------+
>  pktgen -- wire --> | eth0 | -- NORMAL ACTION --> | veth0 |----| veth2 |
>                     +------+                      +-------+    +-------+
>
> Test 2 uses i40e instead of veth, and measured the tx packet rate at
> the output port.
> Single-flow performance test results:
> 1) i40e-veth
> a) no-zerocopy in i40e
> - xdp 3.7 Mpps
> - afxdp 820 kpps
> b) zerocopy in i40e (veth does not have zc)
> - xdp 1.8 Mpps
> - afxdp 800 kpps
> 2) i40e-i40e
> a) no-zerocopy
> - xdp 3.0 Mpps
> - afxdp 1.1 Mpps
> b) zerocopy
> - xdp 1.7 Mpps
> - afxdp 4.0 Mpps
> ** xdp is better when zc is disabled. The reason for the poor
> performance with zc is that converting to xdp_frame requires packet
> memory allocation and a memcpy on XDP_REDIRECT to other devices when
> zc is enabled.
> ** afxdp with zc is better than xdp without zc, but afxdp uses 2 cores
> in this case: one for pmd and the other for softirq. When pmd and
> softirq ran on the same core, the performance was extremely poor, as
> pmd consumes the CPU.
> When offloading to xdp, xdp uses only softirq while pmd is still
> consuming 100% of a CPU. This means we probably need only one pmd for
> xdp even when we want to use more cores for multi-flow.
> I'll also test afxdp-nonpmd when it is applied.
> This patch set is based on top of commit 59e994426 ("datapath: Update
> kernel test list, news and FAQ").
>  https://lwn.net/Articles/802653/
> Toshiaki Makita (4):
> netdev-offload: Add xdp flow api provider
> netdev-offload: Register xdp flow api provider
> tun_metadata: Use OVS_ALIGNED_VAR to align opts field
> bpf: Add reference XDP program implementation for netdev-offload-xdp
> William Tu (1):
> netdev-afxdp: Enable loading XDP program.
> Documentation/intro/install/afxdp.rst | 59 ++
> Makefile.am | 10 +-
> NEWS | 2 +
> bpf/.gitignore | 4 +
> bpf/Makefile.am | 56 ++
> bpf/bpf_miniflow.h | 199 +++++
> bpf/bpf_netlink.h | 34 +
> bpf/flowtable_afxdp.c | 510 +++++++++++
> configure.ac | 1 +
> include/openvswitch/tun-metadata.h | 6 +-
> lib/automake.mk | 6 +-
> lib/bpf-util.c | 38 +
> lib/bpf-util.h | 22 +
> lib/netdev-afxdp.c | 342 +++++++-
> lib/netdev-afxdp.h | 3 +
> lib/netdev-linux-private.h | 5 +
> lib/netdev-offload-provider.h | 3 +
> lib/netdev-offload-xdp.c | 1116 +++++++++++++++++++++++++
> lib/netdev-offload-xdp.h | 49 ++
> lib/netdev.c | 4 +-
> 20 files changed, 2452 insertions(+), 17 deletions(-)
> create mode 100644 bpf/.gitignore
> create mode 100644 bpf/Makefile.am
> create mode 100644 bpf/bpf_miniflow.h
> create mode 100644 bpf/bpf_netlink.h
> create mode 100644 bpf/flowtable_afxdp.c
> create mode 100644 lib/bpf-util.c
> create mode 100644 lib/bpf-util.h
> create mode 100644 lib/netdev-offload-xdp.c
> create mode 100644 lib/netdev-offload-xdp.h