[ovs-dev] [RFC PATCH 0/5] XDP offload using flow API provider

Toshiaki Makita toshiaki.makita1 at gmail.com
Tue Mar 17 09:26:07 UTC 2020


Any more feedback?
I'll work on implementing the missing parts of the RFC and prepare a v2.
If anyone has feedback on the concept at this point, it would be helpful.

Thanks,
Toshiaki Makita

On Thu, Mar 12, 2020 at 6:18 William Tu <u9012063 at gmail.com> wrote:

> Add a couple of people who might be interested in this feature.
>
> On Tue, Mar 10, 2020 at 8:29 AM Toshiaki Makita
> <toshiaki.makita1 at gmail.com> wrote:
> >
> > This patch adds an XDP-based flow cache using the OVS netdev-offload
> > flow API provider.  When an OVS device has XDP offload enabled,
> > packets are first processed in the XDP flow cache (with parsing and
> > table lookup implemented in eBPF), and on a hit the actions are also
> > executed in the context of XDP, which has minimal overhead.
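> >
> > To illustrate the idea, the per-packet path looks roughly like the
> > following sketch (illustrative names and a single hash map only; the
> > real program in bpf/flowtable_afxdp.c parses into a miniflow key and
> > supports subtables):
> >
> >   #include <linux/bpf.h>
> >   #include <bpf/bpf_helpers.h>
> >
> >   struct flow_key { __u32 in_port; /* plus parsed header fields */ };
> >   struct flow_action { __u32 out_port; /* plus other action data */ };
> >
> >   struct {
> >       __uint(type, BPF_MAP_TYPE_HASH);
> >       __uint(max_entries, 1024);
> >       __type(key, struct flow_key);
> >       __type(value, struct flow_action);
> >   } flow_cache SEC(".maps");
> >
> >   SEC("xdp")
> >   int xdp_flow_cache(struct xdp_md *ctx)
> >   {
> >       /* The real program parses headers into a miniflow key here. */
> >       struct flow_key key = { .in_port = ctx->ingress_ifindex };
> >
> >       struct flow_action *act = bpf_map_lookup_elem(&flow_cache, &key);
> >       if (!act)
> >           return XDP_PASS;  /* Miss: hand off to the userspace datapath. */
> >
> >       /* Hit: execute the cached action entirely in XDP context. */
> >       return bpf_redirect(act->out_port, 0);
> >   }
> >
> >   char _license[] SEC("license") = "GPL";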
> >
> > This provider is based on top of William's recently posted patch for
> > custom XDP program loading.  When a custom XDP program is loaded, the
> > provider detects whether it supports the classifier, and if so, starts
> > offloading flows to the XDP program.
> >
> > The patches are derived from xdp_flow[1], which is a mechanism similar to
> > this but implemented in kernel.
> >
> >
> > * Motivation
> >
> > While the userspace datapath using netdev-afxdp or netdev-dpdk shows good
> > performance, there are use cases where packets are better processed in the
> > kernel, for example TCP/IP connections or container-to-container
> > connections.  The current solution is to use a tap device or af_packet,
> > with extra kernel-to/from-userspace overhead.  With XDP, a better solution
> > is to steer packets earlier, in the XDP program, and decide whether to
> > send them to the userspace datapath or keep them in the kernel.
> >
> > One problem with the current netdev-afxdp is that it forwards all packets
> > to userspace.  The first patch from William (netdev-afxdp: Enable loading
> > XDP program.) only provides the interface to load an XDP program; however,
> > users usually don't know how to write their own XDP program.
> >
> > XDP also supports HW offload, so it may be possible to offload flows to
> > HW through this provider in the future, although not currently.
> > The reason is that map-in-map is required for our program to support a
> > classifier with subtables in XDP, but map-in-map is not offloadable.
> > If map-in-map becomes offloadable, HW offload of our program will also
> > be doable.
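> >
> > For reference, the map-in-map layout in question looks roughly like the
> > sketch below, using the BTF-defined map-in-map syntax from recent libbpf
> > and reusing the illustrative flow_key/flow_action types from the earlier
> > sketch (the actual definitions in bpf/flowtable_afxdp.c may differ):
> >
> >   struct subtable {
> >       __uint(type, BPF_MAP_TYPE_HASH);
> >       __uint(max_entries, 1024);
> >       __type(key, struct flow_key);      /* masked miniflow */
> >       __type(value, struct flow_action);
> >   };
> >
> >   struct {
> >       __uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
> >       __uint(max_entries, 16);           /* one inner map per subtable */
> >       __type(key, __u32);
> >       __array(values, struct subtable);
> >   } flow_table SEC(".maps");
> >
> > HW offload of the program would require the offload target to handle
> > this *_OF_MAPS map type, which is the limitation mentioned above.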
> >
> >
> > * How to use
> >
> > 1. Install clang/llvm >= 9, libbpf >= 0.0.4, and kernel >= 5.3.
> >
> > 2. make with --enable-afxdp
> > It will generate the XDP program "bpf/flowtable_afxdp.o".  Note that the BPF
> > object will not be installed anywhere by "make install" at this point.
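> > For example, assuming a build from the OVS git tree:
> > $ ./boot.sh
> > $ ./configure --enable-afxdp
> > $ make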
> >
> > 3. Load custom XDP program
> > E.g.
> > $ ovs-vsctl add-port ovsbr0 veth0 -- set int veth0 options:xdp-mode=native \
> >   options:xdp-obj="path/to/ovs/bpf/flowtable_afxdp.o"
> > $ ovs-vsctl add-port ovsbr0 veth1 -- set int veth1 options:xdp-mode=native \
> >   options:xdp-obj="path/to/ovs/bpf/flowtable_afxdp.o"
> >
> > 4. Enable XDP_REDIRECT
> > If you use veth devices, make sure to load some (possibly dummy) programs
> > on the peers of veth devices.
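> > For example, a minimal pass-through program (hypothetical file name
> > "xdp_dummy.c") can be attached to each veth peer:
> >
> >   #include <linux/bpf.h>
> >   #include <bpf/bpf_helpers.h>
> >
> >   SEC("xdp")
> >   int xdp_dummy(struct xdp_md *ctx)
> >   {
> >       return XDP_PASS;  /* Having any program attached is what matters. */
> >   }
> >
> >   char _license[] SEC("license") = "GPL";
> >
> > $ clang -O2 -g -target bpf -c xdp_dummy.c -o xdp_dummy.o
> > $ ip link set dev <veth peer> xdp obj xdp_dummy.o sec xdp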
> >
> > 5. Enable hw-offload
> > $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> > This will start offloading flows to the XDP program.
> >
> > You should be able to see some maps installed, including "debug_stats".
> > $ bpftool map
> >
> > If packets are successfully redirected by the XDP program,
> > debug_stats[2] will be incremented.
> > $ bpftool map dump id <ID of debug_stats>
> >
> > Currently only very limited keys and output actions are supported.
> > For example, a NORMAL action entry and IP-based matching work with the
> > current key support.
> >
> >
> > * Performance
> >
> > I tested 2 cases: 1) i40e to veth, 2) i40e to i40e.
> > Test 1 measured the drop rate at the veth interface, with a redirect
> > action from the physical interface (i40e 25G NIC, XXV710) to veth.  The
> > CPU is a Xeon Silver 4114 (2.20 GHz).
> >                                                                XDP_DROP
> >                     +------+                      +-------+    +-------+
> >  pktgen -- wire --> | eth0 | -- NORMAL ACTION --> | veth0 |----| veth2 |
> >                     +------+                      +-------+    +-------+
> >
> > Test 2 used i40e instead of veth and measured the tx packet rate at the
> > output device.
> >
> > Single-flow performance test results:
> >
> > 1) i40e-veth
> >
> >   a) no-zerocopy in i40e
> >
> >     - xdp   3.7 Mpps
> >     - afxdp 820 kpps
> >
> >   b) zerocopy in i40e (veth does not have zc)
> >
> >     - xdp   1.8 Mpps
> >     - afxdp 800 kpps
> >
> > 2) i40e-i40e
> >
> >   a) no-zerocopy
> >
> >     - xdp   3.0 Mpps
> >     - afxdp 1.1 Mpps
> >
> >   b) zerocopy
> >
> >     - xdp   1.7 Mpps
> >     - afxdp 4.0 Mpps
> >
> > ** xdp is better when zc is disabled.  The reason for the poor performance
> >    with zc is that xdp_frame requires packet memory allocation and a memcpy
> >    on XDP_REDIRECT to other devices only when zc is enabled.
> >
> > ** afxdp with zc is better than xdp without zc, but afxdp uses 2 cores in
> >    this case: one for pmd and the other for softirq.  When pmd and softirq
> >    ran on the same core, the performance was extremely poor because pmd
> >    consumed the cpu.
> >    When offloading to xdp, xdp only uses softirq while pmd still consumes
> >    100% cpu.  This means we probably need only one pmd for xdp even when
> >    we want to use more cores for multi-flow.
> >    I'll also test afxdp-nonpmd when it is applied.
> >
> >
> > This patch set is based on top of commit 59e994426 ("datapath: Update
> > kernel test list, news and FAQ").
> >
> > [1] https://lwn.net/Articles/802653/
> >
> > Toshiaki Makita (4):
> >   netdev-offload: Add xdp flow api provider
> >   netdev-offload: Register xdp flow api provider
> >   tun_metadata: Use OVS_ALIGNED_VAR to align opts field
> >   bpf: Add reference XDP program implementation for netdev-offload-xdp
> >
> > William Tu (1):
> >   netdev-afxdp: Enable loading XDP program.
> >
> >  Documentation/intro/install/afxdp.rst |   59 ++
> >  Makefile.am                           |   10 +-
> >  NEWS                                  |    2 +
> >  bpf/.gitignore                        |    4 +
> >  bpf/Makefile.am                       |   56 ++
> >  bpf/bpf_miniflow.h                    |  199 +++++
> >  bpf/bpf_netlink.h                     |   34 +
> >  bpf/flowtable_afxdp.c                 |  510 +++++++++++
> >  configure.ac                          |    1 +
> >  include/openvswitch/tun-metadata.h    |    6 +-
> >  lib/automake.mk                       |    6 +-
> >  lib/bpf-util.c                        |   38 +
> >  lib/bpf-util.h                        |   22 +
> >  lib/netdev-afxdp.c                    |  342 +++++++-
> >  lib/netdev-afxdp.h                    |    3 +
> >  lib/netdev-linux-private.h            |    5 +
> >  lib/netdev-offload-provider.h         |    3 +
> >  lib/netdev-offload-xdp.c              | 1116 +++++++++++++++++++++++++
> >  lib/netdev-offload-xdp.h              |   49 ++
> >  lib/netdev.c                          |    4 +-
> >  20 files changed, 2452 insertions(+), 17 deletions(-)
> >  create mode 100644 bpf/.gitignore
> >  create mode 100644 bpf/Makefile.am
> >  create mode 100644 bpf/bpf_miniflow.h
> >  create mode 100644 bpf/bpf_netlink.h
> >  create mode 100644 bpf/flowtable_afxdp.c
> >  create mode 100644 lib/bpf-util.c
> >  create mode 100644 lib/bpf-util.h
> >  create mode 100644 lib/netdev-offload-xdp.c
> >  create mode 100644 lib/netdev-offload-xdp.h
> >
> > --
> > 2.24.1
> >
>

