[ovs-dev] [PATCH v4 0/5] XDP offload using flow API provider

Toshiaki Makita toshiaki.makita1 at gmail.com
Sat Aug 15 01:54:51 UTC 2020


Ping.
Any feedback is welcome.

Thanks,
Toshiaki Makita

On 2020/07/31 11:55, Toshiaki Makita wrote:
> This patch adds an XDP-based flow cache using the OVS netdev-offload
> flow API provider.  When XDP offload is enabled on an OVS device,
> packets are first processed in the XDP flow cache (with parsing and
> table lookup implemented in eBPF), and on a hit the action processing
> is also done in the context of XDP, which has minimal overhead.
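> 
> For illustration, the fast path described above is conceptually like
> the following minimal sketch (all names here are made up and the key
> is reduced to IPv4 src/dst; the real flowtable_afxdp.o uses map-in-map
> for subtables and supports more keys and actions):
> 
> #include <linux/bpf.h>
> #include <linux/if_ether.h>
> #include <linux/ip.h>
> #include <bpf/bpf_helpers.h>
> #include <bpf/bpf_endian.h>
> 
> struct flow_key {            /* illustrative: real keys are richer */
>     __be32 ipv4_src;
>     __be32 ipv4_dst;
> };
> 
> struct flow_action {         /* illustrative: output action only */
>     __u32 out_port;          /* index into output_map below */
> };
> 
> struct {
>     __uint(type, BPF_MAP_TYPE_HASH);
>     __type(key, struct flow_key);
>     __type(value, struct flow_action);
>     __uint(max_entries, 8192);
> } flow_table SEC(".maps");
> 
> struct {
>     __uint(type, BPF_MAP_TYPE_DEVMAP);
>     __uint(key_size, sizeof(__u32));
>     __uint(value_size, sizeof(__u32));
>     __uint(max_entries, 64);
> } output_map SEC(".maps");
> 
> SEC("xdp")
> int flow_cache(struct xdp_md *ctx)
> {
>     void *data = (void *)(long)ctx->data;
>     void *data_end = (void *)(long)ctx->data_end;
>     struct ethhdr *eth = data;
>     struct iphdr *iph;
>     struct flow_key key = {};
>     struct flow_action *act;
> 
>     /* Parse in eBPF (simplified: IPv4 only). */
>     if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
>         return XDP_PASS;
>     iph = (void *)(eth + 1);
>     if ((void *)(iph + 1) > data_end)
>         return XDP_PASS;
>     key.ipv4_src = iph->saddr;
>     key.ipv4_dst = iph->daddr;
> 
>     /* Table lookup.  A miss goes to the slow path (in the real program,
>      * a redirect to the userspace datapath, which can then offload the
>      * flow by updating the map); simplified to XDP_PASS here. */
>     act = bpf_map_lookup_elem(&flow_table, &key);
>     if (!act)
>         return XDP_PASS;
> 
>     /* Hit: execute the action entirely in XDP context. */
>     return bpf_redirect_map(&output_map, act->out_port, 0);
> }
> 
> char _license[] SEC("license") = "GPL";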
> 
> This provider is based on top of William's recently posted patch for
> custom XDP program loading.  When a custom XDP program is loaded, the
> provider detects whether the program supports the classifier, and if
> so it starts offloading flows to the XDP program.
> 
> The patches are derived from xdp_flow[1], which is a similar mechanism
> implemented in the kernel.
> 
> 
> * Motivation
> 
> While the userspace datapath using netdev-afxdp or netdev-dpdk shows
> good performance, there are use cases where packets are better
> processed in the kernel, for example TCP/IP connections or
> container-to-container connections.  The current solution is to use a
> tap device or af_packet, with extra kernel-to/from-userspace overhead.
> With XDP, a better solution is to steer packets earlier, in the XDP
> program, and decide whether to send them to the userspace datapath or
> keep them in the kernel.
> 
> One problem with the current netdev-afxdp is that it forwards all
> packets to userspace.  The first patch from William (netdev-afxdp:
> Enable loading XDP program.) only provides the interface to load an
> XDP program; however, users usually don't know how to write their own
> XDP program.
> 
> XDP also supports HW-offload, so it may be possible to offload flows to
> HW through this provider in the future, although not currently.
> The reason is that map-in-map is required for our program to support a
> classifier with subtables in XDP, but map-in-map is not offloadable.
> If map-in-map becomes offloadable, HW-offload of our program may also
> become possible.
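> 
> To illustrate, a classifier lookup over subtables is conceptually a
> loop over inner maps picked from an outer map at runtime.  Below is a
> sketch reusing the illustrative types and output_map from the earlier
> snippet; how vswitchd creates and inserts the inner maps is elided,
> and apply_subtable_mask() is a trivial stand-in:
> 
> #define MAX_SUBTABLES 16
> 
> struct {
>     __uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
>     __uint(key_size, sizeof(__u32));
>     __uint(value_size, sizeof(__u32));
>     __uint(max_entries, MAX_SUBTABLES);
> } flow_tables SEC(".maps");   /* one inner hash map per subtable */
> 
> static __always_inline void
> apply_subtable_mask(struct flow_key *dst, const struct flow_key *src,
>                     __u32 i)
> {
>     /* Stand-in: a real classifier would AND *src with subtable i's
>      * mask (kept in another map); here we just copy for illustration. */
>     *dst = *src;
> }
> 
> static __always_inline int
> lookup_subtables(struct xdp_md *ctx, const struct flow_key *key)
> {
>     __u32 i;
> 
>     for (i = 0; i < MAX_SUBTABLES; i++) {  /* bounded loop: kernel >= 5.3 */
>         struct flow_key masked;
>         struct flow_action *act;
>         void *subtable;
> 
>         subtable = bpf_map_lookup_elem(&flow_tables, &i);
>         if (!subtable)
>             continue;
>         apply_subtable_mask(&masked, key, i);
>         act = bpf_map_lookup_elem(subtable, &masked);
>         if (act)
>             return bpf_redirect_map(&output_map, act->out_port, 0);
>     }
>     return XDP_PASS;  /* no subtable hit: fall back to the OVS datapath */
> }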
> 
> 
> * How to use
> 
> 1. Install clang/llvm >= 9, libbpf >= 0.0.6 (included in kernel 5.5), and
>     kernel >= 5.3.
> 
> 2. make with --enable-afxdp --enable-xdp-offload
> --enable-xdp-offload will generate the XDP program "bpf/flowtable_afxdp.o".
> Note that the BPF object will not be installed anywhere by "make install"
> at this point.
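> E.g. (assuming the usual OVS autotools flow):
> $ ./boot.sh
> $ ./configure --enable-afxdp --enable-xdp-offload
> $ make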
> 
> 3. Load custom XDP program
> E.g.
> $ ovs-vsctl add-port ovsbr0 veth0 -- set int veth0 options:xdp-mode=native \
>    options:xdp-obj="/path/to/ovs/bpf/flowtable_afxdp.o"
> $ ovs-vsctl add-port ovsbr0 veth1 -- set int veth1 options:xdp-mode=native \
>    options:xdp-obj="/path/to/ovs/bpf/flowtable_afxdp.o"
> 
> 4. Enable XDP_REDIRECT
> If you use veth devices, make sure to load some (possibly dummy)
> programs on the peers of the veth devices.  This patch set includes a
> program which does nothing but return XDP_PASS.  You can use it for
> the veth peer like this:
> $ ip link set veth1 xdpdrv object /path/to/ovs/bpf/xdp_noop.o section xdp
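> 
> For reference, such a dummy program is essentially just the following
> (a sketch; the real one in this series is bpf/xdp_noop.c):
> 
> #include <linux/bpf.h>
> #include <bpf/bpf_helpers.h>
> 
> SEC("xdp")
> int xdp_noop(struct xdp_md *ctx)
> {
>     /* Pass everything through; having any program attached is what
>      * enables the veth peer to receive redirected frames. */
>     return XDP_PASS;
> }
> 
> char _license[] SEC("license") = "GPL";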
> 
> Some HW NIC drivers require as many queues as cores on the system.
> Tweak the number of queues using "ethtool -L".
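> E.g. (eth0 is illustrative):
> $ ethtool -L eth0 combined $(nproc)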
> 
> 5. Enable hw-offload
> $ ovs-vsctl set Open_vSwitch . other_config:offload-driver=linux_xdp
> $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> This starts offloading flows to the XDP program.
> 
> You should be able to see some maps installed, including "debug_stats".
> $ bpftool map
> 
> If packets are successfully redirected by the XDP program,
> debug_stats[2] will be incremented.
> $ bpftool map dump id <ID of debug_stats>
> 
> Currently only very limited keys and output actions are supported.
> For example, NORMAL action entries and IP-based matching work with the
> current key support.  VLAN actions used by port tags/trunks are also
> supported.
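> 
> For a quick check of the NORMAL path (ovsbr0 as in the examples above;
> a fresh bridge has this rule by default):
> $ ovs-ofctl add-flow ovsbr0 "actions=NORMAL"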
> 
> 
> * Performance
> 
> Tested 2 cases: 1) i40e to veth, 2) i40e to i40e.
> Test 1 measured the drop rate at the veth interface with a redirect
> action from the physical interface (i40e 25G NIC, XXV710) to veth.
> The CPU is a Xeon Silver 4114 (2.20 GHz).
>                                                                 XDP_DROP
>                      +------+                      +-------+    +-------+
>   pktgen -- wire --> | eth0 | -- NORMAL ACTION --> | veth0 |----| veth2 |
>                      +------+                      +-------+    +-------+
> 
> Test 2 used i40e instead of veth and measured the TX packet rate at
> the output device.
> 
> Single-flow performance test results:
> 
> 1) i40e-veth
> 
>    a) no-zerocopy in i40e
> 
>      - xdp   3.7 Mpps
>      - afxdp 980 kpps
> 
>    b) zerocopy in i40e (veth does not have zc)
> 
>      - xdp   1.9 Mpps
>      - afxdp 980 kpps
> 
> 2) i40e-i40e
> 
>    a) no-zerocopy
> 
>      - xdp   3.5 Mpps
>      - afxdp 1.5 Mpps
> 
>    b) zerocopy
> 
>      - xdp   2.0 Mpps
>      - afxdp 4.4 Mpps
> 
> ** xdp is better when zc is disabled. The reason for the poor
>     performance with zc is that converting to an xdp_frame requires
>     packet memory allocation and a memcpy on XDP_REDIRECT to other
>     devices, and this happens only when zc is enabled.
> 
> ** afxdp with zc is better than xdp without zc, but afxdp uses 2 cores
>     in this case, one for the pmd and the other for softirq. When the
>     pmd and softirq were running on the same core, the performance was
>     extremely poor, as the pmd consumes the CPU. I also tested
>     afxdp-nonpmd to run softirq and userspace processing on the same
>     core, but the result was lower than (pmd results) / 2.
>     With nonpmd, xdp performance was the same as xdp with pmd. This
>     means xdp only uses one core (for softirq only). Even with pmd,
>     only one pmd is needed for xdp, even when we want to use more
>     cores for multi-flow.
> 
> 
> This patch set is based on top of commit e8bf77748 ("odp-util: Fix clearing
> match mask if set action is partially unnecessary.").
> 
> To make review easier I left pre-squashed commits from v3 here.
> https://github.com/tmakita/ovs/compare/xdp_offload_v3...tmakita:xdp_offload_v4_history?expand=1
> 
> [1] https://lwn.net/Articles/802653/
> 
> v4:
> - Fix checkpatch errors.
> - Fix duplicate flow api register.
> - Don't call unnecessary flow api init callbacks when default flow api
>    provider can be used.
> - Fix typo in comments.
> - Improve bpf Makefile.am to support automatic dependencies.
> - Add a dummy XDP program for veth peers.
> - Rename netdev_info to netdev_xdp_info.
> - Use id-pool for free subtable entry management and devmap indexes.
> - Rename --enable-bpf to --enable-xdp-offload.
> - Compile xdp flow api provider only with --enable-xdp-offload.
> - Tested again and updated performance numbers in the cover letter (got
>    slightly better numbers).
> 
> v3:
> - Use ".ovs_meta" section to inform vswitchd of metadata like supported
>    keys.
> - Rewrite action loop logic in bpf to support multiple actions.
> - Add missing linux/types.h in acinclude.m4, as per William Tu.
> - Fix infinite reconfiguration loop when xsks_map is missing.
> - Add vlan-related actions in bpf program.
> - Fix CI build error.
> - Fix inability to delete subtable entries.
> 
> v2:
> - Add uninit callback of netdev-offload-xdp.
> - Introduce "offload-driver" other_config to specify offload driver.
> - Add --enable-bpf (HAVE_BPF) config option to build bpf programs.
> - Workaround incorrect UINTPTR_MAX in x64 clang bpf build.
> - Fix boot.sh autoconf warning.
> 
> 
> Toshiaki Makita (4):
>    netdev-offload: Add "offload-driver" other_config to specify offload
>      driver
>    netdev-offload: Add xdp flow api provider
>    bpf: Add reference XDP program implementation for netdev-offload-xdp
>    bpf: Add dummy program for veth devices
> 
> William Tu (1):
>    netdev-afxdp: Enable loading XDP program.
> 
>   .travis.yml                           |    2 +-
>   Documentation/intro/install/afxdp.rst |   59 ++
>   Makefile.am                           |    9 +-
>   NEWS                                  |    2 +
>   acinclude.m4                          |   60 ++
>   bpf/.gitignore                        |    4 +
>   bpf/Makefile.am                       |   83 ++
>   bpf/bpf_compiler.h                    |   25 +
>   bpf/bpf_miniflow.h                    |  179 ++++
>   bpf/bpf_netlink.h                     |   63 ++
>   bpf/bpf_workaround.h                  |   28 +
>   bpf/flowtable_afxdp.c                 |  585 ++++++++++++
>   bpf/xdp_noop.c                        |   31 +
>   configure.ac                          |    2 +
>   lib/automake.mk                       |    8 +
>   lib/bpf-util.c                        |   38 +
>   lib/bpf-util.h                        |   22 +
>   lib/netdev-afxdp.c                    |  373 +++++++-
>   lib/netdev-afxdp.h                    |    3 +
>   lib/netdev-linux-private.h            |    5 +
>   lib/netdev-offload-provider.h         |    8 +-
>   lib/netdev-offload-xdp.c              | 1213 +++++++++++++++++++++++++
>   lib/netdev-offload-xdp.h              |   49 +
>   lib/netdev-offload.c                  |   42 +
>   24 files changed, 2881 insertions(+), 12 deletions(-)
>   create mode 100644 bpf/.gitignore
>   create mode 100644 bpf/Makefile.am
>   create mode 100644 bpf/bpf_compiler.h
>   create mode 100644 bpf/bpf_miniflow.h
>   create mode 100644 bpf/bpf_netlink.h
>   create mode 100644 bpf/bpf_workaround.h
>   create mode 100644 bpf/flowtable_afxdp.c
>   create mode 100644 bpf/xdp_noop.c
>   create mode 100644 lib/bpf-util.c
>   create mode 100644 lib/bpf-util.h
>   create mode 100644 lib/netdev-offload-xdp.c
>   create mode 100644 lib/netdev-offload-xdp.h
> 

