[ovs-dev] [PATCH v4 0/5] XDP offload using flow API provider
Toshiaki Makita
toshiaki.makita1 at gmail.com
Mon Nov 2 16:36:29 UTC 2020
On 2020/11/02 18:37, Numan Siddique wrote:
> On Fri, Oct 30, 2020 at 10:49 AM Toshiaki Makita
> <toshiaki.makita1 at gmail.com> wrote:
>>
>> Hi all,
>>
>> It's about 3 months since I submitted this patch set.
>> Could someone review this?
>> Or should I resubmit the patch set on the top of current master?
>
> Since the patches don't apply cleanly, I think you can rebase and
> repost them and/or provide the
> ovs commit id on top of which these patches apply cleanly.
Hi Numan,
Thank you for the advice!
The patches are based on top of commit e8bf77748 ("odp-util: Fix clearing
match mask if set action is partially unnecessary.").
Actually I provided this information at the bottom of the cover letter.
Also you can pull the changes from
https://github.com/tmakita/ovs.git (xdp_offload_v4 branch).
Thanks,
Toshiaki Makita
>> On 2020/08/15 10:54, Toshiaki Makita wrote:
>>> Ping.
>>> Any feedback is welcome.
>>>
>>> Thanks,
>>> Toshiaki Makita
>>>
>>> On 2020/07/31 11:55, Toshiaki Makita wrote:
>>>> This patch adds an XDP-based flow cache using the OVS netdev-offload
>>>> flow API provider. When XDP offload is enabled on an OVS device,
>>>> packets are first processed by the XDP flow cache (with parsing and
>>>> table lookup implemented in eBPF), and on a hit the action processing
>>>> is also done in the XDP context, which has minimal overhead.
>>>>
>>>> This provider is based on top of William's recently posted patch for
>>>> custom XDP loading. When a custom XDP program is loaded, the provider
>>>> detects whether the program supports the classifier, and if so it starts
>>>> offloading flows to the XDP program.
>>>>
>>>> The patches are derived from xdp_flow[1], which is a mechanism similar to
>>>> this but implemented in kernel.
>>>>
>>>>
>>>> * Motivation
>>>>
>>>> While the userspace datapath using netdev-afxdp or netdev-dpdk shows good
>>>> performance, there are use cases where packets are better processed in the
>>>> kernel, for example TCP/IP connections or container-to-container
>>>> connections. The current solution is to use a tap device or af_packet,
>>>> with extra kernel-to/from-userspace overhead. With XDP, a better solution
>>>> is to steer packets earlier, in the XDP program, and decide there whether
>>>> to send them to the userspace datapath or keep them in the kernel.
>>>>
>>>> One problem with the current netdev-afxdp is that it forwards all packets
>>>> to userspace. The first patch from William (netdev-afxdp: Enable loading XDP
>>>> program.) only provides the interface to load an XDP program; however, users
>>>> usually don't know how to write their own XDP program.
>>>>
>>>> XDP also supports HW-offload, so it may be possible to offload flows to
>>>> HW through this provider in the future, although not currently.
>>>> The reason is that map-in-map is required for our program to support a
>>>> classifier with subtables in XDP, but map-in-map is not offloadable.
>>>> If map-in-map becomes offloadable, HW-offload of our program may also
>>>> become possible.
>>>>
>>>>
>>>> * How to use
>>>>
>>>> 1. Install clang/llvm >= 9, libbpf >= 0.0.6 (included in kernel 5.5), and
>>>> kernel >= 5.3.
>>>>
>>>> 2. make with --enable-afxdp --enable-xdp-offload
>>>> --enable-xdp-offload will generate the XDP program "bpf/flowtable_afxdp.o".
>>>> Note that the BPF object is not installed anywhere by "make install" at
>>>> this point.
>>>>
>>>> 3. Load custom XDP program
>>>> E.g.
>>>> $ ovs-vsctl add-port ovsbr0 veth0 -- set int veth0 options:xdp-mode=native \
>>>> options:xdp-obj="/path/to/ovs/bpf/flowtable_afxdp.o"
>>>> $ ovs-vsctl add-port ovsbr0 veth1 -- set int veth1 options:xdp-mode=native \
>>>> options:xdp-obj="/path/to/ovs/bpf/flowtable_afxdp.o"
>>>>
>>>> 4. Enable XDP_REDIRECT
>>>> If you use veth devices, make sure to load some (possibly dummy) programs
>>>> on the peers of the veth devices. This patch set includes a program which
>>>> does nothing but return XDP_PASS. You can use it for the veth peer like
>>>> this:
>>>> $ ip link set veth1 xdpdrv object /path/to/ovs/bpf/xdp_noop.o section xdp
>>>>
>>>> Some HW NIC drivers require as many queues as there are cores on the
>>>> system. Tweak the number of queues using "ethtool -L".
>>>>
>>>> 5. Enable hw-offload
>>>> $ ovs-vsctl set Open_vSwitch . other_config:offload-driver=linux_xdp
>>>> $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
>>>> This will start offloading flows to the XDP program.
>>>>
>>>> You should be able to see some maps installed, including "debug_stats".
>>>> $ bpftool map
>>>>
>>>> If packets are successfully redirected by the XDP program, the
>>>> debug_stats[2] counter will be incremented.
>>>> $ bpftool map dump id <ID of debug_stats>
>>>>
>>>> Currently only very limited keys and output actions are supported.
>>>> For example, a NORMAL action entry and IP-based matching work with the
>>>> current key support. VLAN actions used by port tags/trunks are also
>>>> supported.
>>>>
>>>>
>>>> * Performance
>>>>
>>>> Tested 2 cases: 1) i40e to veth, 2) i40e to i40e.
>>>> Test 1 measured the drop rate at the veth interface, with a redirect
>>>> action from the physical interface (i40e 25G NIC, XXV710) to veth. The
>>>> CPU is a Xeon Silver 4114 (2.20 GHz).
>>>> XDP_DROP
>>>> +------+ +-------+ +-------+
>>>> pktgen -- wire --> | eth0 | -- NORMAL ACTION --> | veth0 |----| veth2 |
>>>> +------+ +-------+ +-------+
>>>>
>>>> Test 2 used i40e instead of veth, and measured the tx packet rate at the
>>>> output device.
>>>>
>>>> Single-flow performance test results:
>>>>
>>>> 1) i40e-veth
>>>>
>>>> a) no-zerocopy in i40e
>>>>
>>>> - xdp 3.7 Mpps
>>>> - afxdp 980 kpps
>>>>
>>>> b) zerocopy in i40e (veth does not have zc)
>>>>
>>>> - xdp 1.9 Mpps
>>>> - afxdp 980 kpps
>>>>
>>>> 2) i40e-i40e
>>>>
>>>> a) no-zerocopy
>>>>
>>>> - xdp 3.5 Mpps
>>>> - afxdp 1.5 Mpps
>>>>
>>>> b) zerocopy
>>>>
>>>> - xdp 2.0 Mpps
>>>> - afxdp 4.4 Mpps
>>>>
>>>> ** xdp is better when zc is disabled. The reason for the poor performance
>>>> with zc is that building an xdp_frame requires packet memory allocation
>>>> and a memcpy on XDP_REDIRECT to other devices when zc is enabled.
>>>>
>>>> ** afxdp with zc is better than xdp without zc, but afxdp uses 2 cores
>>>> in this case: one for the pmd and the other for softirq. When the pmd and
>>>> softirq were running on the same core, the performance was extremely poor,
>>>> as the pmd consumes the cpu. I also tested afxdp-nonpmd, which runs softirq
>>>> and userspace processing on the same core, but the result was lower than
>>>> (pmd results) / 2.
>>>> With nonpmd, xdp performance was the same as xdp with pmd. This means
>>>> xdp uses only one core (for softirq only). Even with pmd, we need only
>>>> one pmd for xdp, even when we want to use more cores for multi-flow.
>>>>
>>>>
>>>> This patch set is based on top of commit e8bf77748 ("odp-util: Fix clearing
>>>> match mask if set action is partially unnecessary.").
>>>>
>>>> To make review easier I left pre-squashed commits from v3 here.
>>>> https://github.com/tmakita/ovs/compare/xdp_offload_v3...tmakita:xdp_offload_v4_history?expand=1
>>>>
>>>> [1] https://lwn.net/Articles/802653/
>>>>
>>>> v4:
>>>> - Fix checkpatch errors.
>>>> - Fix duplicate flow api register.
>>>> - Don't call unnecessary flow api init callbacks when default flow api
>>>> provider can be used.
>>>> - Fix typo in comments.
>>>> - Improve bpf Makefile.am to support automatic dependencies.
>>>> - Add a dummy XDP program for veth peers.
>>>> - Rename netdev_info to netdev_xdp_info.
>>>> - Use id-pool for free subtable entry management and devmap indexes.
>>>> - Rename --enable-bpf to --enable-xdp-offload.
>>>> - Compile xdp flow api provider only with --enable-xdp-offload.
>>>> - Tested again and updated performance numbers in cover letter (get
>>>> slightly better numbers).
>>>>
>>>> v3:
>>>> - Use ".ovs_meta" section to inform vswitchd of metadata like supported
>>>> keys.
>>>> - Rewrite action loop logic in bpf to support multiple actions.
>>>> - Add missing linux/types.h in acinclude.m4, as per William Tu.
>>>> - Fix infinite reconfiguration loop when xsks_map is missing.
>>>> - Add vlan-related actions in bpf program.
>>>> - Fix CI build error.
>>>> - Fix inability to delete subtable entries.
>>>>
>>>> v2:
>>>> - Add uninit callback of netdev-offload-xdp.
>>>> - Introduce "offload-driver" other_config to specify offload driver.
>>>> - Add --enable-bpf (HAVE_BPF) config option to build bpf programs.
>>>> - Workaround incorrect UINTPTR_MAX in x64 clang bpf build.
>>>> - Fix boot.sh autoconf warning.
>>>>
>>>>
>>>> Toshiaki Makita (4):
>>>> netdev-offload: Add "offload-driver" other_config to specify offload
>>>> driver
>>>> netdev-offload: Add xdp flow api provider
>>>> bpf: Add reference XDP program implementation for netdev-offload-xdp
>>>> bpf: Add dummy program for veth devices
>>>>
>>>> William Tu (1):
>>>> netdev-afxdp: Enable loading XDP program.
>>>>
>>>> .travis.yml | 2 +-
>>>> Documentation/intro/install/afxdp.rst | 59 ++
>>>> Makefile.am | 9 +-
>>>> NEWS | 2 +
>>>> acinclude.m4 | 60 ++
>>>> bpf/.gitignore | 4 +
>>>> bpf/Makefile.am | 83 ++
>>>> bpf/bpf_compiler.h | 25 +
>>>> bpf/bpf_miniflow.h | 179 ++++
>>>> bpf/bpf_netlink.h | 63 ++
>>>> bpf/bpf_workaround.h | 28 +
>>>> bpf/flowtable_afxdp.c | 585 ++++++++++++
>>>> bpf/xdp_noop.c | 31 +
>>>> configure.ac | 2 +
>>>> lib/automake.mk | 8 +
>>>> lib/bpf-util.c | 38 +
>>>> lib/bpf-util.h | 22 +
>>>> lib/netdev-afxdp.c | 373 +++++++-
>>>> lib/netdev-afxdp.h | 3 +
>>>> lib/netdev-linux-private.h | 5 +
>>>> lib/netdev-offload-provider.h | 8 +-
>>>> lib/netdev-offload-xdp.c | 1213 +++++++++++++++++++++++++
>>>> lib/netdev-offload-xdp.h | 49 +
>>>> lib/netdev-offload.c | 42 +
>>>> 24 files changed, 2881 insertions(+), 12 deletions(-)
>>>> create mode 100644 bpf/.gitignore
>>>> create mode 100644 bpf/Makefile.am
>>>> create mode 100644 bpf/bpf_compiler.h
>>>> create mode 100644 bpf/bpf_miniflow.h
>>>> create mode 100644 bpf/bpf_netlink.h
>>>> create mode 100644 bpf/bpf_workaround.h
>>>> create mode 100644 bpf/flowtable_afxdp.c
>>>> create mode 100644 bpf/xdp_noop.c
>>>> create mode 100644 lib/bpf-util.c
>>>> create mode 100644 lib/bpf-util.h
>>>> create mode 100644 lib/netdev-offload-xdp.c
>>>> create mode 100644 lib/netdev-offload-xdp.h
>>>>
>> _______________________________________________
>> dev mailing list
>> dev at openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev