[ovs-dev] [PATCHv14 2/2] netdev-afxdp: add new netdev type for AF_XDP.
u9012063 at gmail.com
Mon Jul 8 20:50:58 UTC 2019
Thanks for all the feedback!
On Fri, Jul 5, 2019 at 8:32 AM Ilya Maximets <i.maximets at samsung.com> wrote:
> On 01.07.2019 19:08, William Tu wrote:
> > The patch introduces experimental AF_XDP support for OVS netdev.
> > AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
> > type built upon the eBPF and XDP technology. It is aims to have comparable
> > performance to DPDK but cooperate better with existing kernel's networking
> > stack. An AF_XDP socket receives and sends packets from an eBPF/XDP program
> > attached to the netdev, by-passing a couple of Linux kernel's subsystems
> > As a result, AF_XDP socket shows much better performance than AF_PACKET
> > For more details about AF_XDP, please see linux kernel's
> > Documentation/networking/af_xdp.rst. Note that by default, this feature is
> > not compiled in.
> > Signed-off-by: William Tu <u9012063 at gmail.com>
> > ---
> > v13-v14
> > - Mainly address issue reported by Ilya
> > https://protect2.fireeye.com/url?k=a0e04e091a64b944.a0e1c546-0896b5863118ce25&u=https://patchwork.ozlabs.org/patch/1118972/
> > when doing 'make check-afxdp'
> > - Fix xdp frame headroom issue
> > - Fix vlan test cases by disabling txvlan offload
> > - Skip cvlan
> > - Document TCP limitation (currently all tcp tests fail due to
> > kernel veth driver)
> Maybe it's better to skip them too? The easy way could be:
> diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at
> index 5ee2ceb1a..f0683c0a9 100644
> --- a/tests/system-afxdp-macros.at
> +++ b/tests/system-afxdp-macros.at
> @@ -30,3 +30,10 @@ m4_define([CONFIGURE_VETH_OFFLOADS],
> AT_CHECK([ethtool -K $1 txvlan off], , [ignore], [ignore])
> +# OVS_START_L7([namespace], [protocol])
> +# AF_XDP doesn't work with TCP over virtual interfaces for now.
> + [AT_SKIP_IF([:])])
> BTW, documentation should stay.
OK Will do it!
I rebase to master, added the above and run
make check-afxdp TESTSUITEFLAGS='-v -x 2'
it hangs at
ip link del ovs-$1
I have to remove it to make it works
@@ -15,7 +15,6 @@ m4_define([ADD_VETH],
if test -n "$6"; then
NS_CHECK_EXEC([$2], [ip route add default via $6])
- on_exit 'ip link del ovs-$1'
I don't know why, but in the end
ip netns del at_ns0
clean up the ovs-p0 device
> > - Fix tunnel test cases due to --disable-system (another patch)
> > - Switch to use pthread_spin_lock, suggested by Ben
> > - Add coverage counter for debugging
> > - Fix buffer starvation issue at batch_send reported by Eelco
> > when using tap device with type=afxdp
> > - TODO:
> > 'make check-afxdp' still not all pass
> > IP fragmentation expiry test not fix yet, need to implement
> > deferral memory free, s.t like dpdk_mp_sweep. Currently hit
> > some missing umem descs when reclaiming.
> > ---
> Instead of inline comments I'll suggest incremental patch (below) with
> following explanations:
> * You still need to return EAGAIN in rxq_recv if socket is not ready.
> Otherwise upper layers will count the call as successful.
> * sendto() in SKB mode will send at most 16 packets, but our batch might
> be up to 32 packets. If batch size will be larger than 16, TX ring will
> be quickly exhausted and transmission will be blocked. We have to kick
> the kernel up to n_packets / 16 times to be sure that all packets are
> transmitted. This fixes my issues with zero traffic in test with iperf
> between 2 namespaces while more than 2 traffic flows involved.
Thanks for getting down to the root cause!
> * EAGAIN is very likely case for sendto.
> * Suggesting to reclaim all CONS_NUM_DESCS addresses from the completion
> queue each time instead of BATCH_SIZE addresses. This will guarantee
> that we'll have enough cq entries for all cases. This is important
> because we may have much more than BATCH_SIZE packets in outstanding_tx.
Good point, thanks.
> * No need to kick_tx inside afxdp_complete_tx since we'll retry more times
> while kicking on transmit.
> * Suggesting warning on exhausted memory pool. This is a bad case and should
> be reported.
> * 'concurrent_txq' is a likely case, because we can't configure tx queues
> separately from rx queues, so we'll likely have number of tx queues less
> than number of threads --> 'concurrent_txq' will likely be true.
> I think that we need to just remove 'OVS_UNLIKELY' macro there.
> * The worst case for the mpool size is larger than just size of the rings.
> There might be more packets currently under processing in OVS, stored
> in output batches in datapath. It's hard to estimate the good size.
> Suggesting just to increase two times for now.
Thanks, I saw your comments above the code.
> Few additional questions/requests:
> * Why we need dev->xdp_flags and dev->xdp_bind_flags? These are not used and
> mostly just duplicates the xdpmode.
I think you're right, we can combine them into one flag right now.
But the reason is AF_XDP can be configured in
driver mode + copy (for those drivers having XDP support but not
af_xdp support), ex:
cfg.bind_flags = XDP_COPY;
cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
Now for OVS netdev-afxdp, I didn't enable this option,
so it's either SKB or DRV+ZEROCOPY.
> * Please, don't declare two structure fields at the same line:
> int xdpmode, requested_xdpmode;
> int xdp_flags, xdp_bind_flags;
More information about the dev