[ovs-dev] [PATCHv14 2/2] netdev-afxdp: add new netdev type for AF_XDP.

William Tu u9012063 at gmail.com
Mon Jul 8 20:50:58 UTC 2019


Hi Ilya,

Thanks for all the feedback!

On Fri, Jul 5, 2019 at 8:32 AM Ilya Maximets <i.maximets at samsung.com> wrote:
>
> On 01.07.2019 19:08, William Tu wrote:
> > The patch introduces experimental AF_XDP support for OVS netdev.
> > AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
> > type built upon the eBPF and XDP technology.  It is aims to have comparable
> > performance to DPDK but cooperate better with existing kernel's networking
> > stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP program
> > attached to the netdev, by-passing a couple of Linux kernel's subsystems
> > As a result, AF_XDP socket shows much better performance than AF_PACKET
> > For more details about AF_XDP, please see linux kernel's
> > Documentation/networking/af_xdp.rst. Note that by default, this feature is
> > not compiled in.
> >
> > Signed-off-by: William Tu <u9012063 at gmail.com>
> > ---
> >
> > v13-v14
> > - Mainly address issue reported by Ilya
> >   https://protect2.fireeye.com/url?k=a0e04e091a64b944.a0e1c546-0896b5863118ce25&u=https://patchwork.ozlabs.org/patch/1118972/
> >   when doing 'make check-afxdp'
> > - Fix xdp frame headroom issue
> > - Fix vlan test cases by disabling txvlan offload
> > - Skip cvlan
> > - Document TCP limitation (currently all tcp tests fail due to
> >   kernel veth driver)
>
> Maybe it's better to skip them too? The easy way could be:
>
> diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at
> index 5ee2ceb1a..f0683c0a9 100644
> --- a/tests/system-afxdp-macros.at
> +++ b/tests/system-afxdp-macros.at
> @@ -30,3 +30,10 @@ m4_define([CONFIGURE_VETH_OFFLOADS],
>       AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
>      ]
>  )
> +
> +# OVS_START_L7([namespace], [protocol])
> +#
> +# AF_XDP doesn't work with TCP over virtual interfaces for now.
> +#
> +m4_define([OVS_START_L7],
> +   [AT_SKIP_IF([:])])
> ---
>
> BTW, documentation should stay.
>
OK Will do it!

I rebase to master, added the above and run
make check-afxdp TESTSUITEFLAGS='-v -x 2'
it hangs at
ip link del ovs-$1
I have to remove it to make it works
--- a/tests/system-afxdp-macros.at
+++ b/tests/system-afxdp-macros.at
@@ -15,7 +15,6 @@ m4_define([ADD_VETH],
       if test -n "$6"; then
         NS_CHECK_EXEC([$2], [ip route add default via $6])
       fi
-      on_exit 'ip link del ovs-$1'
     ]
 )
I don't know why, but in the end
ip netns del at_ns0
clean up the ovs-p0 device

> > - Fix tunnel test cases due to --disable-system (another patch)
> > - Switch to use pthread_spin_lock, suggested by Ben
> > - Add coverage counter for debugging
> > - Fix buffer starvation issue at batch_send reported by Eelco
> >   when using tap device with type=afxdp
> > - TODO:
> >   'make check-afxdp' still not all pass
> >   IP fragmentation expiry test not fix yet, need to implement
> >   deferral memory free, s.t like dpdk_mp_sweep.  Currently hit
> >   some missing umem descs when reclaiming.
> > ---
>
> Instead of inline comments I'll suggest incremental patch (below) with
> following explanations:
>
> * You still need to return EAGAIN in rxq_recv if socket is not ready.
>   Otherwise upper layers will count the call as successful.
>
> * sendto() in SKB mode will send at most 16 packets, but our batch might
>   be up to 32 packets. If batch size will be larger than 16, TX ring will
>   be quickly exhausted and transmission will be blocked. We have to kick
>   the kernel up to n_packets / 16 times to be sure that all packets are
>   transmitted. This fixes my issues with zero traffic in test with iperf
>   between 2 namespaces while more than 2 traffic flows involved.
>
Thanks for getting down to the root cause!

> * EAGAIN is very likely case for sendto.
>
> * Suggesting to reclaim all CONS_NUM_DESCS addresses from the completion
>   queue each time instead of BATCH_SIZE addresses. This will guarantee
>   that we'll have enough cq entries for all cases. This is important
>   because we may have much more than BATCH_SIZE packets in outstanding_tx.

Good point, thanks.

>
> * No need to kick_tx inside afxdp_complete_tx since we'll retry more times
>   while kicking on transmit.
>
> * Suggesting warning on exhausted memory pool. This is a bad case and should
>   be reported.
>
> * 'concurrent_txq' is a likely case, because we can't configure tx queues
>   separately from rx queues, so we'll likely have number of tx queues less
>   than number of threads --> 'concurrent_txq' will likely be true.
>   I think that we need to just remove 'OVS_UNLIKELY' macro there.
>
> * The worst case for the mpool size is larger than just size of the rings.
>   There might be more packets currently under processing in OVS, stored
>   in output batches in datapath. It's hard to estimate the good size.
>   Suggesting just to increase two times for now.

Thanks, I saw your comments above the code.

<snip>
>
>
> Few additional questions/requests:
>
> * Why we need dev->xdp_flags and dev->xdp_bind_flags? These are not used and
>   mostly just duplicates the xdpmode.

I think you're right, we can combine them into one flag right now.
But the reason is AF_XDP can be configured in
driver mode + copy (for those drivers having XDP support but not
af_xdp support), ex:
     cfg.bind_flags = XDP_COPY;
     cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;

Now for OVS netdev-afxdp, I didn't enable this option,
so it's either SKB or DRV+ZEROCOPY.

>
> * Please, don't declare two structure fields at the same line:
>
>   int xdpmode, requested_xdpmode;
>   int xdp_flags, xdp_bind_flags;
>
OK Thanks!
Regards,
William


More information about the dev mailing list