[ovs-dev] [PATCHv15 2/2] netdev-afxdp: add new netdev type for AF_XDP.

William Tu u9012063 at gmail.com
Fri Jul 12 23:15:24 UTC 2019


On Thu, Jul 11, 2019 at 6:42 AM Ilya Maximets <i.maximets at samsung.com> wrote:
>
> On 09.07.2019 22:35, William Tu wrote:
> > The patch introduces experimental AF_XDP support for OVS netdev.
> > AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
> > type built upon the eBPF and XDP technology.  It aims to have comparable
> > performance to DPDK but cooperate better with the existing kernel networking
> > stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP program
> > attached to the netdev, by-passing a couple of Linux kernel subsystems.
> > As a result, an AF_XDP socket shows much better performance than AF_PACKET.
> > For more details about AF_XDP, please see the Linux kernel's
> > Documentation/networking/af_xdp.rst. Note that by default, this feature is
> > not compiled in.
> >
> > Signed-off-by: William Tu <u9012063 at gmail.com>
> > ---
> > v14:
> >  * Mainly address issue reported by Ilya
> >    https://patchwork.ozlabs.org/patch/1118972/
> >    when doing 'make check-afxdp'
> >  * Fix xdp frame headroom issue
> >  * Fix vlan test cases by disabling txvlan offload
> >  * Skip cvlan
> >  * Document TCP limitation (currently all tcp tests fail due to
> >    kernel veth driver)
> >  * Fix tunnel test cases due to --disable-system (another patch)
> >  * Switch to use pthread_spin_lock, suggested by Ben
> >  * Add coverage counter for debugging
> >  * Fix buffer starvation issue at batch_send reported by Eelco
> >    when using tap device with type=afxdp
> >
> > v15:
> >  * address review feedback from Ilya
> >    https://patchwork.ozlabs.org/patch/1125476/
> >  * skip TCP related test cases
> >  * reclaim all CONS_NUM_DESC at complete tx
> >  * add retries to kick_tx
> >  * increase memory pool size
> >  * remove redundant xdp flag and bind flag
> >  * remove unused rx_dropped var
> >  * make tx_dropped counter atomic
> >  * refactor dp_packet_init_afxdp using dp_packet_init__
> >  * rebase to ovs master, test with latest bpf-next kernel commit b14a260e33ddb4
> >    Ilya's kernel patches are required
> >    commit 455302d1c9ae ("xdp: fix hang while unregistering device bound to xdp socket")
> >    commit 162c820ed896 ("xdp: hold device for umem regardless of zero-copy mode")
> >  Possible issues:
> >  * still lots of afxdp_cq_skip  (ovs-appctl coverage/show)
> >     afxdp_cq_skip  44325273.6/sec 34362312.683/sec   572705.2114/sec   total: 2106010377
> >  * TODO:
> >    'make check-afxdp' still does not fully pass
> >    IP fragmentation expiry test not fixed yet, need to implement
> >    deferred memory free, something like dpdk_mp_sweep.  Currently hit
> >    some missing umem descs when reclaiming.
>
> Hi. Regarding this issue: We don't need to reclaim everything from the rings.
> We only need to count the number of descriptors that are currently in the rings.
> When we're closing the xdp socket, the kernel stops processing the rings; also,
> all the buffers in the rings are buffers from the current umem. So, we could
> just count them and wait for the number of elements in the umem pool to become
> (size - n_packets_in_rings).
> 'outstanding_tx' already counts all the packets that are in TX and CQ or in the
> middle of processing in the kernel. If we count the number of packets in RX and
> FQ the same way, we'll know the total number of buffers currently in the kernel.

Thanks!
I think this idea is great.
I tried to reclaim descriptors from rx, tx, cq, fq but did not
get a consistent number. I will apply your diff below.
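
The counting idea above can be sketched roughly as follows.  This is only an
illustration of the accounting, not the diff and not the code in the patch:
apart from 'outstanding_tx', every name here (struct umem_acct, the acct_*
helpers, 'outstanding_rx') is hypothetical, and real ring handling, locking
and atomics are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-umem bookkeeping.  Only 'outstanding_tx' is a name used in
 * this thread; everything else is made up for illustration. */
struct umem_acct {
    uint32_t pool_size;      /* Total number of buffers in the umem. */
    uint32_t pool_free;      /* Buffers currently sitting in the umem pool. */
    uint32_t outstanding_tx; /* Buffers in TX, CQ, or in flight in the kernel. */
    uint32_t outstanding_rx; /* Buffers in FQ or RX. */
};

/* Userspace takes 'n' buffers from the pool and posts them to the FQ. */
static inline void acct_fill(struct umem_acct *a, uint32_t n)
{
    assert(a->pool_free >= n);
    a->pool_free -= n;
    a->outstanding_rx += n;
}

/* Userspace consumes 'n' packets from RX; it now holds those buffers. */
static inline void acct_rx(struct umem_acct *a, uint32_t n)
{
    assert(a->outstanding_rx >= n);
    a->outstanding_rx -= n;
}

/* Userspace submits 'n' of the buffers it holds to the TX ring. */
static inline void acct_tx(struct umem_acct *a, uint32_t n)
{
    a->outstanding_tx += n;
}

/* Userspace reclaims 'n' completed buffers from the CQ into the pool. */
static inline void acct_complete(struct umem_acct *a, uint32_t n)
{
    assert(a->outstanding_tx >= n);
    a->outstanding_tx -= n;
    a->pool_free += n;
}

/* Userspace frees 'n' buffers it was holding back into the pool. */
static inline void acct_free(struct umem_acct *a, uint32_t n)
{
    a->pool_free += n;
}

/* Total number of buffers the kernel may still own. */
static inline uint32_t acct_in_kernel(const struct umem_acct *a)
{
    return a->outstanding_tx + a->outstanding_rx;
}

/* After the socket is closed, the umem can go away once every buffer that is
 * not stuck in a ring is back in the pool: pool == size - n_packets_in_rings. */
static inline bool acct_can_destroy(const struct umem_acct *a)
{
    return a->pool_free == a->pool_size - acct_in_kernel(a);
}
```

With this kind of accounting nothing has to be drained from the rings at
close time; the umem cleanup would only poll acct_can_destroy() until the
buffers still held by userspace packets have been freed.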

>
> It might be hard or even impossible to reclaim all the packets from the rings,
> because the kernel does not update the consumer/producer heads for every
> packet, and the state of the rings after the socket is closed depends on the
> kernel implementation.
>

Thanks
William

<snip>

