[ovs-dev] [PATCHv18] netdev-afxdp: add new netdev type for AF_XDP.

Ilya Maximets i.maximets at samsung.com
Thu Aug 8 12:09:55 UTC 2019

On 08.08.2019 14:42, Eelco Chaudron wrote:
> On 19 Jul 2019, at 16:54, Ilya Maximets wrote:
>> On 18.07.2019 23:11, William Tu wrote:
>>> The patch introduces experimental AF_XDP support for OVS netdev.
>>> AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
>>> type built upon the eBPF and XDP technology.  It aims to have comparable
>>> performance to DPDK but cooperate better with the existing kernel's networking
>>> stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP program
>>> attached to the netdev, bypassing a couple of Linux kernel's subsystems.
>>> As a result, AF_XDP socket shows much better performance than AF_PACKET.
>>> For more details about AF_XDP, please see linux kernel's
>>> Documentation/networking/af_xdp.rst. Note that by default, this feature is
>>> not compiled in.
>>> Signed-off-by: William Tu <u9012063 at gmail.com>
>> Thanks, William, Eelco and Ben!
>> I fixed couple of things and applied to master!
> Good to see this got merged into master while I was on PTO. However, when I got back I decided to test it once more…
> When testing PVP I got a couple of packets through, and then it would stall. I thought it might be my kernel, so I updated to yesterday's latest, with no luck…
> I did see a bunch of “eno1: send failed due to exhausted memory pool.” messages in the log. Going back to patch v14 made my problems go away…
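> After some debugging, I noticed the problem was with the “continue” case in the afxdp_complete_tx() function: if the last completion entry is skipped, the loop never reaches the batch flush, so the pending elements are never returned to the mempool.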
> Applying the following patch made it work again:
> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> index b7cc0d988..9b335ddf0 100644
> --- a/lib/netdev-afxdp.c
> +++ b/lib/netdev-afxdp.c
> @@ -823,16 +823,21 @@ afxdp_complete_tx(struct xsk_socket_info *xsk_info)
>          if (tx_to_free == BATCH_SIZE || j == tx_done - 1) {
>              umem_elem_push_n(&umem->mpool, tx_to_free, elems_push);
>              xsk_info->outstanding_tx -= tx_to_free;
>              tx_to_free = 0;
>          }
>      }
> +    if (tx_to_free) {
> +        umem_elem_push_n(&umem->mpool, tx_to_free, elems_push);
> +        xsk_info->outstanding_tx -= tx_to_free;
> +    }
> +
>      if (tx_done > 0) {
>          xsk_ring_cons__release(&umem->cq, tx_done);
>      } else {
>          COVERAGE_INC(afxdp_cq_empty);
>      }
>  }
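To make the failure mode concrete, here is a small, hypothetical model of the loop (not the OVS code itself): an array of addresses stands in for the completion ring, a counter stands in for umem->mpool, and UINT64_MAX is the "already pushed" marker. The names struct model, push_n(), complete_tx() and the BATCH_SIZE value are illustrative assumptions. With a skipped entry in the last slot, the v18 loop returns nothing to the pool; the post-loop flush from the patch above recovers the leftover batch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: names and sizes are illustrative, not OVS's. */
#define BATCH_SIZE 4
#define SKIP_MARK UINT64_MAX      /* "Already pushed" marker. */

struct model {
    int pool_returned;            /* Stands in for umem->mpool. */
    int outstanding_tx;           /* Mirrors xsk_info->outstanding_tx. */
};

static void
push_n(struct model *m, int n)
{
    assert(n <= m->outstanding_tx);
    m->pool_returned += n;
    m->outstanding_tx -= n;
}

/* Loop shape as merged in v18; 'with_fix' adds the patch's final flush. */
static void
complete_tx(struct model *m, uint64_t *cq, int tx_done, bool with_fix)
{
    int tx_to_free = 0;

    for (int j = 0; j < tx_done; j++) {
        if (cq[j] == SKIP_MARK) {
            continue;             /* Also jumps over the flush check. */
        }
        cq[j] = SKIP_MARK;        /* Mark as pushed. */
        tx_to_free++;

        if (tx_to_free == BATCH_SIZE || j == tx_done - 1) {
            push_n(m, tx_to_free);
            tx_to_free = 0;
        }
    }
    if (with_fix && tx_to_free) {
        push_n(m, tx_to_free);    /* Flush the leftover partial batch. */
    }
}
```

With completions {0x1000, 0x2000, SKIP_MARK}, the unfixed loop leaves both elements outstanding, which matches the "exhausted memory pool" symptom over time.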

Good catch! Will you submit a patch?
BTW, to reduce code duplication, I'd suggest removing the 'continue'
like this:

    if (*addr != UINT64_MAX) {
        Do work;
    } else {
        COVERAGE_INC(afxdp_cq_skip);
    }
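A sketch of that restructuring, using the same kind of hypothetical model (struct model, push_n(), complete_tx() and BATCH_SIZE are illustrative, not the OVS definitions): without the 'continue', the batch flush check runs on every iteration, including one where the last entry is skipped, so no separate post-loop flush is needed. The skipped counter stands in for COVERAGE_INC(afxdp_cq_skip), and the extra tx_to_free > 0 guard is an assumption added only to avoid an empty push.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: names and sizes are illustrative, not OVS's. */
#define BATCH_SIZE 4
#define SKIP_MARK UINT64_MAX      /* "Already pushed" marker. */

struct model {
    int pool_returned;            /* Stands in for umem->mpool. */
    int outstanding_tx;           /* Mirrors xsk_info->outstanding_tx. */
};

static void
push_n(struct model *m, int n)
{
    assert(n <= m->outstanding_tx);
    m->pool_returned += n;
    m->outstanding_tx -= n;
}

/* Returns the number of skipped entries (the afxdp_cq_skip analogue). */
static int
complete_tx(struct model *m, uint64_t *cq, int tx_done)
{
    int tx_to_free = 0;
    int skipped = 0;

    for (int j = 0; j < tx_done; j++) {
        if (cq[j] != SKIP_MARK) {
            cq[j] = SKIP_MARK;    /* "Do work": collect and mark. */
            tx_to_free++;
        } else {
            skipped++;
        }

        /* Reached for every entry, so a skipped last entry still flushes. */
        if (tx_to_free == BATCH_SIZE
            || (j == tx_done - 1 && tx_to_free > 0)) {
            push_n(m, tx_to_free);
            tx_to_free = 0;
        }
    }
    return skipped;
}
```

The design choice is simply to keep a single flush site inside the loop, so the batching condition is evaluated exactly once per completion entry.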

> This made me wonder why we mark elements as used at all. To my knowledge (and from looking at some of the code and examples), after xsk_ring_cons__release() a subsequent xsk_ring_cons__peek() should never return duplicate slots.
> I see a rather high number of afxdp_cq_skip, which, to my knowledge, should never happen?
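The peek/release argument can be illustrated with a toy single-producer/single-consumer ring. This is a hypothetical model loosely patterned on how libbpf's consumer ring is assumed to behave (peek hands out entries from a private cached consumer index and advances it; release publishes the consumer index); it is not libbpf's code, omits memory barriers and producer-side caching, and the names toy_ring, toy_produce(), toy_peek() and toy_release() are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy ring; not libbpf's implementation. */
#define RING_SIZE 8u
#define RING_MASK (RING_SIZE - 1)

struct toy_ring {
    uint32_t producer;            /* Advanced by the "kernel" side. */
    uint32_t consumer;            /* Published by the app on release. */
    uint32_t cached_cons;         /* App-private read position. */
    uint64_t addrs[RING_SIZE];
};

/* Producer side: post one completed address if there is room. */
static bool
toy_produce(struct toy_ring *r, uint64_t addr)
{
    if (r->producer - r->consumer == RING_SIZE) {
        return false;             /* Ring full. */
    }
    r->addrs[r->producer & RING_MASK] = addr;
    r->producer++;
    return true;
}

/* Consumer side: claim up to 'nb' new entries starting at '*idx'. */
static uint32_t
toy_peek(struct toy_ring *r, uint32_t nb, uint32_t *idx)
{
    uint32_t avail = r->producer - r->cached_cons;
    uint32_t entries = avail < nb ? avail : nb;

    if (entries > 0) {
        *idx = r->cached_cons;
        r->cached_cons += entries; /* Never hands out these slots again. */
    }
    return entries;
}

/* Consumer side: give 'nb' slots back to the producer. */
static void
toy_release(struct toy_ring *r, uint32_t nb)
{
    r->consumer += nb;
}
```

Under this model, a slot already marked with UINT64_MAX can only be seen again if the producer advances its index without writing a fresh address there, which would point at the producer (kernel/driver) side rather than at the consumer's bookkeeping.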

I tried to investigate this previously, but didn't find anything suspicious.
So, to my knowledge, this should never happen either.
However, I only looked at the code without actually running, because I had no
HW available for testing.

While investigating and stress-testing virtual ports, I found a few issues with
missing locking inside the kernel, so I don't fully trust the kernel part of the
XDP implementation. I suspect there are other bugs in the kernel/libbpf that
can only be reproduced in driver mode.

This never happens for virtual ports in SKB mode, so I have never seen this
coverage counter be non-zero.

> $ ovs-appctl coverage/show  | grep xdp
> afxdp_cq_empty             0.0/sec   339.600/sec        5.6606/sec   total: 20378
> afxdp_tx_full              0.0/sec    29.967/sec        0.4994/sec   total: 1798
> afxdp_cq_skip              0.0/sec 61884770.167/sec  1174238.3644/sec   total: 4227258112
> You mentioned seeing this high number in your v15 change notes; did you look into why?
> Cheers,
> Eelco
