[ovs-dev] [PATCHv18] netdev-afxdp: add new netdev type for AF_XDP.

Ilya Maximets i.maximets at samsung.com
Thu Aug 8 15:38:42 UTC 2019


On 08.08.2019 17:53, Eelco Chaudron wrote:
> 
> 
> On 8 Aug 2019, at 14:09, Ilya Maximets wrote:
> 
>> On 08.08.2019 14:42, Eelco Chaudron wrote:
>>>
>>>
>>> On 19 Jul 2019, at 16:54, Ilya Maximets wrote:
>>>
>>>> On 18.07.2019 23:11, William Tu wrote:
>>>>> The patch introduces experimental AF_XDP support for OVS netdev.
>>>>> AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
>>>>> type built upon the eBPF and XDP technology.  It is aims to have comparable
>>>>> performance to DPDK but cooperate better with existing kernel's networking
>>>>> stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP program
>>>>> attached to the netdev, by-passing a couple of Linux kernel's subsystems
>>>>> As a result, AF_XDP socket shows much better performance than AF_PACKET
>>>>> For more details about AF_XDP, please see linux kernel's
>>>>> Documentation/networking/af_xdp.rst. Note that by default, this feature is
>>>>> not compiled in.
>>>>>
>>>>> Signed-off-by: William Tu <u9012063 at gmail.com>
>>>>
>>>>
>>>> Thanks, William, Eelco and Ben!
>>>>
>>>> I fixed couple of things and applied to master!
>>>
>>> Good to see this got merged into master while on PTO. However, when I got back I decided to test it once more…
>>>
>>> When testing PVP I got a couple of packets trough, and then it would stall. I thought it might be my kernel, so updated to yesterdays latest, no luck…
>>>
>>> I did see a bunch of “eno1: send failed due to exhausted memory pool.” messages in the log. Putting back patch v14, made my problems go away…
>>>
>>> After some debugging, I noticed the problem was with the “continue” case in the afxdp_complete_tx() function.
>>> Applying the following patch made it work again:
>>>
>>> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
>>> index b7cc0d988..9b335ddf0 100644
>>> --- a/lib/netdev-afxdp.c
>>> +++ b/lib/netdev-afxdp.c
>>> @@ -823,16 +823,21 @@ afxdp_complete_tx(struct xsk_socket_info *xsk_info)
>>>
>>>          if (tx_to_free == BATCH_SIZE || j == tx_done - 1) {
>>>              umem_elem_push_n(&umem->mpool, tx_to_free, elems_push);
>>>              xsk_info->outstanding_tx -= tx_to_free;
>>>              tx_to_free = 0;
>>>          }
>>>      }
>>>
>>> +    if (tx_to_free) {
>>> +        umem_elem_push_n(&umem->mpool, tx_to_free, elems_push);
>>> +        xsk_info->outstanding_tx -= tx_to_free;
>>> +    }
>>> +
>>>      if (tx_done > 0) {
>>>          xsk_ring_cons__release(&umem->cq, tx_done);
>>>      } else {
>>>          COVERAGE_INC(afxdp_cq_empty);
>>>      }
>>>  }
>>
>> Good catch! Will you submit a patch?
>> BTW, to reduce the code duplication I'd suggest to remove the 'continue'
>> like this:
>>
>>     if (*addr != UINT64_MAX) {
>>         Do work;
>>     } else {
>>         COVERAGE_INC(afxdp_cq_skip);
>>     }
> 
> Done, patch has been sent out…
> 
>>>
>>>
>>> Which made me wonder why we do mark elements as being used? To my knowledge (and looking at some of the code and examples), after the  xsk_ring_cons__release() function a xsk_ring_cons__peek() should not receive any duplicate slots.
>>>
>>> I see a rather high number of afxdp_cq_skip, which should to my knowledge never happen?
>>
>> I tried to investigate this previously, but didn't find anything suspicious.
>> So, for my knowledge, this should never happen too.
>> However, I only looked at the code without actually running, because I had no
>> HW available for testing.
>>
>> While investigation and stress-testing virtual ports I found few issues with
>> missing locking inside the kernel, so there is no trust for kernel part of XDP
>> implementation from my side. I'm suspecting that there are some other bugs in
>> kernel/libbpf that only could be reproduced with driver mode.
>>
>> This never happens for virtual ports with SKB mode, so I never saw this coverage
>> counter being non-zero.
> 
> Did some quick debugging, as something else has come up that needs my attention :)
> 
> But once I’m in a faulty state and sent a single packet, causing afxdp_complete_tx() to be called, it tells me 2048 descriptors are ready, which is XSK_RING_PROD__DEFAULT_NUM_DESCS. So I guess that there might be some ring management bug. Maybe consumer and receiver are equal meaning 0 buffers, but it returns max? I did not look at the kernel code, so this is just a wild guess :)
> 
> (gdb) p tx_done
> $3 = 2048
> 
> (gdb) p umem->cq
> $4 = {cached_prod = 3830466864, cached_cons = 3578066899, mask = 2047, size = 2048, producer = 0x7f08486b8000, consumer = 0x7f08486b8040, ring = 0x7f08486b8080}

Thanks for debugging!

xsk_ring_cons__peek() just returns the difference between cached_prod
and cached_cons, but these values are too different:

3830466864 - 3578066899 = 252399965

Since this value > requested, it returns requested number (2048).

So, the ring is broken. At least broken its 'cached' part. It'll be good
to look at *consumer and *producer values to verify the state of the
actual ring.

> 
>>>
>>> $ ovs-appctl coverage/show  | grep xdp
>>> afxdp_cq_empty             0.0/sec   339.600/sec        5.6606/sec   total: 20378
>>> afxdp_tx_full              0.0/sec    29.967/sec        0.4994/sec   total: 1798
>>> afxdp_cq_skip              0.0/sec 61884770.167/sec  1174238.3644/sec   total: 4227258112
>>>
>>>
>>> You mentioned you saw this high number in your v15 change notes, did you do any research on why?
>>>
>>> Cheers,
>>>
>>> Eelco
> 
> 


More information about the dev mailing list