[ovs-dev] [PATCHv18] netdev-afxdp: add new netdev type for AF_XDP.

Eelco Chaudron echaudro at redhat.com
Tue Aug 20 11:19:11 UTC 2019



On 20 Aug 2019, at 12:10, Ilya Maximets wrote:

> On 14.08.2019 19:16, William Tu wrote:
>> On Wed, Aug 14, 2019 at 7:58 AM William Tu <u9012063 at gmail.com> 
>> wrote:
>>>
>>> On Wed, Aug 14, 2019 at 5:09 AM Eelco Chaudron <echaudro at redhat.com> 
>>> wrote:
>>>>
>>>>
>>>>
>>>> On 8 Aug 2019, at 17:38, Ilya Maximets wrote:
>>>>
>>>> <SNIP>
>>>>
>>>>>>>> I see a rather high number of afxdp_cq_skip, which should to my
>>>>>>>> knowledge never happen?
>>>>>>>
>>>>>>> I tried to investigate this previously, but didn't find anything
>>>>>>> suspicious. So, to my knowledge, this should never happen either.
>>>>>>> However, I only looked at the code without actually running it,
>>>>>>> because I had no HW available for testing.
>>>>>>>
>>>>>>> While investigating and stress-testing virtual ports I found a
>>>>>>> few issues with missing locking inside the kernel, so I have no
>>>>>>> trust in the kernel part of the XDP implementation. I suspect
>>>>>>> there are other bugs in kernel/libbpf that can only be
>>>>>>> reproduced with driver mode.
>>>>>>>
>>>>>>> This never happens for virtual ports with SKB mode, so I never 
>>>>>>> saw
>>>>>>> this coverage
>>>>>>> counter being non-zero.
>>>>>>
>>>>>> Did some quick debugging, as something else has come up that 
>>>>>> needs my
>>>>>> attention :)
>>>>>>
>>>>>> But once I’m in a faulty state and send a single packet, causing
>>>>>> afxdp_complete_tx() to be called, it tells me 2048 descriptors
>>>>>> are ready, which is XSK_RING_PROD__DEFAULT_NUM_DESCS. So I guess
>>>>>> that there might be some ring management bug. Maybe consumer and
>>>>>> producer are equal, meaning 0 buffers, but it returns the max? I
>>>>>> did not look at the kernel code, so this is just a wild guess :)
>>>>>>
>>>>>> (gdb) p tx_done
>>>>>> $3 = 2048
>>>>>>
>>>>>> (gdb) p umem->cq
>>>>>> $4 = {cached_prod = 3830466864, cached_cons = 3578066899, mask =
>>>>>> 2047, size = 2048, producer = 0x7f08486b8000, consumer =
>>>>>> 0x7f08486b8040, ring = 0x7f08486b8080}
>>>>>
>>>>> Thanks for debugging!
>>>>>
>>>>> xsk_ring_cons__peek() just returns the difference between
>>>>> cached_prod and cached_cons, but these values are too far apart:
>>>>>
>>>>> 3830466864 - 3578066899 = 252399965
>>>>>
>>>>> Since this value is greater than the requested number, it returns
>>>>> the requested number (2048).
>>>>>
>>>>> So, the ring is broken; at least its 'cached' part is. It would
>>>>> be good to look at the *consumer and *producer values to verify
>>>>> the state of the actual ring.
>>>>>
>>>>
>>>> I’ll try to find some more time next week to debug further.
>>>>
>>>> William, I noticed your email on xdp-newbies where you mention
>>>> this problem of getting the wrong pointers. Did you ever follow
>>>> up, or do further troubleshooting on the above?
>>>
>>> Yes, I posted here
>>> https://www.spinics.net/lists/xdp-newbies/msg00956.html
>>> "Question/Bug about AF_XDP idx_cq from xsk_ring_cons__peek?"
>>>
>>> At that time I was thinking about reproducing the problem using the
>>> xdpsock sample code from the kernel. But it turned out that my
>>> reproduction code was not correct, so it was not able to show the
>>> case we hit here in OVS.
>>>
>>> Then I ported more of the OVS code logic into xdpsock, but the
>>> problem did not show up. As a result, I worked around it by marking
>>> the addr as "*addr == UINT64_MAX".
>>>
>>> I will debug again this week once I get my testbed back.
>>>
>> Just to refresh my memory. The original issue is that when calling:
>>
>> tx_done = xsk_ring_cons__peek(&umem->cq, CONS_NUM_DESCS, &idx_cq);
>> xsk_ring_cons__release(&umem->cq, tx_done);
>>
>> I expect there to be 'tx_done' elems on the CQ to recycle back to
>> the memory pool. However, when I inspect these elems, I find some
>> that have 'already' been reported complete the last time I called
>> xsk_ring_cons__peek. In other words, some elems show up on the CQ
>> twice, and this causes an overflow of the mempool.
>>
>> Thus, I mark the elems on the CQ as UINT64_MAX to indicate that we
>> have already seen them.
>
> William, Eelco, which HW NIC are you using? Which kernel driver?

I’m using the below on the latest bpf-next driver:

01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit 
SFI/SFP+ Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit 
SFI/SFP+ Network Connection (rev 01)

//Eelco


