[ovs-dev] [PATCHv18] netdev-afxdp: add new netdev type for AF_XDP.

Ilya Maximets i.maximets at samsung.com
Tue Aug 20 10:10:04 UTC 2019


On 14.08.2019 19:16, William Tu wrote:
> On Wed, Aug 14, 2019 at 7:58 AM William Tu <u9012063 at gmail.com> wrote:
>>
>> On Wed, Aug 14, 2019 at 5:09 AM Eelco Chaudron <echaudro at redhat.com> wrote:
>>>
>>>
>>>
>>> On 8 Aug 2019, at 17:38, Ilya Maximets wrote:
>>>
>>> <SNIP>
>>>
>>>>>>> I see a rather high number of afxdp_cq_skip, which should to my
>>>>>>> knowledge never happen?
>>>>>>
>>>>>> I tried to investigate this previously, but didn't find anything
>>>>>> suspicious.
>>>>>> So, to my knowledge, this should never happen either.
>>>>>> However, I only looked at the code without actually running it,
>>>>>> because I had no HW available for testing.
>>>>>>
>>>>>> While investigating and stress-testing virtual ports I found a few
>>>>>> issues with missing locking inside the kernel, so I don't trust the
>>>>>> kernel part of the XDP implementation. I suspect that there are
>>>>>> other bugs in kernel/libbpf that can only be reproduced in driver
>>>>>> mode.
>>>>>>
>>>>>> This never happens for virtual ports in SKB mode, so I never saw
>>>>>> this coverage counter being non-zero.
>>>>>
>>>>> Did some quick debugging, as something else has come up that needs my
>>>>> attention :)
>>>>>
>>>>> But once I’m in a faulty state and send a single packet, causing
>>>>> afxdp_complete_tx() to be called, it tells me 2048 descriptors are
>>>>> ready, which is XSK_RING_PROD__DEFAULT_NUM_DESCS. So I guess that
>>>>> there might be some ring management bug. Maybe consumer and producer
>>>>> are equal, meaning 0 buffers, but it returns max? I did not look at
>>>>> the kernel code, so this is just a wild guess :)
>>>>>
>>>>> (gdb) p tx_done
>>>>> $3 = 2048
>>>>>
>>>>> (gdb) p umem->cq
>>>>> $4 = {cached_prod = 3830466864, cached_cons = 3578066899, mask =
>>>>> 2047, size = 2048, producer = 0x7f08486b8000, consumer =
>>>>> 0x7f08486b8040, ring = 0x7f08486b8080}
>>>>
>>>> Thanks for debugging!
>>>>
>>>> xsk_ring_cons__peek() just returns the difference between cached_prod
>>>> and cached_cons, but these values are too different:
>>>>
>>>> 3830466864 - 3578066899 = 252399965
>>>>
>>>> Since this value is greater than the requested number, it returns the
>>>> requested number (2048).
>>>>
>>>> So, the ring is broken, or at least its 'cached' part is. It'd be good
>>>> to look at the *consumer and *producer values to verify the state of
>>>> the actual ring.
>>>>
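
The arithmetic above can be sketched as a small standalone model. This is
not the libbpf code: the struct and the function names (ring_model,
model_entries, model_peek, model_ring_sane) are hypothetical, and the
clamping only mirrors the behaviour described above. The sanity check is
an assumption on my side: a ring should never claim more ready entries
than its size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached part of a consumer ring. */
struct ring_model {
    uint32_t cached_prod;   /* last producer index seen by the consumer */
    uint32_t cached_cons;   /* consumer's own index */
    uint32_t size;          /* number of slots in the ring */
};

/* Entries the cached indices claim are ready.
 * Unsigned subtraction, so index wrap-around is handled naturally. */
static uint32_t model_entries(const struct ring_model *r)
{
    return r->cached_prod - r->cached_cons;
}

/* What a peek-like call would report: the claimed entries,
 * clamped to the number the caller requested. */
static uint32_t model_peek(const struct ring_model *r, uint32_t requested)
{
    uint32_t entries = model_entries(r);

    return entries > requested ? requested : entries;
}

/* Assumed invariant: no more ready entries than slots in the ring. */
static bool model_ring_sane(const struct ring_model *r)
{
    return model_entries(r) <= r->size;
}
```

With the values from the gdb output (cached_prod = 3830466864,
cached_cons = 3578066899, size = 2048), model_entries() gives 252399965,
model_peek() with 2048 requested gives 2048, and model_ring_sane() is
false.
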
>>>
>>> I’ll try to find some more time next week to debug further.
>>>
>>> William, I noticed your email in xdp-newbies where you mention this
>>> problem of getting the wrong pointers. Did you ever follow up or do
>>> further troubleshooting on the above?
>>
>> Yes, I posted here
>> https://www.spinics.net/lists/xdp-newbies/msg00956.html
>> "Question/Bug about AF_XDP idx_cq from xsk_ring_cons__peek?"
>>
>> At that time I was thinking about reproducing the problem using the
>> xdpsock sample code from the kernel. But it turned out that my
>> reproduction code was not correct, so I was not able to show the case
>> we hit here in OVS.
>>
>> Then I ported more of the OVS code logic to xdpsock, but the problem
>> did not show up. As a result, I worked around it by marking addr as
>> "*addr == UINT64_MAX".
>>
>> I will debug again this week once I get my testbed back.
>>
> Just to refresh my memory. The original issue is that
> when calling:
> tx_done = xsk_ring_cons__peek(&umem->cq, CONS_NUM_DESCS, &idx_cq);
> xsk_ring_cons__release(&umem->cq, tx_done);
> 
> I expect there to be 'tx_done' elems on the CQ to recycle back to the
> memory pool. However, when I inspect these elems, I find some that were
> 'already' reported complete the last time I called xsk_ring_cons__peek.
> In other words, some elems show up on the CQ twice, and this causes
> overflow of the mempool.
> 
> Thus, I mark the elems on the CQ as UINT64_MAX to indicate that we have
> already seen them.
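
For reference, the workaround described above can be modelled roughly
like this. It is a sketch and not the OVS code: the ring is a plain
array, and RING_SIZE, recycle_completions() and the 'skipped' counter are
hypothetical names. The skipped count only plays the role that the
afxdp_cq_skip coverage counter has in this thread.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u    /* hypothetical ring size, must be a power of two */

/* Walk 'n' completion slots starting at index 'idx'.
 * Addresses not seen before are copied to 'out' (to be recycled) and the
 * slot is overwritten with UINT64_MAX. Slots that already hold UINT64_MAX
 * were handled on an earlier pass, so they are skipped and counted.
 * Returns the number of addresses written to 'out'. */
static size_t recycle_completions(uint64_t *ring, uint32_t idx, uint32_t n,
                                  uint64_t *out, size_t *skipped)
{
    size_t recycled = 0;

    *skipped = 0;
    for (uint32_t i = 0; i < n; i++) {
        uint64_t *addr = &ring[(idx + i) & (RING_SIZE - 1)];

        if (*addr == UINT64_MAX) {
            /* Duplicate completion: do not return it to the pool twice. */
            (*skipped)++;
            continue;
        }
        out[recycled++] = *addr;
        *addr = UINT64_MAX;     /* mark the slot as already seen */
    }
    return recycled;
}
```

If the same slots are reported twice, the second pass recycles nothing
and only bumps the skipped count, so the mempool does not overflow. This
hides the symptom; it does not explain why the ring reports the entries
again.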

William, Eelco, which HW NIC are you using? Which kernel driver?

Best regards, Ilya Maximets.

