[ovs-dev] [PATCHv18] netdev-afxdp: add new netdev type for AF_XDP.

Ilya Maximets i.maximets at samsung.com
Tue Aug 20 15:20:24 UTC 2019


On 20.08.2019 14:19, Eelco Chaudron wrote:
> 
> 
> On 20 Aug 2019, at 12:10, Ilya Maximets wrote:
> 
>> On 14.08.2019 19:16, William Tu wrote:
>>> On Wed, Aug 14, 2019 at 7:58 AM William Tu <u9012063 at gmail.com> wrote:
>>>>
>>>> On Wed, Aug 14, 2019 at 5:09 AM Eelco Chaudron <echaudro at redhat.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 8 Aug 2019, at 17:38, Ilya Maximets wrote:
>>>>>
>>>>> <SNIP>
>>>>>
>>>>>>>>> I see a rather high number of afxdp_cq_skip, which should, to my
>>>>>>>>> knowledge, never happen?
>>>>>>>>
>>>>>>>> I tried to investigate this previously, but didn't find anything
>>>>>>>> suspicious.
>>>>>>>> So, to my knowledge, this should never happen either.
>>>>>>>> However, I only looked at the code without actually running it,
>>>>>>>> because I had no HW available for testing.
>>>>>>>>
>>>>>>>> While investigating and stress-testing virtual ports I found a few
>>>>>>>> issues with missing locking inside the kernel, so I don't have much
>>>>>>>> trust in the kernel part of the XDP implementation. I suspect that
>>>>>>>> there are some other bugs in kernel/libbpf that can only be
>>>>>>>> reproduced with driver mode.
>>>>>>>>
>>>>>>>> This never happens for virtual ports with SKB mode, so I never saw
>>>>>>>> this coverage
>>>>>>>> counter being non-zero.
>>>>>>>
>>>>>>> Did some quick debugging, as something else has come up that needs my
>>>>>>> attention :)
>>>>>>>
>>>>>>> But once I’m in a faulty state and send a single packet, causing
>>>>>>> afxdp_complete_tx() to be called, it tells me 2048 descriptors are
>>>>>>> ready, which is XSK_RING_PROD__DEFAULT_NUM_DESCS. So I guess that
>>>>>>> there might be some ring management bug. Maybe the consumer and
>>>>>>> producer are equal, meaning 0 buffers, but it returns the max? I did
>>>>>>> not look at the kernel code, so this is just a wild guess :)
>>>>>>>
>>>>>>> (gdb) p tx_done
>>>>>>> $3 = 2048
>>>>>>>
>>>>>>> (gdb) p umem->cq
>>>>>>> $4 = {cached_prod = 3830466864, cached_cons = 3578066899, mask =
>>>>>>> 2047, size = 2048, producer = 0x7f08486b8000, consumer =
>>>>>>> 0x7f08486b8040, ring = 0x7f08486b8080}
>>>>>>
>>>>>> Thanks for debugging!
>>>>>>
>>>>>> xsk_ring_cons__peek() just returns the difference between cached_prod
>>>>>> and cached_cons, but these values are too different:
>>>>>>
>>>>>> 3830466864 - 3578066899 = 252399965
>>>>>>
>>>>>> Since this value is greater than the requested count, it returns the
>>>>>> requested number (2048).
>>>>>>
>>>>>> So, the ring is broken, or at least its 'cached' part is. It would
>>>>>> be good to look at the *consumer and *producer values to verify the
>>>>>> state of the actual ring.
>>>>>>
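For reference, xsk_ring_cons__peek() derives the number of available entries
from those cached indices. Below is a simplified sketch of that logic, from
memory and not the exact libbpf source (memory barriers and the peek wrapper
itself are omitted):

    #include <bpf/xsk.h>

    /* Simplified consumer-side availability check, as used by
     * xsk_ring_cons__peek().  'nb' is the number of entries the caller
     * asks for, e.g. CONS_NUM_DESCS in afxdp_complete_tx(). */
    static inline __u32 cons_nb_avail(struct xsk_ring_cons *r, __u32 nb)
    {
        __u32 entries = r->cached_prod - r->cached_cons;

        if (entries == 0) {
            /* Only when the cached view looks empty is the real
             * producer index re-read from the shared ring. */
            r->cached_prod = *r->producer;
            entries = r->cached_prod - r->cached_cons;
        }

        /* With cached_prod = 3830466864 and cached_cons = 3578066899,
         * entries = 252399965, so the requested 2048 is returned no
         * matter what the actual ring contains. */
        return entries > nb ? nb : entries;
    }

Once the cached indices diverge from the real ring state, peek keeps reporting
completions that were never produced, which is why looking at *consumer and
*producer is the logical next step.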
>>>>>
>>>>> I’ll try to find some more time next week to debug further.
>>>>>
>>>>> William, I noticed your email in xdp-newbies where you mention this
>>>>> problem of getting the wrong pointers. Did you ever follow up, or do
>>>>> further troubleshooting on the above?
>>>>
>>>> Yes, I posted here
>>>> https://www.spinics.net/lists/xdp-newbies/msg00956.html
>>>> "Question/Bug about AF_XDP idx_cq from xsk_ring_cons__peek?"
>>>>
>>>> At that time I was thinking about reproducing the problem using the
>>>> xdpsock sample code from the kernel. But it turned out that my
>>>> reproduction code was not correct, so it was not able to show the case
>>>> we hit here in OVS.
>>>>
>>>> Then I ported similar code logic from OVS into xdpsock, but the problem
>>>> did not show up. As a result, I worked around it by marking the addr as
>>>> "*addr == UINT64_MAX".
>>>>
>>>> I will debug again this week once I get my testbed back.
>>>>
>>> Just to refresh my memory. The original issue is that
>>> when calling:
>>> tx_done = xsk_ring_cons__peek(&umem->cq, CONS_NUM_DESCS, &idx_cq);
>>> xsk_ring_cons__release(&umem->cq, tx_done);
>>>
>>> I expect there to be 'tx_done' elems on the CQ to recycle back to the
>>> memory pool. However, when I inspect these elems, I find some that have
>>> 'already' been reported complete the last time I called
>>> xsk_ring_cons__peek. In other words, some elems show up on the CQ twice,
>>> and this causes an overflow of the mempool.
>>>
>>> Thus, the workaround marks the elems on the CQ as UINT64_MAX to indicate
>>> that we have already seen them; see the sketch below.
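For context, the relevant part of afxdp_complete_tx() with that workaround
looks roughly as follows. This is a paraphrased sketch, not the exact patch:
CONS_NUM_DESCS, the afxdp_cq_skip coverage counter and umem_elem_push() come
from the discussion above, while umem_elem_from_addr() is a hypothetical
helper standing in for the address-to-element conversion.

    /* Sketch of the completion-queue handling with the UINT64_MAX
     * workaround (paraphrased; details may differ from the patch). */
    uint32_t idx_cq = 0;
    size_t tx_done, i;

    tx_done = xsk_ring_cons__peek(&umem->cq, CONS_NUM_DESCS, &idx_cq);

    for (i = 0; i < tx_done; i++) {
        /* comp_addr points into the CQ ring; const is cast away so the
         * slot can be poisoned after it has been consumed. */
        uint64_t *addr =
            (uint64_t *) xsk_ring_cons__comp_addr(&umem->cq, idx_cq++);

        if (*addr == UINT64_MAX) {
            /* Slot already recycled by a previous call: this is the
             * duplicate completion counted by afxdp_cq_skip. */
            COVERAGE_INC(afxdp_cq_skip);
            continue;
        }

        /* Return the umem chunk at '*addr' to the memory pool and poison
         * the slot so a duplicate completion is detected next time.
         * (umem_elem_from_addr() is a hypothetical helper.) */
        umem_elem_push(&umem->mpool, umem_elem_from_addr(umem, *addr));
        *addr = UINT64_MAX;
    }

    xsk_ring_cons__release(&umem->cq, tx_done);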
>>
>> William, Eelco, which HW NIC are you using? Which kernel driver?
> 
> I’m using the NICs below with the latest bpf-next driver:
> 
> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> 01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

Thanks for the information.
I found one suspicious place inside the ixgbe driver that could break
the completion queue ring and prepared a patch:
    https://patchwork.ozlabs.org/patch/1150244/

It would be great if you could test it.

Best regards, Ilya Maximets.

