[ovs-dev] [PATCHv18] netdev-afxdp: add new netdev type for AF_XDP.

Eelco Chaudron echaudro at redhat.com
Wed Aug 21 09:31:52 UTC 2019



On 20 Aug 2019, at 17:20, Ilya Maximets wrote:

> On 20.08.2019 14:19, Eelco Chaudron wrote:
>>
>>
>> On 20 Aug 2019, at 12:10, Ilya Maximets wrote:
>>
>>> On 14.08.2019 19:16, William Tu wrote:
>>>> On Wed, Aug 14, 2019 at 7:58 AM William Tu <u9012063 at gmail.com> 
>>>> wrote:
>>>>>
>>>>> On Wed, Aug 14, 2019 at 5:09 AM Eelco Chaudron 
>>>>> <echaudro at redhat.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 8 Aug 2019, at 17:38, Ilya Maximets wrote:
>>>>>>
>>>>>> <SNIP>
>>>>>>
>>>>>>>>>> I see a rather high number of afxdp_cq_skip, which should to my
>>>>>>>>>> knowledge never happen?
>>>>>>>>>
>>>>>>>>> I tried to investigate this previously, but didn't find anything
>>>>>>>>> suspicious. So, to my knowledge, this should never happen either.
>>>>>>>>> However, I only looked at the code without actually running it,
>>>>>>>>> because I had no HW available for testing.
>>>>>>>>>
>>>>>>>>> While investigating and stress-testing virtual ports I found a few
>>>>>>>>> issues with missing locking inside the kernel, so I have no trust in
>>>>>>>>> the kernel part of the XDP implementation. I suspect that there are
>>>>>>>>> some other bugs in kernel/libbpf that can only be reproduced with
>>>>>>>>> driver mode.
>>>>>>>>>
>>>>>>>>> This never happens for virtual ports with SKB mode, so I never saw
>>>>>>>>> this coverage counter being non-zero.
>>>>>>>>
>>>>>>>> Did some quick debugging, as something else has come up that 
>>>>>>>> needs my
>>>>>>>> attention :)
>>>>>>>>
>>>>>>>> But once I’m in a faulty state and send a single packet, causing
>>>>>>>> afxdp_complete_tx() to be called, it tells me 2048 descriptors are
>>>>>>>> ready, which is XSK_RING_PROD__DEFAULT_NUM_DESCS. So I guess that
>>>>>>>> there might be some ring management bug. Maybe consumer and receiver
>>>>>>>> are equal, meaning 0 buffers, but it returns the max? I did not look
>>>>>>>> at the kernel code, so this is just a wild guess :)
>>>>>>>>
>>>>>>>> (gdb) p tx_done
>>>>>>>> $3 = 2048
>>>>>>>>
>>>>>>>> (gdb) p umem->cq
>>>>>>>> $4 = {cached_prod = 3830466864, cached_cons = 3578066899,
>>>>>>>>       mask = 2047, size = 2048, producer = 0x7f08486b8000,
>>>>>>>>       consumer = 0x7f08486b8040, ring = 0x7f08486b8080}
>>>>>>>
>>>>>>> Thanks for debugging!
>>>>>>>
>>>>>>> xsk_ring_cons__peek() just returns the difference between cached_prod
>>>>>>> and cached_cons, but these values are too far apart:
>>>>>>>
>>>>>>> 3830466864 - 3578066899 = 252399965
>>>>>>>
>>>>>>> Since this value > requested, it returns the requested number (2048).
>>>>>>>
>>>>>>> So, the ring is broken. At least its 'cached' part is broken. It would
>>>>>>> be good to look at the *consumer and *producer values to verify the
>>>>>>> state of the actual ring.
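
As a quick sanity check of that arithmetic, the cached values from the gdb
dump can be plugged into a tiny standalone program. This only models the
cached_prod/cached_cons subtraction and the cap at the requested count, not
the real libbpf ring accessors:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Values taken from the gdb dump of umem->cq above. */
    uint32_t cached_prod = 3830466864u;
    uint32_t cached_cons = 3578066899u;
    uint32_t requested   = 2048;        /* CONS_NUM_DESCS */

    /* xsk_ring_cons__peek()-style arithmetic: available entries are the
     * (wrapping) difference of the cached indices, capped at the request. */
    uint32_t available = cached_prod - cached_cons;
    uint32_t returned  = available > requested ? requested : available;

    printf("available = %u, peek returns %u\n", available, returned);
    /* Prints: available = 252399965, peek returns 2048.  A healthy ring of
     * size 2048 can never hold that many entries, so the cached indices
     * are clearly corrupted. */
    return 0;
}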
>>>>>>>
>>>>>>
>>>>>> I’ll try to find some more time next week to debug further.
>>>>>>
>>>>>> William, I noticed your email on xdp-newbies where you mention this
>>>>>> problem of getting the wrong pointers. Did you ever follow up, or do
>>>>>> further troubleshooting on the above?
>>>>>
>>>>> Yes, I posted here
>>>>> https://www.spinics.net/lists/xdp-newbies/msg00956.html
>>>>> "Question/Bug about AF_XDP idx_cq from xsk_ring_cons__peek?"
>>>>>
>>>>> At that time I was thinking about reproducing the problem using the
>>>>> xdpsock sample code from the kernel. But it turned out that my
>>>>> reproduction code was not correct, so it was not able to show the case
>>>>> we hit here in OVS.
>>>>>
>>>>> Then I moved more of the similar code logic from OVS into xdpsock, but
>>>>> the problem did not show up. As a result, I worked around it by marking
>>>>> the addr as "*addr == UINT64_MAX".
>>>>>
>>>>> I will debug again this week once I get my testbed back.
>>>>>
>>>> Just to refresh my memory: the original issue is that
>>>> when calling:
>>>> tx_done = xsk_ring_cons__peek(&umem->cq, CONS_NUM_DESCS, &idx_cq);
>>>> xsk_ring_cons__release(&umem->cq, tx_done);
>>>>
>>>> I expect there to be 'tx_done' elems on the CQ to recycle back to the
>>>> memory pool. However, when I inspect these elems, I find some elems that
>>>> have 'already' been reported complete the last time I called
>>>> xsk_ring_cons__peek. In other words, some elems show up on the CQ twice,
>>>> and this causes an overflow of the mempool.
>>>>
>>>> Thus, mark the elems on the CQ as UINT64_MAX to indicate that we have
>>>> already seen this elem.
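
For reference, that workaround reads roughly like the sketch below. This is
a simplified illustration rather than the actual netdev-afxdp.c code: the
struct and the recycle helper are stand-ins, and only the peek/skip/mark/release
flow matches what is described above.

#include <stdint.h>
#include <bpf/xsk.h>            /* libbpf AF_XDP ring helpers */

#define CONS_NUM_DESCS 2048     /* assumed to match the OVS setting */

/* Illustrative stand-ins for the OVS umem structure and mempool helper. */
struct umem_info {
    struct xsk_ring_cons cq;    /* completion queue */
};

static void
recycle_buffer(struct umem_info *umem, uint64_t addr)
{
    /* The real code pushes 'addr' back onto the umem mempool. */
    (void) umem;
    (void) addr;
}

static void
complete_tx_sketch(struct umem_info *umem)
{
    uint32_t idx_cq = 0;
    unsigned int tx_done, i;

    tx_done = xsk_ring_cons__peek(&umem->cq, CONS_NUM_DESCS, &idx_cq);

    for (i = 0; i < tx_done; i++) {
        uint64_t *addr;

        addr = (uint64_t *) xsk_ring_cons__comp_addr(&umem->cq, idx_cq++);
        if (*addr == UINT64_MAX) {
            /* Entry was already handled by a previous peek; skipping it
             * here is what bumps the afxdp_cq_skip coverage counter. */
            continue;
        }

        recycle_buffer(umem, *addr);
        *addr = UINT64_MAX;     /* mark as seen so a duplicate is skipped */
    }

    xsk_ring_cons__release(&umem->cq, tx_done);
}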
>>>
>>> William, Eelco, which HW NIC are you using? Which kernel driver?
>>
>> I’m using the below on the latest bpf-next driver:
>>
>> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>> 01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>
> Thanks for the information.
> I found one suspicious place inside the ixgbe driver that could break
> the completion queue ring and prepared a patch:
>     https://patchwork.ozlabs.org/patch/1150244/
>
> It would be good if you could test it.

Hi Ilya, I was doing some testing of my own, and also concluded it was
in the driver's completion ring. I noticed that after sending 512 packets
the driver's TX counters kept increasing, which looks related to your fix.

I will try it out and send the results to the upstream mailing list…

Thanks,

Eelco

