[ovs-dev] [PATCHv18] netdev-afxdp: add new netdev type for AF_XDP.

Eelco Chaudron echaudro at redhat.com
Wed Aug 14 12:09:17 UTC 2019



On 8 Aug 2019, at 17:38, Ilya Maximets wrote:

<SNIP>

>>>> I see a rather high number of afxdp_cq_skip, which should to my 
>>>> knowledge never happen?
>>>
>>> I tried to investigate this previously, but didn't find anything
>>> suspicious. So, to my knowledge, this should never happen either.
>>> However, I only looked at the code without actually running it,
>>> because I had no HW available for testing.
>>>
>>> While investigating and stress-testing virtual ports I found a few
>>> issues with missing locking inside the kernel, so I don't trust the
>>> kernel part of the XDP implementation. I suspect that there are some
>>> other bugs in kernel/libbpf that can only be reproduced in driver
>>> mode.
>>>
>>> This never happens for virtual ports in SKB mode, so I never saw
>>> this coverage counter being non-zero.
>>
>> Did some quick debugging, as something else has come up that needs my 
>> attention :)
>>
>> But once I’m in a faulty state and send a single packet, causing
>> afxdp_complete_tx() to be called, it tells me 2048 descriptors are
>> ready, which is XSK_RING_PROD__DEFAULT_NUM_DESCS. So I guess that
>> there might be some ring management bug. Maybe consumer and producer
>> are equal, meaning 0 buffers, but it returns max? I did not look at
>> the kernel code, so this is just a wild guess :)
>>
>> (gdb) p tx_done
>> $3 = 2048
>>
>> (gdb) p umem->cq
>> $4 = {cached_prod = 3830466864, cached_cons = 3578066899,
>>   mask = 2047, size = 2048, producer = 0x7f08486b8000,
>>   consumer = 0x7f08486b8040, ring = 0x7f08486b8080}
>
> Thanks for debugging!
>
> xsk_ring_cons__peek() just returns the difference between cached_prod
> and cached_cons, but these values are too far apart:
>
> 3830466864 - 3578066899 = 252399965
>
> Since this value is larger than the number requested, it returns the
> requested number (2048).
>
> So the ring is broken, or at least its 'cached' part is. It would be
> good to look at the *consumer and *producer values to verify the state
> of the actual ring.
>
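
The arithmetic described in the quoted text can be modelled roughly as
below. This is a simplified sketch, not the libbpf source: struct
ring_model and the model_*() helpers are hypothetical names, and the
step where libbpf re-reads *producer from shared memory is left out. It
only shows why a cached difference far larger than the ring size still
yields exactly the requested count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached fields of a completion ring. */
struct ring_model {
    uint32_t cached_prod;
    uint32_t cached_cons;
    uint32_t size;
};

/* Entries the cached indexes claim are available (wraps modulo 2^32). */
static uint32_t
model_entries(const struct ring_model *r)
{
    return r->cached_prod - r->cached_cons;
}

/* Simplified peek: the result is clamped to the requested count. */
static uint32_t
model_peek(const struct ring_model *r, uint32_t requested)
{
    uint32_t entries = model_entries(r);

    return entries > requested ? requested : entries;
}

/* A consistent ring can never hold more entries than its size. */
static bool
model_is_sane(const struct ring_model *r)
{
    return model_entries(r) <= r->size;
}
```

With the values from the gdb output (cached_prod = 3830466864,
cached_cons = 3578066899, size = 2048), model_entries() gives 252399965,
model_peek() with a request of 2048 gives 2048, and model_is_sane()
is false.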

I’ll try to find some more time next week to debug further.

William, I noticed your email on xdp-newbies where you mention this
problem of getting the wrong pointers. Did you ever follow up, or do
any further troubleshooting on the above?

>>
>>>>
>>>> $ ovs-appctl coverage/show  | grep xdp
>>>> afxdp_cq_empty             0.0/sec      339.600/sec        5.6606/sec   total: 20378
>>>> afxdp_tx_full              0.0/sec       29.967/sec        0.4994/sec   total: 1798
>>>> afxdp_cq_skip              0.0/sec 61884770.167/sec  1174238.3644/sec   total: 4227258112
>>>>
>>>>
>>>> You mentioned you saw this high number in your v15 change notes, 
>>>> did you do any research on why?
>>>>
>>>> Cheers,
>>>>
>>>> Eelco
>>
>>

