[ovs-dev] [PATCHv13] netdev-afxdp: add new netdev type for AF_XDP.

Ilya Maximets i.maximets at samsung.com
Tue Jul 2 15:10:04 UTC 2019


On 28.06.2019 19:37, William Tu wrote:
>>
>>
>> One more thing I noticed is the same issue as you had with the completion queue, but
>> with the rx queue. When I try to send traffic from 2 threads to the same port,
> 
> Are the 2 threads sending traffic using afxdp tx?

Yes.

> 
>> I start receiving the same pointers from the rx ring. Not only the same ring entries,
>> but there were cases where two identical pointers were stored sequentially in the rx ring.
> 
> I used a similar approach to the one for the completion queue (assigning UINT64_MAX
> to the rx ring at netdev_afxdp_rxq_recv) but do not see any identical pointers.
> 
>> I'm more and more convinced that it's a kernel/libbpf bug. The last bit left
>> to check is the pointers inside the fill queue. All other parts in OVS seem to
>> work correctly. I'll send more information about the testcase later after re-checking
>> with the most recent bpf-next.
>>
> 
> Look forward to your investigation! Thanks a lot.

It was a kernel bug: the generic receive path doesn't take any locks,
but generic receive can be triggered from different cores at the same
time, breaking the rx and fill queues. I tried to run 2 traffic flows over
the veth pair, one side of which was opened by netdev-afxdp in OVS, and
OVS constantly crashed because two kernel threads tried to allocate the same
addresses from the fill queue and pushed them to the rx queue. That is the root
cause of the duplicated addresses in the RX queue. The data in these descriptors
was most probably corrupted too.
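
To illustrate the race described above, here is a minimal sketch in C. It is
a toy model and not the kernel code: all toy_* names, the ring size and the
2048-byte frame size are made up for illustration, and the two "cores" are
replayed by hand in a single thread so that the bad interleaving is
deterministic.

```c
#include <stdint.h>

#define TOY_RING_SIZE 8u                    /* power of two (assumption) */
#define TOY_RING_MASK (TOY_RING_SIZE - 1u)

/* Hypothetical toy model of a fill queue: a ring of buffer addresses. */
struct toy_fill_ring {
    uint64_t addrs[TOY_RING_SIZE];
    uint32_t cons;                          /* consumer index */
};

void toy_ring_init(struct toy_fill_ring *r)
{
    for (uint32_t i = 0; i < TOY_RING_SIZE; i++) {
        r->addrs[i] = (uint64_t)i * 2048u;  /* distinct frame addresses */
    }
    r->cons = 0;
}

/* Step 1 of a dequeue: read the current consumer index. */
uint32_t toy_ring_peek(const struct toy_fill_ring *r)
{
    return r->cons;
}

/* Step 2 of a dequeue: take the address at 'idx' and advance the index. */
uint64_t toy_ring_take(struct toy_fill_ring *r, uint32_t idx)
{
    r->cons = idx + 1;
    return r->addrs[idx & TOY_RING_MASK];
}

/* No lock: both "cores" read the index before either one advances it. */
void toy_racy_receive(struct toy_fill_ring *r, uint64_t out[2])
{
    uint32_t idx_a = toy_ring_peek(r);
    uint32_t idx_b = toy_ring_peek(r);      /* sees the same index as core A */

    out[0] = toy_ring_take(r, idx_a);
    out[1] = toy_ring_take(r, idx_b);       /* same address handed out twice */
}

/* Serialized, as if under a lock: each peek+take pair runs to completion. */
void toy_serialized_receive(struct toy_fill_ring *r, uint64_t out[2])
{
    out[0] = toy_ring_take(r, toy_ring_peek(r));
    out[1] = toy_ring_take(r, toy_ring_peek(r));
}
```

In this model toy_racy_receive() returns the same address twice, which is what
shows up as duplicated entries in the rx ring, while toy_serialized_receive()
returns two different addresses.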

I've sent a patch for this issue:
    https://lore.kernel.org/bpf/20190702143634.19688-1-i.maximets@samsung.com/

I'm still having some trouble with this scenario. Sometimes the traffic
simply stops flowing. But this seems to be a different issue, most likely
one more kernel issue.
However, OVS doesn't crash for me anymore, and this is good news.


-------------------------
Full testcase description
-------------------------
ip netns add at_ns0
ip netns add at_ns1

ip link add p0 type veth peer name patch-p0
ethtool -K p0 tx off rxvlan off txvlan off
  
ip link set p0 netns at_ns0  
ip link set dev patch-p0 up
ip link set dev patch-p0 promisc on
  
ip netns exec at_ns0 ip addr add "10.1.1.1/24" dev p0
ip netns exec at_ns0 ip link set dev p0 up

ip link add p1 type veth peer name patch-p1
ethtool -K p1 tx off rxvlan off txvlan off

ip link set p1 netns at_ns1
ip link set dev patch-p1 up
ip link set dev patch-p1 promisc on

ip netns exec at_ns1 ip addr add "10.1.1.2/24" dev p1
ip netns exec at_ns1 ip link set dev p1 up

<start OVS and add patch-p0 and patch-p1 as afxdp ports>

# up the internal port of ovs bridge
ip link set dev br0 up
ip addr add dev br0 10.1.1.13/24


[shell#1] ip netns exec at_ns1 iperf3 -s
[shell#2] ip netns exec at_ns1 iperf3 -s -p 5008
[shell#3] ip netns exec at_ns0 iperf3 -c 10.1.1.2 -t 3600

[shell#4] iperf3 -c 10.1.1.2 -t 3600 -p 5008 # Works via internal port.

<Observe OVS crash>

-----
For this testcase to work you need the 'skb_unclone' patch applied in the kernel,
otherwise TCP traffic will not flow.


Best regards, Ilya Maximets.

