[ovs-dev] vhost-user invalid txqid causes discard of packets

Ilya Maximets i.maximets at samsung.com
Wed Mar 9 09:29:43 UTC 2016


On 09.03.2016 12:13, Wang, Zhihong wrote:
> 
> 
>> -----Original Message-----
>> From: Ilya Maximets [mailto:i.maximets at samsung.com]
>> Sent: Wednesday, March 9, 2016 3:39 PM
>> To: Wang, Zhihong <zhihong.wang at intel.com>; dev at openvswitch.org
>> Cc: Flavio Leitner <fbl at redhat.com>; Traynor, Kevin <kevin.traynor at intel.com>;
>> Dyasly Sergey <s.dyasly at samsung.com>
>> Subject: Re: vhost-user invalid txqid causes discard of packets
>>
>> OK. Finally I got it.
>>
>> There is a poor distribution of rx queues between pmd
>> threads for the dpdk0 port.
>>
>>> # ./ovs/utilities/ovs-appctl dpif-netdev/pmd-rxq-show
>>> pmd thread numa_id 0 core_id 13:
>>>         port: vhost-user1       queue-id: 1
>>>         port: dpdk0     queue-id: 3
>>> pmd thread numa_id 0 core_id 14:
>>>         port: vhost-user1       queue-id: 2
>>> pmd thread numa_id 0 core_id 16:
>>>         port: dpdk0     queue-id: 0
>>> pmd thread numa_id 0 core_id 17:
>>>         port: dpdk0     queue-id: 1
>>> pmd thread numa_id 0 core_id 12:
>>>         port: vhost-user1       queue-id: 0
>>>         port: dpdk0     queue-id: 2
>>> pmd thread numa_id 0 core_id 15:
>>>         port: vhost-user1       queue-id: 3
>>> ------------------------------------------------------
>>
>> As we can see above, the dpdk0 port is polled by threads on cores
>> 12, 13, 16 and 17.
>> By design of dpif-netdev, only one TX queue-id is assigned to each
>> pmd thread. These queue-ids are sequential, similar to the core-ids,
>> and a thread sends packets to the queue with exactly this queue-id
>> regardless of the port.
>>
>> In our case:
>> pmd thread on core 12 will send packets to tx queue 0
>> pmd thread on core 13 will send packets to tx queue 1
>> ...
>> pmd thread on core 17 will send packets to tx queue 5
>>
>> So, for dpdk0 port:
>> core 12 --> TX queue-id 0
>> core 13 --> TX queue-id 1
>> core 16 --> TX queue-id 4
>> core 17 --> TX queue-id 5
>>
>> After truncating in netdev-dpdk:
>> core 12 --> TX queue-id 0 % 4 == 0
>> core 13 --> TX queue-id 1 % 4 == 1
>> core 16 --> TX queue-id 4 % 4 == 0
>> core 17 --> TX queue-id 5 % 4 == 1
>>
>> As a result, only 2 queues are used.
>> This is not good behaviour. Thanks for reporting.
>> I'll try to fix the rx queue distribution in dpif-netdev.
>>
>> Best regards, Ilya Maximets.
>>
>> P.S. There will be no packet loss at low speeds, only a 2x
>>      performance drop.
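
To make the truncation described above concrete, here is a minimal,
self-contained sketch (simplified; not the actual dpif-netdev/netdev-dpdk
code, and the core numbers and queue count are just the values from this
example):

/* Pmd threads on cores 12..17 get sequential queue-ids 0..5; dpdk0 has
 * 4 TX queues, so the queue-id is truncated with a modulo. */
#include <stdio.h>

int
main(void)
{
    int cores[] = { 12, 13, 16, 17 };   /* cores that poll dpdk0        */
    int first_core = 12;                /* first core of the pmd set    */
    int n_txq = 4;                      /* TX queues of the dpdk0 port  */

    for (int i = 0; i < 4; i++) {
        int qid = cores[i] - first_core;         /* sequential queue-id */
        printf("core %d --> qid %d --> TX queue %d\n",
               cores[i], qid, qid % n_txq);      /* truncated queue-id  */
    }
    return 0;
}

Running it prints only TX queues 0 and 1 for the four cores, which is
exactly the 2-queue result shown above.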
> 
> 
> Yeah, seems a better algorithm will be needed.
> 
> Also, I see this behavior, which I think will lead to packet loss:
> 
> In the source code, qid is calculated at runtime in
> __netdev_dpdk_vhost_send:
> qid = vhost_dev->tx_q[qid % vhost_dev->real_n_txq].map;
> 
> 8 cores:
> vhost txq: 4, 5, 6, 7 (become 0, 1, 2, 3)
> 
> 6 cores:
> vhost txq: 0, 1, 4, 5 (4 & 5 become -1 after qid calculation at runtime)

vhost_dev->real_n_txq == 4.
So, 4 --> 0 and 5 --> 1, assuming queues 0 and 1 were enabled by the guest.
0: qid = vhost_dev->tx_q[0 % 4].map; => qid = vhost_dev->tx_q[0].map; => qid = 0;
1: qid = vhost_dev->tx_q[1 % 4].map; => qid = vhost_dev->tx_q[1].map; => qid = 1;
4: qid = vhost_dev->tx_q[4 % 4].map; => qid = vhost_dev->tx_q[0].map; => qid = 0;
5: qid = vhost_dev->tx_q[5 % 4].map; => qid = vhost_dev->tx_q[1].map; => qid = 1;
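
To spell out that arithmetic as a self-contained sketch (simplified
stand-in structures, not the real ones from netdev-dpdk.c; the -1 entries
are placeholders for slots that are never reached by qids 0, 1, 4, 5):

#include <stdio.h>

#define REAL_N_TXQ 4    /* vhost_dev->real_n_txq in this example */

struct txq { int map; };

int
main(void)
{
    struct txq tx_q[REAL_N_TXQ] = {
        { .map = 0 },   /* guest queue 0 enabled                 */
        { .map = 1 },   /* guest queue 1 enabled                 */
        { .map = -1 },  /* slots 2 and 3: never indexed by the   */
        { .map = -1 },  /*   qids 0, 1, 4, 5 used in this case   */
    };
    int pmd_qids[] = { 0, 1, 4, 5 };    /* qids of the pmd threads */

    for (int i = 0; i < 4; i++) {
        int qid = tx_q[pmd_qids[i] % REAL_N_TXQ].map;
        printf("pmd qid %d --> vhost txq %d\n", pmd_qids[i], qid);
    }
    return 0;
}

It prints 0, 1, 0, 1 for qids 0, 1, 4, 5, i.e. nothing maps to -1 and no
packets are dropped on this path.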

There should be no issues on the OVS side.
As I already said, you can see the exact mapping of vhost tx queues by enabling
VLOG_DBG logging for netdev-dpdk. It should look like this after the VM starts:
2016-03-04T09:34:51Z|00459|dpdk(vhost_thread2)|DBG|TX queue mapping for /var/run/openvswitch/vhost-user1
2016-03-04T09:34:51Z|00460|dpdk(vhost_thread2)|DBG| 0 -->  0
2016-03-04T09:34:51Z|00461|dpdk(vhost_thread2)|DBG| 1 -->  1
2016-03-04T09:34:51Z|00462|dpdk(vhost_thread2)|DBG| 2 -->  2
2016-03-04T09:34:51Z|00463|dpdk(vhost_thread2)|DBG| 3 -->  3
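
(The module name in the log prefix above is "dpdk", so, assuming the standard
vlog facility, the debug level can be enabled at runtime with something like
"ovs-appctl vlog/set dpdk:file:dbg" before starting the VM.)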

Best regards, Ilya Maximets.


