[ovs-dev] [PATCH] netdev-dpdk: Fix PMD threads hang in __netdev_dpdk_vhost_send().

Ilya Maximets i.maximets at samsung.com
Wed May 25 11:03:07 UTC 2016


On 23.05.2016 17:55, Traynor, Kevin wrote:
>> -----Original Message-----
>> From: Ilya Maximets [mailto:i.maximets at samsung.com]
>> Sent: Tuesday, May 17, 2016 4:09 PM
>> To: dev at openvswitch.org; Daniele Di Proietto <diproiettod at vmware.com>
>> Cc: Dyasly Sergey <s.dyasly at samsung.com>; Heetae Ahn
>> <heetae82.ahn at samsung.com>; Flavio Leitner <fbl at sysclose.org>;
>> Traynor, Kevin <kevin.traynor at intel.com>; Pravin B Shelar
>> <pshelar at ovn.org>; Ilya Maximets <i.maximets at samsung.com>
>> Subject: [PATCH] netdev-dpdk: Fix PMD threads hang in
>> __netdev_dpdk_vhost_send().
>>
>> There are situations when a PMD thread can hang forever inside
>> __netdev_dpdk_vhost_send() because of a broken virtqueue ring.
>>
>> This happens if rte_vring_available_entries() is always positive while
>> rte_vhost_enqueue_burst() can't send anything (possible with a broken
>> ring).
>>
>> In this case the time expiration is never checked and the
>> 'do {} while (cnt)' loop becomes infinite.
>>
>> This scenario is sometimes reproducible with dpdk-16.04-rc2 inside the
>> guest VM. It may also be reproduced by manually breaking the ring
>> structure inside the guest VM.
>>
>> Fix that by checking the time expiration even if we have available
>> entries.
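
For context, the pre-patch retry path boils down to the standalone sketch below
(my own simplification with invented fake_* stubs, not the actual OVS code).
The stubs model the broken ring: "available entries" stays positive while the
enqueue makes no progress, so the inner wait loop exits immediately, the timeout
is never consulted, and cnt never reaches zero:

/* Sketch only: fake_* functions stand in for rte_vring_available_entries(),
 * rte_vhost_enqueue_burst() and rte_get_timer_cycles(). */
#include <stdio.h>

static int fake_available_entries(void) { return 1; } /* broken ring: always "room"... */
static int fake_enqueue_burst(void)     { return 0; } /* ...but nothing is ever sent */
static unsigned long long fake_cycles(void)           /* stand-in timer */
{
    static unsigned long long t;
    return ++t;
}

int main(void)
{
    int cnt = 32;                       /* packets left to send */
    unsigned long long start = 0, timeout = 100;
    long iterations = 0;

    do {
        int sent = fake_enqueue_burst();
        if (sent) {
            cnt -= sent;
        } else {
            int expired = 0;
            if (!start) {
                start = fake_cycles();
            }
            /* Pre-patch logic: the timeout is only checked while waiting
             * for free entries.  With a broken ring this body never runs. */
            while (!fake_available_entries()) {
                if ((fake_cycles() - start) > timeout) {
                    expired = 1;
                    break;
                }
            }
            if (expired) {
                break;                  /* never reached in this scenario */
            }
        }
    } while (cnt && ++iterations < 1000000); /* cap added so the demo terminates */

    printf("gave up after %ld iterations, cnt=%d, timeout never expired\n",
           iterations, cnt);
    return 0;
}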
> 
> Hi Ilya,

Hi, Kevin.

Christian and Thiago CC-ed because, I think, they have faced a similar issue.

> 
> Thanks for catching this. This intersects with something else I've seen
> wrt retry code, and there are a few options...
> 
> 1. Remove retries when nothing is sent. For the VM that needs retries it is a
> good thing to have, but Bhanu and I recently saw in a test with multiple VMs
> that if one VM causes a lot of retries there is a large performance degradation
> for the other VMs. So I changed the retry to only occur when at least one packet
> has been sent on the previous call. I put a patch up here:
> http://openvswitch.org/pipermail/dev/2016-May/071517.html
> 
> If we keep retries we can either
> 
> 2. Make the coordination between rte_vring_available_entries() and
> rte_vhost_enqueue_burst() more robust, as per your patch.
> 
> 3. As you've shown that we can't rely on rte_vring_available_entries() to know we
> can enqueue, how about just removing it and using rte_vhost_enqueue_burst() directly
> in the retry loop.
> 
> My preference would be for 1., because on balance I'd rather one VM did not degrade
> the performance of others than have it keep retries. Of course there could be some
> compromise between them as well, i.e. reducing the number of retries, but any retries
> could still affect performance of another path if they share the same core.
> 
> What do you think?
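
For reference, my reading of option 1 above boils down to the sketch below
(hypothetical names, not the code in the patch you linked): keep calling the
enqueue only while the previous call made progress, so a stuck vring costs at
most one extra call instead of a timed wait.

/* Sketch only: enqueue_fn and fake_enqueue() stand in for
 * rte_vhost_enqueue_burst(); the retry condition is the point. */
#include <stdio.h>

typedef unsigned (*enqueue_fn)(void *pkts[], unsigned cnt);

static unsigned
send_without_timed_retry(enqueue_fn enqueue, void *pkts[], unsigned cnt)
{
    unsigned total = 0, sent;

    do {
        sent = enqueue(&pkts[total], cnt - total);
        total += sent;
    } while (sent && total < cnt);  /* stop as soon as a call sends nothing */

    return total;                   /* the caller drops pkts[total..cnt-1] */
}

/* Toy stand-in: accepts at most 8 packets, then pretends the ring is full. */
static unsigned fake_enqueue(void *pkts[], unsigned cnt)
{
    static unsigned budget = 8;
    unsigned n = cnt < budget ? cnt : budget;

    (void) pkts;
    budget -= n;
    return n;
}

int main(void)
{
    void *pkts[32] = { 0 };
    unsigned sent = send_without_timed_retry(fake_enqueue, pkts, 32);

    printf("sent %u of 32 packets, dropped %u\n", sent, 32 - sent);
    return 0;
}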

I'm worried about scenarios with "pulsing" traffic, i.e. a burst that is not very big
but still enough to overload the vring in a short time, followed by a long period of
silence. The HW NIC can hold many more packets in its RX queues than can be pushed to
the virtio ring at once. In that scenario, without retrying, most of the packets will
be dropped.

How about just decreasing VHOST_ENQ_RETRY_USECS to, say, 1 usec (with my fix
applied, of course)? Such an interval should be enough for one PMD thread to handle
20G of traffic with 64B packets. Also, this timeout could be applied in both cases
(whether something was sent or not) to reduce the cost of retrying.
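
As a rough back-of-envelope check of that 20G/64B claim (my own numbers, assuming
the usual 20 bytes of preamble plus inter-frame gap per Ethernet frame and a
32-packet burst, which is what OVS typically polls from a queue in one go):

#include <stdio.h>

int main(void)
{
    const double link_bps   = 20e9;                  /* 20 Gbit/s */
    const double wire_bits  = (64 + 20) * 8;         /* 64B frame + 20B overhead */
    const double pps        = link_bps / wire_bits;  /* ~29.8 Mpps */
    const double burst_pkts = 32;                    /* assumed burst size */

    printf("packet rate at 20G/64B:  %.1f Mpps\n", pps / 1e6);
    printf("arrival time of a burst: %.2f usec\n", burst_pkts / pps * 1e6);
    return 0;
}

That gives roughly 29.8 Mpps, i.e. a 32-packet burst arrives in about 1.1 usec,
so a ~1 usec retry interval should be in the right ballpark.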

Best regards, Ilya Maximets.

> 
> Kevin. 
> 
>>
>> Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
>> ---
>>
>> How to reproduce manually:
>>
>> * Start packet flow.
>>
>> * Start testpmd inside guest VM under gdb:
>> 	# gdb --args ./testpmd -c 0x1f -n 2 --socket-mem=2048 -w 0000:00:0a.0 \
>> 	  -- --burst=64 --txd=512 --rxd=512
>>
>> * Break virtqueue ring somehow:
>> 	^C
>> 	# (gdb) break virtqueue_enqueue_xmit
>> 	# (gdb) p txvq->vq_ring.avail->idx
>> 	$1 = 30784
>> 	# (gdb) p &txvq->vq_ring.avail->idx
>> 	$2 = (uint16_t *) 0x7fff7ea9b002
>> 	# (gdb) set {uint16_t}0x7fff7ea9b002 = 0
>> 	# (gdb) disable 1
>> 	# (gdb) c
>> 	Continuing.
>>
>> * Forcibly stop testpmd:
>> 	^C
>> 	# (gdb) quit
>> 	A debugging session is active.
>> 	Quit anyway? (y or n) y
>>
>> * Start testpmd inside VM again and see results:
>> 	# gdb --args ./testpmd -c 0x1f -n 2 --socket-mem=2048 -w 0000:00:0a.0 \
>> 	  -- --burst=64 --txd=512 --rxd=512
>> 	EAL: Detected 5 lcore(s)
>> 	EAL: Probing VFIO support...
>> 	EAL: PCI device 0000:00:0a.0 on NUMA socket -1
>> 	EAL:   probe driver: 1af4:1000 rte_virtio_pmd
>> 	*******SSH hangs and QEMU crashes here*******
>>
>> * On the OVS side we can see the hang of the PMD thread:
>> 	|00003|ovs_rcu(urcu2)|WARN|blocked 4005 ms waiting for pmd54 to quiesce
>> 	|00004|ovs_rcu(urcu2)|WARN|blocked 8005 ms waiting for pmd54 to quiesce
>> 	|00005|ovs_rcu(urcu2)|WARN|blocked 16004 ms waiting for pmd54 to quiesce
>> 	....
>>
>>
>> lib/netdev-dpdk.c | 18 +++++++++---------
>>  1 file changed, 9 insertions(+), 9 deletions(-)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 87879d5..aef6ea4 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -1367,24 +1367,24 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
>>              cur_pkts = &cur_pkts[tx_pkts];
>>          } else {
>>              uint64_t timeout = VHOST_ENQ_RETRY_USECS * rte_get_timer_hz() / 1E6;
>> -            unsigned int expired = 0;
>> +            bool expired = false;
>>
>> -            if (!start) {
>> +            if (start) {
>> +                expired = (rte_get_timer_cycles() - start) > timeout;
>> +            } else {
>>                  start = rte_get_timer_cycles();
>>              }
>> -
>>              /*
>>               * Unable to enqueue packets to vhost interface.
>>               * Check available entries before retrying.
>>               */
>> -            while (!rte_vring_available_entries(virtio_dev, vhost_qid)) {
>> -                if (OVS_UNLIKELY((rte_get_timer_cycles() - start) > timeout)) {
>> -                    expired = 1;
>> -                    break;
>> -                }
>> +            while (!rte_vring_available_entries(virtio_dev, vhost_qid)
>> +                   && !expired) {
>> +                expired = (rte_get_timer_cycles() - start) > timeout;
>>              }
>> +
>>              if (expired) {
>> -                /* break out of main loop. */
>> +                /* Retrying took too much time. Break out of main loop. */
>>                  break;
>>              }
>>          }
>> --
>> 2.5.0
> 
> 
> 


