[ovs-dev] [PATCH] netdev-dpdk: Fix PMD threads hang in __netdev_dpdk_vhost_send().

Traynor, Kevin kevin.traynor at intel.com
Mon May 23 14:55:40 UTC 2016


> -----Original Message-----
> From: Ilya Maximets [mailto:i.maximets at samsung.com]
> Sent: Tuesday, May 17, 2016 4:09 PM
> To: dev at openvswitch.org; Daniele Di Proietto <diproiettod at vmware.com>
> Cc: Dyasly Sergey <s.dyasly at samsung.com>; Heetae Ahn
> <heetae82.ahn at samsung.com>; Flavio Leitner <fbl at sysclose.org>;
> Traynor, Kevin <kevin.traynor at intel.com>; Pravin B Shelar
> <pshelar at ovn.org>; Ilya Maximets <i.maximets at samsung.com>
> Subject: [PATCH] netdev-dpdk: Fix PMD threads hang in
> __netdev_dpdk_vhost_send().
> 
> There are situations when PMD thread can hang forever inside
> __netdev_dpdk_vhost_send() because of broken virtqueue ring.
> 
> This happens if rte_vring_available_entries() is always positive and
> rte_vhost_enqueue_burst() can't send anything (which is possible with a
> broken ring).
> 
> In this case the time expiration is never checked and the
> 'do {} while (cnt)' loop becomes infinite.
> 
> This scenario is sometimes reproducible with dpdk-16.04-rc2 inside the
> guest VM. It can also be reproduced by manually breaking the ring
> structure inside the guest VM.
> 
> Fix that by checking the time expiration even if there are available
> entries.

Hi Ilya,

Thanks for catching this. This intersects with something else I've seen
with the retry code, and there are a few options...
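
To recap for anyone following the thread, the retry path in question currently
looks roughly like this (a simplified sketch with names matching the patch
below, not the literal source in lib/netdev-dpdk.c):

    /* Simplified recap of the current retry path in __netdev_dpdk_vhost_send()
     * (error handling and stats trimmed). */
    do {
        unsigned int tx_pkts = rte_vhost_enqueue_burst(virtio_dev, vhost_qid,
                                                       cur_pkts, cnt);
        if (tx_pkts) {
            /* Some progress: account for what was sent, retry the remainder. */
            cnt -= tx_pkts;
            cur_pkts = &cur_pkts[tx_pkts];
        } else {
            uint64_t timeout = VHOST_ENQ_RETRY_USECS * rte_get_timer_hz() / 1E6;
            unsigned int expired = 0;

            if (!start) {
                start = rte_get_timer_cycles();
            }
            /* Wait for the guest to expose free entries, or for the timeout. */
            while (!rte_vring_available_entries(virtio_dev, vhost_qid)) {
                if ((rte_get_timer_cycles() - start) > timeout) {
                    expired = 1;
                    break;
                }
            }
            if (expired) {
                break;
            }
            /* If a broken ring keeps available entries positive while enqueue
             * keeps returning 0, the timeout is never evaluated and this
             * do/while never terminates -- the hang described above. */
        }
    } while (cnt);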

1. Remove retries when nothing was sent. For the VM that needs retries it is a
good thing to have, but Bhanu and I saw in a recent test with multiple VMs
that if one VM causes a lot of retries there is a large performance degradation
for the other VMs. So I changed the retry to only occur when at least one packet
has been sent on the previous call. I put a patch up here:
http://openvswitch.org/pipermail/dev/2016-May/071517.html
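
The gist of that change is roughly the following (a hand-written sketch to
illustrate the idea, not the exact code in the patch above):

    /* Option 1 sketch (hypothetical, simplified): stop retrying as soon as a
     * call sends nothing, so one stalled VM can't hold the PMD thread. */
    unsigned int tx_pkts;

    do {
        tx_pkts = rte_vhost_enqueue_burst(virtio_dev, vhost_qid, cur_pkts, cnt);
        cnt -= tx_pkts;
        cur_pkts = &cur_pkts[tx_pkts];
    } while (cnt && tx_pkts);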

If we keep retries, we can either:

2. Make the coordination between rte_vring_available_entries() and
rte_vhost_enqueue_burst() more robust, as per your patch.

3. As you've shown that we can't rely on rte_vring_available_entries() to know
whether we can enqueue, how about just removing it and using
rte_vhost_enqueue_burst() directly in the retry loop?
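
Something along these lines (again just a rough sketch, assuming the same
timeout mechanism as today):

    /* Option 3 sketch (hypothetical): no rte_vring_available_entries() check;
     * rte_vhost_enqueue_burst() itself drives the timed retry loop. */
    uint64_t timeout = VHOST_ENQ_RETRY_USECS * rte_get_timer_hz() / 1E6;
    uint64_t start = rte_get_timer_cycles();

    while (cnt) {
        unsigned int tx_pkts = rte_vhost_enqueue_burst(virtio_dev, vhost_qid,
                                                       cur_pkts, cnt);
        cnt -= tx_pkts;
        cur_pkts = &cur_pkts[tx_pkts];
        if ((rte_get_timer_cycles() - start) > timeout) {
            break;  /* Give up; the caller drops the leftover packets. */
        }
    }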

My preference would be for 1., because on balance I'd rather that one VM not
degrade the performance of others than keep the retries for its own benefit.
Of course there could be some compromise between them as well, i.e. reducing
the number of retries, but any retries could still affect performance for
another path if it shares the same core.

What do you think?

Kevin. 

> 
> Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
> ---
> 
> How to reproduce manually:
> 
> * Start packet flow.
> 
> * Start testpmd inside guest VM under gdb:
> 	# gdb --args ./testpmd -c 0x1f -n 2 --socket-mem=2048 -w 0000:00:0a.0 \
> 	  -- --burst=64 --txd=512 --rxd=512
> 
> * Break virtqueue ring somehow:
> 	^C
> 	# (gdb) break virtqueue_enqueue_xmit
> 	# (gdb) p txvq->vq_ring.avail->idx
> 	$1 = 30784
> 	# (gdb) p &txvq->vq_ring.avail->idx
> 	$2 = (uint16_t *) 0x7fff7ea9b002
> 	# (gdb) set {uint16_t}0x7fff7ea9b002 = 0
> 	# (gdb) disable 1
> 	# (gdb) c
> 	Continuing.
> 
> * Forcibly stop testpmd:
> 	^C
> 	# (gdb) quit
> 	A debugging session is active.
> 	Quit anyway? (y or n) y
> 
> * Start testpmd inside VM again and see results:
> 	# gdb --args ./testpmd -c 0x1f -n 2 --socket-mem=2048 -w 0000:00:0a.0 \
> 	  -- --burst=64 --txd=512 --rxd=512
> 	EAL: Detected 5 lcore(s)
> 	EAL: Probing VFIO support...
> 	EAL: PCI device 0000:00:0a.0 on NUMA socket -1
> 	EAL:   probe driver: 1af4:1000 rte_virtio_pmd
> 	*******SSH hangs and QEMU crashes here*******
> 
> * On the OVS side we can see hang of PMD thread:
> 	|00003|ovs_rcu(urcu2)|WARN|blocked 4005 ms waiting for pmd54 to quiesce
> 	|00004|ovs_rcu(urcu2)|WARN|blocked 8005 ms waiting for pmd54 to quiesce
> 	|00005|ovs_rcu(urcu2)|WARN|blocked 16004 ms waiting for pmd54 to quiesce
> 	....
> 
> 
>  lib/netdev-dpdk.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 87879d5..aef6ea4 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1367,24 +1367,24 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
>              cur_pkts = &cur_pkts[tx_pkts];
>          } else {
>              uint64_t timeout = VHOST_ENQ_RETRY_USECS * rte_get_timer_hz() / 1E6;
> -            unsigned int expired = 0;
> +            bool expired = false;
> 
> -            if (!start) {
> +            if (start) {
> +                expired = (rte_get_timer_cycles() - start) > timeout;
> +            } else {
>                  start = rte_get_timer_cycles();
>              }
> -
>              /*
>               * Unable to enqueue packets to vhost interface.
>               * Check available entries before retrying.
>               */
> -            while (!rte_vring_available_entries(virtio_dev, vhost_qid)) {
> -                if (OVS_UNLIKELY((rte_get_timer_cycles() - start) > timeout)) {
> -                    expired = 1;
> -                    break;
> -                }
> +            while (!rte_vring_available_entries(virtio_dev, vhost_qid)
> +                   && !expired) {
> +                expired = (rte_get_timer_cycles() - start) > timeout;
>              }
> +
>              if (expired) {
> -                /* break out of main loop. */
> +                /* Retrying took too much time. Break out of main loop. */
>                  break;
>              }
>          }
> --
> 2.5.0
