[ovs-dev] [PATCH] netdev-dpdk: Fix PMD threads hang in __netdev_dpdk_vhost_send().

Ilya Maximets i.maximets at samsung.com
Tue May 17 15:09:24 UTC 2016


There are situations when PMD thread can hang forever inside
__netdev_dpdk_vhost_send() because of broken virtqueue ring.

This happens if rte_vring_available_entries() always positive and
rte_vhost_enqueue_burst() can't send anything (possible with broken ring).

In this case time expiration will be never checked and 'do {} while (cnt)'
loop will be infinite.

This scenario sometimes reproducible with dpdk-16.04-rc2 inside guest VM.
Also it may be reproduced by manual braking of ring structure inside
the guest VM.

Fix that by checking time expiration even if we have available entries.

Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
---

How to reproduce manually:

* Start packet flow.

* Start testpmd inside guest VM under gdb:
	# gdb --args ./testpmd -c 0x1f -n 2 --socket-mem=2048 -w 0000:00:0a.0 \\
	  -- --burst=64 --txd=512 --rxd=512

* Break virtqueue ring somehow:
	^C
	# (gdb) break virtqueue_enqueue_xmit
	# (gdb) p txvq->vq_ring.avail->idx
	$1 = 30784
	# (gdb) p &txvq->vq_ring.avail->idx 
	$2 = (uint16_t *) 0x7fff7ea9b002
	# (gdb) set {uint16_t}0x7fff7ea9b002 = 0
	# (gdb) disable 1
	# (gdb) c
	Continuing.

* Hardly stop testpmd:
	^C
	# (gdb) quit
	A debugging session is active.
	Quit anyway? (y or n) y

* Start testpmd inside VM again and see results:
	# gdb --args ./testpmd -c 0x1f -n 2 --socket-mem=2048 -w 0000:00:0a.0 \\
	  -- --burst=64 --txd=512 --rxd=512
	EAL: Detected 5 lcore(s)
	EAL: Probing VFIO support...
	EAL: PCI device 0000:00:0a.0 on NUMA socket -1
	EAL:   probe driver: 1af4:1000 rte_virtio_pmd
	*******SSH hangs and QEMU crashes here*******

* On the OVS side we can see hang of PMD thread:
	|00003|ovs_rcu(urcu2)|WARN|blocked 4005 ms waiting for pmd54 to quiesce
	|00004|ovs_rcu(urcu2)|WARN|blocked 8005 ms waiting for pmd54 to quiesce
	|00005|ovs_rcu(urcu2)|WARN|blocked 16004 ms waiting for pmd54 to quiesce
	....


lib/netdev-dpdk.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 87879d5..aef6ea4 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1367,24 +1367,24 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
             cur_pkts = &cur_pkts[tx_pkts];
         } else {
             uint64_t timeout = VHOST_ENQ_RETRY_USECS * rte_get_timer_hz() / 1E6;
-            unsigned int expired = 0;
+            bool expired = false;
 
-            if (!start) {
+            if (start) {
+                expired = (rte_get_timer_cycles() - start) > timeout;
+            } else {
                 start = rte_get_timer_cycles();
             }
-
             /*
              * Unable to enqueue packets to vhost interface.
              * Check available entries before retrying.
              */
-            while (!rte_vring_available_entries(virtio_dev, vhost_qid)) {
-                if (OVS_UNLIKELY((rte_get_timer_cycles() - start) > timeout)) {
-                    expired = 1;
-                    break;
-                }
+            while (!rte_vring_available_entries(virtio_dev, vhost_qid)
+                   && !expired) {
+                expired = (rte_get_timer_cycles() - start) > timeout;
             }
+
             if (expired) {
-                /* break out of main loop. */
+                /* Retrying took too much time. Break out of main loop. */
                 break;
             }
         }
-- 
2.5.0




More information about the dev mailing list