[ovs-discuss] ovsrcu_synchronize() blocking while indefinitely waiting for thread to quiesce

Patrik Andersson R patrik.r.andersson at ericsson.com
Mon Jan 25 14:09:09 UTC 2016


Hi,

during robustness testing, where VMs are booted and deleted with nova
boot/delete in rather rapid succession, VMs get stuck in the spawning
state after a few test cycles. Presumably this is because OVS stops
responding to port additions and deletions, or rather because responses
to these requests become painfully slow. Other requests towards
ovs-vswitchd fail to complete in any reasonable time frame as well;
ovs-appctl vlog/set is one example.

The only conclusion I can draw at the moment is that some thread (I've
observed main and dpdk_watchdog3) blocks the ovsrcu_synchronize()
operation for an "infinite" time, and there is no fallback to get out of
this. To recover, the minimum operation seems to be restarting the
openvswitch-switch service, but that seems to cause other issues in the
longer term.

In the vswitch log when this happens the following can be observed:

2016-01-24T20:36:14.601Z|02742|ovs_rcu(vhost_thread2)|WARN|blocked 1000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:15.600Z|02743|ovs_rcu(vhost_thread2)|WARN|blocked 2000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:17.601Z|02744|ovs_rcu(vhost_thread2)|WARN|blocked 4000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:21.600Z|02745|ovs_rcu(vhost_thread2)|WARN|blocked 8000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:24.511Z|00001|ovs_rcu(urcu1)|WARN|blocked 1000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:24.846Z|08246|ovs_rcu|WARN|blocked 1000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:25.511Z|00002|ovs_rcu(urcu1)|WARN|blocked 2000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:25.846Z|08247|ovs_rcu|WARN|blocked 2000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:27.510Z|00003|ovs_rcu(urcu1)|WARN|blocked 4000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:27.847Z|08248|ovs_rcu|WARN|blocked 4000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:29.600Z|02746|ovs_rcu(vhost_thread2)|WARN|blocked 16000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:31.510Z|00004|ovs_rcu(urcu1)|WARN|blocked 8000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:31.846Z|08249|ovs_rcu|WARN|blocked 8000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:39.511Z|00005|ovs_rcu(urcu1)|WARN|blocked 16000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:39.846Z|08250|ovs_rcu|WARN|blocked 16000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:45.600Z|02747|ovs_rcu(vhost_thread2)|WARN|blocked 32000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:55.510Z|00006|ovs_rcu(urcu1)|WARN|blocked 32000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:36:55.846Z|08251|ovs_rcu|WARN|blocked 32000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:37:17.600Z|02748|ovs_rcu(vhost_thread2)|WARN|blocked 64000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:37:27.510Z|00007|ovs_rcu(urcu1)|WARN|blocked 64000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:37:27.846Z|08252|ovs_rcu|WARN|blocked 64000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:38:21.601Z|02749|ovs_rcu(vhost_thread2)|WARN|blocked 128000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:38:31.511Z|00008|ovs_rcu(urcu1)|WARN|blocked 128000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:38:31.846Z|08253|ovs_rcu|WARN|blocked 128000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:40:29.600Z|02750|ovs_rcu(vhost_thread2)|WARN|blocked 256000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:40:39.510Z|00009|ovs_rcu(urcu1)|WARN|blocked 256000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:40:39.846Z|08254|ovs_rcu|WARN|blocked 256000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:44:45.601Z|02751|ovs_rcu(vhost_thread2)|WARN|blocked 512000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:44:55.510Z|00010|ovs_rcu(urcu1)|WARN|blocked 512000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:44:55.846Z|08255|ovs_rcu|WARN|blocked 512000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:53:17.600Z|02752|ovs_rcu(vhost_thread2)|WARN|blocked 1024000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:53:27.510Z|00011|ovs_rcu(urcu1)|WARN|blocked 1024000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T20:53:27.847Z|08256|ovs_rcu|WARN|blocked 1024000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T21:10:21.600Z|02753|ovs_rcu(vhost_thread2)|WARN|blocked 2048000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T21:10:31.510Z|00012|ovs_rcu(urcu1)|WARN|blocked 2048000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T21:10:31.846Z|08257|ovs_rcu|WARN|blocked 2048000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T21:44:29.600Z|02754|ovs_rcu(vhost_thread2)|WARN|blocked 4096000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T21:44:39.510Z|00013|ovs_rcu(urcu1)|WARN|blocked 4096000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T21:44:39.846Z|08258|ovs_rcu|WARN|blocked 4096000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T22:52:45.601Z|02755|ovs_rcu(vhost_thread2)|WARN|blocked 8192000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T22:52:55.510Z|00014|ovs_rcu(urcu1)|WARN|blocked 8192000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-24T22:52:55.846Z|08259|ovs_rcu|WARN|blocked 8192000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-25T01:09:17.600Z|02756|ovs_rcu(vhost_thread2)|WARN|blocked 16384000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-25T01:09:27.511Z|00015|ovs_rcu(urcu1)|WARN|blocked 16384000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-25T01:09:27.847Z|08260|ovs_rcu|WARN|blocked 16384000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-25T05:42:21.600Z|02757|ovs_rcu(vhost_thread2)|WARN|blocked 32768000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-25T05:42:31.510Z|00016|ovs_rcu(urcu1)|WARN|blocked 32768000 ms waiting for dpdk_watchdog3 to quiesce
2016-01-25T05:42:31.846Z|08261|ovs_rcu|WARN|blocked 32768000 ms waiting for dpdk_watchdog3 to quiesce


Is this a known issue?

This issue can be reproduced by booting multiple VMs on the same
compute node. It seems to be much easier to reproduce if each VM has
several vNICs. Then delete the VMs in a loop, for example:

for (( i=1; i<=$vm_count; i++ )); do nova delete "test_vm$i"; done
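A slightly fuller sketch of the same delete loop, with the command
parameterized so it can be dry-run (the delete_vms name and the echo
dry run are my own; the "nova delete test_vmN" calls are as above):

```shell
# delete_vms COUNT CMD: delete VMs test_vm1..test_vmCOUNT using CMD.
# The real reproduction would run: delete_vms "$vm_count" nova
# Passing "echo" instead of "nova" gives a harmless dry run.
delete_vms() {
    count=$1
    cmd=$2
    i=1
    while [ "$i" -le "$count" ]; do
        "$cmd" delete "test_vm$i"
        i=$((i + 1))
    done
}
```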

Regards,

Patrik
