[ovs-discuss] bug report//ovs-vswitchd got stuck when delete dpdk port with ovs hw-offload enabled

Frank Wang(王培辉) wangpeihui at inspur.com
Thu Jun 18 10:52:33 UTC 2020


Hello experts

 

         I'm writing this email to draw some attention to this issue: after
digging into it, I am 100 percent certain that it is a deadlock.

Here is how to reproduce it:

a.       Set hw-offload=true, then restart ovs-vswitchd.

b.       Create a netdev bridge, then add a NIC (already bound to the vfio
driver) to the bridge.

c.       Then try to delete the DPDK port from the bridge.

d.       ovs-vswitchd gets stuck intermittently (not on every attempt).

 

How I believe the deadlock occurs:

1. The ovs-vswitchd main thread takes the mutex (dp->port_mutex) in
dpif_netdev_port_del when deleting a port from the bridge.

2. Meanwhile, the revalidators try to acquire the same mutex
(dp->port_mutex) in dpif_netdev_get_flow_offload_status while dumping
flows, and they block because they can't get it.

3. To purge the PMD flows in dp_purge_cb, the main thread, still holding
dp->port_mutex, pauses the revalidators through a latch and waits for them
at a barrier. But the revalidators are already asleep waiting for the
mutex, so they can't respond to the pause request.
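
The three steps above form a circular wait: the main thread holds
dp->port_mutex while it waits for the revalidators, and the revalidators
cannot respond until they get dp->port_mutex. Below is a minimal pthreads
model of that pattern, offered as a sketch under stated assumptions rather
than OVS code: `port_mutex`, `purge_barrier`, `revalidator_model()` and
`run_purge_model()` are hypothetical names, and a plain `pthread_barrier_t`
stands in for the OVS latch and barrier. So that the model terminates, each
modeled revalidator uses `pthread_mutex_timedlock()` with a 200 ms timeout;
a timeout stands for a thread that would otherwise wait forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins for dp->port_mutex and the barrier used when
 * pausing revalidators; these are not the OVS data structures. */
static pthread_mutex_t port_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t purge_barrier;
static pthread_mutex_t count_mutex = PTHREAD_MUTEX_INITIALIZER;
static int stuck_revalidators;

/* Models a revalidator that needs port_mutex (to fetch offload status)
 * before it can get around to answering the pause request. */
static void *revalidator_model(void *arg)
{
    (void) arg;
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_nsec += 200L * 1000 * 1000;          /* 200 ms timeout. */
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /* The reported code blocks here forever; the model gives up instead. */
    int err = pthread_mutex_timedlock(&port_mutex, &deadline);
    if (err == ETIMEDOUT) {
        pthread_mutex_lock(&count_mutex);
        stuck_revalidators++;
        pthread_mutex_unlock(&count_mutex);
    } else if (err == 0) {
        pthread_mutex_unlock(&port_mutex);
    }

    /* Only now can the thread take part in the pause barrier. */
    pthread_barrier_wait(&purge_barrier);
    return NULL;
}

/* Models the main thread: take port_mutex (port deletion), then wait for
 * every revalidator at the barrier (purge).  Returns how many modeled
 * revalidators could not get port_mutex while the main thread waited. */
static int run_purge_model(int n_revalidators)
{
    pthread_t threads[16];
    assert(n_revalidators > 0 && n_revalidators <= 16);

    stuck_revalidators = 0;
    pthread_barrier_init(&purge_barrier, NULL, (unsigned) n_revalidators + 1);

    pthread_mutex_lock(&port_mutex);                 /* Step 1. */
    for (int i = 0; i < n_revalidators; i++) {       /* Step 2. */
        pthread_create(&threads[i], NULL, revalidator_model, NULL);
    }
    pthread_barrier_wait(&purge_barrier);            /* Step 3. */
    pthread_mutex_unlock(&port_mutex);

    for (int i = 0; i < n_revalidators; i++) {
        pthread_join(threads[i], NULL);
    }
    pthread_barrier_destroy(&purge_barrier);
    return stuck_revalidators;
}
```

For instance, `run_purge_model(2)` returns 2: both modeled revalidators time
out on `port_mutex`, just as threads 40 and 41 in the backtrace further down
are blocked in `ovs_mutex_lock_at()`. Replacing the timed lock with a plain
`pthread_mutex_lock()` turns the model into the hang described above.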

 

I have no idea how to fix it, so any comments are welcome.

 

Thanks.

 

From: Frank Wang(王培辉)
Sent: June 17, 2020 19:08
To: 'ovs-dev at openvswitch.org' <ovs-dev at openvswitch.org>; ovs-discuss at openvswitch.org
Subject: ovs-vswitchd got stuck when delete dpdk port with ovs hw-offload enabled

 

Hello

 

         I've encountered a problem: ovs-vswitchd intermittently gets stuck
when I try to delete a DPDK port from a bridge. When I turn off
hw-offload, it no longer happens. I'm using the latest OVS release, 2.13.1,
on CentOS 7.6. Please help me out here; thanks in advance.

 

Here are the ovs-vswitchd stacks; it looks like a deadlock:

Thread 41 (Thread 0x7f2e2fdfe700 (LWP 156099)):
#0  0x00007f2e975224ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f2e9751dde6 in _L_lock_941 () from /lib64/libpthread.so.0
#2  0x00007f2e9751dcdf in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f2e98562318 in ovs_mutex_lock_at () from /lib64/libopenvswitch-2.13.so.0
#4  0x00007f2e984c5469 in dpif_netdev_get_flow_offload_status () from /lib64/libopenvswitch-2.13.so.0
#5  0x00007f2e984c5504 in get_dpif_flow_status () from /lib64/libopenvswitch-2.13.so.0
#6  0x00007f2e984cd411 in dp_netdev_flow_to_dpif_flow () from /lib64/libopenvswitch-2.13.so.0
#7  0x00007f2e984cd708 in dpif_netdev_flow_dump_next () from /lib64/libopenvswitch-2.13.so.0
#8  0x00007f2e984d86f2 in dpif_flow_dump_next () from /lib64/libopenvswitch-2.13.so.0
#9  0x00007f2e98b46547 in revalidate.isra.23 () from /lib64/libofproto-2.13.so.0
#10 0x00007f2e98b46db3 in udpif_revalidator () from /lib64/libofproto-2.13.so.0
#11 0x00007f2e9856307f in ovsthread_wrapper () from /lib64/libopenvswitch-2.13.so.0
#12 0x00007f2e9751bdd5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f2e96a39ead in clone () from /lib64/libc.so.6

Thread 40 (Thread 0x7f2e2fbfd700 (LWP 156100)):
#0  0x00007f2e975224ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f2e9751dde6 in _L_lock_941 () from /lib64/libpthread.so.0
#2  0x00007f2e9751dcdf in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f2e98562318 in ovs_mutex_lock_at () from /lib64/libopenvswitch-2.13.so.0
#4  0x00007f2e984c5469 in dpif_netdev_get_flow_offload_status () from /lib64/libopenvswitch-2.13.so.0
#5  0x00007f2e984c5504 in get_dpif_flow_status () from /lib64/libopenvswitch-2.13.so.0
#6  0x00007f2e984cd411 in dp_netdev_flow_to_dpif_flow () from /lib64/libopenvswitch-2.13.so.0
#7  0x00007f2e984cd708 in dpif_netdev_flow_dump_next () from /lib64/libopenvswitch-2.13.so.0
#8  0x00007f2e984d86f2 in dpif_flow_dump_next () from /lib64/libopenvswitch-2.13.so.0
#9  0x00007f2e98b46547 in revalidate.isra.23 () from /lib64/libofproto-2.13.so.0
#10 0x00007f2e98b46db3 in udpif_revalidator () from /lib64/libofproto-2.13.so.0
#11 0x00007f2e9856307f in ovsthread_wrapper () from /lib64/libopenvswitch-2.13.so.0
#12 0x00007f2e9751bdd5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f2e96a39ead in clone () from /lib64/libc.so.6

 

…

Thread 1 (Thread 0x7f2eaa999000 (LWP 155698)):
#0  0x00007f2e96a2f20d in poll () from /lib64/libc.so.6
#1  0x00007f2e98593064 in time_poll () from /lib64/libopenvswitch-2.13.so.0
#2  0x00007f2e9857a50c in poll_block () from /lib64/libopenvswitch-2.13.so.0
#3  0x00007f2e98562f78 in ovs_barrier_block () from /lib64/libopenvswitch-2.13.so.0
#4  0x00007f2e98b432be in dp_purge_cb () from /lib64/libofproto-2.13.so.0
#5  0x00007f2e984c8391 in dp_netdev_del_pmd () from /lib64/libopenvswitch-2.13.so.0
#6  0x00007f2e984c9a67 in reconfigure_datapath () from /lib64/libopenvswitch-2.13.so.0
#7  0x00007f2e984cacbd in do_del_port () from /lib64/libopenvswitch-2.13.so.0
#8  0x00007f2e984ccd5f in dpif_netdev_port_del () from /lib64/libopenvswitch-2.13.so.0
#9  0x00007f2e984d74dc in dpif_port_del () from /lib64/libopenvswitch-2.13.so.0
#10 0x00007f2e98b31595 in port_del () from /lib64/libofproto-2.13.so.0
#11 0x00007f2e98b1fe90 in ofproto_port_del () from /lib64/libofproto-2.13.so.0
#12 0x000056187a856aa6 in bridge_delete_or_reconfigure_ports ()
#13 0x000056187a858490 in bridge_reconfigure ()
#14 0x000056187a85be26 in bridge_run ()
#15 0x000056187a85235d in main ()
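
Read together, these stacks form a wait-for cycle: threads 40 and 41 wait in
`ovs_mutex_lock_at()` for dp->port_mutex, which (per the analysis at the top
of this message) thread 1 holds, while thread 1 waits in
`ovs_barrier_block()` for the revalidators. The sketch below is a small,
hypothetical helper, not part of OVS or gdb: `struct wait_graph`,
`has_deadlock()` and `stacks_from_report()` are illustrative names. It
encodes "thread A waits for thread B" edges by hand from the three stacks
above and looks for a cycle with a depth-first search; the elided threads
(`…`) are not modeled, and the `false` case is only a contrast, not a
proposed fix.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_THREADS 8

/* waits_for[a][b] is true when modeled thread a cannot proceed until
 * thread b acts (releases a lock or arrives at a barrier). */
struct wait_graph {
    int n;
    bool waits_for[MAX_THREADS][MAX_THREADS];
};

/* Depth-first search; state: 0 = unvisited, 1 = on stack, 2 = done. */
static bool dfs_has_cycle(const struct wait_graph *g, int v, int state[])
{
    state[v] = 1;
    for (int w = 0; w < g->n; w++) {
        if (!g->waits_for[v][w]) {
            continue;
        }
        if (state[w] == 1) {
            return true;                 /* Back edge: circular wait. */
        }
        if (state[w] == 0 && dfs_has_cycle(g, w, state)) {
            return true;
        }
    }
    state[v] = 2;
    return false;
}

static bool has_deadlock(const struct wait_graph *g)
{
    int state[MAX_THREADS] = {0};
    for (int v = 0; v < g->n; v++) {
        if (state[v] == 0 && dfs_has_cycle(g, v, state)) {
            return true;
        }
    }
    return false;
}

/* Hand encoding of the stacks: index 0 = thread 1 (main),
 * 1 = thread 40, 2 = thread 41.  When revalidators_wait_on_main is false,
 * the revalidators are assumed not to wait on the main thread. */
static struct wait_graph stacks_from_report(bool revalidators_wait_on_main)
{
    struct wait_graph g = { .n = 3 };
    /* Main thread, in ovs_barrier_block(), waits for both revalidators. */
    g.waits_for[0][1] = true;
    g.waits_for[0][2] = true;
    if (revalidators_wait_on_main) {
        /* Revalidators, in ovs_mutex_lock_at(), wait for the holder of
         * dp->port_mutex, i.e. the main thread. */
        g.waits_for[1][0] = true;
        g.waits_for[2][0] = true;
    }
    return g;
}
```

With this encoding, `has_deadlock()` returns true for
`stacks_from_report(true)` and false for `stacks_from_report(false)`: the
barrier wait on its own is harmless, and it is the extra edge back to the
mutex holder that closes the cycle.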

 
