[ovs-dev] Re: bug report//ovs-vswitchd got stuck when delete dpdk port with ovs hw-offload enabled

Frank Wang(王培辉) wangpeihui at inspur.com
Fri Jun 19 01:24:37 UTC 2020


Thanks Ilya, looking forward to your update if you have any thoughts.

-----Original Message-----
From: Ilya Maximets [mailto:i.maximets at ovn.org]
Sent: June 18, 2020 20:34
To: Frank Wang(王培辉) <wangpeihui at inspur.com>; ovs-dev at openvswitch.org; ovs-discuss at openvswitch.org
Cc: i.maximets at ovn.org
Subject: Re: [ovs-dev] bug report//ovs-vswitchd got stuck when delete dpdk port with ovs hw-offload enabled

On 6/18/20 12:52 PM, Frank Wang(王培辉) wrote:
> Hello experts
> 
>  
> 
>          I'm writing this email to draw some attention to this issue:
> after digging into it, I am one hundred percent certain that it is a
> deadlock.
> 
> Here is the procedure to reproduce it:
> 
> a.       Set hw-offload=true, then restart ovs-vswitchd.
> 
> b.       Create a netdev bridge, then add a NIC (already bound to the
> vfio driver) to the bridge.
> 
> c.       Then try to delete the dpdk port from the bridge.
> 
> d.       ovs-vswitchd gets stuck some of the time.
> 
>  
> 
> My understanding of how the deadlock occurs:
> 
> 1. The ovs-vswitchd main thread takes the mutex (dp->port_mutex) in
> dpif_netdev_port_del when deleting the port from the bridge.
> 
> 2. Meanwhile, the revalidators try to acquire the same mutex
> (dp->port_mutex) in dpif_netdev_get_flow_offload_status while dumping
> flows. They block because they can't get the lock.
> 
> 3. ovs-vswitchd pauses the revalidators through a latch in order to
> purge pmd flows in dp_purge_cb, but the revalidators are already
> sleeping while waiting for the mutex, so they can't respond to the
> pause request.

Hi.  Thanks for the report.
This is definitely a deadlock and your analysis is correct.
The issue stems from the fact that netdev-offload-dpdk is not thread safe, so we have to hold dp->port_mutex during all offloading-related operations.
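
For readers following along, the lock ordering described in the report can be reduced to a small stand-alone C sketch. This is only an illustration under assumptions, not OVS code: `port_mutex`, `pause_requested`, `pause_acked`, `revalidator()` and `delete_port_would_deadlock()` are hypothetical stand-ins for dp->port_mutex, the pause latch, the pause barrier, a revalidator thread and the port deletion path. Unlike the real code, the deleter's wait is bounded by a timeout so that the program terminates and can report the stuck state.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins; none of these are real OVS symbols. */
static pthread_mutex_t port_mutex = PTHREAD_MUTEX_INITIALIZER; /* dp->port_mutex */
static atomic_bool pause_requested;  /* models the pause latch */
static atomic_bool pause_acked;      /* models arriving at the pause barrier */

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Models a revalidator: it needs port_mutex for the flow dump before it
 * gets back to the point where it would notice the pause request. */
static void *revalidator(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&port_mutex);    /* blocks while the deleter holds it */
    pthread_mutex_unlock(&port_mutex);
    if (atomic_load(&pause_requested)) {
        atomic_store(&pause_acked, true);
    }
    return NULL;
}

/* Models the port deletion path.  Returns true if the pause was never
 * acknowledged while port_mutex was held, i.e. the deadlock pattern. */
static bool delete_port_would_deadlock(long timeout_ms)
{
    pthread_t tid;
    bool stuck;
    long waited;
    int rc;

    atomic_store(&pause_requested, false);
    atomic_store(&pause_acked, false);

    pthread_mutex_lock(&port_mutex);        /* step 1: take the port mutex */
    rc = pthread_create(&tid, NULL, revalidator, NULL);
    assert(rc == 0);
    (void) rc;
    atomic_store(&pause_requested, true);   /* step 3: ask for a pause */

    /* The real code would wait here without a timeout. */
    for (waited = 0; waited < timeout_ms && !atomic_load(&pause_acked);
         waited += 10) {
        sleep_ms(10);
    }
    stuck = !atomic_load(&pause_acked);

    pthread_mutex_unlock(&port_mutex);      /* release so the sketch can exit */
    pthread_join(tid, NULL);
    return stuck;
}
```

Built with something like `cc -pthread`, a call such as `delete_port_would_deadlock(100)` from a `main()` returns true: the acknowledgement can only be set after the mutex is released, which the waiting side never does on its own.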

>  
> 
> I have no idea how to fix it, please feel free to leave your comments.

I don't have a solution in mind for this right now.  I will take a closer look.

> 
>  
> 
> Thanks.
> 
>  
> 
> From: Frank Wang(王培辉)
> Sent: June 17, 2020 19:08
> To: 'ovs-dev at openvswitch.org' <ovs-dev at openvswitch.org>; 
> ovs-discuss at openvswitch.org
> Subject: ovs-vswitchd got stuck when delete dpdk port with ovs hw-offload 
> enabled
> 
>  
> 
> Hello
> 
>  
> 
>          I've encountered a problem where ovs-vswitchd intermittently
> gets stuck when I try to delete the dpdk port from the bridge. When I
> turn off hw-offload, it doesn't happen. I'm using the latest ovs
> 2.13.1 version on CentOS 7.6. Please help me out here, thanks in advance.
> 
> Here is the ovs-vswitchd stack trace; it looks like a deadlock:
> 
> Thread 41 (Thread 0x7f2e2fdfe700 (LWP 156099)):
> #0  0x00007f2e975224ed in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007f2e9751dde6 in _L_lock_941 () from /lib64/libpthread.so.0
> #2  0x00007f2e9751dcdf in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x00007f2e98562318 in ovs_mutex_lock_at () from /lib64/libopenvswitch-2.13.so.0
> #4  0x00007f2e984c5469 in dpif_netdev_get_flow_offload_status () from /lib64/libopenvswitch-2.13.so.0
> #5  0x00007f2e984c5504 in get_dpif_flow_status () from /lib64/libopenvswitch-2.13.so.0
> #6  0x00007f2e984cd411 in dp_netdev_flow_to_dpif_flow () from /lib64/libopenvswitch-2.13.so.0
> #7  0x00007f2e984cd708 in dpif_netdev_flow_dump_next () from /lib64/libopenvswitch-2.13.so.0
> #8  0x00007f2e984d86f2 in dpif_flow_dump_next () from /lib64/libopenvswitch-2.13.so.0
> #9  0x00007f2e98b46547 in revalidate.isra.23 () from /lib64/libofproto-2.13.so.0
> #10 0x00007f2e98b46db3 in udpif_revalidator () from /lib64/libofproto-2.13.so.0
> #11 0x00007f2e9856307f in ovsthread_wrapper () from /lib64/libopenvswitch-2.13.so.0
> #12 0x00007f2e9751bdd5 in start_thread () from /lib64/libpthread.so.0
> #13 0x00007f2e96a39ead in clone () from /lib64/libc.so.6
> 
> Thread 40 (Thread 0x7f2e2fbfd700 (LWP 156100)):
> #0  0x00007f2e975224ed in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007f2e9751dde6 in _L_lock_941 () from /lib64/libpthread.so.0
> #2  0x00007f2e9751dcdf in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x00007f2e98562318 in ovs_mutex_lock_at () from /lib64/libopenvswitch-2.13.so.0
> #4  0x00007f2e984c5469 in dpif_netdev_get_flow_offload_status () from /lib64/libopenvswitch-2.13.so.0
> #5  0x00007f2e984c5504 in get_dpif_flow_status () from /lib64/libopenvswitch-2.13.so.0
> #6  0x00007f2e984cd411 in dp_netdev_flow_to_dpif_flow () from /lib64/libopenvswitch-2.13.so.0
> #7  0x00007f2e984cd708 in dpif_netdev_flow_dump_next () from /lib64/libopenvswitch-2.13.so.0
> #8  0x00007f2e984d86f2 in dpif_flow_dump_next () from /lib64/libopenvswitch-2.13.so.0
> #9  0x00007f2e98b46547 in revalidate.isra.23 () from /lib64/libofproto-2.13.so.0
> #10 0x00007f2e98b46db3 in udpif_revalidator () from /lib64/libofproto-2.13.so.0
> #11 0x00007f2e9856307f in ovsthread_wrapper () from /lib64/libopenvswitch-2.13.so.0
> #12 0x00007f2e9751bdd5 in start_thread () from /lib64/libpthread.so.0
> #13 0x00007f2e96a39ead in clone () from /lib64/libc.so.6
> 
>  
> 
>> 
> Thread 1 (Thread 0x7f2eaa999000 (LWP 155698)):
> #0  0x00007f2e96a2f20d in poll () from /lib64/libc.so.6
> #1  0x00007f2e98593064 in time_poll () from /lib64/libopenvswitch-2.13.so.0
> #2  0x00007f2e9857a50c in poll_block () from /lib64/libopenvswitch-2.13.so.0
> #3  0x00007f2e98562f78 in ovs_barrier_block () from /lib64/libopenvswitch-2.13.so.0
> #4  0x00007f2e98b432be in dp_purge_cb () from /lib64/libofproto-2.13.so.0
> #5  0x00007f2e984c8391 in dp_netdev_del_pmd () from /lib64/libopenvswitch-2.13.so.0
> #6  0x00007f2e984c9a67 in reconfigure_datapath () from /lib64/libopenvswitch-2.13.so.0
> #7  0x00007f2e984cacbd in do_del_port () from /lib64/libopenvswitch-2.13.so.0
> #8  0x00007f2e984ccd5f in dpif_netdev_port_del () from /lib64/libopenvswitch-2.13.so.0
> #9  0x00007f2e984d74dc in dpif_port_del () from /lib64/libopenvswitch-2.13.so.0
> #10 0x00007f2e98b31595 in port_del () from /lib64/libofproto-2.13.so.0
> #11 0x00007f2e98b1fe90 in ofproto_port_del () from /lib64/libofproto-2.13.so.0
> #12 0x000056187a856aa6 in bridge_delete_or_reconfigure_ports ()
> #13 0x000056187a858490 in bridge_reconfigure ()
> #14 0x000056187a85be26 in bridge_run ()
> #15 0x000056187a85235d in main ()
> 
>  
> 


