[ovs-discuss] OVS 2.0.0: rmmod hangs while trying to remove openvswitch.ko
lin shaopeng
slin0209 at gmail.com
Thu Mar 20 08:25:58 UTC 2014
Greetings folks,
I am experiencing an issue with openvswitch-2.0.0, and I believe it
contains a deadlock.
Steps to reproduce the issue:
1. Install openvswitch-2.0.0 on a CentOS machine with kernel version 3.0.57.
2. Start vswitchd and ovsdb.
3. Create a bridge and add a vxlan port:
# ovs-vsctl add-br ovsbr1
# ovs-vsctl add-port ovsbr1 eth1
# ovs-vsctl add-port ovsbr1 vxlan2 -- set interface vxlan2 type=vxlan options:local_ip=11.12.13.4 options:remote_ip=flow
4. Stop vswitchd and ovsdb.
5. Run rmmod openvswitch. The command hangs and never returns. After 120
seconds, the following appears in the system log:
Mar 19 01:51:53 NODE-4 kernel: [49343.232348] ovs_workq D ffff88007f052000 0 7558 2 0x00000000
Mar 19 01:51:53 NODE-4 kernel: [49343.232355] ffff880044397d60 0000000000000246 ffff88004417e040 0000000000012000
Mar 19 01:51:53 NODE-4 kernel: [49343.232362] ffff880044397fd8 ffff880000000000 ffff88004417e040 0000000000012000
Mar 19 01:51:53 NODE-4 kernel: [49343.232369] ffff880044397fd8 ffff880044396010 ffff880044397fd8 0000000000012000
Mar 19 01:51:53 NODE-4 kernel: [49343.232377] Call Trace:
Mar 19 01:51:53 NODE-4 kernel: [49343.232386] [<ffffffff81974b4b>] ? xen_hypervisor_callback+0x1b/0x20
Mar 19 01:51:53 NODE-4 kernel: [49343.232390] [<ffffffff8197378a>] ? error_exit+0x2a/0x60
Mar 19 01:51:53 NODE-4 kernel: [49343.232393] [<ffffffff819732e1>] ? retint_restore_args+0x5/0x6
Mar 19 01:51:53 NODE-4 kernel: [49343.232395] [<ffffffff81972da9>] ? _raw_spin_lock+0x9/0x10
Mar 19 01:51:53 NODE-4 kernel: [49343.232401] [<ffffffff810e4dc6>] ? kmem_cache_free+0xc6/0x1a0
Mar 19 01:51:53 NODE-4 kernel: [49343.232404] [<ffffffff8197108a>] schedule+0x3a/0x50
Mar 19 01:51:53 NODE-4 kernel: [49343.232407] [<ffffffff8197197f>] __mutex_lock_slowpath+0xdf/0x160
Mar 19 01:51:53 NODE-4 kernel: [49343.232413] [<ffffffffa00636b0>] ? vxlan_udp_encap_recv+0x160/0x160 [openvswitch]
Mar 19 01:51:53 NODE-4 kernel: [49343.232416] [<ffffffff819717ce>] mutex_lock+0x1e/0x40
Mar 19 01:51:53 NODE-4 kernel: [49343.232420] [<ffffffff81851378>] unregister_pernet_device+0x18/0x50
Mar 19 01:51:53 NODE-4 kernel: [49343.232424] [<ffffffffa0063704>] vxlan_del_work+0x54/0x70 [openvswitch]
Mar 19 01:51:53 NODE-4 kernel: [49343.232429] [<ffffffffa0063e01>] worker_thread+0xe1/0x1d0 [openvswitch]
Mar 19 01:51:53 NODE-4 kernel: [49343.232433] [<ffffffff8106f020>] ? wake_up_bit+0x40/0x40
Mar 19 01:51:53 NODE-4 kernel: [49343.232438] [<ffffffffa0063d20>] ? ovs_workqueues_exit+0x30/0x30 [openvswitch]
Mar 19 01:51:53 NODE-4 kernel: [49343.232440] [<ffffffff8106eb66>] kthread+0x96/0xa0
Mar 19 01:51:53 NODE-4 kernel: [49343.232443] [<ffffffff81974a24>] kernel_thread_helper+0x4/0x10
Mar 19 01:51:53 NODE-4 kernel: [49343.232446] [<ffffffff81973b36>] ? int_ret_from_sys_call+0x7/0x1b
Mar 19 01:51:53 NODE-4 kernel: [49343.232448] [<ffffffff819732e1>] ? retint_restore_args+0x5/0x6
Mar 19 01:51:53 NODE-4 kernel: [49343.232451] [<ffffffff81974a20>] ? gs_change+0x13/0x13
Mar 19 01:51:53 NODE-4 kernel: [49343.232453] INFO: task rmmod:8479 blocked for more than 120 seconds.
Mar 19 01:51:53 NODE-4 kernel: [49343.232455] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 01:51:53 NODE-4 kernel: [49343.232456] rmmod D ffff88007f1d2000 0 8479 1095 0x00000000
Mar 19 01:51:53 NODE-4 kernel: [49343.232459] ffff880049c5bc18 0000000000000282 ffffffff32323532 ffffffff81d6730a
Mar 19 01:51:53 NODE-4 kernel: [49343.232462] ffff880049c5bb78 ffff880000000000 ffff8800442844c0 0000000000012000
Mar 19 01:51:53 NODE-4 kernel: [49343.232465] ffff880049c5bfd8 ffff880049c5a010 ffff880049c5bfd8 0000000000012000
Mar 19 01:51:53 NODE-4 kernel: [49343.232468] Call Trace:
Mar 19 01:51:53 NODE-4 kernel: [49343.232471] [<ffffffff81972e29>] ? _raw_spin_unlock_irqrestore+0x19/0x20
Mar 19 01:51:53 NODE-4 kernel: [49343.232474] [<ffffffff81095332>] ? irq_to_desc+0x12/0x20
Mar 19 01:51:53 NODE-4 kernel: [49343.232477] [<ffffffff810979d9>] ? irq_get_irq_data+0x9/0x10
Mar 19 01:51:53 NODE-4 kernel: [49343.232480] [<ffffffff8141d669>] ? info_for_irq+0x9/0x20
Mar 19 01:51:53 NODE-4 kernel: [49343.232483] [<ffffffff8197108a>] schedule+0x3a/0x50
Mar 19 01:51:53 NODE-4 kernel: [49343.232486] [<ffffffff819714a5>] schedule_timeout+0x1a5/0x200
Mar 19 01:51:53 NODE-4 kernel: [49343.232490] [<ffffffff81004bcf>] ? xen_pte_val+0x2f/0x80
Mar 19 01:51:53 NODE-4 kernel: [49343.232492] [<ffffffff810e5837>] ? kfree+0x117/0x210
Mar 19 01:51:53 NODE-4 kernel: [49343.232495] [<ffffffff810e5837>] ? kfree+0x117/0x210
Mar 19 01:51:53 NODE-4 kernel: [49343.232497] [<ffffffff81970531>] wait_for_common+0xd1/0x180
Mar 19 01:51:53 NODE-4 kernel: [49343.232501] [<ffffffff813b6e20>] ? kobject_del+0x40/0x40
Mar 19 01:51:53 NODE-4 kernel: [49343.232506] [<ffffffff8104e210>] ? try_to_wake_up+0x260/0x260
Mar 19 01:51:53 NODE-4 kernel: [49343.232509] [<ffffffff81970688>] wait_for_completion+0x18/0x20
Mar 19 01:51:53 NODE-4 kernel: [49343.232513] [<ffffffffa0063f71>] __cancel_work_timer+0x81/0x1b0 [openvswitch]
Mar 19 01:51:53 NODE-4 kernel: [49343.232516] [<ffffffff8104e183>] ? try_to_wake_up+0x1d3/0x260
Mar 19 01:51:53 NODE-4 kernel: [49343.232521] [<ffffffffa00640d0>] ? rpl_cancel_delayed_work_sync+0x10/0x10 [openvswitch]
Mar 19 01:51:53 NODE-4 kernel: [49343.232526] [<ffffffffa00640ab>] cancel_work_sync+0xb/0x20 [openvswitch]
Mar 19 01:51:53 NODE-4 kernel: [49343.232530] [<ffffffffa00643a3>] ovs_exit_net+0x73/0x82 [openvswitch]
Mar 19 01:51:53 NODE-4 kernel: [49343.232533] [<ffffffff818511d9>] ops_exit_list+0x39/0x60
Mar 19 01:51:53 NODE-4 kernel: [49343.232535] [<ffffffff8185132d>] unregister_pernet_operations+0x3d/0x70
Mar 19 01:51:53 NODE-4 kernel: [49343.232538] [<ffffffff81851389>] unregister_pernet_device+0x29/0x50
Mar 19 01:51:53 NODE-4 kernel: [49343.232541] [<ffffffffa00583c8>] dp_cleanup+0x58/0x80 [openvswitch]
Mar 19 01:51:53 NODE-4 kernel: [49343.232546] [<ffffffff810885a8>] sys_delete_module+0x178/0x240
Some investigation into the code:

Thread rmmod:
dp_cleanup -->
  unregister_netdevice_notifier(&ovs_dp_device_notifier); -->
    dp_device_event --> (event == NETDEV_UNREGISTER)
      queue_work(&ovs_net->dp_notify_work); --> ovs_workq will handle this
  unregister_pernet_device(&ovs_net_ops); --> acquires net_mutex
    unregister_pernet_operations -->
      ops_exit_list -->
        ovs_exit_net -->
          cancel_work_sync -->
            __cancel_work_timer -->
              workqueue_barrier --> queues a barrier work on ovs_workq
              and waits for it:
                wait_for_completion(&barr.done);
Thread ovs_workq:
worker_thread -->
  run_workqueue -->
    ovs_dp_notify_wq -->
      dp_detach_port_notify -->
        ovs_dp_detach_port -->
          ovs_vport_del -->
            vport->ops->destroy(vport); -->
              vxlan_tnl_destroy -->
                vxlan_sock_release -->
                  queue_work(&vs->del_work); --> queues vxlan_del_work on ovs_workq
worker_thread -->
  run_workqueue -->
    vxlan_del_work -->
      vxlan_cleanup_module -->
        unregister_pernet_device(&vxlan_net_ops); --> tries to acquire net_mutex
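
To make the locking explicit, here is a condensed C paraphrase of those two
paths. This is not the literal OVS 2.0.0 source: the bodies are reconstructed
from the call chains and trace above, identifiers like ovs_net_id follow the
datapath code, and unrelated cleanup and error handling are omitted.

#include <linux/netdevice.h>
#include <linux/workqueue.h>
#include <net/net_namespace.h>
#include <net/netns/generic.h>

/* Path 1: the rmmod thread. */
static void dp_cleanup(void)
{
        /* NETDEV_UNREGISTER events fire here; dp_device_event() queues
         * dp_notify_work onto ovs_workq. */
        unregister_netdevice_notifier(&ovs_dp_device_notifier);

        /* Takes the global net_mutex, then calls ovs_exit_net() for
         * each namespace via ops_exit_list(). */
        unregister_pernet_device(&ovs_net_ops);
}

static void __net_exit ovs_exit_net(struct net *net)
{
        struct ovs_net *ovs_net = net_generic(net, ovs_net_id);

        /* Queues a barrier work at the tail of ovs_workq and sleeps in
         * wait_for_completion() -- with net_mutex still held. */
        cancel_work_sync(&ovs_net->dp_notify_work);
}

/* Path 2: the ovs_workq worker. Detaching the vxlan vport from within
 * dp_notify_work ends in vxlan_sock_release(), which queues this item
 * back onto the same single-threaded ovs_workq. */
static void vxlan_del_work(struct work_struct *work)
{
        /* Blocks on net_mutex, held by rmmod; the barrier queued by
         * cancel_work_sync() sits behind this item and never runs. */
        unregister_pernet_device(&vxlan_net_ops);
}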
I believe this is a deadlock.
After rmmod is issued, unregister_netdevice_notifier() queues dp_notify_work
on ovs_workq, and the thread then proceeds to
unregister_pernet_device(&ovs_net_ops). Inside unregister_pernet_device(),
the rmmod thread takes net_mutex and runs ovs_exit_net(), which appends a
barrier work to the tail of the ovs_workq workqueue and then waits for the
barrier to complete while still holding net_mutex.
Meanwhile, the ovs_workq thread is doing the vxlan cleanup: to
unregister_pernet_device(&vxlan_net_ops), it must acquire net_mutex, which
rmmod already holds. So vxlan_del_work never finishes, the barrier work
queued behind it never runs, and rmmod can never continue and release
net_mutex.
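
To make the ordering concrete, below is a minimal userspace model of the same
pattern using pthreads. All names are illustrative, not OVS code: a single
worker drains a FIFO queue (standing in for ovs_workq) and the main thread
plays rmmod. Built with gcc -pthread, it hangs by design:

/* deadlock_model.c - userspace model of the hang; illustrative only.
 * Build: gcc -pthread -o deadlock_model deadlock_model.c */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t net_mutex = PTHREAD_MUTEX_INITIALIZER;

/* A tiny single-threaded FIFO "workqueue", like ovs_workq. */
static pthread_mutex_t wq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wq_cond = PTHREAD_COND_INITIALIZER;
static void (*wq_item[8])(void);
static int wq_head, wq_tail;

static void queue_work(void (*fn)(void))
{
        pthread_mutex_lock(&wq_lock);
        wq_item[wq_tail++ % 8] = fn;
        pthread_cond_signal(&wq_cond);
        pthread_mutex_unlock(&wq_lock);
}

static void *worker(void *arg)
{
        (void)arg;
        for (;;) {
                void (*fn)(void);

                pthread_mutex_lock(&wq_lock);
                while (wq_head == wq_tail)
                        pthread_cond_wait(&wq_cond, &wq_lock);
                fn = wq_item[wq_head++ % 8];
                pthread_mutex_unlock(&wq_lock);
                fn();              /* items run strictly in FIFO order */
        }
        return NULL;
}

/* Models vxlan_del_work: it needs net_mutex to finish. */
static void vxlan_del_work_model(void)
{
        pthread_mutex_lock(&net_mutex);   /* blocks: "rmmod" holds it */
        pthread_mutex_unlock(&net_mutex);
}

/* Models the barrier work queued by cancel_work_sync(). */
static pthread_mutex_t barr_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  barr_cond = PTHREAD_COND_INITIALIZER;
static int barr_done;

static void barrier_model(void)
{
        pthread_mutex_lock(&barr_lock);
        barr_done = 1;
        pthread_cond_signal(&barr_cond);
        pthread_mutex_unlock(&barr_lock);
}

int main(void)
{
        pthread_t tid;

        pthread_create(&tid, NULL, worker, NULL);

        /* "rmmod": take net_mutex (unregister_pernet_device)... */
        pthread_mutex_lock(&net_mutex);
        queue_work(vxlan_del_work_model); /* already pending in the real trace */
        queue_work(barrier_model);        /* cancel_work_sync()'s barrier */

        /* ...then wait for the barrier. It sits behind the blocked vxlan
         * item on the single-threaded queue, so this never returns. */
        pthread_mutex_lock(&barr_lock);
        while (!barr_done)
                pthread_cond_wait(&barr_cond, &barr_lock);
        pthread_mutex_unlock(&barr_lock);
        pthread_mutex_unlock(&net_mutex);
        puts("unreachable");
        return 0;
}

Because the queue is FIFO and served by a single worker, the barrier can never
run before the blocked vxlan item, no matter how the threads are scheduled.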
Can anyone confirm this? And is there a fix for this issue?
Thanks,
Lin