[ovs-discuss] Crash on netns teardown
Florian Larysch
fl at n621.de
Fri Dec 9 15:13:35 UTC 2016
Hi,
I'm running OVS 2.5.1 on a Linux 4.4.35 kernel (the kernel carries
significant netfilter patches, but I don't think they are related to
the present problem). When I reboot the box (which has two network
namespaces), I get an oops:
[ 83.650231] Unable to handle kernel paging request for data at address 0x2ea84000
[ 83.657705] Faulting instruction address: 0xc022a20c
[ 83.662663] Oops: Kernel access of bad area, sig: 11 [#1]
[ 83.668050] SMP NR_CPUS=2 P2020 RDB
[ 83.671532] Modules linked in: [...]
[ 83.925212] CPU: 1 PID: 831 Comm: kworker/u4:2 Not tainted 4.4.35 #0
[ 83.931559] Workqueue: netns cleanup_net
[ 83.935474] task: ea9b1f40 ti: eea94000 task.ti: eea94000
[ 83.940862] NIP: c022a20c LR: c022a230 CTR: c0264154
[ 83.945817] REGS: eea95d60 TRAP: 0300 Not tainted (4.4.35)
[ 83.951551] MSR: 00021000 <CE,ME> CR: 44000008 XER: 20000000
[ 83.957387] DEAR: 2ea84000 ESR: 00000000
GPR00: c022a230 eea95e10 ea9b1f40 00000000 00000002 00000000 00000000 00000000
GPR08: c05636ec 00000000 2ea84000 00000004 c038669c 00000000 c004d49c ee4af100
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
GPR24: 00000000 00000000 00000000 ea820800 00029000 eeb209c0 00000000 00000000
[ 83.989626] NIP [c022a20c] __percpu_counter_sum+0x60/0xbc
[ 83.995016] LR [c022a230] __percpu_counter_sum+0x84/0xbc
[ 84.000316] Call Trace:
[ 84.002756] [eea95e10] [c022a230] __percpu_counter_sum+0x84/0xbc (unreliable)
[ 84.009893] [eea95e30] [c0386774] inet_frags_exit_net+0xd8/0xfc
[ 84.015809] [eea95e50] [f16ca2f8] nf_ct_net_exit+0x2c/0x40 [nf_defrag_ipv6]
[ 84.022772] [eea95e60] [c02ee05c] ops_exit_list+0x40/0x80
[ 84.028162] [eea95e80] [c02ef7a4] cleanup_net+0x190/0x250
[ 84.033556] [eea95eb0] [c004703c] process_one_work+0x20c/0x330
[ 84.039383] [eea95ed0] [c0047688] worker_thread+0x1b4/0x2ec
[ 84.044954] [eea95ef0] [c004d56c] kthread+0xd0/0xdc
[ 84.049830] [eea95f40] [c000fa5c] ret_from_kernel_thread+0x5c/0x64
[ 84.056000] Instruction dump:
[ 84.058959] 7fa3eb78 481e75a5 7c7c1b78 83dd0008 83fd000c 3860ffff 48000028 813d0010
[ 84.066712] 546a103a 3d00c056 390836ec 7d48502e <7d29502e> 7d2afe70 7fe9f814 7fcaf114
[ 84.074643] ---[ end trace 31803f815add721e ]---
(FWIW: the OVS build I use already has
https://github.com/openvswitch/ovs/commit/e92669ba applied.)
I've added some debugging, and it seems that nf_ct_net_exit is called
twice on the same netns_frags: once from the backported copy in the
openvswitch module and once from the kernel's own nf_defrag_ipv6:
[ 83.468964] inet_frags_exit_net called on nf eeb209c0 frags f1715d58 (counters d0d668d4)
[ 83.477080] CPU: 1 PID: 831 Comm: kworker/u4:2 Not tainted 4.4.35 #0
[ 83.483432] Workqueue: netns cleanup_net
[ 83.487346] Call Trace:
[ 83.489796] [eea95e10] [c01fded8] __dump_stack+0x24/0x34 (unreliable)
[ 83.496233] [eea95e20] [c01fdf5c] dump_stack+0x74/0xa0
[ 83.501374] [eea95e30] [c03866e0] inet_frags_exit_net+0x44/0xfc
[ 83.507338] [eea95e50] [f170eec8] nf_ct_net_exit+0x1c/0x2c [openvswitch]
[ 83.514042] [eea95e60] [c02ee05c] ops_exit_list+0x40/0x80
[ 83.519433] [eea95e80] [c02ef7a4] cleanup_net+0x190/0x250
[ 83.524826] [eea95eb0] [c004703c] process_one_work+0x20c/0x330
[ 83.530651] [eea95ed0] [c0047688] worker_thread+0x1b4/0x2ec
[ 83.536224] [eea95ef0] [c004d56c] kthread+0xd0/0xdc
[ 83.541102] [eea95f40] [c000fa5c] ret_from_kernel_thread+0x5c/0x64
[ 83.547308] inet_frags_exit_net: evict_again
[ 83.551782] __percpu_counter_sum: eeb209c0 | d0d668d4
[ 83.556870] percpu_counter_destroy: eeb209c0 | d0d668d4
[ 83.562132] inet_frags_exit_net called on nf eeb209c0 frags f16ca794 (counters (null))
[ 83.570247] CPU: 1 PID: 831 Comm: kworker/u4:2 Not tainted 4.4.35 #0
[ 83.576600] Workqueue: netns cleanup_net
[ 83.580515] Call Trace:
[ 83.582964] [eea95e10] [c01fded8] __dump_stack+0x24/0x34 (unreliable)
[ 83.589400] [eea95e20] [c01fdf5c] dump_stack+0x74/0xa0
[ 83.594539] [eea95e30] [c03866e0] inet_frags_exit_net+0x44/0xfc
[ 83.600460] [eea95e50] [f16ca2f8] nf_ct_net_exit+0x2c/0x40 [nf_defrag_ipv6]
[ 83.607424] [eea95e60] [c02ee05c] ops_exit_list+0x40/0x80
[ 83.612815] [eea95e80] [c02ef7a4] cleanup_net+0x190/0x250
[ 83.618208] [eea95eb0] [c004703c] process_one_work+0x20c/0x330
[ 83.624033] [eea95ed0] [c0047688] worker_thread+0x1b4/0x2ec
[ 83.629604] [eea95ef0] [c004d56c] kthread+0xd0/0xdc
[ 83.634481] [eea95f40] [c000fa5c] ret_from_kernel_thread+0x5c/0x64
[ 83.640696] inet_frags_exit_net: evict_again
[ 83.645080] __percpu_counter_sum: eeb209c0 | (null)
[ 83.650231] Unable to handle kernel paging request for data at address 0x2ea84000
[...]
(The "counters" value is nf->mem.counters in inet_frags_exit_net.
percpu_counter_destroy sets it to NULL the first time around, so the
second call ends up in __percpu_counter_sum with a NULL per-CPU pointer.)
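To make the sequence above easier to follow, here is a minimal user-space sketch of the double teardown. It is not kernel code: the model_* names and the plain array standing in for per-CPU storage are made up for illustration, and the NULL guard in model_frags_exit_net is an assumption about one possible mitigation, not a check that exists in the path that oopsed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 2 /* mirrors NR_CPUS=2 from the oops; otherwise arbitrary */

/* Hypothetical stand-in for the counter behind nf->mem. */
struct model_counter {
    long count;     /* global part of the counter */
    long *counters; /* per-CPU deltas; NULL once destroyed */
};

static int model_counter_init(struct model_counter *c)
{
    c->count = 0;
    c->counters = calloc(MODEL_NR_CPUS, sizeof(*c->counters));
    return c->counters ? 0 : -1;
}

/* Frees the per-CPU storage and leaves a NULL pointer behind,
 * which is what the "(null)" in the second debug line shows. */
static void model_counter_destroy(struct model_counter *c)
{
    if (!c->counters)
        return;
    free(c->counters);
    c->counters = NULL;
}

/* Models the exit path: sum the counter, then destroy it.
 * Returns 0 on a normal teardown and -1 if the counter was already
 * destroyed. Without the guard, the loop below would read through
 * a NULL pointer on the second call. */
static int model_frags_exit_net(struct model_counter *mem)
{
    long sum;
    int cpu;

    if (!mem->counters)
        return -1; /* assumed guard, for illustration only */

    sum = mem->count;
    for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        sum += mem->counters[cpu];
    (void)sum; /* the real code loops until this reaches zero */

    model_counter_destroy(mem);
    return 0;
}
```

Calling model_frags_exit_net twice on the same counter corresponds to the two nf_ct_net_exit calls in the log: the first call succeeds and leaves counters at NULL, and the second one hits the guard instead of faulting.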
I'm not sure to what extent OVS depends on the backported/custom
implementation of nf_ct_frag6_init and friends.
Does anybody have an idea what would be the right place to fix this?
Florian