[ovs-discuss] Crash on netns teardown

Florian Larysch fl at n621.de
Fri Dec 9 15:13:35 UTC 2016


Hi,

I'm running OVS 2.5.1 on a Linux 4.4.35 kernel (the latter is heavily
patched in the netfilter area, but I don't think that is related to
this problem). When I reboot the box (which has two network
namespaces), I get an oops:

[   83.650231] Unable to handle kernel paging request for data at address 0x2ea84000
[   83.657705] Faulting instruction address: 0xc022a20c
[   83.662663] Oops: Kernel access of bad area, sig: 11 [#1]
[   83.668050] SMP NR_CPUS=2 P2020 RDB
[   83.671532] Modules linked in: [...]
[   83.925212] CPU: 1 PID: 831 Comm: kworker/u4:2 Not tainted 4.4.35 #0
[   83.931559] Workqueue: netns cleanup_net
[   83.935474] task: ea9b1f40 ti: eea94000 task.ti: eea94000
[   83.940862] NIP: c022a20c LR: c022a230 CTR: c0264154
[   83.945817] REGS: eea95d60 TRAP: 0300   Not tainted  (4.4.35)
[   83.951551] MSR: 00021000 <CE,ME>  CR: 44000008  XER: 20000000
[   83.957387] DEAR: 2ea84000 ESR: 00000000
GPR00: c022a230 eea95e10 ea9b1f40 00000000 00000002 00000000 00000000 00000000
GPR08: c05636ec 00000000 2ea84000 00000004 c038669c 00000000 c004d49c ee4af100
GPR16: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
GPR24: 00000000 00000000 00000000 ea820800 00029000 eeb209c0 00000000 00000000
[   83.989626] NIP [c022a20c] __percpu_counter_sum+0x60/0xbc
[   83.995016] LR [c022a230] __percpu_counter_sum+0x84/0xbc
[   84.000316] Call Trace:
[   84.002756] [eea95e10] [c022a230] __percpu_counter_sum+0x84/0xbc (unreliable)
[   84.009893] [eea95e30] [c0386774] inet_frags_exit_net+0xd8/0xfc
[   84.015809] [eea95e50] [f16ca2f8] nf_ct_net_exit+0x2c/0x40 [nf_defrag_ipv6]
[   84.022772] [eea95e60] [c02ee05c] ops_exit_list+0x40/0x80
[   84.028162] [eea95e80] [c02ef7a4] cleanup_net+0x190/0x250
[   84.033556] [eea95eb0] [c004703c] process_one_work+0x20c/0x330
[   84.039383] [eea95ed0] [c0047688] worker_thread+0x1b4/0x2ec
[   84.044954] [eea95ef0] [c004d56c] kthread+0xd0/0xdc
[   84.049830] [eea95f40] [c000fa5c] ret_from_kernel_thread+0x5c/0x64
[   84.056000] Instruction dump:
[   84.058959] 7fa3eb78 481e75a5 7c7c1b78 83dd0008 83fd000c 3860ffff 48000028 813d0010
[   84.066712] 546a103a 3d00c056 390836ec 7d48502e <7d29502e> 7d2afe70 7fe9f814 7fcaf114
[   84.074643] ---[ end trace 31803f815add721e ]---

(FWIW: I've already applied
https://github.com/openvswitch/ovs/commit/e92669ba to the OVS build I use.)

I've added some debugging output, and it seems that nf_ct_net_exit is
called twice: once in the backported copy inside the openvswitch module
and once in the kernel's own nf_defrag_ipv6 module:

[   83.468964] inet_frags_exit_net called on nf eeb209c0 frags f1715d58 (counters d0d668d4)
[   83.477080] CPU: 1 PID: 831 Comm: kworker/u4:2 Not tainted 4.4.35 #0
[   83.483432] Workqueue: netns cleanup_net
[   83.487346] Call Trace:
[   83.489796] [eea95e10] [c01fded8] __dump_stack+0x24/0x34 (unreliable)
[   83.496233] [eea95e20] [c01fdf5c] dump_stack+0x74/0xa0
[   83.501374] [eea95e30] [c03866e0] inet_frags_exit_net+0x44/0xfc
[   83.507338] [eea95e50] [f170eec8] nf_ct_net_exit+0x1c/0x2c [openvswitch]
[   83.514042] [eea95e60] [c02ee05c] ops_exit_list+0x40/0x80
[   83.519433] [eea95e80] [c02ef7a4] cleanup_net+0x190/0x250
[   83.524826] [eea95eb0] [c004703c] process_one_work+0x20c/0x330
[   83.530651] [eea95ed0] [c0047688] worker_thread+0x1b4/0x2ec
[   83.536224] [eea95ef0] [c004d56c] kthread+0xd0/0xdc
[   83.541102] [eea95f40] [c000fa5c] ret_from_kernel_thread+0x5c/0x64
[   83.547308] inet_frags_exit_net: evict_again
[   83.551782] __percpu_counter_sum: eeb209c0 | d0d668d4
[   83.556870] percpu_counter_destroy: eeb209c0 | d0d668d4
[   83.562132] inet_frags_exit_net called on nf eeb209c0 frags f16ca794 (counters   (null))
[   83.570247] CPU: 1 PID: 831 Comm: kworker/u4:2 Not tainted 4.4.35 #0
[   83.576600] Workqueue: netns cleanup_net
[   83.580515] Call Trace:
[   83.582964] [eea95e10] [c01fded8] __dump_stack+0x24/0x34 (unreliable)
[   83.589400] [eea95e20] [c01fdf5c] dump_stack+0x74/0xa0
[   83.594539] [eea95e30] [c03866e0] inet_frags_exit_net+0x44/0xfc
[   83.600460] [eea95e50] [f16ca2f8] nf_ct_net_exit+0x2c/0x40 [nf_defrag_ipv6]
[   83.607424] [eea95e60] [c02ee05c] ops_exit_list+0x40/0x80
[   83.612815] [eea95e80] [c02ef7a4] cleanup_net+0x190/0x250
[   83.618208] [eea95eb0] [c004703c] process_one_work+0x20c/0x330
[   83.624033] [eea95ed0] [c0047688] worker_thread+0x1b4/0x2ec
[   83.629604] [eea95ef0] [c004d56c] kthread+0xd0/0xdc
[   83.634481] [eea95f40] [c000fa5c] ret_from_kernel_thread+0x5c/0x64
[   83.640696] inet_frags_exit_net: evict_again
[   83.645080] __percpu_counter_sum: eeb209c0 |   (null)
[   83.650231] Unable to handle kernel paging request for data at address 0x2ea84000
[...]

(the "counters" value is nf->mem.counters in inet_frags_exit_net, which
gets set to NULL by percpu_counter_destroy the first time around)
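For reference, the faulting address is consistent with that NULL pointer. By my reading, the trapping instruction 7d29502e decodes as lwzx r9,r9,r10, and the register dump has GPR09 = 0 (presumably the cleared counters pointer) and GPR10 = 0x2ea84000 (presumably the per-CPU offset loaded just before from 0xc05636ec), which matches DEAR. The userspace C sketch below models only that address arithmetic. struct model_counter and the model_* helpers are hypothetical stand-ins for percpu_counter_init/percpu_counter_destroy, not kernel code, and the address is computed as an integer rather than dereferenced.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct percpu_counter:
 * only the per-CPU base pointer matters for this model. */
struct model_counter {
	int32_t *counters;
};

/* Model of percpu_counter_init(): allocate the per-CPU storage. */
static void model_init(struct model_counter *c, size_t ncpus)
{
	c->counters = calloc(ncpus, sizeof(*c->counters));
	assert(c->counters != NULL);
}

/* Model of percpu_counter_destroy(): free the storage and clear the
 * pointer, matching "counters (null)" in the second debug line. */
static void model_destroy(struct model_counter *c)
{
	free(c->counters);
	c->counters = NULL;
}

/* Address the summing loop would load from for one CPU: base pointer
 * plus that CPU's per-CPU offset. Computed as an integer and never
 * dereferenced, so the model itself cannot fault. */
static uintptr_t model_load_addr(const struct model_counter *c,
				 uintptr_t cpu_offset)
{
	return (uintptr_t)c->counters + cpu_offset;
}
```

After model_destroy(), model_load_addr(&c, 0x2ea84000) yields 0x2ea84000 on platforms where NULL is address 0, i.e. the same value the oops reports in DEAR.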

I'm not sure to what extent OVS depends on the backported/custom
implementation of nf_ct_frag6_init and friends.
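The debug output shows both hooks receiving the same per-namespace state (nf eeb209c0) but different struct inet_frags instances (f1715d58 from openvswitch, f16ca794 from nf_defrag_ipv6), so whichever runs second finds the counters already destroyed. Below is a minimal userspace sketch of that pattern, assuming two modules each register an exit hook for the same shared state. Every model_* name is hypothetical, and this is not the kernel's pernet_operations API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-namespace state shared by both exit hooks, standing
 * in for the "nf eeb209c0" seen in the debug output. */
struct model_netns_frags {
	int *counters;      /* NULL once torn down */
	int destroy_calls;  /* how many hooks tore it down */
	int saw_null;       /* hooks that found it already destroyed */
};

typedef void (*model_exit_fn)(struct model_netns_frags *nf);

/* Both hooks behave alike, like the two nf_ct_net_exit copies: they tear
 * down the shared state without checking whether it is still live. */
static void model_frags_exit(struct model_netns_frags *nf)
{
	if (nf->counters == NULL)
		nf->saw_null++;  /* the kernel code would fault at this point */
	free(nf->counters);      /* free(NULL) is a no-op */
	nf->counters = NULL;
	nf->destroy_calls++;
}

/* Model of cleanup_net()/ops_exit_list(): run every registered hook. */
static void model_cleanup(struct model_netns_frags *nf,
			  const model_exit_fn *ops, size_t nops)
{
	assert(nf != NULL);
	for (size_t i = 0; i < nops; i++)
		ops[i](nf);
}
```

Running both hooks leaves destroy_calls at 2 and saw_null at 1: the second hook is the one that, in the kernel, reaches __percpu_counter_sum with a NULL base pointer.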

Does anybody have an idea what would be the right place to fix this?

Florian

