[ovs-discuss] memory corruption in the kernel module

Pravin Shelar pshelar at nicira.com
Fri Sep 25 23:26:57 UTC 2015


On Fri, Sep 25, 2015 at 5:00 AM, Nikolay Borisov
<n.borisov at siteground.com> wrote:
> Hello,
>
> I'm using openvswitch 2.3.1 on kernel 3.12.28 and recently got the following warnings from the kernel:
>
> [8003509.804409] ------------[ cut here ]------------
> [8003509.804653] WARNING: CPU: 28 PID: 12584 at mm/slub.c:3318 ksize+0xbd/0xc0()
> [8003509.804880] Modules linked in: xt_REDIRECT tcp_diag inet_diag act_police cls_basic sch_ingress xt_iprange xt_multiport xt_pkttype xt_state veth openvswitch gre vxlan ip_tunnel xt_owner xt_conntrack iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT nf_conntrack iptable_raw ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio dm_mirror dm_region_hash dm_log i2c_i801 lpc_ich mfd_core ioapic ioatdma igb dca ipmi_devintf ipmi_si ipmi_msghandler megaraid_sas
> [8003509.806801] CPU: 28 PID: 12584 Comm: handler42 Not tainted 3.12.28-clouder7 #1
> [8003509.807026] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.2 01/16/2015
> [8003509.807465]  0000000000000cf6 ffff883fcfc75858 ffffffff815717b9 0000000000000cf6
> [8003509.807699]  0000000000000000 ffff883fcfc75898 ffffffff810786c2 ffff883fcfc75960
> [8003509.807932]  ffffea005264d5c0 ffff883fcfc759b0 0000000000000000 0000000000000008
> [8003509.808163] Call Trace:
> [8003509.808385]  [<ffffffff815717b9>] dump_stack+0x49/0x60
> [8003509.808611]  [<ffffffff810786c2>] warn_slowpath_common+0x82/0xb0
> [8003509.808836]  [<ffffffff81078705>] warn_slowpath_null+0x15/0x20
> [8003509.809062]  [<ffffffff8113a0dd>] ksize+0xbd/0xc0
> [8003509.809290]  [<ffffffffa0290390>] reserve_sfa_size+0x30/0xe0 [openvswitch]
> [8003509.809518]  [<ffffffffa02907a3>] validate_and_copy_actions+0x1e3/0x530 [openvswitch]
> [8003509.809748]  [<ffffffff8113c1f1>] ? __kmalloc+0x31/0x190
> [8003509.809973]  [<ffffffffa0290cc3>] ovs_packet_cmd_execute+0x1a3/0x230 [openvswitch]
> [8003509.810202]  [<ffffffff814e8d61>] genl_family_rcv_msg+0x221/0x390
> [8003509.810432]  [<ffffffff814e8ed0>] ? genl_family_rcv_msg+0x390/0x390
> [8003509.810660]  [<ffffffff814e8f2b>] genl_rcv_msg+0x5b/0xa0
> [8003509.810885]  [<ffffffff814e72b9>] netlink_rcv_skb+0x99/0xc0
> [8003509.811110]  [<ffffffff814e87d7>] genl_rcv+0x27/0x40
> [8003509.811335]  [<ffffffff814e634f>] netlink_unicast+0x10f/0x190
> [8003509.811563]  [<ffffffff814e7cf3>] netlink_sendmsg+0x2c3/0x750
> [8003509.811788]  [<ffffffff811666e0>] ? __pollwait+0xf0/0xf0
> [8003509.812014]  [<ffffffff814a620b>] sock_sendmsg+0x8b/0xb0
> [8003509.812239]  [<ffffffff811666e0>] ? __pollwait+0xf0/0xf0
> [8003509.812465]  [<ffffffff8157583b>] ? _raw_spin_lock_bh+0x1b/0x40
> [8003509.812689]  [<ffffffff81575700>] ? _raw_spin_unlock_bh+0x10/0x20
> [8003509.812916]  [<ffffffff814b3091>] ? verify_iovec+0x61/0xe0
> [8003509.813143]  [<ffffffff814a70d1>] ___sys_sendmsg+0x411/0x430
> [8003509.813373]  [<ffffffff81193ca9>] ? ep_scan_ready_list+0x189/0x1b0
> [8003509.813600]  [<ffffffff81193e2f>] ? ep_poll+0x13f/0x370
> [8003509.813825]  [<ffffffff81089c4b>] ? k_getrusage+0x13b/0x380
> [8003509.814047]  [<ffffffff814a72f4>] __sys_sendmsg+0x44/0x80
> [8003509.814268]  [<ffffffff814a7344>] SyS_sendmsg+0x14/0x20
> [8003509.819684]  [<ffffffff815763a2>] system_call_fastpath+0x16/0x1b
> [8003509.819910] ---[ end trace af1b9000d8cc8f35 ]---
>
> And almost instantly after that:
>
> [8003509.820257] ------------[ cut here ]------------
> [8003509.820480] kernel BUG at mm/slub.c:3338!
> [8003509.820699] invalid opcode: 0000 [#1] SMP
> [8003509.820922] Modules linked in: xt_REDIRECT tcp_diag inet_diag act_police cls_basic sch_ingress xt_iprange xt_multiport xt_pkttype xt_state veth openvswitch gre vxlan ip_tunnel xt_owner xt_conntrack iptable_mangle xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_CT nf_conntrack iptable_raw ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa ib_mad ib_core ext2 dm_thin_pool dm_bio_prison dm_persistent_data dm_bufio dm_mirror dm_region_hash dm_log i2c_i801 lpc_ich mfd_core ioapic ioatdma igb dca ipmi_devintf ipmi_si ipmi_msghandler megaraid_sas
> [8003509.822840] CPU: 28 PID: 12584 Comm: handler42 Tainted: G        W    3.12.28-clouder7 #1
> [8003509.823068] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.2 01/16/2015
> [8003509.823514] task: ffff883fce76dac0 ti: ffff883fcfc74000 task.ti: ffff883fcfc74000
> [8003509.823739] RIP: 0010:[<ffffffff8113a9b2>]  [<ffffffff8113a9b2>] kfree+0xe2/0xf0
> [8003509.823970] RSP: 0018:ffff883fcfc75938  EFLAGS: 00010246
> [8003509.824194] RAX: 02fc000000000824 RBX: ffff880261eb4cf0 RCX: ffff883fce76dac0
> [8003509.824423] RDX: 0000000000402140 RSI: 0000000000000000 RDI: ffff8814993577e0
> [8003509.824647] RBP: ffff883fcfc75948 R08: ffff881fffc91640 R09: ffffea005264d5c0
> [8003509.824874] R10: ffffffff814b04ca R11: 0000000000000000 R12: 0000000000000000
> [8003509.825098] R13: ffff880261eb4cf0 R14: ffff880261eb4d28 R15: ffff8818a080ec00
> [8003509.825327] FS:  00007fa9c9ffb700(0000) GS:ffff881fffc80000(0000) knlGS:0000000000000000
> [8003509.825556] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [8003509.825780] CR2: ffffffffff600400 CR3: 0000003fce560000 CR4: 00000000001407e0
> [8003509.826007] Stack:
> [8003509.826225]  ffff883fcfc75978 ffff880261eb4cf0 ffff883fcfc75968 ffffffffa0296c98
> [8003509.826459]  ffff883fcfc75978 ffff880261eb4cf0 ffff883fcfc75988 ffffffffa0296ce9
> [8003509.826693]  ffff8818a080ff00 ffff883fce2f3300 ffff883fcfc759e8 ffffffffa0290d2c
> [8003509.826926] Call Trace:
> [8003509.827150]  [<ffffffffa0296c98>] __flow_free+0x18/0x30 [openvswitch]
> [8003509.827377]  [<ffffffffa0296ce9>] ovs_flow_free+0x39/0x70 [openvswitch]
> [8003509.827609]  [<ffffffffa0290d2c>] ovs_packet_cmd_execute+0x20c/0x230 [openvswitch]
> [8003509.827842]  [<ffffffff814e8d61>] genl_family_rcv_msg+0x221/0x390
> [8003509.828067]  [<ffffffff814e8ed0>] ? genl_family_rcv_msg+0x390/0x390
> [8003509.828288]  [<ffffffff814e8f2b>] genl_rcv_msg+0x5b/0xa0
> [8003509.828508]  [<ffffffff814e72b9>] netlink_rcv_skb+0x99/0xc0
> [8003509.828729]  [<ffffffff814e87d7>] genl_rcv+0x27/0x40
> [8003509.828949]  [<ffffffff814e634f>] netlink_unicast+0x10f/0x190
> [8003509.829170]  [<ffffffff814e7cf3>] netlink_sendmsg+0x2c3/0x750
> [8003509.829393]  [<ffffffff811666e0>] ? __pollwait+0xf0/0xf0
> [8003509.829615]  [<ffffffff814a620b>] sock_sendmsg+0x8b/0xb0
> [8003509.829837]  [<ffffffff811666e0>] ? __pollwait+0xf0/0xf0
> [8003509.830063]  [<ffffffff8157583b>] ? _raw_spin_lock_bh+0x1b/0x40
> [8003509.830289]  [<ffffffff81575700>] ? _raw_spin_unlock_bh+0x10/0x20
> [8003509.830517]  [<ffffffff814b3091>] ? verify_iovec+0x61/0xe0
> [8003509.830742]  [<ffffffff814a70d1>] ___sys_sendmsg+0x411/0x430
> [8003509.830971]  [<ffffffff81193ca9>] ? ep_scan_ready_list+0x189/0x1b0
> [8003509.831199]  [<ffffffff81193e2f>] ? ep_poll+0x13f/0x370
> [8003509.831426]  [<ffffffff81089c4b>] ? k_getrusage+0x13b/0x380
> [8003509.831649]  [<ffffffff814a72f4>] __sys_sendmsg+0x44/0x80
> [8003509.831876]  [<ffffffff814a7344>] SyS_sendmsg+0x14/0x20
> [8003509.832102]  [<ffffffff815763a2>] system_call_fastpath+0x16/0x1b
> [8003509.832327] Code: ce 4c 89 d7 e8 f0 fc ff ff eb d6 66 41 f7 01 00 c0 74 18 49 8b 01 31 f6 f6 c4 40 74 04 41 8b 71 68 4c 89 cf e8 f0 3e fc ff eb b6 <0f> 0b eb fe 66 2e 0f 1f 84 00 00 00 00 00 55 8b 05 79 33 ab 00
> [8003509.833151] RIP  [<ffffffff8113a9b2>] kfree+0xe2/0xf0
> [8003509.833380]  RSP <ffff883fcfc75938>
> [8003509.833972] ---[ end trace af1b9000d8cc8f36 ]---
>
> It goes without saying that when this occurs the machine loses connectivity on all interfaces which are hooked up to an openvswitch bridge, and ultimately the machine crashes. I did some review of the surrounding code, particularly the call chain
>
> ovs_packet_cmd_execute -> validate_and_copy_actions -> copy_action -> reserve_sfa_size -> ksize
>
> and it is pretty clear that the sfa pointer is indeed allocated from the heap (it is the acts in ovs_packet_cmd_execute, which is allocated in ovs_flow_actions_alloc), yet the sanity check in the kernel fails to recognize it as slab-allocated memory. The same goes for the second bug splat: the flow is allocated with ovs_flow_alloc(), and by the time it has to be freed it is corrupted. I can see a lot of calculations being done on the netlink headers and packets, and frankly this makes me a bit uneasy, as it seems error-prone.
>
> Has any of you experienced similar corruption?
>

I have not seen such a bug report before. Is this reproducible?
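
For readers following the call chain in the report, here is a rough userspace sketch of the grow-on-demand pattern that a function like reserve_sfa_size appears to implement: check whether the request fits in the space the allocator reports for the buffer, and if not, allocate a larger buffer, copy, and free the old one. It is not the openvswitch code. The names action_buf, action_buf_alloc and action_buf_reserve are hypothetical, and the capacity is tracked in an explicit field, where the kernel module would instead ask ksize() about the pointer. That difference is the point: if the pointer handed to ksize() or kfree() no longer refers to a live slab object, the allocator's own sanity checks fire, as in the two traces above.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for a growable action buffer. */
struct action_buf {
    size_t capacity;      /* bytes available in data[]; assumption: tracked
                             explicitly here instead of queried via ksize() */
    size_t used;          /* bytes of data[] already handed out */
    unsigned char data[]; /* payload follows the header */
};

/* Allocate an empty buffer with room for `capacity` payload bytes. */
static struct action_buf *action_buf_alloc(size_t capacity)
{
    struct action_buf *buf = malloc(sizeof(*buf) + capacity);

    if (buf) {
        buf->capacity = capacity;
        buf->used = 0;
    }
    return buf;
}

/*
 * Reserve `len` bytes at the tail of *bufp, growing the buffer if needed.
 * On growth the old buffer is freed and *bufp is updated, so any pointer
 * into the old buffer becomes invalid. Returns the reserved region, or
 * NULL if the larger allocation fails (the old buffer is then untouched).
 */
static void *action_buf_reserve(struct action_buf **bufp, size_t len)
{
    struct action_buf *buf = *bufp;

    if (len > buf->capacity - buf->used) {
        /* Double the capacity until the request fits. */
        size_t new_cap = buf->capacity ? buf->capacity * 2 : 1;
        struct action_buf *bigger;

        while (new_cap - buf->used < len)
            new_cap *= 2;

        bigger = action_buf_alloc(new_cap);
        if (!bigger)
            return NULL;

        memcpy(bigger->data, buf->data, buf->used);
        bigger->used = buf->used;
        free(buf);
        *bufp = buf = bigger;
    }

    void *slot = buf->data + buf->used;
    buf->used += len;
    return slot;
}
```

A caller has to go through the updated *bufp after every reserve. Keeping a stale pointer to the old buffer and later sizing or freeing it is one way, though by no means the only one, to end up with the kind of allocator complaint shown in the report.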


