[ovs-dev] Null Pointer / Kernel Panic

Sean Swehla sean.swehla at gmail.com
Tue Feb 25 20:56:56 UTC 2014


Hello,

I'm currently hitting a null pointer dereference and kernel panic that
appears to be in Open vSwitch (OVS). The problem is sporadic: one production
machine has hit it four times in the past 24 hours, while on one lab machine
I can't reproduce it at all.

We rebuilt openvswitch with debugging symbols turned on and traced the
null pointer dereference to datapath/linux/flow.c:814. Do you have any
advice on how to trace this back to a root cause (or, ideally, a fix)?
I've scoured Google for related issues but come up short. (I'll happily
accept that my Google-fu is lacking, though.)
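
For anyone who wants to repeat the symbol lookup, here is a minimal Python
sketch of how the gdb expression can be derived from the RIP line of the oops.
It assumes gdb has been started on an openvswitch module object built with
debug info; the helper name gdb_list_expr is made up for illustration and is
not part of any OVS or kernel tooling.

```python
import re

# RIP line copied from the oops in this message.
RIP_LINE = "RIP  [<ffffffffa024ecb2>] ovs_flow_tbl_lookup+0xb2/0x100 [openvswitch]"

# Matches "symbol+offset/size [module]" as printed by the kernel.
PATTERN = re.compile(
    r"(?P<symbol>\w+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"\s+\[(?P<module>\w+)\]"
)


def gdb_list_expr(rip_line):
    """Hypothetical helper: return (module, gdb command) for an oops RIP line."""
    m = PATTERN.search(rip_line)
    if m is None:
        raise ValueError("no symbol+offset/size [module] found")
    # 'list *(func+0xNN)' asks gdb for the source lines around that address.
    command = "list *({}+{})".format(m.group("symbol"), m.group("offset"))
    return m.group("module"), command


module, expr = gdb_list_expr(RIP_LINE)
print(module, expr)
```

Typing the printed expression at the gdb prompt should show the source line
for that offset, provided the module object matches the build that crashed.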

I would greatly appreciate any guidance you could offer. Here's some more
information about my system, for context.

All nodes have the following versions:

root@node:~# uname -a
Linux node 3.2.0-58-generic #88-Ubuntu SMP Tue Dec 3 17:37:58 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
root@node:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 12.04.4 LTS
Release:        12.04
Codename:       precise
root@node:~# dpkg --list | grep openvswitch
ii  openvswitch-common               1.10.2-0ubuntu2~cloud0     Open vSwitch common components
ii  openvswitch-datapath-dkms        1.10.2-0ubuntu2~cloud0     Open vSwitch datapath module source - DKMS version
ii  openvswitch-switch               1.10.2-0ubuntu2~cloud0     Open vSwitch switch implementations
root@node:~#

The stack trace from the console of a panic'd machine:

[259616.202845] Pid: 28568, comm: vhost-28567 Tainted: G        WC O 3.2.0-58-generic #88-Ubuntu    /0PXXHP
[259616.213437] RIP: 0010:[<ffffffffa024ecb2>]  [<ffffffffa024ecb2>] ovs_flow_tbl_lookup+0xb2/0x100 [openvswitch]
[259616.224611] RSP: 0018:ffff88180f243cb8  EFLAGS: 00010282
[259616.230630] RAX: 0000000000000020 RBX: ffff880c72c407c0 RCX: ffffffffffffffe0
[259616.238678] RDX: ffff88010a0ba678 RSI: 0000000000000004 RDI: ffff8807aa5ac000
[259616.246728] RBP: ffff88180f243cf8 R08: 000000000000002c R09: 000000003fc3955c
[259616.254776] R10: 0000000000000001 R11: 0000000000000001 R12: 000000000c2aa5f8
[259616.262825] R13: 0000000000000018 R14: ffff88180f243d48 R15: 000000000000002c
[259616.270877] FS:  0000000000000000(0000) GS:ffff88180f240000(0000) knlGS:0000000000000000
[259616.279990] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[259616.286490] CR2: 0000000000000010 CR3: 0000000924208000 CR4: 00000000000426e0
[259616.294539] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[259616.302588] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[259616.310640] Process vhost-28567 (pid: 28568, threadinfo ffff8805e20e6000, task ffff880bed3a0000)
[259616.320527] Stack:
[259616.322866]  ffff881206128c00 ffff880d00000044 ffff88180f243cf8 ffff880d2f61f100
[259616.331256]  ffffe8ffffa42188 ffff8817f0faf2c0 ffff881206128c00 ffff88180f254454
[259616.339649]  ffff88180f243dd8 ffffffffa024cf15 ffffffff8108f501 ffff88180f243d30
[259616.348041] Call Trace:
[259616.350863]  <IRQ>
[259616.353321]  [<ffffffffa024cf15>] ovs_dp_process_received_packet+0xc5/0x140 [openvswitch]
[259616.362543]  [<ffffffff8108f501>] ? hrtimer_forward+0x51/0xd0
[259616.369057]  [<ffffffffa025117c>] ovs_vport_receive+0x4c/0x50 [openvswitch]
[259616.376926]  [<ffffffffa0252203>] netdev_frame_hook+0xa3/0xf0 [openvswitch]
[259616.384795]  [<ffffffffa0252160>] ? netdev_create+0x110/0x110 [openvswitch]
[259616.392660]  [<ffffffff81546c60>] __netif_receive_skb+0x1d0/0x560
[259616.399558]  [<ffffffff81547411>] process_backlog+0xb1/0x190
[259616.405973]  [<ffffffff81548734>] net_rx_action+0x134/0x290
[259616.412288]  [<ffffffff8106fa08>] __do_softirq+0xa8/0x210
[259616.418413]  [<ffffffff8166c62c>] call_softirq+0x1c/0x30
[259616.424429]  <EOI>
[259616.426883]  [<ffffffff810162f5>] do_softirq+0x65/0xa0
[259616.432714]  [<ffffffff81548c08>] netif_rx_ni+0x28/0x30
[259616.438643]  [<ffffffff8147d89b>] tun_get_user+0x2fb/0x4a0
[259616.444863]  [<ffffffff8147da65>] tun_sendmsg+0x25/0x30
[259616.450790]  [<ffffffffa040f9d6>] handle_tx+0x296/0x520 [vhost_net]
[259616.457880]  [<ffffffffa040fc95>] handle_tx_kick+0x15/0x20 [vhost_net]
[259616.465260]  [<ffffffffa040ce4d>] vhost_worker+0xdd/0x170 [vhost_net]
[259616.472543]  [<ffffffffa040cd70>] ? vhost_set_memory+0x130/0x130 [vhost_net]
[259616.480506]  [<ffffffff8108b63c>] kthread+0x8c/0xa0
[259616.486048]  [<ffffffff8166c534>] kernel_thread_helper+0x4/0x10
[259616.492752]  [<ffffffff8108b5b0>] ? flush_kthread_worker+0xa0/0xa0
[259616.499747]  [<ffffffff8166c530>] ? gs_change+0x13/0x13
[259616.505665] Code: 00 48 63 53 20 48 8d 42 01 48 c1 e0 04 48 01 c1 48 8b 01 48 85 c0 74 51 48 8b 09 48 c1 e2 04 48 83 c2 10 48 29 d1 48 85 c9 74 26 <44> 39 61 30 75 d0 4a 8d 7c 29 38 4c 89 fa 4c 89 f6 48 89 4d c8
[259616.527456] RIP  [<ffffffffa024ecb2>] ovs_flow_tbl_lookup+0xb2/0x100 [openvswitch]
[259616.536016]  RSP <ffff88180f243cb8>
[259616.540000] CR2: 0000000000000010
[259616.544395] ---[ end trace 7cd7ddd24540f1d3 ]---
[259616.549662] Kernel panic - not syncing: Fatal exception in interrupt
[259616.556849] Pid: 28568, comm: vhost-28567 Tainted: G      D WC O 3.2.0-58-generic #88-Ubuntu
[259616.566357] Call Trace:
[259616.569183]  <IRQ>  [<ffffffff81649285>] panic+0x91/0x1a4
[259616.575345]  [<ffffffff81662f5a>] oops_end+0xea/0xf0
[259616.580994]  [<ffffffff8164812f>] no_context+0x150/0x15d
[259616.587028]  [<ffffffff81648307>] __bad_area_nosemaphore+0x1cb/0x1ea
[259616.594227]  [<ffffffff811645eb>] ? kfree+0x3b/0x140
[259616.599873]  [<ffffffff810e234e>] ? rcu_irq_exit+0xe/0x10
[259616.606009]  [<ffffffff81648339>] bad_area_nosemaphore+0x13/0x15
[259616.612820]  [<ffffffff81665bab>] do_page_fault+0x46b/0x540
[259616.619143]  [<ffffffff8153a455>] ? kfree_skb+0x45/0xc0
[259616.625082]  [<ffffffff81571479>] ? netlink_attachskb+0x1d9/0x220
[259616.631989]  [<ffffffff810608e0>] ? try_to_wake_up+0x200/0x200
[259616.638608]  [<ffffffff816624f5>] page_fault+0x25/0x30
[259616.644456]  [<ffffffffa024ecb2>] ? ovs_flow_tbl_lookup+0xb2/0x100 [openvswitch]
[259616.652817]  [<ffffffffa024ec5a>] ? ovs_flow_tbl_lookup+0x5a/0x100 [openvswitch]
[259616.661180]  [<ffffffffa024cf15>] ovs_dp_process_received_packet+0xc5/0x140 [openvswitch]
[259616.670410]  [<ffffffff8108f501>] ? hrtimer_forward+0x51/0xd0
[259616.676936]  [<ffffffffa025117c>] ovs_vport_receive+0x4c/0x50 [openvswitch]
[259616.684813]  [<ffffffffa0252203>] netdev_frame_hook+0xa3/0xf0 [openvswitch]
[259616.692696]  [<ffffffffa0252160>] ? netdev_create+0x110/0x110 [openvswitch]
[259616.700575]  [<ffffffff81546c60>] __netif_receive_skb+0x1d0/0x560
[259616.707482]  [<ffffffff81547411>] process_backlog+0xb1/0x190
[259616.713915]  [<ffffffff81548734>] net_rx_action+0x134/0x290
[259616.720242]  [<ffffffff8106fa08>] __do_softirq+0xa8/0x210
[259616.726384]  [<ffffffff8166c62c>] call_softirq+0x1c/0x30
[259616.732410]  <EOI>  [<ffffffff810162f5>] do_softirq+0x65/0xa0
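
One sanity check that can be done from the dump alone, sketched here in
Python: the byte marked <44> in the Code: line is where RIP points, and the
four bytes 44 39 61 30 decode as a 32-bit CMP of a register against memory at
a base register plus an 8-bit displacement. Adding that displacement to the
RCX value from the register dump should reproduce CR2. The function name
decode_cmp_disp8 is made up for this sketch, and it only handles this one
instruction form, not x86 in general.

```python
MASK64 = (1 << 64) - 1

# Values copied from the register dump in the oops.
RCX = 0xFFFFFFFFFFFFFFE0
CR2 = 0x0000000000000010

# The four bytes starting at the <44> marker in the "Code:" line.
FAULT_BYTES = bytes([0x44, 0x39, 0x61, 0x30])


def decode_cmp_disp8(insn):
    """Decode 'REX 39 /r disp8' (CMP r/m32, r32); returns (reg, rm, disp)."""
    rex, opcode, modrm, disp8 = insn
    assert rex & 0xF0 == 0x40, "expected a REX prefix"
    assert opcode == 0x39, "expected CMP r/m32, r32"
    assert modrm >> 6 == 1, "expected mod=01 (base register + disp8)"
    assert modrm & 7 != 4, "SIB addressing is not handled in this sketch"
    reg = ((modrm >> 3) & 7) | ((rex & 0x4) << 1)  # REX.R extends reg
    rm = (modrm & 7) | ((rex & 0x1) << 3)          # REX.B extends rm
    disp = disp8 - 0x100 if disp8 >= 0x80 else disp8  # sign-extend disp8
    return reg, rm, disp


reg, rm, disp = decode_cmp_disp8(FAULT_BYTES)
# Register 12 is r12 and register 1 is rcx, i.e. cmp %r12d, 0x30(%rcx).
fault_addr = (RCX + disp) & MASK64
print(reg, rm, hex(disp), hex(fault_addr), fault_addr == CR2)
```

The computed address matches CR2, so the faulting access is a read at offset
0x30 from a base pointer of -0x20. That would be consistent with a structure
pointer computed from a NULL list node, though that reading is an inference
from the registers and not something the dump proves.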

-- 
/thor