[ovs-dev] kernel oops while doing OVN testing
Russell Bryant
rbryant at redhat.com
Fri Apr 17 17:48:13 UTC 2015
I'm seeing a kernel oops while doing some OVN testing.
> [69503.759887] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
> [69503.759905] IP: [<ffffffffa0397915>] ovs_lookup_vport+0x5/0x60 [openvswitch]
> [69503.759915] PGD 11bc28067 PUD 139bc6067 PMD 0
> [69503.759921] Oops: 0000 [#1] SMP
> [69503.759926] Modules linked in: xt_nat xt_mark xt_REDIRECT nf_nat_redirect xt_CHECKSUM xt_comment openvswitch libcrc32c ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device iosf_mbi snd_pcm crct10dif_pclmul crc32_pclmul crc32c_intel ppdev ghash_clmulni_intel snd_timer parport_pc serio_raw virtio_console snd virtio_balloon pvpanic parport soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd
> [69503.760020] grace sunrpc virtio_net virtio_blk qxl drm_kms_helper ttm drm virtio_pci virtio_ring virtio ata_generic pata_acpi
> [69503.760020] CPU: 0 PID: 18288 Comm: ovs-vswitchd Not tainted 3.19.1-201.fc21.x86_64 #1
> [69503.760020] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
> [69503.760020] task: ffff8800bb734d80 ti: ffff880139cec000 task.ti: ffff880139cec000
> [69503.760020] RIP: 0010:[<ffffffffa0397915>] [<ffffffffa0397915>] ovs_lookup_vport+0x5/0x60 [openvswitch]
> [69503.760020] RSP: 0018:ffff880139cef960 EFLAGS: 00010246
> [69503.760020] RAX: ffff88011bc34058 RBX: ffff88011bc34058 RCX: 0000000000000000
> [69503.760020] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [69503.760731] RBP: ffff880139cef9d8 R08: 0000000000000008 R09: ffff88011bc3405c
> [69503.760731] R10: 0000000000004770 R11: 0000000000003e7c R12: ffff8800bb211f00
> [69503.760731] R13: ffff880036a67000 R14: 0000000000000000 R15: ffff88008eb7b310
> [69503.760731] FS: 00007fa06644fa40(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
> [69503.760731] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [69503.760731] CR2: 0000000000000048 CR3: 000000012b64b000 CR4: 00000000000406f0
> [69503.760731] Stack:
> [69503.760731] ffffffffa0398383 0000000200000002 000000000000000a 000000000000000d
> [69503.760731] 00000000000002dc 0000000000000432 0000000000000000 0000000000000000
> [69503.760731] 0000000000000000 0000000000000000 000000009d414cae ffff880036a67000
> [69503.760731] Call Trace:
> [69503.760731] [<ffffffffa0398383>] ? ovs_vport_cmd_fill_info+0x53/0x1b0 [openvswitch]
> [69503.760731] [<ffffffffa039859c>] ovs_vport_cmd_dump+0xbc/0x120 [openvswitch]
> [69503.760731] [<ffffffff8168acfa>] netlink_dump+0x11a/0x2d0
> [69503.760731] [<ffffffff8168b633>] __netlink_dump_start+0x193/0x1d0
> [69503.760731] [<ffffffff8168e1d0>] ? genl_family_rcv_msg+0x3e0/0x3e0
> [69503.760731] [<ffffffff8168e1ad>] genl_family_rcv_msg+0x3bd/0x3e0
> [69503.760731] [<ffffffffa03984e0>] ? ovs_vport_cmd_fill_info+0x1b0/0x1b0 [openvswitch]
> [69503.760731] [<ffffffff811fe5c9>] ? __kmalloc_node_track_caller+0x259/0x320
> [69503.760731] [<ffffffff813aa616>] ? rhashtable_lookup_compare+0x36/0x70
> [69503.760731] [<ffffffff8168e1d0>] ? genl_family_rcv_msg+0x3e0/0x3e0
> [69503.760731] [<ffffffff8168e249>] genl_rcv_msg+0x79/0xc0
> [69503.760731] [<ffffffff8168d6c9>] netlink_rcv_skb+0xb9/0xe0
> [69503.760731] [<ffffffff8168dddc>] genl_rcv+0x2c/0x40
> [69503.760731] [<ffffffff8168cddd>] netlink_unicast+0x12d/0x1c0
> [69503.760731] [<ffffffff8168d197>] netlink_sendmsg+0x327/0x680
> [69503.760731] [<ffffffff8163dc8c>] do_sock_sendmsg+0x9c/0x110
> [69503.760731] [<ffffffff81063bca>] ? __do_page_fault+0x21a/0x5b0
> [69503.760731] [<ffffffff81238175>] ? __fget_light+0x25/0x70
> [69503.760731] [<ffffffff8163debb>] SYSC_sendto+0x12b/0x1d0
> [69503.760731] [<ffffffff8163ed45>] ? __sys_recvmsg+0x85/0x90
> [69503.760731] [<ffffffff8163e74e>] SyS_sendto+0xe/0x10
> [69503.760731] [<ffffffff81774029>] system_call_fastpath+0x12/0x17
> [69503.760731] Code: 84 00 00 00 00 00 66 66 66 66 90 55 48 c7 c7 e0 50 3a a0 48 89 e5 e8 ab a4 3d e1 5d c3 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> 8b 47 48 89 f2 81 e6 ff 03 00 00 55 48 8d 04 f0 48 89 e5 48
> [69503.760731] RIP [<ffffffffa0397915>] ovs_lookup_vport+0x5/0x60 [openvswitch]
> [69503.760731] RSP <ffff880139cef960>
> [69503.760731] CR2: 0000000000000048
> [69504.248932] ---[ end trace ee300d6bca7ba796 ]---
This is the openvswitch module shipped with the Fedora kernel
3.19.1-201.fc21.x86_64.
I can reproduce this easily. It happens when running devstack multiple
times to stand up OpenStack + OVS + OVN + the OpenStack Neutron OVN
integration. So far, it consistently breaks the second time I run
devstack after a reboot.
The relevant devstack code is here:
http://git.openstack.org/cgit/stackforge/networking-ovn/tree/devstack/plugin.sh
In particular, take a look at init_ovn, install_ovn, start_ovn, and
stop_ovn.
Once the VM is in this bad state, ovs-vswitchd dies shortly after
startup, at the point the oops occurs. Following it in gdb, I get:
> 482 retval = send(sock->fd, msg->data, msg->size,
> (gdb) bt
> #0 nl_sock_send__ (sock=0x7e2390, msg=msg@entry=0x7c8090, nlmsg_seq=5, wait=wait@entry=true) at lib/netlink-socket.c:482
> #1 0x00000000004f8105 in nl_dump_start (dump=dump@entry=0x7dcf60, protocol=protocol@entry=16, request=request@entry=0x7c8090) at lib/netlink-socket.c:977
> #2 0x00000000004ed535 in dpif_netlink_port_dump_start__ (dump=dump@entry=0x7dcf60, dpif=<optimized out>) at lib/dpif-netlink.c:1081
> #3 0x00000000004ed590 in dpif_netlink_port_dump_start (dpif_=0x7e2a50, statep=0x7fffffffdc80) at lib/dpif-netlink.c:1092
> #4 0x000000000045b10d in dpif_port_dump_start (dump=dump@entry=0x7fffffffdc70, dpif=0x7e2a50) at lib/dpif.c:718
> #5 0x0000000000425884 in open_dpif_backer (type=0x7e1f20 "system", backerp=backerp@entry=0x7ee738) at ofproto/ofproto-dpif.c:956
> #6 0x000000000042a7b4 in construct (ofproto_=0x7ee4b0) at ofproto/ofproto-dpif.c:1241
> #7 0x000000000041cfd4 in ofproto_create (datapath_name=0x7c1460 "br-ex", datapath_type=<optimized out>, ofprotop=ofprotop@entry=0x7eb7a8) at ofproto/ofproto.c:535
> #8 0x000000000040dd9c in bridge_reconfigure (ovs_cfg=ovs_cfg@entry=0x7f4670) at vswitchd/bridge.c:629
> #9 0x000000000040edb3 in bridge_run () at vswitchd/bridge.c:2961
> #10 0x00000000004057cd in main (argc=2, argv=0x7fffffffe548) at vswitchd/ovs-vswitchd.c:116
> (gdb) next
>
> Program terminated with signal SIGKILL, Killed.
--
Russell Bryant