[ovs-dev] kernel oops while doing OVN testing

Russell Bryant rbryant at redhat.com
Fri Apr 17 17:48:13 UTC 2015


I'm seeing a kernel oops while doing some OVN testing.

> [69503.759887] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
> [69503.759905] IP: [<ffffffffa0397915>] ovs_lookup_vport+0x5/0x60 [openvswitch]
> [69503.759915] PGD 11bc28067 PUD 139bc6067 PMD 0 
> [69503.759921] Oops: 0000 [#1] SMP 
> [69503.759926] Modules linked in: xt_nat xt_mark xt_REDIRECT nf_nat_redirect xt_CHECKSUM xt_comment openvswitch libcrc32c ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device iosf_mbi snd_pcm crct10dif_pclmul crc32_pclmul crc32c_intel ppdev ghash_clmulni_intel snd_timer parport_pc serio_raw virtio_console snd virtio_balloon pvpanic parport soundcore i2c_piix4 nfsd auth_rpcgss nfs_acl lockd
> [69503.760020]  grace sunrpc virtio_net virtio_blk qxl drm_kms_helper ttm drm virtio_pci virtio_ring virtio ata_generic pata_acpi
> [69503.760020] CPU: 0 PID: 18288 Comm: ovs-vswitchd Not tainted 3.19.1-201.fc21.x86_64 #1
> [69503.760020] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014
> [69503.760020] task: ffff8800bb734d80 ti: ffff880139cec000 task.ti: ffff880139cec000
> [69503.760020] RIP: 0010:[<ffffffffa0397915>]  [<ffffffffa0397915>] ovs_lookup_vport+0x5/0x60 [openvswitch]
> [69503.760020] RSP: 0018:ffff880139cef960  EFLAGS: 00010246
> [69503.760020] RAX: ffff88011bc34058 RBX: ffff88011bc34058 RCX: 0000000000000000
> [69503.760020] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [69503.760731] RBP: ffff880139cef9d8 R08: 0000000000000008 R09: ffff88011bc3405c
> [69503.760731] R10: 0000000000004770 R11: 0000000000003e7c R12: ffff8800bb211f00
> [69503.760731] R13: ffff880036a67000 R14: 0000000000000000 R15: ffff88008eb7b310
> [69503.760731] FS:  00007fa06644fa40(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
> [69503.760731] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [69503.760731] CR2: 0000000000000048 CR3: 000000012b64b000 CR4: 00000000000406f0
> [69503.760731] Stack:
> [69503.760731]  ffffffffa0398383 0000000200000002 000000000000000a 000000000000000d
> [69503.760731]  00000000000002dc 0000000000000432 0000000000000000 0000000000000000
> [69503.760731]  0000000000000000 0000000000000000 000000009d414cae ffff880036a67000
> [69503.760731] Call Trace:
> [69503.760731]  [<ffffffffa0398383>] ? ovs_vport_cmd_fill_info+0x53/0x1b0 [openvswitch]
> [69503.760731]  [<ffffffffa039859c>] ovs_vport_cmd_dump+0xbc/0x120 [openvswitch]
> [69503.760731]  [<ffffffff8168acfa>] netlink_dump+0x11a/0x2d0
> [69503.760731]  [<ffffffff8168b633>] __netlink_dump_start+0x193/0x1d0
> [69503.760731]  [<ffffffff8168e1d0>] ? genl_family_rcv_msg+0x3e0/0x3e0
> [69503.760731]  [<ffffffff8168e1ad>] genl_family_rcv_msg+0x3bd/0x3e0
> [69503.760731]  [<ffffffffa03984e0>] ? ovs_vport_cmd_fill_info+0x1b0/0x1b0 [openvswitch]
> [69503.760731]  [<ffffffff811fe5c9>] ? __kmalloc_node_track_caller+0x259/0x320
> [69503.760731]  [<ffffffff813aa616>] ? rhashtable_lookup_compare+0x36/0x70
> [69503.760731]  [<ffffffff8168e1d0>] ? genl_family_rcv_msg+0x3e0/0x3e0
> [69503.760731]  [<ffffffff8168e249>] genl_rcv_msg+0x79/0xc0
> [69503.760731]  [<ffffffff8168d6c9>] netlink_rcv_skb+0xb9/0xe0
> [69503.760731]  [<ffffffff8168dddc>] genl_rcv+0x2c/0x40
> [69503.760731]  [<ffffffff8168cddd>] netlink_unicast+0x12d/0x1c0
> [69503.760731]  [<ffffffff8168d197>] netlink_sendmsg+0x327/0x680
> [69503.760731]  [<ffffffff8163dc8c>] do_sock_sendmsg+0x9c/0x110
> [69503.760731]  [<ffffffff81063bca>] ? __do_page_fault+0x21a/0x5b0
> [69503.760731]  [<ffffffff81238175>] ? __fget_light+0x25/0x70
> [69503.760731]  [<ffffffff8163debb>] SYSC_sendto+0x12b/0x1d0
> [69503.760731]  [<ffffffff8163ed45>] ? __sys_recvmsg+0x85/0x90
> [69503.760731]  [<ffffffff8163e74e>] SyS_sendto+0xe/0x10
> [69503.760731]  [<ffffffff81774029>] system_call_fastpath+0x12/0x17
> [69503.760731] Code: 84 00 00 00 00 00 66 66 66 66 90 55 48 c7 c7 e0 50 3a a0 48 89 e5 e8 ab a4 3d e1 5d c3 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> 8b 47 48 89 f2 81 e6 ff 03 00 00 55 48 8d 04 f0 48 89 e5 48 
> [69503.760731] RIP  [<ffffffffa0397915>] ovs_lookup_vport+0x5/0x60 [openvswitch]
> [69503.760731]  RSP <ffff880139cef960>
> [69503.760731] CR2: 0000000000000048
> [69504.248932] ---[ end trace ee300d6bca7ba796 ]---

This is the openvswitch module that shipped with the Fedora kernel
3.19.1-201.fc21.x86_64.

I can easily reproduce this.  It happens when running devstack multiple
times to stand up OpenStack + OVS + OVN + the OpenStack Neutron OVN
integration.  So far, it consistently breaks the second time I run
devstack after a reboot.

The relevant devstack code is here:

http://git.openstack.org/cgit/stackforge/networking-ovn/tree/devstack/plugin.sh

In particular, take a look at init_ovn, install_ovn, start_ovn, and
stop_ovn.

Once the VM is in this bad state, ovs-vswitchd hits the oops and exits
shortly after startup.  Following it with gdb, I get:

> 482	        retval = send(sock->fd, msg->data, msg->size,
> (gdb) bt
> #0  nl_sock_send__ (sock=0x7e2390, msg=msg@entry=0x7c8090, nlmsg_seq=5, wait=wait@entry=true) at lib/netlink-socket.c:482
> #1  0x00000000004f8105 in nl_dump_start (dump=dump@entry=0x7dcf60, protocol=protocol@entry=16, request=request@entry=0x7c8090) at lib/netlink-socket.c:977
> #2  0x00000000004ed535 in dpif_netlink_port_dump_start__ (dump=dump@entry=0x7dcf60, dpif=<optimized out>) at lib/dpif-netlink.c:1081
> #3  0x00000000004ed590 in dpif_netlink_port_dump_start (dpif_=0x7e2a50, statep=0x7fffffffdc80) at lib/dpif-netlink.c:1092
> #4  0x000000000045b10d in dpif_port_dump_start (dump=dump@entry=0x7fffffffdc70, dpif=0x7e2a50) at lib/dpif.c:718
> #5  0x0000000000425884 in open_dpif_backer (type=0x7e1f20 "system", backerp=backerp@entry=0x7ee738) at ofproto/ofproto-dpif.c:956
> #6  0x000000000042a7b4 in construct (ofproto_=0x7ee4b0) at ofproto/ofproto-dpif.c:1241
> #7  0x000000000041cfd4 in ofproto_create (datapath_name=0x7c1460 "br-ex", datapath_type=<optimized out>, ofprotop=ofprotop@entry=0x7eb7a8) at ofproto/ofproto.c:535
> #8  0x000000000040dd9c in bridge_reconfigure (ovs_cfg=ovs_cfg@entry=0x7f4670) at vswitchd/bridge.c:629
> #9  0x000000000040edb3 in bridge_run () at vswitchd/bridge.c:2961
> #10 0x00000000004057cd in main (argc=2, argv=0x7fffffffe548) at vswitchd/ovs-vswitchd.c:116
> (gdb) next
> 
> Program terminated with signal SIGKILL, Killed.
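For reference, the session above can be reproduced with a gdb command
file along these lines (the breakpoint name is taken from frame #0 of
the backtrace; the invocation is illustrative, not from the report):

```
# Usage (illustrative): gdb -x ovs.gdb --args ovs-vswitchd --log-file
break nl_sock_send__   # frame #0 above, lib/netlink-socket.c
run
bt                     # print the backtrace once the breakpoint hits
next                   # stepping over the send() triggers the oops
```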

-- 
Russell Bryant
