[ovs-discuss] Null pointer when allocating OVS flow in ovs_packet_cmd_execute

Daryl Wang darylywang at google.com
Fri May 29 21:26:49 UTC 2020


On Fri, May 29, 2020 at 8:04 AM Aaron Conole <aconole at redhat.com> wrote:

> Daryl Wang via discuss <ovs-discuss at openvswitch.org> writes:
>
> > We noticed that an OVS datapath stopped responding to stats requests. On
> > checking dmesg, we found that OVS ran into a null pointer in
> > ovs_flow_alloc while in ovs_packet_cmd_execute. Is this a known bug?
>
> Not as far as I am aware.
>
> > We have not seen the failure again, so we don't have a good sense of what
> > triggers the error. Logs didn't record any noteworthy changes to the
> > datapath around the time of failure.
> > Open vSwitch version is 2.11.2. Unfortunately we did not have kernel
> > debugging enabled at the time of the crash:
> >
> > May 15 09:23:14 hostname kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
> > May 15 09:23:14 hostname kernel: #PF: supervisor read access in kernel mode
> > May 15 09:23:14 hostname kernel: #PF: error_code(0x0000) - not-present page
> > May 15 09:23:14 hostname kernel: PGD 0 P4D 0
> > May 15 09:23:14 hostname kernel: Oops: 0000 [#1] SMP PTI
> > May 15 09:23:14 hostname kernel: CPU: 8 PID: 158558 Comm: handler91 Tainted: G           O      5.2.17-1rodete3-amd64 #1 Debian 5.2.17-1rodete3
> > May 15 09:23:14 hostname kernel: Hardware name: "Hardware information"
> > May 15 09:23:14 hostname kernel: RIP: 0010:kmem_cache_alloc_node+0x7e/0x1f0
>
> Note that this is the kmem_cache infra that gets either the flow or
> flow_stats object during allocation.
>
> How easily can you reproduce this?  It looks like something broke that
> cache object - was something unloading / reloading the ovs module during
> this time?  Just wondering how to reproduce it.
>
>
Reproduction has been a major problem, unfortunately. It has only been hit
once, and nothing seemed to be happening with the datapath at the time of
the failure, so we haven't been able to determine a trigger. The controller
didn't log anything unusual until after the datapath stopped responding, and
the flow additions and deletions in ovs-vswitchd.log look the same as usual.
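
If I'm reading the trace right (ovs_flow_alloc+0x4d landing in
kmem_cache_alloc_node), the faulting allocation is the default per-node
stats allocation inside ovs_flow_alloc. For reference, here is a simplified
paraphrase of that path from the upstream openvswitch module; our 5.2-based
tree may differ slightly, so treat it as a sketch rather than our exact
code:

    /* Paraphrased from net/openvswitch/flow_table.c (upstream).  Both
     * caches are module-global: created at module init, destroyed at
     * module exit. */
    static struct kmem_cache *flow_cache;        /* struct sw_flow */
    static struct kmem_cache *flow_stats_cache;  /* struct sw_flow_stats */

    struct sw_flow *ovs_flow_alloc(void)
    {
            struct sw_flow *flow;
            struct sw_flow_stats *stats;

            flow = kmem_cache_zalloc(flow_cache, GFP_KERNEL);
            if (!flow)
                    return ERR_PTR(-ENOMEM);

            /* Default per-node stats allocation.  This is the
             * kmem_cache_alloc_node() call in our trace; a destroyed or
             * corrupted flow_stats_cache here is one way to end up
             * dereferencing NULL inside the slab allocator. */
            stats = kmem_cache_alloc_node(flow_stats_cache,
                                          GFP_KERNEL | __GFP_ZERO,
                                          node_online(0) ? 0 : NUMA_NO_NODE);
            if (!stats) {
                    kmem_cache_free(flow_cache, flow);
                    return ERR_PTR(-ENOMEM);
            }

            spin_lock_init(&stats->lock);
            RCU_INIT_POINTER(flow->stats[0], stats);
            return flow;
    }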

Logs from the controller don't indicate any tampering with OVS. We'll check
other logs to see whether anything was unloading or reloading the ovs
kernel module around that time.
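
On the unload/reload theory: as far as I understand, both caches are created
and destroyed with the module itself, so if something did unload or reload
openvswitch around that time, a request hitting the datapath at the wrong
moment could be allocating from a destroyed cache. A simplified sketch of
that lifecycle, again paraphrased from the upstream flow_table.c rather than
our exact tree, with the hash-table setup elided:

    /* Called from dp_init() when the openvswitch module loads. */
    int ovs_flow_tbl_init(struct flow_table *table)
    {
            flow_cache = kmem_cache_create("sw_flow",
                            sizeof(struct sw_flow)
                            + (nr_cpu_ids * sizeof(struct sw_flow_stats *)),
                            0, 0, NULL);
            if (!flow_cache)
                    return -ENOMEM;

            flow_stats_cache = kmem_cache_create("sw_flow_stats",
                            sizeof(struct sw_flow_stats),
                            0, SLAB_HWCACHE_ALIGN, NULL);
            if (!flow_stats_cache) {
                    kmem_cache_destroy(flow_cache);
                    flow_cache = NULL;
                    return -ENOMEM;
            }

            return 0;  /* per-table hash setup elided in this sketch */
    }

    /* Called from dp_cleanup() when the module unloads; after this any
     * allocation through the stale cache pointers is undefined. */
    void ovs_flow_tbl_exit(void)
    {
            kmem_cache_destroy(flow_stats_cache);
            kmem_cache_destroy(flow_cache);
    }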


> > May 15 09:23:14 hostname kernel: Code: 75 01 00 00 4d 8b 07 65 49 8b 50 08 65 4c 03 05 70 0e fc 74 4d 8b 30 4d 85 f6 74 1a 41 83 fc ff 0f 84 83 00 00 00 49 8b 40 10 <48> 8b 00 48 c1 e8 3a 41 39 c4 74 73 48 8b 0c 24 44 89 e2 89 ee 4c
> > May 15 09:23:14 hostname kernel: RSP: 0018:ffffbc910e0afa68 EFLAGS: 00010213
> > May 15 09:23:14 hostname kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > May 15 09:23:14 hostname kernel: RDX: 0000000000043ce7 RSI: 0000000000000dc0 RDI: ffff9a7f80126280
> > May 15 09:23:14 hostname kernel: RBP: 0000000000000dc0 R08: ffffdc90fc317330 R09: ffff9a7b06225b60
> > May 15 09:23:14 hostname kernel: R10: 0000000000000003 R11: ffffffffc0c5e510 R12: 0000000000000000
> > May 15 09:23:14 hostname kernel: R13: ffff9a7f80126280 R14: ffff9a7f64dbada0 R15: ffff9a7f80126280
> > May 15 09:23:14 hostname kernel: FS:  00007fdb877fe700(0000) GS:ffff9a803f100000(0000) knlGS:0000000000000000
> > May 15 09:23:14 hostname kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > May 15 09:23:14 hostname kernel: CR2: 0000000000000000 CR3: 0000000ed7d46004 CR4: 00000000003606e0
> > May 15 09:23:14 hostname kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > May 15 09:23:14 hostname kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > May 15 09:23:14 hostname kernel: Call Trace:
> > May 15 09:23:14 hostname kernel: ? ovs_flow_alloc+0x4d/0x90 [openvswitch]
> > May 15 09:23:14 hostname kernel: ovs_flow_alloc+0x4d/0x90 [openvswitch]
> > May 15 09:23:14 hostname kernel: ovs_packet_cmd_execute+0xd0/0x2a0 [openvswitch]
> > May 15 09:23:14 hostname kernel: ? _cond_resched+0x15/0x30
> > May 15 09:23:14 hostname kernel: genl_family_rcv_msg+0x1d2/0x410
> > May 15 09:23:14 hostname kernel: ? recalibrate_cpu_khz+0x10/0x10
> > May 15 09:23:14 hostname kernel: ? ktime_get_raw_ts64+0x32/0xc0
> > May 15 09:23:14 hostname kernel: genl_rcv_msg+0x47/0x90
> > May 15 09:23:14 hostname kernel: ? __kmalloc_node_track_caller+0x1cb/0x290
> > May 15 09:23:14 hostname kernel: ? genl_family_rcv_msg+0x410/0x410
> > May 15 09:23:14 hostname kernel: netlink_rcv_skb+0x49/0x110
> > May 15 09:23:14 hostname kernel: genl_rcv+0x24/0x40
> > May 15 09:23:15 hostname kernel: netlink_unicast+0x17e/0x200
> > May 15 09:23:15 hostname kernel: netlink_sendmsg+0x204/0x3d0
> > May 15 09:23:15 hostname kernel: sock_sendmsg+0x4c/0x50
> > May 15 09:23:15 hostname kernel: ___sys_sendmsg+0x29f/0x300
> > May 15 09:23:15 hostname kernel: ? ep_send_events_proc+0xf7/0x250
> > May 15 09:23:15 hostname kernel: ? ep_read_events_proc+0xe0/0xe0
> > May 15 09:23:15 hostname kernel: ? ep_scan_ready_list.constprop.21+0x1fe/0x230
> > May 15 09:23:15 hostname kernel: __sys_sendmsg+0x57/0xa0
> > May 15 09:23:15 hostname kernel: do_syscall_64+0x53/0x130
> > May 15 09:23:15 hostname kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > May 15 09:23:15 hostname kernel: RIP: 0033:0x7fdd0141f22d
> > May 15 09:23:15 hostname kernel: Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 0a ed ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2f 44 89 c7 48 89 44 24 08 e8 3e ed ff ff 48
> > May 15 09:23:15 hostname kernel: RSP: 002b:00007fdb8779f100 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
> > May 15 09:23:15 hostname kernel: RAX: ffffffffffffffda RBX: 00007fdb8779ff80 RCX: 00007fdd0141f22d
> > May 15 09:23:15 hostname kernel: RDX: 0000000000000000 RSI: 00007fdb8779f190 RDI: 0000000000000015
> >
> >
> >
> > _______________________________________________
> > discuss mailing list
> > discuss at openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
>