[ovs-discuss] udpif_revalidator: Failed ofpbuf_resize__ for stack-allocated transaction buffer during dpif_flow_get

Trent Lloyd trent.lloyd at canonical.com
Thu Feb 18 07:57:05 UTC 2021


Hi All,

I recently ran into a crash caused by an assertion in a live environment
(an OpenStack neutron-openvswitch environment). Although this was on an old
version, 2.9.2-0ubuntu0.18.04.3 from Ubuntu 18.04 (Bionic), my analysis
suggests that the issue could still exist in master. I was hoping for some
assistance in understanding how to reproduce it, or in understanding the
specific situation/netlink packet that led to the crash.

The general issue appears to be that the udpif_revalidator thread tried to
expand a stack-allocated ofpbuf to fit a netlink reply of 3204 bytes, but
the buffer is only 2048 bytes. This intentionally raises an assertion, as
we can't expand memory on the stack. I've included the full backtrace text
at the end of the e-mail. The relevant source tree can also be found here:
git clone -b applied/2.9.2-0ubuntu0.18.04.3 https://git.launchpad.net/ubuntu/+source/openvswitch
https://git.launchpad.net/ubuntu/+source/openvswitch/tree/?h=applied/2.9.2-0ubuntu0.18.04.3

The crash in ofpbuf_resize__ appears to be due to OVS_NOT_REACHED() being
called because b->source == OFPBUF_STACK (the line number points at the
default: case, but this appears to be an optimiser quirk; b->source is
OFPBUF_STACK). We can't realloc() the buffer memory if it's allocated on
the stack.
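
To illustrate the failure mode, here is a minimal sketch of a buffer that
remembers where its memory came from. It is a hypothetical, simplified model
and not the real struct ofpbuf: the names buf, buf_use_stack, buf_init_malloc
and buf_put are made up for this example, and it returns false where the real
code would abort.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an ofpbuf-like buffer. */
enum buf_source { BUF_MALLOC, BUF_STACK };

struct buf {
    void *base;              /* Start of the storage. */
    size_t size;             /* Bytes currently in use. */
    size_t allocated;        /* Bytes available at 'base'. */
    enum buf_source source;  /* Where 'base' came from. */
};

/* Wraps caller-owned (e.g. stack) memory; it can never be realloc()ed. */
static void buf_use_stack(struct buf *b, void *mem, size_t allocated)
{
    b->base = mem;
    b->size = 0;
    b->allocated = allocated;
    b->source = BUF_STACK;
}

/* Heap-backed buffer; growing it later is fine. */
static void buf_init_malloc(struct buf *b, size_t allocated)
{
    b->base = calloc(1, allocated);
    b->size = 0;
    b->allocated = b->base ? allocated : 0;
    b->source = BUF_MALLOC;
}

/* Appends 'n' bytes.  Returns false if the buffer would have to grow but
 * cannot, which is the situation that ends in an abort in the backtrace. */
static bool buf_put(struct buf *b, const void *p, size_t n)
{
    assert(b->size <= b->allocated);
    if (n > b->allocated - b->size) {
        if (b->source != BUF_MALLOC) {
            return false;           /* Stack memory cannot be resized. */
        }
        void *new_base = realloc(b->base, b->size + n);
        if (!new_base) {
            return false;
        }
        b->base = new_base;
        b->allocated = b->size + n;
    }
    memcpy((char *) b->base + b->size, p, n);
    b->size += n;
    return true;
}
```

With a full 2048-byte stack-backed buffer, appending another 1156 bytes
fails, while the same append on a heap-backed buffer succeeds.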

This buffer is provided in frame #7, nl_sock_transact_multiple__, during
the call to nl_sock_recv__, where it is specified as buf_txn->reply. In
this specific case it seems we found transactions[0] available and so used
that rather than tmp_txn.
The original source of transactions (it's passed through most of the
function calls) appears to be op_auxdata, allocated on the stack at the top
of the dpif_netlink_operate__ function (dpif-netlink.c:1875).

The size of this particular message was 3204 bytes, so 2048 bytes went into
the buffer and 1156 bytes went into the tail iovec set up inside
nl_sock_recv__, which then tried to expand the ofpbuf to hold the tail.
Various nl_sock_* functions have comments about the buffer ideally being
the right size for optimal performance (I guess to avoid the reallocation),
but it seems like a possible oversight in the dpif_netlink_operate__
workflow that the nl_sock_* functions may ultimately try to expand that
buffer and then fail because of the stack allocation.
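
The 2048 + 1156 split can be reproduced outside OVS with a plain
scatter read. The sketch below is not the nl_sock_recv__ code: it uses an
AF_UNIX datagram socket pair instead of netlink, and the function name
tail_bytes_for and the 8192-byte tail size are assumptions made for the
example. It only shows how recvmsg() fills a fixed-size head buffer first
and spills the remainder into a second iovec.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

enum { HEAD_SIZE = 2048, TAIL_SIZE = 8192 };  /* Sizes chosen for the demo. */

/* Sends one 'msg_len'-byte datagram over a local socket pair and receives
 * it into two iovecs: a HEAD_SIZE "reply buffer" and a spill-over tail.
 * Returns the number of bytes that landed in the tail, or -1 on error. */
static long tail_bytes_for(size_t msg_len)
{
    static char msg[HEAD_SIZE + TAIL_SIZE];
    static char tail[TAIL_SIZE];
    char head[HEAD_SIZE];
    int fds[2];

    assert(msg_len <= sizeof msg);
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) < 0) {
        return -1;
    }

    memset(msg, 0xab, msg_len);
    ssize_t sent = send(fds[0], msg, msg_len, 0);

    struct iovec iov[2];
    iov[0].iov_base = head;
    iov[0].iov_len = sizeof head;
    iov[1].iov_base = tail;
    iov[1].iov_len = sizeof tail;

    struct msghdr mh;
    memset(&mh, 0, sizeof mh);
    mh.msg_iov = iov;
    mh.msg_iovlen = 2;

    ssize_t n = sent == (ssize_t) msg_len ? recvmsg(fds[1], &mh, 0) : -1;
    close(fds[0]);
    close(fds[1]);
    if (n < 0) {
        return -1;
    }
    /* Anything beyond the head buffer is what the caller must copy back,
     * and therefore what forces the head buffer to grow. */
    return n > HEAD_SIZE ? (long) (n - HEAD_SIZE) : 0;
}
```

Calling tail_bytes_for(3204) leaves 1156 bytes in the tail, matching the
new_tailroom=1156 seen in frame #2.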

What I am having difficulty figuring out from the core file, and where I
was hoping for some help, is the actual content of this particular netlink
message. Knowing that would explain why it was larger than 2048 bytes, and
might also give a hint as to how to reproduce the issue and/or justify
increasing the buffer size or similar. We have only hit this issue once
that I know of, and it doesn't seem to recur easily. I also had a hunt and
couldn't find any obvious commits/changes touching these areas since that
version.

I had trouble decoding the netlink message, the dpif_op structure, etc. to
figure out its actual contents, even with the GDB helpers. I suspect I am
missing something obvious on how to do that, and am hoping someone may have
a suggestion. I would appreciate any input there, or if the issue seems
obvious otherwise from my description, I am all ears :)
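
For reference, this is roughly the kind of decoding I have in mind: a
minimal sketch that walks the top-level attributes of one Generic Netlink
message held in a byte array, for example bytes copied out of the core with
GDB's "dump binary memory". The function name dump_genl_attrs is made up,
and the family_hdr_len parameter is an assumption the caller has to supply
(4 bytes if the message carries a struct ovs_header). It does not interpret
OVS attribute types or descend into nested attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>

/* Prints type and payload length of each top-level attribute in the
 * Generic Netlink message at 'data' ('len' bytes available).
 * 'family_hdr_len' is the size of the family-specific header that follows
 * struct genlmsghdr.  Returns the attribute count, or -1 if malformed. */
static int dump_genl_attrs(const uint8_t *data, size_t len,
                           size_t family_hdr_len)
{
    struct nlmsghdr nlh;
    size_t ofs = NLMSG_HDRLEN + GENL_HDRLEN + NLMSG_ALIGN(family_hdr_len);
    int n = 0;

    assert(data != NULL);
    if (len < ofs) {
        return -1;
    }
    memcpy(&nlh, data, sizeof nlh);     /* Avoid unaligned access. */
    if (nlh.nlmsg_len < ofs || nlh.nlmsg_len > len) {
        return -1;
    }

    size_t end = nlh.nlmsg_len;
    while (ofs + NLA_HDRLEN <= end) {
        struct nlattr nla;

        memcpy(&nla, data + ofs, sizeof nla);
        if (nla.nla_len < NLA_HDRLEN || nla.nla_len > end - ofs) {
            return -1;                  /* Attribute overruns the message. */
        }
        printf("attr type=%u payload_len=%u\n",
               (unsigned int) (nla.nla_type & NLA_TYPE_MASK),
               (unsigned int) (nla.nla_len - NLA_HDRLEN));
        n++;
        ofs += NLA_ALIGN(nla.nla_len);  /* Attributes are 4-byte aligned. */
    }
    return n;
}
```

The first 2048 bytes of the reply would be in the ofpbuf and the remaining
1156 in the tail array, so both pieces would need to be joined before
feeding them to something like this.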

Backtrace below.

Regards,
Trent

Thread 1 (Thread 0x7f3e0ffff700 (LWP 1539131)):
#0  0x00007f3ed30c8428 in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f3ed30ca02a in __GI_abort () at abort.c:89
#2  0x00000000004e5035 in ofpbuf_resize__ (b=b@entry=0x7f3e0fffb050,
new_headroom=<optimized out>, new_tailroom=new_tailroom@entry=1156) at
../lib/ofpbuf.c:262
#3  0x00000000004e5338 in ofpbuf_prealloc_tailroom (b=b@entry=0x7f3e0fffb050,
size=size@entry=1156) at ../lib/ofpbuf.c:291
#4  0x00000000004e54e5 in ofpbuf_put_uninit (size=size@entry=1156,
b=b@entry=0x7f3e0fffb050)
at ../lib/ofpbuf.c:365
#5  ofpbuf_put (b=b@entry=0x7f3e0fffb050, p=p@entry=0x7f3e0ffcf0a0,
size=size@entry=1156) at ../lib/ofpbuf.c:388
#6  0x00000000005392a6 in nl_sock_recv__ (sock=sock@entry=0x7f3e50009150,
buf=0x7f3e0fffb050, wait=wait@entry=false) at ../lib/netlink-socket.c:705
#7  0x0000000000539474 in nl_sock_transact_multiple__
(sock=sock@entry=0x7f3e50009150,
transactions=transactions@entry=0x7f3e0ffdff20, n=1,
done=done@entry=0x7f3e0ffdfe10)
at ../lib/netlink-socket.c:824
#8  0x000000000053980a in nl_sock_transact_multiple (sock=0x7f3e50009150,
transactions=transactions@entry=0x7f3e0ffdff20, n=n@entry=1) at
../lib/netlink-socket.c:1009
#9  0x000000000053aa1b in nl_sock_transact_multiple (n=1,
transactions=0x7f3e0ffdff20, sock=<optimized out>) at
../lib/netlink-socket.c:1765
#10 nl_transact_multiple (protocol=protocol@entry=16,
transactions=transactions@entry=0x7f3e0ffdff20, n=n@entry=1) at
../lib/netlink-socket.c:1764
#11 0x0000000000528b01 in dpif_netlink_operate__ (dpif=dpif@entry=0x25a6150,
ops=ops@entry=0x7f3e0fffaf28, n_ops=n_ops@entry=1) at
../lib/dpif-netlink.c:1964
#12 0x0000000000529956 in dpif_netlink_operate_chunks (n_ops=1,
ops=0x7f3e0fffaf28, dpif=<optimized out>) at ../lib/dpif-netlink.c:2243
#13 dpif_netlink_operate (dpif_=0x25a6150, ops=<optimized out>,
n_ops=<optimized out>) at ../lib/dpif-netlink.c:2279
#14 0x00000000004756de in dpif_operate (dpif=0x25a6150, ops=<optimized out>,
ops@entry=0x7f3e0fffaf28, n_ops=n_ops@entry=1) at ../lib/dpif.c:1359
#15 0x00000000004758e7 in dpif_flow_get (dpif=<optimized out>,
key=<optimized out>, key_len=<optimized out>, ufid=<optimized out>,
pmd_id=<optimized out>, buf=buf@entry=0x7f3e0fffb050, flow=<optimized out>)
at ../lib/dpif.c:1014
#16 0x000000000043f662 in ukey_create_from_dpif_flow (udpif=0x229cbf0,
udpif=0x229cbf0, ukey=<synthetic pointer>, flow=0x7f3e0fffc790) at
../ofproto/ofproto-dpif-upcall.c:1709
#17 ukey_acquire (error=<synthetic pointer>, result=<synthetic pointer>,
flow=0x7f3e0fffc790, udpif=0x229cbf0) at
../ofproto/ofproto-dpif-upcall.c:1914
#18 revalidate (revalidator=0x250eaa8) at
../ofproto/ofproto-dpif-upcall.c:2473
#19 0x000000000043f816 in udpif_revalidator (arg=0x250eaa8) at
../ofproto/ofproto-dpif-upcall.c:913
#20 0x00000000004ea4b4 in ovsthread_wrapper (aux_=<optimized out>) at
../lib/ovs-thread.c:348
#21 0x00007f3ed39756ba in start_thread (arg=0x7f3e0ffff700) at
pthread_create.c:333
#22 0x00007f3ed319a41d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:109