[ovs-discuss] Patch for PACKET_OUT getting deleted twice crash
Ilya Maximets
i.maximets at ovn.org
Fri Jun 4 17:34:48 UTC 2021
> Here is a patch with both a test and a fix.
Hi. Thanks for working n this!
CC: ovs-dev
> Not submitting as a formal
> patch because I would like some feedback on whether 1) maintainers feel
> this is worth fixing and
I can reproduce the crash with your test. Basically, actions in userspace
datapath may drop packets if something goes wrong. 'meter' action just
seems to be the most explicit variant. So, I think, this is definitely
worth fixing as some other condition might trigger this crash on packet-out
as well.
==2568112==ERROR: AddressSanitizer: heap-use-after-free
on address 0x61600000699c at pc 0x000000573860 bp 0x7ffebc6cc880 sp 0x7ffebc6cc878
READ of size 1 at 0x61600000699c thread T0
#0 0x57385f in dp_packet_delete lib/dp-packet.h:242:16
#1 0x57372c in ofproto_packet_out_uninit ofproto/ofproto.c:3562:5
#2 0x585e77 in handle_packet_out ofproto/ofproto.c:3722:5
#3 0x583801 in handle_single_part_openflow ofproto/ofproto.c:8499:16
#4 0x570c9c in handle_openflow ofproto/ofproto.c:8686:21
#5 0x611781 in ofconn_run ofproto/connmgr.c:1329:13
#6 0x6112ed in connmgr_run ofproto/connmgr.c:356:9
#7 0x56fdf4 in ofproto_run ofproto/ofproto.c:1891:5
#8 0x545ec0 in bridge_run__ vswitchd/bridge.c:3251:9
#9 0x5456a5 in bridge_run vswitchd/bridge.c:3310:5
#10 0x55f5b1 in main vswitchd/ovs-vswitchd.c:127:9
#11 0x7f85bfe09081 in __libc_start_main (/lib64/libc.so.6+0x27081)
#12 0x46d00d in _start (vswitchd/ovs-vswitchd+0x46d00d)
> 2) whether this is the way to fix it.
See inline.
>
> I have tried to make the most minimal change possible, but this means that
> there might be paths through the code that give unexpected behaviour (which
> in the worst case would be a memory leak I suppose).
>
> Tony
>
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index 246be14d0..5e0dabe67 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -739,6 +739,7 @@ struct dp_packet_batch {
> size_t count;
> bool trunc; /* true if the batch needs truncate. */
> bool do_not_steal; /* Indicate that the packets should not be stolen.
> */
> + bool packet_out; /* Indicate single packet is PACKET_OUT */
> struct dp_packet *packets[NETDEV_MAX_BURST];
> };
>
> @@ -748,6 +749,7 @@ dp_packet_batch_init(struct dp_packet_batch *batch)
> batch->count = 0;
> batch->trunc = false;
> batch->do_not_steal = false;
> + batch->packet_out = false;
> }
>
> static inline void
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 650e67ab3..deba4a94a 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4170,6 +4170,7 @@ dpif_netdev_execute(struct dpif *dpif, struct
> dpif_execute *execute)
>
> dp_packet_batch_init_packet(&pp, execute->packet);
> pp.do_not_steal = true;
> + pp.packet_out = execute->packet_out;
> dp_netdev_execute_actions(pmd, &pp, false, execute->flow,
> execute->actions, execute->actions_len);
There is already a dirty hack named "do_not_steal" that was introduced,
I guess, exactly to avoid crash in the conntrack code that could drop/steal
the packet just like meter action. And it seems that here in
dpif_netdev_execute() is the only problematic entry point as all other
normal paths expects that packet might be destroyed.
The problem was, I suppose, introduced when we tried to unify semantics
of "may_steal" flag by turning it into "should_steal". But it seems that
in this function we really need to prohibit stealing of the packet since
ofproto layer still owns it regardless of the result of execution.
I don't think that we need one more flag here, but we have several options
how to fix the crash:
1. Start honoring batch->do_not_steal flag in all actions that may result
in packet drops. As the original idea of having 'do_not_steal' flag for
a batch is very hacky, I'd like to not do that.
2. Try to propagate information that packet was deleted up to ofproto layer,
i.e. make handle_packet_out() aware of that. Will, probably, be not that
easy to do.
3. This function (dpif_netdev_execute) is not on a hot path in userspace
datapath, IIUC. It might be that it's just easier to remove the
'do_not_steal' flag entirely, clone the packet here and call the
dp_netdev_execute_actions() with should_steal=true.
This sounds like the best solution for me, unless I overlooked some
scenario, where this code is on a hot path.
Thoughts?
Aaron, you have a patch[1] to remove 'do_not_steal' flag, so fix for this issue
will, likely, touch the same parts of the code. What do you think about this
issue and possible solutions?
Best regards, Ilya Maximets.
[1] https://patchwork.ozlabs.org/project/openvswitch/patch/20210521175905.165779-1-aconole@redhat.com/
Non-line-wrapped version of the test for convenience:
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 31064ed95..d01f438b8 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -2159,6 +2159,27 @@ meter:controller flow_count:0 packet_in_count:8 byte_in_count:112 duration:0.0s
OVS_VSWITCHD_STOP
AT_CLEANUP
+AT_SETUP([ofproto-dpif packet-out table meter drop qwe])
+OVS_VSWITCHD_START
+add_of_ports br0 1 2
+
+AT_CHECK([ovs-ofctl -O OpenFlow13 add-meter br0 'meter=1 pktps bands=type=drop rate=1'])
+AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br0 'in_port=1 action=meter:1,output:2'])
+
+ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000400080000 actions=resubmit(,0)"
+ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000400080000 actions=resubmit(,0)"
+
+# Check that vswitchd hasn't crashed by dumping the meter added above
+AT_CHECK([ovs-ofctl -O OpenFlow13 dump-meters br0 | ofctl_strip], [0], [dnl
+OFPST_METER_CONFIG reply (OF1.3):
+meter=1 pktps bands=
+type=drop rate=1
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
+
AT_SETUP([ofproto-dpif - MPLS handling])
OVS_VSWITCHD_START([dnl
add-port br0 p1 -- set Interface p1 type=dummy
---
More information about the discuss
mailing list