[ovs-dev] [ovs-discuss] Patch for PACKET_OUT getting deleted twice crash

Ilya Maximets i.maximets at ovn.org
Mon Jun 7 13:59:15 UTC 2021


On 6/7/21 3:09 PM, Aaron Conole wrote:
> Ilya Maximets <i.maximets at ovn.org> writes:
> 
>>> Here is a patch with both a test and a fix.
> 
> Thanks so much!  It's nice to get fixes, but I think it's really great
> when test cases come along with them.
> 
>> Hi.  Thanks for working n this!
>>
>> CC: ovs-dev
>>
>>> Not submitting as a formal
>>> patch because I would like some feedback on whether 1) maintainers feel
>>> this is worth fixing and
>>
>> I can reproduce the crash with your test.  Basically, actions in userspace
>> datapath may drop packets if something goes wrong.  'meter' action just
>> seems to be the most explicit variant.  So, I think, this is definitely
>> worth fixing as some other condition might trigger this crash on packet-out
>> as well.
>>
>> ==2568112==ERROR: AddressSanitizer: heap-use-after-free
>> on address 0x61600000699c at pc 0x000000573860 bp 0x7ffebc6cc880 sp 0x7ffebc6cc878
>> READ of size 1 at 0x61600000699c thread T0
>>     #0 0x57385f in dp_packet_delete lib/dp-packet.h:242:16
>>     #1 0x57372c in ofproto_packet_out_uninit ofproto/ofproto.c:3562:5
>>     #2 0x585e77 in handle_packet_out ofproto/ofproto.c:3722:5
>>     #3 0x583801 in handle_single_part_openflow ofproto/ofproto.c:8499:16
>>     #4 0x570c9c in handle_openflow ofproto/ofproto.c:8686:21
>>     #5 0x611781 in ofconn_run ofproto/connmgr.c:1329:13
>>     #6 0x6112ed in connmgr_run ofproto/connmgr.c:356:9
>>     #7 0x56fdf4 in ofproto_run ofproto/ofproto.c:1891:5
>>     #8 0x545ec0 in bridge_run__ vswitchd/bridge.c:3251:9
>>     #9 0x5456a5 in bridge_run vswitchd/bridge.c:3310:5
>>     #10 0x55f5b1 in main vswitchd/ovs-vswitchd.c:127:9
>>     #11 0x7f85bfe09081 in __libc_start_main (/lib64/libc.so.6+0x27081)
>>     #12 0x46d00d in _start (vswitchd/ovs-vswitchd+0x46d00d)
>>
>>> 2) whether this is the way to fix it.
>>
>> See inline.
>>
>>>
>>> I have tried to make the most minimal change possible, but this means that
>>> there might be paths through the code that give unexpected behaviour (which
>>> in the worst case would be a memory leak I suppose).
>>>
>>> Tony
>>>
>>> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
>>> index 246be14d0..5e0dabe67 100644
>>> --- a/lib/dp-packet.h
>>> +++ b/lib/dp-packet.h
>>> @@ -739,6 +739,7 @@ struct dp_packet_batch {
>>>     size_t count;
>>>     bool trunc; /* true if the batch needs truncate. */
>>>     bool do_not_steal; /* Indicate that the packets should not be stolen.
>>> */
>>> +    bool packet_out; /* Indicate single packet is PACKET_OUT */
>>>     struct dp_packet *packets[NETDEV_MAX_BURST];
>>> };
>>>
>>> @@ -748,6 +749,7 @@ dp_packet_batch_init(struct dp_packet_batch *batch)
>>>     batch->count = 0;
>>>     batch->trunc = false;
>>>     batch->do_not_steal = false;
>>> +    batch->packet_out = false;
>>> }
>>>
>>> static inline void
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index 650e67ab3..deba4a94a 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -4170,6 +4170,7 @@ dpif_netdev_execute(struct dpif *dpif, struct
>>> dpif_execute *execute)
>>>
>>>     dp_packet_batch_init_packet(&pp, execute->packet);
>>>     pp.do_not_steal = true;
>>> +    pp.packet_out = execute->packet_out;
>>>     dp_netdev_execute_actions(pmd, &pp, false, execute->flow,
>>>                               execute->actions, execute->actions_len);
>>
>> There is already a dirty hack named "do_not_steal" that was introduced,
>> I guess, exactly to avoid crash in the conntrack code that could drop/steal
>> the packet just like meter action.  And it seems that here in
>> dpif_netdev_execute() is the only problematic entry point as all other
>> normal paths expects that packet might be destroyed.
>>
>> The problem was, I suppose, introduced when we tried to unify semantics
>> of "may_steal" flag by turning it into "should_steal".  But it seems that
>> in this function we really need to prohibit stealing of the packet since
>> ofproto layer still owns it regardless of the result of execution.
>>
>> I don't think that we need one more flag here, but we have several options
>> how to fix the crash:
>>
>> 1. Start honoring batch->do_not_steal flag in all actions that may result
>>    in packet drops.  As the original idea of having 'do_not_steal' flag for
>>    a batch is very hacky, I'd like to not do that.
> 
> +1
> 
>> 2. Try to propagate information that packet was deleted up to ofproto layer,
>>    i.e. make handle_packet_out() aware of that.  Will, probably, be not that
>>    easy to do.
> 
> I had a look at doing this, but as you note it is quite intrusive, and
> we need to make changes all over.
> 
>> 3. This function (dpif_netdev_execute) is not on a hot path in userspace
>>    datapath, IIUC.  It might be that it's just easier to remove the
>>    'do_not_steal' flag entirely, clone the packet here and call the
>>    dp_netdev_execute_actions() with should_steal=true.
>>    This sounds like the best solution for me, unless I overlooked some
>>    scenario, where this code is on a hot path.
> 
> I like this approach.
> 
>> Thoughts?
>>
>> Aaron, you have a patch[1] to remove 'do_not_steal' flag, so fix for this issue
>> will, likely, touch the same parts of the code.  What do you think about this
>> issue and possible solutions?
> 
> I guess we should do the same thing we do in other places, ie: default
> assume that the packet cannot be 'stolen' and we should clone our own
> copies.

I'm not sure that I understood this correctly, but the idea was that default
assumption is that packet can be stolen at any point of datapath processing
and higher layers should deal with this.

For the IP fragmentation handling this will mean that ipf will just take a
packet directly from the original batch without copying, so the original
batch will not have this packet anymore and ipf is allowed to free it at
any point in time, because now it owns this packet.
This aligns with the "should_steal" semantics, as any function called with
"should_steal=true" must take the ownership of the packet.  If the function
called with "should_steal=false" it still allowed to take ownership of some
packets from the batch and caller must be prepared for that.
If some function in datapath has no "should_steal" argument, it should be
treated as a function with "should_steal=false".   This applies to both
conntrack_execute() and dp_netdev_run_meter(), also to netdev_pop_header()
and so on.

> If we are worried about the time it takes to copy the dp_packet
> structure and buffer, we can always introduce a reference counting
> mechanism later as an optimization.
> 
> I would just prefer to do clone, and then the functional area which
> needs to hold a reference to a valid packet buffer can delete when it
> makes sense.
> 
> Hope it helps.
> 
>> Best regards, Ilya Maximets.
>>
>>
>> [1] https://patchwork.ozlabs.org/project/openvswitch/patch/20210521175905.165779-1-aconole@redhat.com/
>>
>> Non-line-wrapped version of the test for convenience:
>>
>> diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
>> index 31064ed95..d01f438b8 100644
>> --- a/tests/ofproto-dpif.at
>> +++ b/tests/ofproto-dpif.at
>> @@ -2159,6 +2159,27 @@ meter:controller flow_count:0 packet_in_count:8 byte_in_count:112 duration:0.0s
>>  OVS_VSWITCHD_STOP
>>  AT_CLEANUP
>>  
>> +AT_SETUP([ofproto-dpif packet-out table meter drop qwe])
>> +OVS_VSWITCHD_START
>> +add_of_ports br0 1 2
>> +
>> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-meter br0 'meter=1 pktps bands=type=drop rate=1'])
>> +AT_CHECK([ovs-ofctl -O OpenFlow13 add-flow br0 'in_port=1 action=meter:1,output:2'])
>> +
>> +ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000400080000 actions=resubmit(,0)"
>> +ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000400080000 actions=resubmit(,0)"
>> +
>> +# Check that vswitchd hasn't crashed by dumping the meter added above
>> +AT_CHECK([ovs-ofctl -O OpenFlow13 dump-meters br0 | ofctl_strip], [0], [dnl
>> +OFPST_METER_CONFIG reply (OF1.3):
>> +meter=1 pktps bands=
>> +type=drop rate=1
>> +])
>> +
>> +OVS_VSWITCHD_STOP
>> +AT_CLEANUP
>> +
>> +
>>  AT_SETUP([ofproto-dpif - MPLS handling])
>>  OVS_VSWITCHD_START([dnl
>>     add-port br0 p1 -- set Interface p1 type=dummy
>> ---
> 



More information about the dev mailing list