[ovs-dev] [PATCH v1] Fix crash due to multiple tnl push action

Ilya Maximets i.maximets at samsung.com
Mon May 14 12:57:55 UTC 2018


Hello Anju, Ben,

Looks like I fixed a leak reported here in my recent patch:
  https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346892.html

Could you, please, take a look at it?

I actually managed to reproduce the packet leak on tunnel-push-pop
unit tests with valgrind:

==6445== 2,912 (2,208 direct, 704 indirect) bytes in 4 blocks are definitely lost in loss record 400 of 410
==6445==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==6445==    by 0x516314: xmalloc (util.c:120)
==6445==    by 0x466154: dp_packet_new (dp-packet.c:138)
==6445==    by 0x466154: dp_packet_new_with_headroom (dp-packet.c:148)
==6445==    by 0x4F6CFE: eth_from_hex (packets.c:498)
==6445==    by 0x48EB43: eth_from_packet (netdev-dummy.c:1450)
==6445==    by 0x48EB43: netdev_dummy_receive (netdev-dummy.c:1562)
==6445==    by 0x515980: process_command (unixctl.c:313)
==6445==    by 0x515980: run_connection (unixctl.c:347)
==6445==    by 0x515980: unixctl_server_run (unixctl.c:398)
==6445==    by 0x406F1E: main (ovs-vswitchd.c:120)

Above patch fixes it.

Best regards, Ilya Maximets.

> Hi Ben,
> 
> I will work on these changes as well.
> 
> Regards
> Anju
> 
> -----Original Message-----
> From: Ben Pfaff [mailto:blp at ovn.org] 
> Sent: Friday, May 11, 2018 2:00 AM
> To: Anju Thomas <anju.thomas at ericsson.com>
> Cc: dev at openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v1] Fix crash due to multiple tnl push action
> 
> On Thu, May 10, 2018 at 08:51:03AM +0000, Anju Thomas wrote:
>> It looks like your commit message describes at least two other bugs in 
>> OVS, though.  First, if OVS crashes when it pushes tunnel headers, 
>> even if there's not enough headroom, that's really bad.  At worst, it 
>> should drop the packet.  Do you know where the crash occurs?  We 
>> should fix the problem, since it might recur in some other context.
>> 
>> [Anju] OVS was actually crashing in the dpdk datapath. The crash is a manual assert in our case.
>> The rootcause is that dp receives the actions after the upcall (say with >=3 tunnel pushes ) . Now as part of action processing , since it is a tunnel push action , we try to find space in the dpdk mbuf packet headroom (which Is 128 bytes). By the time we try to process the third tunnel push , there is no headroom left since each tunnel header is of 50 bytes (50 *3 > 128 bytes headroom). Hence it manually asserts .  This assert is to catch any unexpected code flow . Do you think that in this case we should still go ahead and  prevent the crash ? 
> 
> I don't understand why it's OK to crash in this case.  Why do you think so?
> 
>> Second, it's a little shocking to hear that an encapsulation action without a following output action causes a memory leak.  We also need to fix that.  Do you have any more details?
>> [Anju] Now as explained above, the crash happens because we run out of headroom. But in case we have say 2 or less than 2 tunnel pushes we will have a mem leak as packet is never freed because the tnl push is the dp last action and there is no other output action or any other action like recirc that may translate to output action in the end leading  to packet buffer not being freed.
>> Are you proposing that we have some sort of preventive fix in the dp to handle an incorrect action list from the upcall handling? 
> 
> Yes.  It's unacceptable to leak memory because there's a packet modification without an output action.  The kernel datapath never does this, for example.



More information about the dev mailing list