[ovs-dev] [PATCH v1] ofproto: Fix OVS crash when packets hit Openflow rules with certain combinations of nested actions

Anil Kumar Koli anilkumar.k at altencalsoftlabs.com
Fri Sep 13 02:50:19 UTC 2019


Hi Ben,

Please consider this as a gentle remainder and provide your inputs on the 
below issue.

Thanks & Regards,
Anil Kumar

-----Original Message-----
From: Anil Kumar Koli <anilkumar.k at altencalsoftlabs.com>
Sent: Thursday, 5 September, 2019 11:12 PM
To: 'Ben Pfaff' <blp at ovn.org>
Cc: 'Ilya Maximets' <i.maximets at samsung.com>; 'ovs-dev at openvswitch.org' 
<ovs-dev at openvswitch.org>
Subject: RE: [ovs-dev] [PATCH v1] ofproto: Fix OVS crash when packets hit 
Openflow rules with certain combinations of nested actions

Hi Ben,

Please consider this as a gentle remainder and provide your inputs on the
below issue.
This issue has been discussed earlier in the following thread.
https://mail.openvswitch.org/pipermail/ovs-discuss/2019-April/048494.html

Best Regards,
Anil Kumar.
-----Original Message-----
From: Anil Kumar Koli <anilkumar.k at altencalsoftlabs.com>
Sent: Tuesday, 3 September, 2019 11:35 AM
To: 'Ilya Maximets' <i.maximets at samsung.com>; 'ovs-dev at openvswitch.org'
<ovs-dev at openvswitch.org>
Cc: 'Ben Pfaff' <blp at ovn.org>
Subject: RE: [ovs-dev] [PATCH v1] ofproto: Fix OVS crash when packets hit
Openflow rules with certain combinations of nested actions

Hi Ilya,

Sorry for late reply and thanks for the review comments.
Yes, the issue happens only in the userspace datapath and not in kernel
datapath. I have tested it on kernel datapath and could not see this issue for
the same build.

Regarding the suggested solution, I still feel like going with recursive mutex
approach is more cleaner then releasing the lock for
ofproto_packet_out_finish().  We have the ofproto_dpif_xcache_execute() which
may lead to race when a flow mod learn is called from another thread. May be
it could also be possible that in certain combination we may result the crash
in ofproto_packet_out_start().  May be Ben can share his inputs as well.

Best Regards,Anil
Kumar

-----Original Message-----
From: Ilya Maximets <i.maximets at samsung.com>
Sent: Thursday, 29 August, 2019 05:10 PM
To: Anil Kumar Koli <anilkumar.k at altencalsoftlabs.com>;
ovs-dev at openvswitch.org
Cc: 'Ben Pfaff' <blp at ovn.org>
Subject: Re: [ovs-dev] [PATCH v1] ofproto: Fix OVS crash when packets hit
Openflow rules with certain combinations of nested actions

On 29.08.2019 12:52, Anil Kumar Koli wrote:
> Hi Ilya,
>
> Thanks for the review.
> Please find below the stack trace of the crash
>
>  (gdb) bt
> #0  0x00007f0a3da46c37 in __GI_raise (sig=sig at entry=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1  0x00007f0a3da4a028 in __GI_abort () at abort.c:89
> #2  0x000000000094262e in ovs_abort_valist (err_no=<optimized out>,
> format=<optimized out>, args=args at entry=0x7fffaf59e308) at
> lib/util.c:419
> #3  0x00000000009426b7 in ovs_abort (err_no=<optimized out>,
> format=format at entry=0xb0289f "%s: pthread_%s_%s failed") at
> lib/util.c:411
> #4  0x00000000009104f9 in ovs_mutex_lock_at (l_=l_ at entry=0xe47cc0
> <ofproto_mutex>, where=where at entry=0xad4199 "ofproto/ofproto.c:5391")
> at lib/ovs-thread.c:75
> #5  0x000000000083fb1f in ofproto_flow_mod_learn
> (ofm=ofm at entry=0x7fffaf59e620, keep_ref=<optimized out>,
> limit=<optimized out>, below_limitp=below_limitp at entry=0x7fffaf59e540)
> at ofproto/ofproto.c:5391
> #6  0x000000000085e5d0 in xlate_learn_action
> (ctx=ctx at entry=0x7fffaf5a02e0, learn=learn at entry=0x2b18308) at
> ofproto/ofproto-dpif-xlate.c:5378
> #7  0x00000000008615c3 in do_xlate_actions (ofpacts=<optimized out>,
> ofpacts_len=<optimized out>, ctx=<optimized out>,
> is_last_action=<optimized out>, group_bucket_action=<optimized out>)
> at ofproto/ofproto-dpif-xlate.c:6893
> #8  0x000000000085edba in xlate_recursively (actions_xlator=0x860730
> <do_xlate_actions>, is_last_action=false, deepens=<optimized out>,
> rule=0x2b8e440, ctx=0x7fffaf5a02e0) at
> ofproto/ofproto-dpif-xlate.c:4233
> #9  xlate_table_action (ctx=0x7fffaf5a02e0, in_port=<optimized out>,
> table_id=<optimized out>, may_packet_in=<optimized out>,
> honor_table_miss=<optimized out>, with_ct_orig=<optimized out>,
> is_last_action=false,
>     xlator=0x860730 <do_xlate_actions>) at
> ofproto/ofproto-dpif-xlate.c:4361
> #10 0x0000000000861aaa in xlate_ofpact_resubmit (resubmit=<optimized
> out>, resubmit=<optimized out>, resubmit=<optimized out>,
> is_last_action=<optimized out>, ctx=0x7fffaf5a02e0) at
> ofproto/ofproto-dpif-xlate.c:4672
> #11 do_xlate_actions (ofpacts=ofpacts at entry=0x2b654e8,
> ofpacts_len=ofpacts_len at entry=48, ctx=ctx at entry=0x7fffaf5a02e0,
> is_last_action=is_last_action at entry=true,
> group_bucket_action=group_bucket_action at entry=false)
>     at ofproto/ofproto-dpif-xlate.c:6773
> #12 0x00000000008696d6 in xlate_actions (xin=xin at entry=0x7fffaf5a0da0,
> xout=xout at entry=0x7fffaf5a11b0) at ofproto/ofproto-dpif-xlate.c:7570
> #13 0x0000000000859b56 in upcall_xlate (wc=0x7fffaf5a23f0,
> odp_actions=0x7fffaf5a1bc0, upcall=0x7fffaf5a1150, udpif=0x2b10670) at
> ofproto/ofproto-dpif-upcall.c:1197
> #14 process_upcall (udpif=udpif at entry=0x2b10670,
> upcall=upcall at entry=0x7fffaf5a1150,
> odp_actions=odp_actions at entry=0x7fffaf5a1bc0,
> wc=wc at entry=0x7fffaf5a23f0) at ofproto/ofproto-dpif-upcall.c:1413
> #15 0x000000000085a9eb in upcall_cb (packet=<optimized out>,
> flow=0x7fffaf5a2150, ufid=<optimized out>, pmd_id=<optimized out>,
> type=<optimized out>, userdata=<optimized out>, actions=0x7fffaf5a1bc0,
> wc=0x7fffaf5a23f0,
>     put_actions=0x7fffaf5a1c00, aux=0x2b10670) at
> ofproto/ofproto-dpif-upcall.c:1315
> #16 0x000000000088419c in dp_netdev_upcall (pmd=pmd at entry=0x7f0a35f34010,
> packet_=packet_ at entry=0x2b68930, flow=flow at entry=0x7fffaf5a2150,
> wc=wc at entry=0x7fffaf5a23f0, ufid=ufid at entry=0x7fffaf5a1ba0,
> type=type at entry=DPIF_UC_MISS,
>     userdata=userdata at entry=0x0, actions=actions at entry=0x7fffaf5a1bc0,
> put_actions=put_actions at entry=0x7fffaf5a1c00) at
> lib/dpif-netdev.c:6236
> #17 0x000000000088d053 in handle_packet_upcall
> (put_actions=0x7fffaf5a1c00, actions=0x7fffaf5a1bc0,
> key=0x7fffaf5a3080, packet=0x2b68930, pmd=0x7f0a35f34010) at
> lib/dpif-netdev.c:6591
> #18 fast_path_processing (pmd=pmd at entry=0x7f0a35f34010,
> packets_=packets_ at entry=0x7fffaf5a35f0, keys=keys at entry=0x7fffaf5a3040,
> flow_map=flow_map at entry=0x7fffaf5a2ef0,
> index_map=index_map at entry=0x7fffaf5a2ee0 "",
>     in_port=<optimized out>) at lib/dpif-netdev.c:6709
> #19 0x000000000088e9ce in dp_netdev_input__
> (pmd=pmd at entry=0x7f0a35f34010, packets=packets at entry=0x7fffaf5a35f0,
> md_is_valid=md_is_valid at entry=true, port_no=port_no at entry=0) at
> lib/dpif-netdev.c:6797
> #20 0x0000000000890710 in dp_netdev_recirculate
> (packets=0x7fffaf5a35f0, pmd=0x7f0a35f34010) at lib/dpif-netdev.c:6842
> #21 dp_execute_cb (aux_=aux_ at entry=0x7fffaf5a3ea0,
> packets_=0x7fffaf5a35f0, packets_ at entry=0x7fffaf5a3eb0,
> a=a at entry=0x2b1c7d4, should_steal=<optimized out>) at
> lib/dpif-netdev.c:7158
> #22 0x00000000008c3048 in odp_execute_actions (dp=dp at entry=0x7fffaf5a3ea0,
> batch=batch at entry=0x7fffaf5a3eb0, steal=steal at entry=false,
> actions=<optimized out>, actions_len=<optimized out>,
>     dp_execute_action=dp_execute_action at entry=0x8904f0
> <dp_execute_cb>) at lib/odp-execute.c:794
> #23 0x0000000000887bbd in dp_netdev_execute_actions
> (actions_len=<optimized out>, actions=<optimized out>, flow=<optimized
> out>, should_steal=false, packets=0x7fffaf5a3eb0, pmd=0x7f0a35f34010)
> at lib/dpif-netdev.c:7332
> #24 dpif_netdev_execute (dpif=dpif at entry=0x2b105d0,
> execute=execute at entry=0x7fffaf5a40f8) at lib/dpif-netdev.c:3725
> #25 0x000000000088dccc in dpif_netdev_operate (dpif=0x2b105d0,
> ops=0x7fffaf5a40e8, n_ops=1, offload_type=<optimized out>) at
> lib/dpif-netdev.c:3756
> #26 0x0000000000895b07 in dpif_operate (dpif=0x2b105d0,
> ops=ops at entry=0x7fffaf5a40e8, n_ops=n_ops at entry=1,
> offload_type=offload_type at entry=DPIF_OFFLOAD_AUTO) at lib/dpif.c:1367
> #27 0x0000000000896233 in dpif_execute (dpif=<optimized out>,
> execute=execute at entry=0x7fffaf5a4180) at lib/dpif.c:1321
> #28 0x00000000008459eb in packet_execute (ofproto_=0x2b3ec60,
> opo=0x7fffaf5a4200) at ofproto/ofproto-dpif.c:4760
> #29 0x000000000083a9bb in ofproto_packet_out_finish
> (opo=0x7fffaf5a4200, ofproto=0x2b3ec60) at ofproto/ofproto.c:3594
> #30 handle_packet_out (ofconn=ofconn at entry=0x2b23d50,
> oh=oh at entry=0x2b89470) at ofproto/ofproto.c:3635
> #31 0x000000000083f06f in handle_single_part_openflow
> (type=OFPTYPE_PACKET_OUT, oh=0x2b89470, ofconn=0x2b23d50) at
> ofproto/ofproto.c:8400
> #32 handle_openflow (ofconn=ofconn at entry=0x2b23d50,
> msgs=msgs at entry=0x7fffaf5a5600) at ofproto/ofproto.c:8587
> #33 0x000000000086ff6b in ofconn_run (handle_openflow=0x83e1f0
> <handle_openflow>, ofconn=0x2b23d50) at ofproto/connmgr.c:1318
> #34 connmgr_run (mgr=0x2a26820,
> handle_openflow=handle_openflow at entry=0x83e1f0 <handle_openflow>) at
> ofproto/connmgr.c:355
> #35 0x0000000000838706 in ofproto_run (p=0x2b3ec60) at
> ofproto/ofproto.c:1826
> #36 0x0000000000826234 in bridge_run__ () at vswitchd/bridge.c:2965
> #37 0x000000000082bae8 in bridge_run () at vswitchd/bridge.c:3023
> #38 0x0000000000422a05 in main (argc=11, argv=0x7fffaf5a5a28) at
> vswitchd/ovs-vswitchd.c:127
> (gdb) f 5
> #5  0x000000000083fb1f in ofproto_flow_mod_learn
> (ofm=ofm at entry=0x7fffaf59e620, keep_ref=<optimized out>, limit=<optimized
> out>, below_limitp=below_limitp at entry=0x7fffaf59e540) at
> ofproto/ofproto.c:5391
> 5391    ofproto/ofproto.c: No such file or directory.
> (gdb) p ofproto_mutex
> $2 = {lock = {__data = {__lock = 1, __count = 0, __owner = 67357, __nusers =
> 1, __kind = 2, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next =
> 0x0}},
>     __size =
> "\001\000\000\000\000\000\000\000\035\a\001\000\001\000\000\000\002",
> '\000' <repeats 22 times>, __align = 1}, where = 0xad3dd6
> "ofproto/ofproto.c:3631"}
>
> Please find the discussion link regarding this crash with community.
> https://mail.openvswitch.org/pipermail/ovs-discuss/2019-April/048494.h
> tml

Thanks for the trace and link. It's very useful.
You may, probably, add above link to the commit message via 'Reported-at' tag.

Regarding the issue.
Is it correct that this issue only happens for userspace datapath?
In case of userspace datapath execution of the packet actions happens in the
same thread that handles the ofproto packet-out request. This is seen in above
stack trace.  But in case of kernel datapath,
packet_execute() sends packet to the kernel via netlink and further processing
will happen in a separate thread that handles upcall (if any).

If above statements are correct, there could be another way to fix the issue.

Ben, you said previously:

> How do we fix this?
>
> * We can't get rid of ofproto_mutex in do_bundle_commit(), or drop it
>   temporarily around flow translation (i.e. the call to
>   ofproto_packet_out_start()), because it might need to revert all the
>   flow table changes and dropping the mutex would allow other threads to
>   race in and make conflicting changes

But it seems that the issue happens on ofproto_packet_out_finish() and not on
ofproto_packet_out_start(). So, the flow translation should be OK under the
ofproto_mutex and revert is still possible under the mutex.
The only thing we need to take out of the mutex is real action execution by
the datapath triggered by ofproto_packet_out_finish(). Callers never check the
status of this function so it should be not so hard.

So, possible solution:
* Move ofproto_packet_out_finish() out of ofproto_mutex in all the callers:
  * It's easy for handle_packet_out()
  * For do_bundle_commit() we could temporary store all the ofproto_packet_out
    entities and finish them later.

Am I missing something?  Is there reason for ofproto_packet_out_finish() to
require ofproto_mutex?

Ben, Anil, what do you think?

Best regards, Ilya Maximets.


More information about the dev mailing list