[ovs-dev] [PATCH v1] ofproto: Fix OVS crash when packets hit Openflow rules with certain combinations of nested actions

Ilya Maximets i.maximets at samsung.com
Thu Aug 29 11:39:39 UTC 2019


On 29.08.2019 12:52, Anil Kumar Koli wrote:
> Hi Ilya,
> 
> Thanks for the review.
> Please find below the stack trace of the crash 
> 
>  (gdb) bt
> #0  0x00007f0a3da46c37 in __GI_raise (sig=sig at entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1  0x00007f0a3da4a028 in __GI_abort () at abort.c:89
> #2  0x000000000094262e in ovs_abort_valist (err_no=<optimized out>, format=<optimized out>, args=args at entry=0x7fffaf59e308) at lib/util.c:419
> #3  0x00000000009426b7 in ovs_abort (err_no=<optimized out>, format=format at entry=0xb0289f "%s: pthread_%s_%s failed") at lib/util.c:411
> #4  0x00000000009104f9 in ovs_mutex_lock_at (l_=l_ at entry=0xe47cc0 <ofproto_mutex>, where=where at entry=0xad4199 "ofproto/ofproto.c:5391") at lib/ovs-thread.c:75
> #5  0x000000000083fb1f in ofproto_flow_mod_learn (ofm=ofm at entry=0x7fffaf59e620, keep_ref=<optimized out>, limit=<optimized out>, below_limitp=below_limitp at entry=0x7fffaf59e540) at ofproto/ofproto.c:5391
> #6  0x000000000085e5d0 in xlate_learn_action (ctx=ctx at entry=0x7fffaf5a02e0, learn=learn at entry=0x2b18308) at ofproto/ofproto-dpif-xlate.c:5378
> #7  0x00000000008615c3 in do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=<optimized out>, is_last_action=<optimized out>, group_bucket_action=<optimized out>) at ofproto/ofproto-dpif-xlate.c:6893
> #8  0x000000000085edba in xlate_recursively (actions_xlator=0x860730 <do_xlate_actions>, is_last_action=false, deepens=<optimized out>, rule=0x2b8e440, ctx=0x7fffaf5a02e0) at ofproto/ofproto-dpif-xlate.c:4233
> #9  xlate_table_action (ctx=0x7fffaf5a02e0, in_port=<optimized out>, table_id=<optimized out>, may_packet_in=<optimized out>, honor_table_miss=<optimized out>, with_ct_orig=<optimized out>, is_last_action=false,
>     xlator=0x860730 <do_xlate_actions>) at ofproto/ofproto-dpif-xlate.c:4361
> #10 0x0000000000861aaa in xlate_ofpact_resubmit (resubmit=<optimized out>, resubmit=<optimized out>, resubmit=<optimized out>, is_last_action=<optimized out>, ctx=0x7fffaf5a02e0) at ofproto/ofproto-dpif-xlate.c:4672
> #11 do_xlate_actions (ofpacts=ofpacts at entry=0x2b654e8, ofpacts_len=ofpacts_len at entry=48, ctx=ctx at entry=0x7fffaf5a02e0, is_last_action=is_last_action at entry=true, group_bucket_action=group_bucket_action at entry=false)
>     at ofproto/ofproto-dpif-xlate.c:6773
> #12 0x00000000008696d6 in xlate_actions (xin=xin at entry=0x7fffaf5a0da0, xout=xout at entry=0x7fffaf5a11b0) at ofproto/ofproto-dpif-xlate.c:7570
> #13 0x0000000000859b56 in upcall_xlate (wc=0x7fffaf5a23f0, odp_actions=0x7fffaf5a1bc0, upcall=0x7fffaf5a1150, udpif=0x2b10670) at ofproto/ofproto-dpif-upcall.c:1197
> #14 process_upcall (udpif=udpif at entry=0x2b10670, upcall=upcall at entry=0x7fffaf5a1150, odp_actions=odp_actions at entry=0x7fffaf5a1bc0, wc=wc at entry=0x7fffaf5a23f0) at ofproto/ofproto-dpif-upcall.c:1413
> #15 0x000000000085a9eb in upcall_cb (packet=<optimized out>, flow=0x7fffaf5a2150, ufid=<optimized out>, pmd_id=<optimized out>, type=<optimized out>, userdata=<optimized out>, actions=0x7fffaf5a1bc0, wc=0x7fffaf5a23f0,
>     put_actions=0x7fffaf5a1c00, aux=0x2b10670) at ofproto/ofproto-dpif-upcall.c:1315
> #16 0x000000000088419c in dp_netdev_upcall (pmd=pmd at entry=0x7f0a35f34010, packet_=packet_ at entry=0x2b68930, flow=flow at entry=0x7fffaf5a2150, wc=wc at entry=0x7fffaf5a23f0, ufid=ufid at entry=0x7fffaf5a1ba0, type=type at entry=DPIF_UC_MISS,
>     userdata=userdata at entry=0x0, actions=actions at entry=0x7fffaf5a1bc0, put_actions=put_actions at entry=0x7fffaf5a1c00) at lib/dpif-netdev.c:6236
> #17 0x000000000088d053 in handle_packet_upcall (put_actions=0x7fffaf5a1c00, actions=0x7fffaf5a1bc0, key=0x7fffaf5a3080, packet=0x2b68930, pmd=0x7f0a35f34010) at lib/dpif-netdev.c:6591
> #18 fast_path_processing (pmd=pmd at entry=0x7f0a35f34010, packets_=packets_ at entry=0x7fffaf5a35f0, keys=keys at entry=0x7fffaf5a3040, flow_map=flow_map at entry=0x7fffaf5a2ef0, index_map=index_map at entry=0x7fffaf5a2ee0 "",
>     in_port=<optimized out>) at lib/dpif-netdev.c:6709
> #19 0x000000000088e9ce in dp_netdev_input__ (pmd=pmd at entry=0x7f0a35f34010, packets=packets at entry=0x7fffaf5a35f0, md_is_valid=md_is_valid at entry=true, port_no=port_no at entry=0) at lib/dpif-netdev.c:6797
> #20 0x0000000000890710 in dp_netdev_recirculate (packets=0x7fffaf5a35f0, pmd=0x7f0a35f34010) at lib/dpif-netdev.c:6842
> #21 dp_execute_cb (aux_=aux_ at entry=0x7fffaf5a3ea0, packets_=0x7fffaf5a35f0, packets_ at entry=0x7fffaf5a3eb0, a=a at entry=0x2b1c7d4, should_steal=<optimized out>) at lib/dpif-netdev.c:7158
> #22 0x00000000008c3048 in odp_execute_actions (dp=dp at entry=0x7fffaf5a3ea0, batch=batch at entry=0x7fffaf5a3eb0, steal=steal at entry=false, actions=<optimized out>, actions_len=<optimized out>,
>     dp_execute_action=dp_execute_action at entry=0x8904f0 <dp_execute_cb>) at lib/odp-execute.c:794
> #23 0x0000000000887bbd in dp_netdev_execute_actions (actions_len=<optimized out>, actions=<optimized out>, flow=<optimized out>, should_steal=false, packets=0x7fffaf5a3eb0, pmd=0x7f0a35f34010) at lib/dpif-netdev.c:7332
> #24 dpif_netdev_execute (dpif=dpif at entry=0x2b105d0, execute=execute at entry=0x7fffaf5a40f8) at lib/dpif-netdev.c:3725
> #25 0x000000000088dccc in dpif_netdev_operate (dpif=0x2b105d0, ops=0x7fffaf5a40e8, n_ops=1, offload_type=<optimized out>) at lib/dpif-netdev.c:3756
> #26 0x0000000000895b07 in dpif_operate (dpif=0x2b105d0, ops=ops at entry=0x7fffaf5a40e8, n_ops=n_ops at entry=1, offload_type=offload_type at entry=DPIF_OFFLOAD_AUTO) at lib/dpif.c:1367
> #27 0x0000000000896233 in dpif_execute (dpif=<optimized out>, execute=execute at entry=0x7fffaf5a4180) at lib/dpif.c:1321
> #28 0x00000000008459eb in packet_execute (ofproto_=0x2b3ec60, opo=0x7fffaf5a4200) at ofproto/ofproto-dpif.c:4760
> #29 0x000000000083a9bb in ofproto_packet_out_finish (opo=0x7fffaf5a4200, ofproto=0x2b3ec60) at ofproto/ofproto.c:3594
> #30 handle_packet_out (ofconn=ofconn at entry=0x2b23d50, oh=oh at entry=0x2b89470) at ofproto/ofproto.c:3635
> #31 0x000000000083f06f in handle_single_part_openflow (type=OFPTYPE_PACKET_OUT, oh=0x2b89470, ofconn=0x2b23d50) at ofproto/ofproto.c:8400
> #32 handle_openflow (ofconn=ofconn at entry=0x2b23d50, msgs=msgs at entry=0x7fffaf5a5600) at ofproto/ofproto.c:8587
> #33 0x000000000086ff6b in ofconn_run (handle_openflow=0x83e1f0 <handle_openflow>, ofconn=0x2b23d50) at ofproto/connmgr.c:1318
> #34 connmgr_run (mgr=0x2a26820, handle_openflow=handle_openflow at entry=0x83e1f0 <handle_openflow>) at ofproto/connmgr.c:355
> #35 0x0000000000838706 in ofproto_run (p=0x2b3ec60) at ofproto/ofproto.c:1826
> #36 0x0000000000826234 in bridge_run__ () at vswitchd/bridge.c:2965
> #37 0x000000000082bae8 in bridge_run () at vswitchd/bridge.c:3023
> #38 0x0000000000422a05 in main (argc=11, argv=0x7fffaf5a5a28) at vswitchd/ovs-vswitchd.c:127
> (gdb) f 5
> #5  0x000000000083fb1f in ofproto_flow_mod_learn (ofm=ofm at entry=0x7fffaf59e620, keep_ref=<optimized out>, limit=<optimized out>, below_limitp=below_limitp at entry=0x7fffaf59e540) at ofproto/ofproto.c:5391
> 5391    ofproto/ofproto.c: No such file or directory.
> (gdb) p ofproto_mutex
> $2 = {lock = {__data = {__lock = 1, __count = 0, __owner = 67357, __nusers = 1, __kind = 2, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
>     __size = "\001\000\000\000\000\000\000\000\035\a\001\000\001\000\000\000\002", '\000' <repeats 22 times>, __align = 1}, where = 0xad3dd6 "ofproto/ofproto.c:3631"}
> 
> Please find the discussion link regarding this crash with community. 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2019-April/048494.html

Thanks for the trace and link. It's very useful.
You may, probably, add above link to the commit message via 'Reported-at' tag.

Regarding the issue.
Is it correct that this issue only happens for userspace datapath?
In case of userspace datapath execution of the packet actions happens
in the same thread that handles the ofproto packet-out request. This
is seen in above stack trace.  But in case of kernel datapath,
packet_execute() sends packet to the kernel via netlink and further
processing will happen in a separate thread that handles upcall (if any).

If above statements are correct, there could be another way to fix the
issue.

Ben, you said previously:

> How do we fix this?
> 
> * We can't get rid of ofproto_mutex in do_bundle_commit(), or drop it
>   temporarily around flow translation (i.e. the call to
>   ofproto_packet_out_start()), because it might need to revert all the
>   flow table changes and dropping the mutex would allow other threads to
>   race in and make conflicting changes

But it seems that the issue happens on ofproto_packet_out_finish() and
not on ofproto_packet_out_start(). So, the flow translation should be OK
under the ofproto_mutex and revert is still possible under the mutex.
The only thing we need to take out of the mutex is real action execution
by the datapath triggered by ofproto_packet_out_finish(). Callers never
check the status of this function so it should be not so hard.

So, possible solution:
* Move ofproto_packet_out_finish() out of ofproto_mutex in all the callers:
  * It's easy for handle_packet_out()
  * For do_bundle_commit() we could temporary store all the ofproto_packet_out
    entities and finish them later.

Am I missing something?  Is there reason for ofproto_packet_out_finish() to
require ofproto_mutex?

Ben, Anil, what do you think?

Best regards, Ilya Maximets.


More information about the dev mailing list