[ovs-dev] Questions about rcu_postpone's wrong use in classifier.c

Ben Pfaff blp at ovn.org
Mon Apr 1 14:03:01 UTC 2019


On April 1, 2019 2:56:13 PM GMT+02:00, Ilya Maximets <i.maximets at samsung.com> wrote:
>Hi.
>
>Thanks for the report.
>
>It looks like RCU misuse inside the pvector implementation.
>The following sequence leads to a use-after-free:
>
>     Thread#1                           Thread#2
>
>1.   pvector_init(pvec);
>2.   pvector_insert(pvec, elem1);
>3.   pvector_publish(pvec);
>
>     /* Remove from 'temp' impl. */
>4.   pvector_remove(pvec, elem1);
>     /* Postpone the free(). */
>5.   ovsrcu_postpone(free, elem1);
>6.                                     cursor__ = pvector_cursor_init(PVECTOR, 0, 0);
>                                       /* Get pointer from main impl. */
>7.                                     elem1 = pvector_cursor_next(&cursor__, INT_MIN, 0, 0);
>
>8.                                     ovsrcu_quiesce();
>
>9.                                     cursor__ = pvector_cursor_init(PVECTOR, 0, 0);
>                                       /* Get pointer from main impl. */
>10.                                    elem1 = pvector_cursor_next(&cursor__, INT_MIN, 0, 0);
>
>     /* Swap 'temp' with the main impl. */
>     /* Postpone the free() of the old impl. */
>11.  pvector_publish(pvec);
>     /* Postponed free() frees elem1, because both threads
>      * have quiesced once since the postpone(). */
>12.  ovsrcu_quiesce();
>
>                                       /* Use the obtained pointer: use-after-free. */
>13.                                    elem1->use();
>
>                                       /* Postponed free() frees the old 'impl'. */
>14.                                    ovsrcu_quiesce();
>
>
>Thread#2 assumes that it's safe to use a pvector element within its
>grace period, i.e. between steps 8 and 14, but the element is already
>freed at step 12.
>
>So, with the current implementation, no other thread may quiesce while
>one thread modifies the pvector.
>
>The main issue is that ovsrcu_get(struct pvector_impl *, &pvec->impl)
>doesn't return the most recent implementation: it keeps returning the
>old one until the changes are published. It can't return the new one,
>because the new one isn't sorted yet. So I'm not sure how to fix this
>properly. In fact, we need to postpone freeing the elements until after
>the pvector->impl itself is freed. This could probably be achieved by
>double postponing the freeing of the pvector elements, but I'm not sure
>how to implement that correctly. Another option is to force callers to
>call ovsrcu_postpone(free, elem1) only after pvector_publish().
>
>Ben, what do you think?
>
>
>Best regards, Ilya Maximets.
>
>> Hi all,
>> 
>> We found a core dump in our environment with OVS+DPDK; the following
>is the call stack:
>> 
>> #0  0x00007f354d6eb237 in raise () from /lib64/libc.so.6
>> #1  0x00007f354d6ec928 in abort () from /lib64/libc.so.6
>> #2  0x00000000006a6d99 in PAT_abort ()
>> #3  0x00000000006a3edd in patchIllInsHandler ()
>> #4  <signal handler called>
>> #5  0x00000000004c2275 in rehash (hash=1661256324, impl=0x0) at lib/ccmap.c:135
>> #6  ccmap_find (ccmap=ccmap@entry=0x26b654b8, hash=1661256324) at lib/ccmap.c:229
>> #7  0x00000000004bf73b in find_match_wc (wc=0x7f350ce67370, n_tries=0, trie_ctx=0x7f350ce63e20, flow=0x7f350ce66dc0, version=18446744073709551614, subtable=<optimized out>) at lib/classifier.c:1676
>> #8  classifier_lookup__ (cls=cls@entry=0xee2360 <cls>, version=version@entry=18446744073709551614, flow=flow@entry=0x7f350ce66dc0, wc=wc@entry=0x7f350ce67370, allow_conjunctive_matches=allow_conjunctive_matches@entry=true) at lib/classifier.c:972
>> #9  0x00000000004c193b in classifier_lookup (cls=cls@entry=0xee2360 <cls>, version=version@entry=18446744073709551614, flow=flow@entry=0x7f350ce66dc0, wc=wc@entry=0x7f350ce67370) at lib/classifier.c:1166
>> #10 0x0000000000575acb in tnl_port_map_lookup (flow=flow@entry=0x7f350ce66dc0, wc=wc@entry=0x7f350ce67370) at lib/tnl_ports.c:287
>> #11 0x00000000004a7049 in compose_output_action__ (ctx=ctx@entry=0x7f350ce65d60, ofp_port=1, xr=0x0, check_stp=check_stp@entry=true) at ofproto/ofproto_dpif_xlate.c:3753
>> #12 0x00000000004a7967 in compose_output_action (xr=<optimized out>, ofp_port=<optimized out>, ctx=0x7f350ce65d60) at ofproto/ofproto_dpif_xlate.c:3832
>> #13 output_normal (ctx=ctx@entry=0x7f350ce65d60, out_xbundle=out_xbundle@entry=0x2cfe8120, xvlan=xvlan@entry=0x7f350ce655e0) at ofproto/ofproto_dpif_xlate.c:2288
>> #14 0x00000000004a7df5 in xlate_normal_flood (ctx=ctx@entry=0x7f350ce65d60, in_xbundle=in_xbundle@entry=0x9b547c0, xvlan=xvlan@entry=0x7f350ce655e0) at ofproto/ofproto_dpif_xlate.c:2757
>> #15 0x00000000004aaf39 in xlate_normal (ctx=0x7f350ce65d60) at ofproto/ofproto_dpif_xlate.c:2987
>> #16 xlate_output_action (ctx=ctx@entry=0x7f350ce65d60, port=<optimized out>, max_len=<optimized out>, may_packet_in=may_packet_in@entry=true) at ofproto/ofproto_dpif_xlate.c:4577
>> #17 0x00000000004a920e in do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=ctx@entry=0x7f350ce65d60) at ofproto/ofproto_dpif_xlate.c:5565
>> #18 0x00000000004ac25f in xlate_actions (xin=xin@entry=0x7f350ce66db0, xout=xout@entry=0x7f350ce675f0) at ofproto/ofproto_dpif_xlate.c:6456
>> #19 0x000000000049d4b7 in xlate_key (key=<optimized out>, len=<optimized out>, push=push@entry=0x7f350ce670f0, ctx=ctx@entry=0x7f350ce675d0, udpif=<optimized out>) at ofproto/ofproto_dpif_upcall.c:1974
>> #20 0x000000000049da48 in xlate_ukey (ukey=0x7f34ec032a20, ukey=0x7f34ec032a20, ctx=0x7f350ce675d0, tcp_flags=<optimized out>, udpif=0x6b6c550) at ofproto/ofproto_dpif_upcall.c:1986
>> #21 revalidate_ukey__ (udpif=udpif@entry=0x6b6c550, ukey=ukey@entry=0x7f34ec032a20, tcp_flags=<optimized out>, odp_actions=0x7f350ce679e0, recircs=recircs@entry=0x7f350ce679d0, xcache=<optimized out>) at ofproto/ofproto_dpif_upcall.c:2032
>> #22 0x000000000049dc8d in revalidate_ukey (udpif=udpif@entry=0x6b6c550, ukey=ukey@entry=0x7f34ec032a20, stats=stats@entry=0x7f350ce688b8, odp_actions=odp_actions@entry=0x7f350ce679e0, reval_seq=reval_seq@entry=268144954, recircs=recircs@entry=0x7f350ce679d0) at ofproto/ofproto_dpif_upcall.c:2128
>> #23 0x00000000004a0853 in revalidate (revalidator=0x75715d0) at ofproto/ofproto_dpif_upcall.c:2488
>> #24 0x00000000004a0a4b in udpif_revalidator (arg=0x75715d0) at ofproto/ofproto_dpif_upcall.c:958
>> #25 0x000000000054ca48 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs_thread.c:682
>> #26 0x00007f354f303dd5 in start_thread () from /lib64/libpthread.so.0
>> #27 0x00007f354d7b359d in clone () from /lib64/libc.so.6
>> 
>> It is caused by incorrect use of rcu_postpone when deleting a
>subtable from classifier->subtables (a pvector). The triggering steps
>are:
>> 
>> 1.       Modify an hnic port's IP (by "ifconfig hnic x.x.x.x/24").
>> 2.       The OVS route reset flushes all entries in tnl_ports's
>classifier (cls).
>> 3.       The OVS main thread calls destroy_subtable(), which calls
>ovsrcu_postpone() to free the subtable; this happens to trigger
>ovsrcu_flush_cbset(). But cls->subtables->impl still stores the
>deleted subtable's pointer; it will only be removed by
>pvector_publish() (in step 7).
>> 4.       The RCU thread starts ovsrcu_synchronize() before freeing
>the subtable in flush_list. (Suppose the target_seqno is 2 and the
>main thread's seq_no is smaller than 2, say 1.)
>> 5.       The PMD thread is still working; it advances the global
>seqno to 3.
>> 6.       The revalidator thread wakes up from poll_block() with
>seq_no 3. It then acquires from cls->subtables the subtable pointer
>that should no longer be used.
>> 7.       The OVS main thread calls pvector_publish() and goes to
>poll_block(), removing itself from ovsrcu_threads.
>> 8.       ovsrcu_synchronize() in step 4 finds that every thread's
>seq_no is larger than 2 and frees the subtable.
>> 9.       The revalidator is still using the (now freed) subtable
>pointer: core dump!
>> 
>> Is there any patch that fixes this bug? Can we call the postpone (to
>free the subtable) after pvector_publish()?
>> 
>> Looking forward to your reply
>> 
>> Thanks.

Hey, I'm out at a conference this week, so I probably won't be able to take a look at this soon.

