[ovs-dev] Questions about rcu_postpone's wrong use in classifier.c

fuzhantao fuzhantao at huawei.com
Tue Apr 2 03:17:05 UTC 2019


Hi Ilya & Ben
    Thanks for your reply and suggestion. The following is my patch to fix this bug(refer to Ilya's double postponing method).
     It is useful via our fault-injection test.  Could you help to check if it will cause other problems?

Thanks
--Fu zhantao


diff --git a/lib/classifier.c b/lib/classifier.c
index 782ef07..cd6aa09 100644
--- a/lib/classifier.c
+++ b/lib/classifier.c
@@ -1443,6 +1443,19 @@
     return map;
 }
 
+static void
+subtable_destroy_cb(struct cls_subtable *subtable)
+{
+    int i;
+
+    for (i = 0; i < subtable->n_indices; i++) {
+        ccmap_destroy(&subtable->indices[i]);
+    }
+    cmap_destroy(&subtable->rules);
+
+    ovsrcu_postpone(free, subtable);
+}
+
 /* The new subtable will be visible to the readers only after this. */
 static struct cls_subtable *
 insert_subtable(struct classifier *cls, const struct minimask *mask)
@@ -1505,12 +1518,11 @@
     return subtable;
 }
 
-/* RCU readers may still access the subtable before it is actually freed. */
+/* RCU readers may still access the subtable before it is actually freed.
+ * double postpone for subtable to avoid use-after-free. */
 static void
 destroy_subtable(struct classifier *cls, struct cls_subtable *subtable)
 {
-    int i;
-
     pvector_remove(&cls->subtables, subtable);
     cmap_remove(&cls->subtables_map, &subtable->cmap_node,
                 minimask_hash(&subtable->mask, 0));
@@ -1520,11 +1532,7 @@
     ovs_assert(cmap_is_empty(&subtable->rules));
     ovs_assert(rculist_is_empty(&subtable->rules_list));
 
-    for (i = 0; i < subtable->n_indices; i++) {
-        ccmap_destroy(&subtable->indices[i]);
-    }
-    cmap_destroy(&subtable->rules);
-    ovsrcu_postpone(free, subtable);
+    ovsrcu_postpone(subtable_destroy_cb, subtable);
 }
-----邮件原件-----
发件人: Ilya Maximets [mailto:i.maximets at samsung.com] 
发送时间: 2019年4月1日 20:56
收件人: ovs-dev at openvswitch.org; Ben Pfaff <blp at ovn.org>; fuzhantao <fuzhantao at huawei.com>
主题: Re: [ovs-dev] Questions about rcu_postpone's wrong use in classifier.c

Hi.

Thanks for the report.

It looks like an rcu misusing inside pvector implementation.
The following sequence will lead to the use-after-free:

     Thread#1                           Thread#2

1.   pvector_init(pvec);
2.   pvector_insert(pvec, elem1);
3.   pvector_publish(pvec);

     /* Remove from 'temp' impl. */
4.   pvector_remove(pvec, elem1);
     /* Postpone the free() */
5.   ovsrcu_postpone(free, elem1);      
6.                                       cursor__ = pvector_cursor_init(PVECTOR, 0, 0);
                                         /* Get pointer from main impl. */
7.                                       elem1 = pvector_cursor_next(&cursor__, INT_MIN, 0, 0);

8.                                       ovsrcu_quiesce();

9.                                       cursor__ = pvector_cursor_init(PVECTOR, 0, 0);
                                         /* Get pointer from main impl. */
10.                                      elem1 = pvector_cursor_next(&cursor__, INT_MIN, 0, 0);

     /* Change 'temp' with main impl. */
     /* Postpone the free() of old impl. */ 11.  pvector_publish(pvec);
     /* Postponed free() frees the elem1 because both threads quiesced once since postpone(). */ 12.  ovsrcu_quiesce();

                                         /* Try to use obtained pointer. Use after free. */
13.                                      elem1->use();
                                         /* Postponed free() frees the old 'impl'. */
14.                                      ovsrcu_quiesce();


Thread#2 assumes that it's safe to use pvector element within its grace period, i.e. between steps 8 and 14, however it was already freed on step 12.

So, in current implementation it's not allowed for other threads to quiesce while one thread modifies the pvector.

Main issue is that ovsrcu_get(struct pvector_impl *, &pvec->impl) doesn't return the most recent implementation. It continues to return old one until changes published.
But it can't return the new one, because it's not sorted. So, I'm not sure how to fix this properly. In fact we need to postpone freeing elements after freeing the
pvector->impl. This probably could be achieved by double postponing the 
pvector->freeing of
the pvector elements. I'm not sure how to implement this correctly. Another option is to force callers to call ovsrcu_postpone(free, elem1) only after pvector_publish().

Ben, what do you think?


Best regards, Ilya Maximets.

> Hi all,
> 
> We find a core dump file in our environment with ovs+dpdk, the following is call stack:
> 
> #0  0x00007f354d6eb237 in raise () from /lib64/libc.so.6
> #1  0x00007f354d6ec928 in abort () from /lib64/libc.so.6
> #2  0x00000000006a6d99 in PAT_abort ()
> #3  0x00000000006a3edd in patchIllInsHandler ()
> #4  <signal handler called>
> #5  0x00000000004c2275 in rehash (hash=1661256324, impl=0x0) at 
> lib/ccmap.c:135
> #6  ccmap_find (ccmap=ccmap at entry=0x26b654b8, hash=1661256324) at 
> lib/ccmap.c:229
> #7  0x00000000004bf73b in find_match_wc (wc=0x7f350ce67370, n_tries=0, trie_ctx=0x7f350ce63e20, flow=0x7f350ce66dc0,
>     version=18446744073709551614, subtable=<optimized out>) at 
> lib/classifier.c:1676
> #8  classifier_lookup__ (cls=cls at entry=0xee2360 <cls>, version=version at entry=18446744073709551614,
>     flow=flow at entry=0x7f350ce66dc0, wc=wc at entry=0x7f350ce67370, allow_conjunctive_matches=allow_conjunctive_matches at entry=true)
>     at lib/classifier.c:972
> #9  0x00000000004c193b in classifier_lookup (cls=cls at entry=0xee2360 <cls>, version=version at entry=18446744073709551614,
>     flow=flow at entry=0x7f350ce66dc0, wc=wc at entry=0x7f350ce67370) 
> at lib/classifier.c:1166
> #10 0x0000000000575acb in tnl_port_map_lookup (flow=flow at entry=0x7f350ce66dc0, wc=wc at entry=0x7f350ce67370)
>     at lib/tnl_ports.c:287
> #11 0x00000000004a7049 in compose_output_action__ (ctx=ctx at entry=0x7f350ce65d60, ofp_port=1, xr=0x0,
>     check_stp=check_stp at entry=true) at 
> ofproto/ofproto_dpif_xlate.c:3753
> #12 0x00000000004a7967 in compose_output_action (xr=<optimized out>, ofp_port=<optimized out>, ctx=0x7f350ce65d60)
>     at ofproto/ofproto_dpif_xlate.c:3832
> #13 output_normal (ctx=ctx at entry=0x7f350ce65d60, out_xbundle=out_xbundle at entry=0x2cfe8120, xvlan=xvlan at entry=0x7f350ce655e0)
>     at ofproto/ofproto_dpif_xlate.c:2288
> #14 0x00000000004a7df5 in xlate_normal_flood (ctx=ctx at entry=0x7f350ce65d60, in_xbundle=in_xbundle at entry=0x9b547c0,
>     xvlan=xvlan at entry=0x7f350ce655e0) at 
> ofproto/ofproto_dpif_xlate.c:2757
> #15 0x00000000004aaf39 in xlate_normal (ctx=0x7f350ce65d60) at 
> ofproto/ofproto_dpif_xlate.c:2987
> #16 xlate_output_action (ctx=ctx at entry=0x7f350ce65d60, port=<optimized out>, max_len=<optimized out>,
>     may_packet_in=may_packet_in at entry=true) at 
> ofproto/ofproto_dpif_xlate.c:4577
> #17 0x00000000004a920e in do_xlate_actions (ofpacts=<optimized out>, ofpacts_len=<optimized out>, ctx=ctx at entry=0x7f350ce65d60)
>     at ofproto/ofproto_dpif_xlate.c:5565
> #18 0x00000000004ac25f in xlate_actions (xin=xin at entry=0x7f350ce66db0, xout=xout at entry=0x7f350ce675f0)
>    at ofproto/ofproto_dpif_xlate.c:6456
> #19 0x000000000049d4b7 in xlate_key (key=<optimized out>, len=<optimized out>, push=push at entry=0x7f350ce670f0,
>     ctx=ctx at entry=0x7f350ce675d0, udpif=<optimized out>) at 
> ofproto/ofproto_dpif_upcall.c:1974
> #20 0x000000000049da48 in xlate_ukey (ukey=0x7f34ec032a20, ukey=0x7f34ec032a20, ctx=0x7f350ce675d0, tcp_flags=<optimized out>,
>     udpif=0x6b6c550) at ofproto/ofproto_dpif_upcall.c:1986
> #21 revalidate_ukey__ (udpif=udpif at entry=0x6b6c550, ukey=ukey at entry=0x7f34ec032a20, tcp_flags=<optimized out>,
>     odp_actions=0x7f350ce679e0, recircs=recircs at entry=0x7f350ce679d0, xcache=<optimized out>)
>    at ofproto/ofproto_dpif_upcall.c:2032
> #22 0x000000000049dc8d in revalidate_ukey (udpif=udpif at entry=0x6b6c550, ukey=ukey at entry=0x7f34ec032a20,
>     stats=stats at entry=0x7f350ce688b8, odp_actions=odp_actions at entry=0x7f350ce679e0, reval_seq=reval_seq at entry=268144954,
>     recircs=recircs at entry=0x7f350ce679d0) at 
> ofproto/ofproto_dpif_upcall.c:2128
> #23 0x00000000004a0853 in revalidate (revalidator=0x75715d0) at 
> ofproto/ofproto_dpif_upcall.c:2488
> #24 0x00000000004a0a4b in udpif_revalidator (arg=0x75715d0) at 
> ofproto/ofproto_dpif_upcall.c:958
> #25 0x000000000054ca48 in ovsthread_wrapper (aux_=<optimized out>) at 
> lib/ovs_thread.c:682
> #26 0x00007f354f303dd5 in start_thread () from /lib64/libpthread.so.0
> #27 0x00007f354d7b359d in clone () from /lib64/libc.so.6
> 
> It is raised by the wrong use of rcu_postpone when deleting subtable from classifier->subtables(pvector), the following is the triggering steps:
> 
> 1.       Modify a hnic port's ip(by ifconfig hnic x.x.x.x/24)
> 2.       Ovs route's reset will flush all entries in the tnl_ports's classifier(cls)
> 3.       Ovs main thread call destroy_subtable(): call ovsrcu_postpone() to free the subtale,it occurs the ovsrcu_flush_cbset() by accident. But cls->subtables->impl  still stores the deleted subtable's pointer,it will be moved out in pvector_publish(in step 7)
> 4.       Rcu thread start to ovsrcu_synchronize() before free the subtable in flush_list.( support target_seqno is 2, main thread's seq_no is smaller than 2,support it is 1.)
> 5.       The pmd thread is still working. It adds the global_seqno to 3.
> 6.       Revalidator thread wakes up form poll_block with seq_no 3. Then, It acquires the subtable's pointer in cls->sutables which shoud not be used.
> 7.       Ovs main thread call pvector_publish and go to poll_block(), moves out from ovsrcu_threads
> 8.       ovsrcu_synchronize in step 4 find all thread's seq_no is larger than 2,free the subtable
> 9.       revalidator is still using the subtable's pointer(has been freed),then coredump!!!
> 
> Is there any useful patch to fix this bug? Can we call the postpone(to free subtable) after  pvector_publish?
> 
> Looking forward to your reply
> 
> Thanks.



More information about the dev mailing list