[ovs-dev] Questions about rcu_postpone's wrong use in classifier.c
Ben Pfaff
blp at ovn.org
Mon Apr 1 14:03:01 UTC 2019
On April 1, 2019 2:56:13 PM GMT+02:00, Ilya Maximets <i.maximets at samsung.com> wrote:
>Hi.
>
>Thanks for the report.
>
>It looks like RCU misuse inside the pvector implementation.
>The following sequence will lead to the use-after-free:
>
>1.  [Thread#1] pvector_init(pvec);
>2.  [Thread#1] pvector_insert(pvec, elem1);
>3.  [Thread#1] pvector_publish(pvec);
>               /* Remove from 'temp' impl. */
>4.  [Thread#1] pvector_remove(pvec, elem1);
>               /* Postpone the free(). */
>5.  [Thread#1] ovsrcu_postpone(free, elem1);
>6.  [Thread#2] cursor__ = pvector_cursor_init(PVECTOR, 0, 0);
>               /* Get pointer from main impl. */
>7.  [Thread#2] elem1 = pvector_cursor_next(&cursor__, INT_MIN, 0, 0);
>8.  [Thread#2] ovsrcu_quiesce();
>9.  [Thread#2] cursor__ = pvector_cursor_init(PVECTOR, 0, 0);
>               /* Get pointer from main impl. */
>10. [Thread#2] elem1 = pvector_cursor_next(&cursor__, INT_MIN, 0, 0);
>               /* Swap 'temp' with main impl; postpone the free()
>                * of the old impl. */
>11. [Thread#1] pvector_publish(pvec);
>               /* Postponed free() frees elem1, because both threads
>                * have quiesced once since the postpone. */
>12. [Thread#1] ovsrcu_quiesce();
>               /* Try to use the obtained pointer. Use after free. */
>13. [Thread#2] elem1->use();
>               /* Postponed free() frees the old 'impl'. */
>14. [Thread#2] ovsrcu_quiesce();
>
>
>Thread#2 assumes that it is safe to use a pvector element within its
>grace period, i.e. between steps 8 and 14; however, the element was
>already freed at step 12.
>
>So, in the current implementation, other threads are not allowed to
>quiesce while one thread modifies the pvector.
>
>The main issue is that ovsrcu_get(struct pvector_impl *, &pvec->impl)
>doesn't return the most recent implementation. It continues to return
>the old one until the changes are published. But it can't return the
>new one, because that one is not sorted yet. So, I'm not sure how to
>fix this properly. In fact, we need to postpone freeing the elements
>until after the pvector->impl itself has been freed. This could
>probably be achieved by double-postponing the freeing of the pvector
>elements, but I'm not sure how to implement that correctly. Another
>option is to force callers to call ovsrcu_postpone(free, elem1) only
>after pvector_publish().
>
>Ben, what do you think?
>
>
>Best regards, Ilya Maximets.
>
>> Hi all,
>>
>> We found a core dump in our environment with OVS+DPDK; the
>> following is the call stack:
>>
>> #0 0x00007f354d6eb237 in raise () from /lib64/libc.so.6
>> #1 0x00007f354d6ec928 in abort () from /lib64/libc.so.6
>> #2 0x00000000006a6d99 in PAT_abort ()
>> #3 0x00000000006a3edd in patchIllInsHandler ()
>> #4 <signal handler called>
>> #5 0x00000000004c2275 in rehash (hash=1661256324, impl=0x0) at
>lib/ccmap.c:135
>> #6 ccmap_find (ccmap=ccmap at entry=0x26b654b8, hash=1661256324) at
>lib/ccmap.c:229
>> #7 0x00000000004bf73b in find_match_wc (wc=0x7f350ce67370,
>n_tries=0, trie_ctx=0x7f350ce63e20, flow=0x7f350ce66dc0,
>> version=18446744073709551614, subtable=<optimized out>) at
>lib/classifier.c:1676
>> #8 classifier_lookup__ (cls=cls at entry=0xee2360 <cls>,
>version=version at entry=18446744073709551614,
>> flow=flow at entry=0x7f350ce66dc0, wc=wc at entry=0x7f350ce67370,
>allow_conjunctive_matches=allow_conjunctive_matches at entry=true)
>> at lib/classifier.c:972
>> #9 0x00000000004c193b in classifier_lookup (cls=cls at
>entry=0xee2360 <cls>, version=version at entry=18446744073709551614,
>> flow=flow at entry=0x7f350ce66dc0, wc=wc at entry=0x7f350ce67370)
>at lib/classifier.c:1166
>> #10 0x0000000000575acb in tnl_port_map_lookup (flow=flow at
>entry=0x7f350ce66dc0, wc=wc at entry=0x7f350ce67370)
>> at lib/tnl_ports.c:287
>> #11 0x00000000004a7049 in compose_output_action__ (ctx=ctx at
>entry=0x7f350ce65d60, ofp_port=1, xr=0x0,
>> check_stp=check_stp at entry=true) at
>ofproto/ofproto_dpif_xlate.c:3753
>> #12 0x00000000004a7967 in compose_output_action (xr=<optimized out>,
>ofp_port=<optimized out>, ctx=0x7f350ce65d60)
>> at ofproto/ofproto_dpif_xlate.c:3832
>> #13 output_normal (ctx=ctx at entry=0x7f350ce65d60,
>out_xbundle=out_xbundle at entry=0x2cfe8120, xvlan=xvlan at
>entry=0x7f350ce655e0)
>> at ofproto/ofproto_dpif_xlate.c:2288
>> #14 0x00000000004a7df5 in xlate_normal_flood (ctx=ctx at
>entry=0x7f350ce65d60, in_xbundle=in_xbundle at entry=0x9b547c0,
>> xvlan=xvlan at entry=0x7f350ce655e0) at
>ofproto/ofproto_dpif_xlate.c:2757
>> #15 0x00000000004aaf39 in xlate_normal (ctx=0x7f350ce65d60) at
>ofproto/ofproto_dpif_xlate.c:2987
>> #16 xlate_output_action (ctx=ctx at entry=0x7f350ce65d60,
>port=<optimized out>, max_len=<optimized out>,
>> may_packet_in=may_packet_in at entry=true) at
>ofproto/ofproto_dpif_xlate.c:4577
>> #17 0x00000000004a920e in do_xlate_actions (ofpacts=<optimized out>,
>ofpacts_len=<optimized out>, ctx=ctx at entry=0x7f350ce65d60)
>> at ofproto/ofproto_dpif_xlate.c:5565
>> #18 0x00000000004ac25f in xlate_actions (xin=xin at
>entry=0x7f350ce66db0, xout=xout at entry=0x7f350ce675f0)
>> at ofproto/ofproto_dpif_xlate.c:6456
>> #19 0x000000000049d4b7 in xlate_key (key=<optimized out>,
>len=<optimized out>, push=push at entry=0x7f350ce670f0,
>> ctx=ctx at entry=0x7f350ce675d0, udpif=<optimized out>) at
>ofproto/ofproto_dpif_upcall.c:1974
>> #20 0x000000000049da48 in xlate_ukey (ukey=0x7f34ec032a20,
>ukey=0x7f34ec032a20, ctx=0x7f350ce675d0, tcp_flags=<optimized out>,
>> udpif=0x6b6c550) at ofproto/ofproto_dpif_upcall.c:1986
>> #21 revalidate_ukey__ (udpif=udpif at entry=0x6b6c550, ukey=ukey at
>entry=0x7f34ec032a20, tcp_flags=<optimized out>,
>> odp_actions=0x7f350ce679e0, recircs=recircs at
>entry=0x7f350ce679d0, xcache=<optimized out>)
>> at ofproto/ofproto_dpif_upcall.c:2032
>> #22 0x000000000049dc8d in revalidate_ukey (udpif=udpif at
>entry=0x6b6c550, ukey=ukey at entry=0x7f34ec032a20,
>> stats=stats at entry=0x7f350ce688b8, odp_actions=odp_actions at
>entry=0x7f350ce679e0, reval_seq=reval_seq at entry=268144954,
>> recircs=recircs at entry=0x7f350ce679d0) at
>ofproto/ofproto_dpif_upcall.c:2128
>> #23 0x00000000004a0853 in revalidate (revalidator=0x75715d0) at
>ofproto/ofproto_dpif_upcall.c:2488
>> #24 0x00000000004a0a4b in udpif_revalidator (arg=0x75715d0) at
>ofproto/ofproto_dpif_upcall.c:958
>> #25 0x000000000054ca48 in ovsthread_wrapper (aux_=<optimized out>) at
>lib/ovs_thread.c:682
>> #26 0x00007f354f303dd5 in start_thread () from /lib64/libpthread.so.0
>> #27 0x00007f354d7b359d in clone () from /lib64/libc.so.6
>>
>> It is caused by incorrect use of ovsrcu_postpone() when deleting a
>> subtable from classifier->subtables (a pvector). The following are
>> the triggering steps:
>>
>> 1. Modify a hnic port's IP (by "ifconfig hnic x.x.x.x/24").
>> 2. The OVS route reset flushes all entries in the tnl_ports
>> classifier (cls).
>> 3. The OVS main thread calls destroy_subtable(), which calls
>> ovsrcu_postpone() to free the subtable; this happens to trigger
>> ovsrcu_flush_cbset(). But cls->subtables->impl still stores the
>> deleted subtable's pointer; it is only removed by pvector_publish()
>> (in step 7).
>> 4. The RCU thread starts ovsrcu_synchronize() before freeing the
>> subtable on the flush list. (Suppose target_seqno is 2; the main
>> thread's seqno is smaller than 2, suppose it is 1.)
>> 5. A PMD thread is still working; it bumps the global seqno to 3.
>> 6. The revalidator thread wakes up from poll_block() with seqno 3.
>> Then it acquires from cls->subtables the subtable pointer that
>> should no longer be used.
>> 7. The OVS main thread calls pvector_publish() and goes to
>> poll_block(), removing itself from ovsrcu_threads.
>> 8. ovsrcu_synchronize() from step 4 finds that every thread's seqno
>> is larger than 2 and frees the subtable.
>> 9. The revalidator is still using the (now freed) subtable pointer,
>> then coredump!
>>
>> Is there any useful patch to fix this bug? Can we call the
>> postpone (to free the subtable) after pvector_publish()?
>>
>> Looking forward to your reply
>>
>> Thanks.
Hey, I'm out at a conference this week, so I probably won't be able to take a look at this soon.