[ovs-dev] [ovs-dev v2] ofproto: add refcount to ofproto to fix crash at ofproto_destroy

Ilya Maximets i.maximets at ovn.org
Wed Feb 24 16:01:38 UTC 2021


On 2/24/21 3:29 PM, Ilya Maximets wrote:
> On 7/17/20 3:50 AM, hepeng.0320 wrote:
>> From: Peng He <hepeng.0320 at bytedance.com>
>>
>> The call stack of rule_destroy_cb:
>>
>> remove_rules_postponed (one grace period)
>>     -> remove_rule_rcu
>>         -> remove_rule_rcu__
>>             -> ofproto_rule_unref -> ref count != 1
>>                 -> ... more grace periods.
>>                 -> rule_destroy_cb (> 2 grace periods)
>>                     -> free
>>
>> Currently ofproto_destory waits only two grace periods, this means when
>> rule_destroy_cb is called, the ofproto might have been already destroyed.
>>
>> This patch adds a refcount for ofproto to ensure ofproto exists when it
>> is needed to call free functions.
>>
>> This patch fixes the crashes found:
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> 0  0x0000562a55169e49 in rule_destroy_cb (rule=0x562a598be2c0) at ofproto/ofproto.c:2956
>> 1  0x0000562a552623d6 in ovsrcu_call_postponed () at lib/ovs-rcu.c:348
>> 2  0x0000562a55262504 in ovsrcu_postpone_thread (arg=<optimized out>) at lib/ovs-rcu.c:364
>> 3  0x0000562a55264aef in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:383
>> 4  0x00007febe715a4a4 in start_thread (arg=0x7febe4c1c700) at pthread_create.c:456
>> 5  0x00007febe6990d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
>> (gdb) p rule->ofproto->ofproto_class
>> $3 = (const struct ofproto_class *) 0x0
>>
>> Also:
>> 0  ofproto_dpif_credit_table_stats (ofproto=0x5583583332b0, table_id=0 '\000', n_matches=5, n_misses=0) at ofproto/ofproto-dpif.c:4310
>> 1  0x0000558354c68514 in xlate_push_stats_entry (entry=entry at entry=0x7ff488031398, stats=stats at entry=0x7ff49af30b60) at ofproto/ofproto-dpif-xlate-cache.c:99
>> 2  0x0000558354c686f3 in xlate_push_stats (xcache=<optimized out>, stats=stats at entry=0x7ff49af30b60) at ofproto/ofproto-dpif-xlate-cache.c:181
>> 3  0x0000558354c5411a in revalidate_ukey (udpif=udpif at entry=0x5583583baba0, ukey=ukey at entry=0x7ff47809f770, stats=stats at entry=0x7ff49af32128, odp_actions=odp_actions at entry=0x7ff49af30c50, reval_seq=reval_seq at entry=275670456,
>>    recircs=recircs at entry=0x7ff49af30cd0) at ofproto/ofproto-dpif-upcall.c:2292
>> 4  0x0000558354c57cbc in revalidate (revalidator=<optimized out>) at ofproto/ofproto-dpif-upcall.c:2681
>> 5  0x0000558354c57f8e in udpif_revalidator (arg=0x5583583d2a90) at ofproto/ofproto-dpif-upcall.c:934
>> 6  0x0000558354d24aef in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:383
>> 7  0x00007ff4a479d4a4 in start_thread (arg=0x7ff49af35700) at pthread_create.c:456
>> 8  0x00007ff4a3fd3d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
>> (gdb) p ofproto->up.ofproto_class
>> $3 = (const struct ofproto_class *) 0x0
>>
>> Signed-off-by: Peng He <hepeng.0320 at bytedance.com>
>> ---

CC: Hongzhi Guo

And I just remembered that there is a very similar patch that
was submitted to the mail-list several months betore this one.
Ben started review, but didn't finish:

http://patchwork.ozlabs.org/project/openvswitch/patch/20200306130555.19884-1-guohongzhi1@huawei.com/

Both patches has smilar issues, i.e. mixing RCU with refcounters
and not covering all the cases.

There are at least 2 major places where ofproto needs to be refcounted:
1. rules
   1.a xlate caches?
2. ofproto groups

IMO, mixing RCU with refcounts basically means that we do not have a
clear understanding on how it works and using RCU as a safety net.
We shuld stop using RCU for orproto and just have refcounts.  Once
ofproto refcount reaches zero it should be immediately destroyed.
All the places where we need to increase refcount needs to be tracked
down.

It'll be great if you can cooperate to prepare a complete solution
for this issue.

Best regards, Ilya Maximets.


More information about the dev mailing list