[ovs-dev] [PATCHv2] connmgr: Check nullptr inside ofmonitor_report()

Ilya Maximets i.maximets at ovn.org
Mon Feb 22 12:12:01 UTC 2021


On 2/21/21 5:11 PM, William Tu wrote:
> On Fri, Feb 19, 2021 at 6:29 PM 贺鹏 <xnhp0320 at gmail.com> wrote:
>>
>> Hi, Ilya
>>
>> Ilya Maximets <i.maximets at ovn.org> 于2021年2月19日周五 下午7:19写道:
>>>
>>> On 2/19/21 3:12 AM, 贺鹏 wrote:
>>>> Hi,
>>>>
>>>> Looks like this bug is caused by violating the fact that if a rule is
>>>> referenced, the related ofproto should not be destroyed.
>>>>
>>>> If so, I have a patch that also fixes the problem, not sure if this helps.
>>>>
>>>> http://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0320@bytedance.com/
>>>
>>> There is at least one more problem that is not strictly related but
>>> in more or less the same part of the code:
>>>   https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html
>>
>> So maybe before add it into *ovsrcu_postpone*, we should add refcount
>> of ofproto also?
>>
> Yes, I think that will fix the issue. But are we able to find out all the
> places that we need to add refcount of ofproto?

destruct() is called for ofproto_dpif unconditionally without any deferring,
so this will not help unless we're delaying the the destruct() itself, and
I'm not sure if we can do that.

> 
> Looks like we might have multiple rcu postponed function that might
> access the 'ofproto'. And 'ofproto' might be freed already and cause segfault.
> 
> Hepeng's patch fixes two places.
>   http://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0320@bytedance.com/
> Ilya pointed out another place
>   https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380582.html
> yifeng's case is a little different (not due to ofproto = NULL, but
> due to setting
> the p->connmgr = NULL before postponed)

In my case ofproto is alive too.  The problem is destroyed ofproto->baker.
And in case of meter_id we don't really need to refcount the ofproto itself.
'baker' already has a refcount, so we could increase it and pass 'baker'
pointer to free_meter_id instead of 'ofproto'.  This function doesn't
actually need an 'ofproto' pointer.

Regarding this connmgr related patch... It's still valid even with refcounts
because destruction of connmgr is explicitly not deferred for a reason.  It's
to free the listening socket that users might want to reuse.  Refcounts will
not help here.
Also for the meters related issue.. destruct() is called for ofproto_dpif
without any postponing and I'm not sure if we need or actually able to
postpone it without consequences.  So, baker->refcount might be a better
solution keeping the ofproto->refcount only for rules.  This will reduce
the scope of changes, otherwise we will have to do a lot more work tracking
down all the users that holds 'ofproto' pointer and will miss some of them
with high probability.

I'll recheck this (connmgr) patch and, likely, apply it.   Will also review
ofproto refcount patch.

Best regards, Ilya Maximets.


More information about the dev mailing list