[ovs-dev] [ovs-dev v2] ofproto: add refcount to ofproto to fix crash at ofproto_destroy

William Tu u9012063 at gmail.com
Thu Feb 25 18:31:35 UTC 2021


On Thu, Feb 25, 2021 at 4:54 AM Ilya Maximets <i.maximets at ovn.org> wrote:
>
> On 2/25/21 4:32 AM, Guohongzhi (Russell Lab) wrote:
> > Refcount and RCU are not mutually exclusive.
> > IMO, the main reason for this problem is that the rule uses both refcount and rcu, while the ofproto uses only rcu, and the rule retains the pointer of the ofproto. More importantly, ofproto cannot guarantee a longer grace period than the rule.
> >
>
> I understand that refcount and RCU are not mutually exclusive.
> However, in this particular case RCU for ofproto was introduced for one
> and only one reason - to avoid freeing ofproto while rules are still
> alive.  This was done in commit f416c8d61601 ("ofproto: RCU postpone
> rule destruction.").  The goal was to allow using rules without
> refcounting them within a single grace period.  And that forced us
> to postpone destruction of the ofproto for a single grace period.
> Later commit 39c9459355b6 ("Use classifier versioning.") made it
> possible for rules to be alive for more than one grace period, so
> the commit made ofproto wait for 2 grace periods by double postponing.
> As we can see now, that wasn't enough and we have to wait for more
> than 2 grace periods in certain cases.
>
> Now we're introducing refcounts to wait for all rules to be destroyed
> before destroying the ofproto.  But what is the point of having
> RCU if we already know that if refcount is zero than all rules are
> already destroyed and we don't need to wait anything and could just
> destroy the ofproto?
>
> RCU doesn't protect ofproto itself here, it actually protects rules,
> i.e. keeps ofproto alive while rules alive, but we can fully replace
> this with refcounts and I see no value in having RCU additionally.
>
> To have a full picture: right now we also have groups protected by
> RCU, so we need to refcount ofproto for them too.  But that is almost
> exactly same situation as we have with rules.
>
Thanks Guohongzhi, Hepeng, and Ilya, I learned a lot from your discussions.

I think refcount and RCU are mutually exclusive. They are two different ways of
doing synchronization and somehow we mix them together by using RCU to
optimize refcount.

Before the commit f416c8d61601 ("ofproto: RCU postpone rule destruction."),
we are using refcount to protect rules, and as a result every time OVS
references
a rule, we have to take refcount. It works ok but this probably has
high performance
overhead because it's doing atomic operations.

So the commit f416c8d61601 optimizes it by doing
1) not taking refcount of rule "within a grace period"
2) introduce RCU for rule
The assumption is that a grace period is like refcount == 0, so it's
valid to do so.
A side effect is that ofproto_destroy__()  needs to be postponed.
Note that if a rule is alive across grace period, it has to take refcount.

However, later commit 39c9459355b6 ("Use classifier versioning.")
makes rule alive across grace period. It breaks the first commit's assumption
and it has to introduce double postponing for ofproto, which we found
out is the problem now.

Since my point is RCU and refcount are mutually exclusive (A grace period
is like refcount ==0) , I agree with Ilya that we can just use refcount to fix
ofproto issue.

Regards,
William


More information about the dev mailing list