[ovs-dev] [PATCH v2] ofproto-dpif-upcall: Fix race condition while purging

Ilya Maximets i.maximets at ovn.org
Mon May 31 17:58:01 UTC 2021


On 5/28/21 5:16 AM, Jianbo Liu wrote:
> The 05/27/2021 12:43, Ilya Maximets wrote:
>> On 5/27/21 12:29 PM, Jianbo Liu wrote:
>>> There is a race condidtion between purger and handler in dpif-netlink.
>>> Handler may create new ukey and install it while executing 'ovs-appctl
>>> revalidator/purge' command. However, before handler calls
>>> transition_ukey() in handle_upcalls(), purger may get this ukey from
>>> umap, then evict and delete it. This will trigger ovs_abort in
>>> transition_ukey() for handler because it is trying to set state to
>>> EVICTED or OPERATIONAL, but ukey is already in DELETED state.
>>> To fix this issue, purger must not delete ukey in VISIBLE state.
>>
>> Hi.  This is not a good thing to trigger abort(), but "purge" means
>> "purge".  And, AFAIU, most of ukeys are visible.  This is a purely
>> debug interface that was introduced to test functionality of OVS and
>> should not be used in production environment.  The fact that "purge"
>> doesn't remove visible ukeys, in my opinion, just makes the appctl
>> "revalidator/purge" useless.  Can we replace abort() with rate-limited
> 
> But currently ukey is also not removed if we can't hold its lock.
> Besides, new ukey will be installed while purging. We can't make sure
> that no visilbe/operational ukeys exist at the monent this command is
> finished. So in order to resolve the race, visible ukey should be kept,
> it is just like a new incoming ukey.

The purpose of this call is to be used in the testsuite where we have
a full control over traffic that flows through OVS ports, so locking
issue or installation of a new ukey is not possible there.

> 
>> error message for this scenario instead to avoid process termination?
> 
> No sure if there are issues, because ukey is deleted (maybe freed) while
> handler still access it by the pointer.

Well, if there is a race that could lead to a crash, it make sense
fixing, but it doesn't make sense to fix only for one datapath type.

If it's not possible to fix keeping the semantics of the call, I'd
prefer to not fix that at all as we need this appctl for unit testing,
other use cases are not supported. i.e. there is no point to change
this call if the fix beaks its main functionality.

>>
>> BTW, how did you catch this?  Is it reproducible with system tests?
> 
> We found this issue when testing CT offload, do purge without stop
> traffic.

What is the point of calling 'purge' in this scenario if it will
not purge visible ukeys?

> 
> Thanks!
> Jianbo
> 
>>
>> Best regards, Ilya Maximets.
> 



More information about the dev mailing list