[ovs-dev] [PATCH] ofproto-dpif-upcall: Fix flow setup/delete race.

Joe Stringer joe at ovn.org
Mon Mar 27 19:55:18 UTC 2017


On 21 March 2017 at 02:44, Paul Blakey <paulb at mellanox.com> wrote:
>
>
> On 21/03/2017 10:04, Paul Blakey wrote:
>>
>>
>>
>> On 20/03/2017 23:08, Joe Stringer wrote:
>>>
>>> If a handler thread takes a long time to set up a set of flows, it is
>>> possible for one of the installed flows to be dumped and scheduled
>>> for deletion by a revalidator thread before the handler is able to
>>> transition the ukey into an operational state---Between the
>>> dpif_operate() above this function and the ukey lock / transition logic
>>> modified by this patch.
>>>
>>> Only transition the ukey for the flow if it wasn't already transitioned
>>> to a later state by a revalidator thread.
>>>
>>> Fixes: 54ebeff4c03d ("upcall: Track ukey states.")
>>> Reported-by: Paul Blakey <paulb at mellanox.com>
>>> Signed-off-by: Joe Stringer <joe at ovn.org>
>>> ---
>>>  ofproto/ofproto-dpif-upcall.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/ofproto/ofproto-dpif-upcall.c
>>> b/ofproto/ofproto-dpif-upcall.c
>>> index 07086ee385cc..0854807e4482 100644
>>> --- a/ofproto/ofproto-dpif-upcall.c
>>> +++ b/ofproto/ofproto-dpif-upcall.c
>>> @@ -1404,7 +1404,7 @@ handle_upcalls(struct udpif *udpif, struct
>>> upcall *upcalls,
>>>              ovs_mutex_lock(&ukey->mutex);
>>>              if (ops[i].dop.error) {
>>>                  transition_ukey(ukey, UKEY_EVICTED);
>>> -            } else {
>>> +            } else if (ukey->state < UKEY_OPERATIONAL) {
>>>                  transition_ukey(ukey, UKEY_OPERATIONAL);
>>>              }
>>>              ovs_mutex_unlock(&ukey->mutex);
>>>
>>
>> Hi Joe,
>> As per other thread, I think there is a trouble locking the mutex in
>> case there is no error, as the flow is installed it can be removed
>> entirely by revalidator's revalidate/sweep:
>>
>> Here is the timing:
>>
>> Handler installs the flow
>> Handler thread is scheduled out before trying to lock the mutex
>> Revalidators revalidate dump the installed flow and decide to evict it
>> Revalidators evict the flow (push_dp_ops in revalidate)
>> Revalidator sweep delete the ukey ukey_delete(umap, ukey);
>> Revalidator calling ukey_delete postpones ukey_delete__ to free the ukey
>> and mutex
>> Handler thread is scheduled again and tries to acquire the freed lock.
>>
>> Is this correct?
>> If so, and I didn't miss something, I've suggested a alternate patch in
>> the other thread (only lock the mutex and transition to EVICTED in case
>> of an ops[i] error, leave OEPRATIONAL for dump)
>>
>
> Hi,
> The above timing/scheduling can't happen as the actual freeing of the ukey
> (ukey_delete__) is postponed after the handler returns from handle_upcalls.
> I've missed that with the other thread's patch because it used xsleep as you
> said which calls ovsrcu_quiesce_start. With sleep instead it seems to be
> postponed to after poll_block in the handler thread runs (and we don't have
> references there anymore).
>
> I don't know how the handler thread actually signals it doesn't have any
> references to the ukey anymore, can you expand on that mechanism?

Hi Paul,

Thanks for testing this out. I've been a bit busy last week so didn't
get a chance to respond.

The handler thread will not quiesce while installing a flow, so it is
guaranteed to be within the same RCU "locked" period. Until it
quiesces, any RCU-protected structures must not be freed. It adds the
ukey into the cmap which makes it available to be read from other
threads, and retains a reference to it. The revalidator will dump the
flow, lookup the entry and find it, then apparently decide to delete
the ukey. It removes the ukey from the cmap right now, but it is not
allowed to actually free it until the next RCU grace period. The
revalidator thread may continue on and reach an RCU grace period,
quiesce, and then from that thread's perspective it should be fine to
free the ukey. However, all threads must quiesce before any of the
memory currently referenced can be freed. So the handler thread must
finish dealing with this upcall then quiesce before the ukey will be
freed. Finally, once the handler quiesces, both threads have reached
the end of that RCU "locked" period and the RCU thread may go through
and clean up the ukey.

I hope that clears things up?

Cheers,
Joe


More information about the dev mailing list