[ovs-dev] [PATCH] ofproto-dpif-upcall: Fix flow setup/delete race.

Joe Stringer joe at ovn.org
Wed Mar 29 17:19:37 UTC 2017


On 28 March 2017 at 23:59, Paul Blakey <paulb at mellanox.com> wrote:
>
>
> On 27/03/2017 22:55, Joe Stringer wrote:
>>
>> On 21 March 2017 at 02:44, Paul Blakey <paulb at mellanox.com> wrote:
>>>
>>>
>>>
>>> On 21/03/2017 10:04, Paul Blakey wrote:
>>>>
>>>>
>>>>
>>>>
>>>> On 20/03/2017 23:08, Joe Stringer wrote:
>>>>>
>>>>>
>>>>> If a handler thread takes a long time to set up a set of flows, it is
>>>>> possible for one of the installed flows to be dumped and scheduled
>>>>> for deletion by a revalidator thread before the handler is able to
>>>>> transition the ukey into an operational state---Between the
>>>>> dpif_operate() above this function and the ukey lock / transition logic
>>>>> modified by this patch.
>>>>>
>>>>> Only transition the ukey for the flow if it wasn't already transitioned
>>>>> to a later state by a revalidator thread.
>>>>>
>>>>> Fixes: 54ebeff4c03d ("upcall: Track ukey states.")
>>>>> Reported-by: Paul Blakey <paulb at mellanox.com>
>>>>> Signed-off-by: Joe Stringer <joe at ovn.org>
>>>>> ---
>>>>>  ofproto/ofproto-dpif-upcall.c | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/ofproto/ofproto-dpif-upcall.c
>>>>> b/ofproto/ofproto-dpif-upcall.c
>>>>> index 07086ee385cc..0854807e4482 100644
>>>>> --- a/ofproto/ofproto-dpif-upcall.c
>>>>> +++ b/ofproto/ofproto-dpif-upcall.c
>>>>> @@ -1404,7 +1404,7 @@ handle_upcalls(struct udpif *udpif, struct
>>>>> upcall *upcalls,
>>>>>              ovs_mutex_lock(&ukey->mutex);
>>>>>              if (ops[i].dop.error) {
>>>>>                  transition_ukey(ukey, UKEY_EVICTED);
>>>>> -            } else {
>>>>> +            } else if (ukey->state < UKEY_OPERATIONAL) {
>>>>>                  transition_ukey(ukey, UKEY_OPERATIONAL);
>>>>>              }
>>>>>              ovs_mutex_unlock(&ukey->mutex);
>>>>>
>>>>
>>>> Hi Joe,
>>>> As per other thread, I think there is a trouble locking the mutex in
>>>> case there is no error, as the flow is installed it can be removed
>>>> entirely by revalidator's revalidate/sweep:
>>>>
>>>> Here is the timing:
>>>>
>>>> Handler installs the flow
>>>> Handler thread is scheduled out before trying to lock the mutex
>>>> Revalidators revalidate dump the installed flow and decide to evict it
>>>> Revalidators evict the flow (push_dp_ops in revalidate)
>>>> Revalidator sweep delete the ukey ukey_delete(umap, ukey);
>>>> Revalidator calling ukey_delete postpones ukey_delete__ to free the ukey
>>>> and mutex
>>>> Handler thread is scheduled again and tries to acquire the freed lock.
>>>>
>>>> Is this correct?
>>>> If so, and I didn't miss something, I've suggested a alternate patch in
>>>> the other thread (only lock the mutex and transition to EVICTED in case
>>>> of an ops[i] error, leave OEPRATIONAL for dump)
>>>>
>>>
>>> Hi,
>>> The above timing/scheduling can't happen as the actual freeing of the
>>> ukey
>>> (ukey_delete__) is postponed after the handler returns from
>>> handle_upcalls.
>>> I've missed that with the other thread's patch because it used xsleep as
>>> you
>>> said which calls ovsrcu_quiesce_start. With sleep instead it seems to be
>>> postponed to after poll_block in the handler thread runs (and we don't
>>> have
>>> references there anymore).
>>>
>>> I don't know how the handler thread actually signals it doesn't have any
>>> references to the ukey anymore, can you expand on that mechanism?
>>
>>
>> Hi Paul,
>>
>> Thanks for testing this out. I've been a bit busy last week so didn't
>> get a chance to respond.
>>
>> The handler thread will not quiesce while installing a flow, so it is
>> guaranteed to be within the same RCU "locked" period. Until it
>> quiesces, any RCU-protected structures must not be freed.
>
>
>
> It adds the
>>
>> ukey into the cmap which makes it available to be read from other
>> threads, and retains a reference to it. The revalidator will dump the
>> flow, lookup the entry and find it, then apparently decide to delete
>> the ukey. It removes the ukey from the cmap right now, but it is not
>> allowed to actually free it until the next RCU grace period. The
>> revalidator thread may continue on and reach an RCU grace period,
>> quiesce, and then from that thread's perspective it should be fine to
>> free the ukey. However, all threads must quiesce before any of the
>> memory currently referenced can be freed. So the handler thread must
>> finish dealing with this upcall then quiesce before the ukey will be
>> freed. Finally, once the handler quiesces, both threads have reached
>> the end of that RCU "locked" period and the RCU thread may go through
>> and clean up the ukey.
>>
>> I hope that clears things up?
>>
>> Cheers,
>> Joe
>>
>
> Hi,
> Yes it does, but where exactly does the handler thread enters a quiesce
> state after installing the flow? (Doesn't it have to call
> ovsrcu_quiesce_start? is the thread entering a sleeping state enough?)

After every batch of upcalls that it handles (UPCALL_MAX_BATCH=64), it
will call poll_block() and quiesce via that.


More information about the dev mailing list