[ovs-dev] About a race condition between handle_upcalls and revalidate

Paul Blakey paulb at mellanox.com
Tue Mar 21 07:54:42 UTC 2017



On 20/03/2017 23:08, Joe Stringer wrote:
> On 19 March 2017 at 07:28, Paul Blakey <paulb at mellanox.com> wrote:
>> Hi all,
>>
>> While using out patches for HW offload we've noticed we get a ovs assertion
>> at transition ukey, which tries to
>> transition the ukey state from EVICTED back to OPERATIONAL.
>> With furthur investigation it seem that this can happen without our HW
>> offload patches as there might be a race between handle_upcalls and
>> revalidate.
>>
>> The procedure is as such:
>>
>> handle_upcalls receives a new upcall and creates a new ukey, its state is
>> VISIBLE, it then it installs a flow (FLOW_PUT) to the datapat and
>> upon success wants to set the ukey state to OPERATIONAL (line 1408). for
>> that the handler running handle_upcalls wants to reaquirce the ukey lock,
>> but in the meantime revalidators dump (line 2261) the already inserted flow
>> and want to delete this flow (line 2328, say because of openflow db changes,
>> or aging). The revalidator deletes the flow and moves the ukey from
>> VISIBLE -> OPERATIONAL (line 2320) -> EVICTING (line 2220) -> EVICTED (line
>> 2134)
>>
>> finally handler succesfuly acquires the flow and now set the state to
>> OPERERTIONAL which will cause the assert in transition_ukey.
>>
>> Line numbers in question are from ofproto/ofproto-dpif-upcall.c
>>
>> I can provide a test the could show this happening, basicly adding a sleep
>> before (writing it now).
>
> Thanks for the report Paul, I've sent a patch - would you mind
> reviewing and testing it?
>
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-March/330029.html
>


Hi Joe,
I've a patch that I'm testing now which leave only the EVICTED change in 
handler code in case of an error (ukey will be changed to OPERATIONAL 
only when its confirimed by dump):


    /* Execute batch. */
     n_opsp = 0;
     for (i = 0; i < n_ops; i++) {
         opsp[n_opsp++] = &ops[i].dop;
     }
     dpif_operate(udpif->dpif, opsp, n_opsp);
     for (i = 0; i < n_ops; i++) {
         struct udpif_key *ukey = ops[i].ukey;

         if (ukey && ops[i].dop.error) {
             ovs_mutex_lock(&ukey->mutex);
             transition_ukey(__func__, __LINE__, ukey, UKEY_EVICTED);
             ovs_mutex_unlock(&ukey->mutex);
         }
     }





Since ukey (ops[i].ukey) can be entirely deleted and freed in the 
revalidator revalidate/sweep, the handlers will try and acquire a non 
existant lock and freeze up (this is why I added the second 
xsleep/global variable for the revalidator).
I think this can happen with your patch.

Thanks,
Paul.




More information about the dev mailing list