[ovs-dev] About a race condition between handle_upcalls and revalidate

Joe Stringer joe at ovn.org
Mon Mar 20 21:09:55 UTC 2017


On 19 March 2017 at 08:44, Paul Blakey <paulb at mellanox.com> wrote:
>
>
> On 19/03/2017 17:10, Paul Blakey wrote:
>>>
>>> Hi all,
>>>
>>> While using out patches for HW offload we've noticed we get a ovs
>>> assertion at transition ukey, which tries to
>>> transition the ukey state from EVICTED back to OPERATIONAL.
>>> With furthur investigation it seem that this can happen without our HW
>>> offload patches as there might be a race between handle_upcalls and
>>> revalidate.
>>>
>>> The procedure is as such:
>>>
>>> handle_upcalls receives a new upcall and creates a new ukey, its state
>>> is VISIBLE, it then it installs a flow (FLOW_PUT) to the datapat and
>>> upon success wants to set the ukey state to OPERATIONAL (line 1408). for
>>> that the handler running handle_upcalls wants to reaquirce the ukey
>>> lock, but in the meantime revalidators dump (line 2261) the already
>>> inserted flow and want to delete this flow (line 2328, say because of
>>> openflow db changes, or aging). The revalidator deletes the flow and
>>> moves the ukey from
>>> VISIBLE -> OPERATIONAL (line 2320) -> EVICTING (line 2220) -> EVICTED
>>> (line 2134)
>>>
>>> finally handler succesfuly acquires the flow and now set the state to
>>> OPERERTIONAL which will cause the assert in transition_ukey.
>>>
>>> Line numbers in question are from ofproto/ofproto-dpif-upcall.c
>>>
>>> I can provide a test the could show this happening, basicly adding a
>>> sleep before (writing it now).
>>>
>>> Thanks,
>>> Paul Blakey.
>>>
>>>
>>>
>>
>>
>> Here's a test:
>> 1) Install the below patch
>> 2) Set max-idle to 500ms, n-handlers to 1, revalidators to 10
>> 3) Create a simple setup (like veth pair in OVS_TRAFFIC_VSWITCHD_START
>> traffic testsuite)
>> 4) Ping one of the machines/veths slowly and wait till you get to
>> sleeping part and stop (put: 24 print)
>>
>>
>> Here is my log:
>> .......
>>
>> 2017-03-19T14:48:18.609Z|00121|ofproto_dpif_upcall(handler12)|INFO|ukey_install__
>> 1696 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 0 to
>> VISIBLE
>> 2017-03-19T14:48:18.609Z|00122|ofproto_dpif_upcall(handler12)|INFO|state
>> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 0 -> 1
>>
>> 2017-03-19T14:48:18.609Z|00123|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 1/20 sec 1
>>
>> 2017-03-19T14:48:18.832Z|00001|ofproto_dpif_upcall(revalidator15)|INFO|revalidate
>> 2376 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 1 to
>> OPERATIONAL
>>
>> 2017-03-19T14:48:18.832Z|00002|ofproto_dpif_upcall(revalidator15)|INFO|state
>> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 1 -> 2
>>
>> 2017-03-19T14:48:18.832Z|00003|ofproto_dpif_upcall(revalidator15)|INFO|reval_op_init
>> 2272 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 2 to
>> EVICITING
>>
>> 2017-03-19T14:48:18.832Z|00004|ofproto_dpif_upcall(revalidator15)|INFO|state
>> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 2 -> 3
>>
>> 2017-03-19T14:48:18.832Z|00005|ofproto_dpif_upcall(revalidator15)|INFO|push_dp_ops
>> 2182 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 3 to
>> EVICTED
>>
>> 2017-03-19T14:48:18.832Z|00006|ofproto_dpif_upcall(revalidator15)|INFO|state
>> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 3 -> 4
>>
>> 2017-03-19T14:48:19.609Z|00124|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 2/20 sec 4
>>
>> 2017-03-19T14:48:20.610Z|00125|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 3/20 sec 4
>>
>> 2017-03-19T14:48:21.610Z|00126|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 4/20 sec 4
>>
>> 2017-03-19T14:48:22.610Z|00127|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 5/20 sec 4
>>
>> 2017-03-19T14:48:23.610Z|00128|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 6/20 sec 4
>>
>> 2017-03-19T14:48:24.610Z|00129|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 7/20 sec 4
>>
>> 2017-03-19T14:48:25.611Z|00130|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 8/20 sec 4
>>
>> 2017-03-19T14:48:26.611Z|00131|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 9/20 sec 4
>>
>> 2017-03-19T14:48:27.611Z|00132|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 10/20 sec 4
>>
>> 2017-03-19T14:48:28.611Z|00133|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 11/20 sec 4
>>
>> 2017-03-19T14:48:29.611Z|00134|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 12/20 sec 4
>>
>> 2017-03-19T14:48:30.611Z|00135|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 13/20 sec 4
>>
>> 2017-03-19T14:48:31.611Z|00136|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 14/20 sec 4
>>
>> 2017-03-19T14:48:32.612Z|00137|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 15/20 sec 4
>>
>> 2017-03-19T14:48:33.612Z|00138|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 16/20 sec 4
>>
>> 2017-03-19T14:48:34.612Z|00139|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 17/20 sec 4
>>
>> 2017-03-19T14:48:35.612Z|00140|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 18/20 sec 4
>>
>> 2017-03-19T14:48:36.612Z|00141|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 19/20 sec 4
>>
>> 2017-03-19T14:48:37.612Z|00142|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 20/20 sec 4
>> 2017-03-19T14:48:38.613Z|00143|ofproto_dpif_upcall(handler12)|INFO|put: 25
>>
>> 2017-03-19T14:48:38.613Z|00144|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1434 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 4 to
>> OPERATIONAL
>> 2017-03-19T14:48:38.613Z|00145|ofproto_dpif_upcall(handler12)|INFO|state
>> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 4 -> 2
>>
>> 2017-03-19T14:48:38.613Z|00146|util(handler12)|EMER|ofproto/ofproto-dpif-upcall.c:1716:
>> assertion dst >= ukey->state failed in transition_ukey()
>>
>>
>>
>>
>> commit 4755254db3053bcf7ceac88fedce60fd228c8640 (HEAD)
>> Author: Paul Blakey <paulb at mellanox.com>
>> Date:   Sun Mar 19 17:00:30 2017 +0200
>>
>>     test
>>     add prints and forcefully slow down some actions (EVICTED->DELETED,
>> and VISIBLE -> OPERATIONAL)
>>
>>
>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
>> index 660383f..0dfffdb 100644
>> --- a/ofproto/ofproto-dpif-upcall.c
>> +++ b/ofproto/ofproto-dpif-upcall.c
>> @@ -45,6 +45,8 @@
>>  #define UPCALL_MAX_BATCH 64
>>  #define REVALIDATE_MAX_BATCH 50
>>
>> +int global_temp = 0;
>> +
>>  VLOG_DEFINE_THIS_MODULE(ofproto_dpif_upcall);
>>
>>  COVERAGE_DEFINE(dumped_duplicate_flow);
>> @@ -1399,15 +1401,34 @@ handle_upcalls(struct udpif *udpif, struct
>> upcall *upcalls,
>>      dpif_operate(udpif->dpif, opsp, n_opsp);
>>      for (i = 0; i < n_ops; i++) {
>>          struct udpif_key *ukey = ops[i].ukey;
>> +        static int i = 0;
>>
>>          if (ukey) {
>> +            struct ds ds = DS_EMPTY_INITIALIZER;
>> +            odp_format_ufid(&ukey->ufid, &ds);
>> +
>> +            if (i++%25 == 25-1) {
>> +                global_temp = 1;
>> +                for (int z = 0; z < 20; z++) {
>> +                    VLOG_INFO("%s %d %s sleeping %d/%d sec %d",
>> __func__, __LINE__, ds_cstr(&ds), z+1, 20, ukey->state);
>> +                    xsleep(1);
>> +                }
>> +            }
>> +            VLOG_INFO("put: %d", i);
>> +
>>              ovs_mutex_lock(&ukey->mutex);
>>              if (ops[i].dop.error) {
>> +                VLOG_INFO("%s %d %s transition from %d to EVICTED",
>> __func__, __LINE__, ds_cstr(&ds), ukey->state);
>>                  transition_ukey(ukey, UKEY_EVICTED);
>>              } else {
>> +                VLOG_INFO("%s %d %s transition from %d to OPERATIONAL",
>>  __func__, __LINE__, ds_cstr(&ds), ukey->state);
>>                  transition_ukey(ukey, UKEY_OPERATIONAL);
>>              }
>>              ovs_mutex_unlock(&ukey->mutex);
>> +
>> +            global_temp = 0;
>> +
>> +            ds_destroy(&ds);
>>          }
>>      }
>>  }
>> @@ -1659,6 +1680,10 @@ ukey_install__(struct udpif *udpif, struct
>> udpif_key *new_ukey)
>>      } else {
>>          ovs_mutex_lock(&new_ukey->mutex);
>>          cmap_insert(&umap->cmap, &new_ukey->cmap_node, new_ukey->hash);
>> +        struct ds ds = DS_EMPTY_INITIALIZER;
>> +        odp_format_ufid(&new_ukey->ufid, &ds);
>> +        VLOG_INFO("%s %d %s transition from %d to VISIBLE",  __func__,
>> __LINE__, ds_cstr(&ds), new_ukey->state);
>> +        ds_destroy(&ds);
>>          transition_ukey(new_ukey, UKEY_VISIBLE);
>>          locked = true;
>>      }
>> @@ -1671,6 +1696,12 @@ static void
>>  transition_ukey(struct udpif_key *ukey, enum ukey_state dst)
>>      OVS_REQUIRES(ukey->mutex)
>>  {
>> +    struct ds ds = DS_EMPTY_INITIALIZER;
>> +
>> +    odp_format_ufid(&ukey->ufid, &ds);
>> +    VLOG_INFO("state transition for ukey %s: %d -> %d", ds_cstr(&ds),
>> ukey->state, dst);
>> +    ds_destroy(&ds);
>> +
>>      ovs_assert(dst >= ukey->state);
>>      if (ukey->state == dst && dst == UKEY_OPERATIONAL) {
>>          return;
>> @@ -1699,12 +1730,12 @@ transition_ukey(struct udpif_key *ukey, enum
>> ukey_state dst)
>>                                     dst < UKEY_DELETED)) {
>>          ukey->state = dst;
>>      } else {
>> -        struct ds ds = DS_EMPTY_INITIALIZER;
>> +        struct ds ds2 = DS_EMPTY_INITIALIZER;
>>
>> -        odp_format_ufid(&ukey->ufid, &ds);
>> +        odp_format_ufid(&ukey->ufid, &ds2);
>>          VLOG_WARN_RL(&rl, "Invalid state transition for ukey %s: %d ->
>> %d",
>> -                     ds_cstr(&ds), ukey->state, dst);
>> -        ds_destroy(&ds);
>> +                     ds_cstr(&ds2), ukey->state, dst);
>> +        ds_destroy(&ds2);
>>      }
>>  }
>>
>> @@ -1797,6 +1828,10 @@ ukey_delete(struct umap *umap, struct udpif_key
>> *ukey)
>>      if (ukey->state < UKEY_DELETED) {
>>          cmap_remove(&umap->cmap, &ukey->cmap_node, ukey->hash);
>>          ovsrcu_postpone(ukey_delete__, ukey);
>> +        struct ds ds = DS_EMPTY_INITIALIZER;
>> +        odp_format_ufid(&ukey->ufid, &ds);
>> +        VLOG_INFO("%s %d %s transition from %d to DELETED",  __func__,
>> __LINE__, ds_cstr(&ds), ukey->state);
>> +        ds_destroy(&ds);
>>          transition_ukey(ukey, UKEY_DELETED);
>>      }
>>      ovs_mutex_unlock(&ukey->mutex);
>> @@ -2131,6 +2166,10 @@ push_dp_ops(struct udpif *udpif, struct ukey_op
>> *ops, size_t n_ops)
>>
>>          if (op->ukey) {
>>              ovs_mutex_lock(&op->ukey->mutex);
>> +            struct ds ds = DS_EMPTY_INITIALIZER;
>> +            odp_format_ufid(&op->ukey->ufid, &ds);
>> +            VLOG_INFO("%s %d %s transition from %d to EVICTED",
>> __func__, __LINE__, ds_cstr(&ds), op->ukey->state);
>> +            ds_destroy(&ds);
>>              transition_ukey(op->ukey, UKEY_EVICTED);
>>              push->used = MAX(stats->used, op->ukey->stats.used);
>>              push->tcp_flags = stats->tcp_flags |
>> op->ukey->stats.tcp_flags;
>> @@ -2217,6 +2256,10 @@ reval_op_init(struct ukey_op *op, enum
>> reval_result result,
>>  {
>>      if (result == UKEY_DELETE) {
>>          delete_op_init(udpif, op, ukey);
>> +        struct ds ds = DS_EMPTY_INITIALIZER;
>> +        odp_format_ufid(&ukey->ufid, &ds);
>> +        VLOG_INFO("%s %d %s transition from %d to EVICITING", __func__,
>> __LINE__, ds_cstr(&ds), ukey->state);
>> +        ds_destroy(&ds);
>>          transition_ukey(ukey, UKEY_EVICTING);
>>      } else if (result == UKEY_MODIFY) {
>>          /* Store the new recircs. */
>> @@ -2317,6 +2360,10 @@ revalidate(struct revalidator *revalidator)
>>              }
>>
>>              /* The flow is now confirmed to be in the datapath. */
>> +            struct ds ds = DS_EMPTY_INITIALIZER;
>> +            odp_format_ufid(&ukey->ufid, &ds);
>> +            VLOG_INFO("%s %d %s transition from %d to OPERATIONAL",
>> __func__, __LINE__, ds_cstr(&ds), ukey->state);
>> +            ds_destroy(&ds);
>>              transition_ukey(ukey, UKEY_OPERATIONAL);
>>
>>              if (!used) {
>> @@ -2417,6 +2464,11 @@ revalidator_sweep__(struct revalidator
>> *revalidator, bool purge)
>>              ovs_mutex_unlock(&ukey->mutex);
>>
>>              if (ukey_state == UKEY_EVICTED) {
>> +
>> +                while (global_temp) {
>> +                    xsleep(1);
>> +                }
>> +
>>                  /* The common flow deletion case involves deletion of
>> the flow
>>                   * during the dump phase and ukey deletion here. */
>>                  ovs_mutex_lock(&umap->mutex);
>> @@ -2471,6 +2523,10 @@ dp_purge_cb(void *aux, unsigned pmd_id)
>>          CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
>>              if (ukey->pmd_id == pmd_id) {
>>                  delete_op_init(udpif, &ops[n_ops++], ukey);
>> +                struct ds ds = DS_EMPTY_INITIALIZER;
>> +                odp_format_ufid(&ukey->ufid, &ds);
>> +                VLOG_INFO("%s %d %s transition from %d to EVICITING",
>> __func__, __LINE__, ds_cstr(&ds), ukey->state);
>> +                ds_destroy(&ds);
>>                  transition_ukey(ukey, UKEY_EVICTING);
>>
>>                  if (n_ops == REVALIDATE_MAX_BATCH) {
>
>
>
> What I also notice happening without the second xsleep (in
> revalidator_sweep__) is that the ukey gets deleted entirely and handler is
> stuck on acquiring the non existent lock.

Any xsleep is a bit dangerous here because xsleep will quiesce, and
we're relying on the fact that handlers will not quiesce RCU while
installing flows. If it quiesces, then the memory may end up being
properly freed so you would have to perform a fresh lookup to find the
ukey.


More information about the dev mailing list