[ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first

Ben Pfaff blp at ovn.org
Wed Jun 10 18:40:14 UTC 2020


On Wed, Jun 03, 2020 at 04:33:28PM +0200, Ilya Maximets wrote:
> On 6/3/20 1:08 PM, Linhaifeng wrote:
> > 
> > 
> >> -----Original Message-----
> >> From: Ilya Maximets [mailto:i.maximets at ovn.org]
> >> Sent: Wednesday, June 3, 2020 6:50 PM
> >> To: Linhaifeng <haifeng.lin at huawei.com>; Ben Pfaff <blp at ovn.org>
> >> Cc: i.maximets at ovn.org; dev at openvswitch.org; Lilijun (Jerry)
> >> <jerry.lilijun at huawei.com>; Lichunhe <lichunhe at huawei.com>; nd
> >> <nd at arm.com>; chenchanghu <chenchanghu at huawei.com>
> >> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> >>
> >> On 6/3/20 9:04 AM, Linhaifeng wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Ben Pfaff [mailto:blp at ovn.org]
> >>>> Sent: Wednesday, June 3, 2020 1:26 PM
> >>>> To: Linhaifeng <haifeng.lin at huawei.com>
> >>>> Cc: Yanqin Wei <Yanqin.Wei at arm.com>; dev at openvswitch.org; nd
> >>>> <nd at arm.com>; Lilijun (Jerry) <jerry.lilijun at huawei.com>; chenchanghu
> >>>> <chenchanghu at huawei.com>; Lichunhe <lichunhe at huawei.com>
> >>>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> >>>>
> >>>> On Wed, Jun 03, 2020 at 01:22:52AM +0000, Linhaifeng wrote:
> >>>>>
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Ben Pfaff [mailto:blp at ovn.org]
> >>>>> Sent: Wednesday, June 3, 2020 1:28 AM
> >>>>> To: Linhaifeng <haifeng.lin at huawei.com>
> >>>>> Cc: Yanqin Wei <Yanqin.Wei at arm.com>; dev at openvswitch.org; nd
> >>>>> <nd at arm.com>; Lilijun (Jerry) <jerry.lilijun at huawei.com>;
> >>>>> chenchanghu <chenchanghu at huawei.com>; Lichunhe
> >> <lichunhe at huawei.com>
> >>>>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
> >>>>>
> >>>>> On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
> >>>>>> We should update rcu pointer first then use ovsrcu_postpone to free
> >>>>>> otherwise maybe cause use-after-free.
> >>>>>> e.g.,reader indicates momentary quiescent and access old pointer
> >>>>>> after writer postpone free old pointer and before setting new pointer.
> >>>>>>
> >>>>>> Signed-off-by: Linhaifeng <haifeng.lin at huawei.com>
> >>>>>
> >>>>> I don't see how that's possible, since the writer hasn't quiesced.
> >>>>>
> >>>>> I think the logic is as follow, Could you help me find out where is incorrect?
> >>>>>
> >>>>> 1.1 -> 1.2 -> 3.1 -> 3.2 -> 2.1 -> 2.2 -> 2.3 -> 2.1 -> 1.3 -> 1.4
> >>>>> ->
> >>>>> 3.3 -> 2.2(use after free)
> >>>>>
> >>>>> wirter:
> >>>>> 1.1 use postone to free old pointer
> >>>>> 1.2 flush cbsets to flushed_cbsets
> >>>>> 1.3 update new pointer
> >>>>> 1.4 quiesced
> >>>>>
> >>>>> Read:
> >>>>> 2.1. read pointer
> >>>>> 2.2. use pointer
> >>>>> 2.3. quiesced
> >>>>>
> >>>>> Rcu:
> >>>>> 3.1 pop flushed_cbsets
> >>>>> 3.2 ovsrcu_synchronize
> >>>>> 3.3 call all cb to free
> >>>>
> >>>> So you're saying this:
> >>>>
> >>>>     1.1 use postone to free old pointer (A)
> >>>>     1.2 flush cbsets to flushed_cbsets
> >>>>
> >>>>     3.1 pop flushed_cbsets
> >>>>     3.2 ovsrcu_synchronize
> >>>>
> >>>>     2.1. read pointer (A)
> >>>>     2.2. use pointer (A)
> >>>>     2.3. quiesced
> >>>>
> >>>>     2.1. read pointer (A)
> >>>>
> >>>>     1.3 update new pointer (B)
> >>>>     1.4 quiesced
> >>>>
> >>>>     3.3 call all cb to free (A)
> >>>>
> >>>>     2.2. use pointer (A)
> >>>>
> >>>> Wow, you are absolutely right.  This had never occurred to me.  Thank
> >> you!
> >>>> I'll review your patch.
> >>>
> >>> Yes, it's really hard to happen. If it happened it's also hard to find the reason
> >> so I suggest it can be a rule for using rcu.
> >>
> >> I agree that there is an issue here, but I think that we should not force users to
> >> call ovsrcu_set() before ovsrcu_postpone().  Current users doesn't do
> >> anything illegal since pointer must not be freed before the next grace period
> >> from their point of view.
> >>
> >> For me it looks like the main issue is existence of point 1.2, i.e. flushing cbsets
> >> while writer is not quiesced yet.  And we need to fix this inside rcu library itself.
> >> For example, we could avoid flushing inside
> >> ovsrcu_postpone() by making cbs[16] a dynamically allocated array and using
> >> x2nrealloc instead of flushing.
> >>
> >> Thoughts?
> >>
> > Hi, Ilya Maximets
> > 
> > May be this is a good idea therefor the users not need to think about call ovsrcu_set() first or ovsrcu_postpone().
> > How about you think, ben? May be you can send a patch to modify the ovsrcu_postpone() not to flush cbsets to
> > replace of my patches.
> 
> The change could look like this:
> 
> diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
> index ebc8120f0..cde1e925b 100644
> --- a/lib/ovs-rcu.c
> +++ b/lib/ovs-rcu.c
> @@ -30,6 +30,8 @@
>  
>  VLOG_DEFINE_THIS_MODULE(ovs_rcu);
>  
> +#define MIN_CBS 16
> +
>  struct ovsrcu_cb {
>      void (*function)(void *aux);
>      void *aux;
> @@ -37,7 +39,8 @@ struct ovsrcu_cb {
>  
>  struct ovsrcu_cbset {
>      struct ovs_list list_node;
> -    struct ovsrcu_cb cbs[16];
> +    struct ovsrcu_cb *cbs;
> +    size_t n_allocated;
>      int n_cbs;
>  };
>  
> @@ -310,16 +313,19 @@ ovsrcu_postpone__(void (*function)(void *aux), void *aux)
>      cbset = perthread->cbset;
>      if (!cbset) {
>          cbset = perthread->cbset = xmalloc(sizeof *perthread->cbset);
> +        cbset->cbs = xmalloc(MIN_CBS * sizeof *cbset->cbs);
> +        cbset->n_allocated = MIN_CBS;
>          cbset->n_cbs = 0;
>      }
>  
> +    if (cbset->n_cbs == cbset->n_allocated) {
> +        cbset->cbs = x2nrealloc(cbset->cbs, &cbset->n_allocated,
> +                                sizeof *cbset->cbs);
> +    }
> +
>      cb = &cbset->cbs[cbset->n_cbs++];
>      cb->function = function;
>      cb->aux = aux;
> -
> -    if (cbset->n_cbs >= ARRAY_SIZE(cbset->cbs)) {
> -        ovsrcu_flush_cbset(perthread);
> -    }
>  }
>  
>  static bool
> @@ -341,6 +347,7 @@ ovsrcu_call_postponed(void)
>          for (cb = cbset->cbs; cb < &cbset->cbs[cbset->n_cbs]; cb++) {
>              cb->function(cb->aux);
>          }
> +        free(cbset->cbs);
>          free(cbset);
>      }
>  
> ---
> 
> 
> I could submit a formal patch, if agreed.
> 
> Ben, what do you think?

I was pretty resigned to the idea that we'd have to change the callers
and the expectations.  This seems better and the code looks fine to me.
Thank you so much for thinking of this approach!

This approach also has the significant advantage that it should be easy
to backport.  I was worried that we would either have to laboriously
backport and check lots of old versions, or just declare that old
versions wouldn't get fixed.

As long as you tested this:

Acked-by: Ben Pfaff <blp at ovn.org>


More information about the dev mailing list