[ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first

Ilya Maximets i.maximets at ovn.org
Wed Jun 10 18:53:10 UTC 2020


On 6/10/20 8:41 PM, Ben Pfaff wrote:
> On Wed, Jun 10, 2020 at 11:40:14AM -0700, Ben Pfaff wrote:
>> On Wed, Jun 03, 2020 at 04:33:28PM +0200, Ilya Maximets wrote:
>>> On 6/3/20 1:08 PM, Linhaifeng wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Ilya Maximets [mailto:i.maximets at ovn.org]
>>>>> Sent: Wednesday, June 3, 2020 6:50 PM
>>>>> To: Linhaifeng <haifeng.lin at huawei.com>; Ben Pfaff <blp at ovn.org>
>>>>> Cc: i.maximets at ovn.org; dev at openvswitch.org; Lilijun (Jerry)
>>>>> <jerry.lilijun at huawei.com>; Lichunhe <lichunhe at huawei.com>; nd
>>>>> <nd at arm.com>; chenchanghu <chenchanghu at huawei.com>
>>>>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
>>>>>
>>>>> On 6/3/20 9:04 AM, Linhaifeng wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Ben Pfaff [mailto:blp at ovn.org]
>>>>>>> Sent: Wednesday, June 3, 2020 1:26 PM
>>>>>>> To: Linhaifeng <haifeng.lin at huawei.com>
>>>>>>> Cc: Yanqin Wei <Yanqin.Wei at arm.com>; dev at openvswitch.org; nd
>>>>>>> <nd at arm.com>; Lilijun (Jerry) <jerry.lilijun at huawei.com>; chenchanghu
>>>>>>> <chenchanghu at huawei.com>; Lichunhe <lichunhe at huawei.com>
>>>>>>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
>>>>>>>
>>>>>>> On Wed, Jun 03, 2020 at 01:22:52AM +0000, Linhaifeng wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Ben Pfaff [mailto:blp at ovn.org]
>>>>>>>> Sent: Wednesday, June 3, 2020 1:28 AM
>>>>>>>> To: Linhaifeng <haifeng.lin at huawei.com>
>>>>>>>> Cc: Yanqin Wei <Yanqin.Wei at arm.com>; dev at openvswitch.org; nd
>>>>>>>> <nd at arm.com>; Lilijun (Jerry) <jerry.lilijun at huawei.com>;
>>>>>>>> chenchanghu <chenchanghu at huawei.com>; Lichunhe
>>>>> <lichunhe at huawei.com>
>>>>>>>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
>>>>>>>>
>>>>>>>> On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
>>>>>>>>> We should update rcu pointer first then use ovsrcu_postpone to free
>>>>>>>>> otherwise maybe cause use-after-free.
>>>>>>>>> e.g.,reader indicates momentary quiescent and access old pointer
>>>>>>>>> after writer postpone free old pointer and before setting new pointer.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Linhaifeng <haifeng.lin at huawei.com>
>>>>>>>>
>>>>>>>> I don't see how that's possible, since the writer hasn't quiesced.
>>>>>>>>
>>>>>>>> I think the logic is as follow, Could you help me find out where is incorrect?
>>>>>>>>
>>>>>>>> 1.1 -> 1.2 -> 3.1 -> 3.2 -> 2.1 -> 2.2 -> 2.3 -> 2.1 -> 1.3 -> 1.4
>>>>>>>> ->
>>>>>>>> 3.3 -> 2.2(use after free)
>>>>>>>>
>>>>>>>> wirter:
>>>>>>>> 1.1 use postone to free old pointer
>>>>>>>> 1.2 flush cbsets to flushed_cbsets
>>>>>>>> 1.3 update new pointer
>>>>>>>> 1.4 quiesced
>>>>>>>>
>>>>>>>> Read:
>>>>>>>> 2.1. read pointer
>>>>>>>> 2.2. use pointer
>>>>>>>> 2.3. quiesced
>>>>>>>>
>>>>>>>> Rcu:
>>>>>>>> 3.1 pop flushed_cbsets
>>>>>>>> 3.2 ovsrcu_synchronize
>>>>>>>> 3.3 call all cb to free
>>>>>>>
>>>>>>> So you're saying this:
>>>>>>>
>>>>>>>     1.1 use postone to free old pointer (A)
>>>>>>>     1.2 flush cbsets to flushed_cbsets
>>>>>>>
>>>>>>>     3.1 pop flushed_cbsets
>>>>>>>     3.2 ovsrcu_synchronize
>>>>>>>
>>>>>>>     2.1. read pointer (A)
>>>>>>>     2.2. use pointer (A)
>>>>>>>     2.3. quiesced
>>>>>>>
>>>>>>>     2.1. read pointer (A)
>>>>>>>
>>>>>>>     1.3 update new pointer (B)
>>>>>>>     1.4 quiesced
>>>>>>>
>>>>>>>     3.3 call all cb to free (A)
>>>>>>>
>>>>>>>     2.2. use pointer (A)
>>>>>>>
>>>>>>> Wow, you are absolutely right.  This had never occurred to me.  Thank
>>>>> you!
>>>>>>> I'll review your patch.
>>>>>>
>>>>>> Yes, it's really hard to happen. If it happened it's also hard to find the reason
>>>>> so I suggest it can be a rule for using rcu.
>>>>>
>>>>> I agree that there is an issue here, but I think that we should not force users to
>>>>> call ovsrcu_set() before ovsrcu_postpone().  Current users doesn't do
>>>>> anything illegal since pointer must not be freed before the next grace period
>>>>> from their point of view.
>>>>>
>>>>> For me it looks like the main issue is existence of point 1.2, i.e. flushing cbsets
>>>>> while writer is not quiesced yet.  And we need to fix this inside rcu library itself.
>>>>> For example, we could avoid flushing inside
>>>>> ovsrcu_postpone() by making cbs[16] a dynamically allocated array and using
>>>>> x2nrealloc instead of flushing.
>>>>>
>>>>> Thoughts?
>>>>>
>>>> Hi, Ilya Maximets
>>>>
>>>> May be this is a good idea therefor the users not need to think about call ovsrcu_set() first or ovsrcu_postpone().
>>>> How about you think, ben? May be you can send a patch to modify the ovsrcu_postpone() not to flush cbsets to
>>>> replace of my patches.
>>>
>>> The change could look like this:
>>>
>>> diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
>>> index ebc8120f0..cde1e925b 100644
>>> --- a/lib/ovs-rcu.c
>>> +++ b/lib/ovs-rcu.c
>>> @@ -30,6 +30,8 @@
>>>  
>>>  VLOG_DEFINE_THIS_MODULE(ovs_rcu);
>>>  
>>> +#define MIN_CBS 16
>>> +
>>>  struct ovsrcu_cb {
>>>      void (*function)(void *aux);
>>>      void *aux;
>>> @@ -37,7 +39,8 @@ struct ovsrcu_cb {
>>>  
>>>  struct ovsrcu_cbset {
>>>      struct ovs_list list_node;
>>> -    struct ovsrcu_cb cbs[16];
>>> +    struct ovsrcu_cb *cbs;
>>> +    size_t n_allocated;
>>>      int n_cbs;
>>>  };
>>>  
>>> @@ -310,16 +313,19 @@ ovsrcu_postpone__(void (*function)(void *aux), void *aux)
>>>      cbset = perthread->cbset;
>>>      if (!cbset) {
>>>          cbset = perthread->cbset = xmalloc(sizeof *perthread->cbset);
>>> +        cbset->cbs = xmalloc(MIN_CBS * sizeof *cbset->cbs);
>>> +        cbset->n_allocated = MIN_CBS;
>>>          cbset->n_cbs = 0;
>>>      }
>>>  
>>> +    if (cbset->n_cbs == cbset->n_allocated) {
>>> +        cbset->cbs = x2nrealloc(cbset->cbs, &cbset->n_allocated,
>>> +                                sizeof *cbset->cbs);
>>> +    }
>>> +
>>>      cb = &cbset->cbs[cbset->n_cbs++];
>>>      cb->function = function;
>>>      cb->aux = aux;
>>> -
>>> -    if (cbset->n_cbs >= ARRAY_SIZE(cbset->cbs)) {
>>> -        ovsrcu_flush_cbset(perthread);
>>> -    }
>>>  }
>>>  
>>>  static bool
>>> @@ -341,6 +347,7 @@ ovsrcu_call_postponed(void)
>>>          for (cb = cbset->cbs; cb < &cbset->cbs[cbset->n_cbs]; cb++) {
>>>              cb->function(cb->aux);
>>>          }
>>> +        free(cbset->cbs);
>>>          free(cbset);
>>>      }
>>>  
>>> ---
>>>
>>>
>>> I could submit a formal patch, if agreed.
>>>
>>> Ben, what do you think?
>>
>> I was pretty resigned to the idea that we'd have to change the callers
>> and the expectations.  This seems better and the code looks fine to me.
>> Thank you so much for thinking of this approach!
>>
>> This approach also has the significant advantage that it should be easy
>> to backport.  I was worried that we would either have to laboriously
>> backport and check lots of old versions, or just declare that old
>> versions wouldn't get fixed.
>>
>> As long as you tested this:
>>
>> Acked-by: Ben Pfaff <blp at ovn.org>
> 
> Oh, and please credit Linhaifeng for reporting the problem.  This is
> subtle and we've had years to notice and Linhaifeng is the first one to
> actually do that.

Sure.  Thanks for review!

I'll send patch formally, give it one more test run and apply.

Best regards, Ilya Maximets.


More information about the dev mailing list