[ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first

Ilya Maximets i.maximets at ovn.org
Wed Jun 3 14:33:28 UTC 2020


On 6/3/20 1:08 PM, Linhaifeng wrote:
> 
> 
>> -----Original Message-----
>> From: Ilya Maximets [mailto:i.maximets at ovn.org]
>> Sent: Wednesday, June 3, 2020 6:50 PM
>> To: Linhaifeng <haifeng.lin at huawei.com>; Ben Pfaff <blp at ovn.org>
>> Cc: i.maximets at ovn.org; dev at openvswitch.org; Lilijun (Jerry)
>> <jerry.lilijun at huawei.com>; Lichunhe <lichunhe at huawei.com>; nd
>> <nd at arm.com>; chenchanghu <chenchanghu at huawei.com>
>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
>>
>> On 6/3/20 9:04 AM, Linhaifeng wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ben Pfaff [mailto:blp at ovn.org]
>>>> Sent: Wednesday, June 3, 2020 1:26 PM
>>>> To: Linhaifeng <haifeng.lin at huawei.com>
>>>> Cc: Yanqin Wei <Yanqin.Wei at arm.com>; dev at openvswitch.org; nd
>>>> <nd at arm.com>; Lilijun (Jerry) <jerry.lilijun at huawei.com>; chenchanghu
>>>> <chenchanghu at huawei.com>; Lichunhe <lichunhe at huawei.com>
>>>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
>>>>
>>>> On Wed, Jun 03, 2020 at 01:22:52AM +0000, Linhaifeng wrote:
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Ben Pfaff [mailto:blp at ovn.org]
>>>>> Sent: Wednesday, June 3, 2020 1:28 AM
>>>>> To: Linhaifeng <haifeng.lin at huawei.com>
>>>>> Cc: Yanqin Wei <Yanqin.Wei at arm.com>; dev at openvswitch.org; nd
>>>>> <nd at arm.com>; Lilijun (Jerry) <jerry.lilijun at huawei.com>;
>>>>> chenchanghu <chenchanghu at huawei.com>; Lichunhe
>> <lichunhe at huawei.com>
>>>>> Subject: Re: [ovs-dev] [PATCH v2] ovs rcu: update rcu pointer first
>>>>>
>>>>> On Tue, Jun 02, 2020 at 07:27:59AM +0000, Linhaifeng wrote:
>>>>>> We should update rcu pointer first then use ovsrcu_postpone to free
>>>>>> otherwise maybe cause use-after-free.
>>>>>> e.g.,reader indicates momentary quiescent and access old pointer
>>>>>> after writer postpone free old pointer and before setting new pointer.
>>>>>>
>>>>>> Signed-off-by: Linhaifeng <haifeng.lin at huawei.com>
>>>>>
>>>>> I don't see how that's possible, since the writer hasn't quiesced.
>>>>>
>>>>> I think the logic is as follow, Could you help me find out where is incorrect?
>>>>>
>>>>> 1.1 -> 1.2 -> 3.1 -> 3.2 -> 2.1 -> 2.2 -> 2.3 -> 2.1 -> 1.3 -> 1.4
>>>>> ->
>>>>> 3.3 -> 2.2(use after free)
>>>>>
>>>>> wirter:
>>>>> 1.1 use postone to free old pointer
>>>>> 1.2 flush cbsets to flushed_cbsets
>>>>> 1.3 update new pointer
>>>>> 1.4 quiesced
>>>>>
>>>>> Read:
>>>>> 2.1. read pointer
>>>>> 2.2. use pointer
>>>>> 2.3. quiesced
>>>>>
>>>>> Rcu:
>>>>> 3.1 pop flushed_cbsets
>>>>> 3.2 ovsrcu_synchronize
>>>>> 3.3 call all cb to free
>>>>
>>>> So you're saying this:
>>>>
>>>>     1.1 use postone to free old pointer (A)
>>>>     1.2 flush cbsets to flushed_cbsets
>>>>
>>>>     3.1 pop flushed_cbsets
>>>>     3.2 ovsrcu_synchronize
>>>>
>>>>     2.1. read pointer (A)
>>>>     2.2. use pointer (A)
>>>>     2.3. quiesced
>>>>
>>>>     2.1. read pointer (A)
>>>>
>>>>     1.3 update new pointer (B)
>>>>     1.4 quiesced
>>>>
>>>>     3.3 call all cb to free (A)
>>>>
>>>>     2.2. use pointer (A)
>>>>
>>>> Wow, you are absolutely right.  This had never occurred to me.  Thank
>> you!
>>>> I'll review your patch.
>>>
>>> Yes, it's really hard to happen. If it happened it's also hard to find the reason
>> so I suggest it can be a rule for using rcu.
>>
>> I agree that there is an issue here, but I think that we should not force users to
>> call ovsrcu_set() before ovsrcu_postpone().  Current users doesn't do
>> anything illegal since pointer must not be freed before the next grace period
>> from their point of view.
>>
>> For me it looks like the main issue is existence of point 1.2, i.e. flushing cbsets
>> while writer is not quiesced yet.  And we need to fix this inside rcu library itself.
>> For example, we could avoid flushing inside
>> ovsrcu_postpone() by making cbs[16] a dynamically allocated array and using
>> x2nrealloc instead of flushing.
>>
>> Thoughts?
>>
> Hi, Ilya Maximets
> 
> May be this is a good idea therefor the users not need to think about call ovsrcu_set() first or ovsrcu_postpone().
> How about you think, ben? May be you can send a patch to modify the ovsrcu_postpone() not to flush cbsets to
> replace of my patches.

The change could look like this:

diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
index ebc8120f0..cde1e925b 100644
--- a/lib/ovs-rcu.c
+++ b/lib/ovs-rcu.c
@@ -30,6 +30,8 @@
 
 VLOG_DEFINE_THIS_MODULE(ovs_rcu);
 
+#define MIN_CBS 16
+
 struct ovsrcu_cb {
     void (*function)(void *aux);
     void *aux;
@@ -37,7 +39,8 @@ struct ovsrcu_cb {
 
 struct ovsrcu_cbset {
     struct ovs_list list_node;
-    struct ovsrcu_cb cbs[16];
+    struct ovsrcu_cb *cbs;
+    size_t n_allocated;
     int n_cbs;
 };
 
@@ -310,16 +313,19 @@ ovsrcu_postpone__(void (*function)(void *aux), void *aux)
     cbset = perthread->cbset;
     if (!cbset) {
         cbset = perthread->cbset = xmalloc(sizeof *perthread->cbset);
+        cbset->cbs = xmalloc(MIN_CBS * sizeof *cbset->cbs);
+        cbset->n_allocated = MIN_CBS;
         cbset->n_cbs = 0;
     }
 
+    if (cbset->n_cbs == cbset->n_allocated) {
+        cbset->cbs = x2nrealloc(cbset->cbs, &cbset->n_allocated,
+                                sizeof *cbset->cbs);
+    }
+
     cb = &cbset->cbs[cbset->n_cbs++];
     cb->function = function;
     cb->aux = aux;
-
-    if (cbset->n_cbs >= ARRAY_SIZE(cbset->cbs)) {
-        ovsrcu_flush_cbset(perthread);
-    }
 }
 
 static bool
@@ -341,6 +347,7 @@ ovsrcu_call_postponed(void)
         for (cb = cbset->cbs; cb < &cbset->cbs[cbset->n_cbs]; cb++) {
             cb->function(cb->aux);
         }
+        free(cbset->cbs);
         free(cbset);
     }
 
---


I could submit a formal patch, if agreed.

Ben, what do you think?


More information about the dev mailing list