[ovs-dev] [PATCH] classifier: Prevent tries vs n_tries race leading to NULL dereference.

Eiichi Tsukata eiichi.tsukata at nutanix.com
Mon Apr 20 00:08:38 UTC 2020



> On Apr 18, 2020, at 1:14, Ilya Maximets <i.maximets at ovn.org> wrote:
> 
> On 4/15/20 2:24 AM, Eiichi Tsukata wrote:
>> Thanks for comments, Ilya.
>> 
>>> On Apr 14, 2020, at 21:11, Ilya Maximets <i.maximets at ovn.org> wrote:
>>> 
>>> On 4/14/20 6:00 AM, Eiichi Tsukata wrote:
>>>> 
>>>> 
>>>> @@ -1575,11 +1573,16 @@ check_tries(struct trie_ctx trie_ctx[CLS_MAX_TRIES], unsigned int n_tries,
>>>>     * fields using the prefix tries.  The trie checks are done only as
>>>>     * needed to avoid folding in additional bits to the wildcards mask. */
>>>>    for (j = 0; j < n_tries; j++) {
>>>> -        /* Is the trie field relevant for this subtable, and
>>>> -           is the trie field within the current range of fields? */
>>>> -        if (field_plen[j] &&
>>>> -            flowmap_is_set(&range_map, trie_ctx[j].be32ofs / 2)) {
>>>> -            struct trie_ctx *ctx = &trie_ctx[j];
>>>> +        struct trie_ctx *ctx = &trie_ctx[j];
>>>> +        /* Check if trie field is relevant for this subtable and
>>>> +         * synchronized with field_plen. */
>>>> +        if (!field_plen[j] || !ctx->trie->field) {
>>>> +            continue;
>>>> +        }
>>>> +
>>>> +        uint8_t be32ofs = ctx->trie->field->flow_be32ofs;
>>> 
>>> What will prevent 'field' being cleared between above two lines and produce
>>> a NULL pointer dereference here or inside trie_lookup()?
>> 
>> 
>> The main thread calls ovsrcu_synchronize() before it sets cls->tries[i]->field = NULL.
>> Here ovsrcu_synchronize() waits for the other readers (revalidator/handler threads)
>> of RCU-protected pointers to pass through an RCU quiescent state.
>> 
>> So the main thread that called ovsrcu_synchronize() waits for the revalidator/handler
>> threads to go through a quiescent state. That means 'be32ofs = ctx->trie->field->flow_be32ofs;'
>> is executed before the main thread executes 'cls->tries[i]->field = NULL;'.
> 
> It's correct that ovsrcu_synchronize() waits for other threads to go through RCU
> quiescent state, but it doesn't work as a thread barrier.  Threads are free to
> run, no one stops them, and they could be at any point of their execution flow.
> 
> And at the moment main thread sets field to NULL it's immediately visible to all
> threads in common case.  And if one of the readers was right between above two
> lines it will crash.

The code checks "!field_plen[j]" (the RCU-protected subtable->plen[j]) before it checks
!ctx->trie->field. Because the main thread clears plen first and only sets field to NULL
after ovsrcu_synchronize() returns, a reader thread that sees a nonzero field_plen[j]
cannot yet see a NULL field.


> 
>> 
>>> 
>>> trie->field is not protected by anything and not even an atomic variable.
>>> Since it always points to some static memory objects it doesn't need to
>>> be RCU protected, but it should be an atomic variable and it must be checked
>>> after any read.  In fact it might be made rcu protected just to make things
>>> easier and more accurate.
>>> 
>> 
>> I also had thought that protecting trie->field by RCU is a good idea, but found the
>> following race condition:
>> 
>> [main thread]                 [revalidator/handler thread]
>> ========================================================================
>> ovsrcu_set(&trie->field, NULL);
>> ovsrcu_synchronize();
>>                               Quiescent state;
>>                               for (int i = 0; i < cls->n_tries; i++)
>>                               /* field == NULL */
>>                               field = ovsrcu_get(struct mf_field *, &trie->field);
>>                               be32ofs = field->flow_be32ofs;
> 
> If fields is an RCU protected variable, you must check the value after each
> ovsrcu_get() operation.

I see.
Through this discussion with you, I've come to think that my initial approach is hard to
understand and fragile. I'll rewrite my patch to make mf_field RCU protected.
Please let me know if you have any other ideas.

Thanks

Eiichi
