[ovs-dev] [PATCH v2] classifier: Prevent tries vs n_tries race leading to NULL dereference.

Eiichi Tsukata eiichi.tsukata at nutanix.com
Thu Apr 30 04:56:33 UTC 2020


Thanks for comments Ilya,

> On Apr 29, 2020, at 20:17, Ilya Maximets <i.maximets at ovn.org> wrote:
> 
> On 4/22/20 10:36 AM, Eiichi Tsukata wrote:
>> Currently classifier tries and n_tries can be updated not atomically,
>> there is a race condition which can lead to NULL dereference.
>> The race can happen when main thread updates a classifier tries and
>> n_tries in classifier_set_prefix_fields() and at the same time revalidator
>> or handler thread try to lookup them in classifier_lookup__(). Such race
>> can be triggered when user changes prefixes of flow_table.
>> 
>> Race(user changes flow_table prefixes: ip_dst,ip_src => none):
>> 
>>  [main thread]             [revalidator/handler thread]
>>  ===========================================================
>>                            /* cls->n_tries == 2 */
>>                            for (int i = 0; i < cls->n_tries; i++) {
>>  trie_init(cls, i, NULL);
>>  /* n_tries == 0 */
>>  cls->n_tries = n_tries;
>>                            /* cls->tries[i]->feild is NULL */
>>                            trie_ctx_init(&trie_ctx[i],&cls->tries[i]);
>>                            /* trie->field is NULL */
>>                            ctx->be32ofs = trie->field->flow_be32ofs;
>> 
>> To prevent the race, instead of re-introducing internal mutex
>> implemented in the commit fccd7c092e09 ("classifier: Remove internal
>> mutex."), this patch makes trie field RCU protected and checks it after
>> read.
>> 
>> Fixes: fccd7c092e09 ("classifier: Remove internal mutex.")
>> Signed-off-by: Eiichi Tsukata <eiichi.tsukata at nutanix.com>
>> ---
>> lib/classifier.c        | 36 ++++++++++++++++++++++--------------
>> lib/classifier.h        |  6 ++++--
>> tests/test-classifier.c |  5 +++--
>> 3 files changed, 29 insertions(+), 18 deletions(-)
> 
> Hi.  Thanks for the new version it seems kind of OK except few comments inline.
> 
> IIUC, the main issue this patch is trying to solve is that order of reads in
> readers is opposite to what writer thinks it is, i.e. writer assumes that
> 'trie->field' will be accessed only after checking trie_plen, but readers reads
> 'trie->field' first to the local context and only after that checks the trie_plen.
> And this patch seems to fix this issue.
> 
> But in general I'm still not convinced that the code here is thread- and logically
> safe.  And it's really hard to read it and understand what's happening and why.
> For me the most controversial part is that we're using subtable->trie_plen as
> an indicator that we could use the trie and the second one is that we're modifying
> all the non-protected fields of struct cls_trie and cls->n_tries separately while
> it only makes sense doing that at the same time which it was whan we had a lock.
> 
> So, in long term instead of converting each filed of struct cls_trie to rcu protected
> variable I'd like to see something like conversion of cls->tries from an array to an
> rcu protected dynamic array like pvector.  This way we could remove n_tries filed
> from cls and be sure that all the tries that are in our vector are in consistent
> state and could actually be used.

Excellent detailed explanation of the problem, I really appreciate it.
Yes how current code uses subtable->trie_plen is very difficult to understand.

I think your idea to make cls->tries something like pvector is great idea and
will make codes much easier to understand.

> 
>> 
>> diff --git a/lib/classifier.c b/lib/classifier.c
>> index 0fad953..3d0a49e 100644
>> --- a/lib/classifier.c
>> +++ b/lib/classifier.c
>> @@ -393,7 +393,9 @@ classifier_set_prefix_fields(struct classifier *cls,
>>         bitmap_set1(fields.bm, trie_fields[i]);
>> 
>>         new_fields[n_tries] = NULL;
>> -        if (n_tries >= cls->n_tries || field != cls->tries[n_tries].field) {
>> +        const struct mf_field *cls_field
>> +            = ovsrcu_get(struct mf_field *, &cls->tries[n_tries].field);
>> +        if (n_tries >= cls->n_tries || field != cls_field) {
>>             new_fields[n_tries] = field;
>>             changed = true;
>>         }
>> @@ -454,7 +456,7 @@ trie_init(struct classifier *cls, int trie_idx, const struct mf_field *field)
>>     } else {
>>         ovsrcu_set_hidden(&trie->root, NULL);
>>     }
>> -    trie->field = field;
>> +    ovsrcu_set(&trie->field, field);
> 
> Why not ovsrcu_set_hidden?

As you say, we should use ovsrcu_set_hidden():memory_order_relaxed here, and changes will be
published in the later ovsrcu_set():memory_order_release. I’ll fix it in v3.

> 
>> 
>>     /* Add existing rules to the new trie. */
>>     CMAP_FOR_EACH (subtable, cmap_node, &cls->subtables_map) {
>> @@ -849,7 +851,6 @@ static void
>> trie_ctx_init(struct trie_ctx *ctx, const struct cls_trie *trie)
>> {
>>     ctx->trie = trie;
>> -    ctx->be32ofs = trie->field->flow_be32ofs;
> 
> Shouldn't we remove 'be32ofs' filed from 'struct trie_ctx'?
> 
>>     ctx->lookup_done = false;
>> 

Oops, I forgot it. I’ll remove it.
I’m going to send v3 patch soon.

Thanks

Eiichi



More information about the dev mailing list