[ovs-dev] [PATCH] ovsdb-idl.c: Clear conditions when clearing IDL.

Dumitru Ceara dceara at redhat.com
Tue Mar 3 12:47:48 UTC 2020


CC-ing Andy and Ben.

On 3/3/20 7:43 AM, Han Zhou wrote:
> 
> 
> On Mon, Mar 2, 2020 at 7:55 AM Dumitru Ceara <dceara at redhat.com
> <mailto:dceara at redhat.com>> wrote:
>>
>> On 2/29/20 12:00 AM, Dumitru Ceara wrote:
>> > If the ovsdb-server reply to "monitor_cond_since" requests has
>> > "found" == false then ovsdb_idl_db_parse_monitor_reply() calls
>> > ovsdb_idl_db_clear() which iterates through all tables and
>> > unconditionally sets table->cond_changed to false.
>> >
>> > However, if the client had already set a new condition for some of the
>> > tables, this new condition request will never be sent to ovsdb-server
>> > until the condition is reset to a different value. This is due to the
>> > check in ovsdb_idl_db_set_condition().
>> >
>> > In order to fix this we now also call ovsdb_idl_condition_clear() for
>> > each table condition in ovsdb_idl_db_clear(). This ensures that
>> > resetting the condition to the same value as before the
>> > "monitor_cond_since" reply will trigger a new request.
>> >
>> > One way to replicate the issue is described in the bugzilla reporting
>> > the bug, when ovn-controller is configured to use "ovn-monitor-all":
>> > https://bugzilla.redhat.com/show_bug.cgi?id=1808125#c6
>> >
>> > Reported-by: Dan Williams <dcbw at redhat.com <mailto:dcbw at redhat.com>>
>> > Reported-at: https://bugzilla.redhat.com/1808125
>> > CC: Han Zhou <hzhou at ovn.org <mailto:hzhou at ovn.org>>
>> > Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when
> connection reset.")
>> > Signed-off-by: Dumitru Ceara <dceara at redhat.com
> <mailto:dceara at redhat.com>>
>>
>> Hi Han,
>>
>> I was wondering if you could review this change because ovn-kubernetes
>> is trying to enable ovn-monitor-all for scalability reasons but their CI
>> fails due to the bug fixed by the patch below.
>>
>> Thanks,
>> Dumitru
>>
>> > ---
>> >  lib/ovsdb-idl.c | 6 +++++-
>> >  1 file changed, 5 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
>> > index 190143f..64721ca 100644
>> > --- a/lib/ovsdb-idl.c
>> > +++ b/lib/ovsdb-idl.c
>> > @@ -610,7 +610,11 @@ ovsdb_idl_db_clear(struct ovsdb_idl_db *db)
>> >          struct ovsdb_idl_table *table = &db->tables[i];
>> >          struct ovsdb_idl_row *row, *next_row;
>> >
>> > -        table->cond_changed = false;
>> > +        if (table->cond_changed) {
>> > +            table->cond_changed = false;
>> > +            ovsdb_idl_condition_clear(&table->condition);
>> > +        }
>> > +
>> >          if (hmap_is_empty(&table->rows)) {
>> >              continue;
>> >          }
>> >
>>
> 
> Hi Dumitru>
> If I understand the problem, it is when the client updated monitor
> condition and at the same time it received monitor_cond_since reply from
> server for it's previous conditional monitor request, and the handling
> of the reply overwrite the flag table->cond_changed from true to false,
> and because of this, the new condition is not sent to server.
>
Hi Han,

Yes, that's what's happening.

> However, I don't understand the fix. It seems that the fix should be
> "don't overwrite the flag", instead of clearing the conditions as well.
> Also, by simply looking at this code I don't understand why the flag
> needs to be set to false when clearing data in IDL. What would be the
> problem if we keep it as what it is? If you understand the details,
> could you explain more about the fix?
> 
> In addition, I didn't change this logic when implementing
> monitor_cond_since in 403a6a0cb003 ("ovsdb-idl: Fast resync from server
> when connection reset."). In that commit it only reduced the chance of
> calling ovsdb_idl_db_clear() as an optimization. Before that commit,
> ovsdb_idl_db_parse_monitor_reply() always calls ovsdb_idl_db_clear() and
> it would result in same problem.
You're right, my bad, I messed up the "git blame", sorry about that. It
was actually introduced by:

https://github.com/openvswitch/ovs/commit/5351980b047f4dd40be7a59a1e4b910df21eca0a

I'll fix up the commit message in V2 once we decide on the best approach.

Hi Andy, Ben,

As far as I understand, whenever ovsdb_idl_db_clear() is called during
reconnect, any buffered but unsent conditions should be cleared because
they will be anyway resent with the following monitoring condition
update message after reconnect has completed.

However, clearing the "table->cond_changed" flag is not enough because
when the IDL client (ovn-controller in my case) will try to set the same
condition as before (e.g., "true" if ovn-monitor-all=true)
ovsdb_idl_db_set_condition() will return early:

https://github.com/openvswitch/ovs/blob/master/lib/ovsdb-idl.c#L1518

Therefore in the cases when ovsdb_idl_db_clear() is called, we should
also clear the existing conditions on the DB tables.

As Han suggested, we could also revert the changes from 5351980b047f
("ovsdb-idl: Avoid sending redundant conditional monitoring updates")
and we wouldn't hit the issue anymore. I'm not sure however if this has
any other major implications.

What do you think?

Thanks,
Dumitru

> 
> Thanks,
> Han



More information about the dev mailing list