[ovs-dev] [PATCH] ovsdb-idl.c: Clear conditions when clearing IDL.

Ilya Maximets i.maximets at ovn.org
Wed Mar 25 18:21:04 UTC 2020


On 3/3/20 1:47 PM, Dumitru Ceara wrote:
> CC-ing Andy and Ben.
> 
> On 3/3/20 7:43 AM, Han Zhou wrote:
>>
>>
>> On Mon, Mar 2, 2020 at 7:55 AM Dumitru Ceara <dceara at redhat.com
>> <mailto:dceara at redhat.com>> wrote:
>>>
>>> On 2/29/20 12:00 AM, Dumitru Ceara wrote:
>>>> If the ovsdb-server reply to "monitor_cond_since" requests has
>>>> "found" == false then ovsdb_idl_db_parse_monitor_reply() calls
>>>> ovsdb_idl_db_clear() which iterates through all tables and
>>>> unconditionally sets table->cond_changed to false.
>>>>
>>>> However, if the client had already set a new condition for some of the
>>>> tables, this new condition request will never be sent to ovsdb-server
>>>> until the condition is reset to a different value. This is due to the
>>>> check in ovsdb_idl_db_set_condition().
>>>>
>>>> In order to fix this we now also call ovsdb_idl_condition_clear() for
>>>> each table condition in ovsdb_idl_db_clear(). This ensures that
>>>> resetting the condition to the same value as before the
>>>> "monitor_cond_since" reply will trigger a new request.
>>>>
>>>> One way to replicate the issue is described in the bugzilla reporting
>>>> the bug, when ovn-controller is configured to use "ovn-monitor-all":
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1808125#c6
>>>>
>>>> Reported-by: Dan Williams <dcbw at redhat.com <mailto:dcbw at redhat.com>>
>>>> Reported-at: https://bugzilla.redhat.com/1808125
>>>> CC: Han Zhou <hzhou at ovn.org <mailto:hzhou at ovn.org>>
>>>> Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when
>> connection reset.")
>>>> Signed-off-by: Dumitru Ceara <dceara at redhat.com
>> <mailto:dceara at redhat.com>>
>>>
>>> Hi Han,
>>>
>>> I was wondering if you could review this change because ovn-kubernetes
>>> is trying to enable ovn-monitor-all for scalability reasons but their CI
>>> fails due to the bug fixed by the patch below.
>>>
>>> Thanks,
>>> Dumitru
>>>
>>>> ---
>>>>  lib/ovsdb-idl.c | 6 +++++-
>>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
>>>> index 190143f..64721ca 100644
>>>> --- a/lib/ovsdb-idl.c
>>>> +++ b/lib/ovsdb-idl.c
>>>> @@ -610,7 +610,11 @@ ovsdb_idl_db_clear(struct ovsdb_idl_db *db)
>>>>          struct ovsdb_idl_table *table = &db->tables[i];
>>>>          struct ovsdb_idl_row *row, *next_row;
>>>>
>>>> -        table->cond_changed = false;
>>>> +        if (table->cond_changed) {
>>>> +            table->cond_changed = false;
>>>> +            ovsdb_idl_condition_clear(&table->condition);
>>>> +        }
>>>> +
>>>>          if (hmap_is_empty(&table->rows)) {
>>>>              continue;
>>>>          }
>>>>
>>>
>>
>> Hi Dumitru>
>> If I understand the problem, it is when the client updated monitor
>> condition and at the same time it received monitor_cond_since reply from
>> server for it's previous conditional monitor request, and the handling
>> of the reply overwrite the flag table->cond_changed from true to false,
>> and because of this, the new condition is not sent to server.
>>
> Hi Han,
> 
> Yes, that's what's happening.
> 
>> However, I don't understand the fix. It seems that the fix should be
>> "don't overwrite the flag", instead of clearing the conditions as well.
>> Also, by simply looking at this code I don't understand why the flag
>> needs to be set to false when clearing data in IDL. What would be the
>> problem if we keep it as what it is? If you understand the details,
>> could you explain more about the fix?
>>
>> In addition, I didn't change this logic when implementing
>> monitor_cond_since in 403a6a0cb003 ("ovsdb-idl: Fast resync from server
>> when connection reset."). In that commit it only reduced the chance of
>> calling ovsdb_idl_db_clear() as an optimization. Before that commit,
>> ovsdb_idl_db_parse_monitor_reply() always calls ovsdb_idl_db_clear() and
>> it would result in same problem.
> You're right, my bad, I messed up the "git blame", sorry about that. It
> was actually introduced by:
> 
> https://github.com/openvswitch/ovs/commit/5351980b047f4dd40be7a59a1e4b910df21eca0a
> 
> I'll fix up the commit message in V2 once we decide on the best approach.
> 
> Hi Andy, Ben,
> 
> As far as I understand, whenever ovsdb_idl_db_clear() is called during
> reconnect, any buffered but unsent conditions should be cleared because
> they will be anyway resent with the following monitoring condition
> update message after reconnect has completed.
> 
> However, clearing the "table->cond_changed" flag is not enough because
> when the IDL client (ovn-controller in my case) will try to set the same
> condition as before (e.g., "true" if ovn-monitor-all=true)
> ovsdb_idl_db_set_condition() will return early:
> 
> https://github.com/openvswitch/ovs/blob/master/lib/ovsdb-idl.c#L1518
> 
> Therefore in the cases when ovsdb_idl_db_clear() is called, we should
> also clear the existing conditions on the DB tables.
> 
> As Han suggested, we could also revert the changes from 5351980b047f
> ("ovsdb-idl: Avoid sending redundant conditional monitoring updates")
> and we wouldn't hit the issue anymore. I'm not sure however if this has
> any other major implications.

If we're clearing existing conditions, we will send empty conditions in
initial monitor_cond after reconnect, AFAIU, i.e. application will stop
receiving any updates.  I think that breaks the logic that was behind commit
5351980b047f.  The application that uses idl library will have to set up
conditions from scratch in order to avoid logic changes.  I think that
reconnection process should not affect application anyhow and we can't force
application to track all the reconnections and reconfigure conditions.

So, I think that it's better to revert commit 5351980b047f instead, because
otherwise we will break its logic anyway.  And, in the end, this was just a
performance optimization.
Also, after commit 403a6a0cb003 ("ovsdb-idl: Fast resync from server when
connection reset."),  ovsdb_idl_db_parse_monitor_reply() doesn't call
ovsdb_idl_db_clear() in most cases, i.e. the performance optimization of
sending same conditions twice might be less important.

Best regards, Ilya Maximets.


More information about the dev mailing list