[ovs-dev] [PATCH v3 2/3] ovsdb-idl: Preserve references for deleted rows.

Dumitru Ceara dceara at redhat.com
Tue Mar 23 12:00:24 UTC 2021

On 3/22/21 8:45 PM, Han Zhou wrote:
> On Mon, Mar 22, 2021 at 4:27 AM Dumitru Ceara <dceara at redhat.com> wrote:
>> On 3/19/21 6:55 PM, Han Zhou wrote:
>>> On Fri, Mar 19, 2021 at 1:42 AM Dumitru Ceara <dceara at redhat.com> wrote:
>>>> On 3/19/21 2:41 AM, Han Zhou wrote:
>>>>> On Thu, Mar 18, 2021 at 6:27 PM Han Zhou <hzhou at ovn.org
>>>>> <mailto:hzhou at ovn.org>> wrote:
>>>>>> On Thu, Mar 18, 2021 at 11:07 AM Ilya Maximets <i.maximets at ovn.org
>>>>> <mailto:i.maximets at ovn.org>> wrote:
>>>>>>> On 3/12/21 1:07 PM, Dumitru Ceara wrote:
>>>>>>>> Considering two DB rows, 'a' from table A and 'b' from table B
>>> (with
>>>>>>>> column 'ref_a' a reference to table A):
>>>>>>>> a = {A._uuid=<U1>}
>>>>>>>> b = {B._uuid=<U2>, B.ref_a=<U1>}
>>>>>>>> Assuming both records are present in the IDL client's in-memory
>>>>> view of
>>>>>>>> the database, depending whether row 'b' is also deleted in the same
>>>>>>>> transaction or not, deletion of row 'a' should generate the
>>> following
>>>>>>>> tracked changes:
>>>>>>>> 1. only row 'a' is deleted:
>>>>>>>> - for table A:
>>>>>>>>   - deleted records: a = {A._uuid=<U1>}
>>>>>>>> - for table B:
>>>>>>>>   - updated records: b = {B._uuid=<U2>, B.ref_a=[]}
>>>>>>>> 2. row 'a' and row 'b' are deleted in the same update:
>>>>>>>> - for table A:
>>>>>>>>   - deleted records: a = {A._uuid=<U1>}
>>>>>>>> - for table B:
>>>>>>>>   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
>>>>>>>> To ensure this, we now delay reparsing row backrefs for deleted
>>> rows
>>>>>>>> until all updates in the current run have been processed.
>>>>>>>> Without this change, in scenario 2 above, the tracked changes for
>>>>> table
>>>>>>>> B would be:
>>>>>>>> - deleted records: b = {B._uuid=<U2>, B.ref_a=[]}
>>>>>>>> In particular, for strong references, row 'a' can never be deleted
>>> in
>>>>>>>> a transaction that happens strictly before row 'b' is deleted.  In
>>>>> some
>>>>>>>> cases [0] both rows are deleted in the same transaction and having
>>>>>>>> B.ref_a=[] would violate the integrity of the database from client
>>>>>>>> perspective.  This would force the client to always validate that
>>>>>>>> strong reference fields are non-NULL.  This is not really an option
>>>>>>>> because the information in the original reference is required for
>>>>>>>> incrementally processing the record deletion.
>>>>>>>> [0] with ovn-monitor-all=true, the following command triggers a
>>> crash
>>>>>>>>     in ovn-controller because a strong reference field becomes
>>> NULL:
>>>>>>>>     $ ovn-nbctl --wait=hv -- lr-add r -- lrp-add r rp
>>>>> 00:00:00:00:00:01 <>
>>>>>>>>     $ ovn-nbctl lr-del r
>>>>>>>> Reported-at: https://bugzilla.redhat.com/1932642
>>>>> <https://bugzilla.redhat.com/1932642>
>>>>>>>> Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for
>>>>> deleted rows.")
>>>>>>>> Signed-off-by: Dumitru Ceara <dceara at redhat.com
>>>>> <mailto:dceara at redhat.com>>
>>>>>>>> ---
>>>>>>>>  lib/ovsdb-idl.c     |  135 ++++++++++++++++++++++++++--------
>>>>>>>>  tests/ovsdb-idl.at <http://ovsdb-idl.at>  |  203
>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>  tests/test-ovsdb.c  |   50 +++++++++++++
>>>>>>>>  tests/test-ovsdb.py |   14 ++++
>>>>>>>>  4 files changed, 370 insertions(+), 32 deletions(-)
>>>>>>>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
>>>>>>>> index 9e1e787..1542da9 100644
>>>>>>>> --- a/lib/ovsdb-idl.c
>>>>>>>> +++ b/lib/ovsdb-idl.c
>>>>>>>> @@ -92,6 +92,7 @@ struct ovsdb_idl {
>>>>>>>>      struct ovsdb_idl_txn *txn;
>>>>>>>>      struct hmap outstanding_txns;
>>>>>>>>      bool verify_write_only;
>>>>>>>> +    struct ovs_list orphan_rows; /* All deleted but still
>>>>> referenced rows. */
>>>>>>> The name here is a bit confusing.  I mean it's not a list
>>>>>>> of orphan rows, at least it's not a list of all the orphan
>>>>>>> rows.  So, we should, probably, rename it to something like
>>>>>>> 'deleted_rows' instead and also rename all related functions.
>>>>>>> What do you think?
>>>>>> I agree with Ilya that the "orphan_rows" is confusing. However,
>>>>> "deleted_rows" may also be confusing, because the current code also
>>>>> maintains deleted rows in the track_list of each table (no matter if
>>>>> tracking is enabled or not). If tracking is enabled, the rows are
>>>>> destroyed (for real) when ovsdb_idl_track_clear() is called; if not,
> it
>>>>> is destroyed in ovsdb_idl_row_destroy_postprocess(). The part of logic
>>>>> was really confusing. This patch changes that behavior further by
>>>>> introducing this orphan_rows list as a temporary place (for some
>>>>> situations) before moving to the track_list. Since this patch already
>>>>> did some refactoring (which is great), I'd suggest doing a little more
>>>>> to avoid the above confusion: for deleted rows, shall we just put the
>>>>> untracked rows in a "deleted_untracked_rows" list (i.e. rename the
>>>>> "orphan_rows" of this patch), and let the "track_list" only hold the
>>>>> tracked ones, and update the ovsdb_idl_row_destroy_postprocess() to
>>>>> destroy deleted_untracked_rows without worrying the track_list.
>>>>> track_list should exist only if tracking is enabled, and should be
> taken
>>>>> care by ovsdb_idl_track_clear().
>>>> Thanks, Ilya and Han, for the review and suggestions.
>>>> Renaming 'orphan_rows' definitely makes sense, I'll do that.  However,
>>>> regarding using a different list for tables without change tracking I'm
>>>> not so sure if that's OK.  Even if the table doesn't have change
>>>> tracking its deleted rows should still first go to the "temporary"
>>>> orphan_rows list.
>>>> For example:
>>>> Table T1 (no change tracking):
>>>> - Row a = {A.uuid=<U1>}
>>>> Table T2 (change tracking enabled):
>>>> - Row b = {B.uuid=<U2>, B.ref_to_A=<U1>}
>>>> If in a transaction both rows 'a' and 'b' are deleted, and updates are
>>>> received in this order, if we wouldn't add 'a' to 'orphan_rows' then
>>>> we'd be reparsing backrefs for 'a' immediately leading to:
>>>> Row b (updated) = {B.uuid=<U2>, B.ref_to_A=None}
>>>> Then the update to delete 'b' is processed:
>>>> Row b (deleted) = {B.uuid=<U2>, B.ref_to_A=None}
>>>> The above is wrong and breaks consistency if ref_to_A is a strong
>>> reference.
>>> Sorry that I didn't make it clear enough. The "deleted_untracked_rows"
> list
>>> in my suggestion is just a renaming of your "orphan_rows". The real
> change
>>> I suggested was that let's don't use it as a temporary place to hold
> these
>>> rows. We can make it clear that deleted_untracked_rows is for deleted
>>> untracked rows and the "track_list" of each table contains only tracked
>>> rows.
>> I think there's some confusion: 'orphan_rows' doesn't contain deleted
>> untracked rows.  What it actually stores is "rows deleted in the current
>> IDL run regardless of change tracking enabled or not".
> I think we are saying the same thing from different perspectives. For
> "deleted untracked rows" I am not talking about whether change tracking is
> enabled or not, but about whether the deleted row is in "track_list" or
> not. In this patch, a deleted row is in either "track_list" (what I called
> "tracked") or "orphan_rows" (what I called "deleted and untracked"). You
> always call "ovsdb_idl_row_untrack_change" before putting a row to
> "orphan_rows", so I think it would be clear to rename "orphan_rows" as
> "deleted_untracked_rows".

Oh, I see now, it was a misunderstanding indeed.  I thought we were
referring to change tracking being enabled for the table or not.  I can
change the name to "deleted_untracked_rows" in the next revision.

>>> In the above example, row a (without change tracking) will be put to
>>> deleted_untracked_rows, but no need to move to "track_list" later.
>>> In ovsdb_idl_row_destroy_postprocess, we can destroy
>>> deleted_untracked_rows, and no need to touch "track_list".
>>> Would this be more clear?
>> The problem is with deleted rows that are part of tables with change
>> tracking enabled.  Taking the example from the commit log:
>> Considering two DB rows, 'a' from table A and 'b' from table B (with
>> column 'ref_a' a reference to table A), both tables with change tracking
>> enabled:
>> a = {A._uuid=<U1>}
>> b = {B._uuid=<U2>, B.ref_a=<U1>}
>> If row 'a' and row 'b' are deleted in the same update then the IDL user
>> should be notified of the following events.
>> - for table A:
>>   - deleted records: a = {A._uuid=<U1>}
>> - for table B:
>>   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
>> To achieve this we need to make sure all "delete" updates are processed
>> before reparsing any backrefs.  This allows to check in
>> ovsdb_idl_row_reparse_backrefs() if an arc's source (B in the example
>> above) is also deleted in the current run, and skip unparsing B.ref_a so
>> that that the IDL user (e.g., ovn-controller) can still access it.
> In fact I didn't see the check you mentioned above. What's in the code is:
> +        /* Skip reparsing deleted refs.  If ref's table has change tracking
> +         * enabled, the users need the original information (and that will
> be
> +         * cleaned up by ovsdb_idl_track_clear()).  Otherwise, 'row' is
> +         * already orphan and is not accessible to the IDL user and will be
> +         * cleaned up at the end of the current IDL run.
> +         */
> +        if (ovsdb_idl_row_get_seqno(row, OVSDB_IDL_CHANGE_DELETE)) {
> +            continue;
> +        }
> +
> 1) "row" is "a" in the above example, and "ref" is "b". It seems you should
> check if "ref" is deleted?

You're right, this should be "ref" instead of "row, thanks for spotting
this!  I'll add a test case to exercise this scenario to avoid
regressions in the future.

Also, looking more at the code in my patch, OVSDB_IDL_CHANGE_DELETE will
never be set for rows that are on the "orphan_rows" list, so this
condition would never be true.

I'll rework this part and make sure that OVSDB_IDL_CHANGE_DELETE is
always set for rows that are on the "orphan_rows" list.

> 2) The comments says "if ref's table has change tracking enabled", but in
> fact the above code didn't check if change tracking is enabled or not.
> Should we check it?

I can try to rephrase the comment.  What I meant was:

"Skip reparsing deleted refs.  This is OK because either
- ref's table has change tracking enabled and the users need the
  original datum in 'ref', which will be cleaned up by
- ref's table has change tracking disabled, 'ref' is deleted so its
  datum will be cleaned up in ovsdb_idl_row_destroy_postprocess()."

> On the other hand, I understand the problem you are trying to solve, but I
> don't see why it is related to my comment regarding clearly distinguishing
> deleted_untracked_rows v.s. track_list.

It's not, with your explanation about deleted_untracked_rows above it's
now clear to me.

>>>>>> What do you think?
>>>>> I took a second look at the patch, and it seems when tracking is not
>>>>> enabled it would result in the rows in "orphan_rows" leaked, because
> in
>>>>> ovsdb_idl_process_orphans() the rows are moved to "track_list" ONLY IF
>>>>> tracking is enabled for the table. If I am correct, would the above
>>>>> suggested refactoring help avoiding such kinds of problems?
>>>> Actually, there's no leak AFAICT.  ovsdb_idl_process_orphans() moves
>>>> rows to 'track_list' in two scenarios:
>>>> a. they're not referenced anymore (ovs_list_is_empty(&row->dst_arcs))
>>>> b. they're in a table with change tracking enabled.
>>>> If the table doesn't have change tracking enabled and the row is not
>>>> referenced anymore it will be added to 'track_list' and will be cleaned
>>>> up properly by ovsdb_idl_row_destroy_postprocess() (case "a" above).
>>>> If the table doesn't have change tracking enabled but the row (let's
>>>> call it 'R1') is still referenced (i.e., it became "orphan") then we
>>>> shouldn't delete it yet.  When, in the future, the rows referring to
>>>> this orphan row are deleted/updated (let's call them 'R2'),
>>>> ovsdb_idl_row_clear_arcs(R2, true) is called and the orphan row R1 is
>>>> now properly deleted.
>>> Ok, it seems no leak at all. I think I was confused by this check:
>>> +        /* Orphan rows that are still unreferenced or are part of
> tables
>>> that
>>> +         * have hange tracking enabled should be added to their table's
>>> +         * 'track_list'.
>>> +         */
>>> +        if (ovs_list_is_empty(&row->dst_arcs)
>>> +                || ovsdb_idl_track_is_set(row->table)) {
>>> +            ovsdb_idl_row_track_change(row, OVSDB_IDL_CHANGE_DELETE);
>>> +        }
>>> For R1, since "ovs_list_is_empty(&row->dst_arcs)" is true, it will be
> moved
>>> to track_list and finally get destroyed.
>> R1 is still referenced by R2 at this point in the example above, that is
>> there is an arc from R2 to R1.
> Ok, my mistake.
>>> But I think this "if" can be removed because deleted rows should always
> go
>>> to "track_list" so that they can be destroyed (assume not doing the
>>> refactoring I suggested earlier).
>> R1 should not be destroyed until R2 is destroyed too.  The complexity
>> here exists because for tables with change tracking enabled, the current
>> design is to always add deleted rows (even if they're still referenced)
>> to 'track_list'.  But, even for those, ovsdb_idl_track_clear() will not
>> destroy them as long as they're still referenced.
> I am confused again. According to the above code:
> 1) If "row" is not referenced, ovs_list_is_empty(&row->dst_arcs) is true
> and the row will be put to track_list.
> 2) If "row" is referenced
>    2.1)  if the table is tracked, ovsdb_idl_track_is_set(row->table) is
> true and the row will be put to track_list.
>    2.2)  if the table is not tracked, the row will NOT be put to
> track_list. The row will NOT be put to track_list.
> If I understand correctly, in the current implementation (and in the
> patch), everything in track_list will finally get released (either in
> post_processing or in track_clear).

Those rows (R1) are "orphan", their contents were deleted from the IDL
database view, but there still exist other non-orphan rows (R2) that
refer to them.

> For 2.2), since the row is NOT put to track_list, how will it be released?

When R2 is deleted, ovsdb_idl_delete_row(R2) is called which calls
ovsdb_idl_row_clear_arcs(R2, true).

R2 has an arc to R1, and if R2 was the only row referring to R1 then
ovsdb_idl_row_clear_arcs(R2, true) calls ovsdb_idl_row_destroy(R1).

At this point ovs_list_is_empty(&R1->dst_arcs) == true so R1 is finally
removed from its table's hashtable, added to its table's 'track_list'
and will be completely freed in ovsdb_idl_row_destroy_postprocess().

> And for "But, even for those, ovsdb_idl_track_clear() will not destroy them
> as long as they're still referenced.", do you mean it is possible that a
> row is NOT destroyed even after the IDL iteration after user calling
> track_clear()? If yes, why is that necessary? And how is it achieved (in
> track_clear() it goes through each track_list and release all is_orphaned()
> rows, so how could it not get released)?

The reason is the same as above, there might still exist an R2 referring R1.

ovsdb_idl_track_clear(idl) calls ovsdb_idl_track_clear__(idl, false)
which means that rows that are on 'track_list' but still referenced will
not be freed because they're still in their table's hashtable:


>>> Before this check, ovsdb_idl_row_reparse_backrefs(row) is called to make
>>> sure the backrefs are cleared, so "ovs_list_is_empty(&row->dst_arcs)"
>>> should always be true, right?
>> No, even if there was an update to delete R1, R2 might still exist
>> (e.g., monitor condition change causes R1 to be deleted from the local
>> database view) and then ovsdb_idl_row_reparse_backrefs(R1) would create
>> an arc from R2 to R1.
>> I agree that the existing change tracking code is complex and can
>> potentially be simplified and/or refactored.  However, that's a big
>> effort and I think should not be in the scope of this bug fix.  I did
>> have to refactor some things but I limited myself to only changing parts
>> that are absolutely required to fix the bug.
> Sure, I agree that refactoring shouldn't be required in this patch. I am
> totally ok with it being merged without the refactoring I suggested, but
> regardless of the refactoring, could you still have my above concerns
> clarified? Thanks a lot for your patience!

Thanks for all the comments, they help question the implementation and
spot issues!

> Thanks,
> Han


>> For the time being, without this fix ovn-controller is crashing because
>> of the inconsistent view of the database the IDL change tracking code
>> creates.
>> I think this should be fixed as soon as possible.  In my opinion all
>> other refactoring and hardening should be done as follow up.
>> Regards,
>> Dumitru
>>>>> Thanks,
>>>>> Han
>>>>>>> For the rest of the code... It seems fine, but it's really hard
>>>>>>> to review and I still have a feeling that something might go
>>>>>>> wrong (not more wrong than in current code, though), but I can't
>>>>>>> think of any example.  So, it's, probably, OK.
>>>> Thanks, I'm not sure how we can improve the stability/reliability of
>>>> this code without major refactoring/rewrite.  Until then though, the
>>>> fact that we break consistency of the IDL client's view of the database
>>>> is something that needs to be addressed because it's taken for granted
>>>> by ovn-controller in quite a few cases.
>>>>>>> Best regards, Ilya Maximets.
>>>> Regards,
>>>> Dumitru

More information about the dev mailing list