[ovs-dev] [PATCH v3 2/3] ovsdb-idl: Preserve references for deleted rows.

Dumitru Ceara dceara at redhat.com
Tue Mar 23 22:26:41 UTC 2021

On 3/23/21 8:15 PM, Han Zhou wrote:
> On Tue, Mar 23, 2021 at 5:00 AM Dumitru Ceara <dceara at redhat.com> wrote:
>> On 3/22/21 8:45 PM, Han Zhou wrote:
>>> On Mon, Mar 22, 2021 at 4:27 AM Dumitru Ceara <dceara at redhat.com> wrote:
>>>> On 3/19/21 6:55 PM, Han Zhou wrote:
>>>>> On Fri, Mar 19, 2021 at 1:42 AM Dumitru Ceara <dceara at redhat.com>
> wrote:
>>>>>> On 3/19/21 2:41 AM, Han Zhou wrote:
>>>>>>> On Thu, Mar 18, 2021 at 6:27 PM Han Zhou <hzhou at ovn.org
>>>>>>> <mailto:hzhou at ovn.org>> wrote:
>>>>>>>> On Thu, Mar 18, 2021 at 11:07 AM Ilya Maximets <i.maximets at ovn.org
>>>>>>> <mailto:i.maximets at ovn.org>> wrote:
>>>>>>>>> On 3/12/21 1:07 PM, Dumitru Ceara wrote:
>>>>>>>>>> Considering two DB rows, 'a' from table A and 'b' from table B
>>>>> (with
>>>>>>>>>> column 'ref_a' a reference to table A):
>>>>>>>>>> a = {A._uuid=<U1>}
>>>>>>>>>> b = {B._uuid=<U2>, B.ref_a=<U1>}
>>>>>>>>>> Assuming both records are present in the IDL client's in-memory
>>>>>>> view of
>>>>>>>>>> the database, depending whether row 'b' is also deleted in the
> same
>>>>>>>>>> transaction or not, deletion of row 'a' should generate the
>>>>> following
>>>>>>>>>> tracked changes:
>>>>>>>>>> 1. only row 'a' is deleted:
>>>>>>>>>> - for table A:
>>>>>>>>>>   - deleted records: a = {A._uuid=<U1>}
>>>>>>>>>> - for table B:
>>>>>>>>>>   - updated records: b = {B._uuid=<U2>, B.ref_a=[]}
>>>>>>>>>> 2. row 'a' and row 'b' are deleted in the same update:
>>>>>>>>>> - for table A:
>>>>>>>>>>   - deleted records: a = {A._uuid=<U1>}
>>>>>>>>>> - for table B:
>>>>>>>>>>   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
>>>>>>>>>> To ensure this, we now delay reparsing row backrefs for deleted
>>>>> rows
>>>>>>>>>> until all updates in the current run have been processed.
>>>>>>>>>> Without this change, in scenario 2 above, the tracked changes for
>>>>>>> table
>>>>>>>>>> B would be:
>>>>>>>>>> - deleted records: b = {B._uuid=<U2>, B.ref_a=[]}
>>>>>>>>>> In particular, for strong references, row 'a' can never be
> deleted
>>>>> in
>>>>>>>>>> a transaction that happens strictly before row 'b' is deleted.
> In
>>>>>>> some
>>>>>>>>>> cases [0] both rows are deleted in the same transaction and
> having
>>>>>>>>>> B.ref_a=[] would violate the integrity of the database from
> client
>>>>>>>>>> perspective.  This would force the client to always validate that
>>>>>>>>>> strong reference fields are non-NULL.  This is not really an
> option
>>>>>>>>>> because the information in the original reference is required for
>>>>>>>>>> incrementally processing the record deletion.
>>>>>>>>>> [0] with ovn-monitor-all=true, the following command triggers a
>>>>> crash
>>>>>>>>>>     in ovn-controller because a strong reference field becomes
>>>>> NULL:
>>>>>>>>>>     $ ovn-nbctl --wait=hv -- lr-add r -- lrp-add r rp
>>>>>>> 00:00:00:00:00:01 <>
>>>>>>>>>>     $ ovn-nbctl lr-del r
>>>>>>>>>> Reported-at: https://bugzilla.redhat.com/1932642
>>>>>>> <https://bugzilla.redhat.com/1932642>
>>>>>>>>>> Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for
>>>>>>> deleted rows.")
>>>>>>>>>> Signed-off-by: Dumitru Ceara <dceara at redhat.com
>>>>>>> <mailto:dceara at redhat.com>>
>>>>>>>>>> ---
>>>>>>>>>>  lib/ovsdb-idl.c     |  135 ++++++++++++++++++++++++++--------
>>>>>>>>>>  tests/ovsdb-idl.at <http://ovsdb-idl.at>  |  203
>>>>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>>  tests/test-ovsdb.c  |   50 +++++++++++++
>>>>>>>>>>  tests/test-ovsdb.py |   14 ++++
>>>>>>>>>>  4 files changed, 370 insertions(+), 32 deletions(-)
>>>>>>>>>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
>>>>>>>>>> index 9e1e787..1542da9 100644
>>>>>>>>>> --- a/lib/ovsdb-idl.c
>>>>>>>>>> +++ b/lib/ovsdb-idl.c
>>>>>>>>>> @@ -92,6 +92,7 @@ struct ovsdb_idl {
>>>>>>>>>>      struct ovsdb_idl_txn *txn;
>>>>>>>>>>      struct hmap outstanding_txns;
>>>>>>>>>>      bool verify_write_only;
>>>>>>>>>> +    struct ovs_list orphan_rows; /* All deleted but still
>>>>>>> referenced rows. */
>>>>>>>>> The name here is a bit confusing.  I mean it's not a list
>>>>>>>>> of orphan rows, at least it's not a list of all the orphan
>>>>>>>>> rows.  So, we should, probably, rename it to something like
>>>>>>>>> 'deleted_rows' instead and also rename all related functions.
>>>>>>>>> What do you think?
>>>>>>>> I agree with Ilya that the "orphan_rows" is confusing. However,
>>>>>>> "deleted_rows" may also be confusing, because the current code also
>>>>>>> maintains deleted rows in the track_list of each table (no matter if
>>>>>>> tracking is enabled or not). If tracking is enabled, the rows are
>>>>>>> destroyed (for real) when ovsdb_idl_track_clear() is called; if not,
>>> it
>>>>>>> is destroyed in ovsdb_idl_row_destroy_postprocess(). The part of
> logic
>>>>>>> was really confusing. This patch changes that behavior further by
>>>>>>> introducing this orphan_rows list as a temporary place (for some
>>>>>>> situations) before moving to the track_list. Since this patch
> already
>>>>>>> did some refactoring (which is great), I'd suggest doing a little
> more
>>>>>>> to avoid the above confusion: for deleted rows, shall we just put
> the
>>>>>>> untracked rows in a "deleted_untracked_rows" list (i.e. rename the
>>>>>>> "orphan_rows" of this patch), and let the "track_list" only hold the
>>>>>>> tracked ones, and update the ovsdb_idl_row_destroy_postprocess() to
>>>>>>> destroy deleted_untracked_rows without worrying the track_list.
>>>>>>> track_list should exist only if tracking is enabled, and should be
>>> taken
>>>>>>> care by ovsdb_idl_track_clear().
>>>>>> Thanks, Ilya and Han, for the review and suggestions.
>>>>>> Renaming 'orphan_rows' definitely makes sense, I'll do that.
> However,
>>>>>> regarding using a different list for tables without change tracking
> I'm
>>>>>> not so sure if that's OK.  Even if the table doesn't have change
>>>>>> tracking its deleted rows should still first go to the "temporary"
>>>>>> orphan_rows list.
>>>>>> For example:
>>>>>> Table T1 (no change tracking):
>>>>>> - Row a = {A.uuid=<U1>}
>>>>>> Table T2 (change tracking enabled):
>>>>>> - Row b = {B.uuid=<U2>, B.ref_to_A=<U1>}
>>>>>> If in a transaction both rows 'a' and 'b' are deleted, and updates
> are
>>>>>> received in this order, if we wouldn't add 'a' to 'orphan_rows' then
>>>>>> we'd be reparsing backrefs for 'a' immediately leading to:
>>>>>> Row b (updated) = {B.uuid=<U2>, B.ref_to_A=None}
>>>>>> Then the update to delete 'b' is processed:
>>>>>> Row b (deleted) = {B.uuid=<U2>, B.ref_to_A=None}
>>>>>> The above is wrong and breaks consistency if ref_to_A is a strong
>>>>> reference.
>>>>> Sorry that I didn't make it clear enough. The "deleted_untracked_rows"
>>> list
>>>>> in my suggestion is just a renaming of your "orphan_rows". The real
>>> change
>>>>> I suggested was that let's don't use it as a temporary place to hold
>>> these
>>>>> rows. We can make it clear that deleted_untracked_rows is for deleted
>>>>> untracked rows and the "track_list" of each table contains only
> tracked
>>>>> rows.
>>>> I think there's some confusion: 'orphan_rows' doesn't contain deleted
>>>> untracked rows.  What it actually stores is "rows deleted in the
> current
>>>> IDL run regardless of change tracking enabled or not".
>>> I think we are saying the same thing from different perspectives. For
>>> "deleted untracked rows" I am not talking about whether change tracking
> is
>>> enabled or not, but about whether the deleted row is in "track_list" or
>>> not. In this patch, a deleted row is in either "track_list" (what I
> called
>>> "tracked") or "orphan_rows" (what I called "deleted and untracked"). You
>>> always call "ovsdb_idl_row_untrack_change" before putting a row to
>>> "orphan_rows", so I think it would be clear to rename "orphan_rows" as
>>> "deleted_untracked_rows".
>> Oh, I see now, it was a misunderstanding indeed.  I thought we were
>> referring to change tracking being enabled for the table or not.  I can
>> change the name to "deleted_untracked_rows" in the next revision.
>>>>> In the above example, row a (without change tracking) will be put to
>>>>> deleted_untracked_rows, but no need to move to "track_list" later.
>>>>> In ovsdb_idl_row_destroy_postprocess, we can destroy
>>>>> deleted_untracked_rows, and no need to touch "track_list".
>>>>> Would this be more clear?
>>>> The problem is with deleted rows that are part of tables with change
>>>> tracking enabled.  Taking the example from the commit log:
>>>> Considering two DB rows, 'a' from table A and 'b' from table B (with
>>>> column 'ref_a' a reference to table A), both tables with change
> tracking
>>>> enabled:
>>>> a = {A._uuid=<U1>}
>>>> b = {B._uuid=<U2>, B.ref_a=<U1>}
>>>> If row 'a' and row 'b' are deleted in the same update then the IDL user
>>>> should be notified of the following events.
>>>> - for table A:
>>>>   - deleted records: a = {A._uuid=<U1>}
>>>> - for table B:
>>>>   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
>>>> To achieve this we need to make sure all "delete" updates are processed
>>>> before reparsing any backrefs.  This allows to check in
>>>> ovsdb_idl_row_reparse_backrefs() if an arc's source (B in the example
>>>> above) is also deleted in the current run, and skip unparsing B.ref_a
> so
>>>> that that the IDL user (e.g., ovn-controller) can still access it.
>>> In fact I didn't see the check you mentioned above. What's in the code
> is:
>>> +        /* Skip reparsing deleted refs.  If ref's table has change
> tracking
>>> +         * enabled, the users need the original information (and that
> will
>>> be
>>> +         * cleaned up by ovsdb_idl_track_clear()).  Otherwise, 'row' is
>>> +         * already orphan and is not accessible to the IDL user and
> will be
>>> +         * cleaned up at the end of the current IDL run.
>>> +         */
>>> +        if (ovsdb_idl_row_get_seqno(row, OVSDB_IDL_CHANGE_DELETE)) {
>>> +            continue;
>>> +        }
>>> +
>>> 1) "row" is "a" in the above example, and "ref" is "b". It seems you
> should
>>> check if "ref" is deleted?
>> You're right, this should be "ref" instead of "row, thanks for spotting
>> this!  I'll add a test case to exercise this scenario to avoid
>> regressions in the future.
>> Also, looking more at the code in my patch, OVSDB_IDL_CHANGE_DELETE will
>> never be set for rows that are on the "orphan_rows" list, so this
>> condition would never be true.
>> I'll rework this part and make sure that OVSDB_IDL_CHANGE_DELETE is
>> always set for rows that are on the "orphan_rows" list.
>>> 2) The comments says "if ref's table has change tracking enabled", but
> in
>>> fact the above code didn't check if change tracking is enabled or not.
>>> Should we check it?
>> I can try to rephrase the comment.  What I meant was:
>> "Skip reparsing deleted refs.  This is OK because either
>> - ref's table has change tracking enabled and the users need the
>>   original datum in 'ref', which will be cleaned up by
>>   ovsdb_idl_track_clear().
>> OR
>> - ref's table has change tracking disabled, 'ref' is deleted so its
>>   datum will be cleaned up in ovsdb_idl_row_destroy_postprocess()."
> For deleted refs, e.g. R2 reference R1, and both R1 and R2 are deleted,
> then during the call ovsdb_idl_delete_row(R2), the arc R2 -> R1 is already
> deleted, so here we won't even see R2 in ovsdb_idl_row_reparse_backref(R1),
> right? So the above check is useless?

You're right, I can remove this check altogether now that we delay
reparsing of backrefs for all deleted rows.  I'll update it in v4.

>>> On the other hand, I understand the problem you are trying to solve,
> but I
>>> don't see why it is related to my comment regarding clearly
> distinguishing
>>> deleted_untracked_rows v.s. track_list.
>> It's not, with your explanation about deleted_untracked_rows above it's
>> now clear to me.
>>>>>>>> What do you think?
>>>>>>> I took a second look at the patch, and it seems when tracking is not
>>>>>>> enabled it would result in the rows in "orphan_rows" leaked, because
>>> in
>>>>>>> ovsdb_idl_process_orphans() the rows are moved to "track_list" ONLY
> IF
>>>>>>> tracking is enabled for the table. If I am correct, would the above
>>>>>>> suggested refactoring help avoiding such kinds of problems?
>>>>>> Actually, there's no leak AFAICT.  ovsdb_idl_process_orphans() moves
>>>>>> rows to 'track_list' in two scenarios:
>>>>>> a. they're not referenced anymore (ovs_list_is_empty(&row->dst_arcs))
>>>>>> b. they're in a table with change tracking enabled.
>>>>>> If the table doesn't have change tracking enabled and the row is not
>>>>>> referenced anymore it will be added to 'track_list' and will be
> cleaned
>>>>>> up properly by ovsdb_idl_row_destroy_postprocess() (case "a" above).
>>>>>> If the table doesn't have change tracking enabled but the row (let's
>>>>>> call it 'R1') is still referenced (i.e., it became "orphan") then we
>>>>>> shouldn't delete it yet.  When, in the future, the rows referring to
>>>>>> this orphan row are deleted/updated (let's call them 'R2'),
>>>>>> ovsdb_idl_row_clear_arcs(R2, true) is called and the orphan row R1 is
>>>>>> now properly deleted.
>>>>> Ok, it seems no leak at all. I think I was confused by this check:
>>>>> +        /* Orphan rows that are still unreferenced or are part of
>>> tables
>>>>> that
>>>>> +         * have hange tracking enabled should be added to their
> table's
>>>>> +         * 'track_list'.
>>>>> +         */
>>>>> +        if (ovs_list_is_empty(&row->dst_arcs)
>>>>> +                || ovsdb_idl_track_is_set(row->table)) {
>>>>> +            ovsdb_idl_row_track_change(row, OVSDB_IDL_CHANGE_DELETE);
>>>>> +        }
>>>>> For R1, since "ovs_list_is_empty(&row->dst_arcs)" is true, it will be
>>> moved
>>>>> to track_list and finally get destroyed.
>>>> R1 is still referenced by R2 at this point in the example above, that
> is
>>>> there is an arc from R2 to R1.
>>> Ok, my mistake.
>>>>> But I think this "if" can be removed because deleted rows should
> always
>>> go
>>>>> to "track_list" so that they can be destroyed (assume not doing the
>>>>> refactoring I suggested earlier).
>>>> R1 should not be destroyed until R2 is destroyed too.  The complexity
>>>> here exists because for tables with change tracking enabled, the
> current
>>>> design is to always add deleted rows (even if they're still referenced)
>>>> to 'track_list'.  But, even for those, ovsdb_idl_track_clear() will not
>>>> destroy them as long as they're still referenced.
>>> I am confused again. According to the above code:
>>> 1) If "row" is not referenced, ovs_list_is_empty(&row->dst_arcs) is true
>>> and the row will be put to track_list.
>>> 2) If "row" is referenced
>>>    2.1)  if the table is tracked, ovsdb_idl_track_is_set(row->table) is
>>> true and the row will be put to track_list.
>>>    2.2)  if the table is not tracked, the row will NOT be put to
>>> track_list. The row will NOT be put to track_list.
>>> If I understand correctly, in the current implementation (and in the
>>> patch), everything in track_list will finally get released (either in
>>> post_processing or in track_clear).
>> Those rows (R1) are "orphan", their contents were deleted from the IDL
>> database view, but there still exist other non-orphan rows (R2) that
>> refer to them.
>>> For 2.2), since the row is NOT put to track_list, how will it be
> released?
>> When R2 is deleted, ovsdb_idl_delete_row(R2) is called which calls
>> ovsdb_idl_row_clear_arcs(R2, true).
> E.g.:
> In IDL iteration T1: R1 is deleted but R2 is not deleted, so during the
> call ovsdb_idl_row_reparse_backrefs(R1), it calls
> ovsdb_idl_row_clear_arcs(R2), which clears the arc R2 -> R1, so R1 is not
> referenced any more.

In ovsdb_idl_row_reparse_backrefs(R1), we also call
ovsdb_idl_row_parse(R2) after clearing arcs from R2 -> R1.

This ends up calling ovsdb_idl_get_row_arc(R2, R1.uuid) which will find
the orphaned R1 still in the hashtable and readd the arc, R2 -> R1.

This arc is needed for the situation when R1 is reinserted (e.g., at T3)
to also mark R2 as "modified" when that happens.

> In IDL iteration T2: R2 is deleted, but since the arc R2 -> R1 is lost,
> ovsdb_idl_row_clear_arcs(R2, true) will not find R1, and R1 is leaked.
> Did I misunderstand anything again?

As mentioned above, the R2 -> R1 arc still exists and
ovsdb_idl_row_clear_arcs(R2, true) will delete R1.

>> R2 has an arc to R1, and if R2 was the only row referring to R1 then
>> ovsdb_idl_row_clear_arcs(R2, true) calls ovsdb_idl_row_destroy(R1).
>> At this point ovs_list_is_empty(&R1->dst_arcs) == true so R1 is finally
>> removed from its table's hashtable, added to its table's 'track_list'
>> and will be completely freed in ovsdb_idl_row_destroy_postprocess().
>>> And for "But, even for those, ovsdb_idl_track_clear() will not destroy
> them
>>> as long as they're still referenced.", do you mean it is possible that a
>>> row is NOT destroyed even after the IDL iteration after user calling
>>> track_clear()? If yes, why is that necessary? And how is it achieved (in
>>> track_clear() it goes through each track_list and release all
> is_orphaned()
>>> rows, so how could it not get released)?
>> The reason is the same as above, there might still exist an R2 referring
> R1.
>> ovsdb_idl_track_clear(idl) calls ovsdb_idl_track_clear__(idl, false)
>> which means that rows that are on 'track_list' but still referenced will
>> not be freed because they're still in their table's hashtable:
>> https://github.com/openvswitch/ovs/blob/master/lib/ovsdb-idl.c#L1260
> Thanks to this pointer. However, I think R1 will be released as long as it
> is put to track_list. If R1 is deleted but R2 is not deleted, in
> ovsdb_idl_row_reparse_backrefs(R1), it calls ovsdb_idl_row_clear_arcs(R2),
> which clears the arc R2 -> R1, and so in ovsdb_idl_track_clear__() there is
> no reference to R1 and it will be released. Please correct me if I am
> wrong. (I think this behavior is desired, because the deleted R1 is not
> really needed after this IDL iteration after tracked change is accessed by
> user)

As before, R2 -> R1 gets recreated in ovsdb_idl_row_reparse_backrefs(R1)
and orphaned R1 is needed until either R2 is deleted or R1 is reinserted
(e.g., monitor condition change).

If it's OK with you, I can send a v4 incorporating the changes we
discussed until now (no refactoring though).  I'm also adding more test
cases.  Hopefully that makes the review easier.


>>>>> Before this check, ovsdb_idl_row_reparse_backrefs(row) is called to
> make
>>>>> sure the backrefs are cleared, so "ovs_list_is_empty(&row->dst_arcs)"
>>>>> should always be true, right?
>>>> No, even if there was an update to delete R1, R2 might still exist
>>>> (e.g., monitor condition change causes R1 to be deleted from the local
>>>> database view) and then ovsdb_idl_row_reparse_backrefs(R1) would create
>>>> an arc from R2 to R1.
>>>> I agree that the existing change tracking code is complex and can
>>>> potentially be simplified and/or refactored.  However, that's a big
>>>> effort and I think should not be in the scope of this bug fix.  I did
>>>> have to refactor some things but I limited myself to only changing
> parts
>>>> that are absolutely required to fix the bug.
>>> Sure, I agree that refactoring shouldn't be required in this patch. I am
>>> totally ok with it being merged without the refactoring I suggested, but
>>> regardless of the refactoring, could you still have my above concerns
>>> clarified? Thanks a lot for your patience!
>> Thanks for all the comments, they help question the implementation and
>> spot issues!
>>> Thanks,
>>> Han
>> Regards,
>> Dumitru
>>>> For the time being, without this fix ovn-controller is crashing because
>>>> of the inconsistent view of the database the IDL change tracking code
>>>> creates.
>>>> I think this should be fixed as soon as possible.  In my opinion all
>>>> other refactoring and hardening should be done as follow up.
>>>> Regards,
>>>> Dumitru
>>>>>>> Thanks,
>>>>>>> Han
>>>>>>>>> For the rest of the code... It seems fine, but it's really hard
>>>>>>>>> to review and I still have a feeling that something might go
>>>>>>>>> wrong (not more wrong than in current code, though), but I can't
>>>>>>>>> think of any example.  So, it's, probably, OK.
>>>>>> Thanks, I'm not sure how we can improve the stability/reliability of
>>>>>> this code without major refactoring/rewrite.  Until then though, the
>>>>>> fact that we break consistency of the IDL client's view of the
> database
>>>>>> is something that needs to be addressed because it's taken for
> granted
>>>>>> by ovn-controller in quite a few cases.
>>>>>>>>> Best regards, Ilya Maximets.
>>>>>> Regards,
>>>>>> Dumitru

More information about the dev mailing list