[ovs-dev] [PATCH v3 2/3] ovsdb-idl: Preserve references for deleted rows.

Han Zhou hzhou at ovn.org
Tue Mar 23 19:15:43 UTC 2021


On Tue, Mar 23, 2021 at 5:00 AM Dumitru Ceara <dceara at redhat.com> wrote:
>
> On 3/22/21 8:45 PM, Han Zhou wrote:
> > On Mon, Mar 22, 2021 at 4:27 AM Dumitru Ceara <dceara at redhat.com> wrote:
> >>
> >> On 3/19/21 6:55 PM, Han Zhou wrote:
> >>> On Fri, Mar 19, 2021 at 1:42 AM Dumitru Ceara <dceara at redhat.com>
wrote:
> >>>>
> >>>> On 3/19/21 2:41 AM, Han Zhou wrote:
> >>>>>
> >>>>>
> >>>>> On Thu, Mar 18, 2021 at 6:27 PM Han Zhou <hzhou at ovn.org
> >>>>> <mailto:hzhou at ovn.org>> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Mar 18, 2021 at 11:07 AM Ilya Maximets <i.maximets at ovn.org
> >>>>> <mailto:i.maximets at ovn.org>> wrote:
> >>>>>>>
> >>>>>>> On 3/12/21 1:07 PM, Dumitru Ceara wrote:
> >>>>>>>> Considering two DB rows, 'a' from table A and 'b' from table B
> >>> (with
> >>>>>>>> column 'ref_a' a reference to table A):
> >>>>>>>> a = {A._uuid=<U1>}
> >>>>>>>> b = {B._uuid=<U2>, B.ref_a=<U1>}
> >>>>>>>>
> >>>>>>>> Assuming both records are present in the IDL client's in-memory
> >>>>> view of
> >>>>>>>> the database, depending whether row 'b' is also deleted in the
same
> >>>>>>>> transaction or not, deletion of row 'a' should generate the
> >>> following
> >>>>>>>> tracked changes:
> >>>>>>>>
> >>>>>>>> 1. only row 'a' is deleted:
> >>>>>>>> - for table A:
> >>>>>>>>   - deleted records: a = {A._uuid=<U1>}
> >>>>>>>> - for table B:
> >>>>>>>>   - updated records: b = {B._uuid=<U2>, B.ref_a=[]}
> >>>>>>>>
> >>>>>>>> 2. row 'a' and row 'b' are deleted in the same update:
> >>>>>>>> - for table A:
> >>>>>>>>   - deleted records: a = {A._uuid=<U1>}
> >>>>>>>> - for table B:
> >>>>>>>>   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
> >>>>>>>>
> >>>>>>>> To ensure this, we now delay reparsing row backrefs for deleted
> >>> rows
> >>>>>>>> until all updates in the current run have been processed.
> >>>>>>>>
> >>>>>>>> Without this change, in scenario 2 above, the tracked changes for
> >>>>> table
> >>>>>>>> B would be:
> >>>>>>>> - deleted records: b = {B._uuid=<U2>, B.ref_a=[]}
> >>>>>>>>
> >>>>>>>> In particular, for strong references, row 'a' can never be
deleted
> >>> in
> >>>>>>>> a transaction that happens strictly before row 'b' is deleted.
In
> >>>>> some
> >>>>>>>> cases [0] both rows are deleted in the same transaction and
having
> >>>>>>>> B.ref_a=[] would violate the integrity of the database from
client
> >>>>>>>> perspective.  This would force the client to always validate that
> >>>>>>>> strong reference fields are non-NULL.  This is not really an
option
> >>>>>>>> because the information in the original reference is required for
> >>>>>>>> incrementally processing the record deletion.
> >>>>>>>>
> >>>>>>>> [0] with ovn-monitor-all=true, the following command triggers a
> >>> crash
> >>>>>>>>     in ovn-controller because a strong reference field becomes
> >>> NULL:
> >>>>>>>>     $ ovn-nbctl --wait=hv -- lr-add r -- lrp-add r rp
> >>>>> 00:00:00:00:00:01 1.0.0.1/24 <http://1.0.0.1/24>
> >>>>>>>>     $ ovn-nbctl lr-del r
> >>>>>>>>
> >>>>>>>> Reported-at: https://bugzilla.redhat.com/1932642
> >>>>> <https://bugzilla.redhat.com/1932642>
> >>>>>>>> Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for
> >>>>> deleted rows.")
> >>>>>>>> Signed-off-by: Dumitru Ceara <dceara at redhat.com
> >>>>> <mailto:dceara at redhat.com>>
> >>>>>>>> ---
> >>>>>>>>  lib/ovsdb-idl.c     |  135 ++++++++++++++++++++++++++--------
> >>>>>>>>  tests/ovsdb-idl.at <http://ovsdb-idl.at>  |  203
> >>>>> +++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>>>  tests/test-ovsdb.c  |   50 +++++++++++++
> >>>>>>>>  tests/test-ovsdb.py |   14 ++++
> >>>>>>>>  4 files changed, 370 insertions(+), 32 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
> >>>>>>>> index 9e1e787..1542da9 100644
> >>>>>>>> --- a/lib/ovsdb-idl.c
> >>>>>>>> +++ b/lib/ovsdb-idl.c
> >>>>>>>> @@ -92,6 +92,7 @@ struct ovsdb_idl {
> >>>>>>>>      struct ovsdb_idl_txn *txn;
> >>>>>>>>      struct hmap outstanding_txns;
> >>>>>>>>      bool verify_write_only;
> >>>>>>>> +    struct ovs_list orphan_rows; /* All deleted but still
> >>>>> referenced rows. */
> >>>>>>>
> >>>>>>> The name here is a bit confusing.  I mean it's not a list
> >>>>>>> of orphan rows, at least it's not a list of all the orphan
> >>>>>>> rows.  So, we should, probably, rename it to something like
> >>>>>>> 'deleted_rows' instead and also rename all related functions.
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>>>
> >>>>>>
> >>>>>> I agree with Ilya that the "orphan_rows" is confusing. However,
> >>>>> "deleted_rows" may also be confusing, because the current code also
> >>>>> maintains deleted rows in the track_list of each table (no matter if
> >>>>> tracking is enabled or not). If tracking is enabled, the rows are
> >>>>> destroyed (for real) when ovsdb_idl_track_clear() is called; if not,
> > it
> >>>>> is destroyed in ovsdb_idl_row_destroy_postprocess(). The part of
logic
> >>>>> was really confusing. This patch changes that behavior further by
> >>>>> introducing this orphan_rows list as a temporary place (for some
> >>>>> situations) before moving to the track_list. Since this patch
already
> >>>>> did some refactoring (which is great), I'd suggest doing a little
more
> >>>>> to avoid the above confusion: for deleted rows, shall we just put
the
> >>>>> untracked rows in a "deleted_untracked_rows" list (i.e. rename the
> >>>>> "orphan_rows" of this patch), and let the "track_list" only hold the
> >>>>> tracked ones, and update the ovsdb_idl_row_destroy_postprocess() to
> >>>>> destroy deleted_untracked_rows without worrying the track_list.
> >>>>> track_list should exist only if tracking is enabled, and should be
> > taken
> >>>>> care by ovsdb_idl_track_clear().
> >>>>
> >>>> Thanks, Ilya and Han, for the review and suggestions.
> >>>>
> >>>> Renaming 'orphan_rows' definitely makes sense, I'll do that.
However,
> >>>> regarding using a different list for tables without change tracking
I'm
> >>>> not so sure if that's OK.  Even if the table doesn't have change
> >>>> tracking its deleted rows should still first go to the "temporary"
> >>>> orphan_rows list.
> >>>>
> >>>> For example:
> >>>>
> >>>> Table T1 (no change tracking):
> >>>> - Row a = {A.uuid=<U1>}
> >>>>
> >>>> Table T2 (change tracking enabled):
> >>>> - Row b = {B.uuid=<U2>, B.ref_to_A=<U1>}
> >>>>
> >>>> If in a transaction both rows 'a' and 'b' are deleted, and updates
are
> >>>> received in this order, if we wouldn't add 'a' to 'orphan_rows' then
> >>>> we'd be reparsing backrefs for 'a' immediately leading to:
> >>>>
> >>>> Row b (updated) = {B.uuid=<U2>, B.ref_to_A=None}
> >>>>
> >>>> Then the update to delete 'b' is processed:
> >>>>
> >>>> Row b (deleted) = {B.uuid=<U2>, B.ref_to_A=None}
> >>>>
> >>>> The above is wrong and breaks consistency if ref_to_A is a strong
> >>> reference.
> >>>>
> >>> Sorry that I didn't make it clear enough. The "deleted_untracked_rows"
> > list
> >>> in my suggestion is just a renaming of your "orphan_rows". The real
> > change
> >>> I suggested was that let's don't use it as a temporary place to hold
> > these
> >>> rows. We can make it clear that deleted_untracked_rows is for deleted
> >>> untracked rows and the "track_list" of each table contains only
tracked
> >>> rows.
> >>
> >> I think there's some confusion: 'orphan_rows' doesn't contain deleted
> >> untracked rows.  What it actually stores is "rows deleted in the
current
> >> IDL run regardless of change tracking enabled or not".
> >>
> > I think we are saying the same thing from different perspectives. For
> > "deleted untracked rows" I am not talking about whether change tracking
is
> > enabled or not, but about whether the deleted row is in "track_list" or
> > not. In this patch, a deleted row is in either "track_list" (what I
called
> > "tracked") or "orphan_rows" (what I called "deleted and untracked"). You
> > always call "ovsdb_idl_row_untrack_change" before putting a row to
> > "orphan_rows", so I think it would be clear to rename "orphan_rows" as
> > "deleted_untracked_rows".
>
> Oh, I see now, it was a misunderstanding indeed.  I thought we were
> referring to change tracking being enabled for the table or not.  I can
> change the name to "deleted_untracked_rows" in the next revision.
>
> >
> >>>
> >>> In the above example, row a (without change tracking) will be put to
> >>> deleted_untracked_rows, but no need to move to "track_list" later.
> >>>
> >>> In ovsdb_idl_row_destroy_postprocess, we can destroy
> >>> deleted_untracked_rows, and no need to touch "track_list".
> >>>
> >>> Would this be more clear?
> >>
> >> The problem is with deleted rows that are part of tables with change
> >> tracking enabled.  Taking the example from the commit log:
> >>
> >> Considering two DB rows, 'a' from table A and 'b' from table B (with
> >> column 'ref_a' a reference to table A), both tables with change
tracking
> >> enabled:
> >> a = {A._uuid=<U1>}
> >> b = {B._uuid=<U2>, B.ref_a=<U1>}
> >>
> >> If row 'a' and row 'b' are deleted in the same update then the IDL user
> >> should be notified of the following events.
> >> - for table A:
> >>   - deleted records: a = {A._uuid=<U1>}
> >> - for table B:
> >>   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
> >>
> >> To achieve this we need to make sure all "delete" updates are processed
> >> before reparsing any backrefs.  This allows to check in
> >> ovsdb_idl_row_reparse_backrefs() if an arc's source (B in the example
> >> above) is also deleted in the current run, and skip unparsing B.ref_a
so
> >> that that the IDL user (e.g., ovn-controller) can still access it.
> >>
> >
> > In fact I didn't see the check you mentioned above. What's in the code
is:
> > +        /* Skip reparsing deleted refs.  If ref's table has change
tracking
> > +         * enabled, the users need the original information (and that
will
> > be
> > +         * cleaned up by ovsdb_idl_track_clear()).  Otherwise, 'row' is
> > +         * already orphan and is not accessible to the IDL user and
will be
> > +         * cleaned up at the end of the current IDL run.
> > +         */
> > +        if (ovsdb_idl_row_get_seqno(row, OVSDB_IDL_CHANGE_DELETE)) {
> > +            continue;
> > +        }
> > +
> >
> > 1) "row" is "a" in the above example, and "ref" is "b". It seems you
should
> > check if "ref" is deleted?
>
> You're right, this should be "ref" instead of "row, thanks for spotting
> this!  I'll add a test case to exercise this scenario to avoid
> regressions in the future.
>
> Also, looking more at the code in my patch, OVSDB_IDL_CHANGE_DELETE will
> never be set for rows that are on the "orphan_rows" list, so this
> condition would never be true.
>
> I'll rework this part and make sure that OVSDB_IDL_CHANGE_DELETE is
> always set for rows that are on the "orphan_rows" list.
>
> >
> > 2) The comments says "if ref's table has change tracking enabled", but
in
> > fact the above code didn't check if change tracking is enabled or not.
> > Should we check it?
>
> I can try to rephrase the comment.  What I meant was:
>
> "Skip reparsing deleted refs.  This is OK because either
> - ref's table has change tracking enabled and the users need the
>   original datum in 'ref', which will be cleaned up by
>   ovsdb_idl_track_clear().
> OR
> - ref's table has change tracking disabled, 'ref' is deleted so its
>   datum will be cleaned up in ovsdb_idl_row_destroy_postprocess()."
>

For deleted refs, e.g. R2 reference R1, and both R1 and R2 are deleted,
then during the call ovsdb_idl_delete_row(R2), the arc R2 -> R1 is already
deleted, so here we won't even see R2 in ovsdb_idl_row_reparse_backref(R1),
right? So the above check is useless?

> >
> > On the other hand, I understand the problem you are trying to solve,
but I
> > don't see why it is related to my comment regarding clearly
distinguishing
> > deleted_untracked_rows v.s. track_list.
>
> It's not, with your explanation about deleted_untracked_rows above it's
> now clear to me.
>
> >
> >>>
> >>>>>>
> >>>>>> What do you think?
> >>>>>>
> >>>>>
> >>>>> I took a second look at the patch, and it seems when tracking is not
> >>>>> enabled it would result in the rows in "orphan_rows" leaked, because
> > in
> >>>>> ovsdb_idl_process_orphans() the rows are moved to "track_list" ONLY
IF
> >>>>> tracking is enabled for the table. If I am correct, would the above
> >>>>> suggested refactoring help avoiding such kinds of problems?
> >>>>
> >>>> Actually, there's no leak AFAICT.  ovsdb_idl_process_orphans() moves
> >>>> rows to 'track_list' in two scenarios:
> >>>> a. they're not referenced anymore (ovs_list_is_empty(&row->dst_arcs))
> >>>> b. they're in a table with change tracking enabled.
> >>>>
> >>>> If the table doesn't have change tracking enabled and the row is not
> >>>> referenced anymore it will be added to 'track_list' and will be
cleaned
> >>>> up properly by ovsdb_idl_row_destroy_postprocess() (case "a" above).
> >>>>
> >>>> If the table doesn't have change tracking enabled but the row (let's
> >>>> call it 'R1') is still referenced (i.e., it became "orphan") then we
> >>>> shouldn't delete it yet.  When, in the future, the rows referring to
> >>>> this orphan row are deleted/updated (let's call them 'R2'),
> >>>> ovsdb_idl_row_clear_arcs(R2, true) is called and the orphan row R1 is
> >>>> now properly deleted.
> >>>>
> >>>
> >>> Ok, it seems no leak at all. I think I was confused by this check:
> >>> +        /* Orphan rows that are still unreferenced or are part of
> > tables
> >>> that
> >>> +         * have hange tracking enabled should be added to their
table's
> >>> +         * 'track_list'.
> >>> +         */
> >>> +        if (ovs_list_is_empty(&row->dst_arcs)
> >>> +                || ovsdb_idl_track_is_set(row->table)) {
> >>> +            ovsdb_idl_row_track_change(row, OVSDB_IDL_CHANGE_DELETE);
> >>> +        }
> >>>
> >>> For R1, since "ovs_list_is_empty(&row->dst_arcs)" is true, it will be
> > moved
> >>> to track_list and finally get destroyed.
> >>
> >> R1 is still referenced by R2 at this point in the example above, that
is
> >> there is an arc from R2 to R1.
> >
> > Ok, my mistake.
> >
> >>
> >>> But I think this "if" can be removed because deleted rows should
always
> > go
> >>> to "track_list" so that they can be destroyed (assume not doing the
> >>> refactoring I suggested earlier).
> >>
> >> R1 should not be destroyed until R2 is destroyed too.  The complexity
> >> here exists because for tables with change tracking enabled, the
current
> >> design is to always add deleted rows (even if they're still referenced)
> >> to 'track_list'.  But, even for those, ovsdb_idl_track_clear() will not
> >> destroy them as long as they're still referenced.
> >
> > I am confused again. According to the above code:
> > 1) If "row" is not referenced, ovs_list_is_empty(&row->dst_arcs) is true
> > and the row will be put to track_list.
> > 2) If "row" is referenced
> >    2.1)  if the table is tracked, ovsdb_idl_track_is_set(row->table) is
> > true and the row will be put to track_list.
> >    2.2)  if the table is not tracked, the row will NOT be put to
> > track_list. The row will NOT be put to track_list.
> >
> > If I understand correctly, in the current implementation (and in the
> > patch), everything in track_list will finally get released (either in
> > post_processing or in track_clear).
>
> Those rows (R1) are "orphan", their contents were deleted from the IDL
> database view, but there still exist other non-orphan rows (R2) that
> refer to them.
>
> > For 2.2), since the row is NOT put to track_list, how will it be
released?
>
> When R2 is deleted, ovsdb_idl_delete_row(R2) is called which calls
> ovsdb_idl_row_clear_arcs(R2, true).

E.g.:
In IDL iteration T1: R1 is deleted but R2 is not deleted, so during the
call ovsdb_idl_row_reparse_backrefs(R1), it calls
ovsdb_idl_row_clear_arcs(R2), which clears the arc R2 -> R1, so R1 is not
referenced any more.
In IDL iteration T2: R2 is deleted, but since the arc R2 -> R1 is lost,
ovsdb_idl_row_clear_arcs(R2, true) will not find R1, and R1 is leaked.
Did I misunderstand anything again?

>
> R2 has an arc to R1, and if R2 was the only row referring to R1 then
> ovsdb_idl_row_clear_arcs(R2, true) calls ovsdb_idl_row_destroy(R1).
>
> At this point ovs_list_is_empty(&R1->dst_arcs) == true so R1 is finally
> removed from its table's hashtable, added to its table's 'track_list'
> and will be completely freed in ovsdb_idl_row_destroy_postprocess().
>
> >
> > And for "But, even for those, ovsdb_idl_track_clear() will not destroy
them
> > as long as they're still referenced.", do you mean it is possible that a
> > row is NOT destroyed even after the IDL iteration after user calling
> > track_clear()? If yes, why is that necessary? And how is it achieved (in
> > track_clear() it goes through each track_list and release all
is_orphaned()
> > rows, so how could it not get released)?
>
> The reason is the same as above, there might still exist an R2 referring
R1.
>
> ovsdb_idl_track_clear(idl) calls ovsdb_idl_track_clear__(idl, false)
> which means that rows that are on 'track_list' but still referenced will
> not be freed because they're still in their table's hashtable:
>
> https://github.com/openvswitch/ovs/blob/master/lib/ovsdb-idl.c#L1260
>

Thanks to this pointer. However, I think R1 will be released as long as it
is put to track_list. If R1 is deleted but R2 is not deleted, in
ovsdb_idl_row_reparse_backrefs(R1), it calls ovsdb_idl_row_clear_arcs(R2),
which clears the arc R2 -> R1, and so in ovsdb_idl_track_clear__() there is
no reference to R1 and it will be released. Please correct me if I am
wrong. (I think this behavior is desired, because the deleted R1 is not
really needed after this IDL iteration after tracked change is accessed by
user)

> >
> >>
> >>> Before this check, ovsdb_idl_row_reparse_backrefs(row) is called to
make
> >>> sure the backrefs are cleared, so "ovs_list_is_empty(&row->dst_arcs)"
> >>> should always be true, right?
> >>
> >> No, even if there was an update to delete R1, R2 might still exist
> >> (e.g., monitor condition change causes R1 to be deleted from the local
> >> database view) and then ovsdb_idl_row_reparse_backrefs(R1) would create
> >> an arc from R2 to R1.
> >>
> >> I agree that the existing change tracking code is complex and can
> >> potentially be simplified and/or refactored.  However, that's a big
> >> effort and I think should not be in the scope of this bug fix.  I did
> >> have to refactor some things but I limited myself to only changing
parts
> >> that are absolutely required to fix the bug.
> >>
> > Sure, I agree that refactoring shouldn't be required in this patch. I am
> > totally ok with it being merged without the refactoring I suggested, but
> > regardless of the refactoring, could you still have my above concerns
> > clarified? Thanks a lot for your patience!
>
> Thanks for all the comments, they help question the implementation and
> spot issues!
>
> >
> > Thanks,
> > Han
>
> Regards,
> Dumitru
>
> >
> >> For the time being, without this fix ovn-controller is crashing because
> >> of the inconsistent view of the database the IDL change tracking code
> >> creates.
> >>
> >> I think this should be fixed as soon as possible.  In my opinion all
> >> other refactoring and hardening should be done as follow up.
> >>
> >> Regards,
> >> Dumitru
> >>
> >>>
> >>>>>
> >>>>> Thanks,
> >>>>> Han
> >>>>>
> >>>>>>>
> >>>>>>> For the rest of the code... It seems fine, but it's really hard
> >>>>>>> to review and I still have a feeling that something might go
> >>>>>>> wrong (not more wrong than in current code, though), but I can't
> >>>>>>> think of any example.  So, it's, probably, OK.
> >>>>
> >>>> Thanks, I'm not sure how we can improve the stability/reliability of
> >>>> this code without major refactoring/rewrite.  Until then though, the
> >>>> fact that we break consistency of the IDL client's view of the
database
> >>>> is something that needs to be addressed because it's taken for
granted
> >>>> by ovn-controller in quite a few cases.
> >>>>
> >>>>>>>
> >>>>>>> Best regards, Ilya Maximets.
> >>>>
> >>>> Regards,
> >>>> Dumitru
> >>>>
> >>>
> >>
> >
>


More information about the dev mailing list