[ovs-dev] [PATCH v3 2/3] ovsdb-idl: Preserve references for deleted rows.

Dumitru Ceara dceara at redhat.com
Mon Mar 22 11:27:34 UTC 2021

On 3/19/21 6:55 PM, Han Zhou wrote:
> On Fri, Mar 19, 2021 at 1:42 AM Dumitru Ceara <dceara at redhat.com> wrote:
>> On 3/19/21 2:41 AM, Han Zhou wrote:
>>> On Thu, Mar 18, 2021 at 6:27 PM Han Zhou <hzhou at ovn.org
>>> <mailto:hzhou at ovn.org>> wrote:
>>>> On Thu, Mar 18, 2021 at 11:07 AM Ilya Maximets <i.maximets at ovn.org
>>> <mailto:i.maximets at ovn.org>> wrote:
>>>>> On 3/12/21 1:07 PM, Dumitru Ceara wrote:
>>>>>> Considering two DB rows, 'a' from table A and 'b' from table B
> (with
>>>>>> column 'ref_a' a reference to table A):
>>>>>> a = {A._uuid=<U1>}
>>>>>> b = {B._uuid=<U2>, B.ref_a=<U1>}
>>>>>> Assuming both records are present in the IDL client's in-memory
>>> view of
>>>>>> the database, depending whether row 'b' is also deleted in the same
>>>>>> transaction or not, deletion of row 'a' should generate the
> following
>>>>>> tracked changes:
>>>>>> 1. only row 'a' is deleted:
>>>>>> - for table A:
>>>>>>   - deleted records: a = {A._uuid=<U1>}
>>>>>> - for table B:
>>>>>>   - updated records: b = {B._uuid=<U2>, B.ref_a=[]}
>>>>>> 2. row 'a' and row 'b' are deleted in the same update:
>>>>>> - for table A:
>>>>>>   - deleted records: a = {A._uuid=<U1>}
>>>>>> - for table B:
>>>>>>   - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}
>>>>>> To ensure this, we now delay reparsing row backrefs for deleted
> rows
>>>>>> until all updates in the current run have been processed.
>>>>>> Without this change, in scenario 2 above, the tracked changes for
>>> table
>>>>>> B would be:
>>>>>> - deleted records: b = {B._uuid=<U2>, B.ref_a=[]}
>>>>>> In particular, for strong references, row 'a' can never be deleted
> in
>>>>>> a transaction that happens strictly before row 'b' is deleted.  In
>>> some
>>>>>> cases [0] both rows are deleted in the same transaction and having
>>>>>> B.ref_a=[] would violate the integrity of the database from client
>>>>>> perspective.  This would force the client to always validate that
>>>>>> strong reference fields are non-NULL.  This is not really an option
>>>>>> because the information in the original reference is required for
>>>>>> incrementally processing the record deletion.
>>>>>> [0] with ovn-monitor-all=true, the following command triggers a
> crash
>>>>>>     in ovn-controller because a strong reference field becomes
>>>>>>     $ ovn-nbctl --wait=hv -- lr-add r -- lrp-add r rp
>>> 00:00:00:00:00:01 <>
>>>>>>     $ ovn-nbctl lr-del r
>>>>>> Reported-at: https://bugzilla.redhat.com/1932642
>>> <https://bugzilla.redhat.com/1932642>
>>>>>> Fixes: 72aeb243a52a ("ovsdb-idl: Tracking - preserve data for
>>> deleted rows.")
>>>>>> Signed-off-by: Dumitru Ceara <dceara at redhat.com
>>> <mailto:dceara at redhat.com>>
>>>>>> ---
>>>>>>  lib/ovsdb-idl.c     |  135 ++++++++++++++++++++++++++--------
>>>>>>  tests/ovsdb-idl.at <http://ovsdb-idl.at>  |  203
>>> +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>  tests/test-ovsdb.c  |   50 +++++++++++++
>>>>>>  tests/test-ovsdb.py |   14 ++++
>>>>>>  4 files changed, 370 insertions(+), 32 deletions(-)
>>>>>> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
>>>>>> index 9e1e787..1542da9 100644
>>>>>> --- a/lib/ovsdb-idl.c
>>>>>> +++ b/lib/ovsdb-idl.c
>>>>>> @@ -92,6 +92,7 @@ struct ovsdb_idl {
>>>>>>      struct ovsdb_idl_txn *txn;
>>>>>>      struct hmap outstanding_txns;
>>>>>>      bool verify_write_only;
>>>>>> +    struct ovs_list orphan_rows; /* All deleted but still
>>> referenced rows. */
>>>>> The name here is a bit confusing.  I mean it's not a list
>>>>> of orphan rows, at least it's not a list of all the orphan
>>>>> rows.  So, we should, probably, rename it to something like
>>>>> 'deleted_rows' instead and also rename all related functions.
>>>>> What do you think?
>>>> I agree with Ilya that the "orphan_rows" is confusing. However,
>>> "deleted_rows" may also be confusing, because the current code also
>>> maintains deleted rows in the track_list of each table (no matter if
>>> tracking is enabled or not). If tracking is enabled, the rows are
>>> destroyed (for real) when ovsdb_idl_track_clear() is called; if not, it
>>> is destroyed in ovsdb_idl_row_destroy_postprocess(). The part of logic
>>> was really confusing. This patch changes that behavior further by
>>> introducing this orphan_rows list as a temporary place (for some
>>> situations) before moving to the track_list. Since this patch already
>>> did some refactoring (which is great), I'd suggest doing a little more
>>> to avoid the above confusion: for deleted rows, shall we just put the
>>> untracked rows in a "deleted_untracked_rows" list (i.e. rename the
>>> "orphan_rows" of this patch), and let the "track_list" only hold the
>>> tracked ones, and update the ovsdb_idl_row_destroy_postprocess() to
>>> destroy deleted_untracked_rows without worrying the track_list.
>>> track_list should exist only if tracking is enabled, and should be taken
>>> care by ovsdb_idl_track_clear().
>> Thanks, Ilya and Han, for the review and suggestions.
>> Renaming 'orphan_rows' definitely makes sense, I'll do that.  However,
>> regarding using a different list for tables without change tracking I'm
>> not so sure if that's OK.  Even if the table doesn't have change
>> tracking its deleted rows should still first go to the "temporary"
>> orphan_rows list.
>> For example:
>> Table T1 (no change tracking):
>> - Row a = {A.uuid=<U1>}
>> Table T2 (change tracking enabled):
>> - Row b = {B.uuid=<U2>, B.ref_to_A=<U1>}
>> If in a transaction both rows 'a' and 'b' are deleted, and updates are
>> received in this order, if we wouldn't add 'a' to 'orphan_rows' then
>> we'd be reparsing backrefs for 'a' immediately leading to:
>> Row b (updated) = {B.uuid=<U2>, B.ref_to_A=None}
>> Then the update to delete 'b' is processed:
>> Row b (deleted) = {B.uuid=<U2>, B.ref_to_A=None}
>> The above is wrong and breaks consistency if ref_to_A is a strong
> reference.
> Sorry that I didn't make it clear enough. The "deleted_untracked_rows" list
> in my suggestion is just a renaming of your "orphan_rows". The real change
> I suggested was that let's don't use it as a temporary place to hold these
> rows. We can make it clear that deleted_untracked_rows is for deleted
> untracked rows and the "track_list" of each table contains only tracked
> rows.

I think there's some confusion: 'orphan_rows' doesn't contain deleted
untracked rows.  What it actually stores is "rows deleted in the current
IDL run regardless of change tracking enabled or not".

> In the above example, row a (without change tracking) will be put to
> deleted_untracked_rows, but no need to move to "track_list" later.
> In ovsdb_idl_row_destroy_postprocess, we can destroy
> deleted_untracked_rows, and no need to touch "track_list".
> Would this be more clear?

The problem is with deleted rows that are part of tables with change
tracking enabled.  Taking the example from the commit log:

Considering two DB rows, 'a' from table A and 'b' from table B (with
column 'ref_a' a reference to table A), both tables with change tracking
a = {A._uuid=<U1>}
b = {B._uuid=<U2>, B.ref_a=<U1>}

If row 'a' and row 'b' are deleted in the same update then the IDL user
should be notified of the following events.
- for table A:
  - deleted records: a = {A._uuid=<U1>}
- for table B:
  - deleted records: b = {B._uuid=<U2>, B.ref_a=<U1>}

To achieve this we need to make sure all "delete" updates are processed
before reparsing any backrefs.  This allows to check in
ovsdb_idl_row_reparse_backrefs() if an arc's source (B in the example
above) is also deleted in the current run, and skip unparsing B.ref_a so
that that the IDL user (e.g., ovn-controller) can still access it.

>>>> What do you think?
>>> I took a second look at the patch, and it seems when tracking is not
>>> enabled it would result in the rows in "orphan_rows" leaked, because in
>>> ovsdb_idl_process_orphans() the rows are moved to "track_list" ONLY IF
>>> tracking is enabled for the table. If I am correct, would the above
>>> suggested refactoring help avoiding such kinds of problems?
>> Actually, there's no leak AFAICT.  ovsdb_idl_process_orphans() moves
>> rows to 'track_list' in two scenarios:
>> a. they're not referenced anymore (ovs_list_is_empty(&row->dst_arcs))
>> b. they're in a table with change tracking enabled.
>> If the table doesn't have change tracking enabled and the row is not
>> referenced anymore it will be added to 'track_list' and will be cleaned
>> up properly by ovsdb_idl_row_destroy_postprocess() (case "a" above).
>> If the table doesn't have change tracking enabled but the row (let's
>> call it 'R1') is still referenced (i.e., it became "orphan") then we
>> shouldn't delete it yet.  When, in the future, the rows referring to
>> this orphan row are deleted/updated (let's call them 'R2'),
>> ovsdb_idl_row_clear_arcs(R2, true) is called and the orphan row R1 is
>> now properly deleted.
> Ok, it seems no leak at all. I think I was confused by this check:
> +        /* Orphan rows that are still unreferenced or are part of tables
> that
> +         * have hange tracking enabled should be added to their table's
> +         * 'track_list'.
> +         */
> +        if (ovs_list_is_empty(&row->dst_arcs)
> +                || ovsdb_idl_track_is_set(row->table)) {
> +            ovsdb_idl_row_track_change(row, OVSDB_IDL_CHANGE_DELETE);
> +        }
> For R1, since "ovs_list_is_empty(&row->dst_arcs)" is true, it will be moved
> to track_list and finally get destroyed.

R1 is still referenced by R2 at this point in the example above, that is
there is an arc from R2 to R1.

> But I think this "if" can be removed because deleted rows should always go
> to "track_list" so that they can be destroyed (assume not doing the
> refactoring I suggested earlier).

R1 should not be destroyed until R2 is destroyed too.  The complexity
here exists because for tables with change tracking enabled, the current
design is to always add deleted rows (even if they're still referenced)
to 'track_list'.  But, even for those, ovsdb_idl_track_clear() will not
destroy them as long as they're still referenced.

> Before this check, ovsdb_idl_row_reparse_backrefs(row) is called to make
> sure the backrefs are cleared, so "ovs_list_is_empty(&row->dst_arcs)"
> should always be true, right?

No, even if there was an update to delete R1, R2 might still exist
(e.g., monitor condition change causes R1 to be deleted from the local
database view) and then ovsdb_idl_row_reparse_backrefs(R1) would create
an arc from R2 to R1.

I agree that the existing change tracking code is complex and can
potentially be simplified and/or refactored.  However, that's a big
effort and I think should not be in the scope of this bug fix.  I did
have to refactor some things but I limited myself to only changing parts
that are absolutely required to fix the bug.

For the time being, without this fix ovn-controller is crashing because
of the inconsistent view of the database the IDL change tracking code

I think this should be fixed as soon as possible.  In my opinion all
other refactoring and hardening should be done as follow up.


>>> Thanks,
>>> Han
>>>>> For the rest of the code... It seems fine, but it's really hard
>>>>> to review and I still have a feeling that something might go
>>>>> wrong (not more wrong than in current code, though), but I can't
>>>>> think of any example.  So, it's, probably, OK.
>> Thanks, I'm not sure how we can improve the stability/reliability of
>> this code without major refactoring/rewrite.  Until then though, the
>> fact that we break consistency of the IDL client's view of the database
>> is something that needs to be addressed because it's taken for granted
>> by ovn-controller in quite a few cases.
>>>>> Best regards, Ilya Maximets.
>> Regards,
>> Dumitru

More information about the dev mailing list