[ovs-dev] [PATCH ovn 0/2] Make ovn-northd recover from NB/SB inconsistencies.

Numan Siddique numans at ovn.org
Wed Apr 29 19:17:23 UTC 2020


On Wed, Apr 29, 2020 at 9:57 PM Dumitru Ceara <dceara at redhat.com> wrote:

> In some cases, if the NB/SB databases ovn-northd connects to are
> inconsistent, ovn-northd might generate transactions that fail
> continuously due to failed integrity checks on the SB database server.
>
> The first patch of the series addresses inconsistencies due to stale
> Datapath_Binding records in the SB database.
>
> The second patch of the series addresses inconsistencies due to stale
> tunnel_key values in various SB database table records.
>
> Reported-by: Dan Williams <dcbw at redhat.com>
> Reported-at: https://bugzilla.redhat.com/1828637
> Signed-off-by: Dumitru Ceara <dceara at redhat.com>
>
> Dumitru Ceara (2):
>       ovn-northd: Clear SB records depending on stale datapaths.
>       ovn-northd: Fix tunnel_key allocation for SB records.
>

Hi Dumitru,

I did some testing in my ovn-fake-multinode setup. These are my
observations.

I created a logical switch sw0 with 4 logical ports, so the next tunnel key
should be 5.
I stopped ovn-northd and manually created a couple of port_binding entries
with tunnel keys 5 and 6 using "ovn-sbctl create port_binding".
I also created a logical port in sw0 and then started ovn-northd. ovn-northd
deleted the port binding entries I had added and created the port_binding
entry for the new logical port with tunnel_key=5, all in the same
transaction.

I think ovn-northd syncs the southbound DB based on the contents of the
northbound DB.
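
To make that observation concrete, here is a minimal toy model of the sync
I saw. It is only a sketch and not the ovn-northd code: the function name,
the port names and the "lowest free key" allocation rule are all assumptions
made for illustration.

```python
# Toy model only: not the ovn-northd code. All names are hypothetical.

def sync_port_bindings(nb_ports, sb_bindings):
    """Return (to_delete, to_create) for one datapath.

    nb_ports:    set of logical port names present in the NB DB.
    sb_bindings: dict of logical port name -> tunnel_key in the SB DB.
    """
    # SB rows with no matching NB port are stale and get deleted.
    to_delete = sorted(p for p in sb_bindings if p not in nb_ports)

    # Keys held by rows that survive the sync stay reserved.
    used = {k for p, k in sb_bindings.items() if p in nb_ports}

    # NB ports without an SB row get the lowest key that is still free.
    to_create = {}
    next_key = 1
    for port in sorted(nb_ports - set(sb_bindings)):
        while next_key in used:
            next_key += 1
        to_create[port] = next_key
        used.add(next_key)
    return to_delete, to_create


# Four existing ports, one new NB port, two manually added SB rows.
nb = {"sw0-p1", "sw0-p2", "sw0-p3", "sw0-p4", "sw0-p5"}
sb = {"sw0-p1": 1, "sw0-p2": 2, "sw0-p3": 3, "sw0-p4": 4,
      "manual-a": 5, "manual-b": 6}
to_delete, to_create = sync_port_bindings(nb, sb)
print(to_delete, to_create)
```

In this model the two manual rows are deleted and the new port gets key 5
in the same step, which matches what I saw in the test.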

There's no harm in having your patches, but I'm not sure whether they
resolve the issue we have observed.

To brief everyone on the issue we are seeing, ovn-northd logs the error
below.

*******
2020-04-16T23:02:33Z|00127|ovsdb_idl|WARN|transaction error:
{"details":"Transaction causes multiple rows in \"Port_Binding\" table to
have identical values (23eb9016-45f9-4158-be35-77b2713b9a0f and 7) for
index on columns \"datapath\" and \"tunnel_key\".  First row, with UUID
e4f11a7b-09b6-454f-a125-34cc4b144ef6, had the following index values before
the transaction: bdbb436e-f98c-4651-9b80-6e8b95044560 and 7.  Second row,
with UUID d37cc3f1-8633-440f-b145-8222a0d4723c, existed in the database
before this transaction and was not modified by the
transaction.","error":"constraint violation"}
******

And because of this constraint violation error, ovn-northd cannot further
write to the sb db until it is restarted.
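
For anyone not familiar with the index check, here is a small toy model of
a unique index on (datapath, tunnel_key). It is a sketch under my own
assumptions and not how ovsdb-server is implemented; the class, function
and row names are hypothetical. It mirrors the log above: one row is moved
to a datapath where another, unmodified row already holds tunnel_key 7.

```python
# Toy model of a unique index on (datapath, tunnel_key). Names are
# hypothetical and this is not how ovsdb-server is implemented.

class ConstraintViolation(Exception):
    pass


def commit(table, updates):
    """Apply updates (uuid -> (datapath, tunnel_key)) atomically.

    The whole transaction is rejected if two rows would end up with the
    same (datapath, tunnel_key) pair.
    """
    result = dict(table)
    result.update(updates)
    seen = {}
    for uuid, index in result.items():
        if index in seen:
            raise ConstraintViolation(
                "rows %s and %s share index %r" % (seen[index], uuid, index))
        seen[index] = uuid
    # Only reached when every index value is unique.
    table.clear()
    table.update(result)


sb = {"row-existing": ("dp-new", 7), "row-moved": ("dp-old", 7)}
try:
    # Re-point "row-moved" to dp-new while keeping tunnel_key 7.
    commit(sb, {"row-moved": ("dp-new", 7)})
    rejected = False
except ConstraintViolation:
    rejected = True
print(rejected, sb["row-moved"])
```

The rejected transaction leaves the table untouched, so a client that
rebuilds the same transaction from the same view fails the same way each
time.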

In my opinion this can only happen if ovn-northd doesn't see the port
binding row (which is actually present in the DB) in its IDL in-memory db.
I suspect this could happen when ovn-northd reconnects to the same master,
or connects to the new master, and doesn't get the proper updates.
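
A minimal sketch of that suspicion, with a deliberately simplified
allocator (the function and the port names are hypothetical, and the real
allocation in ovn-northd may differ): if the client's view is missing a
row that the server has, the key it picks as "free" is already taken.

```python
# Toy model; all names are hypothetical.

def next_free_key(view):
    """Smallest tunnel key >= 1 that is not used in the given view."""
    used = set(view.values())
    key = 1
    while key in used:
        key += 1
    return key


server = {"p1": 1, "p2": 2, "p3": 3}   # rows actually in the SB DB
stale_view = {"p1": 1, "p2": 2}        # client missed the update for p3

key = next_free_key(stale_view)
collides = key in server.values()
print(key, collides)
```

With the complete view the allocator would pick 4. With the stale view it
picks 3, which the server already holds, so the commit would hit the index
check.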

Maybe in this case the IDL should request the db contents with txn id = 0,
so that it receives a complete dump of the db.

Is it possible that ovn-northd sees a port binding with a tunnel key 'x'
and still allocates the same tunnel key 'x' to a new logical port?
If so, then your patches definitely make sense.

@Han - Have you seen this issue in your deployments? Do you have any
comments here?

Thanks
Numan





>
>  northd/ovn-northd.c |   57 ++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 45 insertions(+), 12 deletions(-)
>
>

