[ovs-dev] [ovs-discuss] [OVN] DB backup and restore

Tony Liu tonyliu0592 at hotmail.com
Thu Jul 30 22:41:33 UTC 2020


Hi,

A quick question. According to this man page:
http://www.openvswitch.org/support/dist-docs/ovsdb-client.1.txt

It says the backup and restore commands are for OVSDB standalone and
active-backup databases.

Can they be used for a RAFT cluster? If not, what would the concern be?
Inconsistency?

If I restore to a follower, is the request forwarded to the leader to
restore the DB for the whole cluster? I believe it's recommended to
restore to the leader directly for performance's sake, though.

I am going to give it a try anyway and see how it works. I will make sure
there are no configuration updates from the OpenStack side while running
the snapshot and restore process.

Thanks!

Tony

From: Han Zhou <hzhou at ovn.org>
Sent: Thursday, July 30, 2020 12:23 PM
To: Tony Liu <tonyliu0592 at hotmail.com>
Cc: Han Zhou <hzhou at ovn.org>; ovs-discuss <ovs-discuss at openvswitch.org>; ovs-dev <ovs-dev at openvswitch.org>
Subject: Re: [ovs-discuss] [OVN] DB backup and restore

On Thu, Jul 30, 2020 at 10:56 AM Tony Liu <tonyliu0592 at hotmail.com> wrote:
> Hi Han,
>
> That doc helps. I will run some tests and report back here. The use cases I
> want to cover are snapshot/rollback and backup/restore.
>
> ========
> Actually, "at-least-once" consistency, because OVSDB does not have a session
> mechanism to drop duplicate transactions if a connection drops after the server
> commits it but before the client receives the result.
> ========
> I saw duplicated datapath bindings for the same logical switch once, if you
> recall. This may explain it: the ovn-northd connection to sb-db dropped before
> ovn-northd received the result, so ovn-northd initiated another transaction to
> create a datapath binding for the same logical switch.

Yes, this is a possibility.
However, in reality, this is usually not a problem:

1) If the DB schema defines table keys properly, the redundant transaction from a client is rejected by the DB server's key constraint check. In the datapath binding case this doesn't work because the Datapath_Binding table is poorly defined: it should have had a "logical_switch_router" column defined and set as a key (in addition to "tunnel_key") instead of storing the reference in external_ids. Then the duplicate entries would have been avoided. Other tables, such as Port_Binding, never have this problem.

2) OVSDB clients usually monitor and sync all the data they are interested in from the server to a local copy, so when they do declarative processing they can correct such problems by themselves. In fact, ovn-northd checks for and deletes duplicate datapaths. I did a simple test and it cleaned up by itself:
2020-07-30T18:55:53.057Z|00006|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
2020-07-30T19:02:10.465Z|00007|ovn_northd|INFO|deleting Datapath_Binding abef9503-445e-4a52-ae88-4c826cbad9d6 with duplicate external-ids:logical-switch/router ee80c38b-2016-4cbc-9437-f73e3a59369e
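To illustrate point 1, here is a minimal, hypothetical Python sketch (not OVSDB code): an in-memory table stands in for the server, the first insert commits but its reply is "lost", and the client retries. With the logical switch reference kept only in an unindexed external_ids map, the retry creates a duplicate; with the reference declared as a key column, the server rejects it. The `Table` class, the `logical_switch_router` column, and the retry helper are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
import uuid


class ConstraintViolation(Exception):
    """Raised when an insert would duplicate the value of a key column."""


@dataclass
class Table:
    # Columns whose combined value must be unique, like an index in an
    # OVSDB schema. An empty tuple means "no key constraint".
    key_columns: tuple = ()
    rows: dict = field(default_factory=dict)

    def insert(self, row: dict) -> str:
        if self.key_columns:
            key = tuple(row[c] for c in self.key_columns)
            for existing in self.rows.values():
                if tuple(existing[c] for c in self.key_columns) == key:
                    raise ConstraintViolation(f"duplicate key {key}")
        row_uuid = str(uuid.uuid4())
        self.rows[row_uuid] = row
        return row_uuid


def insert_with_lost_reply(table: Table, row: dict) -> None:
    """At-least-once delivery: the server commits the first insert, but the
    connection drops before the client sees the reply, so it retries."""
    table.insert(dict(row))      # committed; reply never reaches the client
    try:
        table.insert(dict(row))  # the client's retry of the same transaction
    except ConstraintViolation:
        pass                     # the server rejects the redundant insert


LS = "ee80c38b-2016-4cbc-9437-f73e3a59369e"

# Reference stored only in external_ids (no key): the retry duplicates the row.
unkeyed = Table()
insert_with_lost_reply(unkeyed, {"external_ids": {"logical-switch": LS}})

# Reference stored in a key column: the retry fails the constraint check.
keyed = Table(key_columns=("logical_switch_router",))
insert_with_lost_reply(keyed, {"logical_switch_router": LS})

print(len(unkeyed.rows), len(keyed.rows))  # 2 1
```

A real OVSDB server would reject the whole retried transaction with a constraint-violation error; the sketch only models the row count that results.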

I am not sure why ovn-northd was stuck in your case, but I agree something must be wrong. Please collect the northd logs if you encounter this again so we can dig further.
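Point 2 (clients repairing duplicates from their local replica) can be sketched the same way. This hypothetical function scans a client's replica of Datapath_Binding-like rows, groups them by the logical-switch or logical-router UUID stored in external_ids, keeps the first row for each reference, and returns the rest for deletion. The row layout and the keep-first tie-break are assumptions; this is not ovn-northd's actual logic.

```python
REF_KEYS = ("logical-switch", "logical-router")


def duplicate_datapaths(replica: dict) -> list:
    """Return UUIDs of rows whose logical switch/router reference repeats.

    `replica` maps row UUID -> row dict (a client's local copy of the table).
    The first row seen for each reference is kept; later ones are returned.
    """
    seen = set()
    duplicates = []
    for row_uuid, row in replica.items():
        ids = row.get("external_ids", {})
        ref = next((ids[k] for k in REF_KEYS if k in ids), None)
        if ref is None:
            continue  # no reference, nothing to compare against
        if ref in seen:
            duplicates.append(row_uuid)
        else:
            seen.add(ref)
    return duplicates


replica = {
    "dp-a": {"external_ids": {"logical-switch": "ls1"}, "tunnel_key": 1},
    "dp-b": {"external_ids": {"logical-switch": "ls1"}, "tunnel_key": 2},
    "dp-c": {"external_ids": {"logical-router": "lr1"}, "tunnel_key": 3},
}
for row_uuid in duplicate_datapaths(replica):
    del replica[row_uuid]  # a real client would issue an OVSDB delete instead
print(sorted(replica))  # ['dp-a', 'dp-c']
```

Because the client recomputes from the full replica on every pass, it converges regardless of how the duplicate got there, which is why the at-least-once gap rarely matters in practice.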

> I see two ways to improve it.
> 1) On the client side, if the connection breaks while waiting for the result
>    of a transaction, the client checks the transaction state (committed or
>    not) when it reconnects to the leader (maybe a different node).
>    Do we have such a check today?

Clients do check. But in this case, where the transaction actually succeeded but appears to have failed from the client's point of view, the check doesn't help.

> 2) I see client connections dropped by the leader when it's busy. I don't
>    think this is a good way to control traffic. The server could cache and
>    hold requests when it's busy, or even push back. Dropping the connection
>    is not a good option. Any thoughts here?

The server doesn't make this kind of decision. It could simply be overloaded and disconnected from the cluster, or, even worse, a node could crash after committing the transaction.

Thanks,
Han


> Thanks!
>
> Tony

From: Han Zhou <hzhou at ovn.org>
Sent: Wednesday, July 29, 2020 11:38 PM
To: Tony Liu <tonyliu0592 at hotmail.com>
Cc: ovs-discuss <ovs-discuss at openvswitch.org>; ovs-dev <ovs-dev at openvswitch.org>
Subject: Re: [ovs-discuss] [OVN] DB backup and restore

On Wed, Jul 29, 2020 at 10:58 PM Tony Liu <tonyliu0592 at hotmail.com> wrote:
>
> Hi,
>
> Is there any guidance on backing up and restoring the OVN nb-db and sb-db?
>
> Is /var/lib/openvswitch/ovn-[ns]b/ovn[ns]b.db the only database file?
>
> For a 3-node DB cluster, is the replication factor 3 (i.e., is the data
> replicated onto all 3 nodes)?
>
> Are the DB files on the 3 nodes identical?
>
> If I stop a DB follower and empty the DB file on the follower node,
> will the whole DB be replicated to it when I start it back up?
>
> To back up the DB, is it OK to copy the DB file from any node, assuming
> no transaction is ongoing?
>
> Is the following going to work to restore the DB?
> * Stop all 3 DBs.
> * Copy the backup DB file to one node, and empty the DB file on the other two nodes.
> * Bootstrap the node with the DB file.
> * Start the other two nodes to join the cluster.
>

For ovsdb operations, please refer to "man 7 ovsdb", or here: https://github.com/openvswitch/ovs/blob/master/Documentation/ref/ovsdb.7.rst
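The ovsdb(7) document linked above is the authority here. As a hedged sketch of the rebuild-from-backup flow asked about, the Python below only prints the command lines it assumes: `ovsdb-client backup` to take a snapshot (written in the standalone database format), then `ovsdb-tool create-cluster` on one node to bootstrap a new cluster from that snapshot, and `ovsdb-tool join-cluster` on the others. It assumes all servers are stopped and their old database files moved aside before the restore steps; the socket path, database path, and Raft addresses are hypothetical. Check the procedure against the documentation for your OVS version before relying on it.

```python
import shlex


def backup_command(server: str, database: str) -> list:
    """argv for ovsdb-client backup; redirect stdout to the snapshot file.
    The snapshot is written in standalone database format."""
    return ["ovsdb-client", "backup", server, database]


def restore_plan(snapshot: str, db_path: str, database: str,
                 raft_addrs: list) -> list:
    """argv lists for rebuilding a cluster from a standalone snapshot.

    Assumes every server is stopped and its old db_path has been moved
    aside. raft_addrs[0] bootstraps the new cluster from the snapshot;
    each remaining node creates an empty database that joins it.
    Each command must run on the host that owns that Raft address.
    """
    first, *others = raft_addrs
    plan = [["ovsdb-tool", "create-cluster", db_path, snapshot, first]]
    for local in others:
        plan.append(["ovsdb-tool", "join-cluster", db_path, database,
                     local, first])
    return plan


# Hypothetical socket, paths, and addresses; adjust for your deployment.
print(shlex.join(backup_command("unix:/var/run/ovn/ovnnb_db.sock",
                                "OVN_Northbound")) + " > nb-snapshot.db")
for argv in restore_plan("nb-snapshot.db",
                         "/var/lib/openvswitch/ovn-nb/ovnnb.db",
                         "OVN_Northbound",
                         ["tcp:10.0.0.1:6643", "tcp:10.0.0.2:6643",
                          "tcp:10.0.0.3:6643"]):
    print(shlex.join(argv))
```

After these steps, ovsdb-server is started on every node as usual; the joining nodes then receive the data from the bootstrapped node over Raft rather than from a copied file.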

>
> Do I need to restore sb-db as well? Or can I restore nb-db only and let
> ovn-northd sync data from nb-db to sb-db? Will chassis data be
> updated by ovn-controller?
>

You don't have to restore sb-db. ovn-northd and the ovn-controllers will sync the data in the SB DB.
However, the sync may take quite some time at large scale.
Also, remember that the MAC_Binding table in SB will not be restored by ovn-controller, because ovn-controller populates it as a result of handling ARP packets. The entries are generated again only when ovn-controller observes new ARP packets.

>
> I am running a scaling test. It takes quite a lot of time to build the
> configuration. I'm wondering if I can back up and restore the DB to roll
> back to a checkpoint and avoid restarting from scratch.
>
> Thanks!
>
> Tony