[ovs-dev] [PATCH 0/7] OVSDB 2-Tier deployment.
Dumitru Ceara
dceara at redhat.com
Mon May 10 12:36:12 UTC 2021
On 5/1/21 2:55 AM, Ilya Maximets wrote:
> Replication can be used to scale out read-only access to the database.
> But there are clients that are not read-only, but read-mostly.
> One of the main examples is ovn-controller that mostly monitors
> updates from the Southbound DB, but needs to claim ports by sending
> transactions that change some database tables.
>
> Southbound database serves lots of connections: all connections
> from ovn-controllers and some service connections from cloud
> infrastructure, e.g. some OpenStack agents are monitoring updates.
> At high scale and with a large database, ovsdb-server spends too
> much time processing monitor updates, so this load has to be moved
> somewhere else. This patch-set aims to introduce
> required functionality to scale out read-mostly connections by
> replication.
>
> Replication mode natively supports replication of standalone and
> clustered databases, so it will work for any type of OVN deployment.
>
> There are 3 missing parts for existing replication mode:
>
> 1. Ability to handle transactions that aim to modify the data.
> Obviously, a replica is not allowed to execute this kind of
> transaction. The solution is to implement transaction forwarding,
> i.e. allow replication server to act as a proxy by forwarding
> transactions to the primary server and forwarding replies back
> to the client. All read-only transactions and monitors are
> still fully served by the replica itself.
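
To make the split in point 1 concrete, here is a minimal Python sketch of how a proxy could tell a transaction that must be forwarded from one the replica can serve locally. It is only an illustration: the helper name needs_forwarding and the sample requests are hypothetical, and it assumes that a "transact" request needs forwarding whenever it contains one of the RFC 7047 write operations (insert, update, mutate, delete). How the patch actually makes this decision inside ovsdb-server is not shown here.

```python
# Hypothetical helper for illustration; not code from the patch set.
# Assumption: a transaction must be forwarded if it contains any of the
# RFC 7047 operations that modify data.
WRITE_OPS = {"insert", "update", "mutate", "delete"}


def needs_forwarding(transact_params):
    """Return True if a "transact" request contains a write operation.

    transact_params is the JSON-RPC "params" array of a "transact"
    request: the database name followed by the operation objects.
    """
    operations = transact_params[1:]
    return any(op.get("op") in WRITE_OPS for op in operations)


# A read-only request: could be served by the replica itself.
read_only = ["OVN_Southbound",
             {"op": "select", "table": "Chassis", "where": []}]

# A port claim (made-up values): would be forwarded to the primary.
claim_port = ["OVN_Southbound",
              {"op": "update", "table": "Port_Binding",
               "where": [["logical_port", "==", "lp1"]],
               "row": {"chassis": ["uuid",
                                   "00000000-0000-0000-0000-000000000000"]}}]

print(needs_forwarding(read_only), needs_forwarding(claim_port))
```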
>
> 2. In the case where a replica replicates a member of a raft cluster,
> the client needs to know the state of this cluster member in order
> to make a decision about re-connection to another server.
> This is solved by replicating a Database table of _Server database
> from the replication source, so clients are able to check the
> clustered database state as usual.
>
> ** Another solution for this problem is to allow the replication
> server itself to have multiple remotes and re-connect as a client
> would. However, this would be a significant behavioral change
> for the current implementation of the active-backup schema where
> backup stays connected no matter what. This will also require
> a huge rewrite of the replication state machine and will likely
> bring lots of code duplication with ovsdb-cs module. We might
> end up re-writing replication code on top of ovsdb-cs (which
> might be a good thing, though) and refactoring ovsdb-cs itself,
> but that would be much more work.
>
> 3. The client needs to know whether the replica is currently connected
> to the replication source. For example, if one of the replicas
> loses its connection to the primary server, the client
> should be able to re-connect to another replica.
> This is implemented by reflecting the connection state in the
> 'connected' field of the row in Database table in _Server database.
> Currently for active-backup it's always set to 'true'.
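
Points 2 and 3 together give the client what it needs for a failover decision. A rough Python sketch of that check follows, assuming the client has already read the database's row from the Database table of the _Server database as a plain dict with the 'model', 'connected' and 'leader' columns. The function name replica_is_usable and the leader_only parameter are hypothetical, and this is not the logic from ovsdb-cs or the python IDL.

```python
# Hypothetical client-side check; an illustration only, not the logic
# implemented in ovsdb-cs or the python IDL.
def replica_is_usable(db_row, leader_only=False):
    """Decide whether to stay connected to this server.

    db_row is assumed to be the database's row from the Database table
    of the _Server database, as a dict of column name to value.
    """
    # Point 3: a replica that lost its replication source reports
    # connected == False, so the client should try another replica.
    if not db_row.get("connected", False):
        return False
    # Point 2: with the replicated _Server row the client can see the
    # state of the cluster member behind the replica.
    if (db_row.get("model") == "clustered" and leader_only
            and not db_row.get("leader", False)):
        return False
    return True


healthy = {"name": "OVN_Southbound", "model": "clustered",
           "connected": True, "leader": False}
disconnected = dict(healthy, connected=False)

print(replica_is_usable(healthy))                    # stay connected
print(replica_is_usable(disconnected))               # pick another replica
print(replica_is_usable(healthy, leader_only=True))  # source is a follower
```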
>
> This patch set consists of 4 parts:
>
> Patch #1 - Implementation of transaction forwarding. Fully
> independent from the rest of the series and it's the only
> mandatory change for a 2-Tier deployment. The rest of the
> set is to propagate status fields and have correct failover
> on a client side.
>
> Patches #2-5 - Solution for the missing part #2: Replication of a
> _Server database and handling on a client side.
>
> Patch #6 - Solution for the missing part #3.
>
> Patch #7 - Slightly unrelated fix. Bringing one missing re-connection
> fix from C version to python IDL. Mostly to add more
> tests.
>
> Note: in order to replicate a clustered Sb DB, ephemeral columns from
> the ovn-sb schema should be manually converted to persistent ones before
> creating a database file for the replica, otherwise there will be schema
> mismatch and replication will fail.
>
Hi Ilya,
I had a look at the series and the changes look good to me and I acked
most of the patches. However, I don't feel confident enough on the
ovsdb-server side, so I hope other reviewers will share their opinions
on this feature before it's accepted.
Regards,
Dumitru