[ovs-dev] [PATCH v1] Monitor Database table to manage lifecycle of IDL client.

Thu Dec 27 17:22:56 UTC 2018

On Tue, Dec 18, 2018 at 12:18:06AM +0000, Ted Elhourani wrote:
> The Python IDL implementation supports ovsdb cluster connections.
> This patch is a follow up to commit 31e434fc98, it adds the option of
> connecting to the leader in the Raft-based cluster. It mimics the
> exisiting C IDL support for clusters introduced in commit 1b1d2e6daa.
> 
> The server schema is first requested, then a monitor of the Database table
> in the _Server Database. __check_server_db is called to verify the
> eligibility of the server. Unlike the C IDL, if an attempt to obtain a
> monitor of the Server database fails this implementation does not proceed
> to monitor and transact with the database. The reason is connections to
> standalone databases should not be permitted when the intention is to
> connect to a valid cluster node. The change_seqno is not incremented in the
> case of Database table updates.
> 
> The run method of Session (jsonrpc) is modified such that the session is
> closed when a stream connection fails, if more than one remote exists.
> Otherwise the session will re-attempt to establish a stream to the same
> cluster node. Method pick_remote must be called before opening a stream
> in __connect (in Session), otherwise a force_reconnect causes the Session
> to re-attempt a connection to the same server. pick_remote is no longer
> necessary in method run of Session.
> 
> Signed-off-by: Ted Elhourani <ted.elhourani at nutanix.com>

I'm pretty sure that this breaks backward compatibility with any
ovsdb-server not new enough to support the _Server table.  The commit
message above conflates that with whether the database is clustered.  I
think your intention is to reject non-clustered databases, but it
doesn't do that; instead, it rejects database servers too old to have a
_Server table and accepts standalone databases served up by new-enough
servers.  It's one thing to avoid connecting to standalone databases
when a cluster is expected, but this goes beyond that to drop support
for all older database servers.  I don't think that's reasonable behavior.

I noticed that the following looks wrong.  EAGAIN indicates that the
connection is not complete yet.  It should not be treated as a
connection failure (unless a timeout has occurred) even if there is more
than one remote:

> @@ -516,9 +516,9 @@ class Session(object):
>                  self.reconnect.connected(ovs.timeval.msec())
>                  self.rpc = Connection(self.stream)
>                  self.stream = None
> -            elif error != errno.EAGAIN:
> +            elif (error != errno.EAGAIN or
> +                  self.get_num_of_remotes() > 1):
>                  self.reconnect.connect_failed(ovs.timeval.msec(), error)
> -                self.pick_remote()
>                  self.stream.close()
>                  self.stream = None

Thanks,

Ben.