[ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker

Daniel Alvarez Sanchez dalvarez at redhat.com
Mon Jul 8 10:15:39 UTC 2019


Hi folks,

While working with an OpenStack environment running OVN and
ovsdb-server in an A/P configuration with Pacemaker, we hit an issue
that has probably been around for a long time. The bug itself seems to
be related to ovsdb-server not updating the read-only flag properly.

With a 3-node cluster running ovsdb-server in active/passive mode,
when we restart the master node, Pacemaker promotes another node to
master and moves the associated IPAddr2 resource to it.
At this point, ovn-controller instances across the cloud reconnect to
the new node, but there's a window during which ovsdb-server on that
node is still running as backup.

For those ovn-controller instances that reconnect within that window,
every attempt to write to the OVSDB will fail with "operation not
allowed when database server is in read only mode". This state
persists, even after the server becomes active, until a reconnection
is forced. Restarting ovn-controller or killing the connection (for
example with tcpkill) will make things work again.

A workaround in the OVN OCF script could be to make the
ovsdb_server_promote function wait until we get 'running/active' on
that instance before returning.
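
To make the idea concrete, here is a rough sketch of the wait loop. It is
written in Python for illustration only (the OCF agent itself is a shell
script), and wait_until_active, get_status, the timeout and the poll
interval are all hypothetical names and values; only the 'running/active'
string comes from the description above.

```python
import time
from typing import Callable


def wait_until_active(get_status: Callable[[], str],
                      timeout: float = 30.0,
                      interval: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll get_status() until it reports 'running/active'.

    get_status is a hypothetical callable that wraps whatever status
    check the resource agent already uses. Returns True once the
    instance is active, or False if the timeout expires first.
    """
    deadline = clock() + timeout
    while True:
        if get_status().strip() == "running/active":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The promote action would then report success to Pacemaker only when this
returns True, so the IPAddr2 resource is not moved to a node that is still
serving as backup. Whether that ordering can be enforced depends on the
resource constraints in use, which I have not checked.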

Another open question is what clients (in this case, ovn-controller)
should do in such a situation. Should they log an error and attempt a
rate-limited reconnection?
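
As a strawman for that client-side behaviour, the sketch below shows one
way to rate limit reconnections with exponential backoff. It is not how
ovn-controller is implemented (that is C code); ReconnectPolicy, its
method names and the backoff values are hypothetical, and only the error
string is taken from the failure described above.

```python
import time
from typing import Callable

# Error text reported by the server, as quoted earlier in this mail.
READ_ONLY_ERROR = ("operation not allowed when database server "
                   "is in read only mode")


class ReconnectPolicy:
    """Decide whether a client should reconnect after a failed write."""

    def __init__(self, min_backoff: float = 1.0, max_backoff: float = 8.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_backoff = min_backoff
        self.max_backoff = max_backoff
        self.clock = clock
        self.backoff = min_backoff
        self.next_allowed = float("-inf")  # first reconnect is never delayed

    def on_error(self, message: str) -> bool:
        """Return True if the caller should drop the session and reconnect now."""
        if READ_ONLY_ERROR not in message:
            return False  # unrelated error, leave the connection alone
        now = self.clock()
        if now < self.next_allowed:
            return False  # rate limited, keep the session for now
        # Allow this reconnect and double the delay before the next one.
        self.next_allowed = now + self.backoff
        self.backoff = min(self.backoff * 2, self.max_backoff)
        return True

    def on_success(self) -> None:
        """Reset the backoff after a write succeeds."""
        self.backoff = self.min_backoff
```

A client would call on_error() with the error text of each failed
transaction, log it, and reconnect only when it returns True, so a server
that stays in backup for a while is not flooded with reconnections.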

Thoughts?

Thanks a lot,
Daniel

