[ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker

Numan Siddique nusiddiq at redhat.com
Mon Jul 8 10:45:28 UTC 2019


On Mon, Jul 8, 2019 at 3:52 PM Daniel Alvarez Sanchez <dalvarez at redhat.com>
wrote:

> Hi folks,
>
> While working with an OpenStack environment running OVN and
> ovsdb-server in A/P configuration with Pacemaker, we hit an issue that
> has probably been around for a long time. The bug itself seems to be
> related to ovsdb-server not updating the read-only flag properly.
>
> With a 3-node cluster running ovsdb-server in active/passive mode,
> when we restart the master node, Pacemaker promotes another node to
> master and moves the associated IPaddr2 resource to it.
> At this point, ovn-controller instances across the cloud reconnect to
> the new node but there's a window where ovsdb-server is still running
> as backup.
>
> For those ovn-controller instances that reconnect within that window,
> every attempt to write to OVSDB fails with "operation not
> allowed when database server is in read only mode". This state
> persists until a reconnection is forced. Restarting
> ovn-controller or killing the connection (for example with tcpkill)
> will make things work again.
>
> A workaround in the OVN OCF script could be for the
> ovsdb_server_promote function to wait until we get 'running/active' on
> that instance.
>
> Another open question is what clients (in this case,
> ovn-controller) should do in such a situation. Should they log an
> error and attempt a reconnection (rate limited)?
>

Thanks for reporting this issue, Daniel.

I can easily reproduce the issue with the commands below.

$ <start the sandbox with --ovn>
$ export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
$ ovn-nbctl ls-add sw0
$ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
state: active
$ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server tcp:192.0.2.2:6641
$ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
$ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
state: backup
connecting: tcp:192.0.2.2:6641
$ ovn-nbctl ls-add sw1
  --> This should have failed. Since OVN_NB_DAEMON is set, ovn-nbctl talks
      to the ovn-nbctl daemon and it is able to create a logical switch
      even though the db is in backup mode.
$ unset OVN_NB_DAEMON
$ ovn-nbctl ls-add sw2
ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}
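
When scripting this reproduction, the read-only rejection can be told apart
from other ovn-nbctl failures by matching the error text shown above. A
minimal sketch; is_read_only_error is a hypothetical helper name, not
something shipped with OVS or OVN:

```shell
# Hypothetical helper for scripting the reproduction; not part of OVS/OVN.
# Succeeds (exit 0) if the given text contains the read-only rejection
# message that ovsdb-server returns in the transcript above.
is_read_only_error() {
    printf '%s\n' "$1" |
        grep -q 'not allowed when database server is in read only mode'
}

# Possible usage (commented out so the helper can be sourced on its own):
# if ! err=$(ovn-nbctl ls-add sw2 2>&1) && is_read_only_error "$err"; then
#     echo "write rejected: this connection is still treated as read-only"
# fi
```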


I looked into the ovsdb-server code: when the user changes the state of the
ovsdb-server, the read_only param of the active ovsdb_server_sessions
is not updated.
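
For the workaround Daniel suggests, the wait could look roughly like the
sketch below. It polls ovsdb-server/sync-status (the same command used in
the reproduction) until it prints "state: active". The function name, the
retry defaults and the APPCTL override variable are assumptions made for
illustration, and treating "state: active" as equivalent to what the OCF
script calls 'running/active' is also an assumption. This is not the
actual OCF script code.

```shell
# Hypothetical helper; not taken from the OVN OCF script.
# Polls ovsdb-server/sync-status until the server reports "state: active".
# Returns 0 once active, 1 if the retry budget runs out first.
wait_until_active() {
    ctl=$1            # ovsdb-server control target, e.g. $PWD/sandbox/nb1
    tries=${2:-30}    # maximum number of polls (assumed default)
    delay=${3:-1}     # seconds to sleep between polls (assumed default)
    appctl=${APPCTL:-ovs-appctl}   # override hook, handy for dry runs

    while [ "$tries" -gt 0 ]; do
        if $appctl -t "$ctl" ovsdb-server/sync-status 2>/dev/null |
                grep -q '^state: active'; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}
```

A promote step could then call something like
"wait_until_active $PWD/sandbox/nb1 30 1" and report a failure if it
returns non-zero.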

Thanks
Numan


> Thoughts?
>
> Thanks a lot,
> Daniel