[ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker

Daniel Alvarez Sanchez dalvarez at redhat.com
Mon Jul 8 11:25:09 UTC 2019


I *think* that it may not be a bug in ovsdb-server but a problem with
ovn-controller, as it doesn't seem to be a DB-change-aware client.

When the role changes from master to backup or vice versa, connections
are expected to be reestablished for all clients except those that are
not aware of db changes [0] (note the 'false' argument). This flag is
explained here [1], and it looks like, since ovn-controller is not
monitoring the Database table in the _Server database, the
connection with it is not re-established. This is just a blind guess
but I can give it a shot :)

[0] https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L368
[1] https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L450-L456
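
To make the _Server reference concrete, here is a minimal sketch of the
kind of request a client could send to watch the Database table. It
assumes the plain RFC 7047 "monitor" method; the helper name, the monitor
id and the column selection are illustrative only, and a real OVSDB client
library may send a different request (for example a conditional monitor).
It does not show whether ovn-controller sends such a request, which is the
unverified part of the guess above.

```python
import json


def build_server_monitor_request(request_id, columns=("name", "connected", "leader")):
    """Build a JSON-RPC "monitor" request for the Database table in _Server.

    The function name, the monitor id and the default columns are
    hypothetical choices made for this sketch.
    """
    return {
        "method": "monitor",
        "params": [
            "_Server",                # database to monitor
            ["monitor", "_Server"],   # opaque monitor id picked by the client
            # RFC 7047 maps each table name to an array of monitor requests.
            {"Database": [{"columns": list(columns)}]},
        ],
        "id": request_id,
    }


if __name__ == "__main__":
    # Print the message as it would be serialized on the wire.
    print(json.dumps(build_server_monitor_request(1)))
```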

On Mon, Jul 8, 2019 at 12:45 PM Numan Siddique <nusiddiq at redhat.com> wrote:
>
>
>
>
> On Mon, Jul 8, 2019 at 3:52 PM Daniel Alvarez Sanchez <dalvarez at redhat.com> wrote:
>>
>> Hi folks,
>>
>> While working with an OpenStack environment running OVN and
>> ovsdb-server in A/P configuration with Pacemaker we hit an issue that
>> has probably been around for a long time. The bug itself seems to be
>> related to ovsdb-server not updating the read-only flag properly.
>>
>> With a 3-node cluster running ovsdb-server in active/passive mode,
>> when we restart the master node, Pacemaker promotes another node as
>> master and moves the associated IPAddr2 resource to it.
>> At this point, ovn-controller instances across the cloud reconnect to
>> the new node but there's a window where ovsdb-server is still running
>> as backup.
>>
>> For those ovn-controller instances that reconnect within that window,
>> every attempt to write in the OVSDB will fail with "operation not
>> allowed when database server is in read only mode". This state will
>> remain forever unless a reconnection is forced. Restarting
>> ovn-controller or killing the connection (for example with tcpkill)
>> will make things work again.
>>
>> A workaround in the OVN OCF script could be for the
>> ovsdb_server_promote function to wait until we get 'running/active' on
>> that instance.
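
A rough sketch of that wait-until-active idea, written in Python rather
than in the OCF shell script itself. It assumes the "state: ..." line
printed by ovsdb-server/sync-status (as in the transcript in this thread)
is the signal to poll; the function names, the timeout and the polling
interval are hypothetical, and the real promote function may need to check
something else.

```python
import subprocess
import time


def parse_sync_state(output):
    """Return the value of the 'state:' line from sync-status output, or None."""
    for line in output.splitlines():
        if line.startswith("state:"):
            return line.split(":", 1)[1].strip()
    return None


def wait_until_active(get_status, timeout=30.0, interval=1.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it reports 'active'; return False on timeout."""
    deadline = clock() + timeout
    while True:
        if parse_sync_state(get_status()) == "active":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def appctl_sync_status(ctl_path):
    """Run 'ovs-appctl -t <ctl_path> ovsdb-server/sync-status' and return stdout."""
    result = subprocess.run(
        ["ovs-appctl", "-t", ctl_path, "ovsdb-server/sync-status"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

A promote step could then call something like
wait_until_active(lambda: appctl_sync_status(ctl_path)) before reporting
success, where ctl_path is the control socket of the local instance.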
>>
>> Another open question is what clients (in this case,
>> ovn-controller) should do in such a situation. Should they log an error and
>> attempt a reconnection (rate limited)?
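
One possible shape for such client behaviour, as a sketch only: recognise
the read-only transaction error (the JSON error that ovn-nbctl prints
further down in this thread) and allow a forced reconnect at most once per
interval. The class, the function names and the 5 second default are
hypothetical, and this is not how ovn-controller is implemented.

```python
import json
import time

# Substring of the "details" field seen in the error quoted in this thread.
READ_ONLY_MARKER = "database server is in read only mode"


def is_read_only_error(error):
    """Return True if a transaction error says the server is read-only."""
    return (error.get("error") == "not allowed"
            and READ_ONLY_MARKER in error.get("details", ""))


class ReconnectLimiter:
    """Allow a forced reconnect at most once every min_interval seconds."""

    def __init__(self, min_interval=5.0, clock=time.monotonic):
        self._min_interval = min_interval
        self._clock = clock
        self._last = None  # time of the last allowed reconnect, if any

    def should_reconnect(self, error):
        if not is_read_only_error(error):
            return False
        now = self._clock()
        if self._last is not None and now - self._last < self._min_interval:
            return False  # rate limited: a reconnect happened too recently
        self._last = now
        return True


if __name__ == "__main__":
    err = json.loads('{"details":"insert operation not allowed when database '
                     'server is in read only mode","error":"not allowed"}')
    limiter = ReconnectLimiter()
    if limiter.should_reconnect(err):
        print("read-only error: log it and force a reconnection")
```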
>
>
> Thanks for reporting this issue, Daniel.
>
> I can easily reproduce the issue with the commands below.
>
> $ <start the sandbox with --ovn>
> $ export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
> $ ovn-nbctl ls-add sw0
> $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> state: active
> $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server tcp:192.0.2.2:6641
> $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
> $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> state: backup
> connecting: tcp:192.0.2.2:6641
> $ ovn-nbctl ls-add sw1
>   --> This should have failed. Since OVN_NB_DAEMON is set, ovn-nbctl talks to the
>       ovn-nbctl daemon, which is able to create a logical switch even though the db is in backup mode.
> $ unset OVN_NB_DAEMON
> $ ovn-nbctl ls-add sw2
> ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}
>
>
> I looked into the ovsdb-server code: when the user changes the state of the ovsdb-server, the read_only param of active ovsdb_server_sessions
> is not updated.
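
The effect described here can be illustrated with a toy model. This is
not ovsdb-server code: ToyServer, ToySession and the update_sessions
switch are made up for the sketch, which only assumes that each session
keeps its own copy of the read-only flag taken at connection time.

```python
class ToySession:
    """A connection that keeps its own copy of the read-only flag."""

    def __init__(self, read_only):
        self.read_only = read_only

    def insert(self):
        if self.read_only:
            raise PermissionError(
                "insert operation not allowed when database server "
                "is in read only mode")
        return "ok"


class ToyServer:
    """A server whose sessions copy the read-only flag when they connect."""

    def __init__(self, read_only=False):
        self.read_only = read_only
        self.sessions = []

    def accept(self):
        session = ToySession(self.read_only)
        self.sessions.append(session)
        return session

    def set_read_only(self, read_only, update_sessions):
        self.read_only = read_only
        if update_sessions:
            # Propagate the new state to sessions that are already open.
            for session in self.sessions:
                session.read_only = read_only


if __name__ == "__main__":
    server = ToyServer(read_only=True)   # still running as backup
    session = server.accept()            # client connects inside the window
    server.set_read_only(False, update_sessions=False)  # promoted to active
    try:
        session.insert()
    except PermissionError as exc:
        print("stale session still fails:", exc)
```

In this model a session opened while the server is a backup keeps failing
after promotion until it reconnects, unless the state change is pushed to
existing sessions.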
>
> Thanks
> Numan
>
>>
>> Thoughts?
>>
>> Thanks a lot,
>> Daniel
>> _______________________________________________
>> discuss mailing list
>> discuss at openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

