[ovs-discuss] Issue with failover running ovsdb-server in A/P mode with Pacemaker

Daniel Alvarez Sanchez dalvarez at redhat.com
Mon Jul 8 15:45:03 UTC 2019


On Mon, Jul 8, 2019 at 5:43 PM Ben Pfaff <blp at ovn.org> wrote:
>
> Would you mind formally submitting this?  It seems like the best
> immediate solution.

Will do, thanks a lot Ben!
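
For anyone skimming the thread, here is a rough toy model of what the
one-line patch quoted below changes. It is only a sketch: the toy_*
types and functions and the kept_on_soft_reconnect flag are
hypothetical and much simpler than the real code in
ovsdb/jsonrpc-server.c. It assumes, as Numan observed, that each
session keeps the read_only value it saw when it connected, and it
treats "which sessions are skipped when the flag is false" as an
opaque per-session property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; NOT the real ovsdb_jsonrpc_* structures. */
struct toy_session {
    bool kept_on_soft_reconnect; /* Skipped when 'force' is false. */
    bool connected;
    bool read_only;              /* Copied from the server at connect time. */
};

struct toy_server {
    bool read_only;
    struct toy_session *sessions;
    size_t n_sessions;
};

/* (Re)connects 's' and snapshots the server's current mode. */
static void
toy_connect(const struct toy_server *svr, struct toy_session *s)
{
    s->connected = true;
    s->read_only = svr->read_only;
}

/* Drops sessions so that clients have to reconnect.  With 'force' false,
 * sessions flagged kept_on_soft_reconnect stay up with their stale mode.
 * Returns the number of sessions dropped. */
static size_t
toy_reconnect(struct toy_server *svr, bool force)
{
    size_t dropped = 0;

    assert(svr != NULL);
    for (size_t i = 0; i < svr->n_sessions; i++) {
        struct toy_session *s = &svr->sessions[i];
        if (s->connected && (force || !s->kept_on_soft_reconnect)) {
            s->connected = false;
            dropped++;
        }
    }
    return dropped;
}

/* Role change: 'force' false models the old call, true the patched one. */
static size_t
toy_set_read_only(struct toy_server *svr, bool read_only, bool force)
{
    if (svr->read_only == read_only) {
        return 0;
    }
    svr->read_only = read_only;
    return toy_reconnect(svr, force);
}

/* A write succeeds only on a live session that believes it is read/write. */
static bool
toy_write_allowed(const struct toy_session *s)
{
    return s->connected && !s->read_only;
}
```

In this model, a session that connected while the server was still a
backup and that survives toy_set_read_only(svr, false, false) keeps
failing writes until something drops it, which matches what we see
with ovn-controller. Passing true drops every session, so each client
picks up the new mode when it reconnects.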
>
> On Mon, Jul 08, 2019 at 02:27:31PM +0200, Daniel Alvarez Sanchez wrote:
> > I tried a simple patch and it fixes the issue (see below). The
> > question now is, do we want to do this? I think it makes sense to drop
> > *all* the connections when the role changes but I'm curious to see
> > what other people think:
> >
> > diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c
> > index 4dda63a..ddbbc2e 100644
> > --- a/ovsdb/jsonrpc-server.c
> > +++ b/ovsdb/jsonrpc-server.c
> > @@ -365,7 +365,7 @@ ovsdb_jsonrpc_server_set_read_only(struct ovsdb_jsonrpc_server *svr,
> >  {
> >      if (svr->read_only != read_only) {
> >          svr->read_only = read_only;
> > -        ovsdb_jsonrpc_server_reconnect(svr, false,
> > +        ovsdb_jsonrpc_server_reconnect(svr, true,
> >                                         xstrdup(read_only
> >                                                 ? "making server read-only"
> >                                                 : "making server read/write"));
> >
> >
> > $ export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
> > $ ovn-nbctl ls-add sw0
> > $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> > state: active
> > $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server tcp:192.0.2.2:6641
> > $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
> > $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> > state: backup
> > connecting: tcp:192.0.2.2:6641
> > $ ovn-nbctl ls-add sw1
> > ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}
> >
> > On Mon, Jul 8, 2019 at 1:25 PM Daniel Alvarez Sanchez
> > <dalvarez at redhat.com> wrote:
> > >
> > > I *think* that it may not be a bug in ovsdb-server but a problem with
> > > ovn-controller, as it doesn't seem to be a DB-change-aware client.
> > >
> > > When the role changes from master to backup or vice versa, connections
> > > are expected to be reestablished for all clients except those that are
> > > not aware of db changes [0] (note the 'false' argument). This flag is
> > > explained here [1], and it looks like, since ovn-controller is not
> > > monitoring the Database table in the _Server database, the
> > > connection with it is not re-established. This is just a blind guess
> > > but I can give it a shot :)
> > >
> > > [0] https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L368
> > > [1] https://github.com/openvswitch/ovs/blob/403a6a0cb003f1d48b0a3cbf11a2806c45e9d076/ovsdb/jsonrpc-server.c#L450-L456
> > >
> > > On Mon, Jul 8, 2019 at 12:45 PM Numan Siddique <nusiddiq at redhat.com> wrote:
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Jul 8, 2019 at 3:52 PM Daniel Alvarez Sanchez <dalvarez at redhat.com> wrote:
> > > >>
> > > >> Hi folks,
> > > >>
> > > >> While working with an OpenStack environment running OVN and
> > > >> ovsdb-server in A/P configuration with Pacemaker we hit an issue that
> > > >> has probably been around for a long time. The bug itself seems to be
> > > >> related to ovsdb-server not updating the read-only flag properly.
> > > >>
> > > >> With a 3-node cluster running ovsdb-server in active/passive mode,
> > > >> when we restart the master node, Pacemaker promotes another node to
> > > >> master and moves the associated IPaddr2 resource to it.
> > > >> At this point, ovn-controller instances across the cloud reconnect to
> > > >> the new node but there's a window where ovsdb-server is still running
> > > >> as backup.
> > > >>
> > > >> For those ovn-controller instances that reconnect within that window,
> > > >> every attempt to write to the OVSDB will fail with "operation not
> > > >> allowed when database server is in read only mode". This state will
> > > >> remain forever unless a reconnection is forced. Restarting
> > > >> ovn-controller or killing the connection (for example with tcpkill)
> > > >> will make things work again.
> > > >>
> > > >> A workaround in the OVN OCF script could be for the
> > > >> ovsdb_server_promote function to wait until we get 'running/active' on
> > > >> that instance.
> > > >>
> > > >> Another open question is what clients (in this case,
> > > >> ovn-controller) should do in such a situation. Should they log an
> > > >> error and attempt a (rate-limited) reconnection?
> > > >
> > > >
> > > > Thanks for reporting this issue Daniel.
> > > >
> > > > I can easily reproduce the issue with the commands below.
> > > >
> > > > $ <start the sandbox with --ovn>
> > > > $ export OVN_NB_DAEMON=$(ovn-nbctl --pidfile --detach)
> > > > $ ovn-nbctl ls-add sw0
> > > > $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> > > > state: active
> > > > $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/set-active-ovsdb-server tcp:192.0.2.2:6641
> > > > $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/connect-active-ovsdb-server
> > > > $ ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/sync-status
> > > > state: backup
> > > > connecting: tcp:192.0.2.2:6641
> > > > $ ovn-nbctl ls-add sw1  --> This should have failed. Since OVN_NB_DAEMON is set, ovn-nbctl talks to the
> > > >                             ovn-nbctl daemon, which is able to create a logical switch even though the db is in backup mode.
> > > > $ unset OVN_NB_DAEMON
> > > > $ ovn-nbctl ls-add sw2
> > > > ovn-nbctl: transaction error: {"details":"insert operation not allowed when database server is in read only mode","error":"not allowed"}
> > > >
> > > >
> > > > I looked into the ovsdb-server code: when the user changes the state of the ovsdb-server, the read_only param of active ovsdb_server_sessions
> > > > is not updated.
> > > >
> > > > Thanks
> > > > Numan
> > > >
> > > >>
> > > >> Thoughts?
> > > >>
> > > >> Thanks a lot,
> > > >> Daniel
> > > >> _______________________________________________
> > > >> discuss mailing list
> > > >> discuss at openvswitch.org
> > > >> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
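
On the open question of what a client such as ovn-controller could do
when it hits this error, one possible shape is sketched below. It is
an illustration only, not how ovn-controller behaves: the
reconnect_limiter type and both functions are hypothetical, matching
on the error text is an assumption made to keep the sketch short, and
the caller is expected to supply a monotonic clock value in
milliseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper state; not part of any OVS/OVN API. */
struct reconnect_limiter {
    long long min_interval_ms;  /* Minimum gap between forced reconnects. */
    long long last_attempt_ms;  /* Time of the last forced reconnect. */
    bool attempted;             /* False until the first reconnect. */
};

/* Returns true if 'details' looks like the read-only failure quoted in this
 * thread ("... not allowed when database server is in read only mode"). */
static bool
is_read_only_error(const char *details)
{
    return details != NULL && strstr(details, "read only mode") != NULL;
}

/* Decides whether the client should drop its connection and reconnect now.
 * Only read-only errors trigger a reconnect, and at most one reconnect is
 * allowed per 'min_interval_ms'. */
static bool
should_reconnect(struct reconnect_limiter *rl, const char *details,
                 long long now_ms)
{
    assert(rl != NULL);
    if (!is_read_only_error(details)) {
        return false;
    }
    if (rl->attempted && now_ms - rl->last_attempt_ms < rl->min_interval_ms) {
        return false;   /* Too soon after the previous attempt. */
    }
    rl->attempted = true;
    rl->last_attempt_ms = now_ms;
    return true;
}
```

A client would call should_reconnect() with the "details" string of a
failed transaction, log the error, and force a reconnection only when
it returns true. The rate limit keeps a client that lands on a real
backup from reconnecting in a tight loop.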

