[ovs-discuss] Possible data loss of OVSDB active-backup mode

Han Zhou zhouhan at gmail.com
Thu Aug 9 16:32:21 UTC 2018


On Thu, Aug 9, 2018 at 1:57 AM, aginwala <aginwala at asu.edu> wrote:
>
> To add on, we are using the LB VIP IP and no constraint with 3 nodes, as Han
> mentioned earlier: the active node syncs from an invalid IP and the other
> two nodes sync from the LB VIP IP. Also, I was able to get some logs from
> one node that triggered:
> https://github.com/openvswitch/ovs/blob/master/ovsdb/ovsdb-server.c#L460
>
> 2018-08-04T01:43:39.914Z|03230|reconnect|DBG|tcp:10.189.208.16:50686: entering RECONNECT
> 2018-08-04T01:43:39.914Z|03231|ovsdb_jsonrpc_server|INFO|tcp:10.189.208.16:50686: disconnecting (removing OVN_Northbound database due to server termination)
> 2018-08-04T01:43:39.932Z|03232|ovsdb_jsonrpc_server|INFO|tcp:10.189.208.21:56160: disconnecting (removing _Server database due to server termination)
> 20
>
> I am not sure whether having the active node also sync_from an invalid IP
> causes some flaw when all nodes are down during the race condition in this
> corner case.
>
>
>
>
>
> On Thu, Aug 9, 2018 at 1:35 AM Numan Siddique <nusiddiq at redhat.com> wrote:
>>
>>
>>
>> On Thu, Aug 9, 2018 at 1:07 AM Ben Pfaff <blp at ovn.org> wrote:
>>>
>>> On Wed, Aug 08, 2018 at 12:18:10PM -0700, Han Zhou wrote:
>>> > On Wed, Aug 8, 2018 at 11:24 AM, Ben Pfaff <blp at ovn.org> wrote:
>>> > >
>>> > > On Wed, Aug 08, 2018 at 12:37:04AM -0700, Han Zhou wrote:
>>> > > > Hi,
>>> > > >
>>> > > > We found an issue in our testing (thanks aginwala) with
>>> > > > active-backup mode in an OVN setup.
>>> > > > In a 3 node setup with pacemaker, after stopping pacemaker on all
>>> > > > three nodes (simulating a complete shutdown) and then starting all
>>> > > > of them simultaneously, there is a good chance that the whole DB
>>> > > > content gets lost.
>>> > > >
>>> > > > After studying the replication code, it seems there is a phase in
>>> > > > which the backup node deletes all its data and waits for data to be
>>> > > > synced from the active node:
>>> > > > https://github.com/openvswitch/ovs/blob/master/ovsdb/replication.c#L306
>>> > > >
>>> > > > In this state, if the node is set to active, then all data is gone
>>> > > > for the whole cluster. This can happen in different situations. In
>>> > > > the test scenario mentioned above it is very likely to happen, since
>>> > > > pacemaker just randomly selects one node as master, not knowing the
>>> > > > internal sync state of each node. It could also happen when failover
>>> > > > happens right after a new backup is started, although that is less
>>> > > > likely in a real environment, so starting up the nodes one by one
>>> > > > may largely reduce the probability.
>>> > > >
>>> > > > Does this analysis make sense? We will do more tests to verify the
>>> > > > conclusion, but would like to share it with the community for
>>> > > > discussion and suggestions. Once this happens it is very critical -
>>> > > > even more serious than just having no HA. Without HA it is just a
>>> > > > control plane outage, but this would be a data plane outage, because
>>> > > > OVS flows will be removed accordingly since the data is considered
>>> > > > deleted from ovn-controller's point of view.
>>> > > >
>>> > > > We understand that active-standby is not the ideal HA mechanism and
>>> > > > clustering is the future, and we are also testing the clustering
>>> > > > with the latest patch. But it would be good if this problem could be
>>> > > > addressed with some quick fix, such as keeping a copy of the old
>>> > > > data somewhere until the first sync finishes?
>>> > >
>>> > > This does seem like a plausible bug, and at first glance I believe
>>> > > that you're correct about the race here.  I guess that the correct
>>> > > behavior must be to keep the original data until a new copy of the
>>> > > data has been received, and only then atomically replace the original
>>> > > by the new.
>>> > >
>>> > > Is this something you have time and ability to fix?
>>> >
>>> > Thanks Ben for the quick response. I guess I will not have time until I
>>> > send out the next series for incremental processing :)
>>> > It would be good if someone could help; please reply to this email if
>>> > he/she starts working on it so that we do not end up with overlapping
>>> > work.
>>
>>
>> I will give a shot at fixing this issue.
>>
>> In the case of tripleo we haven't hit this issue, but I haven't tested
>> this scenario. I will test it out. One difference compared to your setup
>> is that tripleo uses an IPAddr2 resource and a colocation constraint set.
>>
>> Thanks
>> Numan
>>

Thanks Numan for helping on this. I think IPAddr2 should have the same
problem, if my previous analysis was right, unless using IPAddr2 results in
pacemaker always electing the node that is configured with the master IP as
the master when pacemaker is started on all nodes again.

Ali, thanks for the information. Just to clarify: the log "removing xxx
database due to server termination" is not related to this issue. It might
be misleading, but it doesn't mean the content of the database is deleted;
it is just cleaning up internal data structures before exiting. The code
that deletes the DB data is here:
https://github.com/openvswitch/ovs/blob/master/ovsdb/replication.c#L306,
and there is no log printed for this. You may add a log there to verify when
you reproduce the issue.
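
To make the failure window concrete, below is a minimal, self-contained C
sketch (hypothetical names, not the actual replication.c code) contrasting
the current clear-then-resync ordering with the keep-the-old-data-until-synced
behavior Ben suggested:

/* Illustration only, NOT ovsdb code: contrasts two orderings of the
 * backup-sync step discussed in this thread.  All names are made up. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct db {
    char *contents;          /* Stand-in for the real database tables. */
};

/* Pretend to pull a snapshot from the active server.  Returns NULL when the
 * active server goes away before the transfer finishes (the race window). */
static char *
fetch_snapshot_from_active(bool active_alive)
{
    return active_alive ? strdup("rows from active") : NULL;
}

/* Current ordering: clear first, then wait for the snapshot.  If this node
 * is promoted to active inside the window, the empty database becomes the
 * authoritative copy for the whole cluster. */
static void
replica_clear_then_sync(struct db *db, bool active_alive)
{
    free(db->contents);
    db->contents = strdup("");          /* Local data is now gone. */

    char *snapshot = fetch_snapshot_from_active(active_alive);
    if (snapshot) {
        free(db->contents);
        db->contents = snapshot;
    }
    /* else: promoted with an empty database -> cluster-wide data loss. */
}

/* Suggested fix: stage the new copy first and only then swap it in, so the
 * old contents survive an interrupted or failed sync. */
static void
replica_stage_then_swap(struct db *db, bool active_alive)
{
    char *snapshot = fetch_snapshot_from_active(active_alive);
    if (snapshot) {
        free(db->contents);
        db->contents = snapshot;        /* Atomic replace from the db's view. */
    }
    /* else: keep the old data; a later promotion still serves valid rows. */
}

int
main(void)
{
    struct db a = { .contents = strdup("original rows") };
    struct db b = { .contents = strdup("original rows") };

    /* Simulate the active server dying mid-sync on both replicas. */
    replica_clear_then_sync(&a, false);
    replica_stage_then_swap(&b, false);

    printf("clear-then-sync : \"%s\"\n", a.contents);
    printf("stage-then-swap : \"%s\"\n", b.contents);

    free(a.contents);
    free(b.contents);
    return 0;
}

With the first ordering, a promotion inside the window publishes an empty
database to the cluster; with the second, the old contents survive an
interrupted sync.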

Thanks,
Han