[ovs-discuss] Possible data loss of OVSDB active-backup mode

Han Zhou zhouhan at gmail.com
Wed Aug 8 07:37:04 UTC 2018


Hi,

We found an issue in our testing (thanks aginwala) with active-backup mode
in an OVN setup.
In a 3-node setup with Pacemaker, after stopping Pacemaker on all three
nodes (simulating a complete shutdown) and then starting all of them
simultaneously, there is a good chance that the whole DB content gets lost.

After studying the replication code, it appears there is a phase in which
the backup node deletes all of its local data and then waits for data to be
synced from the active node:
https://github.com/openvswitch/ovs/blob/master/ovsdb/replication.c#L306

In this state, if the node is promoted to active, all data is gone for the
whole cluster. This can happen in several situations. In the test scenario
mentioned above it is very likely to happen, since Pacemaker simply selects
one node as master at random, without knowing the internal sync state of
each node. It could also happen when failover occurs right after a new
backup is started, although that is less likely in a real environment, so
starting the nodes one by one would largely reduce the probability.
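To make the race concrete, here is a minimal sketch (not the actual OVS code;
the class and method names are illustrative) of why promoting a backup before
its first sync completes wipes the cluster: the backup clears its tables first
and only then requests the data from the active node.

```python
# Hypothetical model of the backup-promotion race. A backup node clears
# its local DB before the initial sync finishes; if it is promoted in
# that window, the empty DB becomes the cluster's authoritative state.

class Node:
    def __init__(self, data):
        self.data = dict(data)   # local DB contents
        self.synced = False      # has the first sync from active finished?

    def start_as_backup(self):
        # Mirrors the problematic phase: delete everything locally,
        # then wait for the active node's dump.
        self.data.clear()
        self.synced = False

    def finish_sync(self, active):
        self.data = dict(active.data)
        self.synced = True

    def promote(self):
        # Pacemaker promotes without checking self.synced.
        return self.data

# Whole-cluster shutdown, then simultaneous restart:
active = Node({"lswitch": "ls0"})
backup = Node({"lswitch": "ls0"})
backup.start_as_backup()
# Failover (or random master selection) happens BEFORE finish_sync():
surviving_data = backup.promote()
print(surviving_data)   # {} -- the cluster's data is gone
```

If `finish_sync()` had completed before the promotion, `promote()` would have
returned the full data set; the window between the clear and the first sync is
exactly where the loss occurs.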

Does this analysis make sense? We will do more tests to verify the
conclusion, but we would like to share it with the community for discussion
and suggestions. When this happens it is very critical - even more serious
than simply having no HA. Without HA there is only a control-plane outage,
but this causes a data-plane outage as well, because OVS flows will be
removed accordingly once the data is considered deleted from
ovn-controller's point of view.

We understand that active-backup is not the ideal HA mechanism and that
clustering is the future, and we are also testing clustering with the
latest patches. But it would be good if this problem could be addressed
with a quick fix, such as keeping a copy of the old data somewhere until
the first sync finishes.
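The suggested quick fix could be sketched as follows (again a hypothetical
model, not the ovsdb-server implementation): snapshot the pre-existing data
before clearing, and fall back to the snapshot if promotion happens before
the first sync completes.

```python
# Hypothetical mitigation sketch: keep a copy of the old data until the
# first sync finishes, so a premature promotion restores it instead of
# serving an empty DB. Names are illustrative.

class SafeBackup:
    def __init__(self, data):
        self.data = dict(data)
        self.snapshot = None
        self.synced = False

    def start_as_backup(self):
        self.snapshot = dict(self.data)   # preserve the old data first
        self.data.clear()
        self.synced = False

    def finish_sync(self, active_data):
        self.data = dict(active_data)
        self.snapshot = None              # safe to drop the copy now
        self.synced = True

    def promote(self):
        if not self.synced and self.snapshot is not None:
            self.data = self.snapshot     # restore instead of losing all
            self.snapshot = None
        return self.data

node = SafeBackup({"lswitch": "ls0"})
node.start_as_backup()
print(node.promote())   # {'lswitch': 'ls0'} -- old data survives
```

With this scheme, the worst case of an early promotion is serving stale data
rather than an empty database, which ovn-controller would not interpret as a
mass deletion.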

Thanks,
Han

