[ovs-discuss] How to restart raft cluster after a complete shutdown?

Tue Aug 25 18:06:53 UTC 2020

On Tue, Aug 25, 2020 at 7:08 AM Matthew Booth <mbooth at redhat.com> wrote:
>
> I'm deploying ovsdb-server (and only ovsdb-server) in K8S as a
> StatefulSet:
>
> https://github.com/openstack-k8s-operators/dev-tools/blob/master/ansible/files/ocp/ovn/ovsdb.yaml
>
> I'm going to replace this with an operator in due course, which may
> make the following simpler. I'm not necessarily constrained to only
> things which are easy to do in a StatefulSet.
>
> I've noticed an issue when I kill all 3 pods simultaneously: it is no
> longer possible to start the cluster. The issue is presumably one of
> quorum: when a node comes up it can't contact any other node to make
> quorum, and therefore can't come up. All nodes are similarly affected,
> so the cluster stays down. Ignoring kubernetes, how is this situation
> intended to be handled? Do I have to reduce it to a single-node deployment,
> convert that to a new cluster and re-bootstrap it? This wouldn't be
> ideal. Is there any way, for example, I can bring up the first node
> while asserting to that node that the other 2 are definitely down?
>
In general you should be able to restart the whole cluster without
re-bootstrapping it. The cluster should resume working as soon as 2 of the
3 nodes are back online.
In your case, I am not sure whether you are using the k8s pods' IPs as
server addresses. If so, the pods' IPs probably changed after the restart,
so the server addresses stored in the raft log can never be reached again.
Is that the problem?
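
To illustrate the reasoning above, here is a rough Python sketch of the
majority rule and of why changed addresses can keep a cluster down. It is
not how ovsdb-server implements raft; the function names and the example
addresses are made up for illustration.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def can_elect_leader(stored_addresses, live_addresses) -> bool:
    """True if enough of the recorded server addresses are reachable.

    stored_addresses: addresses recorded in the cluster membership.
    live_addresses: addresses the restarted servers actually listen on.
    """
    live = set(live_addresses)
    reachable = sum(1 for addr in stored_addresses if addr in live)
    return reachable >= quorum_size(len(stored_addresses))


# Hypothetical addresses, chosen only for this example.
stored = ["tcp:10.0.0.1:6643", "tcp:10.0.0.2:6643", "tcp:10.0.0.3:6643"]

# Two of three servers return on their old addresses: a majority exists.
same_ips = ["tcp:10.0.0.1:6643", "tcp:10.0.0.2:6643"]

# All three return, but on new addresses: no recorded peer is reachable.
new_ips = ["tcp:10.0.1.7:6643", "tcp:10.0.1.8:6643", "tcp:10.0.1.9:6643"]

if __name__ == "__main__":
    print(quorum_size(3))
    print(can_elect_leader(stored, same_ips))
    print(can_elect_leader(stored, new_ips))
```

In the last case all three servers are running, yet none can reach a
majority of the addresses it has recorded, which matches the symptom you
describe.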