[ovs-discuss] Raft issues while removing a node

ramteja tadishetti ramtejatadishetti at gmail.com
Fri Nov 9 00:17:03 UTC 2018


Hi,


I am facing trouble with the graceful removal of a node in a 3-node RAFT setup.



Node 1:

$ovn-ctl --db-nb-addr=10.8.49.184 --db-nb-port=6641
--db-nb-create-insecure-remote=yes --db-nb-cluster-local-proto=tcp
--db-nb-cluster-local-port=6643 --db-nb-cluster-local-addr=10.8.49.184
start_nb_ovsdb



$ovn-ctl --db-sb-addr=10.8.49.184 --db-sb-port=6642
--db-sb-create-insecure-remote=yes --db-sb-cluster-local-proto=tcp
--db-sb-cluster-local-port=6644 --db-sb-cluster-local-addr=10.8.49.184
start_sb_ovsdb



$ovn-ctl start_northd --ovn-manage-ovsdb=no
--ovn-northd-nb-db="tcp:10.8.49.184:6641,tcp:10.8.49.181:6641,tcp:10.8.49.173:6641"
--ovn-northd-sb-db="tcp:10.8.49.184:6642,tcp:10.8.49.181:6642,tcp:10.8.49.173:6642"



Node 2:

$ovn-ctl --db-nb-addr=10.8.49.181 --db-nb-port=6641
--db-nb-create-insecure-remote=yes --db-nb-cluster-local-proto=tcp
--db-nb-cluster-local-port=6643 --db-nb-cluster-local-addr=10.8.49.181
--db-nb-cluster-remote-addr=10.8.49.184 start_nb_ovsdb





$ovn-ctl --db-sb-addr=10.8.49.181 --db-sb-port=6642
--db-sb-create-insecure-remote=yes --db-sb-cluster-local-proto=tcp
--db-sb-cluster-local-port=6644 --db-sb-cluster-local-addr=10.8.49.181
--db-sb-cluster-remote-addr=10.8.49.184 start_sb_ovsdb



Node 3:

$ovn-ctl --db-nb-addr=10.8.49.173 --db-nb-port=6641
--db-nb-create-insecure-remote=yes --db-nb-cluster-local-proto=tcp
--db-nb-cluster-local-port=6643 --db-nb-cluster-local-addr=10.8.49.173
--db-nb-cluster-remote-addr=10.8.49.184 start_nb_ovsdb



$ovn-ctl --db-sb-addr=10.8.49.173 --db-sb-port=6642
--db-sb-create-insecure-remote=yes --db-sb-cluster-local-proto=tcp
--db-sb-cluster-local-port=6644 --db-sb-cluster-local-addr=10.8.49.173
--db-sb-cluster-remote-addr=10.8.49.184 start_sb_ovsdb





Now I forcefully stopped Node 1 (either by stopping the ovn-central
service or by forcefully powering off the VM in which Node 1 is running).

Node 3 was then elected as leader by the remaining nodes (verified by
running:

$ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl cluster/status
OVN_Northbound

$ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/status
OVN_Southbound
).
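To pick out which server is which, I look at the "Servers:" section of the
status output, where each member is listed by a short server ID. A small
sketch of the check I run (socket paths as in my setup; the output format
is an assumption based on the ovsdb-server version I have):

```shell
#!/bin/sh
# Print only the membership section of the Raft status for both databases.
# The first token on each server line is the short server ID.
RUNDIR=/var/run/openvswitch

ovs-appctl -t "$RUNDIR/ovnnb_db.ctl" cluster/status OVN_Northbound \
    | sed -n '/^Servers:/,$p'
ovs-appctl -t "$RUNDIR/ovnsb_db.ctl" cluster/status OVN_Southbound \
    | sed -n '/^Servers:/,$p'
```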



I then tried to remove the failed node using unixctl on the non-leader
node (Node 2):

$ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/kick
OVN_Southbound <Node1>



Ideally this should remove Node 1, but instead it removed Node 2 itself
from the cluster, making the cluster unavailable.
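What I intended, as a sketch: kick Node 1 out of both databases using the
server IDs that cluster/status reports for it. The SID values below are
placeholders, and my assumption is that cluster/kick accepts that server
ID as its SERVER argument:

```shell
#!/bin/sh
# Intended graceful removal of the failed Node 1 from both databases.
# SID_NB / SID_SB are the short server IDs that cluster/status shows for
# Node 1 in each database (placeholders - filled in by hand).
RUNDIR=/var/run/openvswitch
SID_NB=<Node1-nb-server-id>
SID_SB=<Node1-sb-server-id>

ovs-appctl -t "$RUNDIR/ovnnb_db.ctl" cluster/kick OVN_Northbound "$SID_NB"
ovs-appctl -t "$RUNDIR/ovnsb_db.ctl" cluster/kick OVN_Southbound "$SID_SB"
```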



Could you please let us know how to gracefully remove a node from the
cluster when that node becomes unavailable?


Thanks,
Ramteja

