[ovs-discuss] OVN Controller - High Availability via Pacemaker

Numan Siddique nusiddiq at redhat.com
Wed May 1 15:37:18 UTC 2019


Hi Stephen,

Your setup seems fine to me. Please see below for some comments.


On Wed, May 1, 2019 at 8:23 PM Stephen Flynn via discuss <
ovs-discuss at openvswitch.org> wrote:

> Greetings OVS Discuss Group:
>
>
>
> First – I want to apologize for the wall of text to follow – I wasn’t sure
> how much information would be wanted and I didn’t want everyone to have to
> ask 100 questions to get what they needed.
>
>
>
> I am working on setting up an OVN Controller (3 nodes) using Pacemaker /
> Corosync.   For reference --  a single-node controller (no Pacemaker /
> Corosync) operates without issue.
>
> Root Goal:   Failover Redundancy for the OVN Controller – Allows for
> maintenance of controller nodes and/or failure of a controller node.
>

Just a small correction - Pacemaker is used to provide active/passive HA for
the OVN DB servers. When you refer to the "OVN Controller" it's a bit
confusing, since we also have a service called ovn-controller (provided by
the ovn-host package) which runs on each host/hypervisor.
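
To see which piece runs where (the service names below assume the Ubuntu
packaging, so treat this as a sketch):

# On the central nodes - the OVN NB/SB DB servers plus ovn-northd:
systemctl status ovn-central

# On each host/hypervisor - the local ovn-controller:
systemctl status ovn-host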



>
> I have been following the very limited documentation on how to set up this
> environment, but I don't seem to be having much luck getting it to be 100%
> stable or operational.
>
>
> http://docs.openvswitch.org/en/latest/topics/integration/?highlight=pacemaker
>
>
>
> If I could ask someone to assist in reviewing the installation below and
> provide some insight into what I may be doing wrong, or have wrong (need a
> newer version of the code, etc.), I would be grateful.  I've currently only
> attempted this using “packaged” versions of the code to “keep it simple” …
> but I do realize that getting this to work may require a newer code release.
> Along with this, once I am able to get a stable environment, I would like
> to contribute updated documentation to the community on how to perform a
> full setup.
>
>
>
>
>
> *-- Environment –*
>
>
>
> # cat /etc/lsb-release | grep DESCRIPTION
>
> DISTRIB_DESCRIPTION="Ubuntu 18.04.2 LTS"
>
>
>
> # ovn-northd --version
>
> ovn-northd (Open vSwitch) 2.9.2
>
>
>
> # ovn-nbctl --version
>
> ovn-nbctl (Open vSwitch) 2.9.2
>
> DB Schema 5.10.0
>
>
>
> # ovn-sbctl --version
>
> ovn-sbctl (Open vSwitch) 2.9.2
>
> DB Schema 1.15.0
>
>
>
> *-- Setup Steps –*
>
> # cat /etc/hosts
>
>
>
> # LAB - Compute Nodes
>
> 192.168.100.10  ctrl00
>
> 192.168.100.11  ctrl01
>
> 192.168.100.12  ctrl02
>
> 192.168.100.13  ctrl03
>
> 192.168.100.76  cn01
>
> 192.168.100.77  cn02
>
> 192.168.100.78  cn03
>
> 192.168.100.79  cn04
>
>
>
>
>
> ### All Controllers
>
>
>
> ## System Package Updates
>
> apt clean all; apt update; apt -y dist-upgrade
>
>
>
> ## Time Sync Services (NTP) [ctrl01, ctrl02, ctrl03]
>
> apt install -y ntp
>
>
>
> ## Install OVN Central Controller [ctrl01, ctrl02, ctrl03]
>
> apt install -y openvswitch-common openvswitch-switch python-openvswitch
> python3-openvswitch ovn-common ovn-central
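>
> (Note: a sketch, based on my assumption about the Ubuntu packaging - the
> ovn-central unit may autostart standalone DB servers on install, and if
> Pacemaker is to own them, the unit should be stopped first:)
>
> systemctl stop ovn-central; systemctl disable ovn-central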
>
>
>
> ## Install pacemaker and corosync [ctrl01, ctrl02, ctrl03]
>
> apt install -y pcs pacemaker pacemaker-cli-utils
>
>
>
> ## Reset the Pacemaker Password [ctrl01, ctrl02, ctrl03]
>
> Password = 6B43WAmuPzM2Ewsr
>
>
>
> enc_passwd=$(python3 -c 'import crypt;
> print(crypt.crypt("6B43WAmuPzM2Ewsr", crypt.mksalt(crypt.METHOD_SHA512)))')
>
> usermod -p "${enc_passwd}" hacluster
>
>
>
> ## Enable and Start the PCS Daemon [ctrl01, ctrl02, ctrl03]
>
> sudo systemctl enable pcsd;
>
> sudo systemctl start pcsd;
>
> sudo systemctl status pcsd;
>
>
>
>
> ### Cluster Controller #1 (ONLY)
>
>
>
> ## Enable Cluster Services [ctrl01 --ONLY-- ]
>
>
>
> pcs cluster auth ctrl01 ctrl02 ctrl03 -u hacluster -p '6B43WAmuPzM2Ewsr'
> --force
>
> pcs cluster setup --name OVN-CLUSTER ctrl01 ctrl02 ctrl03 --force
>
>
>
> pcs cluster enable --all;
>
> pcs cluster start --all;
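>
> (Quick sanity check after starting, a sketch using the standard corosync
> and pcs tooling:)
>
> corosync-cfgtool -s;
>
> pcs status corosync;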
>
>
>
> pcs property set stonith-enabled=false
>
> pcs property set no-quorum-policy=ignore
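>
> (Confirm both properties registered:)
>
> pcs property list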
>
>
>
> pcs status
>
>
>
> [[ output ]]
>
> Cluster name: OVN-CLUSTER
>
> Stack: corosync
>
> Current DC: ctrl01 (version 1.1.18-2b07d5c5a9) - partition with quorum
>
> Last updated: Wed May  1 09:42:33 2019
>
> Last change: Wed May  1 09:42:29 2019 by root via cibadmin on ctrl01
>
>
>
> 3 nodes configured
>
> 0 resources configured
>
>
>
> Online: [ ctrl01 ctrl02 ctrl03 ]
>
>
>
> No resources
>
>
>
>
>
> Daemon Status:
>
>   corosync: active/enabled
>
>   pacemaker: active/enabled
>
>   pcsd: active/enabled
>
>
>
>
>
>
>
> ## Add cluster resources [ctrl01 --ONLY-- ]
>
> pcs resource create ovn-virtual-ip ocf:heartbeat:IPaddr2 nic=ens192
> ip=192.168.100.10 cidr_netmask=24 op monitor interval=30s
>
>
>
> pcs resource create ovndb_servers ocf:ovn:ovndb-servers \
>
>     master_ip=192.168.100.10 \
>
>     ovn_ctl=/usr/share/openvswitch/scripts/ovn-ctl \
>
>     op monitor interval="10s" \
>
>     op monitor role=Master interval="15s"
>
>
>
> pcs resource master ovndb_servers-master ovndb_servers \
>
>     meta notify="true"
>
>
>
> pcs constraint order promote ovndb_servers-master then ovn-virtual-ip
>
> pcs constraint colocation add ovn-virtual-ip with master
> ovndb_servers-master score=INFINITY
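>
> (To double-check that the agent and constraints registered, something like
> the following; the OCF agent path is the conventional one and may differ
> with other packaging:)
>
> ls -l /usr/lib/ocf/resource.d/ovn/ovndb-servers
>
> pcs resource show ovndb_servers-master
>
> pcs constraint show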
>
>
>
> pcs status
>
>
>
> [[ output ]]
>
> Cluster name: OVN-CLUSTER
>
> Stack: corosync
>
> Current DC: ctrl01 (version 1.1.18-2b07d5c5a9) - partition with quorum
>
> Last updated: Wed May  1 09:46:02 2019
>
> Last change: Wed May  1 09:44:53 2019 by root via crm_attribute on ctrl02
>
>
>
> 3 nodes configured
>
> 4 resources configured
>
>
>
> Online: [ ctrl01 ctrl02 ctrl03 ]
>
>
>
> Full list of resources:
>
>
>
> ovn-virtual-ip (ocf::heartbeat:IPaddr2):       Started ctrl02
>
> Master/Slave Set: ovndb_servers-master [ovndb_servers]
>
>      Masters: [ ctrl02 ]
>
>      Slaves: [ ctrl01 ctrl03 ]
>
>
>
> Failed Actions:
>
> * ovndb_servers_monitor_10000 on ctrl01 'master' (8): call=18,
> status=complete, exitreason='',
>
>     last-rc-change='Wed May  1 09:43:28 2019', queued=0ms, exec=73ms
>
>
>
>
>
> Daemon Status:
>
>   corosync: active/enabled
>
>   pacemaker: active/enabled
>
>   pcsd: active/enabled
>
>
>
>
>
> /////
>
>
>
> At this point, I execute ‘pcs cluster stop {{ controller }}; pcs cluster
> start {{ controller }}’ for each controller one at a time and eventually
> everything clears up.
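>
> (The rolling restart, for reference, was just:)
>
> for c in ctrl01 ctrl02 ctrl03; do
>     pcs cluster stop "$c"
>     pcs cluster start "$c"
> done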
>
>
>
> # pcs status
>
> Cluster name: OVN-CLUSTER
>
> Stack: corosync
>
> Current DC: ctrl02 (version 1.1.18-2b07d5c5a9) - partition with quorum
>
> Last updated: Wed May  1 09:47:50 2019
>
> Last change: Wed May  1 09:44:53 2019 by root via crm_attribute on ctrl02
>
>
>
> 3 nodes configured
>
> 4 resources configured
>
>
>
> Online: [ ctrl01 ctrl02 ctrl03 ]
>
>
>
> Full list of resources:
>
>
>
> ovn-virtual-ip (ocf::heartbeat:IPaddr2):       Started ctrl02
>
> Master/Slave Set: ovndb_servers-master [ovndb_servers]
>
>      Masters: [ ctrl02 ]
>
>      Slaves: [ ctrl01 ctrl03 ]
>
>
>
> Daemon Status:
>
>   corosync: active/enabled
>
>   pacemaker: active/enabled
>
>   pcsd: active/enabled
>
>
>
> /////
>
>
>
> lab-kvmctrl-01:~# ovn-sbctl show
>
> Chassis "3fd25b76-3170-4eab-8604-690182500478"
>
>     hostname: "lab-vxlan-cn03"
>
>     Encap geneve
>
>         ip: "192.168.100.78"
>
>         options: {csum="true"}
>
> Chassis "0bd6c91a-10d3-4a1f-86bf-bdeb4bf110a3"
>
>     hostname: "lab-vxlan-cn01"
>
>     Encap geneve
>
>         ip: "192.168.100.76"
>
>         options: {csum="true"}
>
> Chassis "bda632c5-afeb-41bd-80e5-5c423172a771"
>
>     hostname: "lab-vxlan-cn02"
>
>     Encap geneve
>
>         ip: "192.168.100.77"
>
>         options: {csum="true"}
>
> Chassis "5bd199bf-9e3e-41b1-bdb0-b1fab1adff7c"
>
>     hostname: "lab-vxlan-cn04"
>
>     Encap geneve
>
>         ip: "192.168.100.79"
>
>         options: {csum="true"}
>
>
>
> /////
>
>
>
> Now I bring up a VM on CN01, and one of its interfaces connects to
> 'br-int'.
>
> OVN_NB has the logical switch configured, along with the port assignment.
>
> OVN_SB never receives the port state 'online' from the CN.
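>
> (One way to see this, sketched with the generic DB commands: the chassis
> column of the port's Port_Binding row should get set once the CN claims
> it; here I would expect it to stay empty:)
>
> ovn-sbctl --columns=logical_port,chassis find Port_Binding \
>     logical_port=9de72cf5-cbb2-4ebe-9c89-3962a29ed869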
>
>
>
> # ovs-vsctl show
>
> aca349bd-6a23-47f6-98da-a35773753858
>
>     Bridge br-int
>
>         fail_mode: secure
>
>         Port br-int
>
>             Interface br-int
>
>                 type: internal
>
>         Port "ovn-bda632-0"
>
>             Interface "ovn-bda632-0"
>
>                 type: geneve
>
>                 options: {csum="true", key=flow,
> remote_ip="192.168.100.77"}
>
>         Port "ovn-3fd25b-0"
>
>             Interface "ovn-3fd25b-0"
>
>                 type: geneve
>
>                 options: {csum="true", key=flow,
> remote_ip="192.168.100.78"}
>
>         Port "ovn-5bd199-0"
>
>             Interface "ovn-5bd199-0"
>
>                 type: geneve
>
>                 options: {csum="true", key=flow,
> remote_ip="192.168.100.79"}
>
>         Port "525401c1d4e1"  <<<<<<<<<  VM PORT
>
>             Interface "525401c1d4e1" <<<<<<<<<  VM PORT
>
>
>
> /////
>
> VM DOMXML – INTERFACE
>
> /////
>
>     <interface type='bridge'>
>
>       <mac address='52:54:01:c1:d4:e1'/>
>
>       <source bridge='br-int'/>
>
>       <virtualport type='openvswitch'>
>
>         <parameters interfaceid='9de72cf5-cbb2-4ebe-9c89-3962a29ed869'/>
>
>       </virtualport>
>
>       <target dev='525401c1d4e1'/>
>
>       <model type='virtio'/>
>
>       <alias name='net1'/>
>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x04'
> function='0x0'/>
>
>     </interface>
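>
> (libvirt should copy that interfaceid onto the OVS port as
> external_ids:iface-id, which is what ovn-controller matches against the
> logical port name; a quick check:)
>
> ovs-vsctl get Interface 525401c1d4e1 external_ids:iface-id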
>
>
>
>
>
> # ovn-nbctl show
>
> switch 9f87e014-4b3d-40e4-9a42-9f0b28957c05 (ls_1234)
>
>     port 9de72cf5-cbb2-4ebe-9c89-3962a29ed869
>
>         addresses: ["52:54:01:c1:d4:e1"]
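>
> (The NB 'up' flag for the port can also be read directly; given the above
> I would expect it to print "down" here:)
>
> ovn-nbctl lsp-get-up 9de72cf5-cbb2-4ebe-9c89-3962a29ed869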
>
>
>
> # ovn-sbctl show
>
> Chassis "3fd25b76-3170-4eab-8604-690182500478"
>
>     hostname: "lab-vxlan-cn03"
>
>     Encap geneve
>
>         ip: "192.168.100.78"
>
>         options: {csum="true"}
>
> Chassis "5bd199bf-9e3e-41b1-bdb0-b1fab1adff7c"
>
>     hostname: "lab-vxlan-cn04"
>
>     Encap geneve
>
>         ip: "192.168.100.79"
>
>         options: {csum="true"}
>
> Chassis "bda632c5-afeb-41bd-80e5-5c423172a771"
>
>     hostname: "lab-vxlan-cn02"
>
>     Encap geneve
>
>         ip: "192.168.100.77"
>
>         options: {csum="true"}
>
> Chassis "0bd6c91a-10d3-4a1f-86bf-bdeb4bf110a3"
>
>     hostname: "lab-vxlan-cn01"
>
>     Encap geneve
>
>         ip: "192.168.100.76"
>
>         options: {csum="true"}
>
>
>
> lab-vxlan-cn01# ovs-vsctl list open_vswitch | grep external_ids
>
> external_ids        : {hostname="lab-vxlan-cn01",
> ovn-encap-ip="192.168.100.76", ovn-encap-type=geneve, ovn-nb="tcp:
> 192.168.100.10:6641", ovn-remote="tcp:192.168.100.10:6642",
> rundir="/var/run/openvswitch",
> system-id="0bd6c91a-10d3-4a1f-86bf-bdeb4bf110a3"}
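>
> (These were set on each CN with roughly the following; values sketched
> from the output above, with the encap IP differing per node:)
>
> ovs-vsctl set open_vswitch . \
>     external_ids:ovn-remote="tcp:192.168.100.10:6642" \
>     external_ids:ovn-encap-type=geneve \
>     external_ids:ovn-encap-ip=192.168.100.76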
>
>
>
>
>

If you are familiar with Puppet, you can refer to this [1], which creates
the ocf:ovn:ovndb-servers resource. But the steps you provided above to set
up the pacemaker cluster seem fine to me.

[1] -
https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/pacemaker/ovn_northd.pp


You can do a few things to check whether your setup is fine (a combined
sketch of all three checks follows this list):

1. Check that the ovn-controllers on nodes cn[01-04] are able to communicate
with the OVN Southbound DB. On the master node you can delete a chassis,
e.g. "ovn-sbctl chassis-del 3fd25b76-3170-4eab-8604-690182500478", and then
run "ovn-sbctl show". If the chassis record for cn03 reappears, then it's
fine.

2. Run "ovn-nbctl --db=tcp:192.168.100.10:6641 show" to make sure you are
able to talk to the OVN DB servers.

3. Check the ovn-controller.log on CN01 and see whether it has claimed the
port. In the logs you should see "Claiming port ....".
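
Putting those together, roughly (the log path is the Ubuntu default and may
differ on your systems):

# 1. On the current master - delete a chassis and watch it re-register:
ovn-sbctl chassis-del 3fd25b76-3170-4eab-8604-690182500478
ovn-sbctl show

# 2. From any node - talk to the NB DB through the virtual IP:
ovn-nbctl --db=tcp:192.168.100.10:6641 show

# 3. On cn01 - look for the "Claiming port" message:
grep -i claiming /var/log/openvswitch/ovn-controller.log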


Thanks
Numan

> /////
>
> Netstat output from “master” controller
>
> /////
>
>
>
> # netstat -antp | grep -v sshd | grep -v WAIT
>
> Active Internet connections (servers and established)
>
> Proto Recv-Q Send-Q Local Address           Foreign Address
> State       PID/Program name
>
> tcp        0      0 0.0.0.0:2224            0.0.0.0:*
> LISTEN      1885/ruby
>
> tcp        0      0 192.168.100.10:6641     0.0.0.0:*
> LISTEN      3493/ovsdb-server
>
> tcp        0      0 192.168.100.10:6642     0.0.0.0:*
> LISTEN      3503/ovsdb-server
>
> tcp        0      0 192.168.100.12:48362    192.168.100.12:2224
> ESTABLISHED 1885/ruby
>
> tcp        0      0 192.168.100.12:2224     192.168.100.11:58166
> ESTABLISHED 1885/ruby
>
> tcp        0      0 192.168.100.10:6642     192.168.100.13:34044
> ESTABLISHED 3503/ovsdb-server
>
> tcp        0      0 192.168.100.12:55480    192.168.100.13:2224
> ESTABLISHED 1885/ruby
>
> tcp        0      0 192.168.100.12:2224     192.168.100.12:48362
> ESTABLISHED 1885/ruby
>
> tcp        0      0 192.168.100.12:2224     192.168.100.13:43488
> ESTABLISHED 1885/ruby
>
> tcp        0      0 192.168.100.10:6641     192.168.100.13:34148
> ESTABLISHED 3493/ovsdb-server
>
> tcp        0      0 192.168.100.10:6642     192.168.100.79:47974
> ESTABLISHED 3503/ovsdb-server
>
> tcp        0      0 192.168.100.10:6641     192.168.100.11:60226
> ESTABLISHED 3493/ovsdb-server
>
> tcp        0      0 192.168.100.10:6642     192.168.100.76:55570
> ESTABLISHED 3503/ovsdb-server
>
> tcp        0      0 192.168.100.10:6642     192.168.100.78:36428
> ESTABLISHED 3503/ovsdb-server
>
> tcp        0      0 192.168.100.10:6642     192.168.100.11:40682
> ESTABLISHED 3503/ovsdb-server
>
> tcp        0      0 192.168.100.10:6642     192.168.100.77:58772
> ESTABLISHED 3503/ovsdb-server
>
> tcp        0      0 192.168.100.12:41974    192.168.100.11:2224
> ESTABLISHED 1885/ruby
>
> tcp6       0      0 :::2224                 :::*
> LISTEN      1885/ruby
>
>
>
>
>
>
>
> Regards,
>
>
>
> *Stephen Flynn*
>
>
> _______________________________________________
> discuss mailing list
> discuss at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>