[ovs-discuss] OVN Controller - High Availability via Pacemaker
Numan Siddique
nusiddiq at redhat.com
Wed May 1 16:02:43 UTC 2019
On Wed, May 1, 2019 at 9:07 PM Numan Siddique <nusiddiq at redhat.com> wrote:
>
> Hi Stephen,
>
> Your setup seems fine to me. Please see below for some comments.
>
>
> On Wed, May 1, 2019 at 8:23 PM Stephen Flynn via discuss <
> ovs-discuss at openvswitch.org> wrote:
>
>> Greetings OVS Discuss Group:
>>
>>
>>
>> First – I want to apologize for the wall of text to follow – I wasn’t
>> sure how much information would be wanted and I didn’t want everyone to
>> have to ask 100 questions to get what they needed.
>>
>>
>>
>> I am working on setting up an OVN Controller (3 nodes) using Pacemaker /
>> Corosync. For reference -- a single-node controller (no pacemaker /
>> corosync) operates without issue.
>>
>> Root Goal: Failover Redundancy for the OVN Controller – Allows for
>> maintenance of controller nodes and/or failure of a controller node.
>>
>
> Just a small correction - Pacemaker is used to provide active/passive HA
> for the OVN DB servers. Referring to this as the "OVN Controller" is a bit
> confusing, since there is a service
> called ovn-controller (provided by the ovn-host package) which runs on each
> host/hypervisor.
>
>
>
>>
>> I have been following the very limited documentation on how to setup this
>> environment but don’t seem to be having much luck getting it to be 100%
>> stable or operational.
>>
>>
>> http://docs.openvswitch.org/en/latest/topics/integration/?highlight=pacemaker
>>
>>
>>
>> If I could ask someone to assist in reviewing the below installation and
>> provide some insight into what I may be doing wrong, or have wrong (need
>> newer version of code, etc) I would be grateful. I’ve currently only
>> attempted this by using “packaged” versions of code to “keep it simple” …
>> but I do realize that getting this to work may require a newer code release.
>> Along with this, once I am able to get a stable environment, I would like
>> to contribute updated documentation to the community on how to perform a
>> full setup.
>>
>>
>>
>>
>>
>> *-- Environment –*
>>
>>
>>
>> # cat /etc/lsb-release | grep DESCRIPTION
>>
>> DISTRIB_DESCRIPTION="Ubuntu 18.04.2 LTS"
>>
>>
>>
>> # ovn-northd --version
>>
>> ovn-northd (Open vSwitch) 2.9.2
>>
>>
>>
>> # ovn-nbctl --version
>>
>> ovn-nbctl (Open vSwitch) 2.9.2
>>
>> DB Schema 5.10.0
>>
>>
>>
>> # ovn-sbctl --version
>>
>> ovn-sbctl (Open vSwitch) 2.9.2
>>
>> DB Schema 1.15.0
>>
>>
>>
>> *-- Setup Steps –*
>>
>> # cat /etc/hosts
>>
>>
>>
>> # LAB - Compute Nodes
>>
>> 192.168.100.10 ctrl00
>>
>> 192.168.100.11 ctrl01
>>
>> 192.168.100.12 ctrl02
>>
>> 192.168.100.13 ctrl03
>>
>> 192.168.100.76 cn01
>>
>> 192.168.100.77 cn02
>>
>> 192.168.100.78 cn03
>>
>> 192.168.100.79 cn04
>>
>>
>>
>>
>>
>> ### All Controllers
>>
>>
>>
>> ## System Package Updates
>>
>> apt clean all; apt update; apt -y dist-upgrade
>>
>>
>>
>> ## Time Sync Services (NTP) [ctrl01, ctrl02, ctrl03]
>>
>> apt install -y ntp
>>
>>
>>
>> ## Install OVN Central Controller [ctrl01, ctrl02, ctrl03]
>>
>> apt install -y openvswitch-common openvswitch-switch python-openvswitch
>> python3-openvswitch ovn-common ovn-central
>>
>>
>>
>> ## Install pacemaker and corosync [ctrl01, ctrl02, ctrl03]
>>
>> apt install -y pcs pacemaker pacemaker-cli-utils
>>
>>
>>
>> ## Reset the Pacemaker Password [ctrl01, ctrl02, ctrl03]
>>
>> Password = 6B43WAmuPzM2Ewsr
>>
>>
>>
>> enc_passwd=$(python3 -c 'import crypt;
>> print(crypt.crypt("6B43WAmuPzM2Ewsr", crypt.mksalt(crypt.METHOD_SHA512)))')
>>
>> usermod -p "${enc_passwd}" hacluster
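>> The hash generated above can be sanity-checked in Python before it is
>> handed to usermod: re-hashing the password with the stored hash as the
>> salt must reproduce the hash, which is exactly how the system verifies
>> it at login. (Note the crypt module is Unix-only and was removed in
>> Python 3.13; this matches the python3 on Ubuntu 18.04.)

```python
import crypt

password = "6B43WAmuPzM2Ewsr"  # the lab password used in the steps above
enc_passwd = crypt.crypt(password, crypt.mksalt(crypt.METHOD_SHA512))

# SHA-512 crypt hashes carry the "$6$" prefix
assert enc_passwd.startswith("$6$")

# Login verification re-hashes using the stored hash as the salt
assert crypt.crypt(password, enc_passwd) == enc_passwd
```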
>>
>>
>>
>> ## Enable and Start the PCS Daemon [ctrl01, ctrl02, ctrl03]
>>
>> sudo systemctl enable pcsd;
>>
>> sudo systemctl start pcsd;
>>
>> sudo systemctl status pcsd;
>>
>>
>>
>>
>> ### Cluster Controller #1 (ONLY)
>>
>>
>>
>> ## Enable Cluster Services [ctrl01 --ONLY-- ]
>>
>>
>>
>> pcs cluster auth ctrl01 ctrl02 ctrl03 -u hacluster -p '6B43WAmuPzM2Ewsr'
>> --force
>>
>> pcs cluster setup --name OVN-CLUSTER ctrl01 ctrl02 ctrl03 --force
>>
>>
>>
>> pcs cluster enable --all;
>>
>> pcs cluster start --all;
>>
>>
>>
>> pcs property set stonith-enabled=false
>>
>> pcs property set no-quorum-policy=ignore
>>
>>
>>
>> pcs status
>>
>>
>>
>> [[ output ]]
>>
>> Cluster name: OVN-CLUSTER
>>
>> Stack: corosync
>>
>> Current DC: ctrl01 (version 1.1.18-2b07d5c5a9) - partition with quorum
>>
>> Last updated: Wed May 1 09:42:33 2019
>>
>> Last change: Wed May 1 09:42:29 2019 by root via cibadmin on ctrl01
>>
>>
>>
>> 3 nodes configured
>>
>> 0 resources configured
>>
>>
>>
>> Online: [ ctrl01 ctrl02 ctrl03 ]
>>
>>
>>
>> No resources
>>
>>
>>
>>
>>
>> Daemon Status:
>>
>> corosync: active/enabled
>>
>> pacemaker: active/enabled
>>
>> pcsd: active/enabled
>>
>>
>>
>>
>>
>>
>>
>> ## Add cluster resources [ctrl01 --ONLY-- ]
>>
>> pcs resource create ovn-virtual-ip ocf:heartbeat:IPaddr2 nic=ens192
>> ip=192.168.100.10 cidr_netmask=24 op monitor interval=30s
>>
>>
>>
>> pcs resource create ovndb_servers ocf:ovn:ovndb-servers \
>>
>> master_ip=192.168.100.10 \
>>
>> ovn_ctl=/usr/share/openvswitch/scripts/ovn-ctl \
>>
>> op monitor interval="10s" \
>>
>> op monitor role=Master interval="15s"
>>
>>
>>
>> pcs resource master ovndb_servers-master ovndb_servers \
>>
>> meta notify="true"
>>
>>
>>
>> pcs constraint order promote ovndb_servers-master then ovn-virtual-ip
>>
>> pcs constraint colocation add ovn-virtual-ip with master
>> ovndb_servers-master score=INFINITY
>>
>>
>>
>> pcs status
>>
>>
>>
>> [[ output ]]
>>
>> Cluster name: OVN-CLUSTER
>>
>> Stack: corosync
>>
>> Current DC: ctrl01 (version 1.1.18-2b07d5c5a9) - partition with quorum
>>
>> Last updated: Wed May 1 09:46:02 2019
>>
>> Last change: Wed May 1 09:44:53 2019 by root via crm_attribute on ctrl02
>>
>>
>>
>> 3 nodes configured
>>
>> 4 resources configured
>>
>>
>>
>> Online: [ ctrl01 ctrl02 ctrl03 ]
>>
>>
>>
>> Full list of resources:
>>
>>
>>
>> ovn-virtual-ip (ocf::heartbeat:IPaddr2): Started ctrl02
>>
>> Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>
>> Masters: [ ctrl02 ]
>>
>> Slaves: [ ctrl01 ctrl03 ]
>>
>>
>>
>> Failed Actions:
>>
>> * ovndb_servers_monitor_10000 on ctrl01 'master' (8): call=18,
>> status=complete, exitreason='',
>>
>> last-rc-change='Wed May 1 09:43:28 2019', queued=0ms, exec=73ms
>>
>>
>>
>>
>>
>> Daemon Status:
>>
>> corosync: active/enabled
>>
>> pacemaker: active/enabled
>>
>> pcsd: active/enabled
>>
>>
>>
>>
>>
>> /////
>>
>>
>>
>> At this point, I execute ‘pcs cluster stop {{ controller }}; pcs cluster
>> start {{ controller }}’ for each controller one at a time and eventually
>> everything clears up.
>>
>>
>>
>> # pcs status
>>
>> Cluster name: OVN-CLUSTER
>>
>> Stack: corosync
>>
>> Current DC: ctrl02 (version 1.1.18-2b07d5c5a9) - partition with quorum
>>
>> Last updated: Wed May 1 09:47:50 2019
>>
>> Last change: Wed May 1 09:44:53 2019 by root via crm_attribute on ctrl02
>>
>>
>>
>> 3 nodes configured
>>
>> 4 resources configured
>>
>>
>>
>> Online: [ ctrl01 ctrl02 ctrl03 ]
>>
>>
>>
>> Full list of resources:
>>
>>
>>
>> ovn-virtual-ip (ocf::heartbeat:IPaddr2): Started ctrl02
>>
>> Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>
>> Masters: [ ctrl02 ]
>>
>> Slaves: [ ctrl01 ctrl03 ]
>>
>>
>>
>> Daemon Status:
>>
>> corosync: active/enabled
>>
>> pacemaker: active/enabled
>>
>> pcsd: active/enabled
>>
>>
>>
>> /////
>>
>>
>>
>> lab-kvmctrl-01:~# ovn-sbctl show
>>
>> Chassis "3fd25b76-3170-4eab-8604-690182500478"
>>
>> hostname: "lab-vxlan-cn03"
>>
>> Encap geneve
>>
>> ip: "192.168.100.78"
>>
>> options: {csum="true"}
>>
>> Chassis "0bd6c91a-10d3-4a1f-86bf-bdeb4bf110a3"
>>
>> hostname: "lab-vxlan-cn01"
>>
>> Encap geneve
>>
>> ip: "192.168.100.76"
>>
>> options: {csum="true"}
>>
>> Chassis "bda632c5-afeb-41bd-80e5-5c423172a771"
>>
>> hostname: "lab-vxlan-cn02"
>>
>> Encap geneve
>>
>> ip: "192.168.100.77"
>>
>> options: {csum="true"}
>>
>> Chassis "5bd199bf-9e3e-41b1-bdb0-b1fab1adff7c"
>>
>> hostname: "lab-vxlan-cn04"
>>
>> Encap geneve
>>
>> ip: "192.168.100.79"
>>
>> options: {csum="true"}
>>
>>
>>
>> /////
>>
>>
>>
>> Now I turn up a VM on CN01 and it connects to ‘br-int’ for one of the
>> interfaces.
>>
>> OVN_NB has the logical switch configured, and the port assignment.
>>
>> OVN_SB never receives the port state ‘online’ from the CN.
>>
>>
>>
>> # ovs-vsctl show
>>
>> aca349bd-6a23-47f6-98da-a35773753858
>>
>> Bridge br-int
>>
>> fail_mode: secure
>>
>> Port br-int
>>
>> Interface br-int
>>
>> type: internal
>>
>> Port "ovn-bda632-0"
>>
>> Interface "ovn-bda632-0"
>>
>> type: geneve
>>
>> options: {csum="true", key=flow,
>> remote_ip="192.168.100.77"}
>>
>> Port "ovn-3fd25b-0"
>>
>> Interface "ovn-3fd25b-0"
>>
>> type: geneve
>>
>> options: {csum="true", key=flow,
>> remote_ip="192.168.100.78"}
>>
>> Port "ovn-5bd199-0"
>>
>> Interface "ovn-5bd199-0"
>>
>> type: geneve
>>
>> options: {csum="true", key=flow,
>> remote_ip="192.168.100.79"}
>>
>> Port "525401c1d4e1" <<<<<<<<< VM PORT
>>
>> Interface "525401c1d4e1" <<<<<<<<< VM PORT
>>
>>
>>
>> /////
>>
>> VM DOMXML – INTERFACE
>>
>> /////
>>
>> <interface type='bridge'>
>>
>> <mac address='52:54:01:c1:d4:e1'/>
>>
>> <source bridge='br-int'/>
>>
>> <virtualport type='openvswitch'>
>>
>> <parameters interfaceid='9de72cf5-cbb2-4ebe-9c89-3962a29ed869'/>
>>
>> </virtualport>
>>
>> <target dev='525401c1d4e1'/>
>>
>> <model type='virtio'/>
>>
>> <alias name='net1'/>
>>
>> <address type='pci' domain='0x0000' bus='0x00' slot='0x04'
>> function='0x0'/>
>>
>> </interface>
>>
>>
>>
>>
>>
>> # ovn-nbctl show
>>
>> switch 9f87e014-4b3d-40e4-9a42-9f0b28957c05 (ls_1234)
>>
>> port 9de72cf5-cbb2-4ebe-9c89-3962a29ed869
>>
>> addresses: ["52:54:01:c1:d4:e1"]
>>
>>
>>
>> # ovn-sbctl show
>>
>> Chassis "3fd25b76-3170-4eab-8604-690182500478"
>>
>> hostname: "lab-vxlan-cn03"
>>
>> Encap geneve
>>
>> ip: "192.168.100.78"
>>
>> options: {csum="true"}
>>
>> Chassis "5bd199bf-9e3e-41b1-bdb0-b1fab1adff7c"
>>
>> hostname: "lab-vxlan-cn04"
>>
>> Encap geneve
>>
>> ip: "192.168.100.79"
>>
>> options: {csum="true"}
>>
>> Chassis "bda632c5-afeb-41bd-80e5-5c423172a771"
>>
>> hostname: "lab-vxlan-cn02"
>>
>> Encap geneve
>>
>> ip: "192.168.100.77"
>>
>> options: {csum="true"}
>>
>> Chassis "0bd6c91a-10d3-4a1f-86bf-bdeb4bf110a3"
>>
>> hostname: "lab-vxlan-cn01"
>>
>> Encap geneve
>>
>> ip: "192.168.100.76"
>>
>> options: {csum="true"}
>>
>>
>>
>> lab-vxlan-cn01# ovs-vsctl list open_vswitch | grep external_ids
>>
>> external_ids : {hostname="lab-vxlan-cn01",
>> ovn-encap-ip="192.168.100.76", ovn-encap-type=geneve, ovn-nb="tcp:
>> 192.168.100.10:6641", ovn-remote="tcp:192.168.100.10:6642",
>> rundir="/var/run/openvswitch",
>> system-id="0bd6c91a-10d3-4a1f-86bf-bdeb4bf110a3"}
>>
>>
>>
>>
>>
>
> If you are familiar with puppet, you can refer to this [1] which creates
> the ocf:ovn:ovndb-servers resource. But the steps you provided above to
> set up the pacemaker cluster
> seem fine to me.
>
> [1] -
> https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/pacemaker/ovn_northd.pp
>
>
> You can do a few things to check whether your setup is fine:
>
> 1. Check that the ovn-controllers on nodes cn[01-04] are able to communicate
> with the OVN Southbound DB.
> On the master node you can delete a chassis, e.g. "ovn-sbctl chassis-del
> 3fd25b76-3170-4eab-8604-690182500478",
> and then run "ovn-sbctl show". If the chassis record for cn03 reappears,
> then it's fine.
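> Spelled out as a runnable sequence (the chassis UUID and hostname are
> taken from the "ovn-sbctl show" output above; the 10-second wait is an
> arbitrary grace period, not a documented interval):

```shell
# Run on the current master node. Delete one chassis record, then watch
# for ovn-controller on that compute node to re-register itself.
ovn-sbctl chassis-del 3fd25b76-3170-4eab-8604-690182500478
sleep 10   # give ovn-controller time to reconnect and re-register
ovn-sbctl show | grep -q 'lab-vxlan-cn03' && echo "chassis re-registered"
```

If the chassis does not come back, the compute node cannot reach the
Southbound DB at the virtual IP.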
>
> 2. Run "ovn-nbctl --db=tcp:192.168.100.10:6641 show" to make sure you are
> able to talk to the OVN DB servers.
>
> 3. Check the ovn-controller.log on CN01 to see whether it has claimed the
> port. In the logs you should see "Claiming port ...."
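> For example (the log path below is the usual one for the Ubuntu packaging
> of OVS 2.9 and may differ on your install):

```shell
# ovn-controller logs port claims at INFO level; search for them on cn01.
grep -i 'claiming' /var/log/openvswitch/ovn-controller.log
```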
>
>
It looks to me like ovn-northd is not running on the master controller node -
ctrl02.
Run "ovn-sbctl list port_binding". If this is empty, then ovn-northd is not
running.
In order for the OVN OCF pacemaker script to start ovn-northd on the master
node, you need to pass
"manage_northd=yes" when creating the ovndb_servers resource.
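As a sketch, the resource from the original steps could be recreated with
northd management enabled. The parameter name here is taken from the
ovndb-servers OCF agent; confirm it on your install with
"pcs resource describe ocf:ovn:ovndb-servers" before running this:

```shell
# Remove the old master/slave resource, then recreate it with the OCF
# agent also managing ovn-northd on whichever node is promoted to master.
pcs resource delete ovndb_servers-master

pcs resource create ovndb_servers ocf:ovn:ovndb-servers \
    master_ip=192.168.100.10 \
    manage_northd=yes \
    ovn_ctl=/usr/share/openvswitch/scripts/ovn-ctl \
    op monitor interval="10s" \
    op monitor role=Master interval="15s"

pcs resource master ovndb_servers-master ovndb_servers meta notify="true"
```

The ordering and colocation constraints from the original steps then need
to be re-added against the new ovndb_servers-master resource.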
Hope this helps.
Thanks
>
> Thanks
> Numan
>
> /////
>>
>> Netstat output from “master” controller
>>
>> /////
>>
>>
>>
>> # netstat -antp | grep -v sshd | grep -v WAIT
>>
>> Active Internet connections (servers and established)
>>
>> Proto Recv-Q Send-Q Local Address Foreign Address
>> State PID/Program name
>>
>> tcp 0 0 0.0.0.0:2224 0.0.0.0:*
>> LISTEN 1885/ruby
>>
>> tcp 0 0 192.168.100.10:6641 0.0.0.0:*
>> LISTEN 3493/ovsdb-server
>>
>> tcp 0 0 192.168.100.10:6642 0.0.0.0:*
>> LISTEN 3503/ovsdb-server
>>
>> tcp 0 0 192.168.100.12:48362 192.168.100.12:2224
>> ESTABLISHED 1885/ruby
>>
>> tcp 0 0 192.168.100.12:2224 192.168.100.11:58166
>> ESTABLISHED 1885/ruby
>>
>> tcp 0 0 192.168.100.10:6642 192.168.100.13:34044
>> ESTABLISHED 3503/ovsdb-server
>>
>> tcp 0 0 192.168.100.12:55480 192.168.100.13:2224
>> ESTABLISHED 1885/ruby
>>
>> tcp 0 0 192.168.100.12:2224 192.168.100.12:48362
>> ESTABLISHED 1885/ruby
>>
>> tcp 0 0 192.168.100.12:2224 192.168.100.13:43488
>> ESTABLISHED 1885/ruby
>>
>> tcp 0 0 192.168.100.10:6641 192.168.100.13:34148
>> ESTABLISHED 3493/ovsdb-server
>>
>> tcp 0 0 192.168.100.10:6642 192.168.100.79:47974
>> ESTABLISHED 3503/ovsdb-server
>>
>> tcp 0 0 192.168.100.10:6641 192.168.100.11:60226
>> ESTABLISHED 3493/ovsdb-server
>>
>> tcp 0 0 192.168.100.10:6642 192.168.100.76:55570
>> ESTABLISHED 3503/ovsdb-server
>>
>> tcp 0 0 192.168.100.10:6642 192.168.100.78:36428
>> ESTABLISHED 3503/ovsdb-server
>>
>> tcp 0 0 192.168.100.10:6642 192.168.100.11:40682
>> ESTABLISHED 3503/ovsdb-server
>>
>> tcp 0 0 192.168.100.10:6642 192.168.100.77:58772
>> ESTABLISHED 3503/ovsdb-server
>>
>> tcp 0 0 192.168.100.12:41974 192.168.100.11:2224
>> ESTABLISHED 1885/ruby
>>
>> tcp6 0 0 :::2224 :::*
>> LISTEN 1885/ruby
>>
>>
>>
>>
>>
>>
>>
>> Regards,
>>
>>
>>
>> *Stephen Flynn*
>>
>>
>> _______________________________________________
>> discuss mailing list
>> discuss at openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>
>