[ovs-dev] [PATCH v3 1/4] ovn: ovn-ctl support for HA ovn DB servers

Andy Zhou azhou at ovn.org
Thu Oct 13 01:56:36 UTC 2016


On Sun, Oct 9, 2016 at 12:02 AM, Babu Shanmugam <bschanmu at redhat.com> wrote:

>
>
> On Friday 07 October 2016 05:33 AM, Andy Zhou wrote:
>
>> Babu, thank you for working on this.  At a high level, it is not clear
>> to me where the boundary lies between the OCF scripts and the ovn-ctl
>> script -- i.e. which aspect is managed by which entity.  For example,
>> 1) which scripts are responsible for starting the ovsdb servers?
>>
> The ovsdb servers are started by pacemaker. Pacemaker uses the OCF
> script, and the OCF script uses ovn-ctl.
>
> 2) Which script should manage the fail-over -- I tried to shut down a
>> cluster node using the "pcs" command, and fail-over did not happen.
>>
> The OCF script for the OVN DB servers understands the promote and
> demote calls. Pacemaker uses this script to run the ovsdb server on all
> the nodes and promotes one node as the master (active server). If the
> node on which the master instance is running fails, pacemaker
> automatically promotes another node as the master. The OCF script is an
> agent for pacemaker for the OVN DB resource.
> The above behavior depends on the way you configure the resource that
> uses this OCF script. I am attaching a simple set of commands to
> configure the ovsdb server. You can create the resources after creating
> the cluster with the following command:
>
> crm configure < ovndb.pcmk
>
> Please note that you have to replace the macros VM1_NAME, VM2_NAME,
> VM3_NAME, and MASTER_IP with the respective values before using
> ovndb.pcmk. This script works with a 3-node cluster. I am assuming the
> node ids are 101, 102, and 103; please replace them as well to match
> your cluster.
>
>
> --
> Babu
>

Unfortunately, CRM is not distributed with pacemaker on CentOS anymore.  It
took me some time to get it installed.  I think others may run into similar
issues, so it may be worthwhile to document this, or to change the script to
use "pcs", which is part of the distribution.
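For reference, a rough "pcs" equivalent of such a configuration might look
like the following. This is an untested sketch: the resource and constraint
names are mine, not taken from the attached ovndb.pcmk, and the master/slave
syntax is the one used by the pcs version shipped with CentOS 7 at the time.

```shell
# Untested sketch: create the floating master IP (MASTER_IP is a
# placeholder) and the OVN DB resource as a master/slave set.
pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
    ip=MASTER_IP cidr_netmask=32 op monitor interval=30s

pcs resource create ovndb ocf:ovn:ovndb-servers op monitor interval=60s
pcs resource master ovndb-master ovndb \
    master-max=1 master-node-max=1 notify=true

# Keep the master IP on whichever node holds the master instance,
# and only start it once a master has been promoted.
pcs constraint colocation add ClusterIP with master ovndb-master INFINITY
pcs constraint order promote ovndb-master then start ClusterIP
```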


I adapted the script to my setup.  I have two nodes, "h1" (10.33.74.77)
and "h2" (10.33.75.158). For MASTER_IP, I used 10.33.75.220.

This is the output of crm configure show:

------

[root@h2 azhou]# crm configure show
node 1: h1 \
        attributes
node 2: h2
primitive ClusterIP IPaddr2 \
        params ip=10.33.75.200 cidr_netmask=32 \
        op start interval=0s timeout=20s \
        op stop interval=0s timeout=20s \
        op monitor interval=30s
primitive WebSite apache \
        params configfile="/etc/httpd/conf/httpd.conf" statusurl="http://127.0.0.1/server-status" \
        op start interval=0s timeout=40s \
        op stop interval=0s timeout=60s \
        op monitor interval=1min \
        meta
primitive ovndb ocf:ovn:ovndb-servers \
        op start interval=0s timeout=30s \
        op stop interval=0s timeout=20s \
        op promote interval=0s timeout=50s \
        op demote interval=0s timeout=50s \
        op monitor interval=1min \
        meta
colocation colocation-WebSite-ClusterIP-INFINITY inf: WebSite ClusterIP
order order-ClusterIP-WebSite-mandatory ClusterIP:start WebSite:start
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.13-10.el7_2.4-44eb2dd \
        cluster-infrastructure=corosync \
        cluster-name=mycluster \
        stonith-enabled=false


--------

I have also added firewall rules to allow access to TCP ports 6641 and
6642.
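Concretely, with firewalld (the default on CentOS 7), the rules are along
these lines:

```shell
# Open the OVN NB (6641) and SB (6642) DB ports in firewalld and
# reload so the permanent rules take effect.
firewall-cmd --permanent --add-port=6641/tcp
firewall-cmd --permanent --add-port=6642/tcp
firewall-cmd --reload
```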


At this stage, crm_mon shows:

Last updated: Wed Oct 12 14:49:07 2016          Last change: Wed Oct 12
13:58:55 2016 by root via crm_attribute on h2
Stack: corosync
Current DC: h2 (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
2 nodes and 3 resources configured

Online: [ h1 h2 ]

ClusterIP       (ocf::heartbeat:IPaddr2):       Started h2
WebSite         (ocf::heartbeat:apache):        Started h2
ovndb           (ocf::ovn:ovndb-servers):       Started h1

Failed Actions:
* ovndb_start_0 on h2 'unknown error' (1): call=39, status=Timed Out,
  exitreason='none',
    last-rc-change='Wed Oct 12 14:43:03 2016', queued=0ms, exec=30003ms


---

I am not sure what the error message on h2 is about. Notice that the ovndb
service is now running on h1, while the cluster IP is on h2.

Also, both servers are running as backup servers:

[root@h1 azhou]# ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status

state: backup

connecting: tcp:192.0.2.254:6642   // I specified the IP in
/etc/openvswitch/ovnsb-active.conf, but the file was overwritten with
192.0.2.254


[root@h2 ovs]# ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status

state: backup

replicating: tcp:10.33.74.77:6642   // The IP address was retained on h2

database: OVN_Southbound

---
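As an aside, when debugging this kind of split state, the replication role
of each instance can be inspected and driven by hand. The following is a
sketch assuming the active-backup replication appctl commands documented in
ovsdb-server(1); paths and addresses are the ones from my transcripts above.

```shell
# Query the replication role of the southbound DB server.
ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status

# Re-point a backup at the intended active server and reconnect...
ovs-appctl -t /run/openvswitch/ovnsb_db.ctl \
    ovsdb-server/set-active-ovsdb-server tcp:10.33.74.77:6642
ovs-appctl -t /run/openvswitch/ovnsb_db.ctl \
    ovsdb-server/connect-active-ovsdb-server

# ...or promote a backup to active by disconnecting it from its source.
ovs-appctl -t /run/openvswitch/ovnsb_db.ctl \
    ovsdb-server/disconnect-active-ovsdb-server
```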

Any suggestions on what I did wrong?


