[ovs-dev] [PATCH v3 1/4] ovn: ovn-ctl support for HA ovn DB servers

Babu Shanmugam bschanmu at redhat.com
Thu Oct 13 05:57:10 UTC 2016



On Thursday 13 October 2016 07:26 AM, Andy Zhou wrote:
>
>
> On Sun, Oct 9, 2016 at 12:02 AM, Babu Shanmugam <bschanmu at redhat.com> wrote:
>
>
>
>     On Friday 07 October 2016 05:33 AM, Andy Zhou wrote:
>
>         Babu, Thank you for working on this.  At a high level, it is
>         not clear to me the boundary between ocf scripts and the
>         ovn-ctl script -- i.e. which aspect is managed by which
>         entity.  For example, 1) which scripts are responsible for
>         starting the ovsdb servers.
>
>     ovsdb servers are started by the pacemaker. It uses the OCF script
>     and the OCF script uses ovn-ctl.
>
>         2) Which script should manage the fail-over -- I tried to shut
>         down a cluster node using the "pcs" command, and fail-over did
>         not happen.
>
>     The OCF script for OVN DB servers is capable of understanding the
>     promote and demote calls. So, pacemaker will use this script to
>     run ovsdb server in all the nodes and promote one node as the
>     master(active server). If the node in which the master instance is
>     running fails, pacemaker automatically promotes another node as
>     the master. OCF script is an agent for the pacemaker for the OVN
>     db resource.
>     The above behavior depends on the way you are configuring the
>     resource that uses this OCF script. I am attaching a simple set of
>     commands to configure the ovsdb server. You can create the
>     resources after creating the cluster with the following command
>
>     crm configure < ovndb.pcmk
>
>     Please note, you have to replace the macros VM1_NAME, VM2_NAME,
>     VM3_NAME and MASTER_IP with the respective values before using
>     ovndb.pcmk. This script works with a 3 node cluster. I am assuming
>     the node ids as 101, 102, and 103. Please replace them as well to
>     work with your cluster.
>
>
>     --
>     Babu
>
>
> Unfortunately, CRM is not distributed with pacemaker on CentOS
> anymore.  It took me some time to get it installed.  I think others may
> run into similar issues, so it may be worthwhile to document this, or
> change the script to use "pcs", which is part of the distribution.
>

I agree. Is INSTALL*.md good enough? In OpenStack, we are managing the 
resource through puppet manifests.
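
Regarding pcs: purely as a rough, untested sketch (node names and the
cluster name are placeholders, and MASTER_IP stands for the real master
IP as in ovndb.pcmk), the cluster and the basic ovndb resource could be
brought up with something like

pcs cluster auth node1 node2 node3
pcs cluster setup --name ovn-cluster node1 node2 node3
pcs cluster start --all
pcs resource create ovndb ocf:ovn:ovndb-servers master_ip=MASTER_IP op monitor interval=30s

and then turned into a master/slave resource with the constraints shown
further below in this mail.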

>
> I adapted the script to my setup.  I have two nodes,
> "h1" (10.33.74.77) and "h2" (10.33.75.158).  For Master_IP, I used
> 10.33.75.220.
>
> This is the output of crm configure show:
>
> ------
>
>  [root at h2 azhou]# crm configure show
>
> node 1: h1 \
>         attributes
> node 2: h2
> primitive ClusterIP IPaddr2 \
>         params ip=10.33.75.200 cidr_netmask=32 \
>         op start interval=0s timeout=20s \
>         op stop interval=0s timeout=20s \
>         op monitor interval=30s
> primitive WebSite apache \
>         params configfile="/etc/httpd/conf/httpd.conf" statusurl="http://127.0.0.1/server-status" \
>         op start interval=0s timeout=40s \
>         op stop interval=0s timeout=60s \
>         op monitor interval=1min \
>         meta
> primitive ovndb ocf:ovn:ovndb-servers \
>         op start interval=0s timeout=30s \
>         op stop interval=0s timeout=20s \
>         op promote interval=0s timeout=50s \
>         op demote interval=0s timeout=50s \
>         op monitor interval=1min \
>         meta
> colocation colocation-WebSite-ClusterIP-INFINITY inf: WebSite ClusterIP
> order order-ClusterIP-WebSite-mandatory ClusterIP:start WebSite:start
> property cib-bootstrap-options: \
>         have-watchdog=false \
>         dc-version=1.1.13-10.el7_2.4-44eb2dd \
>         cluster-infrastructure=corosync \
>         cluster-name=mycluster \
>         stonith-enabled=false
>
>

You seem to have configured ovndb only as a primitive resource, not as a 
master/slave resource, and there is no colocation constraint tying ovndb 
to ClusterIP. Only with such a colocation constraint will the ovndb 
server be co-located with the ClusterIP resource.  You will have to 
include the following lines in your crm configuration. The same can be 
configured with pcs as well (see the sketch after the snippet).

ms ovndb-master ovndb meta notify="true"
colocation colocation-ovndb-master-ClusterIP-INFINITY inf: ovndb-master:Started ClusterIP:Master
order order-ClusterIP-ovndb-master-mandatory inf: ClusterIP:start ovndb-master:start
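
For pcs users, a rough, untested equivalent of the above (resource names
taken from your configuration; the intent is simply that the ovndb
master ends up co-located with ClusterIP) would be something like

pcs resource master ovndb-master ovndb notify=true
pcs constraint colocation add master ovndb-master with ClusterIP INFINITY
pcs constraint order start ClusterIP then start ovndb-master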

> --------
>
> I have also added firewall rules to allow access to TCP port 6642 and 
> port 6641.
>
>
> At this stage, crm_mon shows:
>
> Last updated: Wed Oct 12 14:49:07 2016          Last change: Wed Oct 12 13:58:55 2016 by root via crm_attribute on h2
> Stack: corosync
> Current DC: h2 (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
> 2 nodes and 3 resources configured
>
> Online: [ h1 h2 ]
>
> ClusterIP  (ocf::heartbeat:IPaddr2):       Started h2
> WebSite    (ocf::heartbeat:apache):        Started h2
> ovndb      (ocf::ovn:ovndb-servers):       Started h1
>
> Failed Actions:
> * ovndb_start_0 on h2 'unknown error' (1): call=39, status=Timed Out, exitreason='none',
>     last-rc-change='Wed Oct 12 14:43:03 2016', queued=0ms, exec=30003ms
>
>
> ---
>
> Not sure what the error message on h2 is about.  Notice that the ovndb
> service is now running on h1, while the cluster IP is on h2.
>

It looks like the OCF script is not able to start the ovsdb servers on 
node 'h2' (we are getting a timed-out status). You can check whether the 
OCF script is working properly by using ocf-tester. You can run ocf-tester 
as follows:

ocf-tester -n test-ovndb -o master_ip 10.0.0.1 <path-to-the-ocf-script>

Alternatively, you can check whether the ovsdb servers start properly by 
running

/usr/share/openvswitch/scripts/ovn-ctl --db-sb-sync-from=10.0.0.1 \
    --db-nb-sync-from=10.0.0.1 start_ovsdb
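
Once the servers are up, ovsdb-server/sync-status tells you which mode
each one is in. Purely as an illustration (not captured from a real
run), a backup server should report something like

ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status
state: backup
replicating: tcp:10.0.0.1:6642

while the instance that gets promoted to master should report
"state: active" instead.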


> Also, both servers are running as backup servers:
>
> [root at h1 azhou]# ovs-appctl -t /run/openvswitch/ovnsb_db.ctl 
> ovsdb-server/sync-status
>
> state: backup
>
> connecting: tcp:192.0.2.254:6642   // I specified the IP in
> /etc/openvswitch/ovnsb-active.conf, but the file was overwritten
> with 192.0.2.254
>
>
> [root at h2 ovs]# ovs-appctl -t /run/openvswitch/ovnsb_db.ctl 
> ovsdb-server/sync-status
>
> state: backup
>
> replicating: tcp:10.33.74.77:6642   // The IP address was retained on h2
>
> database: OVN_Southbound
>
> ---
>
> Any suggestions on what I did wrong?
>
>

I think this is mostly due to the crm configuration. Once you add the 
'ms' resource and the 'colocation' constraint described above, you should 
be able to overcome this problem.
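
As an illustration of what to expect once the 'ms' resource is in place
(again, not output from a real run), crm_mon should then list the clone
with explicit roles, along the lines of

Master/Slave Set: ovndb-master [ovndb]
     Masters: [ h2 ]
     Slaves: [ h1 ]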

I have never tried colocating two resources with the ClusterIP resource. 
Just for testing, is it possible to drop the WebSite resource?
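
If you want to try that, something like the following should remove the
WebSite resource together with its constraints (names taken from your
"crm configure show" output above; untested on my side):

crm resource stop WebSite
crm configure delete order-ClusterIP-WebSite-mandatory colocation-WebSite-ClusterIP-INFINITY WebSite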

Thank you,
Babu



