[ovs-dev] [PATCH v5 1/4] ovn: ovn-ctl support for HA ovn DB servers

Babu Shanmugam bschanmu at redhat.com
Tue Nov 15 11:52:56 UTC 2016



On Tuesday 15 November 2016 05:17 PM, Andy Zhou wrote:
>
>
> On Mon, Nov 14, 2016 at 11:54 PM, Babu Shanmugam <bschanmu at redhat.com 
> <mailto:bschanmu at redhat.com>> wrote:
>
>
>
>     On Friday 11 November 2016 02:18 PM, Andy Zhou wrote:
>>
>>
>>     On Mon, Nov 7, 2016 at 11:55 PM, Babu Shanmugam
>>     <bschanmu at redhat.com <mailto:bschanmu at redhat.com>> wrote:
>>
>>
>>
>>         On Monday 07 November 2016 06:49 PM, Andy Zhou wrote:
>>>         This version is better, I am able to apply them. Thanks.
>>>
>>>         I got the system running, but managed to get the system into
>>>         a state where both machines (centos and centos2) are running
>>>         ovsdb in backup mode. The output of "pcs status" shows an
>>>         error message, but the message is not very helpful. Any
>>>         suggestions on how to debug this?
>>>
>>>         root at centos:/# pcs status
>>>         Cluster name: mycluster
>>>         Last updated: Mon Nov  7 05:12:06 2016          Last change:
>>>         Mon Nov  7 05:08:24 2016 by root via cibadmin on centos
>>>         Stack: corosync
>>>         Current DC: centos2 (version 1.1.13-10.el7_2.4-44eb2dd) -
>>>         partition with quorum
>>>         2 nodes and 3 resources configured
>>>
>>>         Node centos: standby
>>>         Online: [ centos2 ]
>>>
>>>         Full list of resources:
>>>
>>>          virtip (ocf::heartbeat:IPaddr):  Started centos2
>>>          Master/Slave Set: ovndb_servers_master [ovndb_servers]
>>>              Stopped: [ centos centos2 ]
>>>
>>>         Failed Actions:
>>>         * ovndb_servers_start_0 on centos2 'unknown error' (1):
>>>         call=18, status=Timed Out, exitreason='none',
>>>             last-rc-change='Mon Nov  7 02:28:07 2016', queued=0ms,
>>>         exec=30002ms
>>>
>>>
>>>         PCSD Status:
>>>           centos: Online
>>>           centos2: Online
>>>
>>>         Daemon Status:
>>>           corosync: active/enabled
>>>           pacemaker: active/enabled
>>>           pcsd: active/enabled
>>>
>>>         --------------------------------------------
>>>         root at centos:/# pcs config
>>>         Cluster Name: mycluster
>>>         Corosync Nodes:
>>>          centos centos2
>>>         Pacemaker Nodes:
>>>          centos centos2
>>>
>>>         Resources:
>>>          Resource: virtip (class=ocf provider=heartbeat type=IPaddr)
>>>           Attributes: ip=192.168.122.200 cidr_netmask=24
>>>           Operations: start interval=0s timeout=20s
>>>         (virtip-start-interval-0s)
>>>                       stop interval=0s timeout=20s
>>>         (virtip-stop-interval-0s)
>>>                       monitor interval=30s (virtip-monitor-interval-30s)
>>>          Master: ovndb_servers_master
>>>           Meta Attrs: notify=true
>>>           Resource: ovndb_servers (class=ocf provider=ovn
>>>         type=ovndb-servers)
>>>            Attributes: master_ip=192.168.122.200
>>
>>         Andy, you don't seem to have defined an attribute for
>>         ovn_ctl. This means the ovn-ctl script is assumed to be at
>>         /usr/share/openvswitch/scripts/ovn-ctl. Can you check whether
>>         you have ovn-ctl at that location?
>>
>>     Yes, the script was installed there.
>>
>>         If not, please define an attribute similar to master_ip,
>>         name it ovn_ctl, and point it to the correct location of
>>         ovn-ctl.
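
For reference, such an attribute can be set on an existing resource
with pcs; a sketch, adjusting the path to wherever ovn-ctl is
installed:

    pcs resource update ovndb_servers \
        ovn_ctl=/usr/share/openvswitch/scripts/ovn-ctl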
>>
>>     The document says "ovn-ctl" is optional. I have now specified it
>>     fully, but it makes no difference. There is some log information
>>     towards the end of this email in case it helps. Overall, it could
>>     just be something weird about my system, and I am not sure it is
>>     worthwhile to track down. On the other hand, I will be happy to
>>     provide more information about my setup in case it is useful.
>
>     Andy, can you please try to start the DB servers manually using
>     ovn-ctl and see if they really start? I would use the following
>     commands to check:
>
>     /usr/share/openvswitch/scripts/ovn-ctl
>     --db-nb-sync-from-addr=192.0.2.254
>     --db-nb-sync-from-addr=192.0.2.254 start_ovsdb
>
>     /usr/share/openvswitch/scripts/ovn-ctl status_ovnsb
>     /usr/share/openvswitch/scripts/ovn-ctl status_ovnnb
>
>
>     The last two commands should print 'running/backup'.
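
On a healthy standby, the expected session would look like this (a
sketch based on the description above):

    # /usr/share/openvswitch/scripts/ovn-ctl status_ovnnb
    running/backup
    # /usr/share/openvswitch/scripts/ovn-ctl status_ovnsb
    running/backup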
>
> Typing those commands as-is produced the expected results. But I am
> not sure if you meant to actually specify "--db-nb-sync-from-addr"
> twice. I also tried replacing the second argument with
> "--db-sb-sync-from-addr" and got the same result.

It was indeed --db-sb-sync-from-addr.
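
So the intended commands were:

    /usr/share/openvswitch/scripts/ovn-ctl \
        --db-nb-sync-from-addr=192.0.2.254 \
        --db-sb-sync-from-addr=192.0.2.254 start_ovsdb

    /usr/share/openvswitch/scripts/ovn-ctl status_ovnsb
    /usr/share/openvswitch/scripts/ovn-ctl status_ovnnb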

>
>     There are some log messages added in the OCF script in case the
>     start functionality fails by timing out. You should be able to see
>     lines starting with "ovndb_servers: After starting ovsdb, status
>     is". These log messages should be present in the pacemaker logs.
>
>
> Thanks for the hint. I got more useful logs with "journalctl -u
> pacemaker -b". After some debugging, I found out what actually
> happened -- both the 'nb' and 'sb' servers were already running on the
> system. It works after I killed both servers and restarted the
> cluster.

That's great!
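
For anyone who hits the same state, a sketch of the recovery steps
(assuming the default script location; "pcs resource cleanup" clears
the failed actions so pacemaker retries the start):

    # on each node, stop the stale standalone servers
    /usr/share/openvswitch/scripts/ovn-ctl stop_ovsdb

    # then clear the failures and let pacemaker start them again
    pcs resource cleanup ovndb_servers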

>
>
>>>         Is the user expected to populate those files by hand? If
>>>         yes, what IP address should be used? The floating IP?
>>
>>         This file has to be populated by the user only when the user
>>         wants ovn-northd to connect to a set of DB URLs other than the
>>         unix sockets on the same machine. The IP address depends on
>>         the setup. The pacemaker script uses the master-ip address
>>         that you supply to the OCF resource as an attribute.
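
A concrete sketch of such a DB URL, using the master IP from this
thread and assuming the default NB port:

    tcp:192.168.122.200:6641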
>>
>>     Thanks. Should this be added to IntegrationGuide.rst?
>>
>
>     I have documented the need for the IPAddr2 resource in
>     IntegrationGuide.rst. The new options are in the ovn-ctl man page.
>     Is there something specific that you feel is missing from the
>     documentation?
>
>
> I think the fact that we are using the presence of the file to
> control how "systemctl start ovn-northd" behaves is subtle and should
> be mentioned somewhere.
>
> It is not very clear from the context what "master_ip" should be. Did
> I miss something?

I agree, Andy. I will correct this in v6. Thanks so much.
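
In the meantime, to make it concrete: master_ip is the virtual IP that
the IPAddr resource manages (192.168.122.200 in your config), and the
file carries DB URLs built from that address. A sketch, assuming the
default NB/SB ports; the file paths below are my reading of this
series, so please double-check them against the patch:

    # With these files present, "systemctl start ovn-northd" connects
    # to the given URLs instead of the local unix sockets.
    echo tcp:192.168.122.200:6641 > /etc/openvswitch/ovnnb-active.conf
    echo tcp:192.168.122.200:6642 > /etc/openvswitch/ovnsb-active.conf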

>
>
>
>>     More logs..
>>
>>     root at centos:~# ls -l /usr/share/openvswitch/scripts/ovn-ctl
>>     -rwxr-xr-x. 1 root root 15539 Nov  7 02:12
>>     /usr/share/openvswitch/scripts/ovn-ctl
>>
>>     Resources:
>>      Resource: virtip (class=ocf provider=heartbeat type=IPaddr)
>>       Attributes: ip=192.168.122.200 cidr_netmask=24
>>       Operations: start interval=0s timeout=20s
>>     (virtip-start-interval-0s)
>>                   stop interval=0s timeout=20s (virtip-stop-interval-0s)
>>                   monitor interval=30s (virtip-monitor-interval-30s)
>>      Master: ovndb_servers_master
>>       Meta Attrs: notify=true
>>       Resource: ovndb_servers (class=ocf provider=ovn type=ovndb-servers)
>>        Attributes: master_ip=192.168.122.200
>>     ovn_ctl=/usr/share/openvswitch/scripts/ovn-ctl
>>        Operations: start interval=0s timeout=30s
>>     (ovndb_servers-start-interval-0s)
>>                    stop interval=0s timeout=20s
>>     (ovndb_servers-stop-interval-0s)
>>                    promote interval=0s timeout=50s
>>     (ovndb_servers-promote-interval-0s)
>>                    demote interval=0s timeout=50s
>>     (ovndb_servers-demote-interval-0s)
>>                    monitor interval=10s
>>     (ovndb_servers-monitor-interval-10s)
>>
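
For anyone reproducing this setup, the configuration above corresponds
roughly to the following pcs commands (a sketch; the resource names,
virtual IP, and intervals are taken from the dump above):

    pcs resource create virtip ocf:heartbeat:IPaddr \
        ip=192.168.122.200 cidr_netmask=24 op monitor interval=30s
    pcs resource create ovndb_servers ocf:ovn:ovndb-servers \
        master_ip=192.168.122.200 \
        ovn_ctl=/usr/share/openvswitch/scripts/ovn-ctl \
        op monitor interval=10s
    pcs resource master ovndb_servers_master ovndb_servers meta notify=true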
>
>     Resource configuration looks fine. I think the above experiment
>     would help us catch the problem.
>
>>
>>     pcs status still shows ovsdb as offline on both hosts:
>>     ==========================================
>>     Cluster name: mycluster
>>     Last updated: Fri Nov 11 00:33:10 2016          Last change: Fri
>>     Nov 11 00:09:13 2016 by root via crm_attribute on centos2
>>     Stack: corosync
>>     Current DC: centos (version 1.1.13-10.el7_2.4-44eb2dd) -
>>     partition with quorum
>>     2 nodes and 3 resources configured
>>
>>     Online: [ centos centos2 ]
>>
>>     Full list of resources:
>>
>>      virtip (ocf::heartbeat:IPaddr):  Started centos
>>      Master/Slave Set: ovndb_servers_master [ovndb_servers]
>>          Stopped: [ centos centos2 ]
>>
>>     Failed Actions:
>>     * ovndb_servers_start_0 on centos 'unknown error' (1): call=18,
>>     status=Timed Out, exitreason='none',
>>         last-rc-change='Fri Nov 11 00:09:13 2016', queued=0ms,
>>     exec=30280ms
>>     * ovndb_servers_start_0 on centos2 'unknown error' (1): call=13,
>>     status=Timed Out, exitreason='none',
>>         last-rc-change='Fri Nov 11 00:07:42 2016', queued=0ms,
>>     exec=30234ms
>>
>>
>>     PCSD Status:
>>       centos: Online
>>       centos2: Online
>>
>>
>>
>
>


