[ovs-dev] [PATCH v5 1/4] ovn: ovn-ctl support for HA ovn DB servers
Babu Shanmugam
bschanmu at redhat.com
Tue Nov 15 11:52:56 UTC 2016
On Tuesday 15 November 2016 05:17 PM, Andy Zhou wrote:
>
>
> On Mon, Nov 14, 2016 at 11:54 PM, Babu Shanmugam <bschanmu at redhat.com
> <mailto:bschanmu at redhat.com>> wrote:
>
>
>
> On Friday 11 November 2016 02:18 PM, Andy Zhou wrote:
>>
>>
>> On Mon, Nov 7, 2016 at 11:55 PM, Babu Shanmugam
>> <bschanmu at redhat.com <mailto:bschanmu at redhat.com>> wrote:
>>
>>
>>
>> On Monday 07 November 2016 06:49 PM, Andy Zhou wrote:
>>> This version is better, I am able to apply them. Thanks.
>>>
>>> I got the system running, but managed to get the system into a
>>> state where both machines (centos and centos2)
>>> are running ovsdb in backup mode. The output of "pcs
>>> status" shows an error message, but the message is not
>>> very helpful. Any suggestions on how to debug this?
>>>
>>> root at centos:/# pcs status
>>> Cluster name: mycluster
>>> Last updated: Mon Nov 7 05:12:06 2016  Last change: Mon Nov
>>> 7 05:08:24 2016 by root via cibadmin on centos
>>> Stack: corosync
>>> Current DC: centos2 (version 1.1.13-10.el7_2.4-44eb2dd) -
>>> partition with quorum
>>> 2 nodes and 3 resources configured
>>>
>>> Node centos: standby
>>> Online: [ centos2 ]
>>>
>>> Full list of resources:
>>>
>>> virtip (ocf::heartbeat:IPaddr): Started centos2
>>> Master/Slave Set: ovndb_servers_master [ovndb_servers]
>>> Stopped: [ centos centos2 ]
>>>
>>> Failed Actions:
>>> * ovndb_servers_start_0 on centos2 'unknown error' (1):
>>> call=18, status=Timed Out, exitreason='none',
>>> last-rc-change='Mon Nov 7 02:28:07 2016', queued=0ms,
>>> exec=30002ms
>>>
>>>
>>> PCSD Status:
>>> centos: Online
>>> centos2: Online
>>>
>>> Daemon Status:
>>> corosync: active/enabled
>>> pacemaker: active/enabled
>>> pcsd: active/enabled
>>>
>>> --------------------------------------------
>>> root at centos:/# pcs config
>>> Cluster Name: mycluster
>>> Corosync Nodes:
>>> centos centos2
>>> Pacemaker Nodes:
>>> centos centos2
>>>
>>> Resources:
>>> Resource: virtip (class=ocf provider=heartbeat type=IPaddr)
>>> Attributes: ip=192.168.122.200 cidr_netmask=24
>>> Operations: start interval=0s timeout=20s
>>> (virtip-start-interval-0s)
>>> stop interval=0s timeout=20s
>>> (virtip-stop-interval-0s)
>>> monitor interval=30s (virtip-monitor-interval-30s)
>>> Master: ovndb_servers_master
>>> Meta Attrs: notify=true
>>> Resource: ovndb_servers (class=ocf provider=ovn
>>> type=ovndb-servers)
>>> Attributes: master_ip=192.168.122.200
>>
>> Andy, you don't seem to have defined an attribute for
>> ovn_ctl. That means the ovn-ctl script is assumed to be at
>> /usr/share/openvswitch/scripts/ovn-ctl. Can you check
>> whether you have ovn-ctl at that location?
>>
>> Yes, the script was installed there.
>>
>> If not, please define an attribute similar to master_ip,
>> name it ovn_ctl, and point it at the correct location of
>> ovn-ctl.
>>
>> The document says "ovn-ctl" is optional. I have now changed it to be
>> fully specified, but that makes no difference. There is some log
>> information toward the end of this email in case it helps. Overall,
>> it could just be something weird about my system; I am not sure it
>> is worthwhile to track down. On the other hand, I will be happy to
>> provide more information about my setup in case it is useful.
>
> Andy, can you please try to manually start the DB servers using
> ovn-ctl and check whether they really started? I would use the
> following commands:
>
> /usr/share/openvswitch/scripts/ovn-ctl
> --db-nb-sync-from-addr=192.0.2.254
> --db-nb-sync-from-addr=192.0.2.254 start_ovsdb
>
> /usr/share/openvswitch/scripts/ovn-ctl status_ovnsb
> /usr/share/openvswitch/scripts/ovn-ctl status_ovnnb
>
>
> The last two commands should print 'running/backup'.
>
> Typing those commands as-is produced the expected results. But I am
> not sure if you meant to actually specify "--db-nb-sync-from-addr"
> twice. I also tried replacing the second argument with
> "--db-sb-sync-from-addr" and got the same result.
It was indeed meant to be --db-sb-sync-from-addr; that was a typo in my command.
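For the record, the corrected invocation (with the second option changed to --db-sb-sync-from-addr, and the example addresses from earlier in the thread) would be:

```shell
# Start both OVN DB servers in backup mode, syncing from the master
# address. 192.0.2.254 is the example address used earlier in this
# thread; substitute the cluster's floating IP.
/usr/share/openvswitch/scripts/ovn-ctl \
    --db-nb-sync-from-addr=192.0.2.254 \
    --db-sb-sync-from-addr=192.0.2.254 start_ovsdb
```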
>
> There are some log messages added to the OCF script for the case
> where the start functionality fails by timing out. You should be
> able to see messages starting with "ovndb_servers: After starting
> ovsdb, status is". These log messages should be present in the
> pacemaker logs.
>
>
> Thanks for the hint. I got more useful logs with "journalctl -u
> pacemaker -b". After some debugging, I found out what actually
> happened -- both the 'nb' and 'sb' servers were already running on
> the system. It worked after I killed both servers and restarted the
> cluster.
That's great!
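As an aside, for anyone debugging similar start timeouts: the status strings above map onto standard OCF monitor return codes. This is a minimal POSIX sh sketch of that mapping, not the actual agent code, and it assumes the master side reports "running/active":

```shell
#!/bin/sh
# Hedged sketch (NOT the real ocf:ovn:ovndb-servers agent): how a
# monitor action might translate ovn-ctl status output into OCF
# return codes. Code values follow the OCF resource agent spec.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7
OCF_RUNNING_MASTER=8

status_to_rc() {
    case "$1" in
        *running/active*) return $OCF_RUNNING_MASTER ;;  # promoted master
        *running/backup*) return $OCF_SUCCESS ;;         # healthy backup
        *)                return $OCF_NOT_RUNNING ;;     # server not up
    esac
}

status_to_rc "running/backup"
echo $?   # prints 0
```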
>
>
>>> Is the user expected to populate those files by hand? If
>>> yes, what IP address should be used? The floating IP?
>>
>> This file has to be populated by the user only when the user
>> wants ovn-northd to connect to a different set of DB
>> URLs, rather than the unix sockets on the same machine.
>> The IP address depends on the setup. The pacemaker script
>> uses the master_ip address that you supply to the OCF
>> resource as an attribute.
>>
>> Thanks. Should this be added to IntegrationGuide.rst?
>>
>
> I have documented the need for an IPaddr2 resource in
> IntegrationGuide.rst. The new options are in the ovn-ctl man page.
> Is there something specific that you feel is missing from the
> documentation?
>
>
> I think the fact that we use the presence of the file to control how
> "systemctl start ovn-northd" behaves is subtle and should be
> mentioned somewhere.
>
> It is not very clear from the context what "master_ip" should be.
> Did I miss something?
I agree, Andy. I will correct this in v6. Thanks so much.
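For the v6 documentation, something along these lines could make the master_ip requirement concrete. This is only a sketch mirroring the resource configuration quoted above (addresses and paths are from Andy's setup; the pcs syntax shown is the older RHEL 7-era form, so adjust to the pcs version in use):

```shell
# Floating IP that follows the master DB server. master_ip below must
# match this address; clients and backup servers reach the master
# through it.
pcs resource create virtip ocf:heartbeat:IPaddr2 \
    ip=192.168.122.200 cidr_netmask=24 op monitor interval=30s

# The OVN DB servers as a master/slave resource. ovn_ctl is only
# needed when ovn-ctl is not at the default location
# /usr/share/openvswitch/scripts/ovn-ctl.
pcs resource create ovndb_servers ocf:ovn:ovndb-servers \
    master_ip=192.168.122.200 \
    ovn_ctl=/usr/share/openvswitch/scripts/ovn-ctl

# Wrap it in a master/slave set with notifications enabled, as in the
# "pcs config" output above.
pcs resource master ovndb_servers_master ovndb_servers meta notify=true
```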
>
>
>
>> More logs..
>>
>> root at centos:~# ls -l /usr/share/openvswitch/scripts/ovn-ctl
>> -rwxr-xr-x. 1 root root 15539 Nov 7 02:12
>> /usr/share/openvswitch/scripts/ovn-ctl
>>
>> Resources:
>> Resource: virtip (class=ocf provider=heartbeat type=IPaddr)
>> Attributes: ip=192.168.122.200 cidr_netmask=24
>> Operations: start interval=0s timeout=20s
>> (virtip-start-interval-0s)
>> stop interval=0s timeout=20s (virtip-stop-interval-0s)
>> monitor interval=30s (virtip-monitor-interval-30s)
>> Master: ovndb_servers_master
>> Meta Attrs: notify=true
>> Resource: ovndb_servers (class=ocf provider=ovn type=ovndb-servers)
>> Attributes: master_ip=192.168.122.200
>> ovn_ctl=/usr/share/openvswitch/scripts/ovn-ctl
>> Operations: start interval=0s timeout=30s
>> (ovndb_servers-start-interval-0s)
>> stop interval=0s timeout=20s
>> (ovndb_servers-stop-interval-0s)
>> promote interval=0s timeout=50s
>> (ovndb_servers-promote-interval-0s)
>> demote interval=0s timeout=50s
>> (ovndb_servers-demote-interval-0s)
>> monitor interval=10s
>> (ovndb_servers-monitor-interval-10s)
>>
>
> Resource configuration looks fine. I think the above experiment
> would help us catch the problem.
>
>>
>> pcs status still shows ovsdb are offline on both hosts:
>> ==========================================
>> Cluster name: mycluster
>> Last updated: Fri Nov 11 00:33:10 2016 Last change: Fri
>> Nov 11 00:09:13 2016 by root via crm_attribute on centos2
>> Stack: corosync
>> Current DC: centos (version 1.1.13-10.el7_2.4-44eb2dd) -
>> partition with quorum
>> 2 nodes and 3 resources configured
>>
>> Online: [ centos centos2 ]
>>
>> Full list of resources:
>>
>> virtip (ocf::heartbeat:IPaddr): Started centos
>> Master/Slave Set: ovndb_servers_master [ovndb_servers]
>> Stopped: [ centos centos2 ]
>>
>> Failed Actions:
>> * ovndb_servers_start_0 on centos 'unknown error' (1): call=18,
>> status=Timed Out, exitreason='none',
>> last-rc-change='Fri Nov 11 00:09:13 2016', queued=0ms,
>> exec=30280ms
>> * ovndb_servers_start_0 on centos2 'unknown error' (1): call=13,
>> status=Timed Out, exitreason='none',
>> last-rc-change='Fri Nov 11 00:07:42 2016', queued=0ms,
>> exec=30234ms
>>
>>
>> PCSD Status:
>> centos: Online
>> centos2: Online
>>
>>
>>
>
>