[ovs-discuss] ovndb_servers can't be promoted

Numan Siddique nusiddiq at redhat.com
Mon Dec 4 05:42:09 UTC 2017


On Fri, Dec 1, 2017 at 4:32 AM, Hui Xiang <xianghuir at gmail.com> wrote:

> Thanks Numan. In my environment it's worse: the resource is not even getting
> started, and the monitor is only called once rather than repeatedly, for both
> master/slave and none. Do you know what problem could cause pacemaker to make
> this decision? Other resources are fine.
>

Hi Hui,

Can you share the output of the command "pcs resource show
<OVN_DB_RES_NAME>" and the commands you used to create the pacemaker OVN
resources?

In your previous output of pcs resource show, the meta attribute notify was
not set properly.

Thanks
Numan
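
A quick way to check and, if needed, set that meta attribute (a sketch;
"tst-ovndb-master" is the master resource name assumed from this thread):

```shell
# Inspect the master resource; "Meta Attrs: notify=true" should appear.
pcs resource show tst-ovndb-master

# If it is missing, set it explicitly and re-check.
pcs resource meta tst-ovndb-master notify=true
pcs resource show tst-ovndb-master | grep -i notify
```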


>
> On Fri, Dec 1, 2017 at 2:08 AM, Numan Siddique <nusiddiq at redhat.com>
> wrote:
>
>> Hi HuiXiang,
>> I am also seeing the issue where no node is promoted as master. I will
>> test more, fix it, and submit patch set v3.
>>
>> Thanks
>> Numan
>>
>>
>> On Thu, Nov 30, 2017 at 4:10 PM, Numan Siddique <nusiddiq at redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Thu, Nov 30, 2017 at 1:15 PM, Hui Xiang <xianghuir at gmail.com> wrote:
>>>
>>>> Hi Numan,
>>>>
>>>> Thanks for helping. I am following your pcs example, but still with no
>>>> luck.
>>>>
>>>> 1. Before running any configuration, I stopped all of the OVN
>>>> ovsdb-servers and ovn-northd, and deleted ovnnb_active.conf/ovnsb_active.conf.
>>>>
>>>> 2. Since I already had a VIP in the cluster, I chose to use it; its
>>>> status is OK.
>>>> [root at node-1 ~]# pcs resource show
>>>>  vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld
>>>>
>>>> 3. Use pcs to create ovndb-servers and constraint
>>>> [root at node-1 ~]# pcs resource create tst-ovndb ocf:ovn:ovndb-servers
>>>> manage_northd=yes master_ip=192.168.0.2 nb_master_port=6641
>>>> sb_master_port=6642 master
>>>>      ([root at node-1 ~]# pcs resource meta tst-ovndb-master notify=true
>>>>       Error: unable to find a resource/clone/master/group:
>>>> tst-ovndb-master) ## returned an error, so I changed to the command below.
>>>>
>>>
>>> Hi HuiXiang,
>>> This command is very important. Without it, pacemaker does not notify
>>> status changes, and the ovsdb-servers will not be promoted or demoted.
>>> That is why you don't see the notify action getting called in the OVN OCF
>>> script.
>>>
>>> Can you try the other commands I shared in my previous email? They work
>>> fine for me.
>>>
>>> Let me know how it goes.
>>>
>>> Thanks
>>> Numan
>>>
>>>
>>> [root at node-1 ~]# pcs resource master tst-ovndb-master tst-ovndb
>>>> notify=true
>>>> [root at node-1 ~]# pcs constraint colocation add master tst-ovndb-master
>>>> with vip__management_old
>>>>
>>>> 4. pcs status
>>>> [root at node-1 ~]# pcs status
>>>>  vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld
>>>>  Master/Slave Set: tst-ovndb-master [tst-ovndb]
>>>>      Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
>>>>
>>>> 5. pcs resource show XXX
>>>> [root at node-1 ~]# pcs resource show  vip__management_old
>>>>  Resource: vip__management_old (class=ocf provider=es type=ns_IPaddr2)
>>>>   Attributes: nic=br-mgmt base_veth=br-mgmt-hapr ns_veth=hapr-m
>>>> ip=192.168.0.2 iflabel=ka cidr_netmask=24 ns=haproxy gateway=none
>>>> gateway_metric=0 iptables_start_rules=false iptables_stop_rules=false
>>>> iptables_comment=default-comment
>>>>   Meta Attrs: migration-threshold=3 failure-timeout=60
>>>> resource-stickiness=1
>>>>   Operations: monitor interval=3 timeout=30
>>>> (vip__management_old-monitor-3)
>>>>               start interval=0 timeout=30 (vip__management_old-start-0)
>>>>               stop interval=0 timeout=30 (vip__management_old-stop-0)
>>>> [root at node-1 ~]# pcs resource show tst-ovndb-master
>>>>  Master: tst-ovndb-master
>>>>   Meta Attrs: notify=true
>>>>   Resource: tst-ovndb (class=ocf provider=ovn type=ovndb-servers)
>>>>    Attributes: manage_northd=yes master_ip=192.168.0.2
>>>> nb_master_port=6641 sb_master_port=6642
>>>>    Operations: start interval=0s timeout=30s
>>>> (tst-ovndb-start-timeout-30s)
>>>>                stop interval=0s timeout=20s (tst-ovndb-stop-timeout-20s)
>>>>                promote interval=0s timeout=50s
>>>> (tst-ovndb-promote-timeout-50s)
>>>>                demote interval=0s timeout=50s
>>>> (tst-ovndb-demote-timeout-50s)
>>>>                monitor interval=30s timeout=20s
>>>> (tst-ovndb-monitor-interval-30s)
>>>>                monitor interval=10s role=Master timeout=20s
>>>> (tst-ovndb-monitor-interval-10s-role-Master)
>>>>                monitor interval=30s role=Slave timeout=20s
>>>> (tst-ovndb-monitor-interval-30s-role-Slave)
>>>>
>>>>
>>>> 6. I have put logging in every ovndb-servers op; it seems only the
>>>> monitor op is being called, and nothing is promoted by the pacemaker DC:
>>>> <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>>>> ovsdb_server_monitor
>>>> <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>>>> ovsdb_server_check_status
>>>> <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>>>> return OCFOCF_NOT_RUNNINGG
>>>> <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>>>> ovsdb_server_master_update: 7}
>>>> <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>>>> ovsdb_server_master_update end}
>>>> <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>>>> monitor is going to return 7
>>>> <30>Nov 30 15:22:20 node-1 ovndb-servers(undef)[2980970]: INFO:
>>>> metadata exit OCF_SUCCESS}
>>>>
>>>>
>>>> Please take a look,  thank you very much.
>>>> Hui.
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Nov 29, 2017 at 11:03 PM, Numan Siddique <nusiddiq at redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 29, 2017 at 4:16 PM, Hui Xiang <xianghuir at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> FYI, If I have configured a good ovndb-server cluster with one active
>>>>>> two slaves, then start pacemaker ovn-servers resource agents, they are all
>>>>>> becoming slaves...
>>>>>>
>>>>>
>>>>> You don't need to start the ovndb-servers. When you create the pacemaker
>>>>> resources, it automatically starts them and promotes one of them.
>>>>>
>>>>> One very important thing is to create an IPaddr2 resource beforehand
>>>>> and add a colocation constraint, so that pacemaker promotes the
>>>>> ovsdb-server on the node where the IPaddr2 resource is running. This
>>>>> IPaddr2 resource's IP should be your master ip.
>>>>>
>>>>> Can you please run "pcs resource show <name_of_the_resource>" and share
>>>>> the output?
>>>>>
>>>>> Below is what I normally use for my testing.
>>>>>
>>>>> ############
>>>>> pcs cluster cib tmp-cib.xml
>>>>> cp tmp-cib.xml tmp-cib.xml.deltasrc
>>>>>
>>>>> pcs -f tmp-cib.xml resource create tst-ovndb ocf:ovn:ovndb-servers
>>>>>  manage_northd=yes master_ip=192.168.24.10 nb_master_port=6641
>>>>> sb_master_port=6642 master
>>>>> pcs -f tmp-cib.xml resource meta tst-ovndb-master notify=true
>>>>> pcs -f tmp-cib.xml constraint colocation add master tst-ovndb-master
>>>>> with ip-192.168.24.10
>>>>>
>>>>> pcs cluster cib-push tmp-cib.xml diff-against=tmp-cib.xml.deltasrc
>>>>> pcs status
>>>>> ##############
>>>>>
>>>>> In the above example, "ip-192.168.24.10" is the IPaddr2 resource.
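
If the IPaddr2 resource does not exist yet, it can be created first as part
of the same workflow (a sketch; the netmask and monitor interval are
assumptions for your environment):

```shell
# Create the VIP that the master role will colocate with
# (values here are illustrative, not prescriptive).
pcs -f tmp-cib.xml resource create ip-192.168.24.10 ocf:heartbeat:IPaddr2 \
    ip=192.168.24.10 cidr_netmask=24 op monitor interval=30s
```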
>>>>>
>>>>> Thanks
>>>>> Numan
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> On Tue, Nov 28, 2017 at 10:48 PM, Numan Siddique <nusiddiq at redhat.com
>>>>>> > wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Nov 28, 2017 at 2:29 PM, Hui Xiang <xianghuir at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Numan,
>>>>>>>>
>>>>>>>>
>>>>>>>> I finally figured out what's wrong when running the ovndb-servers
>>>>>>>> OCF script in my environment.
>>>>>>>>
>>>>>>>> 1. There are no default ovnnb and ovnsb servers running in my
>>>>>>>> environment; I thought they would be started by pacemaker in the
>>>>>>>> usual way, as other typical resource agents do.
>>>>>>>> When I created the ovndb_servers resource, nothing happened; no
>>>>>>>> operation was executed except monitor, which was really hard to
>>>>>>>> debug for a while.
>>>>>>>> In the ovsdb_server_monitor() function, it first checks the status,
>>>>>>>> which here returns NOT_RUNNING; then in ovsdb_server_master_update(),
>>>>>>>> "CRM_MASTER -D" is executed, which appears to stop every following
>>>>>>>> action. I am not very clear what it does.
>>>>>>>>
>>>>>>>> So, do ovn_nb and ovn_sb need to be running before the pacemaker
>>>>>>>> ovndb_servers resource is created? Is there any documentation on
>>>>>>>> this?
>>>>>>>>
>>>>>>> No they don't need to be.
>>>>>
>>>>>
>>>>>>
>>>>>>>> 2. Without your patch, every node executes ovsdb_server_monitor and
>>>>>>>> returns OCF_SUCCESS.
>>>>>>>> However, the ovsdb_server_stop action is executed on the first node
>>>>>>>> of the three-node cluster, for the reason shown below:
>>>>>>>> <27>Nov 28 15:35:11 node-1 pengine[1897010]:    error: clone_color:
>>>>>>>> ovndb_servers:0 is running on node-1.domain.tld which isn't allowed
>>>>>>>> Did I miss anything? I don't understand why it isn't allowed.
>>>>>>>>
>>>>>>>> 3. Regarding your patch [1]:
>>>>>>>> It first reports "/usr/lib/ocf/resource.d/ovn/ovndb-servers: line
>>>>>>>> 26: ocf_attribute_target: command not found" in my environment
>>>>>>>> (pacemaker 1.1.12).
>>>>>>>>
>>>>>>>
>>>>>>> Thanks. I will come back to you on your other points. The
>>>>>>> "ocf_attribute_target" function must have been added in 1.1.16-12.
>>>>>>>
>>>>>>> I think it makes sense to either remove "ocf_attribute_target" or
>>>>>>> find a way so that even older versions work.
>>>>>>>
>>>>>>> I will spin a v2.
>>>>>>> Thanks
>>>>>>> Numan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> The log showed the same as in item 2, but I briefly saw a different
>>>>>>>> state from "pcs status", as shown below:
>>>>>>>>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>>>>>>>      Slaves: [ node-1.domain.tld node-2.domain.tld
>>>>>>>> node-3.domain.tld ]
>>>>>>>> There is no promote action being executed.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks for looking and help.
>>>>>>>>
>>>>>>>> [1] - https://patchwork.ozlabs.org/patch/839022/
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Nov 24, 2017 at 10:54 PM, Numan Siddique <
>>>>>>>> nusiddiq at redhat.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Hui Xiang,
>>>>>>>>>
>>>>>>>>> Can you please try this patch [1] and see if it works for you?
>>>>>>>>> Please let me know how it goes. But I am not sure if the patch
>>>>>>>>> will fix the issue.
>>>>>>>>>
>>>>>>>>> In brief, the OVN OCF script doesn't add a monitor action for the
>>>>>>>>> "Master" role, so the pacemaker resource agent would not check the
>>>>>>>>> status of the OVN DB servers periodically. If the OVN DB servers
>>>>>>>>> are killed, pacemaker won't know about it.
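
What the patch adds amounts to something like the following (a sketch using
the resource name from this thread; the interval/timeout values are
illustrative, not taken from the patch):

```shell
# Add a periodic monitor for the Master role so pacemaker detects
# a killed master ovsdb-server instead of only watching slaves.
pcs resource op add tst-ovndb monitor interval=10s role=Master timeout=20s
```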
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> You can also take a look at this [2] to see how it is used in
>>>>>>>>> OpenStack with a TripleO installation.
>>>>>>>>>
>>>>>>>>> [1] - https://patchwork.ozlabs.org/patch/839022/
>>>>>>>>> [2] - https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/pacemaker/ovn_northd.pp
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Numan
>>>>>>>>>
>>>>>>>>> On Fri, Nov 24, 2017 at 3:00 PM, Hui Xiang <xianghuir at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi folks,
>>>>>>>>>>
>>>>>>>>>>   I am following what is suggested in the doc [1] to configure
>>>>>>>>>> ovndb_servers HA. However, I've had no luck: even after upgrading
>>>>>>>>>> the pacemaker packages from 1.1.12 to 1.1.16 and trying almost
>>>>>>>>>> every kind of change, there is still no ovndb_servers master
>>>>>>>>>> promoted. Is there any special recipe to make it run? So
>>>>>>>>>> frustrated, sigh.
>>>>>>>>>>
>>>>>>>>>> It always showed:
>>>>>>>>>>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>>>>>>>>>      Stopped: [ node-1.domain.tld node-2.domain.tld
>>>>>>>>>> node-3.domain.tld ]
>>>>>>>>>>
>>>>>>>>>> Even when I tried the steps below:
>>>>>>>>>> 1. pcs resource debug-stop ovndb_server on every node.
>>>>>>>>>> ovn-ctl status_ovnxb: running/backup
>>>>>>>>>> 2. pcs resource debug-start ovndb_server on every node.
>>>>>>>>>> ovn-ctl status_ovnxb: running/backup
>>>>>>>>>> 3. pcs resource debug-promote ovndb_server on one node.
>>>>>>>>>>  ovn-ctl status_ovnxb: running/active
>>>>>>>>>>
>>>>>>>>>> With the above status, pcs status still showed:
>>>>>>>>>>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>>>>>>>>>      Stopped: [ node-1.domain.tld node-2.domain.tld
>>>>>>>>>> node-3.domain.tld ]
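
The per-step status above can be checked directly with ovn-ctl, independent
of pacemaker (a sketch; the script path is the usual openvswitch location
and may differ on your distribution):

```shell
# Ask the NB/SB ovsdb-servers what state they believe they are in.
/usr/share/openvswitch/scripts/ovn-ctl status_ovnnb
/usr/share/openvswitch/scripts/ovn-ctl status_ovnsb
```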
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1]. https://github.com/openvswitch/ovs/blob/master/Documentation/topics/integration.rst
>>>>>>>>>>
>>>>>>>>>> Appreciated any hint.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> discuss mailing list
>>>>>>>>>> discuss at openvswitch.org
>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>