[ovs-discuss] raft ovsdb clustering

aginwala aginwala at asu.edu
Tue Mar 27 19:28:00 UTC 2018


Sure:


#Node1

/usr/share/openvswitch/scripts/ovn-ctl  --db-nb-addr=192.168.220.101
--db-nb-port=6641 --db-nb-cluster-local-addr=tcp:192.168.220.101:6645
--db-nb-create-insecure-remote=yes start_nb_ovsdb

/usr/share/openvswitch/scripts/ovn-ctl  --db-sb-addr=192.168.220.101
--db-sb-port=6642 --db-sb-create-insecure-remote=yes
--db-sb-cluster-local-addr=tcp:192.168.220.101:6644 start_sb_ovsdb

ovn-northd -vconsole:emer -vsyslog:err -vfile:info
--ovnnb-db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641"
--ovnsb-db="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:192.168.220.103:6642"
--no-chdir --log-file=/var/log/openvswitch/ovn-northd.log
--pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor

#Node2

/usr/share/openvswitch/scripts/ovn-ctl  --db-nb-addr=192.168.220.102
--db-nb-port=6641 --db-nb-cluster-local-addr=tcp:192.168.220.102:6645
--db-nb-cluster-remote-addr="tcp:192.168.220.101:6645"
--db-nb-create-insecure-remote=yes start_nb_ovsdb

/usr/share/openvswitch/scripts/ovn-ctl  --db-sb-addr=192.168.220.102
--db-sb-port=6642 --db-sb-create-insecure-remote=yes
--db-sb-cluster-local-addr="tcp:192.168.220.102:6644"
--db-sb-cluster-remote-addr="tcp:192.168.220.101:6644"  start_sb_ovsdb

ovn-northd -vconsole:emer -vsyslog:err -vfile:info
--ovnnb-db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641"
--ovnsb-db="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:192.168.220.103:6642"
--no-chdir --log-file=/var/log/openvswitch/ovn-northd.log
--pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor


#Node3

/usr/share/openvswitch/scripts/ovn-ctl  --db-nb-addr=192.168.220.103
--db-nb-port=6641 --db-nb-cluster-local-addr=tcp:192.168.220.103:6645
--db-nb-cluster-remote-addr="tcp:192.168.220.101:6645"
--db-nb-create-insecure-remote=yes start_nb_ovsdb

/usr/share/openvswitch/scripts/ovn-ctl  --db-sb-addr=192.168.220.103
--db-sb-port=6642 --db-sb-create-insecure-remote=yes
--db-sb-cluster-local-addr="tcp:192.168.220.103:6644"
--db-sb-cluster-remote-addr="tcp:192.168.220.101:6644"  start_sb_ovsdb

ovn-northd -vconsole:emer -vsyslog:err -vfile:info
--ovnnb-db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641"
--ovnsb-db="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:192.168.220.103:6642"
--no-chdir --log-file=/var/log/openvswitch/ovn-northd.log
--pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor
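
# Once all three members are up, each node can be sanity-checked.
# ovsdb-tool db-local-address and check-cluster operate on the on-disk file;
# the cluster/status appctl command is my assumption about what the RAFT
# patch registers on the db's unixctl socket, so treat that last line as a sketch:

ovsdb-tool db-local-address /etc/openvswitch/ovnsb_db.db
ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/status OVN_Southbound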

#. export remote="tcp:192.168.220.103:6641,tcp:192.168.220.102:6641,tcp:192.168.220.101:6641"

#. ovn-nbctl show can be done using the command below:

ovn-nbctl --db=$remote show

#. ovn-sbctl commands can be run as below:

ovn-sbctl --db=$remote show
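
#. the hypervisors can point ovn-controller at the whole cluster in the same way.
A sketch, assuming the usual external_ids:ovn-remote key (adjust the IPs to your
environment):

ovs-vsctl set Open_vSwitch . external_ids:ovn-remote="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:192.168.220.103:6642"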


Regards,



On Tue, Mar 27, 2018 at 12:08 PM, Numan Siddique <nusiddiq at redhat.com>
wrote:

> Thanks Aliasgar,
>
> I am still facing the same issue.
>
> Can you also share the (ovn-ctl) commands you used to start/join the
> ovsdb-server clusters in your nodes ?
>
> Thanks
> Numan
>
>
> On Tue, Mar 27, 2018 at 11:04 PM, aginwala <aginwala at asu.edu> wrote:
>
>> Hi Numan:
>>
>> You need to use --db since you are now running the db in a cluster; you can
>> access data from any of the three dbs.
>>
>> So if the leader crashes, a new leader is elected from the other two. Below is
>> an example command sequence:
>>
>> # export remote="tcp:192.168.220.103:6641,tcp:192.168.220.102:6641,tcp:
>> 192.168.220.101:6641"
>> # kill -9 3985
>> # ovn-nbctl --db=$remote show
>> switch 1d86ab4e-c8bf-4747-a716-8832a285d58c (ls1)
>> # ovn-nbctl --db=$remote ls-del ls1
>>
>>
>>
>>
>>
>>
>>
>> Hope it helps!
>>
>> Regards,
>>
>>
>> On Tue, Mar 27, 2018 at 10:01 AM, Numan Siddique <nusiddiq at redhat.com>
>> wrote:
>>
>>> Hi Aliasgar,
>>>
>>> In your setup, if you kill the leader, what is the behaviour? Are you
>>> still able to create or delete any resources? Is a new leader elected?
>>>
>>> In my setup, the command "ovn-nbctl ls-add" for example blocks until I
>>> restart the ovsdb-server on node 1. And I don't see any other ovsdb-server
>>> becoming the leader. Maybe I have configured it wrongly.
>>> Could you please test this scenario, if you haven't yet, and let me know
>>> your observations if possible.
>>>
>>> Thanks
>>> Numan
>>>
>>>
>>> On Thu, Mar 22, 2018 at 12:28 PM, Han Zhou <zhouhan at gmail.com> wrote:
>>>
>>>> Sounds good.
>>>>
>>>> Just checked the patch: by default the C IDL has "leader_only" set to true,
>>>> which ensures that the connection is to the leader only. This is the case for
>>>> northd. So the lock works for northd's hot active-standby purpose if all the
>>>> ovsdb endpoints of a cluster are specified to northd, since all northds then
>>>> connect to the same DB, the leader.
>>>>
>>>> For neutron networking-ovn, this may not work yet, since I didn't see
>>>> such logic in the Python IDL in the current patch series. It would be good if
>>>> we added similar logic to the Python IDL. (@ben/numan, correct me if I am wrong)
>>>>
>>>>
>>>> On Wed, Mar 21, 2018 at 6:49 PM, aginwala <aginwala at asu.edu> wrote:
>>>>
>>>>> Hi :
>>>>>
>>>>> Just sorted out the correct settings and northd also works in ha in
>>>>> raft.
>>>>>
>>>>> There were 2 issues in the setup:
>>>>> 1. I had started the nb db without --db-nb-create-insecure-remote.
>>>>> 2. I also started northd locally on all 3 nodes without a remote, which meant
>>>>> all three northds were trying to lock the ovsdb locally.
>>>>>
>>>>> Hence, the duplicate entries were populated in the southbound datapath
>>>>> table due to multiple northds writing to their local copies.
>>>>>
>>>>> So, I now start nb db with --db-nb-create-insecure-remote and northd
>>>>> on all 3 nodes using below command:
>>>>>
>>>>> ovn-northd -vconsole:emer -vsyslog:err -vfile:info
>>>>> --ovnnb-db="tcp:10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641"
>>>>> --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642"
>>>>> --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log
>>>>> --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor
>>>>>
>>>>>
>>>>> #At start, northd went active on the leader node and standby on other
>>>>> two nodes.
>>>>>
>>>>> #After old leader crashed and new leader got elected, northd goes
>>>>> active on any of the remaining 2 nodes as per sample logs below from
>>>>> non-leader node:
>>>>> 2018-03-22T00:20:30.732Z|00023|ovn_northd|INFO|ovn-northd lock lost.
>>>>> This ovn-northd instance is now on standby.
>>>>> 2018-03-22T00:20:30.743Z|00024|ovn_northd|INFO|ovn-northd lock
>>>>> acquired. This ovn-northd instance is now active.
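>>>>>
>>>>> As a quick check (assuming the log path configured above), the currently
>>>>> active instance on each node can be spotted from its latest lock message:
>>>>> grep 'ovn-northd lock' /var/log/openvswitch/ovn-northd.log | tail -1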
>>>>>
>>>>> # Also ovn-controller works similar way if leader goes down and
>>>>> connects to any of the remaining 2 nodes:
>>>>> 2018-03-22T01:21:56.250Z|00029|ovsdb_idl|INFO|tcp:10.148.181.162:6642:
>>>>> clustered database server is disconnected from cluster; trying another
>>>>> server
>>>>> 2018-03-22T01:21:56.250Z|00030|reconnect|INFO|tcp:10.148.181.162:6642:
>>>>> connection attempt timed out
>>>>> 2018-03-22T01:21:56.250Z|00031|reconnect|INFO|tcp:10.148.181.162:6642:
>>>>> waiting 4 seconds before reconnect
>>>>> 2018-03-22T01:23:52.417Z|00043|reconnect|INFO|tcp:10.148.181.162:6642:
>>>>> connected
>>>>>
>>>>>
>>>>>
>>>>> The above settings will also work if we put all the nodes behind a VIP
>>>>> and update the ovn configs to use the VIP. So we don't need pacemaker
>>>>> explicitly for northd HA :).
>>>>>
>>>>> Since the setup is complete now, I will populate the same in scale
>>>>> test env and see how it behaves.
>>>>>
>>>>> @Numan: We can try the same with networking-ovn integration and see if
>>>>> we find anything weird there too. Not sure if you have any exclusive
>>>>> findings for this case.
>>>>>
>>>>> Let me know if something else is missed here.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> On Wed, Mar 21, 2018 at 2:50 PM, Han Zhou <zhouhan at gmail.com> wrote:
>>>>>
>>>>>> Ali, sorry if I misunderstand what you are saying, but pacemaker here
>>>>>> is for northd HA. pacemaker itself won't point to any ovsdb cluster node.
>>>>>> All northds can point to an LB VIP for the ovsdb cluster, so if a member of
>>>>>> the ovsdb cluster is down it won't have an impact on northd.
>>>>>>
>>>>>> Without clustering support of the ovsdb lock, I think this is what we
>>>>>> have now for northd HA. Please suggest if anyone has any other idea. Thanks
>>>>>> :)
>>>>>>
>>>>>> On Wed, Mar 21, 2018 at 1:12 PM, aginwala <aginwala at asu.edu> wrote:
>>>>>>
>>>>>>> :) The only thing is that while using pacemaker, if the node that
>>>>>>> pacemaker is pointing to is down, all the active/standby northd nodes have
>>>>>>> to be updated to a new node from the cluster. But I will dig in more to see
>>>>>>> what else I can find.
>>>>>>>
>>>>>>> @Ben: Any suggestions further?
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> On Wed, Mar 21, 2018 at 10:22 AM, Han Zhou <zhouhan at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 21, 2018 at 9:49 AM, aginwala <aginwala at asu.edu> wrote:
>>>>>>>>
>>>>>>>>> Thanks Numan:
>>>>>>>>>
>>>>>>>>> Yup, agreed on the locking part. For now, yes, I am running northd
>>>>>>>>> on one node. I might write a script to monitor northd in the cluster so that
>>>>>>>>> if the node where it's running goes down, the script can spin up northd on
>>>>>>>>> one of the other active nodes as a dirty hack.
>>>>>>>>>
>>>>>>>>> The "dirty hack" is pacemaker :)
>>>>>>>>
>>>>>>>>
>>>>>>>>> Sure, will wait for inputs from Ben too on this and see how
>>>>>>>>> complex it would be to roll out this feature.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique <
>>>>>>>>> nusiddiq at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Aliasgar,
>>>>>>>>>>
>>>>>>>>>> ovsdb-server maintains locks per connection and not across
>>>>>>>>>> the db. A workaround for you now would be to configure all the ovn-northd
>>>>>>>>>> instances to connect to one ovsdb-server if you want to have active/standby.
>>>>>>>>>>
>>>>>>>>>> Probably Ben can answer if there is a plan to support ovsdb locks
>>>>>>>>>> across the db. We also need this support in networking-ovn as it also uses
>>>>>>>>>> ovsdb locks.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Numan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 21, 2018 at 1:40 PM, aginwala <aginwala at asu.edu>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Numan:
>>>>>>>>>>>
>>>>>>>>>>> As I continued to test further, I just figured out that ovn-northd is
>>>>>>>>>>> running as active on all 3 nodes instead of one active instance, which
>>>>>>>>>>> results in db errors as per the logs.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> # on node 3, I run ovn-nbctl ls-add ls2; it populates the below
>>>>>>>>>>> logs in ovn-northd:
>>>>>>>>>>> 2018-03-21T06:01:59.442Z|00007|ovsdb_idl|WARN|transaction
>>>>>>>>>>> error: {"details":"Transaction causes multiple rows in \"Datapath_Binding\"
>>>>>>>>>>> table to have identical values (1) for index on column \"tunnel_key\".
>>>>>>>>>>> First row, with UUID 8c5d9342-2b90-4229-8ea1-001a733a915c, was
>>>>>>>>>>> inserted by this transaction.  Second row, with UUID
>>>>>>>>>>> 8e06f919-4cc7-4ffc-9a79-20ce6663b683, existed in the database
>>>>>>>>>>> before this transaction and was not modified by the
>>>>>>>>>>> transaction.","error":"constraint violation"}
>>>>>>>>>>>
>>>>>>>>>>> In the southbound datapath list, 2 duplicate records get created
>>>>>>>>>>> for the same switch.
>>>>>>>>>>>
>>>>>>>>>>> # ovn-sbctl list Datapath
>>>>>>>>>>> _uuid               : b270ae30-3458-445f-95d2-b14e8ebddd01
>>>>>>>>>>> external_ids        : {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
>>>>>>>>>>> name="ls2"}
>>>>>>>>>>> tunnel_key          : 2
>>>>>>>>>>>
>>>>>>>>>>> _uuid               : 8e06f919-4cc7-4ffc-9a79-20ce6663b683
>>>>>>>>>>> external_ids        : {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
>>>>>>>>>>> name="ls2"}
>>>>>>>>>>> tunnel_key          : 1
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> # on nodes 1 and 2 where northd is running, it gives the below error:
>>>>>>>>>>> 2018-03-21T06:01:59.437Z|00008|ovsdb_idl|WARN|transaction
>>>>>>>>>>> error: {"details":"cannot delete Datapath_Binding row
>>>>>>>>>>> 8e06f919-4cc7-4ffc-9a79-20ce6663b683 because of 17 remaining
>>>>>>>>>>> reference(s)","error":"referential integrity violation"}
>>>>>>>>>>>
>>>>>>>>>>> As per the commit message, for northd I re-tried setting
>>>>>>>>>>> --ovnnb-db="tcp:10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641"
>>>>>>>>>>> and --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642"
>>>>>>>>>>> and it did not help either.
>>>>>>>>>>>
>>>>>>>>>>> There is no issue if I keep running only one instance of northd
>>>>>>>>>>> on any of these 3 nodes. Hence, I wanted to know whether something
>>>>>>>>>>> else is missing here to make only one northd instance active and the
>>>>>>>>>>> rest standby?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Mar 15, 2018 at 3:09 AM, Numan Siddique <
>>>>>>>>>>> nusiddiq at redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> That's great
>>>>>>>>>>>>
>>>>>>>>>>>> Numan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Mar 15, 2018 at 2:57 AM, aginwala <aginwala at asu.edu>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Numan:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I tried on new nodes (kernel: 4.4.0-104-generic, Ubuntu
>>>>>>>>>>>>> 16.04) with a fresh installation and it worked fine for both
>>>>>>>>>>>>> sb and nb dbs. Seems like some kernel issue on the previous
>>>>>>>>>>>>> nodes when I re-installed the raft patch, as I was running a different
>>>>>>>>>>>>> ovs version on those nodes before.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> For 2 HVs, I now set ovn-remote="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642"
>>>>>>>>>>>>> and started the controller, and it works fine.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Did some failover testing by rebooting/killing the leader (
>>>>>>>>>>>>> 10.169.125.152) and bringing it back up and it works as
>>>>>>>>>>>>> expected. Nothing weird noted so far.
>>>>>>>>>>>>>
>>>>>>>>>>>>> # check-cluster gives the data below on one of the nodes
>>>>>>>>>>>>> (10.148.181.162) post leader failure
>>>>>>>>>>>>>
>>>>>>>>>>>>> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>> ovsdb-tool: leader /etc/openvswitch/ovnsb_db.db for term 2 has
>>>>>>>>>>>>> log entries only up to index 18446744073709551615, but index 9 was
>>>>>>>>>>>>> committed in a previous term (e.g. by /etc/openvswitch/ovnsb_db.db)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> For check-cluster, are we planning to add more output showing
>>>>>>>>>>>>> which node is the active one (the leader), etc. in upcoming versions?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks a ton for helping sort this out. I think the patch
>>>>>>>>>>>>> looks good to be merged once the comments from Justin are addressed,
>>>>>>>>>>>>> along with the man page details for ovsdb-tool.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I will do some more crash testing for the cluster along with
>>>>>>>>>>>>> the scale test and keep you posted if something unexpected is noted.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Mar 13, 2018 at 11:07 PM, Numan Siddique <
>>>>>>>>>>>>> nusiddiq at redhat.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Mar 14, 2018 at 7:51 AM, aginwala <aginwala at asu.edu>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sure.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To add on, I also ran the nb db too using a different port, and
>>>>>>>>>>>>>>> Node 2 crashes with the same error:
>>>>>>>>>>>>>>> # Node 2
>>>>>>>>>>>>>>> /usr/share/openvswitch/scripts/ovn-ctl
>>>>>>>>>>>>>>> --db-nb-addr=10.99.152.138 --db-nb-port=6641
>>>>>>>>>>>>>>> --db-nb-cluster-remote-addr="tcp:10.99.152.148:6645"
>>>>>>>>>>>>>>> --db-nb-cluster-local-addr="tcp:10.99.152.138:6645" start_nb_ovsdb
>>>>>>>>>>>>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnnb_db.db:
>>>>>>>>>>>>>>> cannot identify file type
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Aliasgar,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It worked for me. Can you delete the old db files in
>>>>>>>>>>>>>> /etc/openvswitch/ and try running the commands again?
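>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (For example, assuming the default paths and that the db servers are
>>>>>>>>>>>>>> stopped first, something like:
>>>>>>>>>>>>>> rm /etc/openvswitch/ovnnb_db.db /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>> should clear the stale standalone databases before re-running ovn-ctl.)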
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Below are the commands I ran in my setup.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Node 1
>>>>>>>>>>>>>> -------
>>>>>>>>>>>>>> sudo /usr/share/openvswitch/scripts/ovn-ctl
>>>>>>>>>>>>>> --db-sb-addr=192.168.121.91 --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>>>>>>>>>>>>>> --db-sb-cluster-local-addr=tcp:192.168.121.91:6644
>>>>>>>>>>>>>> start_sb_ovsdb
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Node 2
>>>>>>>>>>>>>> ---------
>>>>>>>>>>>>>> sudo /usr/share/openvswitch/scripts/ovn-ctl
>>>>>>>>>>>>>> --db-sb-addr=192.168.121.87 --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:192.168.121.87:6644"
>>>>>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644"
>>>>>>>>>>>>>> start_sb_ovsdb
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Node 3
>>>>>>>>>>>>>> ---------
>>>>>>>>>>>>>> sudo /usr/share/openvswitch/scripts/ovn-ctl
>>>>>>>>>>>>>> --db-sb-addr=192.168.121.78 --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:192.168.121.78:6644"
>>>>>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644"
>>>>>>>>>>>>>> start_sb_ovsdb
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> Numan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Mar 13, 2018 at 9:40 AM, Numan Siddique <
>>>>>>>>>>>>>>> nusiddiq at redhat.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Mar 13, 2018 at 9:46 PM, aginwala <aginwala at asu.edu
>>>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks Numan for the response.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> There is no command start_cluster_sb_ovsdb in the source
>>>>>>>>>>>>>>>>> code either. Is that in a separate commit somewhere? Hence, I used
>>>>>>>>>>>>>>>>> start_sb_ovsdb, which I think might not be the right choice?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sorry, I meant start_sb_ovsdb. Strange that it didn't work
>>>>>>>>>>>>>>>> for you. Let me try it out again and update this thread.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>> Numan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # Node 1 came up as expected.
>>>>>>>>>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642
>>>>>>>>>>>>>>>>> --db-sb-create-insecure-remote=yes
>>>>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.148:6644"
>>>>>>>>>>>>>>>>> start_sb_ovsdb
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # verifying it's a clustered db with ovsdb-tool:
>>>>>>>>>>>>>>>>> ovsdb-tool db-local-address /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>>> tcp:10.99.152.148:6644
>>>>>>>>>>>>>>>>> # ovn-sbctl show works fine and chassis are being
>>>>>>>>>>>>>>>>> populated correctly.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> #Node 2 fails with error:
>>>>>>>>>>>>>>>>> /usr/share/openvswitch/scripts/ovn-ctl
>>>>>>>>>>>>>>>>> --db-sb-addr=10.99.152.138 --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>>>>>>>>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
>>>>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644"
>>>>>>>>>>>>>>>>> start_sb_ovsdb
>>>>>>>>>>>>>>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db:
>>>>>>>>>>>>>>>>> cannot identify file type
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # So I started the sb db the usual way using start_ovsdb
>>>>>>>>>>>>>>>>> just to get the db file created, killed the sb pid, and re-ran the
>>>>>>>>>>>>>>>>> command, which gave the actual error where it complains about the
>>>>>>>>>>>>>>>>> join-cluster command that is called internally:
>>>>>>>>>>>>>>>>> /usr/share/openvswitch/scripts/ovn-ctl
>>>>>>>>>>>>>>>>> --db-sb-addr=10.99.152.138 --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>>>>>>>>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
>>>>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644"
>>>>>>>>>>>>>>>>> start_sb_ovsdb
>>>>>>>>>>>>>>>>> ovsdb-tool: /etc/openvswitch/ovnsb_db.db: not a clustered
>>>>>>>>>>>>>>>>> database
>>>>>>>>>>>>>>>>>  * Backing up database to /etc/openvswitch/ovnsb_db.db.backup1.15.0-70426956
>>>>>>>>>>>>>>>>> ovsdb-tool: 'join-cluster' command requires at least 4
>>>>>>>>>>>>>>>>> arguments
>>>>>>>>>>>>>>>>>  * Creating cluster database /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>>> from existing one
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # based on the above error, I killed the sb db pid again and
>>>>>>>>>>>>>>>>> tried to create a local cluster on the node, then re-ran the join
>>>>>>>>>>>>>>>>> operation as per the source code function:
>>>>>>>>>>>>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>>> OVN_Southbound tcp:10.99.152.138:6644 tcp:10.99.152.148:6644
>>>>>>>>>>>>>>>>> which still complains:
>>>>>>>>>>>>>>>>> ovsdb-tool: I/O error: /etc/openvswitch/ovnsb_db.db:
>>>>>>>>>>>>>>>>> create failed (File exists)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # Node 3: I did not try as I am assuming the same failure
>>>>>>>>>>>>>>>>> as node 2
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Let me know.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Mar 13, 2018 at 3:08 AM, Numan Siddique <
>>>>>>>>>>>>>>>>> nusiddiq at redhat.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Aliasgar,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Mar 13, 2018 at 7:11 AM, aginwala <
>>>>>>>>>>>>>>>>>> aginwala at asu.edu> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Ben/Noman:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I am trying to set up a 3-node southbound db cluster using
>>>>>>>>>>>>>>>>>>> raft10 <https://patchwork.ozlabs.org/patch/854298/> in
>>>>>>>>>>>>>>>>>>> review.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> # Node 1 create-cluster
>>>>>>>>>>>>>>>>>>> ovsdb-tool create-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>>>>> /root/ovs-reviews/ovn/ovn-sb.ovsschema tcp:10.99.152.148:6642
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> A different port is used for RAFT. So you have to choose
>>>>>>>>>>>>>>>>>> another port like 6644 for example.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> # Node 2
>>>>>>>>>>>>>>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>>>>> OVN_Southbound tcp:10.99.152.138:6642 tcp:10.99.152.148:6642
>>>>>>>>>>>>>>>>>>> --cid 5dfcb678-bb1d-4377-b02d-a380edec2982
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> #Node 3
>>>>>>>>>>>>>>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>>>>> OVN_Southbound tcp:10.99.152.101:6642 tcp:10.99.152.138:6642
>>>>>>>>>>>>>>>>>>> tcp:10.99.152.148:6642 --cid 5dfcb678-bb1d-4377-b02d-a380edec2982
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> # ovn remote is set to all 3 nodes
>>>>>>>>>>>>>>>>>>> external_ids:ovn-remote="tcp:10.99.152.148:6642, tcp:10.99.152.138:6642, tcp:10.99.152.101:6642"
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> # Starting sb db on node 1 using below command on node 1:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ovsdb-server --detach --monitor -vconsole:off -vraft
>>>>>>>>>>>>>>>>>>> -vjsonrpc --log-file=/var/log/openvswitch/ovsdb-server-sb.log
>>>>>>>>>>>>>>>>>>> --pidfile=/var/run/openvswitch/ovnsb_db.pid
>>>>>>>>>>>>>>>>>>> --remote=db:OVN_Southbound,SB_Global,connections
>>>>>>>>>>>>>>>>>>> --unixctl=ovnsb_db.ctl --private-key=db:OVN_Southbound,SSL,private_key
>>>>>>>>>>>>>>>>>>> --certificate=db:OVN_Southbound,SSL,certificate
>>>>>>>>>>>>>>>>>>> --ca-cert=db:OVN_Southbound,SSL,ca_cert
>>>>>>>>>>>>>>>>>>> --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
>>>>>>>>>>>>>>>>>>> --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers
>>>>>>>>>>>>>>>>>>> --remote=punix:/var/run/openvswitch/ovnsb_db.sock
>>>>>>>>>>>>>>>>>>> /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> # check-cluster is returning nothing
>>>>>>>>>>>>>>>>>>> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> # ovsdb-server-sb.log below shows the leader is elected
>>>>>>>>>>>>>>>>>>> with only one server and there are rbac related debug logs with rpc replies
>>>>>>>>>>>>>>>>>>> and empty params with no errors
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2018-03-13T01:12:02Z|00002|raft|DBG|server 63d1 added to configuration
>>>>>>>>>>>>>>>>>>> 2018-03-13T01:12:02Z|00003|raft|INFO|term 6: starting election
>>>>>>>>>>>>>>>>>>> 2018-03-13T01:12:02Z|00004|raft|INFO|term 6: elected leader by 1+ of 1 servers
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Now starting the ovsdb-server on the other cluster nodes
>>>>>>>>>>>>>>>>>>> fails saying
>>>>>>>>>>>>>>>>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db:
>>>>>>>>>>>>>>>>>>> cannot identify file type
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Also noticed that man ovsdb-tool is missing the cluster
>>>>>>>>>>>>>>>>>>> details. Might want to address it in the same patch or a different one.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Please advise on what is missing here for running
>>>>>>>>>>>>>>>>>>> ovn-sbctl show, as this command hangs.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think you can use the ovn-ctl command
>>>>>>>>>>>>>>>>>> "start_cluster_sb_ovsdb" for your testing (at least for now).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For your setup, I think you can start the cluster as
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> # Node 1
>>>>>>>>>>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642
>>>>>>>>>>>>>>>>>> --db-sb-create-insecure-remote=yes
>>>>>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.148:6644"
>>>>>>>>>>>>>>>>>> start_cluster_sb_ovsdb
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> # Node 2
>>>>>>>>>>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.138 --db-sb-port=6642
>>>>>>>>>>>>>>>>>> --db-sb-create-insecure-remote=yes
>>>>>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644"
>>>>>>>>>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
>>>>>>>>>>>>>>>>>> start_cluster_sb_ovsdb
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> # Node 3
>>>>>>>>>>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.101 --db-sb-port=6642
>>>>>>>>>>>>>>>>>> --db-sb-create-insecure-remote=yes
>>>>>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.101:6644"
>>>>>>>>>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
>>>>>>>>>>>>>>>>>> start_cluster_sb_ovsdb
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Let me know how it goes.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>> Numan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

