[ovs-discuss] raft ovsdb clustering

Numan Siddique nusiddiq at redhat.com
Tue Mar 27 17:01:43 UTC 2018


Hi Aliasgar,

In your setup, if you kill the leader, what is the behaviour? Are you
still able to create or delete resources? Is a new leader elected?

In my setup, the command "ovn-nbctl ls-add", for example, blocks until I
restart the ovsdb-server on node 1, and I don't see any other ovsdb-server
becoming leader. Maybe I have configured something wrongly.
Could you please test this scenario, if you haven't already, and let me know
your observations?
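
For reference, the rough sequence I have in mind (the cluster/status command
below is my assumption about what the patch exposes over unixctl, and the
endpoints are the ones from your setup, so adjust as needed):

# Ask any member which server is the leader (assuming a cluster/status
# unixctl command and the default ovn-ctl socket path):
ovs-appctl -t /var/run/openvswitch/ovnnb_db.ctl cluster/status OVN_Northbound

# Kill ovsdb-server on the leader node, then from another node try a write
# with all cluster endpoints listed:
ovn-nbctl --db="tcp:10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641" ls-add test-ls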

Thanks
Numan


On Thu, Mar 22, 2018 at 12:28 PM, Han Zhou <zhouhan at gmail.com> wrote:

> Sounds good.
>
> Just checked the patch: by default the C IDL has "leader_only" set to true,
> which ensures that the connection goes to the leader only. This is the case
> for northd. So the lock works for northd's active-standby purpose if all the
> ovsdb endpoints of the cluster are specified to northd, since all northd
> instances end up connecting to the same DB, the leader.
>
> For neutron networking-ovn, this may not work yet, since I didn't see such
> logic in the python IDL in the current patch series. It would be good if we
> added similar logic to the python IDL. (@ben/numan, correct me if I am wrong)
>
>
> On Wed, Mar 21, 2018 at 6:49 PM, aginwala <aginwala at asu.edu> wrote:
>
>> Hi :
>>
>> Just sorted out the correct settings, and northd also works in HA with RAFT.
>>
>> There were 2 issues in the setup:
>> 1. I had started the NB DB without --db-nb-create-insecure-remote.
>> 2. I also started northd locally on all 3 nodes without specifying remotes,
>> which meant all three northd instances were trying to take the ovsdb lock
>> locally.
>>
>> Hence, duplicate records were populated in the southbound Datapath_Binding
>> table because multiple northd instances were writing to their local copies.
>>
>> So I now start the NB DB with --db-nb-create-insecure-remote and run northd
>> on all 3 nodes using the command below:
>>
>> ovn-northd -vconsole:emer -vsyslog:err -vfile:info \
>>     --ovnnb-db="tcp:10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641" \
>>     --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642" \
>>     --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log \
>>     --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor
>>
>>
>> # At start, northd went active on the leader node and standby on the other
>> two nodes.
>>
>> # After the old leader crashed and a new leader got elected, northd goes
>> active on one of the remaining 2 nodes, as per the sample logs below from a
>> non-leader node:
>> 2018-03-22T00:20:30.732Z|00023|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
>> 2018-03-22T00:20:30.743Z|00024|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
>>
>> # ovn-controller works in a similar way: if the leader goes down, it
>> connects to one of the remaining 2 nodes:
>> 2018-03-22T01:21:56.250Z|00029|ovsdb_idl|INFO|tcp:10.148.181.162:6642: clustered database server is disconnected from cluster; trying another server
>> 2018-03-22T01:21:56.250Z|00030|reconnect|INFO|tcp:10.148.181.162:6642: connection attempt timed out
>> 2018-03-22T01:21:56.250Z|00031|reconnect|INFO|tcp:10.148.181.162:6642: waiting 4 seconds before reconnect
>> 2018-03-22T01:23:52.417Z|00043|reconnect|INFO|tcp:10.148.181.162:6642: connected
>>
>>
>>
>> The above settings will also work if we put all the nodes behind a VIP and
>> update the OVN configs to use the VIP, so we don't need pacemaker explicitly
>> for northd HA :).
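>>
>> For example, a rough sketch of the VIP-based setup (the VIP address
>> 10.169.125.200 below is just a placeholder):
>>
>> # Point ovn-controller at the VIP instead of listing every member:
>> ovs-vsctl set open_vswitch . external_ids:ovn-remote="tcp:10.169.125.200:6642"
>>
>> # And start northd against the VIP for both DBs:
>> ovn-northd -vconsole:emer -vsyslog:err -vfile:info \
>>     --ovnnb-db="tcp:10.169.125.200:6641" --ovnsb-db="tcp:10.169.125.200:6642" \
>>     --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log \
>>     --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor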
>>
>> Since the setup is complete now, I will populate the same in scale test
>> env and see how it behaves.
>>
>> @Numan: We can try the same with the networking-ovn integration and see if
>> we find anything weird there too. Not sure if you have any specific findings
>> for this case.
>>
>> Let me know if something else is missed here.
>>
>>
>>
>>
>> Regards,
>>
>> On Wed, Mar 21, 2018 at 2:50 PM, Han Zhou <zhouhan at gmail.com> wrote:
>>
>>> Ali, sorry if I misunderstand what you are saying, but pacemaker here is
>>> for northd HA; pacemaker itself won't point to any ovsdb cluster node. All
>>> northd instances can point to an LB VIP for the ovsdb cluster, so if a
>>> member of the ovsdb cluster goes down it won't have any impact on northd.
>>>
>>> Without clustering support for the ovsdb lock, I think this is what we have
>>> now for northd HA. Please suggest if anyone has any other idea. Thanks :)
>>>
>>> On Wed, Mar 21, 2018 at 1:12 PM, aginwala <aginwala at asu.edu> wrote:
>>>
>>>> :) The only thing is that while using pacemaker, if the node that
>>>> pacemaker is pointing to goes down, all the active/standby northd nodes
>>>> have to be updated to point to a new node from the cluster. But I will dig
>>>> in more to see what else I can find.
>>>>
>>>> @Ben: Any suggestions further?
>>>>
>>>>
>>>> Regards,
>>>>
>>>> On Wed, Mar 21, 2018 at 10:22 AM, Han Zhou <zhouhan at gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 21, 2018 at 9:49 AM, aginwala <aginwala at asu.edu> wrote:
>>>>>
>>>>>> Thanks Numan:
>>>>>>
>>>>>> Yup, agree with the locking part. For now, yes, I am running northd on
>>>>>> one node. I might write a script to monitor northd in the cluster so
>>>>>> that if the node where it's running goes down, the script can spin up
>>>>>> northd on one of the other active nodes as a dirty hack.
>>>>>>
>>>>>> The "dirty hack" is pacemaker :)
>>>>>
>>>>>
>>>>>> Sure, will wait for inputs from Ben too on this and see how complex it
>>>>>> would be to roll out this feature.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique <nusiddiq at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Aliasgar,
>>>>>>>
>>>>>>> ovsdb-server maintains locks per connection and not across the DB. A
>>>>>>> workaround for you now would be to configure all the ovn-northd
>>>>>>> instances to connect to one ovsdb-server if you want active/standby.
>>>>>>>
>>>>>>> Probably Ben can answer whether there is a plan to support ovsdb locks
>>>>>>> across the clustered DB. We also need this support in networking-ovn as
>>>>>>> it also uses ovsdb locks.
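>>>>>>>
>>>>>>> As a minimal sketch of that workaround, every northd instance would be
>>>>>>> started against the same single server (using node 1 from your setup as
>>>>>>> an example):
>>>>>>>
>>>>>>> ovn-northd --ovnnb-db="tcp:10.169.125.152:6641" \
>>>>>>>     --ovnsb-db="tcp:10.169.125.152:6642" \
>>>>>>>     --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log \
>>>>>>>     --detach --monitor
>>>>>>>
>>>>>>> Since all northd instances then talk to the same ovsdb-server, its
>>>>>>> per-server lock keeps one instance active and the rest standby.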
>>>>>>>
>>>>>>> Thanks
>>>>>>> Numan
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 21, 2018 at 1:40 PM, aginwala <aginwala at asu.edu> wrote:
>>>>>>>
>>>>>>>> Hi Numan:
>>>>>>>>
>>>>>>>> Just figured out that ovn-northd is running as active on all 3 nodes
>>>>>>>> instead of one active instance as I continued to test further, which
>>>>>>>> results in DB errors as per the logs.
>>>>>>>>
>>>>>>>>
>>>>>>>> # On node 3, I run "ovn-nbctl ls-add ls2"; it populates the below
>>>>>>>> logs in ovn-northd:
>>>>>>>> 2018-03-21T06:01:59.442Z|00007|ovsdb_idl|WARN|transaction error:
>>>>>>>> {"details":"Transaction causes multiple rows in \"Datapath_Binding\" table
>>>>>>>> to have identical values (1) for index on column \"tunnel_key\".  First
>>>>>>>> row, with UUID 8c5d9342-2b90-4229-8ea1-001a733a915c, was inserted
>>>>>>>> by this transaction.  Second row, with UUID 8e06f919-4cc7-4ffc-9a79-20ce6663b683,
>>>>>>>> existed in the database before this transaction and was not modified by the
>>>>>>>> transaction.","error":"constraint violation"}
>>>>>>>>
>>>>>>>> In the southbound datapath list, 2 duplicate records get created for
>>>>>>>> the same switch.
>>>>>>>>
>>>>>>>> # ovn-sbctl list Datapath
>>>>>>>> _uuid               : b270ae30-3458-445f-95d2-b14e8ebddd01
>>>>>>>> external_ids        : {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d", name="ls2"}
>>>>>>>> tunnel_key          : 2
>>>>>>>>
>>>>>>>> _uuid               : 8e06f919-4cc7-4ffc-9a79-20ce6663b683
>>>>>>>> external_ids        : {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d", name="ls2"}
>>>>>>>> tunnel_key          : 1
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> # On nodes 1 and 2, where northd is running, it gives the below error:
>>>>>>>> 2018-03-21T06:01:59.437Z|00008|ovsdb_idl|WARN|transaction error:
>>>>>>>> {"details":"cannot delete Datapath_Binding row
>>>>>>>> 8e06f919-4cc7-4ffc-9a79-20ce6663b683 because of 17 remaining
>>>>>>>> reference(s)","error":"referential integrity violation"}
>>>>>>>>
>>>>>>>> As per the commit message, for northd I re-tried setting
>>>>>>>> --ovnnb-db="tcp:10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641"
>>>>>>>> and --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642"
>>>>>>>> and it did not help either.
>>>>>>>>
>>>>>>>> There is no issue if I keep running only one instance of northd on any
>>>>>>>> of these 3 nodes. Hence, I wanted to know: is there something else
>>>>>>>> missing here to make only one northd instance active and the rest
>>>>>>>> standby?
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> On Thu, Mar 15, 2018 at 3:09 AM, Numan Siddique <
>>>>>>>> nusiddiq at redhat.com> wrote:
>>>>>>>>
>>>>>>>>> That's great
>>>>>>>>>
>>>>>>>>> Numan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Mar 15, 2018 at 2:57 AM, aginwala <aginwala at asu.edu>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Numan:
>>>>>>>>>>
>>>>>>>>>> I tried on new nodes (kernel: 4.4.0-104-generic, Ubuntu 16.04) with
>>>>>>>>>> a fresh installation and it worked fine for both SB and NB DBs.
>>>>>>>>>> Seems like some kernel issue on the previous nodes when I
>>>>>>>>>> re-installed the RAFT patch, as I was running a different OVS version
>>>>>>>>>> on those nodes before.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> For 2 HVs, I now set
>>>>>>>>>> ovn-remote="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642"
>>>>>>>>>> and started ovn-controller, and it works fine.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Did some failover testing by rebooting/killing the leader
>>>>>>>>>> (10.169.125.152) and bringing it back up, and it works as expected.
>>>>>>>>>> Nothing weird noted so far.
>>>>>>>>>>
>>>>>>>>>> # check-cluster gives the below output on one of the nodes
>>>>>>>>>> (10.148.181.162) after the leader failure:
>>>>>>>>>>
>>>>>>>>>> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>> ovsdb-tool: leader /etc/openvswitch/ovnsb_db.db for term 2 has log entries only up to index 18446744073709551615, but index 9 was committed in a previous term (e.g. by /etc/openvswitch/ovnsb_db.db)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> For check-cluster, are we planning to add more output showing which
>>>>>>>>>> node is active (the leader), etc. in upcoming versions?
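>>>>>>>>>>
>>>>>>>>>> (For what it's worth, a rough sketch of how I'd expect to query the
>>>>>>>>>> leader at runtime, assuming the patch exposes a cluster/status
>>>>>>>>>> unixctl command on the clustered ovsdb-server; the socket path below
>>>>>>>>>> is from the default ovn-ctl layout:)
>>>>>>>>>>
>>>>>>>>>> ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/status OVN_Southbound
>>>>>>>>>> # expected to report this server's role (leader/follower), term, and
>>>>>>>>>> # the cluster membership, if the command is available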
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks a ton for helping sort this out. I think the patch looks good
>>>>>>>>>> to be merged once the comments from Justin are addressed, along with
>>>>>>>>>> the man page details for ovsdb-tool.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I will do some more crash testing of the cluster along with the
>>>>>>>>>> scale test and keep you posted if anything unexpected is noted.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 13, 2018 at 11:07 PM, Numan Siddique <
>>>>>>>>>> nusiddiq at redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 14, 2018 at 7:51 AM, aginwala <aginwala at asu.edu>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Sure.
>>>>>>>>>>>>
>>>>>>>>>>>> To add on, I also ran it for the NB DB using a different port, and
>>>>>>>>>>>> Node 2 crashes with the same error:
>>>>>>>>>>>> # Node 2
>>>>>>>>>>>> /usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=10.99.152.138 \
>>>>>>>>>>>>     --db-nb-port=6641 \
>>>>>>>>>>>>     --db-nb-cluster-remote-addr="tcp:10.99.152.148:6645" \
>>>>>>>>>>>>     --db-nb-cluster-local-addr="tcp:10.99.152.138:6645" \
>>>>>>>>>>>>     start_nb_ovsdb
>>>>>>>>>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnnb_db.db: cannot identify file type
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> Hi Aliasgar,
>>>>>>>>>>>
>>>>>>>>>>> It worked for me. Can you delete the old db files in
>>>>>>>>>>> /etc/openvswitch/ and try running the commands again?
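>>>>>>>>>>>
>>>>>>>>>>> Something along these lines (assuming the DBs are stopped first and
>>>>>>>>>>> the default /etc/openvswitch paths):
>>>>>>>>>>>
>>>>>>>>>>> rm -f /etc/openvswitch/ovnnb_db.db /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>> # then re-run the ovn-ctl commands below so the clustered DB files
>>>>>>>>>>> # are created fresh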
>>>>>>>>>>>
>>>>>>>>>>> Below are the commands I ran in my setup.
>>>>>>>>>>>
>>>>>>>>>>> Node 1
>>>>>>>>>>> -------
>>>>>>>>>>> sudo /usr/share/openvswitch/scripts/ovn-ctl \
>>>>>>>>>>>     --db-sb-addr=192.168.121.91 --db-sb-port=6642 \
>>>>>>>>>>>     --db-sb-create-insecure-remote=yes \
>>>>>>>>>>>     --db-sb-cluster-local-addr="tcp:192.168.121.91:6644" \
>>>>>>>>>>>     start_sb_ovsdb
>>>>>>>>>>>
>>>>>>>>>>> Node 2
>>>>>>>>>>> ---------
>>>>>>>>>>> sudo /usr/share/openvswitch/scripts/ovn-ctl \
>>>>>>>>>>>     --db-sb-addr=192.168.121.87 --db-sb-port=6642 \
>>>>>>>>>>>     --db-sb-create-insecure-remote=yes \
>>>>>>>>>>>     --db-sb-cluster-local-addr="tcp:192.168.121.87:6644" \
>>>>>>>>>>>     --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644" \
>>>>>>>>>>>     start_sb_ovsdb
>>>>>>>>>>>
>>>>>>>>>>> Node 3
>>>>>>>>>>> ---------
>>>>>>>>>>> sudo /usr/share/openvswitch/scripts/ovn-ctl \
>>>>>>>>>>>     --db-sb-addr=192.168.121.78 --db-sb-port=6642 \
>>>>>>>>>>>     --db-sb-create-insecure-remote=yes \
>>>>>>>>>>>     --db-sb-cluster-local-addr="tcp:192.168.121.78:6644" \
>>>>>>>>>>>     --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644" \
>>>>>>>>>>>     start_sb_ovsdb
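>>>>>>>>>>>
>>>>>>>>>>> Once all three are up, the cluster state on each node can be
>>>>>>>>>>> sanity-checked with the ovsdb-tool commands from the patch, for
>>>>>>>>>>> example (command names as I understand them from the series):
>>>>>>>>>>>
>>>>>>>>>>> ovsdb-tool db-cid /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>> ovsdb-tool db-local-address /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db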
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Numan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Mar 13, 2018 at 9:40 AM, Numan Siddique <
>>>>>>>>>>>> nusiddiq at redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Mar 13, 2018 at 9:46 PM, aginwala <aginwala at asu.edu>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks Numan for the response.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There is no start_cluster_sb_ovsdb command in the source code
>>>>>>>>>>>>>> either. Is that in a separate commit somewhere? Hence, I used
>>>>>>>>>>>>>> start_sb_ovsdb, which I think might not be the right choice?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry, I meant start_sb_ovsdb. Strange that it didn't work for
>>>>>>>>>>>>> you. Let me try it out again and update this thread.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> Numan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # Node 1 came up as expected.
>>>>>>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642 \
>>>>>>>>>>>>>>     --db-sb-create-insecure-remote=yes \
>>>>>>>>>>>>>>     --db-sb-cluster-local-addr="tcp:10.99.152.148:6644" \
>>>>>>>>>>>>>>     start_sb_ovsdb
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # Verifying it's a clustered DB with ovsdb-tool:
>>>>>>>>>>>>>> ovsdb-tool db-local-address /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>> tcp:10.99.152.148:6644
>>>>>>>>>>>>>> # ovn-sbctl show works fine and chassis are being populated
>>>>>>>>>>>>>> correctly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # Node 2 fails with an error:
>>>>>>>>>>>>>> /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138 \
>>>>>>>>>>>>>>     --db-sb-port=6642 --db-sb-create-insecure-remote=yes \
>>>>>>>>>>>>>>     --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" \
>>>>>>>>>>>>>>     --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" \
>>>>>>>>>>>>>>     start_sb_ovsdb
>>>>>>>>>>>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db: cannot identify file type
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # So I started the SB DB the usual way using start_ovsdb just to
>>>>>>>>>>>>>> get the DB file created, killed the SB pid, and re-ran the
>>>>>>>>>>>>>> command, which gave the actual error where it complains about the
>>>>>>>>>>>>>> join-cluster command that is being called internally:
>>>>>>>>>>>>>> /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138 \
>>>>>>>>>>>>>>     --db-sb-port=6642 --db-sb-create-insecure-remote=yes \
>>>>>>>>>>>>>>     --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" \
>>>>>>>>>>>>>>     --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" \
>>>>>>>>>>>>>>     start_sb_ovsdb
>>>>>>>>>>>>>> ovsdb-tool: /etc/openvswitch/ovnsb_db.db: not a clustered database
>>>>>>>>>>>>>>  * Backing up database to /etc/openvswitch/ovnsb_db.db.backup1.15.0-70426956
>>>>>>>>>>>>>> ovsdb-tool: 'join-cluster' command requires at least 4 arguments
>>>>>>>>>>>>>>  * Creating cluster database /etc/openvswitch/ovnsb_db.db from existing one
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # Based on the above error, I killed the SB DB pid again and
>>>>>>>>>>>>>> tried to create a local cluster on the node, then re-ran the join
>>>>>>>>>>>>>> operation as per the source code function:
>>>>>>>>>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db \
>>>>>>>>>>>>>>     OVN_Southbound tcp:10.99.152.138:6644 tcp:10.99.152.148:6644
>>>>>>>>>>>>>> which still complains:
>>>>>>>>>>>>>> ovsdb-tool: I/O error: /etc/openvswitch/ovnsb_db.db: create failed (File exists)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # Node 3: I did not try as I am assuming the same failure as
>>>>>>>>>>>>>> node 2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Let me know.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Mar 13, 2018 at 3:08 AM, Numan Siddique <
>>>>>>>>>>>>>> nusiddiq at redhat.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Aliasgar,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Mar 13, 2018 at 7:11 AM, aginwala <aginwala at asu.edu>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Ben/Noman:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am trying to set up a 3-node southbound DB cluster using the
>>>>>>>>>>>>>>>> raft10 patch <https://patchwork.ozlabs.org/patch/854298/> under
>>>>>>>>>>>>>>>> review.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # Node 1: create-cluster
>>>>>>>>>>>>>>>> ovsdb-tool create-cluster /etc/openvswitch/ovnsb_db.db \
>>>>>>>>>>>>>>>>     /root/ovs-reviews/ovn/ovn-sb.ovsschema tcp:10.99.152.148:6642
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A different port is used for RAFT, so you have to choose another
>>>>>>>>>>>>>>> port, like 6644 for example.
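>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For example, node 1's create-cluster would then look something
>>>>>>>>>>>>>>> like this, keeping 6642 as the client remote and using 6644 as
>>>>>>>>>>>>>>> the RAFT/cluster address:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ovsdb-tool create-cluster /etc/openvswitch/ovnsb_db.db \
>>>>>>>>>>>>>>>     /root/ovs-reviews/ovn/ovn-sb.ovsschema tcp:10.99.152.148:6644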
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # Node 2
>>>>>>>>>>>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound \
>>>>>>>>>>>>>>>>     tcp:10.99.152.138:6642 tcp:10.99.152.148:6642 \
>>>>>>>>>>>>>>>>     --cid 5dfcb678-bb1d-4377-b02d-a380edec2982
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # Node 3
>>>>>>>>>>>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound \
>>>>>>>>>>>>>>>>     tcp:10.99.152.101:6642 tcp:10.99.152.138:6642 tcp:10.99.152.148:6642 \
>>>>>>>>>>>>>>>>     --cid 5dfcb678-bb1d-4377-b02d-a380edec2982
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # ovn-remote is set to all 3 nodes
>>>>>>>>>>>>>>>> external_ids:ovn-remote="tcp:10.99.152.148:6642,tcp:10.99.152.138:6642,tcp:10.99.152.101:6642"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # Starting the SB DB on node 1 using the below command:
>>>>>>>>>>>>>>>> ovsdb-server --detach --monitor -vconsole:off -vraft -vjsonrpc \
>>>>>>>>>>>>>>>>     --log-file=/var/log/openvswitch/ovsdb-server-sb.log \
>>>>>>>>>>>>>>>>     --pidfile=/var/run/openvswitch/ovnsb_db.pid \
>>>>>>>>>>>>>>>>     --remote=db:OVN_Southbound,SB_Global,connections \
>>>>>>>>>>>>>>>>     --unixctl=ovnsb_db.ctl \
>>>>>>>>>>>>>>>>     --private-key=db:OVN_Southbound,SSL,private_key \
>>>>>>>>>>>>>>>>     --certificate=db:OVN_Southbound,SSL,certificate \
>>>>>>>>>>>>>>>>     --ca-cert=db:OVN_Southbound,SSL,ca_cert \
>>>>>>>>>>>>>>>>     --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols \
>>>>>>>>>>>>>>>>     --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers \
>>>>>>>>>>>>>>>>     --remote=punix:/var/run/openvswitch/ovnsb_db.sock \
>>>>>>>>>>>>>>>>     /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # check-cluster is returning nothing
>>>>>>>>>>>>>>>> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # The ovsdb-server-sb.log below shows that a leader is elected
>>>>>>>>>>>>>>>> with only one server; there are RBAC-related debug logs with RPC
>>>>>>>>>>>>>>>> replies and empty params, and no errors:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2018-03-13T01:12:02Z|00002|raft|DBG|server 63d1 added to configuration
>>>>>>>>>>>>>>>> 2018-03-13T01:12:02Z|00003|raft|INFO|term 6: starting election
>>>>>>>>>>>>>>>> 2018-03-13T01:12:02Z|00004|raft|INFO|term 6: elected leader by 1+ of 1 servers
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Now starting the ovsdb-server on the other cluster nodes fails with:
>>>>>>>>>>>>>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db: cannot identify file type
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also noticed that "man ovsdb-tool" is missing the cluster
>>>>>>>>>>>>>>>> details. Might want to address that in the same patch or a
>>>>>>>>>>>>>>>> different one.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please advise on what is missing here for running "ovn-sbctl
>>>>>>>>>>>>>>>> show", as this command hangs.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think you can use the ovn-ctl command "start_cluster_sb_ovsdb"
>>>>>>>>>>>>>>> for your testing (at least for now).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For your setup, I think you can start the cluster as follows:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # Node 1
>>>>>>>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642 \
>>>>>>>>>>>>>>>     --db-sb-create-insecure-remote=yes \
>>>>>>>>>>>>>>>     --db-sb-cluster-local-addr="tcp:10.99.152.148:6644" \
>>>>>>>>>>>>>>>     start_cluster_sb_ovsdb
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # Node 2
>>>>>>>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.138 --db-sb-port=6642 \
>>>>>>>>>>>>>>>     --db-sb-create-insecure-remote=yes \
>>>>>>>>>>>>>>>     --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" \
>>>>>>>>>>>>>>>     --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" \
>>>>>>>>>>>>>>>     start_cluster_sb_ovsdb
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # Node 3
>>>>>>>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.101 --db-sb-port=6642 \
>>>>>>>>>>>>>>>     --db-sb-create-insecure-remote=yes \
>>>>>>>>>>>>>>>     --db-sb-cluster-local-addr="tcp:10.99.152.101:6644" \
>>>>>>>>>>>>>>>     --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" \
>>>>>>>>>>>>>>>     start_cluster_sb_ovsdb
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Let me know how it goes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> Numan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

