[ovs-discuss] raft ovsdb clustering

Numan Siddique nusiddiq at redhat.com
Tue Mar 27 19:08:36 UTC 2018


Thanks Aliasgar,

I am still facing the same issue.

Can you also share the (ovn-ctl) commands you used to start/join the
ovsdb-server clusters on your nodes?

Thanks
Numan


On Tue, Mar 27, 2018 at 11:04 PM, aginwala <aginwala at asu.edu> wrote:

> Hi Numan:
>
> You need to use --db since you are now running the db in a cluster; you
> can access data from any of the three dbs.
>
> So if the leader crashes, a new leader is elected from the other two.
> Below is an example command:
>
> # export remote="tcp:192.168.220.103:6641,tcp:192.168.220.102:6641,tcp:
> 192.168.220.101:6641"
> # kill -9 3985
> # ovn-nbctl --db=$remote show
> switch 1d86ab4e-c8bf-4747-a716-8832a285d58c (ls1)
> # ovn-nbctl --db=$remote ls-del ls1
>
>
>
>
>
>
>
> Hope it helps!
>
> Regards,
>
>
> On Tue, Mar 27, 2018 at 10:01 AM, Numan Siddique <nusiddiq at redhat.com>
> wrote:
>
>> Hi Aliasgar,
>>
>> In your setup, if you kill the leader, what is the behaviour? Are you
>> still able to create or delete any resources? Is a new leader elected?
>>
>> In my setup, the command "ovn-nbctl ls-add", for example, blocks until I
>> restart the ovsdb-server on node 1, and I don't see any other ovsdb-server
>> becoming the leader. Maybe I have configured something wrongly.
>> Could you please test this scenario, if you haven't already, and let me
>> know your observations.
>>
>> Thanks
>> Numan
>>
>>
>> On Thu, Mar 22, 2018 at 12:28 PM, Han Zhou <zhouhan at gmail.com> wrote:
>>
>>> Sounds good.
>>>
>>> Just checked the patch: by default the C IDL has "leader_only" set to
>>> true, which ensures that the connection goes to the leader only. This is
>>> the case for northd, so the lock works for northd's hot active-standby
>>> purpose when all the ovsdb endpoints of a cluster are specified to
>>> northd, since all northds then connect to the same DB, the leader.
>>>
>>> For neutron networking-ovn, this may not work yet, since I didn't see
>>> such logic in the Python IDL in the current patch series. It would be
>>> good to add similar logic to the Python IDL. (@ben/numan, correct me if
>>> I am wrong.)
>>>
>>>
>>> On Wed, Mar 21, 2018 at 6:49 PM, aginwala <aginwala at asu.edu> wrote:
>>>
>>>> Hi :
>>>>
>>>> Just sorted out the correct settings, and northd also works in HA with
>>>> raft.
>>>>
>>>> There were 2 issues in the setup:
>>>> 1. I had started the nb db without --db-nb-create-insecure-remote.
>>>> 2. I also started northd locally on all 3 nodes without a remote, so all
>>>> three northd instances were trying to take the ovsdb lock on their local
>>>> copy.
>>>>
>>>> Hence, the duplicate records were populated in the southbound datapath
>>>> table due to multiple northd instances writing to their local copies.
>>>>
>>>> So, I now start the nb db with --db-nb-create-insecure-remote, and
>>>> northd on all 3 nodes using the command below:
>>>>
>>>> ovn-northd -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db="tcp:
>>>> 10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641"
>>>> --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp:
>>>> 10.148.181.162:6642" --no-chdir --log-file=/var/log/openvswitch/ovn-northd.log
>>>> --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor
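>>>>
>>>> (For completeness, a sketch of how the nb db would be started on each
>>>> node with that flag, mirroring the sb ovn-ctl commands from earlier in
>>>> the thread and assuming 6645 as the NB raft port:
>>>> # Node 1
>>>> /usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=10.169.125.152
>>>> --db-nb-port=6641 --db-nb-create-insecure-remote=yes
>>>> --db-nb-cluster-local-addr="tcp:10.169.125.152:6645" start_nb_ovsdb
>>>> # Nodes 2 and 3: the same with their own addresses, plus
>>>> # --db-nb-cluster-remote-addr="tcp:10.169.125.152:6645"
>>>> )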
>>>>
>>>>
>>>> # At start, northd went active on the leader node and standby on the
>>>> other two nodes.
>>>>
>>>> # After the old leader crashed and a new leader got elected, northd goes
>>>> active on one of the remaining 2 nodes, as per the sample logs below from
>>>> a non-leader node:
>>>> 2018-03-22T00:20:30.732Z|00023|ovn_northd|INFO|ovn-northd lock lost.
>>>> This ovn-northd instance is now on standby.
>>>> 2018-03-22T00:20:30.743Z|00024|ovn_northd|INFO|ovn-northd lock
>>>> acquired. This ovn-northd instance is now active.
>>>>
>>>> # ovn-controller also works in a similar way: if the leader goes down,
>>>> it connects to one of the remaining 2 nodes:
>>>> 2018-03-22T01:21:56.250Z|00029|ovsdb_idl|INFO|tcp:10.148.181.162:6642:
>>>> clustered database server is disconnected from cluster; trying another
>>>> server
>>>> 2018-03-22T01:21:56.250Z|00030|reconnect|INFO|tcp:10.148.181.162:6642:
>>>> connection attempt timed out
>>>> 2018-03-22T01:21:56.250Z|00031|reconnect|INFO|tcp:10.148.181.162:6642:
>>>> waiting 4 seconds before reconnect
>>>> 2018-03-22T01:23:52.417Z|00043|reconnect|INFO|tcp:10.148.181.162:6642:
>>>> connected
>>>>
>>>>
>>>>
>>>> The above settings will also work if we put all the nodes behind a VIP
>>>> and update the OVN configs to use the VIP. So we don't need pacemaker
>>>> explicitly for northd HA :).
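>>>>
>>>> (Illustration only, with a hypothetical VIP of 10.169.125.200 in front
>>>> of the cluster; the per-node endpoint lists above would then collapse to
>>>> something like:
>>>> ovs-vsctl set open . external_ids:ovn-remote="tcp:10.169.125.200:6642"
>>>> ovn-northd ... --ovnnb-db="tcp:10.169.125.200:6641"
>>>> --ovnsb-db="tcp:10.169.125.200:6642" ...
>>>> )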
>>>>
>>>> Since the setup is complete now, I will replicate the same in the scale
>>>> test env and see how it behaves.
>>>>
>>>> @Numan: We can try the same with the networking-ovn integration and see
>>>> if we find anything weird there too. Not sure if you have any specific
>>>> findings for this case.
>>>>
>>>> Let me know if something else is missed here.
>>>>
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> On Wed, Mar 21, 2018 at 2:50 PM, Han Zhou <zhouhan at gmail.com> wrote:
>>>>
>>>>> Ali, sorry if I misunderstand what you are saying, but pacemaker here
>>>>> is for northd HA; pacemaker itself won't point to any ovsdb cluster node.
>>>>> All northds can point to an LB VIP for the ovsdb cluster, so if a member
>>>>> of the ovsdb cluster is down, it won't have an impact on northd.
>>>>>
>>>>> Without clustering support of the ovsdb lock, I think this is what we
>>>>> have now for northd HA. Please suggest if anyone has any other idea. Thanks
>>>>> :)
>>>>>
>>>>> On Wed, Mar 21, 2018 at 1:12 PM, aginwala <aginwala at asu.edu> wrote:
>>>>>
>>>>>> :) The only thing is that while using pacemaker, if the node that
>>>>>> pacemaker is pointing to goes down, all the active/standby northd nodes
>>>>>> have to be updated to point to a new node from the cluster. But I will
>>>>>> dig in more to see what else I can find.
>>>>>>
>>>>>> @Ben: Any suggestions further?
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> On Wed, Mar 21, 2018 at 10:22 AM, Han Zhou <zhouhan at gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 21, 2018 at 9:49 AM, aginwala <aginwala at asu.edu> wrote:
>>>>>>>
>>>>>>>> Thanks Numan:
>>>>>>>>
>>>>>>>> Yup, agree on the locking part. For now, yes, I am running northd on
>>>>>>>> one node. I might write a script to monitor northd in the cluster, so
>>>>>>>> that if the node where it's running goes down, the script can spin up
>>>>>>>> northd on one of the other active nodes as a dirty hack.
>>>>>>>>
>>>>>>>> The "dirty hack" is pacemaker :)
>>>>>>>
>>>>>>>
>>>>>>>> Sure, will await inputs from Ben on this too and see how complex it
>>>>>>>> would be to roll out this feature.
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique <
>>>>>>>> nusiddiq at redhat.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Aliasgar,
>>>>>>>>>
>>>>>>>>> ovsdb-server maintains locks per connection and not across the db.
>>>>>>>>> A workaround for you now would be to configure all the ovn-northd
>>>>>>>>> instances to connect to one ovsdb-server if you want to have
>>>>>>>>> active/standby.
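>>>>>>>>>
>>>>>>>>> (As a sketch of that workaround: on every node, point northd at the
>>>>>>>>> same single member, e.g. node 1, roughly
>>>>>>>>> ovn-northd --ovnnb-db="tcp:10.169.125.152:6641"
>>>>>>>>> --ovnsb-db="tcp:10.169.125.152:6642" ...
>>>>>>>>> so that all instances contend for the ovsdb lock on the same server.)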
>>>>>>>>>
>>>>>>>>> Probably Ben can answer if there is a plan to support ovsdb locks
>>>>>>>>> across the db. We also need this support in networking-ovn as it also uses
>>>>>>>>> ovsdb locks.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Numan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Mar 21, 2018 at 1:40 PM, aginwala <aginwala at asu.edu>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Numan:
>>>>>>>>>>
>>>>>>>>>> As I continued to test further, I just figured out that ovn-northd
>>>>>>>>>> is running as active on all 3 nodes instead of a single active
>>>>>>>>>> instance, which results in db errors as per the logs.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> # On node 3, I run "ovn-nbctl ls-add ls2"; it produces the below
>>>>>>>>>> logs in ovn-northd:
>>>>>>>>>> 2018-03-21T06:01:59.442Z|00007|ovsdb_idl|WARN|transaction error:
>>>>>>>>>> {"details":"Transaction causes multiple rows in \"Datapath_Binding\" table
>>>>>>>>>> to have identical values (1) for index on column \"tunnel_key\".  First
>>>>>>>>>> row, with UUID 8c5d9342-2b90-4229-8ea1-001a733a915c, was
>>>>>>>>>> inserted by this transaction.  Second row, with UUID
>>>>>>>>>> 8e06f919-4cc7-4ffc-9a79-20ce6663b683, existed in the database
>>>>>>>>>> before this transaction and was not modified by the
>>>>>>>>>> transaction.","error":"constraint violation"}
>>>>>>>>>>
>>>>>>>>>> In the southbound datapath list, 2 duplicate records get created
>>>>>>>>>> for the same switch.
>>>>>>>>>>
>>>>>>>>>> # ovn-sbctl list Datapath
>>>>>>>>>> _uuid               : b270ae30-3458-445f-95d2-b14e8ebddd01
>>>>>>>>>> external_ids        : {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
>>>>>>>>>> name="ls2"}
>>>>>>>>>> tunnel_key          : 2
>>>>>>>>>>
>>>>>>>>>> _uuid               : 8e06f919-4cc7-4ffc-9a79-20ce6663b683
>>>>>>>>>> external_ids        : {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d",
>>>>>>>>>> name="ls2"}
>>>>>>>>>> tunnel_key          : 1
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> # on nodes 1 and 2 where northd is running, it gives below error:
>>>>>>>>>> 2018-03-21T06:01:59.437Z|00008|ovsdb_idl|WARN|transaction error:
>>>>>>>>>> {"details":"cannot delete Datapath_Binding row
>>>>>>>>>> 8e06f919-4cc7-4ffc-9a79-20ce6663b683 because of 17 remaining
>>>>>>>>>> reference(s)","error":"referential integrity violation"}
>>>>>>>>>>
>>>>>>>>>> As per commit message, for northd I re-tried setting
>>>>>>>>>> --ovnnb-db="tcp:10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:
>>>>>>>>>> 10.148.181.162:6641"  and --ovnsb-db="tcp:10.169.125.152:6642
>>>>>>>>>> ,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642" and it did not
>>>>>>>>>> help either.
>>>>>>>>>>
>>>>>>>>>> There is no issue if I keep running only one instance of northd on
>>>>>>>>>> any of these 3 nodes. Hence, I wanted to know: is there something
>>>>>>>>>> else missing here to make only one northd instance active and the
>>>>>>>>>> rest standby?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> On Thu, Mar 15, 2018 at 3:09 AM, Numan Siddique <
>>>>>>>>>> nusiddiq at redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> That's great
>>>>>>>>>>>
>>>>>>>>>>> Numan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Mar 15, 2018 at 2:57 AM, aginwala <aginwala at asu.edu>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Numan:
>>>>>>>>>>>>
>>>>>>>>>>>> I tried on new nodes (kernel 4.4.0-104-generic, Ubuntu 16.04)
>>>>>>>>>>>> with a fresh installation, and it worked super fine for both the
>>>>>>>>>>>> sb and nb dbs. Seems like some kernel issue on the previous nodes
>>>>>>>>>>>> when I re-installed the raft patch, as I was running a different
>>>>>>>>>>>> ovs version on those nodes before.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> For 2 HVs, I now set ovn-remote="tcp:10.169.125.152:6642, tcp:
>>>>>>>>>>>> 10.169.125.131:6642, tcp:10.148.181.162:6642"  and started
>>>>>>>>>>>> controller and it works super fine.
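>>>>>>>>>>>>
>>>>>>>>>>>> (For reference, that is just the usual external_ids setting on
>>>>>>>>>>>> each HV; roughly:
>>>>>>>>>>>> ovs-vsctl set open . external_ids:ovn-remote="tcp:10.169.125.152:6642,
>>>>>>>>>>>> tcp:10.169.125.131:6642,tcp:10.148.181.162:6642"
>>>>>>>>>>>> )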
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Did some failover testing by rebooting/killing the leader (
>>>>>>>>>>>> 10.169.125.152) and bringing it back up and it works as
>>>>>>>>>>>> expected. Nothing weird noted so far.
>>>>>>>>>>>>
>>>>>>>>>>>> # check-cluster gives the below data on one of the nodes
>>>>>>>>>>>> (10.148.181.162) post leader failure:
>>>>>>>>>>>>
>>>>>>>>>>>> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>> ovsdb-tool: leader /etc/openvswitch/ovnsb_db.db for term 2 has
>>>>>>>>>>>> log entries only up to index 18446744073709551615, but index 9 was
>>>>>>>>>>>> committed in a previous term (e.g. by /etc/openvswitch/ovnsb_db.db)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> For check-cluster, are we planning to add more output showing
>>>>>>>>>>>> which node is active (the leader), etc., in upcoming versions?
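>>>>>>>>>>>>
>>>>>>>>>>>> (If the series already registers an appctl command for this, then
>>>>>>>>>>>> something like
>>>>>>>>>>>> ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/status
>>>>>>>>>>>> OVN_Southbound
>>>>>>>>>>>> on a member would show the role and the leader; I haven't verified
>>>>>>>>>>>> that this is in v10 though.)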
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks a ton for helping sort this out. I think the patch looks
>>>>>>>>>>>> good to be merged once the comments from Justin are addressed,
>>>>>>>>>>>> along with the man page details for ovsdb-tool.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I will do some more crash testing for the cluster along with
>>>>>>>>>>>> the scale test and keep you posted if something unexpected is noted.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Mar 13, 2018 at 11:07 PM, Numan Siddique <
>>>>>>>>>>>> nusiddiq at redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Mar 14, 2018 at 7:51 AM, aginwala <aginwala at asu.edu>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sure.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To add on, I also ran it for the nb db using a different port,
>>>>>>>>>>>>>> and Node 2 crashes with the same error:
>>>>>>>>>>>>>> # Node 2
>>>>>>>>>>>>>> /usr/share/openvswitch/scripts/ovn-ctl
>>>>>>>>>>>>>> --db-nb-addr=10.99.152.138 --db-nb-port=6641 --db-nb-cluster-remote-addr="t
>>>>>>>>>>>>>> cp:10.99.152.148:6645" --db-nb-cluster-local-addr="tcp:
>>>>>>>>>>>>>> 10.99.152.138:6645" start_nb_ovsdb
>>>>>>>>>>>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnnb_db.db:
>>>>>>>>>>>>>> cannot identify file type
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Aliasgar,
>>>>>>>>>>>>>
>>>>>>>>>>>>> It worked for me. Can you delete the old db files in
>>>>>>>>>>>>> /etc/openvswitch/ and try running the commands again ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Below are the commands I ran in my setup.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Node 1
>>>>>>>>>>>>> -------
>>>>>>>>>>>>> sudo /usr/share/openvswitch/scripts/ovn-ctl
>>>>>>>>>>>>> --db-sb-addr=192.168.121.91 --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>>>>>>>>>>>>> --db-sb-cluster-local-addr=tcp:192.168.121.91:6644
>>>>>>>>>>>>> start_sb_ovsdb
>>>>>>>>>>>>>
>>>>>>>>>>>>> Node 2
>>>>>>>>>>>>> ---------
>>>>>>>>>>>>> sudo /usr/share/openvswitch/scripts/ovn-ctl
>>>>>>>>>>>>> --db-sb-addr=192.168.121.87 --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:192.168.121.87:6644"
>>>>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644"
>>>>>>>>>>>>> start_sb_ovsdb
>>>>>>>>>>>>>
>>>>>>>>>>>>> Node 3
>>>>>>>>>>>>> ---------
>>>>>>>>>>>>> sudo /usr/share/openvswitch/scripts/ovn-ctl
>>>>>>>>>>>>> --db-sb-addr=192.168.121.78 --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:192.168.121.78:6644"
>>>>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644"
>>>>>>>>>>>>> start_sb_ovsdb
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>> Numan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Mar 13, 2018 at 9:40 AM, Numan Siddique <
>>>>>>>>>>>>>> nusiddiq at redhat.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Mar 13, 2018 at 9:46 PM, aginwala <aginwala at asu.edu>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks Numan for the response.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There is no start_cluster_sb_ovsdb command in the source code
>>>>>>>>>>>>>>>> either. Is that in a separate commit somewhere? Hence, I used
>>>>>>>>>>>>>>>> start_sb_ovsdb, which I think may not be the right choice?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sorry, I meant start_sb_ovsdb. Strange that it didn't work
>>>>>>>>>>>>>>> for you. Let me try it out again and update this thread.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> Numan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # Node1  came up as expected.
>>>>>>>>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642
>>>>>>>>>>>>>>>> --db-sb-create-insecure-remote=yes
>>>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.148:6644"
>>>>>>>>>>>>>>>> start_sb_ovsdb.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # Verifying it's a clustered db:
>>>>>>>>>>>>>>>> # ovsdb-tool db-local-address /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>> tcp:10.99.152.148:6644
>>>>>>>>>>>>>>>> # ovn-sbctl show works fine and chassis are being populated
>>>>>>>>>>>>>>>> correctly.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> #Node 2 fails with error:
>>>>>>>>>>>>>>>> /usr/share/openvswitch/scripts/ovn-ctl
>>>>>>>>>>>>>>>> --db-sb-addr=10.99.152.138 --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>>>>>>>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
>>>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644"
>>>>>>>>>>>>>>>> start_sb_ovsdb
>>>>>>>>>>>>>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db:
>>>>>>>>>>>>>>>> cannot identify file type
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # So I started the sb db the usual way using start_ovsdb just
>>>>>>>>>>>>>>>> to get the db file created, killed the sb pid, and re-ran the
>>>>>>>>>>>>>>>> command, which gave the actual error: it complains about the
>>>>>>>>>>>>>>>> join-cluster command that is being called internally.
>>>>>>>>>>>>>>>> /usr/share/openvswitch/scripts/ovn-ctl
>>>>>>>>>>>>>>>> --db-sb-addr=10.99.152.138 --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>>>>>>>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
>>>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644"
>>>>>>>>>>>>>>>> start_sb_ovsdb
>>>>>>>>>>>>>>>> ovsdb-tool: /etc/openvswitch/ovnsb_db.db: not a clustered
>>>>>>>>>>>>>>>> database
>>>>>>>>>>>>>>>>  * Backing up database to /etc/openvswitch/ovnsb_db.db.b
>>>>>>>>>>>>>>>> ackup1.15.0-70426956
>>>>>>>>>>>>>>>> ovsdb-tool: 'join-cluster' command requires at least 4
>>>>>>>>>>>>>>>> arguments
>>>>>>>>>>>>>>>>  * Creating cluster database /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>> from existing one
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # Based on the above error, I killed the sb db pid again,
>>>>>>>>>>>>>>>> tried to create a local cluster on the node, then re-ran the
>>>>>>>>>>>>>>>> join operation as per the source code function:
>>>>>>>>>>>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>> OVN_Southbound tcp:10.99.152.138:6644 tcp:
>>>>>>>>>>>>>>>> 10.99.152.148:6644 which still complains
>>>>>>>>>>>>>>>> ovsdb-tool: I/O error: /etc/openvswitch/ovnsb_db.db: create
>>>>>>>>>>>>>>>> failed (File exists)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # Node 3: I did not try as I am assuming the same failure
>>>>>>>>>>>>>>>> as node 2
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Let me know.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Mar 13, 2018 at 3:08 AM, Numan Siddique <
>>>>>>>>>>>>>>>> nusiddiq at redhat.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Aliasgar,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Mar 13, 2018 at 7:11 AM, aginwala <
>>>>>>>>>>>>>>>>> aginwala at asu.edu> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Ben/Numan:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I am trying to set up a 3-node southbound db cluster using
>>>>>>>>>>>>>>>>>> the raft v10 patch <https://patchwork.ozlabs.org/patch/854298/>
>>>>>>>>>>>>>>>>>> in review.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> # Node 1 create-cluster
>>>>>>>>>>>>>>>>>> ovsdb-tool create-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>>>> /root/ovs-reviews/ovn/ovn-sb.ovsschema tcp:
>>>>>>>>>>>>>>>>>> 10.99.152.148:6642
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> A different port is used for RAFT. So you have to choose
>>>>>>>>>>>>>>>>> another port like 6644 for example.
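>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> (So, keeping 6642 for clients and 6644 for RAFT, the first
>>>>>>>>>>>>>>>>> two commands would look roughly like:
>>>>>>>>>>>>>>>>> ovsdb-tool create-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>>> /root/ovs-reviews/ovn/ovn-sb.ovsschema tcp:10.99.152.148:6644
>>>>>>>>>>>>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>>> OVN_Southbound tcp:10.99.152.138:6644 tcp:10.99.152.148:6644
>>>>>>>>>>>>>>>>> )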
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> # Node 2
>>>>>>>>>>>>>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>>>> OVN_Southbound tcp:10.99.152.138:6642 tcp:
>>>>>>>>>>>>>>>>>> 10.99.152.148:6642 --cid 5dfcb678-bb1d-4377-b02d-a380ed
>>>>>>>>>>>>>>>>>> ec2982
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> #Node 3
>>>>>>>>>>>>>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>>>> OVN_Southbound tcp:10.99.152.101:6642 tcp:
>>>>>>>>>>>>>>>>>> 10.99.152.138:6642 tcp:10.99.152.148:6642 --cid
>>>>>>>>>>>>>>>>>> 5dfcb678-bb1d-4377-b02d-a380edec2982
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> # ovn remote is set to all 3 nodes
>>>>>>>>>>>>>>>>>> external_ids:ovn-remote="tcp:10.99.152.148:6642, tcp:
>>>>>>>>>>>>>>>>>> 10.99.152.138:6642, tcp:10.99.152.101:6642"
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> # Starting sb db on node 1 using below command on node 1:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ovsdb-server --detach --monitor -vconsole:off -vraft
>>>>>>>>>>>>>>>>>> -vjsonrpc --log-file=/var/log/openvswitch/ovsdb-server-sb.log
>>>>>>>>>>>>>>>>>> --pidfile=/var/run/openvswitch/ovnsb_db.pid
>>>>>>>>>>>>>>>>>> --remote=db:OVN_Southbound,SB_Global,connections
>>>>>>>>>>>>>>>>>> --unixctl=ovnsb_db.ctl --private-key=db:OVN_Southbound,SSL,private_key
>>>>>>>>>>>>>>>>>> --certificate=db:OVN_Southbound,SSL,certificate
>>>>>>>>>>>>>>>>>> --ca-cert=db:OVN_Southbound,SSL,ca_cert
>>>>>>>>>>>>>>>>>> --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
>>>>>>>>>>>>>>>>>> --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers
>>>>>>>>>>>>>>>>>> --remote=punix:/var/run/openvswitch/ovnsb_db.sock
>>>>>>>>>>>>>>>>>> /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> # check-cluster is returning nothing
>>>>>>>>>>>>>>>>>> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> # ovsdb-server-sb.log below shows the leader is elected
>>>>>>>>>>>>>>>>>> with only one server and there are rbac related debug logs with rpc replies
>>>>>>>>>>>>>>>>>> and empty params with no errors
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2018-03-13T01:12:02Z|00002|raft|DBG|server 63d1 added to
>>>>>>>>>>>>>>>>>> configuration
>>>>>>>>>>>>>>>>>> 2018-03-13T01:12:02Z|00003|raft|INFO|term 6: starting
>>>>>>>>>>>>>>>>>> election
>>>>>>>>>>>>>>>>>> 2018-03-13T01:12:02Z|00004|raft|INFO|term 6: elected
>>>>>>>>>>>>>>>>>> leader by 1+ of 1 servers
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Now starting the ovsdb-server on the other cluster members
>>>>>>>>>>>>>>>>>> fails, saying:
>>>>>>>>>>>>>>>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db:
>>>>>>>>>>>>>>>>>> cannot identify file type
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Also noticed that the ovsdb-tool man page is missing the
>>>>>>>>>>>>>>>>>> cluster details. You might want to address that in the same
>>>>>>>>>>>>>>>>>> patch or a different one.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please advise on what is missing here for running
>>>>>>>>>>>>>>>>>> "ovn-sbctl show", as this command hangs.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think you can use the ovn-ctl command
>>>>>>>>>>>>>>>>> "start_cluster_sb_ovsdb" for your testing (at least for now).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For your setup, I think you can start the cluster as
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # Node 1
>>>>>>>>>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642
>>>>>>>>>>>>>>>>> --db-sb-create-insecure-remote=yes
>>>>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.148:6644"
>>>>>>>>>>>>>>>>> start_cluster_sb_ovsdb
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # Node 2
>>>>>>>>>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.138 --db-sb-port=6642
>>>>>>>>>>>>>>>>> --db-sb-create-insecure-remote=yes
>>>>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644"
>>>>>>>>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
>>>>>>>>>>>>>>>>> start_cluster_sb_ovsdb
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> # Node 3
>>>>>>>>>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.101 --db-sb-port=6642
>>>>>>>>>>>>>>>>> --db-sb-create-insecure-remote=yes
>>>>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.101:6644"
>>>>>>>>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
>>>>>>>>>>>>>>>>> start_cluster_sb_ovsdb
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Let me know how it goes.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>> Numan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

