[ovs-discuss] raft ovsdb clustering

Numan Siddique nusiddiq at redhat.com
Thu Mar 15 10:09:55 UTC 2018


That's great

Numan


On Thu, Mar 15, 2018 at 2:57 AM, aginwala <aginwala at asu.edu> wrote:

> Hi Numan:
>
> I tried on new nodes (kernel: 4.4.0-104-generic, Ubuntu 16.04) with a fresh
> installation and it worked super fine for both the sb and nb dbs. It seems
> there was some kernel issue on the previous nodes when I re-installed the
> raft patch, as I was running a different ovs version on those nodes before.
>
>
> For 2 HVs, I now set
> ovn-remote="tcp:10.169.125.152:6642, tcp:10.169.125.131:6642, tcp:10.148.181.162:6642"
> and started the controller, and it works super fine.
>
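For reference, a minimal sketch of assembling the ovn-remote value quoted above from a list of southbound addresses. It assumes the spaces and the line break in the quoted value are mail-wrapping artifacts and that a plain comma-separated list is intended. The variable names are illustrative, and the ovs-vsctl line is left as a comment because it needs a running Open vSwitch on the hypervisor.

```shell
#!/bin/sh
# Southbound DB endpoints quoted above (client port 6642, not the RAFT port).
SB_NODES="10.169.125.152 10.169.125.131 10.148.181.162"
SB_PORT=6642

remote=""
for ip in $SB_NODES; do
    # Append "tcp:IP:PORT"; add a comma only when the list is non-empty.
    remote="${remote:+$remote,}tcp:$ip:$SB_PORT"
done

echo "$remote"
# To apply on a hypervisor (assumes ovs-vsctl can reach the local OVS db):
#   ovs-vsctl set open_vswitch . external_ids:ovn-remote="$remote"
```

Building the string in a loop keeps stray whitespace out of the value, which is easy to introduce when copying it from a wrapped mail.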
>
> Did some failover testing by rebooting/killing the leader (10.169.125.152)
> and bringing it back up and it works as expected. Nothing weird noted so
> far.
>
> # check-cluster gives the data below on one of the nodes (10.148.181.162)
> after the leader failure
>
> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
> ovsdb-tool: leader /etc/openvswitch/ovnsb_db.db for term 2 has log entries
> only up to index 18446744073709551615, but index 9 was committed in a
> previous term (e.g. by /etc/openvswitch/ovnsb_db.db)
>
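A side note on the index in the check-cluster message above: 18446744073709551615 is 2^64 - 1, the largest unsigned 64-bit value. The quick check below shows that it is all ones in hexadecimal. Reading it as a sentinel or an unsigned wraparound rather than a real log index is an assumption on my part; the tool output does not say so.

```shell
#!/bin/sh
# Index reported by ovsdb-tool check-cluster in the message above.
idx=18446744073709551615

# Print it in hexadecimal; sixteen 'f' digits means all 64 bits are set.
hex=$(printf '%x' "$idx")
echo "$hex"
```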
>
> For check-cluster, are we planning to add more output showing which node
> is active (leader), etc. in upcoming versions?
>
>
> Thanks a ton for helping sort this out.  I think the patch looks good to
> be merged once Justin's comments are addressed, along with the man page
> details for ovsdb-tool.
>
>
> I will do some more crash testing for the cluster along with the scale
> test and keep you posted if something unexpected is noted.
>
>
>
> Regards,
>
>
>
> On Tue, Mar 13, 2018 at 11:07 PM, Numan Siddique <nusiddiq at redhat.com>
> wrote:
>
>>
>>
>> On Wed, Mar 14, 2018 at 7:51 AM, aginwala <aginwala at asu.edu> wrote:
>>
>>> Sure.
>>>
>>> To add on, I also ran it for the nb db using a different port, and Node 2
>>> crashes with the same error:
>>> # Node 2
>>> /usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=10.99.152.138
>>> --db-nb-port=6641 --db-nb-cluster-remote-addr="tcp:10.99.152.148:6645"
>>> --db-nb-cluster-local-addr="tcp:10.99.152.138:6645" start_nb_ovsdb
>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnnb_db.db: cannot identify
>>> file type
>>>
>>>
>>>
>> Hi Aliasgar,
>>
>> It worked for me. Can you delete the old db files in /etc/openvswitch/
>> and try running the commands again ?
>>
>> Below are the commands I ran in my setup.
>>
>> Node 1
>> -------
>> sudo /usr/share/openvswitch/scripts/ovn-ctl  --db-sb-addr=192.168.121.91
>> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>> --db-sb-cluster-local-addr=tcp:192.168.121.91:6644 start_sb_ovsdb
>>
>> Node 2
>> ---------
>> sudo /usr/share/openvswitch/scripts/ovn-ctl  --db-sb-addr=192.168.121.87
>> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>> --db-sb-cluster-local-addr="tcp:192.168.121.87:6644"
>> --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644"  start_sb_ovsdb
>>
>> Node 3
>> ---------
>> sudo /usr/share/openvswitch/scripts/ovn-ctl  --db-sb-addr=192.168.121.78
>> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>> --db-sb-cluster-local-addr="tcp:192.168.121.78:6644"
>> --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644"  start_sb_ovsdb
>>
>>
>>
>> Thanks
>> Numan
>>
>>
>>
>>
>>
>>>
>>> On Tue, Mar 13, 2018 at 9:40 AM, Numan Siddique <nusiddiq at redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Mar 13, 2018 at 9:46 PM, aginwala <aginwala at asu.edu> wrote:
>>>>
>>>>> Thanks Numan for the response.
>>>>>
>>>>> There is no start_cluster_sb_ovsdb command in the source code either. Is
>>>>> it in a separate commit somewhere? I therefore used start_sb_ovsdb,
>>>>> which I think may not be the right choice.
>>>>>
>>>>
>>>> Sorry, I meant start_sb_ovsdb. Strange that it didn't work for you. Let
>>>> me try it out again and update this thread.
>>>>
>>>> Thanks
>>>> Numan
>>>>
>>>>
>>>>>
>>>>> # Node 1 came up as expected.
>>>>> ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642
>>>>> --db-sb-create-insecure-remote=yes
>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.148:6644" start_sb_ovsdb
>>>>>
>>>>> # Verifying it's a clustered db with ovsdb-tool db-local-address
>>>>> /etc/openvswitch/ovnsb_db.db
>>>>> tcp:10.99.152.148:6644
>>>>> # ovn-sbctl show works fine and chassis are being populated correctly.
>>>>>
>>>>> # Node 2 fails with error:
>>>>> /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138
>>>>> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" start_sb_ovsdb
>>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db: cannot
>>>>> identify file type
>>>>>
>>>>> # So I started the sb db the usual way using start_ovsdb just to get the
>>>>> db file created, killed the sb pid, and re-ran the command. That gave the
>>>>> actual error: it complains about the join-cluster command that is
>>>>> called internally.
>>>>> /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138
>>>>> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" start_sb_ovsdb
>>>>> ovsdb-tool: /etc/openvswitch/ovnsb_db.db: not a clustered database
>>>>>  * Backing up database to /etc/openvswitch/ovnsb_db.db.backup1.15.0-70426956
>>>>> ovsdb-tool: 'join-cluster' command requires at least 4 arguments
>>>>>  * Creating cluster database /etc/openvswitch/ovnsb_db.db from
>>>>> existing one
>>>>>
>>>>>
>>>>> # Based on the above error, I killed the sb db pid again, tried to create
>>>>> a local cluster on the node, and then re-ran the join operation as per the
>>>>> source code function:
>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound
>>>>> tcp:10.99.152.138:6644 tcp:10.99.152.148:6644
>>>>> which still complains:
>>>>> ovsdb-tool: I/O error: /etc/openvswitch/ovnsb_db.db: create failed
>>>>> (File exists)
>>>>>
>>>>>
>>>>> # Node 3: I did not try as I am assuming the same failure as node 2
>>>>>
>>>>>
>>>>> Let me know how to proceed further.
>>>>>
>>>>>
>>>>> On Tue, Mar 13, 2018 at 3:08 AM, Numan Siddique <nusiddiq at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Aliasgar,
>>>>>>
>>>>>> On Tue, Mar 13, 2018 at 7:11 AM, aginwala <aginwala at asu.edu> wrote:
>>>>>>
>>>>>>> Hi Ben/Numan:
>>>>>>>
>>>>>>> I am trying to set up a 3-node southbound db cluster using the raft10
>>>>>>> patch <https://patchwork.ozlabs.org/patch/854298/>, which is in review.
>>>>>>>
>>>>>>> # Node 1 create-cluster
>>>>>>> ovsdb-tool create-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>> /root/ovs-reviews/ovn/ovn-sb.ovsschema tcp:10.99.152.148:6642
>>>>>>>
>>>>>>
>>>>>> A different port is used for RAFT. So you have to choose another port
>>>>>> like 6644 for example.
>>>>>>
>>>>>
>>>>>>>
>>>>>>> # Node 2
>>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound
>>>>>>> tcp:10.99.152.138:6642 tcp:10.99.152.148:6642 --cid
>>>>>>> 5dfcb678-bb1d-4377-b02d-a380edec2982
>>>>>>>
>>>>>>> # Node 3
>>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound
>>>>>>> tcp:10.99.152.101:6642 tcp:10.99.152.138:6642 tcp:10.99.152.148:6642 --cid
>>>>>>> 5dfcb678-bb1d-4377-b02d-a380edec2982
>>>>>>>
>>>>>>> # ovn-remote is set to all 3 nodes
>>>>>>> external_ids:ovn-remote="tcp:10.99.152.148:6642, tcp:10.99.152.138:6642, tcp:10.99.152.101:6642"
>>>>>>>
>>>>>>
>>>>>>> # Starting the sb db on node 1 using the command below:
>>>>>>>
>>>>>>> ovsdb-server --detach --monitor -vconsole:off -vraft -vjsonrpc
>>>>>>> --log-file=/var/log/openvswitch/ovsdb-server-sb.log
>>>>>>> --pidfile=/var/run/openvswitch/ovnsb_db.pid
>>>>>>> --remote=db:OVN_Southbound,SB_Global,connections
>>>>>>> --unixctl=ovnsb_db.ctl --private-key=db:OVN_Southbound,SSL,private_key
>>>>>>> --certificate=db:OVN_Southbound,SSL,certificate
>>>>>>> --ca-cert=db:OVN_Southbound,SSL,ca_cert
>>>>>>> --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
>>>>>>> --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers
>>>>>>> --remote=punix:/var/run/openvswitch/ovnsb_db.sock
>>>>>>> /etc/openvswitch/ovnsb_db.db
>>>>>>>
>>>>>>> # check-cluster is returning nothing
>>>>>>> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>>
>>>>>>> # ovsdb-server-sb.log below shows the leader is elected with only
>>>>>>> one server and there are rbac related debug logs with rpc replies and empty
>>>>>>> params with no errors
>>>>>>>
>>>>>>> 2018-03-13T01:12:02Z|00002|raft|DBG|server 63d1 added to
>>>>>>> configuration
>>>>>>> 2018-03-13T01:12:02Z|00003|raft|INFO|term 6: starting election
>>>>>>> 2018-03-13T01:12:02Z|00004|raft|INFO|term 6: elected leader by 1+
>>>>>>> of 1 servers
>>>>>>>
>>>>>>>
>>>>>>> Now, starting the ovsdb-server on the other cluster nodes fails, saying:
>>>>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db: cannot
>>>>>>> identify file type
>>>>>>>
>>>>>>>
>>>>>>> I also noticed that the ovsdb-tool man page is missing the cluster
>>>>>>> details. You might want to address that in the same patch or a
>>>>>>> different one.
>>>>>>>
>>>>>>>
>>>>>>> Please advise what is missing here for running ovn-sbctl show, as
>>>>>>> this command hangs.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> I think you can use the ovn-ctl command "start_cluster_sb_ovsdb" for
>>>>>> your testing (at least for now).
>>>>>>
>>>>>> For your setup, I think you can start the cluster as
>>>>>>
>>>>>> # Node 1
>>>>>> ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642
>>>>>> --db-sb-create-insecure-remote=yes
>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.148:6644" start_cluster_sb_ovsdb
>>>>>>
>>>>>> # Node 2
>>>>>> ovn-ctl --db-sb-addr=10.99.152.138 --db-sb-port=6642
>>>>>> --db-sb-create-insecure-remote=yes
>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644"
>>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" start_cluster_sb_ovsdb
>>>>>>
>>>>>> # Node 3
>>>>>> ovn-ctl --db-sb-addr=10.99.152.101 --db-sb-port=6642
>>>>>> --db-sb-create-insecure-remote=yes
>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.101:6644"
>>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" start_cluster_sb_ovsdb
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Let me know how it goes.
>>>>>>
>>>>>> Thanks
>>>>>> Numan
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
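To recap the sequence that worked in this thread, here is a minimal dry-run sketch that prints the three southbound ovn-ctl command lines from my reply of Mar 13 instead of running them. The flag names and addresses are taken from the messages above (they come from the patch under review). The OVN_CTL, SB_PORT, RAFT_PORT and FIRST variables and the sb_cmd helper are illustrative names, not part of ovn-ctl.

```shell
#!/bin/sh
# Dry run: print the per-node ovn-ctl command lines, do not execute them.
OVN_CTL=/usr/share/openvswitch/scripts/ovn-ctl
SB_PORT=6642      # client (OVSDB) port
RAFT_PORT=6644    # cluster (RAFT) port; must differ from SB_PORT
FIRST=192.168.121.91

sb_cmd() {
    # $1 = this node's IP. The first node bootstraps the cluster, so it
    # gets no --db-sb-cluster-remote-addr; every other node joins via FIRST.
    cmd="$OVN_CTL --db-sb-addr=$1 --db-sb-port=$SB_PORT"
    cmd="$cmd --db-sb-create-insecure-remote=yes"
    cmd="$cmd --db-sb-cluster-local-addr=tcp:$1:$RAFT_PORT"
    if [ "$1" != "$FIRST" ]; then
        cmd="$cmd --db-sb-cluster-remote-addr=tcp:$FIRST:$RAFT_PORT"
    fi
    echo "$cmd start_sb_ovsdb"
}

for ip in 192.168.121.91 192.168.121.87 192.168.121.78; do
    sb_cmd "$ip"
done
```

As suggested earlier in the thread, old db files in /etc/openvswitch/ should be deleted on a node before running its printed command.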