[ovs-discuss] raft ovsdb clustering

aginwala aginwala at asu.edu
Wed Mar 14 21:27:19 UTC 2018


Hi Numan:

I tried on new nodes (kernel: 4.4.0-104-generic, Ubuntu 16.04) with a fresh
installation, and it worked fine for both the SB and NB DBs. It seems there
was some kernel issue on the previous nodes when I re-installed the raft
patch, since I had been running a different OVS version on those nodes before.
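For anyone reusing nodes that previously ran a standalone (non-clustered) DB,
along the lines of Numan's suggestion further down to delete the old db files:
a minimal sketch that moves an existing DB file aside instead of deleting it,
so ovn-ctl can create or join a clustered DB from scratch. The helper name
move_aside_stale_db and the ".pre-raft.<timestamp>" suffix are hypothetical; it
assumes the default /etc/openvswitch paths used in this thread and that the
corresponding ovsdb-server has already been stopped.

```shell
#!/bin/sh
# Hypothetical helper: move an existing (e.g. standalone, pre-raft) DB file
# out of the way so ovn-ctl can create/join a clustered DB from scratch.
# Assumes the ovsdb-server using this file has already been stopped.
move_aside_stale_db() {
    db=$1
    if [ -e "$db" ]; then
        backup="$db.pre-raft.$(date +%s)"
        mv -- "$db" "$backup" || return 1
        echo "moved $db -> $backup"
    else
        echo "no existing $db"
    fi
}

# Example for the default DB locations used in this thread (run as root):
# move_aside_stale_db /etc/openvswitch/ovnsb_db.db
# move_aside_stale_db /etc/openvswitch/ovnnb_db.db
```

Moving rather than deleting keeps the old contents around in case they are
still needed.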


For the 2 HVs, I now set ovn-remote="tcp:10.169.125.152:6642,
tcp:10.169.125.131:6642, tcp:10.148.181.162:6642", started ovn-controller,
and it works fine.
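The ovn-remote value is a comma-separated list of the SB DB client endpoints
(port 6642), not the raft ports (6644). A minimal sketch that builds it from a
list of IPs; build_ovn_remote is a hypothetical helper, and the script only
prints the ovs-vsctl command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: build an ovn-remote value from SB DB server IPs.
# Each entry points at the SB client port (6642 here), not the raft port (6644).
build_ovn_remote() {
    port=$1
    shift
    remote=""
    for ip in "$@"; do
        # Prepend a comma only once the list is non-empty.
        remote="${remote:+$remote,}tcp:$ip:$port"
    done
    printf '%s\n' "$remote"
}

remote=$(build_ovn_remote 6642 10.169.125.152 10.169.125.131 10.148.181.162)
# Print (rather than run) the command to apply on each hypervisor.
echo "ovs-vsctl set Open_vSwitch . external_ids:ovn-remote=\"$remote\""
```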


I did some failover testing by rebooting/killing the leader (10.169.125.152)
and bringing it back up, and it works as expected. Nothing weird noted so
far.

# check-cluster gives the output below on one of the nodes (10.148.181.162)
after the leader failure:

ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
ovsdb-tool: leader /etc/openvswitch/ovnsb_db.db for term 2 has log entries
only up to index 18446744073709551615, but index 9 was committed in a
previous term (e.g. by /etc/openvswitch/ovnsb_db.db)
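Side note on the output above: 18446744073709551615 is 2^64 - 1, i.e. -1
stored in an unsigned 64-bit integer, so it looks more like an unset or
sentinel index than a real log position. That is only an inference from the
number itself. The quick check below just confirms the arithmetic (assuming
bash or dash on a 64-bit system). check-cluster also accepts several DB
files, so running it on copies from all three nodes may give a fuller picture.

```shell
#!/bin/sh
# 2^64 - 1 is what -1 becomes when stored in an unsigned 64-bit integer.
# The printf builtin's %u conversion prints a negative argument as that
# unsigned value on 64-bit systems.
sentinel=$(printf '%u' -1)
echo "$sentinel"
```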


For check-cluster, are we planning to add more output in upcoming versions,
e.g. showing which node is active (leader)?


Thanks a ton for helping sort this out. I think the patch looks good to be
merged once Justin's comments are addressed and the man page details for
ovsdb-tool are added.


I will do some more crash testing of the cluster along with scale testing
and keep you posted if I notice anything unexpected.



Regards,



On Tue, Mar 13, 2018 at 11:07 PM, Numan Siddique <nusiddiq at redhat.com>
wrote:

>
>
> On Wed, Mar 14, 2018 at 7:51 AM, aginwala <aginwala at asu.edu> wrote:
>
>> Sure.
>>
>> To add on, I also ran it for the NB DB using a different port, and Node 2
>> crashes with the same error:
>> # Node 2
>> /usr/share/openvswitch/scripts/ovn-ctl --db-nb-addr=10.99.152.138
>> --db-nb-port=6641 --db-nb-cluster-remote-addr="tcp:10.99.152.148:6645"
>> --db-nb-cluster-local-addr="tcp:10.99.152.138:6645" start_nb_ovsdb
>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnnb_db.db: cannot identify
>> file type
>>
>>
>>
> Hi Aliasgar,
>
> It worked for me. Can you delete the old db files in /etc/openvswitch/ and
> try running the commands again?
>
> Below are the commands I ran in my setup.
>
> Node 1
> -------
> sudo /usr/share/openvswitch/scripts/ovn-ctl  --db-sb-addr=192.168.121.91
> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
> --db-sb-cluster-local-addr=tcp:192.168.121.91:6644 start_sb_ovsdb
>
> Node 2
> ---------
> sudo /usr/share/openvswitch/scripts/ovn-ctl  --db-sb-addr=192.168.121.87
> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
> --db-sb-cluster-local-addr="tcp:192.168.121.87:6644"
> --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644"  start_sb_ovsdb
>
> Node 3
> ---------
> sudo /usr/share/openvswitch/scripts/ovn-ctl  --db-sb-addr=192.168.121.78
> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
> --db-sb-cluster-local-addr="tcp:192.168.121.78:6644"
> --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644"  start_sb_ovsdb
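The pattern in the commands above: the first node sets only its local raft
address, and every other node additionally points
--db-sb-cluster-remote-addr at the first node's raft port (6644), while
clients keep using 6642. A minimal sketch that prints the per-node ovn-ctl
command lines for a list of IPs; gen_sb_cluster_cmds is a hypothetical helper
that only echoes the commands (run each one as root on its own node, first
node first).

```shell
#!/bin/sh
# Hypothetical helper: print the ovn-ctl command for each SB cluster member.
# The first IP bootstraps the cluster; the others join via its raft address.
# Client port 6642 and raft port 6644 follow the commands in this thread.
OVN_CTL=/usr/share/openvswitch/scripts/ovn-ctl
CLIENT_PORT=6642
RAFT_PORT=6644

gen_sb_cluster_cmds() {
    first=$1
    for ip in "$@"; do
        cmd="$OVN_CTL --db-sb-addr=$ip --db-sb-port=$CLIENT_PORT"
        cmd="$cmd --db-sb-create-insecure-remote=yes"
        cmd="$cmd --db-sb-cluster-local-addr=tcp:$ip:$RAFT_PORT"
        # Only joining members need to know where the existing cluster is.
        if [ "$ip" != "$first" ]; then
            cmd="$cmd --db-sb-cluster-remote-addr=tcp:$first:$RAFT_PORT"
        fi
        echo "$cmd start_sb_ovsdb"
    done
}

gen_sb_cluster_cmds 192.168.121.91 192.168.121.87 192.168.121.78
```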
>
>
>
> Thanks
> Numan
>
>
>
>
>
>>
>> On Tue, Mar 13, 2018 at 9:40 AM, Numan Siddique <nusiddiq at redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Mar 13, 2018 at 9:46 PM, aginwala <aginwala at asu.edu> wrote:
>>>
>>>> Thanks Numan for the response.
>>>>
>>>> There is no start_cluster_sb_ovsdb command in the source code either. Is
>>>> it in a separate commit somewhere? So I used start_sb_ovsdb instead, which
>>>> I suspect is not the right choice?
>>>>
>>>
>>> Sorry, I meant start_sb_ovsdb. Strange that it didn't work for you. Let
>>> me try it out again and update this thread.
>>>
>>> Thanks
>>> Numan
>>>
>>>
>>>>
>>>> # Node 1 came up as expected.
>>>> ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642
>>>> --db-sb-create-insecure-remote=yes
>>>> --db-sb-cluster-local-addr="tcp:10.99.152.148:6644" start_sb_ovsdb
>>>>
>>>> # verifying it's a clustered db:
>>>> ovsdb-tool db-local-address /etc/openvswitch/ovnsb_db.db
>>>> tcp:10.99.152.148:6644
>>>> # ovn-sbctl show works fine and chassis are being populated correctly.
>>>>
>>>> # Node 2 fails with the error:
>>>> /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138
>>>> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
>>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" start_sb_ovsdb
>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db: cannot
>>>> identify file type
>>>>
>>>> # So I started the SB DB the usual way with start_ovsdb, just to get the
>>>> db file created, then killed the SB pid and re-ran the command. That gave
>>>> the actual error: it complains about the join-cluster command that is
>>>> called internally
>>>> /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=10.99.152.138
>>>> --db-sb-port=6642 --db-sb-create-insecure-remote=yes
>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644"
>>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" start_sb_ovsdb
>>>> ovsdb-tool: /etc/openvswitch/ovnsb_db.db: not a clustered database
>>>>  * Backing up database to /etc/openvswitch/ovnsb_db.db.backup1.15.0-70426956
>>>> ovsdb-tool: 'join-cluster' command requires at least 4 arguments
>>>>  * Creating cluster database /etc/openvswitch/ovnsb_db.db from existing
>>>> one
>>>>
>>>>
>>>> # Based on the above error, I killed the SB DB pid again, tried to create
>>>> a local cluster on the node, and then re-ran the join operation as in the
>>>> source code function:
>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound
>>>> tcp:10.99.152.138:6644 tcp:10.99.152.148:6644
>>>> # which still complains:
>>>> ovsdb-tool: I/O error: /etc/openvswitch/ovnsb_db.db: create failed
>>>> (File exists)
>>>>
>>>>
>>>> # Node 3: I did not try it, as I assume it would fail the same way as Node 2
>>>>
>>>>
>>>> Let me know if you need anything further.
>>>>
>>>>
>>>> On Tue, Mar 13, 2018 at 3:08 AM, Numan Siddique <nusiddiq at redhat.com>
>>>> wrote:
>>>>
>>>>> Hi Aliasgar,
>>>>>
>>>>> On Tue, Mar 13, 2018 at 7:11 AM, aginwala <aginwala at asu.edu> wrote:
>>>>>
>>>>>> Hi Ben/Numan:
>>>>>>
>>>>>> I am trying to set up a 3-node southbound DB cluster using raft10
>>>>>> <https://patchwork.ozlabs.org/patch/854298/>, which is in review.
>>>>>>
>>>>>> # Node 1 create-cluster
>>>>>> ovsdb-tool create-cluster /etc/openvswitch/ovnsb_db.db
>>>>>> /root/ovs-reviews/ovn/ovn-sb.ovsschema tcp:10.99.152.148:6642
>>>>>>
>>>>>
>>>>> RAFT uses a separate port, so you have to choose another port for it,
>>>>> e.g. 6644.
>>>>>
>>>>
>>>>>>
>>>>>> # Node 2
>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound
>>>>>> tcp:10.99.152.138:6642 tcp:10.99.152.148:6642 --cid
>>>>>> 5dfcb678-bb1d-4377-b02d-a380edec2982
>>>>>>
>>>>>> #Node 3
>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db OVN_Southbound
>>>>>> tcp:10.99.152.101:6642 tcp:10.99.152.138:6642 tcp:10.99.152.148:6642 --cid
>>>>>> 5dfcb678-bb1d-4377-b02d-a380edec2982
>>>>>>
>>>>>> # ovn-remote is set to all 3 nodes
>>>>>> external_ids:ovn-remote="tcp:10.99.152.148:6642, tcp:10.99.152.138:6642, tcp:10.99.152.101:6642"
>>>>>>
>>>>>
>>>>>> # Starting the SB DB on node 1 using the command below:
>>>>>>
>>>>>> ovsdb-server --detach --monitor -vconsole:off -vraft -vjsonrpc
>>>>>> --log-file=/var/log/openvswitch/ovsdb-server-sb.log
>>>>>> --pidfile=/var/run/openvswitch/ovnsb_db.pid
>>>>>> --remote=db:OVN_Southbound,SB_Global,connections
>>>>>> --unixctl=ovnsb_db.ctl --private-key=db:OVN_Southbound,SSL,private_key
>>>>>> --certificate=db:OVN_Southbound,SSL,certificate
>>>>>> --ca-cert=db:OVN_Southbound,SSL,ca_cert
>>>>>> --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols
>>>>>> --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers
>>>>>> --remote=punix:/var/run/openvswitch/ovnsb_db.sock
>>>>>> /etc/openvswitch/ovnsb_db.db
>>>>>>
>>>>>> # check-cluster is returning nothing
>>>>>> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db
>>>>>>
>>>>>> # ovsdb-server-sb.log below shows that a leader is elected with only one
>>>>>> server; there are also RBAC-related debug logs with RPC replies and
>>>>>> empty params, but no errors
>>>>>>
>>>>>> 2018-03-13T01:12:02Z|00002|raft|DBG|server 63d1 added to configuration
>>>>>> 2018-03-13T01:12:02Z|00003|raft|INFO|term 6: starting election
>>>>>> 2018-03-13T01:12:02Z|00004|raft|INFO|term 6: elected leader by 1+ of 1 servers
>>>>>>
>>>>>>
>>>>>> Now starting ovsdb-server on the other nodes fails with:
>>>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db: cannot
>>>>>> identify file type
>>>>>>
>>>>>>
>>>>>> I also noticed that the ovsdb-tool man page is missing the cluster
>>>>>> details. You might want to address that in the same patch or a separate one.
>>>>>>
>>>>>>
>>>>>> Please advise what is missing here; ovn-sbctl show hangs.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> I think you can use the ovn-ctl command "start_cluster_sb_ovsdb" for
>>>>> your testing (at least for now).
>>>>>
>>>>> For your setup, I think you can start the cluster as follows:
>>>>>
>>>>> # Node 1
>>>>> ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642
>>>>> --db-sb-create-insecure-remote=yes
>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.148:6644" start_cluster_sb_ovsdb
>>>>>
>>>>> # Node 2
>>>>> ovn-ctl --db-sb-addr=10.99.152.138 --db-sb-port=6642
>>>>> --db-sb-create-insecure-remote=yes
>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644"
>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" start_cluster_sb_ovsdb
>>>>>
>>>>> # Node 3
>>>>> ovn-ctl --db-sb-addr=10.99.152.101 --db-sb-port=6642
>>>>> --db-sb-create-insecure-remote=yes
>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.101:6644"
>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" start_cluster_sb_ovsdb
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> Let me know how it goes.
>>>>>
>>>>> Thanks
>>>>> Numan
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> discuss mailing list
>>>>>> discuss at openvswitch.org
>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>