[ovs-dev] raft ovsdb clustering with scale test

aginwala aginwala at asu.edu
Tue Apr 17 00:22:16 UTC 2018


Hi Ben/Numan:


Just following up again. Let me know if you have any further
improvements/suggestions on this.


Regards,


On Mon, Apr 9, 2018 at 10:53 AM, aginwala <aginwala at asu.edu> wrote:

> Further analysis indicates that the cluster is not able to elect a new leader,
> and hence every node keeps retrying:
>
> ovn-northd logs around the same time on all three nodes complain about
> disconnection from the cluster and failure to elect a leader:
>
>
> #Node1
> 2018-04-06T01:19:06.481Z|00466|ovn_northd|INFO|ovn-northd lock lost. This
> ovn-northd instance is now on standby.
> 2018-04-06T01:19:07.668Z|00467|ovn_northd|INFO|ovn-northd lock acquired.
> This ovn-northd instance is now active.
> 2018-04-06T01:19:07.671Z|00468|ovsdb_idl|INFO|tcp:192.168.220.102:6642:
> clustered database server is not cluster leader; trying another server
> 2018-04-06T01:19:15.673Z|00469|reconnect|INFO|tcp:192.168.220.103:6642:
> connected
> 2018-04-06T01:19:15.673Z|00470|ovn_northd|INFO|ovn-northd lock lost. This
> ovn-northd instance is now on standby.
>
> #Node2
> 2018-04-06T01:19:58.249Z|00487|ovn_northd|INFO|ovn-northd lock acquired.
> This ovn-northd instance is now active.
> 2018-04-06T01:19:58.255Z|00488|ovsdb_idl|INFO|tcp:192.168.220.103:6642:
> clustered database server is disconnected from cluster; trying another
> server
> 2018-04-06T01:20:09.261Z|00489|reconnect|INFO|tcp:192.168.220.102:6642:
> connected
> 2018-04-06T01:20:09.261Z|00490|ovn_northd|INFO|ovn-northd lock lost. This
> ovn-northd instance is now on standby.
>
>
> #Node3
> 2018-04-06T01:19:01.654Z|00964|ovn_northd|INFO|ovn-northd lock acquired.
> This ovn-northd instance is now active.
> 2018-04-06T01:19:01.711Z|00965|ovsdb_idl|INFO|tcp:192.168.220.102:6642:
> clustered database server is disconnected from cluster; trying another server
> 2018-04-06T01:19:09.715Z|00966|reconnect|INFO|tcp:192.168.220.103:6642:
> connected
> 2018-04-06T01:19:09.716Z|00967|ovn_northd|INFO|ovn-northd lock lost. This
> ovn-northd instance is now on standby.
>
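> For reference, each server's own view of the current term and leader can be
> checked with cluster/status, e.g. (control socket path depends on how the DBs
> were started):
> ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/status OVN_Southbound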
>
>
> Regards,
> Aliasgar
>
> On Fri, Apr 6, 2018 at 2:16 PM, aginwala <aginwala at asu.edu> wrote:
>
>> Hi Ben/Numan:
>>
>> So I went ahead and tried out the clustered db in a scale environment, and the
>> results are not as expected.
>> OVS branch used is master; commit id 2062840612904ff0873d46b2f4f226bc23f2296d.
>>
>> Setup is using 3 nodes.
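>> (For reference, a 3-node NB cluster of this kind is created with ovsdb-tool
>> roughly as follows; the schema path is illustrative, and raft port 6645 matches
>> the nb db logs further below; SB is done the same way:)
>> # on 192.168.220.101
>> ovsdb-tool create-cluster /etc/openvswitch/ovnnb_db.db \
>>     /usr/share/openvswitch/ovn-nb.ovsschema tcp:192.168.220.101:6645
>> # on 192.168.220.102 and 192.168.220.103
>> ovsdb-tool join-cluster /etc/openvswitch/ovnnb_db.db OVN_Northbound \
>>     tcp:192.168.220.102:6645 tcp:192.168.220.101:6645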
>>
>> Also disabled inactivity_probe:
>> ovn-nbctl --db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:
>> 192.168.220.103:6641" set NB . external_ids:inactivity_probe=0
>> ovn-sbctl --db="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:
>> 192.168.220.103:6642" set SB . external_ids:inactivity_probe=0
>>
>>
>> 1. With wait_up = true (which internally uses wait-until) for 10k lports and
>> 1k HVs, it was able to create only around 1k lports before rally exited due to
>> a db connection error.
>> Cause of failure ("database connection failed") is at
>> https://raw.githubusercontent.com/noah8713/ovn-scale-test/e398ccdd77b5c0bfed3d1cfe37c7e7ac5d8d8f81/results/output_raft_fail.html
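>>
>> (The wait_up path in the rally plugin boils down to a per-lport wait, roughly
>> equivalent to the following; the lport name is illustrative:)
>> ovn-nbctl --db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641" \
>>     wait-until Logical_Switch_Port lport_xxx up=true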
>>
>> Some data from the sb db logs is below:
>> 2018-04-05T06:29:32.628Z|00065|poll_loop|INFO|wakeup due to 9-ms timeout
>> at lib/reconnect.c:643 (90% CPU usage)
>> 2018-04-05T06:30:19.432Z|00066|raft|INFO|current entry eid
>> 23a85453-70a1-49ae-bf8f-22a1eeda6a60 does not match prerequisite
>> 966d3234-961a-4dc5-b2ad-4f12766fd610 in execute_command_request
>> 2018-04-05T06:31:03.428Z|00067|raft|INFO|current entry eid
>> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
>> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
>> 2018-04-05T06:31:03.469Z|00068|raft|INFO|current entry eid
>> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
>> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
>> 2018-04-05T06:31:03.492Z|00069|raft|INFO|current entry eid
>> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
>> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
>> 2018-04-05T06:31:03.505Z|00070|raft|INFO|current entry eid
>> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
>> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
>> 2018-04-05T06:31:03.510Z|00071|raft|INFO|current entry eid
>> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
>> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
>> 2018-04-05T06:32:04.056Z|00072|memory|INFO|peak resident set size grew
>> 51% in last 35031.7 seconds, from 24748 kB to 37332 kB
>> 2018-04-05T06:32:04.056Z|00073|memory|INFO|cells:59908 json-caches:1
>> monitors:3 sessions:431
>> 2018-04-05T06:32:27.956Z|00074|raft|WARN|ignoring vote request received
>> as leader
>> 2018-04-05T06:32:27.958Z|00075|timeval|WARN|Unreasonably long 1278ms
>> poll interval (1044ms user, 232ms system)
>> 2018-04-05T06:32:27.958Z|00076|timeval|WARN|faults: 61669 minor, 0 major
>> 2018-04-05T06:32:27.958Z|00077|timeval|WARN|context switches: 0
>> voluntary, 3 involuntary
>> 2018-04-05T06:32:27.958Z|00078|coverage|INFO|Event coverage, avg rate
>> over last: 5 seconds, last minute, last hour,  hash=1dd2af20:
>> 2018-04-05T06:32:27.959Z|00079|coverage|INFO|hmap_pathological
>>  25.0/sec    24.633/sec        1.5550/sec   total: 19274
>> 2018-04-05T06:32:27.959Z|00080|coverage|INFO|hmap_expand
>> 4202.6/sec  4041.417/sec      578.9233/sec   total: 14158936
>> 2018-04-05T06:32:27.959Z|00081|coverage|INFO|lockfile_lock
>> 0.0/sec     0.000/sec        0.0000/sec   total: 1
>> 2018-04-05T06:32:27.960Z|00082|coverage|INFO|poll_create_node
>>  50281.4/sec 47195.233/sec    29424.6886/sec   total: 1001238470
>> 2018-04-05T06:32:27.960Z|00083|coverage|INFO|poll_zero_timeout
>> 9.8/sec    12.167/sec        8.2392/sec   total: 312540
>> 2018-04-05T06:32:27.960Z|00084|coverage|INFO|seq_change
>>  122.8/sec   115.967/sec       68.2761/sec   total: 2309640
>> 2018-04-05T06:32:27.961Z|00085|coverage|INFO|pstream_open
>>  0.0/sec     0.000/sec        0.0000/sec   total: 4
>> 2018-04-05T06:32:27.961Z|00086|coverage|INFO|stream_open
>> 0.0/sec     0.000/sec        0.0000/sec   total: 9
>> 2018-04-05T06:32:27.961Z|00087|coverage|INFO|util_xalloc
>> 229362.2/sec 218086.767/sec    46494.5058/sec   total: 1344641131
>>
>>
>>
>> Some data from the nb db logs:
>> 2018-04-05T23:35:33.458Z|00020|raft|INFO|adding b7bd (b7bd at tcp:
>> 192.168.220.103:6645) to cluster a36d succeeded (completed)
>> 2018-04-05T23:53:06.360Z|00021|jsonrpc|WARN|tcp:192.168.10.10:57298:
>> receive error: Connection reset by peer
>> 2018-04-05T23:53:06.360Z|00022|reconnect|WARN|tcp:192.168.10.10:57298:
>> connection dropped (Connection reset by peer)
>> 2018-04-05T23:53:09.071Z|00023|jsonrpc|WARN|tcp:192.168.10.10:57348:
>> receive error: Connection reset by peer
>> 2018-04-05T23:53:09.071Z|00024|reconnect|WARN|tcp:192.168.10.10:57348:
>> connection dropped (Connection reset by peer)
>> 2018-04-06T00:02:03.864Z|00076|raft|INFO|current entry eid
>> 0b9eeff2-8377-4f6d-a907-cb4e98396d14 does not match prerequisite
>> 277c09b4-96b3-43e9-b482-a9df29576988
>> 2018-04-06T00:02:06.882Z|00077|poll_loop|INFO|Dropped 6187 log messages
>> in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
>> 2018-04-06T00:02:06.882Z|00078|poll_loop|INFO|wakeup due to [POLLIN] on
>> fd 31 (192.168.220.102:6645<->192.168.10.10:54782) at
>> lib/stream-fd.c:157 (62% CPU usage)
>> 2018-04-06T00:02:09.130Z|00079|jsonrpc|WARN|tcp:192.168.10.10:33592:
>> receive error: Connection reset by peer
>> 2018-04-06T00:02:09.131Z|00080|reconnect|WARN|tcp:192.168.10.10:33592:
>> connection dropped (Connection reset by peer)
>> 2018-04-06T00:02:33.986Z|00081|poll_loop|INFO|Dropped 1509 log messages
>> in last 26 seconds (most recently, 24 seconds ago) due to excessive rate
>> 2018-04-06T00:02:33.986Z|00082|poll_loop|INFO|wakeup due to [POLLIN] on
>> fd 24 (FIFO pipe:[40290445]) at ovsdb/log.c:994 (51% CPU usage)
>> 2018-04-06T00:02:33.986Z|00083|poll_loop|INFO|wakeup due to [POLLIN] on
>> fd 29 (192.168.220.102:6645<->192.168.10.10:55738) at
>> lib/stream-fd.c:157 (51% CPU usage)
>>
>>
>>
>> 2018-04-06T00:13:48.141Z|00037|raft|INFO|rejecting term 4 < current term
>> 5 received in vote_reply message from server b7bd
>> 2018-04-06T00:14:21.138Z|00038|jsonrpc|WARN|tcp:192.168.10.10:33990:
>> receive error: Connection reset by peer
>> 2018-04-06T00:14:21.138Z|00039|reconnect|WARN|tcp:192.168.10.10:33990:
>> connection dropped (Connection reset by peer)
>> 2018-04-06T00:17:01.532Z|00040|raft|INFO|current entry eid
>> e07229fe-99d8-436b-8ff5-ee533ed3c825 does not match prerequisite
>> d47dbdf1-0981-4077-baab-372b05aeb0bb
>> 2018-04-06T00:17:01.532Z|00041|raft|INFO|current entry eid
>> e07229fe-99d8-436b-8ff5-ee533ed3c825 does not match prerequisite
>> d47dbdf1-0981-4077-baab-372b05aeb0bb
>> 2018-04-06T00:17:01.532Z|00042|raft|INFO|current entry eid
>> e07229fe-99d8-436b-8ff5-ee533ed3c825 does not match prerequisite
>> d47dbdf1-0981-4077-baab-372b05aeb0bb
>> 2018-04-06T00:17:01.533Z|00043|raft|INFO|current entry eid
>> e07229fe-99d8-436b-8ff5-ee533ed3c825 does not match prerequisite
>> d47dbdf1-0981-4077-baab-372b05aeb0bb
>> 2018-04-06T00:17:01.533Z|00044|raft|INFO|current entry eid
>> e07229fe-99d8-436b-8ff5-ee533ed3c825 does not match prerequisite
>> d47dbdf1-0981-4077-baab-372b05aeb0bb
>> 2018-04-06T00:19:32.039Z|00045|timeval|WARN|Unreasonably long 1373ms
>> poll interval (1340ms user, 24ms system)
>> 2018-04-06T00:19:32.039Z|00046|timeval|WARN|faults: 319 minor, 0 major
>> 2018-04-06T00:19:32.039Z|00047|timeval|WARN|disk: 0 reads, 368 writes
>> 2018-04-06T00:19:32.039Z|00048|timeval|WARN|context switches: 5
>> voluntary, 19 involuntary
>> 2018-04-06T00:19:32.039Z|00049|coverage|INFO|Event coverage, avg rate
>> over last: 5 seconds, last minute, last hour,  hash=1dd2af20:
>> 2018-04-06T00:19:32.039Z|00050|coverage|INFO|hmap_pathological
>> 0.0/sec     0.000/sec        3.4164/sec   total: 12306
>> 2018-04-06T00:19:32.039Z|00051|coverage|INFO|hmap_expand
>>  73.6/sec    71.267/sec      729.9656/sec   total: 2628980
>> 2018-04-06T00:19:32.039Z|00052|coverage|INFO|lockfile_lock
>> 0.0/sec     0.000/sec        0.0003/sec   total: 1
>> 2018-04-06T00:19:32.039Z|00053|coverage|INFO|poll_create_node
>>  129.6/sec   122.333/sec      866.7178/sec   total: 3120304
>> 2018-04-06T00:19:32.039Z|00054|coverage|INFO|poll_zero_timeout
>> 0.0/sec     0.267/sec       27.4550/sec   total: 98838
>> 2018-04-06T00:19:32.039Z|00055|coverage|INFO|seq_change
>> 10.8/sec    10.217/sec      135.4125/sec   total: 487498
>> 2018-04-06T00:19:32.039Z|00056|coverage|INFO|pstream_open
>>  0.0/sec     0.000/sec        0.0011/sec   total: 4
>> 2018-04-06T00:19:32.039Z|00057|coverage|INFO|stream_open
>> 0.0/sec     0.000/sec        0.0019/sec   total: 7
>> 2018-04-06T00:19:32.039Z|00058|coverage|INFO|util_xalloc
>> 1542.2/sec  1488.750/sec    21346.2483/sec   total: 76872987
>> 2018-04-06T00:19:32.039Z|00059|coverage|
>>
>>
>>
>> To confirm whether wait-until was the only bottleneck, I skipped wait-until and
>> the test completed in 2.4 hours, but things still did not go as expected. All
>> the chassis ovn-controllers crashed, and ports did not actually get bound all
>> the way down to the computes because the ovn sb db was running extremely hot on
>> CPU.
>>
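>> (While the sb db is pegged like this, the monitor/session counts can still be
>> inspected over its unixctl socket, e.g.; socket path per the ovn-ctl defaults:)
>> ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl memory/show
>>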
>> Below is the ovn-controller log from a sample HV without wait-until.
>> 2018-04-06T04:50:40.274Z|01703|poll_loop|INFO|Dropped 5903 log messages
>> in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
>> 2018-04-06T04:50:40.274Z|01704|poll_loop|INFO|wakeup due to
>> [POLLIN][POLLHUP] on fd 13 (<->/root/sandbox-192.168.72.45/br-int.mgmt)
>> at ../lib/stream-fd.c:157 (100% CPU usage)
>> 2018-04-06T04:50:41.060Z|01705|ovsdb_idl|INFO|tcp:192.168.220.102:6642:
>> clustered database server is disconnected from cluster; trying another
>> server
>> 2018-04-06T04:50:41.066Z|01706|binding|INFO|Claiming lport
>> lport_a3060c_mlAOEq for this chassis.
>> 2018-04-06T04:50:41.066Z|01707|binding|INFO|lport_a3060c_mlAOEq:
>> Claiming a2:1c:47:32:30:52 172.145.23.37
>> 2018-04-06T04:50:41.066Z|01708|vconn_stream|ERR|send: Broken pipe
>> 2018-04-06T04:50:41.066Z|01709|rconn|WARN|unix:/root/sandbox-
>> 192.168.72.45/br-int.mgmt: connection dropped (Broken pipe)
>> 2018-04-06T04:50:41.067Z|01710|rconn|INFO|unix:/root/sandbox-
>> 192.168.72.45/br-int.mgmt: connected
>> 2018-04-06T04:50:41.067Z|01711|rconn|WARN|unix:/root/sandbox-
>> 192.168.72.45/br-int.mgmt: connection dropped (Broken pipe)
>> 2018-04-06T04:50:41.067Z|01712|util|EMER|../lib/ofp-msgs.c:1062:
>> assertion version >= info->min_version && version <= info->max_version
>> failed in raw_instance_get()
>>
>>
>>
>> Also, northd sees many transaction errors without wait-until:
>> northd.log
>> 2018-04-06T04:25:26.594Z|01076|ovsdb_idl|WARN|transaction error:
>> {"details":"Transaction causes multiple rows in \"Port_Binding\" table to
>> have identical values (44d6726d-2a3a-428d-b8d4-b0d1e4c9593b and 104) for
>> index on columns \"datapath\" and \"tunnel_key\".  First row, with UUID
>> 489f777f-d7f6-4b92-80b9-c1c64c083ea4, existed in the database before
>> this transaction and was not modified by the transaction.  Second row, with
>> UUID b59a2500-3da1-40ab-932c-5e2dd4acc3b4, was inserted by this
>> transaction.","error":"constraint violation"}
>> 2018-04-06T04:25:27.202Z|01077|ovsdb_idl|WARN|transaction error:
>> {"details":"Transaction causes multiple rows in \"Port_Binding\" table to
>> have identical values (44d6726d-2a3a-428d-b8d4-b0d1e4c9593b and 26) for
>> index on columns \"datapath\" and \"tunnel_key\".  First row, with UUID
>> 0668b300-55f8-4bdd-b0a8-8038258d4502, was inserted by this transaction.
>> Second row, with UUID 63972d45-8845-4489-ac3e-916909a593c7, existed in
>> the database before this transaction and was not modified by the
>> transaction.","error":"constraint violation"}
>> 2018-04-06T04:25:32.036Z|01078|ovsdb_idl|WARN|transaction error:
>> {"details":"Transaction causes multiple rows in \"Port_Binding\" table to
>> have identical values (44d6726d-2a3a-428d-b8d4-b0d1e4c9593b and 224) for
>> index on columns \"datapath\" and \"tunnel_key\".  First row, with UUID
>> 528dea47-f284-40d6-bc7e-a97a084f49eb, existed in the database before
>> this transaction and was not modified by the transaction.  Second row, with
>> UUID 6fcc2500-fedb-4ee9-b54d-ea548dd36936, was inserted by this
>> transaction.","error":"constraint violation"}
>> 2018-04-06T04:25:32.482Z|01079|ovsdb_idl|WARN|transaction error:
>> {"details":"Transaction causes multiple rows in \"Port_Binding\" table to
>> have identical values (44d6726d-2a3a-428d-b8d4-b0d1e4c9593b and 23) for
>> index on columns \"datapath\" and \"tunnel_key\".  First row, with UUID
>> 925eb001-adeb-4748-a55f-6c5812e70a7e, was inserted by this transaction.
>> Second row, with UUID 5ed7bbc5-b7f5-4d4e-8e99-d8c4adcb2b34, existed in
>> the database before this transaction and was not modified by the
>> transaction.","error":"constraint violation"}
>> 2018-04-06T04:25:35.204Z|01080|ovsdb_idl|WARN|transaction error:
>> {"details":"Transaction causes multiple rows in \"Port_Binding\" table to
>> have identical values (44d6726d-2a3a-428d-b8d4-b0d1e4c9593b and 18) for
>> index on columns \"datapath\" and \"tunnel_key\".  First row, with UUID
>> 4b429b00-75ca-4e38-904d-80a11e6c0f38, was inserted by this transaction.
>> Second row, with UUID c2db03fe-18b6-4a2b-8db0-bf37ef93db92, existed in
>> the database before this transaction and was not modified by the
>> transaction.","error":"constraint violation"}
>> 2018-04-06T04:25:35.933Z|01081|ovsdb_idl|INFO|tcp:192.168.220.101:6642:
>> clustered database server is not cluster leader; trying another server
>>
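>> The colliding rows can be inspected directly in the sb db; for the datapath
>> from the errors above, something like:
>> ovn-sbctl --db="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:192.168.220.103:6642" \
>>     --columns=logical_port,tunnel_key find Port_Binding \
>>     datapath=44d6726d-2a3a-428d-b8d4-b0d1e4c9593b
>>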
>>
>> To summarize:
>>
>>    - ovn nb db has issues with wait-until and works fine without
>>    wait-until.
>>    - ovn sb db causes ovn-controller to crash with many HVs and just a few
>>    hundred ports bound, with no solid root cause yet, even without wait-until.
>>    Even sbctl commands hang as the sb db is at 100% CPU.
>>    - ovn northd sees transaction errors without wait-until.
>>
>>
>>
>> Will continue to troubleshoot and would love to get any
>> improvements/suggestions.
>>
>>
>> Regards,
>>
>>
>

