[ovs-dev] raft ovsdb clustering with scale test

Numan Siddique nusiddiq at redhat.com
Tue Apr 17 07:13:28 UTC 2018


On Tue, Apr 17, 2018 at 5:52 AM, aginwala <aginwala at asu.edu> wrote:

> Hi Ben/Numan:
>
>
> Just following up again. Let me know if you have any further
> improvements/suggestions on this.
>
>
Hi Aliasgar,

I need to educate myself first by studying Raft and Ben's patches :). I
hope to do it soon.

Thanks
Numan




> Regards,
>
>
> On Mon, Apr 9, 2018 at 10:53 AM, aginwala <aginwala at asu.edu> wrote:
>
> > Further analysis indicates that the cluster is not able to elect a new
> > leader, and hence every node keeps retrying:
> >
> > The ovn-northd logs around the same time on all three nodes complain about
> > disconnection from the cluster and failure to elect a leader:
> >
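> > (For reference, a quick way to check whether any of the three servers
> > currently claims raft leadership is to dump the cluster status on each node
> > over the unixctl socket; the ctl socket path below is an assumption about
> > the run dir:)
> > ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl cluster/status OVN_Southbound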
> >
> > #Node1
> > 2018-04-06T01:19:06.481Z|00466|ovn_northd|INFO|ovn-northd lock lost.
> This
> > ovn-northd instance is now on standby.
> > 2018-04-06T01:19:07.668Z|00467|ovn_northd|INFO|ovn-northd lock acquired.
> > This ovn-northd instance is now active.
> > 2018-04-06T01:19:07.671Z|00468|ovsdb_idl|INFO|tcp:192.168.220.102:6642:
> > clustered database server is not cluster leader; trying another server
> > 2018-04-06T01:19:15.673Z|00469|reconnect|INFO|tcp:192.168.220.103:6642:
> > connected
> > 2018-04-06T01:19:15.673Z|00470|ovn_northd|INFO|ovn-northd lock lost.
> This
> > ovn-northd instance is now on standby.
> >
> > #Node2
> > 2018-04-06T01:19:58.249Z|00487|ovn_northd|INFO|ovn-northd lock acquired.
> > This ovn-northd instance is now active.
> > 2018-04-06T01:19:58.255Z|00488|ovsdb_idl|INFO|tcp:192.168.220.103:6642:
> > clustered database server is disconnected from cluster; trying another
> > server
> > 2018-04-06T01:20:09.261Z|00489|reconnect|INFO|tcp:192.168.220.102:6642:
> > connected
> > 2018-04-06T01:20:09.261Z|00490|ovn_northd|INFO|ovn-northd lock lost.
> This
> > ovn-northd instance is now on standby.
> >
> >
> > #Node3
> > 2018-04-06T01:19:01.654Z|00964|ovn_northd|INFO|ovn-northd lock acquired.
> > This ovn-northd instance is now active.
> > 2018-04-06T01:19:01.711Z|00965|ovsdb_idl|INFO|tcp:192.168.220.102:6642:
> > clustered database server is disconnected from cluster; trying another server
> > 2018-04-06T01:19:09.715Z|00966|reconnect|INFO|tcp:192.168.220.103:6642:
> > connected
> > 2018-04-06T01:19:09.716Z|00967|ovn_northd|INFO|ovn-northd lock lost.
> This
> > ovn-northd instance is now on standby.
> >
> >
> >
> > Regards,
> > Aliasgar
> >
> > On Fri, Apr 6, 2018 at 2:16 PM, aginwala <aginwala at asu.edu> wrote:
> >
> >> Hi Ben/Numan:
> >>
> >> So I went ahead and tried out the clustered db in a scale environment, and
> >> the results are not as expected.
> >> The OVS branch used is master, commit id
> >> 2062840612904ff0873d46b2f4f226bc23f2296d.
> >>
> >> The setup uses 3 nodes.
> >>
> >> Also disabled inactivity_probe:
> >> ovn-nbctl --db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:
> >> 192.168.220.103:6641" set NB . external_ids:inactivity_probe=0
> >> ovn-sbctl --db="tcp:192.168.220.101:6642,tcp:192.168.220.102:6642,tcp:
> >> 192.168.220.103:6642" set SB . external_ids:inactivity_probe=0
> >>
> >>
> >> 1. With wait_up = true for 10k lports and 1k HVs, which internally uses
> >> wait-until, it was able to create around 1k lports before rally exited due
> >> to a db connection error.
> >> The cause of failure ("database connection failed") is at
> >> https://raw.githubusercontent.com/noah8713/ovn-scale-test/e398ccdd77b5c0bfed3d1cfe37c7e7ac5d8d8f81/results/output_raft_fail.html
> >>
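> >> (For context, my understanding is that each wait_up step boils down to an
> >> ovn-nbctl wait-until on the logical port's "up" column, roughly like the
> >> sketch below; the port name is just an example:)
> >> ovn-nbctl --db="tcp:192.168.220.101:6641,tcp:192.168.220.102:6641,tcp:192.168.220.103:6641" \
> >>     wait-until Logical_Switch_Port lport_0001 up=true
> >>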
> >> Some data from the sb db logs is below:
> >> 2018-04-05T06:29:32.628Z|00065|poll_loop|INFO|wakeup due to 9-ms
> timeout
> >> at lib/reconnect.c:643 (90% CPU usage)
> >> 2018-04-05T06:30:19.432Z|00066|raft|INFO|current entry eid
> >> 23a85453-70a1-49ae-bf8f-22a1eeda6a60 does not match prerequisite
> >> 966d3234-961a-4dc5-b2ad-4f12766fd610 in execute_command_request
> >> 2018-04-05T06:31:03.428Z|00067|raft|INFO|current entry eid
> >> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
> >> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
> >> 2018-04-05T06:31:03.469Z|00068|raft|INFO|current entry eid
> >> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
> >> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
> >> 2018-04-05T06:31:03.492Z|00069|raft|INFO|current entry eid
> >> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
> >> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
> >> 2018-04-05T06:31:03.505Z|00070|raft|INFO|current entry eid
> >> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
> >> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
> >> 2018-04-05T06:31:03.510Z|00071|raft|INFO|current entry eid
> >> f0e3517a-e59a-4533-9e60-a3ded2ba704b does not match prerequisite
> >> 43ed7278-e1de-44b8-83ea-1cc9578c4d41 in execute_command_request
> >> 2018-04-05T06:32:04.056Z|00072|memory|INFO|peak resident set size grew
> >> 51% in last 35031.7 seconds, from 24748 kB to 37332 kB
> >> 2018-04-05T06:32:04.056Z|00073|memory|INFO|cells:59908 json-caches:1
> >> monitors:3 sessions:431
> >> 2018-04-05T06:32:27.956Z|00074|raft|WARN|ignoring vote request received
> >> as leader
> >> 2018-04-05T06:32:27.958Z|00075|timeval|WARN|Unreasonably long 1278ms
> >> poll interval (1044ms user, 232ms system)
> >> 2018-04-05T06:32:27.958Z|00076|timeval|WARN|faults: 61669 minor, 0
> major
> >> 2018-04-05T06:32:27.958Z|00077|timeval|WARN|context switches: 0
> >> voluntary, 3 involuntary
> >> 2018-04-05T06:32:27.958Z|00078|coverage|INFO|Event coverage, avg rate
> >> over last: 5 seconds, last minute, last hour,  hash=1dd2af20:
> >> 2018-04-05T06:32:27.959Z|00079|coverage|INFO|hmap_pathological
> >>  25.0/sec    24.633/sec        1.5550/sec   total: 19274
> >> 2018-04-05T06:32:27.959Z|00080|coverage|INFO|hmap_expand
> >> 4202.6/sec  4041.417/sec      578.9233/sec   total: 14158936
> >> 2018-04-05T06:32:27.959Z|00081|coverage|INFO|lockfile_lock
> >> 0.0/sec     0.000/sec        0.0000/sec   total: 1
> >> 2018-04-05T06:32:27.960Z|00082|coverage|INFO|poll_create_node
> >>  50281.4/sec 47195.233/sec    29424.6886/sec   total: 1001238470
> >> 2018-04-05T06:32:27.960Z|00083|coverage|INFO|poll_zero_timeout
> >> 9.8/sec    12.167/sec        8.2392/sec   total: 312540
> >> 2018-04-05T06:32:27.960Z|00084|coverage|INFO|seq_change
> >>  122.8/sec   115.967/sec       68.2761/sec   total: 2309640
> >> 2018-04-05T06:32:27.961Z|00085|coverage|INFO|pstream_open
> >>  0.0/sec     0.000/sec        0.0000/sec   total: 4
> >> 2018-04-05T06:32:27.961Z|00086|coverage|INFO|stream_open
> >> 0.0/sec     0.000/sec        0.0000/sec   total: 9
> >> 2018-04-05T06:32:27.961Z|00087|coverage|INFO|util_xalloc
> >> 229362.2/sec 218086.767/sec    46494.5058/sec   total: 1344641131
> >>
> >>
> >>
> >> Some data from the nb db logs:
> >> 2018-04-05T23:35:33.458Z|00020|raft|INFO|adding b7bd (b7bd at tcp:
> >> 192.168.220.103:6645) to cluster a36d succeeded (completed)
> >> 2018-04-05T23:53:06.360Z|00021|jsonrpc|WARN|tcp:192.168.10.10:57298:
> >> receive error: Connection reset by peer
> >> 2018-04-05T23:53:06.360Z|00022|reconnect|WARN|tcp:192.168.10.10:57298:
> >> connection dropped (Connection reset by peer)
> >> 2018-04-05T23:53:09.071Z|00023|jsonrpc|WARN|tcp:192.168.10.10:57348:
> >> receive error: Connection reset by peer
> >> 2018-04-05T23:53:09.071Z|00024|reconnect|WARN|tcp:192.168.10.10:57348:
> >> connection dropped (Connection reset by peer)
> >> 2018-04-06T00:02:03.864Z|00076|raft|INFO|current entry eid
> >> 0b9eeff2-8377-4f6d-a907-cb4e98396d14 does not match prerequisite
> >> 277c09b4-96b3-43e9-b482-a9df29576988
> >> 2018-04-06T00:02:06.882Z|00077|poll_loop|INFO|Dropped 6187 log messages
> >> in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
> >> 2018-04-06T00:02:06.882Z|00078|poll_loop|INFO|wakeup due to [POLLIN] on
> >> fd 31 (192.168.220.102:6645<->192.168.10.10:54782) at
> >> lib/stream-fd.c:157 (62% CPU usage)
> >> 2018-04-06T00:02:09.130Z|00079|jsonrpc|WARN|tcp:192.168.10.10:33592:
> >> receive error: Connection reset by peer
> >> 2018-04-06T00:02:09.131Z|00080|reconnect|WARN|tcp:192.168.10.10:33592:
> >> connection dropped (Connection reset by peer)
> >> 2018-04-06T00:02:33.986Z|00081|poll_loop|INFO|Dropped 1509 log messages
> >> in last 26 seconds (most recently, 24 seconds ago) due to excessive rate
> >> 2018-04-06T00:02:33.986Z|00082|poll_loop|INFO|wakeup due to [POLLIN] on
> >> fd 24 (FIFO pipe:[40290445]) at ovsdb/log.c:994 (51% CPU usage)
> >> 2018-04-06T00:02:33.986Z|00083|poll_loop|INFO|wakeup due to [POLLIN] on
> >> fd 29 (192.168.220.102:6645<->192.168.10.10:55738) at
> >> lib/stream-fd.c:157 (51% CPU usage)
> >>
> >>
> >>
> >> 2018-04-06T00:13:48.141Z|00037|raft|INFO|rejecting term 4 < current
> term
> >> 5 received in vote_reply message from server b7bd
> >> 2018-04-06T00:14:21.138Z|00038|jsonrpc|WARN|tcp:192.168.10.10:33990:
> >> receive error: Connection reset by peer
> >> 2018-04-06T00:14:21.138Z|00039|reconnect|WARN|tcp:192.168.10.10:33990:
> >> connection dropped (Connection reset by peer)
> >> 2018-04-06T00:17:01.532Z|00040|raft|INFO|current entry eid
> >> e07229fe-99d8-436b-8ff5-ee533ed3c825 does not match prerequisite
> >> d47dbdf1-0981-4077-baab-372b05aeb0bb
> >> 2018-04-06T00:17:01.532Z|00041|raft|INFO|current entry eid
> >> e07229fe-99d8-436b-8ff5-ee533ed3c825 does not match prerequisite
> >> d47dbdf1-0981-4077-baab-372b05aeb0bb
> >> 2018-04-06T00:17:01.532Z|00042|raft|INFO|current entry eid
> >> e07229fe-99d8-436b-8ff5-ee533ed3c825 does not match prerequisite
> >> d47dbdf1-0981-4077-baab-372b05aeb0bb
> >> 2018-04-06T00:17:01.533Z|00043|raft|INFO|current entry eid
> >> e07229fe-99d8-436b-8ff5-ee533ed3c825 does not match prerequisite
> >> d47dbdf1-0981-4077-baab-372b05aeb0bb
> >> 2018-04-06T00:17:01.533Z|00044|raft|INFO|current entry eid
> >> e07229fe-99d8-436b-8ff5-ee533ed3c825 does not match prerequisite
> >> d47dbdf1-0981-4077-baab-372b05aeb0bb
> >> 2018-04-06T00:19:32.039Z|00045|timeval|WARN|Unreasonably long 1373ms
> >> poll interval (1340ms user, 24ms system)
> >> 2018-04-06T00:19:32.039Z|00046|timeval|WARN|faults: 319 minor, 0 major
> >> 2018-04-06T00:19:32.039Z|00047|timeval|WARN|disk: 0 reads, 368 writes
> >> 2018-04-06T00:19:32.039Z|00048|timeval|WARN|context switches: 5
> >> voluntary, 19 involuntary
> >> 2018-04-06T00:19:32.039Z|00049|coverage|INFO|Event coverage, avg rate
> >> over last: 5 seconds, last minute, last hour,  hash=1dd2af20:
> >> 2018-04-06T00:19:32.039Z|00050|coverage|INFO|hmap_pathological
> >> 0.0/sec     0.000/sec        3.4164/sec   total: 12306
> >> 2018-04-06T00:19:32.039Z|00051|coverage|INFO|hmap_expand
> >>  73.6/sec    71.267/sec      729.9656/sec   total: 2628980
> >> 2018-04-06T00:19:32.039Z|00052|coverage|INFO|lockfile_lock
> >> 0.0/sec     0.000/sec        0.0003/sec   total: 1
> >> 2018-04-06T00:19:32.039Z|00053|coverage|INFO|poll_create_node
> >>  129.6/sec   122.333/sec      866.7178/sec   total: 3120304
> >> 2018-04-06T00:19:32.039Z|00054|coverage|INFO|poll_zero_timeout
> >> 0.0/sec     0.267/sec       27.4550/sec   total: 98838
> >> 2018-04-06T00:19:32.039Z|00055|coverage|INFO|seq_change
> >> 10.8/sec    10.217/sec      135.4125/sec   total: 487498
> >> 2018-04-06T00:19:32.039Z|00056|coverage|INFO|pstream_open
> >>  0.0/sec     0.000/sec        0.0011/sec   total: 4
> >> 2018-04-06T00:19:32.039Z|00057|coverage|INFO|stream_open
> >> 0.0/sec     0.000/sec        0.0019/sec   total: 7
> >> 2018-04-06T00:19:32.039Z|00058|coverage|INFO|util_xalloc
> >> 1542.2/sec  1488.750/sec    21346.2483/sec   total: 76872987
> >> 2018-04-06T00:19:32.039Z|00059|coverage|
> >>
> >>
> >>
> >> To confirm that wait-until was the only bottleneck, I skipped wait-until and
> >> the test completed in 2.4 hours, but things did not go as expected. All the
> >> chassis ovn-controllers crashed, and ports actually did not get bound all the
> >> way down to the compute nodes because the ovn sb db was running super hot.
> >>
> >> Below is the controller log from a sample HV without wait-until:
> >> 2018-04-06T04:50:40.274Z|01703|poll_loop|INFO|Dropped 5903 log messages
> >> in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
> >> 2018-04-06T04:50:40.274Z|01704|poll_loop|INFO|wakeup due to
> >> [POLLIN][POLLHUP] on fd 13 (<->/root/sandbox-192.168.72.45/br-int.mgmt)
> >> at ../lib/stream-fd.c:157 (100% CPU usage)
> >> 2018-04-06T04:50:41.060Z|01705|ovsdb_idl|INFO|tcp:192.168.220.102:6642:
> >> clustered database server is disconnected from cluster; trying another
> >> server
> >> 2018-04-06T04:50:41.066Z|01706|binding|INFO|Claiming lport
> >> lport_a3060c_mlAOEq for this chassis.
> >> 2018-04-06T04:50:41.066Z|01707|binding|INFO|lport_a3060c_mlAOEq:
> >> Claiming a2:1c:47:32:30:52 172.145.23.37
> >> 2018-04-06T04:50:41.066Z|01708|vconn_stream|ERR|send: Broken pipe
> >> 2018-04-06T04:50:41.066Z|01709|rconn|WARN|unix:/root/sandbox-
> >> 192.168.72.45/br-int.mgmt: connection dropped (Broken pipe)
> >> 2018-04-06T04:50:41.067Z|01710|rconn|INFO|unix:/root/sandbox-
> >> 192.168.72.45/br-int.mgmt: connected
> >> 2018-04-06T04:50:41.067Z|01711|rconn|WARN|unix:/root/sandbox-
> >> 192.168.72.45/br-int.mgmt: connection dropped (Broken pipe)
> >> 2018-04-06T04:50:41.067Z|01712|util|EMER|../lib/ofp-msgs.c:1062:
> >> assertion version >= info->min_version && version <= info->max_version
> >> failed in raw_instance_get()
> >>
> >>
> >>
> >> Also northd sees many transaction errors without wait-until:
> >> northd.log
> >> 2018-04-06T04:25:26.594Z|01076|ovsdb_idl|WARN|transaction error:
> >> {"details":"Transaction causes multiple rows in \"Port_Binding\" table
> to
> >> have identical values (44d6726d-2a3a-428d-b8d4-b0d1e4c9593b and 104)
> for
> >> index on columns \"datapath\" and \"tunnel_key\".  First row, with UUID
> >> 489f777f-d7f6-4b92-80b9-c1c64c083ea4, existed in the database before
> >> this transaction and was not modified by the transaction.  Second row,
> with
> >> UUID b59a2500-3da1-40ab-932c-5e2dd4acc3b4, was inserted by this
> >> transaction.","error":"constraint violation"}
> >> 2018-04-06T04:25:27.202Z|01077|ovsdb_idl|WARN|transaction error:
> >> {"details":"Transaction causes multiple rows in \"Port_Binding\" table
> to
> >> have identical values (44d6726d-2a3a-428d-b8d4-b0d1e4c9593b and 26) for
> >> index on columns \"datapath\" and \"tunnel_key\".  First row, with UUID
> >> 0668b300-55f8-4bdd-b0a8-8038258d4502, was inserted by this transaction.
> >> Second row, with UUID 63972d45-8845-4489-ac3e-916909a593c7, existed in
> >> the database before this transaction and was not modified by the
> >> transaction.","error":"constraint violation"}
> >> 2018-04-06T04:25:32.036Z|01078|ovsdb_idl|WARN|transaction error:
> >> {"details":"Transaction causes multiple rows in \"Port_Binding\" table
> to
> >> have identical values (44d6726d-2a3a-428d-b8d4-b0d1e4c9593b and 224)
> for
> >> index on columns \"datapath\" and \"tunnel_key\".  First row, with UUID
> >> 528dea47-f284-40d6-bc7e-a97a084f49eb, existed in the database before
> >> this transaction and was not modified by the transaction.  Second row,
> with
> >> UUID 6fcc2500-fedb-4ee9-b54d-ea548dd36936, was inserted by this
> >> transaction.","error":"constraint violation"}
> >> 2018-04-06T04:25:32.482Z|01079|ovsdb_idl|WARN|transaction error:
> >> {"details":"Transaction causes multiple rows in \"Port_Binding\" table
> to
> >> have identical values (44d6726d-2a3a-428d-b8d4-b0d1e4c9593b and 23) for
> >> index on columns \"datapath\" and \"tunnel_key\".  First row, with UUID
> >> 925eb001-adeb-4748-a55f-6c5812e70a7e, was inserted by this transaction.
> >> Second row, with UUID 5ed7bbc5-b7f5-4d4e-8e99-d8c4adcb2b34, existed in
> >> the database before this transaction and was not modified by the
> >> transaction.","error":"constraint violation"}
> >> 2018-04-06T04:25:35.204Z|01080|ovsdb_idl|WARN|transaction error:
> >> {"details":"Transaction causes multiple rows in \"Port_Binding\" table
> to
> >> have identical values (44d6726d-2a3a-428d-b8d4-b0d1e4c9593b and 18) for
> >> index on columns \"datapath\" and \"tunnel_key\".  First row, with UUID
> >> 4b429b00-75ca-4e38-904d-80a11e6c0f38, was inserted by this transaction.
> >> Second row, with UUID c2db03fe-18b6-4a2b-8db0-bf37ef93db92, existed in
> >> the database before this transaction and was not modified by the
> >> transaction.","error":"constraint violation"}
> >> 2018-04-06T04:25:35.933Z|01081|ovsdb_idl|INFO|tcp:192.168.220.101:6642:
> >> clustered database server is not cluster leader; trying another server
> >>
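> >> (When debugging these constraint violations, the current tunnel_key
> >> allocations per datapath can be inspected straight from the sb db, e.g.:)
> >> ovn-sbctl --columns=logical_port,datapath,tunnel_key list Port_Binding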
> >>
> >> To summarize:
> >>
> >>    - ovn nb db has issues with wait-until and works fine without
> >>    wait-until.
> >>    - ovn sb db causes the controllers to crash with more HVs and just a few
> >>    hundred ports, with no solid root cause, even without wait-until. Even
> >>    sbctl commands hang as the sb db is at 100% cpu.
> >>    - ovn northd sees transaction errors without wait-until.
> >>
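> >> (When the sb db is pegged at 100% cpu and sbctl hangs, the server may still
> >> answer on its unixctl socket, which bypasses the OVSDB protocol; the ctl
> >> path below is an assumption about the run dir:)
> >> ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl memory/show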
> >>
> >>
> >> I will continue to troubleshoot and would love to get any
> >> improvements/suggestions.
> >>
> >>
> >> Regards,
> >>
> >>
> >
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

