[ovs-discuss] OVN Scale with RAFT: how to make ovn-northd more reliable when RAFT leader unstable

Winson Wang windson.wang at gmail.com
Wed Apr 29 17:28:52 UTC 2020


Hello Experts,

I am stress testing a k8s cluster with OVN. One thing I am seeing is that
when the raft nodes receive a large amount of updates from ovn-northd in a
short time, the 3 raft nodes trigger an election and the leader role
switches from one node to another.

From the ovn-northd side, I can see ovn-northd going through BACKOFF,
RECONNECT...

Since ovn-northd connects only to the NB/SB leader, how can we keep
ovn-northd available most of the time?

Is it possible for ovn-northd to keep established connections to all raft
nodes, so that it does not have to go through the reconnect mechanism?
I ask because the 8s backoff time is not configurable for now.


Test logs:

2020-04-29T17:03:08.296Z|41861|ovsdb_idl|INFO|tcp:10.0.2.152:6642: clustered database server is not cluster leader; trying another server
2020-04-29T17:03:08.296Z|41862|reconnect|DBG|tcp:10.0.2.152:6642: entering RECONNECT
2020-04-29T17:03:08.304Z|41863|reconnect|DBG|tcp:10.0.2.152:6642: entering BACKOFF
2020-04-29T17:03:09.708Z|41867|coverage|INFO|Dropped 2 log messages in last 78 seconds (most recently, 71 seconds ago) due to excessive rate
2020-04-29T17:03:09.708Z|41868|coverage|INFO|Skipping details of duplicate event coverage for hash=ceada91f
2020-04-29T17:03:16.304Z|41869|reconnect|DBG|tcp:10.0.2.153:6642: entering CONNECTING
2020-04-29T17:03:16.308Z|41870|reconnect|INFO|tcp:10.0.2.153:6642: connected
2020-04-29T17:03:16.308Z|41871|reconnect|DBG|tcp:10.0.2.153:6642: entering ACTIVE
2020-04-29T17:03:16.308Z|41872|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
2020-04-29T17:03:16.309Z|41873|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
2020-04-29T17:03:16.311Z|41874|ovsdb_idl|INFO|tcp:10.0.2.153:6642: clustered database server is disconnected from cluster; trying another server
2020-04-29T17:03:16.311Z|41875|reconnect|DBG|tcp:10.0.2.153:6642: entering RECONNECT
2020-04-29T17:03:16.312Z|41876|reconnect|DBG|tcp:10.0.2.153:6642: entering BACKOFF
2020-04-29T17:03:24.316Z|41877|reconnect|DBG|tcp:10.0.2.151:6642: entering CONNECTING
2020-04-29T17:03:24.321Z|41878|reconnect|INFO|tcp:10.0.2.151:6642: connected
2020-04-29T17:03:24.321Z|41879|reconnect|DBG|tcp:10.0.2.151:6642: entering ACTIVE
2020-04-29T17:03:24.321Z|41880|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
2020-04-29T17:03:24.354Z|41881|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
2020-04-29T17:03:24.358Z|41882|ovsdb_idl|INFO|tcp:10.0.2.151:6642: clustered database server is not cluster leader; trying another server
2020-04-29T17:03:24.358Z|41883|reconnect|DBG|tcp:10.0.2.151:6642: entering RECONNECT
2020-04-29T17:03:24.360Z|41884|reconnect|DBG|tcp:10.0.2.151:6642: entering BACKOFF
2020-04-29T17:03:32.367Z|41885|reconnect|DBG|tcp:10.0.2.152:6642: entering CONNECTING
2020-04-29T17:03:32.372Z|41886|reconnect|INFO|tcp:10.0.2.152:6642: connected
2020-04-29T17:03:32.372Z|41887|reconnect|DBG|tcp:10.0.2.152:6642: entering ACTIVE
2020-04-29T17:03:32.372Z|41888|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
2020-04-29T17:03:32.373Z|41889|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
2020-04-29T17:03:32.376Z|41890|ovsdb_idl|INFO|tcp:10.0.2.152:6642: clustered database server is not cluster leader; trying another server
2020-04-29T17:03:32.376Z|41891|reconnect|DBG|tcp:10.0.2.152:6642: entering RECONNECT
2020-04-29T17:03:32.378Z|41892|reconnect|DBG|tcp:10.0.2.152:6642: entering BACKOFF
2020-04-29T17:03:40.381Z|41893|reconnect|DBG|tcp:10.0.2.153:6642: entering CONNECTING
2020-04-29T17:03:40.385Z|41894|reconnect|INFO|tcp:10.0.2.153:6642: connected
2020-04-29T17:03:40.385Z|41895|reconnect|DBG|tcp:10.0.2.153:6642: entering ACTIVE
2020-04-29T17:03:40.385Z|41896|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
2020-04-29T17:03:40.385Z|41897|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
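
To put a number on the delay, here is a minimal Python sketch that measures
the time between each BACKOFF entry and the next CONNECTING entry. The
function name backoff_gaps and the regular expression are only illustrative
(they are not part of OVS or OVN), and the sample lines are copied from the
log above with each entry on a single line.

```python
import re
from datetime import datetime

# A few "reconnect" entries copied from the log above, one entry per line.
SAMPLE = """\
2020-04-29T17:03:08.304Z|41863|reconnect|DBG|tcp:10.0.2.152:6642: entering BACKOFF
2020-04-29T17:03:16.304Z|41869|reconnect|DBG|tcp:10.0.2.153:6642: entering CONNECTING
2020-04-29T17:03:16.312Z|41876|reconnect|DBG|tcp:10.0.2.153:6642: entering BACKOFF
2020-04-29T17:03:24.316Z|41877|reconnect|DBG|tcp:10.0.2.151:6642: entering CONNECTING
2020-04-29T17:03:24.360Z|41884|reconnect|DBG|tcp:10.0.2.151:6642: entering BACKOFF
2020-04-29T17:03:32.367Z|41885|reconnect|DBG|tcp:10.0.2.152:6642: entering CONNECTING
"""

# Illustrative pattern: timestamp | sequence | module | level | server: entering STATE
ENTRY = re.compile(
    r"^([0-9T:.\-]+)Z\|\d+\|reconnect\|DBG\|(\S+): entering (\w+)$"
)


def backoff_gaps(text):
    """Return (seconds, next_server) for each BACKOFF -> CONNECTING pair."""
    gaps = []
    backoff_started = None
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if not match:
            continue  # not a reconnect state change
        stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        server, state = match.group(2), match.group(3)
        if state == "BACKOFF":
            backoff_started = stamp
        elif state == "CONNECTING" and backoff_started is not None:
            gaps.append(((stamp - backoff_started).total_seconds(), server))
            backoff_started = None
    return gaps


for seconds, server in backoff_gaps(SAMPLE):
    print(f"{seconds:.3f}s in backoff before trying {server}")
```

On the entries above this reports a gap of about 8 seconds before every
reconnect attempt, which matches the 8s backoff I mentioned.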

-- 
Winson