[ovs-discuss] OVN Scale with RAFT: how to make ovn-northd more reliable when RAFT leader unstable

Winson Wang windson.wang at gmail.com
Fri Jul 17 15:53:59 UTC 2020


On Fri, Jul 17, 2020 at 12:54 AM Dumitru Ceara <dceara at redhat.com> wrote:

> On 7/17/20 2:58 AM, Winson Wang wrote:
> > Hi Dumitru,
> >
> > Most of the flows are in table 19.
>
> This is the ls_in_pre_hairpin table where we add flows for each backend
> of the load balancers.
>
> >
> > -rw-r--r-- 1 root root 142M Jul 16 17:07 br-int.txt (all flows dump file)
> > -rw-r--r-- 1 root root 102M Jul 16 17:43 table-19.txt
> > -rw-r--r-- 1 root root 7.8M Jul 16 17:43 table-11.txt
> > -rw-r--r-- 1 root root 3.7M Jul 16 17:43 table-21.txt
> >
> > # cat table-19.txt | wc -l
> > 408458
> > # cat table-19.txt | grep "=9153" | wc -l
> > 124744
> > # cat table-19.txt | grep "=53" | wc -l
> > 249488
> > The CoreDNS pods have a svc with port numbers 53 and 9153.
> >
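For reference, these per-table counts can be collected on a node roughly like this
(bridge and file names follow this thread; the table holding the hairpin flows may
differ between OVN versions):

# ovs-ofctl dump-flows br-int > br-int.txt
# grep -o ' table=[0-9]*' br-int.txt | sort | uniq -c | sort -rn | head
# grep -c ' table=19,' br-int.txt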
>
> How many backends do you have for these VIPs (with port number 53 and
> 9153) in your load_balancer config?
>
The number of backends is 63, with 63 CoreDNS pods running and exposing cluster IP
10.96.0.10 on tcp/53, udp/53, and tcp/9153.
lb-list | grep 10.96.0.10
3b8a468a-44d2-4a34-94ca-626dac936cde                        udp
10.96.0.10:53        192.168.104.3:53,192.168.105.3:53,192.168.106.3:53,
192.168.107.3:53,192.168.108.3:53,192.168.109.3:53,192.168.110.3:53,
192.168.111.3:53,192.168.112.3:53,192.168.113.3:53,192.168.114.3:53,
192.168.115.3:53,192.168.116.3:53,192.168.118.4:53,192.168.119.3:53,
192.168.120.4:53,192.168.121.3:53,192.168.122.3:53,192.168.123.3:53,
192.168.130.3:53,192.168.131.3:53,192.168.136.3:53,192.168.142.3:53,
192.168.4.3:53,192.168.45.3:53,192.168.46.3:53,192.168.47.3:53,
192.168.48.3:53,192.168.49.3:53,192.168.50.3:53,192.168.51.3:53,
192.168.52.3:53,192.168.53.3:53,192.168.54.3:53,192.168.55.3:53,
192.168.56.3:53,192.168.57.3:53,192.168.58.3:53,192.168.59.4:53,
192.168.60.4:53,192.168.61.3:53,192.168.62.3:53,192.168.63.3:53,
192.168.64.3:53,192.168.65.3:53,192.168.66.3:53,192.168.67.3:53,
192.168.68.4:53,192.168.69.3:53,192.168.70.3:53,192.168.71.3:53,
192.168.72.3:53,192.168.73.3:53,192.168.74.4:53,192.168.75.4:53,
192.168.76.3:53,192.168.77.3:53,192.168.78.3:53,192.168.79.3:53,
192.168.80.4:53,192.168.81.3:53,192.168.82.4:53,192.168.83.4:53
                                                            tcp
10.96.0.10:53        192.168.104.3:53,192.168.105.3:53,192.168.106.3:53,
192.168.107.3:53,192.168.108.3:53,192.168.109.3:53,192.168.110.3:53,
192.168.111.3:53,192.168.112.3:53,192.168.113.3:53,192.168.114.3:53,
192.168.115.3:53,192.168.116.3:53,192.168.118.4:53,192.168.119.3:53,
192.168.120.4:53,192.168.121.3:53,192.168.122.3:53,192.168.123.3:53,
192.168.130.3:53,192.168.131.3:53,192.168.136.3:53,192.168.142.3:53,
192.168.4.3:53,192.168.45.3:53,192.168.46.3:53,192.168.47.3:53,
192.168.48.3:53,192.168.49.3:53,192.168.50.3:53,192.168.51.3:53,
192.168.52.3:53,192.168.53.3:53,192.168.54.3:53,192.168.55.3:53,
192.168.56.3:53,192.168.57.3:53,192.168.58.3:53,192.168.59.4:53,
192.168.60.4:53,192.168.61.3:53,192.168.62.3:53,192.168.63.3:53,
192.168.64.3:53,192.168.65.3:53,192.168.66.3:53,192.168.67.3:53,
192.168.68.4:53,192.168.69.3:53,192.168.70.3:53,192.168.71.3:53,
192.168.72.3:53,192.168.73.3:53,192.168.74.4:53,192.168.75.4:53,
192.168.76.3:53,192.168.77.3:53,192.168.78.3:53,192.168.79.3:53,
192.168.80.4:53,192.168.81.3:53,192.168.82.4:53,192.168.83.4:53
                                                            tcp
10.96.0.10:9153      192.168.104.3:9153,192.168.105.3:9153,
192.168.106.3:9153,192.168.107.3:9153,192.168.108.3:9153,192.168.109.3:9153,
192.168.110.3:9153,192.168.111.3:9153,192.168.112.3:9153,192.168.113.3:9153,
192.168.114.3:9153,192.168.115.3:9153,192.168.116.3:9153,192.168.118.4:9153,
192.168.119.3:9153,192.168.120.4:9153,192.168.121.3:9153,192.168.122.3:9153,
192.168.123.3:9153,192.168.130.3:9153,192.168.131.3:9153,192.168.136.3:9153,
192.168.142.3:9153,192.168.4.3:9153,192.168.45.3:9153,192.168.46.3:9153,
192.168.47.3:9153,192.168.48.3:9153,192.168.49.3:9153,192.168.50.3:9153,
192.168.51.3:9153,192.168.52.3:9153,192.168.53.3:9153,192.168.54.3:9153,
192.168.55.3:9153,192.168.56.3:9153,192.168.57.3:9153,192.168.58.3:9153,
192.168.59.4:9153,192.168.60.4:9153,192.168.61.3:9153,192.168.62.3:9153,
192.168.63.3:9153,192.168.64.3:9153,192.168.65.3:9153,192.168.66.3:9153,
192.168.67.3:9153,192.168.68.4:9153,192.168.69.3:9153,192.168.70.3:9153,
192.168.71.3:9153,192.168.72.3:9153,192.168.73.3:9153,192.168.74.4:9153,
192.168.75.4:9153,192.168.76.3:9153,192.168.77.3:9153,192.168.78.3:9153,
192.168.79.3:9153,192.168.80.4:9153,192.168.81.3:9153,192.168.82.4:9153,
192.168.83.4:9153

>
> Thanks,
> Dumitru
>
> > Please let me know if you need more information.
> >
> >
> > Regards,
> > Winson
> >
> >
> > On Thu, Jul 16, 2020 at 11:23 AM Dumitru Ceara <dceara at redhat.com> wrote:
> >
> >     On 7/15/20 8:02 PM, Winson Wang wrote:
> >     > +add ovn-Kubernetes group.
> >     >
> >     > Hi Dumitru,
> >     >
> >     > With the recent patches from you and Han, the basic k8s workloads such as
> >     > node resources and pod resources are now fixed and look good.
> >     > Much thanks!
> >
> >     Hi Winson,
> >
> >     Glad to hear that!
> >
> >     >
> >     > k8s workloads exposed as a svc IP are very common, for example
> >     > the CoreDNS deployment.
> >     > With a large cluster size such as 1000 nodes, there is a service that
> >     > auto-scales the CoreDNS deployment; with the default of 16 nodes per
> >     > CoreDNS replica, that comes to 63 CoreDNS pods.
> >     > On my 1006-node setup, the CoreDNS deployment scaled from 2 to 63 pods.
> >     > An SB RAFT election timer of 16s is not enough for this operation in my
> >     > test environment: one RAFT node cannot finish the election within two
> >     > election slots, all of its clients disconnect and reconnect to the two
> >     > other RAFT nodes, and the RAFT clients end up in an unbalanced state
> >     > after this operation.
> >     > It would be good if this condition could be avoided without a larger
> >     > election timer.
> >     >
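For reference, if raising the timer turns out to be the short-term workaround, the SB
election timer can be changed at runtime with ovs-appctl, roughly like this (the ctl
socket path is an assumption and depends on the installation; if I remember correctly
the command has to be run on the current leader and the value can only be increased,
by at most about double per call):

# ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/change-election-timer OVN_Southbound 32000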
> >     > On the SB and worker node resource side:
> >     > SB DB size increased by 27 MB.
> >     > br-int open flows increased by around 369K.
> >     > RSS memory of (ovs + ovn-controller) increased by more than 600 MB.
> >
> >     This increase on the hypervisor side is most likely because of the
> >     openflows for hairpin traffic for VIPs (service IPs). To confirm, would
> >     it be possible to take a snapshot of the OVS flow table and see how many
> >     flows there are per table?
> >
> >     >
> >     > So if the OVN experts can figure out how to optimize this, it would be
> >     > very helpful for scaling ovn-k8s up to large cluster sizes, I think.
> >     >
> >
> >     If the above is due to the LB flows that handle hairpin traffic, the
> >     only idea I have is to use the OVS "learn" action to have the flows
> >     generated as needed. However, I didn't get the chance to try it out yet.
> >
> >     Thanks,
> >     Dumitru
> >
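To make the "learn" idea a bit more concrete, here is a purely illustrative, untested
sketch: when a load-balanced packet passes (its ip.dst already rewritten to the chosen
backend), it installs one narrow flow for that backend instead of pre-installing a flow
for every backend of every VIP. The table numbers and register bit are made up for the
example and are not what ovn-controller actually uses:

table=20, priority=100, ip
  actions=learn(table=21, idle_timeout=60, priority=100, eth_type=0x800,
                NXM_OF_IP_SRC[]=NXM_OF_IP_DST[],
                load:1->NXM_NX_REG10[7]),
          resubmit(,21)

The learned flow in table 21 would match ip.src equal to the backend address the load
balancer just picked, which is the part of hairpin detection that currently costs one
pre-installed flow per backend.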
> >     >
> >     > Regards,
> >     > Winson
> >     >
> >     >
> >     > On Fri, May 1, 2020 at 1:35 AM Dumitru Ceara <dceara at redhat.com> wrote:
> >     >
> >     >     On 5/1/20 12:00 AM, Winson Wang wrote:
> >     >     > Hi Han,  Dumitru,
> >     >     >
> >     >
> >     >     Hi Winson,
> >     >
> >     >     > With the fix from Dumitru:
> >     >     > https://github.com/ovn-org/ovn/commit/97e82ae5f135a088c9e95b49122d8217718d23f4
> >     >     >
> >     >     > It greatly reduces the SB RAFT workload in my stress-test mode
> >     >     > with k8s services that have many endpoints.
> >     >     >
> >     >     > The DB file size increases much less with the fix, so it no longer
> >     >     > triggers the leader election with the same workload.
> >     >     >
> >     >     > Dumitru, based on my test, the logical flow count is determined by
> >     >     > the cluster size, regardless of the number of VIP endpoints.
> >     >
> >     >     The number of logical flows is fixed based on the number of VIPs (2 per
> >     >     VIP), but the size of the match expression depends on the number of
> >     >     backends per VIP, so the SB DB size will increase when adding backends
> >     >     to existing VIPs.
> >     >
> >     >     >
> >     >     > But the openflow count on each node still scales with the number
> >     >     > of endpoints.
> >     >
> >     >     Yes, this is due to the match expression in the logical flow above,
> >     >     which is of the form:
> >     >
> >     >     (ip.src == backend-ip1 && ip.dst == backend-ip1) || .. ||
> >     >     (ip.src == backend-ipn && ip.dst == backend-ipn)
> >     >
> >     >     This will get expanded to n openflow rules, one per backend, to
> >     >     determine if traffic was hairpinned.
> >     >
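For example, with the CoreDNS VIP shown earlier in this thread, that single logical
flow ends up as roughly one OpenFlow rule per backend on br-int, along the lines of:

ip,nw_src=192.168.104.3,nw_dst=192.168.104.3 actions=...
ip,nw_src=192.168.105.3,nw_dst=192.168.105.3 actions=...
...
ip,nw_src=192.168.83.4,nw_dst=192.168.83.4 actions=...

(judging by the grep counts earlier in the thread, the real flows also match on
protocol and port, so each VIP/protocol pair contributes its own set).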
> >     >     > Any idea how to reduce the openflow count on each node's br-int?
> >     >     >
> >     >     >
> >     >
> >     >     Unfortunately I don't think there's a way to determine if traffic was
> >     >     hairpinned because I don't think we can have openflow rules that match
> >     >     on "ip.src == ip.dst". So in the worst case, we will probably need two
> >     >     openflow rules per backend IP (one for initiator traffic, one for reply).
> >     >
> >     >     I'll think more about it though.
> >     >
> >     >     Regards,
> >     >     Dumitru
> >     >
> >     >     > Regards,
> >     >     > Winson
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     > On Wed, Apr 29, 2020 at 1:42 PM Winson Wang <windson.wang at gmail.com> wrote:
> >     >     >
> >     >     >     Hi Han,
> >     >     >
> >     >     >     Thanks for quick reply.
> >     >     >     Please see my reply below.
> >     >     >
> >     >     >     On Wed, Apr 29, 2020 at 12:31 PM Han Zhou <hzhou at ovn.org> wrote:
> >     >     >
> >     >     >
> >     >     >
> >     >     >         On Wed, Apr 29, 2020 at 10:29 AM Winson Wang <windson.wang at gmail.com> wrote:
> >     >     >         >
> >     >     >         > Hello Experts,
> >     >     >         >
> >     >     >         > I am doing stress testing on a k8s cluster with OVN. One
> >     >     >         > thing I am seeing is that when the RAFT nodes receive a
> >     >     >         > large update in a short time from ovn-northd, the 3 RAFT
> >     >     >         > nodes trigger voting and the leader role switches from one
> >     >     >         > node to another.
> >     >     >         >
> >     >     >         > On the ovn-northd side, I can see ovn-northd going through
> >     >     >         > BACKOFF, RECONNECT...
> >     >     >         >
> >     >     >         > Since ovn-northd connects to the NB/SB leader only, how can
> >     >     >         > we make ovn-northd more available most of the time?
> >     >     >         > Is it possible to have ovn-northd keep established
> >     >     >         > connections to all RAFT nodes to avoid the reconnect
> >     >     >         > mechanism?
> >     >     >         > The backoff time of 8s is not configurable for now.
> >     >     >         >
> >     >     >         >
> >     >     >         > Test logs:
> >     >     >         >
> >     >     >         > 2020-04-29T17:03:08.296Z|41861|ovsdb_idl|INFO|tcp:10.0.2.152:6642: clustered database server is not cluster leader; trying another server
> >     >     >         > 2020-04-29T17:03:08.296Z|41862|reconnect|DBG|tcp:10.0.2.152:6642: entering RECONNECT
> >     >     >         > 2020-04-29T17:03:08.304Z|41863|reconnect|DBG|tcp:10.0.2.152:6642: entering BACKOFF
> >     >     >         > 2020-04-29T17:03:09.708Z|41867|coverage|INFO|Dropped 2 log messages in last 78 seconds (most recently, 71 seconds ago) due to excessive rate
> >     >     >         > 2020-04-29T17:03:09.708Z|41868|coverage|INFO|Skipping details of duplicate event coverage for hash=ceada91f
> >     >     >         > 2020-04-29T17:03:16.304Z|41869|reconnect|DBG|tcp:10.0.2.153:6642: entering CONNECTING
> >     >     >         > 2020-04-29T17:03:16.308Z|41870|reconnect|INFO|tcp:10.0.2.153:6642: connected
> >     >     >         > 2020-04-29T17:03:16.308Z|41871|reconnect|DBG|tcp:10.0.2.153:6642: entering ACTIVE
> >     >     >         > 2020-04-29T17:03:16.308Z|41872|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> >     >     >         > 2020-04-29T17:03:16.309Z|41873|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> >     >     >         > 2020-04-29T17:03:16.311Z|41874|ovsdb_idl|INFO|tcp:10.0.2.153:6642: clustered database server is disconnected from cluster; trying another server
> >     >     >         > 2020-04-29T17:03:16.311Z|41875|reconnect|DBG|tcp:10.0.2.153:6642: entering RECONNECT
> >     >     >         > 2020-04-29T17:03:16.312Z|41876|reconnect|DBG|tcp:10.0.2.153:6642: entering BACKOFF
> >     >     >         > 2020-04-29T17:03:24.316Z|41877|reconnect|DBG|tcp:10.0.2.151:6642: entering CONNECTING
> >     >     >         > 2020-04-29T17:03:24.321Z|41878|reconnect|INFO|tcp:10.0.2.151:6642: connected
> >     >     >         > 2020-04-29T17:03:24.321Z|41879|reconnect|DBG|tcp:10.0.2.151:6642: entering ACTIVE
> >     >     >         > 2020-04-29T17:03:24.321Z|41880|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> >     >     >         > 2020-04-29T17:03:24.354Z|41881|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> >     >     >         > 2020-04-29T17:03:24.358Z|41882|ovsdb_idl|INFO|tcp:10.0.2.151:6642: clustered database server is not cluster leader; trying another server
> >     >     >         > 2020-04-29T17:03:24.358Z|41883|reconnect|DBG|tcp:10.0.2.151:6642: entering RECONNECT
> >     >     >         > 2020-04-29T17:03:24.360Z|41884|reconnect|DBG|tcp:10.0.2.151:6642: entering BACKOFF
> >     >     >         > 2020-04-29T17:03:32.367Z|41885|reconnect|DBG|tcp:10.0.2.152:6642: entering CONNECTING
> >     >     >         > 2020-04-29T17:03:32.372Z|41886|reconnect|INFO|tcp:10.0.2.152:6642: connected
> >     >     >         > 2020-04-29T17:03:32.372Z|41887|reconnect|DBG|tcp:10.0.2.152:6642: entering ACTIVE
> >     >     >         > 2020-04-29T17:03:32.372Z|41888|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> >     >     >         > 2020-04-29T17:03:32.373Z|41889|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> >     >     >         > 2020-04-29T17:03:32.376Z|41890|ovsdb_idl|INFO|tcp:10.0.2.152:6642: clustered database server is not cluster leader; trying another server
> >     >     >         > 2020-04-29T17:03:32.376Z|41891|reconnect|DBG|tcp:10.0.2.152:6642: entering RECONNECT
> >     >     >         > 2020-04-29T17:03:32.378Z|41892|reconnect|DBG|tcp:10.0.2.152:6642: entering BACKOFF
> >     >     >         > 2020-04-29T17:03:40.381Z|41893|reconnect|DBG|tcp:10.0.2.153:6642: entering CONNECTING
> >     >     >         > 2020-04-29T17:03:40.385Z|41894|reconnect|INFO|tcp:10.0.2.153:6642: connected
> >     >     >         > 2020-04-29T17:03:40.385Z|41895|reconnect|DBG|tcp:10.0.2.153:6642: entering ACTIVE
> >     >     >         > 2020-04-29T17:03:40.385Z|41896|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> >     >     >         > 2020-04-29T17:03:40.385Z|41897|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> >     >     >         >
> >     >     >         >
> >     >     >         > --
> >     >     >         > Winson
> >     >     >
> >     >     >         Hi Winson,
> >     >     >
> >     >     >         Since northd heavily writes to the SB DB, it is implemented to
> >     >     >         connect to the leader only, for better performance (avoiding
> >     >     >         the extra cost of a follower forwarding writes to the leader).
> >     >     >         When leader re-election happens, it has to reconnect to the new
> >     >     >         leader. However, if the cluster is unstable, this step can also
> >     >     >         take longer than expected. I'd suggest tuning the election
> >     >     >         timer to avoid re-election during heavy operations.
> >     >     >
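As a side note, the role, term, and configured election timer of each SB member can be
checked with something like the following (ctl socket path assumed, it depends on the
installation):

# ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound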
> >     >     >     I can see that setting the election timer to a higher value can
> >     >     >     avoid this, but if more stress is generated then I see it happen
> >     >     >     again. A real workload may not hit the spike stress I trigger in
> >     >     >     the stress test, so this is just for scale profiling.
> >     >     >
> >     >     >
> >     >     >
> >     >     >         If the server is overloaded for too long and a longer election
> >     >     >         timer is unacceptable, the only way to solve the availability
> >     >     >         problem is to improve ovsdb performance. How big is your
> >     >     >         transaction and what's your election timer setting?
> >     >     >
> >     >     >     I can see ovn-northd send 33 MB of data in a short time, and
> >     >     >     ovsdb-server needs to sync it with the clients. I ran iftop on the
> >     >     >     ovn-controller side; each node receives around a 25 MB update.
> >     >     >     With each ovn-controller getting 25 MB of data, the 3 RAFT nodes
> >     >     >     send about 25 MB * 646 ~ 16 GB in total.
> >     >     >
> >     >     >
> >     >     >         The number of clients also impacts the performance since the
> >     >     >         heavy update needs to be synced to all clients. How many
> >     >     >         clients do you have?
> >     >     >
> >     >     >     Is there a mechanism for all the ovn-controller clients to connect
> >     >     >     to RAFT followers only and skip the leader?
> >     >     >     That would leave the leader node more CPU for voting and
> >     >     >     cluster-level sync.
> >     >     >     Based on my stress test, after the ovn-controllers connected to the
> >     >     >     2 follower nodes, only ovn-northd was connected to the leader node.
> >     >     >     In this model, RAFT voting finishes in a shorter time when
> >     >     >     ovn-northd triggers the same workload.
> >     >     >
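As far as I know there is no built-in "followers only" mode for clients; the closest
approximation is to statically list only the follower endpoints in ovn-remote on the
hypervisors, for example (addresses from this test setup):

# ovs-vsctl set open_vswitch . external_ids:ovn-remote="tcp:10.0.2.151:6642,tcp:10.0.2.152:6642"

The obvious drawback is that the list has to be updated if the leader moves, so this is
only a rough way to reproduce the followers-only experiment described above.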
> >     >     >     The total number of clients is 646 nodes.
> >     >     >     Before the leader role change, all clients were connected to the 3
> >     >     >     nodes in a balanced way; each RAFT node had 200+ connections.
> >     >     >     After the leader role change, the ovn-controller side got the
> >     >     >     following message:
> >     >     >
> >     >
> >     >     >     2020-04-29T04:21:14.566Z|00674|ovsdb_idl|INFO|tcp:10.0.2.153:6642: clustered database server is disconnected from cluster; trying another server
> >     >     >
> >     >     >     Node 10.0.2.153:
> >     >     >
> >     >     >     SB role changed from follower to candidate on 21:21:06
> >     >     >
> >     >     >     SB role changed from candidate to leader on 21:22:16
> >     >     >
> >     >     >     netstat for 6642 port connections:
> >     >     >
> >     >     >     21:21:31 ESTABLISHED 202
> >     >     >
> >     >     >     21:21:31 Pending 0
> >     >     >
> >     >     >     21:21:41 ESTABLISHED 0
> >     >     >
> >     >     >     21:21:41 Pending 0
> >     >     >
> >     >     >
> >     >     >     The above node was in the candidate role for more than 60s, which
> >     >     >     is more than my election timer setting of 30s.
> >     >     >
> >     >     >     All 202 connections of node 10.0.2.153 shifted to the other two
> >     >     >     nodes in a short time. After that, only ovn-northd was connected
> >     >     >     to this node.
> >     >     >
> >     >     >
> >     >     >     Node 10.0.2.151:
> >     >     >
> >     >     >     SB role changed from leader to follower on 21:21:23
> >     >     >
> >     >     >
> >     >     >     21:21:35 ESTABLISHED 233
> >     >     >
> >     >     >     21:21:35 Pending 0
> >     >     >
> >     >     >     21:21:45 ESTABLISHED 282
> >     >     >
> >     >     >     21:21:45 Pending 9
> >     >     >
> >     >     >     21:21:55 ESTABLISHED 330
> >     >     >
> >     >     >     21:21:55 Pending 1
> >     >     >
> >     >     >     21:22:05 ESTABLISHED 330
> >     >     >
> >     >     >     21:22:05 Pending 1
> >     >     >
> >     >     >
> >     >     >
> >     >     >     Node 10.0.2.152:
> >     >     >
> >     >     >     SB role changed from follower to candidate on 21:21:57
> >     >     >
> >     >     >     SB role changed from candidate to follower on 21:22:17
> >     >     >
> >     >     >
> >     >     >     21:21:35 ESTABLISHED 211
> >     >     >
> >     >     >     21:21:35 Pending 0
> >     >     >
> >     >     >     21:21:45 ESTABLISHED 263
> >     >     >
> >     >     >     21:21:45 Pending 5
> >     >     >
> >     >     >     21:21:55 ESTABLISHED 316
> >     >     >
> >     >     >     21:21:55 Pending 0
> >     >     >
> >     >     >
> >     >     >
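For completeness, connection counts like the ones above can be collected with a simple
loop along these lines (port and interval assumed):

# while true; do echo "$(date +%T) $(netstat -tn | grep ':6642' | grep -c ESTABLISHED)"; sleep 10; done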
> >     >     >
> >     >     >         Thanks,
> >     >     >         Han
> >     >     >
> >     >     >
> >     >     >
> >     >     >     --
> >     >     >     Winson
> >     >     >
> >     >     >
> >     >     >
> >     >     > --
> >     >     > Winson
> >     >
> >     >
> >     >
> >     > --
> >     > Winson
> >
> >
> >
> > --
> > Winson
>
>

-- 
Winson