[ovs-discuss] OVN Scale with RAFT: how to make ovn-northd more reliable when RAFT leader unstable
Dumitru Ceara
dceara at redhat.com
Fri Aug 28 06:58:59 UTC 2020
On 8/28/20 1:01 AM, Winson Wang wrote:
> Hi Dumitru,
>
> Have you tried the OVS "learn" action to see if it addresses this scale issue?
>
>
> Regards,
> Winson
>
>
Hi Winson,
Sorry, didn't get a chance to look at this yet. It's still on my todo-list.
Regards,
Dumitru
>
> On Fri, Jul 17, 2020 at 8:53 AM Winson Wang <windson.wang at gmail.com> wrote:
>
>
>
> On Fri, Jul 17, 2020 at 12:54 AM Dumitru Ceara <dceara at redhat.com> wrote:
>
> On 7/17/20 2:58 AM, Winson Wang wrote:
> > Hi Dumitru,
> >
> > most of the flows are in table 19.
>
> This is the ls_in_pre_hairpin table where we add flows for each
> backend of the load balancers.
>
> >
> > -rw-r--r-- 1 root root 142M Jul 16 17:07 br-int.txt (all flows dump file)
> > -rw-r--r-- 1 root root 102M Jul 16 17:43 table-19.txt
> > -rw-r--r-- 1 root root 7.8M Jul 16 17:43 table-11.txt
> > -rw-r--r-- 1 root root 3.7M Jul 16 17:43 table-21.txt
> >
> > # cat table-19.txt | wc -l
> > 408458
> > # cat table-19.txt | grep "=9153" | wc -l
> > 124744
> > # cat table-19.txt | grep "=53" | wc -l
> > 249488
> > The CoreDNS pods' svc exposes port numbers 53 and 9153.
> >
>
> How many backends do you have for these VIPs (with port numbers
> 53 and 9153) in your load_balancer config?
>
> The backend count is 63: 63 CoreDNS pods running, exposing
> Cluster IP 10.96.0.10 on tcp/53, udp/53, and tcp/9153.
> lb-list | grep 10.96.0.10
> 3b8a468a-44d2-4a34-94ca-626dac936cde    udp    10.96.0.10:53
>     192.168.104.3:53,192.168.105.3:53,192.168.106.3:53,192.168.107.3:53,
>     192.168.108.3:53,192.168.109.3:53,192.168.110.3:53,192.168.111.3:53,
>     192.168.112.3:53,192.168.113.3:53,192.168.114.3:53,192.168.115.3:53,
>     192.168.116.3:53,192.168.118.4:53,192.168.119.3:53,192.168.120.4:53,
>     192.168.121.3:53,192.168.122.3:53,192.168.123.3:53,192.168.130.3:53,
>     192.168.131.3:53,192.168.136.3:53,192.168.142.3:53,192.168.4.3:53,
>     192.168.45.3:53,192.168.46.3:53,192.168.47.3:53,192.168.48.3:53,
>     192.168.49.3:53,192.168.50.3:53,192.168.51.3:53,192.168.52.3:53,
>     192.168.53.3:53,192.168.54.3:53,192.168.55.3:53,192.168.56.3:53,
>     192.168.57.3:53,192.168.58.3:53,192.168.59.4:53,192.168.60.4:53,
>     192.168.61.3:53,192.168.62.3:53,192.168.63.3:53,192.168.64.3:53,
>     192.168.65.3:53,192.168.66.3:53,192.168.67.3:53,192.168.68.4:53,
>     192.168.69.3:53,192.168.70.3:53,192.168.71.3:53,192.168.72.3:53,
>     192.168.73.3:53,192.168.74.4:53,192.168.75.4:53,192.168.76.3:53,
>     192.168.77.3:53,192.168.78.3:53,192.168.79.3:53,192.168.80.4:53,
>     192.168.81.3:53,192.168.82.4:53,192.168.83.4:53
>                                         tcp    10.96.0.10:53
>     192.168.104.3:53,192.168.105.3:53,192.168.106.3:53,192.168.107.3:53,
>     192.168.108.3:53,192.168.109.3:53,192.168.110.3:53,192.168.111.3:53,
>     192.168.112.3:53,192.168.113.3:53,192.168.114.3:53,192.168.115.3:53,
>     192.168.116.3:53,192.168.118.4:53,192.168.119.3:53,192.168.120.4:53,
>     192.168.121.3:53,192.168.122.3:53,192.168.123.3:53,192.168.130.3:53,
>     192.168.131.3:53,192.168.136.3:53,192.168.142.3:53,192.168.4.3:53,
>     192.168.45.3:53,192.168.46.3:53,192.168.47.3:53,192.168.48.3:53,
>     192.168.49.3:53,192.168.50.3:53,192.168.51.3:53,192.168.52.3:53,
>     192.168.53.3:53,192.168.54.3:53,192.168.55.3:53,192.168.56.3:53,
>     192.168.57.3:53,192.168.58.3:53,192.168.59.4:53,192.168.60.4:53,
>     192.168.61.3:53,192.168.62.3:53,192.168.63.3:53,192.168.64.3:53,
>     192.168.65.3:53,192.168.66.3:53,192.168.67.3:53,192.168.68.4:53,
>     192.168.69.3:53,192.168.70.3:53,192.168.71.3:53,192.168.72.3:53,
>     192.168.73.3:53,192.168.74.4:53,192.168.75.4:53,192.168.76.3:53,
>     192.168.77.3:53,192.168.78.3:53,192.168.79.3:53,192.168.80.4:53,
>     192.168.81.3:53,192.168.82.4:53,192.168.83.4:53
>                                         tcp    10.96.0.10:9153
>     192.168.104.3:9153,192.168.105.3:9153,192.168.106.3:9153,192.168.107.3:9153,
>     192.168.108.3:9153,192.168.109.3:9153,192.168.110.3:9153,192.168.111.3:9153,
>     192.168.112.3:9153,192.168.113.3:9153,192.168.114.3:9153,192.168.115.3:9153,
>     192.168.116.3:9153,192.168.118.4:9153,192.168.119.3:9153,192.168.120.4:9153,
>     192.168.121.3:9153,192.168.122.3:9153,192.168.123.3:9153,192.168.130.3:9153,
>     192.168.131.3:9153,192.168.136.3:9153,192.168.142.3:9153,192.168.4.3:9153,
>     192.168.45.3:9153,192.168.46.3:9153,192.168.47.3:9153,192.168.48.3:9153,
>     192.168.49.3:9153,192.168.50.3:9153,192.168.51.3:9153,192.168.52.3:9153,
>     192.168.53.3:9153,192.168.54.3:9153,192.168.55.3:9153,192.168.56.3:9153,
>     192.168.57.3:9153,192.168.58.3:9153,192.168.59.4:9153,192.168.60.4:9153,
>     192.168.61.3:9153,192.168.62.3:9153,192.168.63.3:9153,192.168.64.3:9153,
>     192.168.65.3:9153,192.168.66.3:9153,192.168.67.3:9153,192.168.68.4:9153,
>     192.168.69.3:9153,192.168.70.3:9153,192.168.71.3:9153,192.168.72.3:9153,
>     192.168.73.3:9153,192.168.74.4:9153,192.168.75.4:9153,192.168.76.3:9153,
>     192.168.77.3:9153,192.168.78.3:9153,192.168.79.3:9153,192.168.80.4:9153,
>     192.168.81.3:9153,192.168.82.4:9153,192.168.83.4:9153
>
>
> Thanks,
> Dumitru
>
> > Please let me know if you need more information.
> >
> >
> > Regards,
> > Winson
> >
> >
> > On Thu, Jul 16, 2020 at 11:23 AM Dumitru Ceara <dceara at redhat.com> wrote:
> >
> > On 7/15/20 8:02 PM, Winson Wang wrote:
> > > +add ovn-Kubernetes group.
> > >
> > > Hi Dumitru,
> > >
> > > With the recent patches from you and Han, basic k8s workloads such
> > > as node resources and pod resources are now fixed and look good.
> > > Much thanks!
> >
> > Hi Winson,
> >
> > Glad to hear that!
> >
> > >
> > > For k8s workloads, exposing a svc IP is very common, for example
> > > the CoreDNS deployment.
> > > With a large cluster size such as 1000 nodes, a service
> > > auto-scales the CoreDNS deployment; with the default of 16 nodes
> > > per CoreDNS replica, that comes to 63 CoreDNS pods (1006 / 16 ~ 63).
> > > On my 1006-node setup, the CoreDNS deployment scaled from 2 to 63.
> > > An SB raft election timer of 16s is not enough for this operation
> > > in my test environment: one raft node cannot finish the election
> > > within two election slots, all of its clients disconnect and
> > > reconnect to the two other raft nodes, and the raft clients end up
> > > in an unbalanced state after this operation.
> > > It would be good if this condition could be avoided without a
> > > larger election timer.
> > >
> > > On the SB and worker node resource side:
> > > SB DB size increased by 27MB.
> > > br-int open flow count increased by around 369K.
> > > RSS memory of (ovs + ovn-controller) increased by more than 600MB.
> >
> > This increase on the hypervisor side is most likely because of the
> > openflows for hairpin traffic for VIPs (service IPs). To confirm,
> > would it be possible to take a snapshot of the OVS flow table and
> > see how many flows there are per table?
> >
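> > Something along these lines (a sketch; adjust the bridge name if
> > your integration bridge isn't br-int) should give a per-table count:
> >
> >   ovs-ofctl dump-flows br-int | grep -o 'table=[0-9]*' \
> >       | sort -t= -k2 -n | uniq -c
> >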
> > >
> > > So if the OVN experts can figure out how to optimize this, it
> > > would be very helpful for scaling ovn-k8s up to large cluster
> > > sizes, I think.
> > >
> >
> > If the above is due to the LB flows that handle hairpin traffic,
> > the only idea I have is to use the OVS "learn" action to have the
> > flows generated as needed. However, I didn't get the chance to try
> > it out yet.
> >
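> > For illustration only (hypothetical table numbers and register bit,
> > not a tested implementation), the idea would be roughly: once a
> > packet has been DNATted to a backend, learn a narrow flow that flags
> > follow-up traffic whose source and destination are both that same
> > backend as hairpinned, instead of pre-installing one flow per
> > backend:
> >
> >   table=19, priority=100, ip
> >     actions=learn(table=19, priority=200, idle_timeout=60,
> >                   eth_type=0x800,
> >                   NXM_OF_IP_SRC[]=NXM_OF_IP_DST[],
> >                   NXM_OF_IP_DST[]=NXM_OF_IP_DST[],
> >                   load:1->NXM_NX_REG10[7]),
> >             resubmit(,20)
> >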
> > Thanks,
> > Dumitru
> >
> > >
> > > Regards,
> > > Winson
> > >
> > >
> > > On Fri, May 1, 2020 at 1:35 AM Dumitru Ceara <dceara at redhat.com> wrote:
> > >
> > > On 5/1/20 12:00 AM, Winson Wang wrote:
> > > > Hi Han, Dumitru,
> > > >
> > >
> > > Hi Winson,
> > >
> > > > With the fix from Dumitru
> > > > https://github.com/ovn-org/ovn/commit/97e82ae5f135a088c9e95b49122d8217718d23f4
> > > >
> > > > It greatly reduced the SB RAFT workload in my stress test mode
> > > > with k8s svcs that have large endpoint counts.
> > > >
> > > > The DB file size increased much less with the fix, so it does
> > > > not trigger the leader election with the same workload.
> > > >
> > > > Dumitru, based on my test, the logical flow count is fixed by
> > > > cluster size, regardless of the number of VIP endpoints.
> > >
> > > The number of logical flows will be fixed based on the number of
> > > VIPs (2 per VIP), but the size of the match expression depends on
> > > the number of backends per VIP, so the SB DB size will increase
> > > when adding backends to existing VIPs.
> > >
> > > >
> > > > But the open flow count on each node still scales with the
> > > > number of endpoints.
> > >
> > > Yes, this is due to the match expression in the logical flow
> > > above, which is of the form:
> > >
> > > (ip.src == backend-ip1 && ip.dst == backend-ip1) || .. ||
> > > (ip.src == backend-ipn && ip.dst == backend-ipn)
> > >
> > > This will get expanded to n openflow rules, one per backend, to
> > > determine if traffic was hairpinned.
> > >
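> > > (As a rough sketch, each expanded rule in the br-int dump would
> > > look something like "ip, nw_src=192.168.104.3,
> > > nw_dst=192.168.104.3" plus the datapath metadata, which is why
> > > table 19 above grows with the backend count.)
> > >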
> > > > Any idea how to reduce the open flow count on each node's
> > > > br-int?
> > > >
> > > >
> > >
> > > Unfortunately I don't think there's a way to determine if traffic
> > > was hairpinned, because I don't think we can have openflow rules
> > > that match on "ip.src == ip.dst". So in the worst case, we will
> > > probably need two openflow rules per backend IP (one for initiator
> > > traffic, one for reply).
> > >
> > > I'll think more about it though.
> > >
> > > Regards,
> > > Dumitru
> > >
> > > > Regards,
> > > > Winson
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Apr 29, 2020 at 1:42 PM Winson Wang <windson.wang at gmail.com> wrote:
> > > >
> > > > Hi Han,
> > > >
> > > > Thanks for quick reply.
> > > > Please see my reply below.
> > > >
> > > > On Wed, Apr 29, 2020 at 12:31 PM Han Zhou <hzhou at ovn.org> wrote:
> > > >
> > > >
> > > >
> > > > On Wed, Apr 29, 2020 at 10:29 AM Winson Wang <windson.wang at gmail.com> wrote:
> > > > >
> > > > > Hello Experts,
> > > > >
> > > > > I am doing stress testing on a k8s cluster with OVN. One thing
> > > > > I am seeing is that when the raft nodes get a large data
> > > > > update from ovn-northd in a short time, the 3 raft nodes
> > > > > trigger voting and the leader role switches from one node to
> > > > > another.
> > > > >
> > > > > On the ovn-northd side, I can see ovn-northd trigger the
> > > > > BACKOFF, RECONNECT...
> > > > >
> > > > > Since ovn-northd connects to the NB/SB leader only, how can we
> > > > > make ovn-northd more available most of the time?
> > > > >
> > > > > Is it possible to make ovn-northd keep established connections
> > > > > to all raft nodes, to avoid the reconnect mechanism?
> > > > > The backoff time of 8s is not configurable for now.
> > > > >
> > > > >
> > > > > Test logs:
> > > > >
> > > > > 2020-04-29T17:03:08.296Z|41861|ovsdb_idl|INFO|tcp:10.0.2.152:6642: clustered database server is not cluster leader; trying another server
> > > > > 2020-04-29T17:03:08.296Z|41862|reconnect|DBG|tcp:10.0.2.152:6642: entering RECONNECT
> > > > > 2020-04-29T17:03:08.304Z|41863|reconnect|DBG|tcp:10.0.2.152:6642: entering BACKOFF
> > > > > 2020-04-29T17:03:09.708Z|41867|coverage|INFO|Dropped 2 log messages in last 78 seconds (most recently, 71 seconds ago) due to excessive rate
> > > > > 2020-04-29T17:03:09.708Z|41868|coverage|INFO|Skipping details of duplicate event coverage for hash=ceada91f
> > > > > 2020-04-29T17:03:16.304Z|41869|reconnect|DBG|tcp:10.0.2.153:6642: entering CONNECTING
> > > > > 2020-04-29T17:03:16.308Z|41870|reconnect|INFO|tcp:10.0.2.153:6642: connected
> > > > > 2020-04-29T17:03:16.308Z|41871|reconnect|DBG|tcp:10.0.2.153:6642: entering ACTIVE
> > > > > 2020-04-29T17:03:16.308Z|41872|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> > > > > 2020-04-29T17:03:16.309Z|41873|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> > > > > 2020-04-29T17:03:16.311Z|41874|ovsdb_idl|INFO|tcp:10.0.2.153:6642: clustered database server is disconnected from cluster; trying another server
> > > > > 2020-04-29T17:03:16.311Z|41875|reconnect|DBG|tcp:10.0.2.153:6642: entering RECONNECT
> > > > > 2020-04-29T17:03:16.312Z|41876|reconnect|DBG|tcp:10.0.2.153:6642: entering BACKOFF
> > > > > 2020-04-29T17:03:24.316Z|41877|reconnect|DBG|tcp:10.0.2.151:6642: entering CONNECTING
> > > > > 2020-04-29T17:03:24.321Z|41878|reconnect|INFO|tcp:10.0.2.151:6642: connected
> > > > > 2020-04-29T17:03:24.321Z|41879|reconnect|DBG|tcp:10.0.2.151:6642: entering ACTIVE
> > > > > 2020-04-29T17:03:24.321Z|41880|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> > > > > 2020-04-29T17:03:24.354Z|41881|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> > > > > 2020-04-29T17:03:24.358Z|41882|ovsdb_idl|INFO|tcp:10.0.2.151:6642: clustered database server is not cluster leader; trying another server
> > > > > 2020-04-29T17:03:24.358Z|41883|reconnect|DBG|tcp:10.0.2.151:6642: entering RECONNECT
> > > > > 2020-04-29T17:03:24.360Z|41884|reconnect|DBG|tcp:10.0.2.151:6642: entering BACKOFF
> > > > > 2020-04-29T17:03:32.367Z|41885|reconnect|DBG|tcp:10.0.2.152:6642: entering CONNECTING
> > > > > 2020-04-29T17:03:32.372Z|41886|reconnect|INFO|tcp:10.0.2.152:6642: connected
> > > > > 2020-04-29T17:03:32.372Z|41887|reconnect|DBG|tcp:10.0.2.152:6642: entering ACTIVE
> > > > > 2020-04-29T17:03:32.372Z|41888|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> > > > > 2020-04-29T17:03:32.373Z|41889|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> > > > > 2020-04-29T17:03:32.376Z|41890|ovsdb_idl|INFO|tcp:10.0.2.152:6642: clustered database server is not cluster leader; trying another server
> > > > > 2020-04-29T17:03:32.376Z|41891|reconnect|DBG|tcp:10.0.2.152:6642: entering RECONNECT
> > > > > 2020-04-29T17:03:32.378Z|41892|reconnect|DBG|tcp:10.0.2.152:6642: entering BACKOFF
> > > > > 2020-04-29T17:03:40.381Z|41893|reconnect|DBG|tcp:10.0.2.153:6642: entering CONNECTING
> > > > > 2020-04-29T17:03:40.385Z|41894|reconnect|INFO|tcp:10.0.2.153:6642: connected
> > > > > 2020-04-29T17:03:40.385Z|41895|reconnect|DBG|tcp:10.0.2.153:6642: entering ACTIVE
> > > > > 2020-04-29T17:03:40.385Z|41896|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
> > > > > 2020-04-29T17:03:40.385Z|41897|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
> > > > >
> > > > >
> > > > > --
> > > > > Winson
> > > >
> > > > Hi Winson,
> > > >
> > > > Since northd heavily writes to the SB DB, it is implemented to
> > > > connect to the leader only, for better performance (avoiding the
> > > > extra cost of a follower forwarding writes to the leader). When
> > > > leader re-election happens, it has to reconnect to the new
> > > > leader. However, if the cluster is unstable, this step can also
> > > > take longer than expected. I'd suggest tuning the election timer
> > > > to avoid re-election during heavy operations.
> > > >
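> > > > (For reference, the election timer can be changed at runtime
> > > > with something like the following; the ctl socket path varies by
> > > > installation, and if I remember correctly each call can at most
> > > > roughly double the current value, so large increases take
> > > > several invocations:
> > > >
> > > >   ovs-appctl -t /var/run/ovn/ovnsb_db.ctl \
> > > >       cluster/change-election-timer OVN_Southbound 30000
> > > > )
> > > >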
> > > > I can see that setting the election timer to a higher value
> > > > avoids this, but if more stress is generated then I see it
> > > > happen again.
> > > > A real workload may not hit the spike stress I trigger in the
> > > > stress test, so this is just for scale profiling.
> > > >
> > > >
> > > >
> > > > If the server is overloaded for too long and a longer election
> > > > timer is unacceptable, the only way to solve the availability
> > > > problem is to improve ovsdb performance. How big is your
> > > > transaction and what's your election timer setting?
> > > >
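> > > > (cluster/status shows the current role and election timer on
> > > > each server, e.g.:
> > > >
> > > >   ovs-appctl -t /var/run/ovn/ovnsb_db.ctl \
> > > >       cluster/status OVN_Southbound
> > > > )
> > > >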
> > > > I can see ovn-northd send 33MB of data in a short time, and
> > > > ovsdb-server needs to sync that to its clients. Running iftop on
> > > > the ovn-controller side, each node receives around a 25MB
> > > > update.
> > > > With each ovn-controller getting 25MB of data, the 3 raft nodes
> > > > send 25MB * 646 ~ 16GB in total.
> > > >
> > > >
> > > > The number of clients also impacts the performance, since the
> > > > heavy update needs to be synced to all clients. How many clients
> > > > do you have?
> > > >
> > > > Is there a mechanism for all the ovn-controller clients to
> > > > connect to the raft followers only, skipping the leader?
> > > > That would leave the leader node more cpu resources for voting
> > > > and cluster-level sync.
> > > > Based on my stress test, after the ovn-controllers connected to
> > > > the 2 follower nodes, the leader node was only connected to
> > > > ovn-northd.
> > > > This model can finish raft voting in a shorter time when
> > > > ovn-northd triggers the same workload.
> > > >
> > > > The total client count is 646 nodes.
> > > > Before the leader role change, all clients were connected to the
> > > > 3 nodes in a balanced way; each raft node had 200+ connections.
> > > > After the leader role change, the ovn-controller side gets the
> > > > following messages:
> > > >
> > > > 2020-04-29T04:21:14.566Z|00674|ovsdb_idl|INFO|tcp:10.0.2.153:6642: clustered database server is disconnected from cluster; trying another server
> > > >
> > > > Node 10.0.2.153:
> > > >
> > > > SB role changed from follower to candidate at 21:21:06
> > > >
> > > > SB role changed from candidate to leader at 21:22:16
> > > >
> > > > netstat for port 6642 connections:
> > > >
> > > > 21:21:31 ESTABLISHED 202
> > > >
> > > > 21:21:31 Pending 0
> > > >
> > > > 21:21:41 ESTABLISHED 0
> > > >
> > > > 21:21:41 Pending 0
> > > >
> > > >
> > > > The above node was in the candidate role for more than 60s,
> > > > which is more than my election timer setting of 30s.
> > > >
> > > > All 202 connections of node 10.0.2.153 shifted to the other two
> > > > nodes in a short time. After that, only ovn-northd remained
> > > > connected to this node.
> > > >
> > > >
> > > > Node 10.0.2.151:
> > > >
> > > > SB role changed from leader to follower at 21:21:23
> > > >
> > > >
> > > > 21:21:35 ESTABLISHED 233
> > > >
> > > > 21:21:35 Pending 0
> > > >
> > > > 21:21:45 ESTABLISHED 282
> > > >
> > > > 21:21:45 Pending 9
> > > >
> > > > 21:21:55 ESTABLISHED 330
> > > >
> > > > 21:21:55 Pending 1
> > > >
> > > > 21:22:05 ESTABLISHED 330
> > > >
> > > > 21:22:05 Pending 1
> > > >
> > > >
> > > >
> > > > Node 10.0.2.152:
> > > >
> > > > SB role changed from follower to candidate at 21:21:57
> > > >
> > > > SB role changed from candidate to follower at 21:22:17
> > > >
> > > >
> > > > 21:21:35 ESTABLISHED 211
> > > >
> > > > 21:21:35 Pending 0
> > > >
> > > > 21:21:45 ESTABLISHED 263
> > > >
> > > > 21:21:45 Pending 5
> > > >
> > > > 21:21:55 ESTABLISHED 316
> > > >
> > > > 21:21:55 Pending 0
> > > >
> > > >
> > > >
> > > >
> > > > Thanks,
> > > > Han
> > > >
> > > >
> --
> Winson