[ovs-discuss] OVN Scale with RAFT: how to make ovn-northd more reliable when RAFT leader unstable

Dumitru Ceara dceara at redhat.com
Fri Aug 28 06:58:59 UTC 2020


On 8/28/20 1:01 AM, Winson Wang wrote:
> Hi Dumitru,
> 
> Have you tried the OVS "learn" action to see if it addresses this scale issue?
> 
> 
> Regards,
> Winson
> 
>  

Hi Winson,

Sorry, didn't get a chance to look at this yet. It's still on my todo-list.

Regards,
Dumitru

> 
> On Fri, Jul 17, 2020 at 8:53 AM Winson Wang <windson.wang at gmail.com> wrote:
> 
> 
> 
>     On Fri, Jul 17, 2020 at 12:54 AM Dumitru Ceara <dceara at redhat.com> wrote:
> 
>         On 7/17/20 2:58 AM, Winson Wang wrote:
>         > Hi Dumitru,
>         >
>         > most of the flows are in table 19.
> 
>         This is the ls_in_pre_hairpin table where we add flows for each
>         backend
>         of the load balancers.
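>
>         (As a cross-check, the logical flow count for that stage on the SB
>         side can be compared with the OpenFlow count using something along
>         these lines; "ls_in_pre_hairpin" is the stage name as printed by
>         lflow-list:
>
>         ovn-sbctl lflow-list | grep -c ls_in_pre_hairpin
>
>         The OpenFlow side will be much larger because each logical flow's
>         match is expanded per backend.)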
> 
>         >
>         > -rw-r--r-- 1 root root 142M Jul 16 17:07 br-int.txt (all flows dump file)
>         > -rw-r--r-- 1 root root 102M Jul 16 17:43 table-19.txt
>         > -rw-r--r-- 1 root root 7.8M Jul 16 17:43 table-11.txt
>         > -rw-r--r-- 1 root root 3.7M Jul 16 17:43 table-21.txt
>         >
>         > # cat table-19.txt |wc -l
>         > 408458
>         > # cat table-19.txt | grep "=9153" | wc -l
>         > 124744
>         >  cat table-19.txt | grep "=53" | wc -l
>         > 249488
>         > Coredns pod has svc with port number 53 and 9153.
>         >
> 
>         How many backends do you have for these VIPs (with port number
>         53 and
>         9153) in your load_balancer config?
> 
>     The backend number is 63: 63 CoreDNS pods are running, exposed via
>     Cluster IP 10.96.0.10 on tcp/53, udp/53 and tcp/9153.
>     lb-list | grep 10.96.0.10
>     3b8a468a-44d2-4a34-94ca-626dac936cde    udp    10.96.0.10:53
>         192.168.104.3:53,192.168.105.3:53,192.168.106.3:53,192.168.107.3:53,192.168.108.3:53,192.168.109.3:53,192.168.110.3:53,
>         192.168.111.3:53,192.168.112.3:53,192.168.113.3:53,192.168.114.3:53,192.168.115.3:53,192.168.116.3:53,192.168.118.4:53,
>         192.168.119.3:53,192.168.120.4:53,192.168.121.3:53,192.168.122.3:53,192.168.123.3:53,192.168.130.3:53,192.168.131.3:53,
>         192.168.136.3:53,192.168.142.3:53,192.168.4.3:53,192.168.45.3:53,192.168.46.3:53,192.168.47.3:53,192.168.48.3:53,
>         192.168.49.3:53,192.168.50.3:53,192.168.51.3:53,192.168.52.3:53,192.168.53.3:53,192.168.54.3:53,192.168.55.3:53,
>         192.168.56.3:53,192.168.57.3:53,192.168.58.3:53,192.168.59.4:53,192.168.60.4:53,192.168.61.3:53,192.168.62.3:53,
>         192.168.63.3:53,192.168.64.3:53,192.168.65.3:53,192.168.66.3:53,192.168.67.3:53,192.168.68.4:53,192.168.69.3:53,
>         192.168.70.3:53,192.168.71.3:53,192.168.72.3:53,192.168.73.3:53,192.168.74.4:53,192.168.75.4:53,192.168.76.3:53,
>         192.168.77.3:53,192.168.78.3:53,192.168.79.3:53,192.168.80.4:53,192.168.81.3:53,192.168.82.4:53,192.168.83.4:53
>                                             tcp    10.96.0.10:53
>         192.168.104.3:53,192.168.105.3:53,192.168.106.3:53,192.168.107.3:53,192.168.108.3:53,192.168.109.3:53,192.168.110.3:53,
>         192.168.111.3:53,192.168.112.3:53,192.168.113.3:53,192.168.114.3:53,192.168.115.3:53,192.168.116.3:53,192.168.118.4:53,
>         192.168.119.3:53,192.168.120.4:53,192.168.121.3:53,192.168.122.3:53,192.168.123.3:53,192.168.130.3:53,192.168.131.3:53,
>         192.168.136.3:53,192.168.142.3:53,192.168.4.3:53,192.168.45.3:53,192.168.46.3:53,192.168.47.3:53,192.168.48.3:53,
>         192.168.49.3:53,192.168.50.3:53,192.168.51.3:53,192.168.52.3:53,192.168.53.3:53,192.168.54.3:53,192.168.55.3:53,
>         192.168.56.3:53,192.168.57.3:53,192.168.58.3:53,192.168.59.4:53,192.168.60.4:53,192.168.61.3:53,192.168.62.3:53,
>         192.168.63.3:53,192.168.64.3:53,192.168.65.3:53,192.168.66.3:53,192.168.67.3:53,192.168.68.4:53,192.168.69.3:53,
>         192.168.70.3:53,192.168.71.3:53,192.168.72.3:53,192.168.73.3:53,192.168.74.4:53,192.168.75.4:53,192.168.76.3:53,
>         192.168.77.3:53,192.168.78.3:53,192.168.79.3:53,192.168.80.4:53,192.168.81.3:53,192.168.82.4:53,192.168.83.4:53
>                                             tcp    10.96.0.10:9153
>         192.168.104.3:9153,192.168.105.3:9153,192.168.106.3:9153,192.168.107.3:9153,192.168.108.3:9153,192.168.109.3:9153,192.168.110.3:9153,
>         192.168.111.3:9153,192.168.112.3:9153,192.168.113.3:9153,192.168.114.3:9153,192.168.115.3:9153,192.168.116.3:9153,192.168.118.4:9153,
>         192.168.119.3:9153,192.168.120.4:9153,192.168.121.3:9153,192.168.122.3:9153,192.168.123.3:9153,192.168.130.3:9153,192.168.131.3:9153,
>         192.168.136.3:9153,192.168.142.3:9153,192.168.4.3:9153,192.168.45.3:9153,192.168.46.3:9153,192.168.47.3:9153,192.168.48.3:9153,
>         192.168.49.3:9153,192.168.50.3:9153,192.168.51.3:9153,192.168.52.3:9153,192.168.53.3:9153,192.168.54.3:9153,192.168.55.3:9153,
>         192.168.56.3:9153,192.168.57.3:9153,192.168.58.3:9153,192.168.59.4:9153,192.168.60.4:9153,192.168.61.3:9153,192.168.62.3:9153,
>         192.168.63.3:9153,192.168.64.3:9153,192.168.65.3:9153,192.168.66.3:9153,192.168.67.3:9153,192.168.68.4:9153,192.168.69.3:9153,
>         192.168.70.3:9153,192.168.71.3:9153,192.168.72.3:9153,192.168.73.3:9153,192.168.74.4:9153,192.168.75.4:9153,192.168.76.3:9153,
>         192.168.77.3:9153,192.168.78.3:9153,192.168.79.3:9153,192.168.80.4:9153,192.168.81.3:9153,192.168.82.4:9153,192.168.83.4:9153
> 
> 
>         Thanks,
>         Dumitru
> 
>         > Please let me know if you need more information.
>         >
>         >
>         > Regards,
>         > Winson
>         >
>         >
>         > On Thu, Jul 16, 2020 at 11:23 AM Dumitru Ceara <dceara at redhat.com> wrote:
>         >
>         >     On 7/15/20 8:02 PM, Winson Wang wrote:
>         >     > +add ovn-Kubernetes group.
>         >     >
>         >     > Hi Dumitru,
>         >     >
>         >     > With the recent patches from you and Han, the node and pod
>         >     > resource usage for a basic k8s workload is now stable and
>         >     > looks good.
>         >     > Much thanks!
>         >
>         >     Hi Winson,
>         >
>         >     Glad to hear that!
>         >
>         >     >
>         >     > A k8s workload exposed through a service IP is very common,
>         >     > for example the CoreDNS deployment. With a large cluster size
>         >     > such as 1000 nodes there is a service that auto-scales the
>         >     > CoreDNS deployment; with the default of one CoreDNS pod per
>         >     > 16 nodes, that comes to 63 CoreDNS pods.
>         >     > On my 1006-node setup the CoreDNS deployment scaled from 2 to
>         >     > 63 pods.
>         >     > The 16s SB RAFT election timer is not enough for this
>         >     > operation in my test environment: one RAFT node cannot finish
>         >     > the election within two election slots, so all of its clients
>         >     > disconnect and reconnect to the two other RAFT nodes, leaving
>         >     > the RAFT clients in an unbalanced state after the operation.
>         >     > Ideally this condition could be avoided without a larger
>         >     > election timer.
>         >     >
>         >     > On the SB and worker node resource side:
>         >     > SB DB size increased by 27MB.
>         >     > br-int OpenFlow count increased by around 369K.
>         >     > RSS memory of (ovs + ovn-controller) increased by more than 600MB.
>         >
>         >     This increase on the hypervisor side is most likely because of
>         >     the OpenFlow rules for hairpin traffic for VIPs (service IPs).
>         >     To confirm, would it be possible to take a snapshot of the OVS
>         >     flow table and see how many flows there are per table?
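>         >
>         >     For example, something along these lines gives a rough
>         >     per-table breakdown from a saved dump (the grep also picks up
>         >     "table=" references inside ct()/learn() actions, so treat the
>         >     counts as approximate):
>         >
>         >     ovs-ofctl dump-flows br-int > br-int.txt
>         >     grep -oE 'table=[0-9]+,' br-int.txt | sort | uniq -c | sort -rn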
>         >
>         >     >
>         >     > So if the OVN experts can figure out how to optimize this, I
>         >     > think it would be very helpful for scaling ovn-k8s up to
>         >     > large cluster sizes.
>         >     >
>         >
>         >     If the above is due to the LB flows that handle hairpin
>         >     traffic, the only idea I have is to use the OVS "learn" action
>         >     to have the flows generated as needed. However, I didn't get
>         >     the chance to try it out yet.
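>         >
>         >     Just to sketch the shape of that idea (not the actual
>         >     implementation; the table number, timeout and register bit
>         >     below are made up): instead of pre-installing one rule per
>         >     backend, the flow that handles a newly load-balanced session,
>         >     where ip.dst is already the chosen backend, could carry a
>         >     learn() action that installs the hairpin-check rule on demand,
>         >     e.g.:
>         >
>         >     learn(table=69, idle_timeout=60, eth_type=0x800,
>         >           NXM_OF_IP_SRC[]=NXM_OF_IP_DST[], NXM_OF_IP_DST[],
>         >           load:1->NXM_NX_REG10[7])
>         >
>         >     i.e. the learned rule matches packets whose source and
>         >     destination both equal that backend IP and marks them as
>         >     hairpinned in a register bit, so only backends that are
>         >     actually in use generate flows.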
>         >
>         >     Thanks,
>         >     Dumitru
>         >
>         >     >
>         >     > Regards,
>         >     > Winson
>         >     >
>         >     >
>         >     > On Fri, May 1, 2020 at 1:35 AM Dumitru Ceara <dceara at redhat.com> wrote:
>         >     >
>         >     >     On 5/1/20 12:00 AM, Winson Wang wrote:
>         >     >     > Hi Han,  Dumitru,
>         >     >     >
>         >     >
>         >     >     Hi Winson,
>         >     >
>         >     >     > With the fix from Dumitru,
>         >     >     > https://github.com/ovn-org/ovn/commit/97e82ae5f135a088c9e95b49122d8217718d23f4,
>         >     >     > the SB RAFT workload is greatly reduced in my stress
>         >     >     > test mode with k8s services with large numbers of
>         >     >     > endpoints.
>         >     >     >
>         >     >     > The DB file size increases much less with the fix, so
>         >     >     > the same workload no longer triggers a leader election.
>         >     >     >
>         >     >     > Dumitru, based on my test, the number of logical flows
>         >     >     > is fixed for a given cluster size regardless of the
>         >     >     > number of VIP endpoints.
>         >     >
>         >     >     The number of logical flows will be fixed based on the
>         >     >     number of VIPs (2 per VIP) but the size of the match
>         >     >     expression depends on the number of backends per VIP, so
>         >     >     the SB DB size will increase when adding backends to
>         >     >     existing VIPs.
>         >     >
>         >     >     >
>         >     >     > But the OpenFlow count on each node still scales with
>         >     >     > the number of endpoints.
>         >     >
>         >     >     Yes, this is due to the match expression in the logical
>         >     >     flow above, which is of the form:
>         >     >
>         >     >     (ip.src == backend-ip1 && ip.dst == backend-ip1) || .. || (ip.src == backend-ipn && ip.dst == backend-ipn)
>         >     >
>         >     >     This will get expanded to n openflow rules, one per
>         >     >     backend, to determine if traffic was hairpinned.
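>         >     >
>         >     >     For illustration, with three hypothetical backends the
>         >     >     single logical flow match
>         >     >
>         >     >     (ip.src == 10.1.0.1 && ip.dst == 10.1.0.1) || (ip.src == 10.1.0.2 && ip.dst == 10.1.0.2) || (ip.src == 10.1.0.3 && ip.dst == 10.1.0.3)
>         >     >
>         >     >     ends up as three separate OpenFlow rules in that table,
>         >     >     roughly of the form:
>         >     >
>         >     >     ip,nw_src=10.1.0.1,nw_dst=10.1.0.1 actions=...
>         >     >     ip,nw_src=10.1.0.2,nw_dst=10.1.0.2 actions=...
>         >     >     ip,nw_src=10.1.0.3,nw_dst=10.1.0.3 actions=...
>         >     >
>         >     >     so 63 backends per VIP multiply out quickly across VIPs
>         >     >     and datapaths.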
>         >     >
>         >     >     > Any idea how to reduce the OpenFlow count on each
>         >     >     > node's br-int?
>         >     >     >
>         >     >     >
>         >     >
>         >     >     Unfortunately I don't think there's a way to determine if
>         >     >     traffic was hairpinned because I don't think we can have
>         >     >     openflow rules that match on "ip.src == ip.dst". So in the
>         >     >     worst case, we will probably need two openflow rules per
>         >     >     backend IP (one for initiator traffic, one for reply).
>         >     >
>         >     >     I'll think more about it though.
>         >     >
>         >     >     Regards,
>         >     >     Dumitru
>         >     >
>         >     >     > Regards,
>         >     >     > Winson
>         >     >     >
>         >     >     >
>         >     >     >
>         >     >     >
>         >     >     >
>         >     >     >
>         >     >     >
>         >     >     > On Wed, Apr 29, 2020 at 1:42 PM Winson Wang <windson.wang at gmail.com> wrote:
>         >     >     >
>         >     >     >     Hi Han,
>         >     >     >
>         >     >     >     Thanks for quick reply.
>         >     >     >     Please see my reply below.
>         >     >     >
>         >     >     >     On Wed, Apr 29, 2020 at 12:31 PM Han Zhou <hzhou at ovn.org> wrote:
>         >     >     >
>         >     >     >
>         >     >     >
>         >     >     >         On Wed, Apr 29, 2020 at 10:29 AM Winson Wang <windson.wang at gmail.com> wrote:
>         >     >     >         >
>         >     >     >         > Hello Experts,
>         >     >     >         >
>         >     >     >         > I am doing stress testing of a k8s cluster
>         >     >     >         > with OVN. One thing I am seeing is that when
>         >     >     >         > the RAFT nodes get a large data update from
>         >     >     >         > ovn-northd in a short time, the 3 RAFT nodes
>         >     >     >         > trigger voting and the leader role switches
>         >     >     >         > from one node to another.
>         >     >     >         >
>         >     >     >         > On the ovn-northd side, I can see ovn-northd
>         >     >     >         > go through BACKOFF, RECONNECT...
>         >     >     >         >
>         >     >     >         > Since ovn-northd connects to the NB/SB leader
>         >     >     >         > only, how can we make ovn-northd more
>         >     >     >         > available most of the time?
>         >     >     >         >
>         >     >     >         > Is it possible to make ovn-northd keep
>         >     >     >         > established connections to all RAFT nodes to
>         >     >     >         > avoid the reconnect mechanism? The backoff
>         >     >     >         > time of 8s is not configurable for now.
>         >     >     >         >
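>         >     >     >         > (For context, ovn-northd here is pointed at
>         >     >     >         > all three SB servers, which is why the log
>         >     >     >         > below cycles through them, i.e. roughly:
>         >     >     >         >
>         >     >     >         > ovn-northd --ovnsb-db=tcp:10.0.2.151:6642,tcp:10.0.2.152:6642,tcp:10.0.2.153:6642 ...
>         >     >     >         >
>         >     >     >         > but it keeps only one active connection at a
>         >     >     >         > time and waits out the backoff before trying
>         >     >     >         > the next server when the current one is not
>         >     >     >         > the leader.)
>         >     >     >         >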
>         >     >     >         >
>         >     >     >         > Test logs:
>         >     >     >         >
>         >     >     >         > 2020-04-29T17:03:08.296Z|41861|ovsdb_idl|INFO|tcp:10.0.2.152:6642: clustered database server is not cluster leader; trying another server
>         >     >     >         > 2020-04-29T17:03:08.296Z|41862|reconnect|DBG|tcp:10.0.2.152:6642: entering RECONNECT
>         >     >     >         > 2020-04-29T17:03:08.304Z|41863|reconnect|DBG|tcp:10.0.2.152:6642: entering BACKOFF
>         >     >     >         > 2020-04-29T17:03:09.708Z|41867|coverage|INFO|Dropped 2 log messages in last 78 seconds (most recently, 71 seconds ago) due to excessive rate
>         >     >     >         > 2020-04-29T17:03:09.708Z|41868|coverage|INFO|Skipping details of duplicate event coverage for hash=ceada91f
>         >     >     >         > 2020-04-29T17:03:16.304Z|41869|reconnect|DBG|tcp:10.0.2.153:6642: entering CONNECTING
>         >     >     >         > 2020-04-29T17:03:16.308Z|41870|reconnect|INFO|tcp:10.0.2.153:6642: connected
>         >     >     >         > 2020-04-29T17:03:16.308Z|41871|reconnect|DBG|tcp:10.0.2.153:6642: entering ACTIVE
>         >     >     >         > 2020-04-29T17:03:16.308Z|41872|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
>         >     >     >         > 2020-04-29T17:03:16.309Z|41873|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
>         >     >     >         > 2020-04-29T17:03:16.311Z|41874|ovsdb_idl|INFO|tcp:10.0.2.153:6642: clustered database server is disconnected from cluster; trying another server
>         >     >     >         > 2020-04-29T17:03:16.311Z|41875|reconnect|DBG|tcp:10.0.2.153:6642: entering RECONNECT
>         >     >     >         > 2020-04-29T17:03:16.312Z|41876|reconnect|DBG|tcp:10.0.2.153:6642: entering BACKOFF
>         >     >     >         > 2020-04-29T17:03:24.316Z|41877|reconnect|DBG|tcp:10.0.2.151:6642: entering CONNECTING
>         >     >     >         > 2020-04-29T17:03:24.321Z|41878|reconnect|INFO|tcp:10.0.2.151:6642: connected
>         >     >     >         > 2020-04-29T17:03:24.321Z|41879|reconnect|DBG|tcp:10.0.2.151:6642: entering ACTIVE
>         >     >     >         > 2020-04-29T17:03:24.321Z|41880|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
>         >     >     >         > 2020-04-29T17:03:24.354Z|41881|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
>         >     >     >         > 2020-04-29T17:03:24.358Z|41882|ovsdb_idl|INFO|tcp:10.0.2.151:6642: clustered database server is not cluster leader; trying another server
>         >     >     >         > 2020-04-29T17:03:24.358Z|41883|reconnect|DBG|tcp:10.0.2.151:6642: entering RECONNECT
>         >     >     >         > 2020-04-29T17:03:24.360Z|41884|reconnect|DBG|tcp:10.0.2.151:6642: entering BACKOFF
>         >     >     >         > 2020-04-29T17:03:32.367Z|41885|reconnect|DBG|tcp:10.0.2.152:6642: entering CONNECTING
>         >     >     >         > 2020-04-29T17:03:32.372Z|41886|reconnect|INFO|tcp:10.0.2.152:6642: connected
>         >     >     >         > 2020-04-29T17:03:32.372Z|41887|reconnect|DBG|tcp:10.0.2.152:6642: entering ACTIVE
>         >     >     >         > 2020-04-29T17:03:32.372Z|41888|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
>         >     >     >         > 2020-04-29T17:03:32.373Z|41889|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
>         >     >     >         > 2020-04-29T17:03:32.376Z|41890|ovsdb_idl|INFO|tcp:10.0.2.152:6642: clustered database server is not cluster leader; trying another server
>         >     >     >         > 2020-04-29T17:03:32.376Z|41891|reconnect|DBG|tcp:10.0.2.152:6642: entering RECONNECT
>         >     >     >         > 2020-04-29T17:03:32.378Z|41892|reconnect|DBG|tcp:10.0.2.152:6642: entering BACKOFF
>         >     >     >         > 2020-04-29T17:03:40.381Z|41893|reconnect|DBG|tcp:10.0.2.153:6642: entering CONNECTING
>         >     >     >         > 2020-04-29T17:03:40.385Z|41894|reconnect|INFO|tcp:10.0.2.153:6642: connected
>         >     >     >         > 2020-04-29T17:03:40.385Z|41895|reconnect|DBG|tcp:10.0.2.153:6642: entering ACTIVE
>         >     >     >         > 2020-04-29T17:03:40.385Z|41896|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
>         >     >     >         > 2020-04-29T17:03:40.385Z|41897|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
>         >     >     >         >
>         >     >     >         >
>         >     >     >         > --
>         >     >     >         > Winson
>         >     >     >
>         >     >     >         Hi Winson,
>         >     >     >
>         >     >     >         Since northd heavily writes to the SB DB, it is
>         >     >     >         implemented to connect to the leader only, for
>         >     >     >         better performance (avoiding the extra cost of a
>         >     >     >         follower forwarding writes to the leader). When
>         >     >     >         leader re-election happens, it has to reconnect
>         >     >     >         to the new leader. However, if the cluster is
>         >     >     >         unstable, this step can also take longer than
>         >     >     >         expected. I'd suggest tuning the election timer
>         >     >     >         to avoid re-elections during heavy operations.
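>         >     >     >
>         >     >     >         For example, something like this on the current
>         >     >     >         SB leader bumps the timer (the value is in
>         >     >     >         milliseconds, the ctl socket path depends on the
>         >     >     >         packaging, and as far as I recall each call can
>         >     >     >         at most double the current value, so large
>         >     >     >         increases take a few steps):
>         >     >     >
>         >     >     >         ovs-appctl -t /var/run/ovn/ovnsb_db.ctl \
>         >     >     >             cluster/change-election-timer OVN_Southbound 16000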
>         >     >     >
>         >     >     >     I can see that a higher election timer value avoids
>         >     >     >     this, but if more stress is generated I see it
>         >     >     >     happen again.
>         >     >     >     A real workload may not hit the stress spike I
>         >     >     >     trigger in the stress test, so this is just for
>         >     >     >     scale profiling.
>         >     >     >      
>         >     >     >
>         >     >     >
>         >     >     >         If the server is overloaded for too long and a
>         >     >     >         longer election timer is unacceptable, the only
>         >     >     >         way to solve the availability problem is to
>         >     >     >         improve ovsdb performance. How big is your
>         >     >     >         transaction and what's your election timer
>         >     >     >         setting?
>         >     >     >
>         >     >     >     I can see ovn-northd send 33MB of data in a short
>         >     >     >     time, and ovsdb-server needs to sync that with its
>         >     >     >     clients. Running iftop on the ovn-controller side,
>         >     >     >     each node receives around a 25MB update.
>         >     >     >     With each ovn-controller getting 25MB, the 3 RAFT
>         >     >     >     nodes send about 25MB * 646 clients ~= 16GB in total.
>         >     >     >      
>         >     >     >
>         >     >     >         The number of clients also impacts the
>         >     >     >         performance since the heavy update needs to be
>         >     >     >         synced to all clients. How many clients do you
>         >     >     >         have?
>         >     >     >
>         >     >     >     Is there a mechanism for all the ovn-controller
>         >     >     >     clients to connect to the RAFT followers only and
>         >     >     >     skip the leader?
>         >     >     >     That would leave the leader node more CPU resources
>         >     >     >     for voting and cluster-level sync.
>         >     >     >     Based on my stress test, after the ovn-controllers
>         >     >     >     connected to the 2 follower nodes, only ovn-northd
>         >     >     >     connected to the leader node, and RAFT voting could
>         >     >     >     finish in a shorter time when ovn-northd triggered
>         >     >     >     the same workload.
>         >     >     >
>         >     >     >     The total number of clients is 646 nodes.
>         >     >     >     Before the leader role change, all clients were
>         >     >     >     connected to the 3 nodes in a balanced way, each
>         >     >     >     RAFT node having 200+ connections.
>         >     >     >     After the leader role change, the ovn-controller
>         >     >     >     side got the following message:
>         >     >     >     2020-04-29T04:21:14.566Z|00674|ovsdb_idl|INFO|tcp:10.0.2.153:6642: clustered database server is disconnected from cluster; trying another server
>         >     >     >
>         >     >     >     Node 10.0.2.153:
>         >     >     >
>         >     >     >     SB role changed from follower to candidate on 21:21:06
>         >     >     >
>         >     >     >     SB role changed from candidate to leader on 21:22:16
>         >     >     >
>         >     >     >     netstat for 6642 port connections:
>         >     >     >
>         >     >     >     21:21:31 ESTABLISHED 202
>         >     >     >
>         >     >     >     21:21:31 Pending 0
>         >     >     >
>         >     >     >     21:21:41 ESTABLISHED 0
>         >     >     >
>         >     >     >     21:21:41 Pending 0
>         >     >     >
>         >     >     >
>         >     >     >     The above node was in the candidate role for more
>         >     >     >     than 60s, which is more than my election timer
>         >     >     >     setting of 30s.
>         >     >     >
>         >     >     >     All 202 connections of node 10.0.2.153 shifted to
>         >     >     >     the other two nodes in a short time. After that only
>         >     >     >     ovn-northd connected to this node.
>         >     >     >
>         >     >     >
>         >     >     >     Node 10.0.2.151:
>         >     >     >
>         >     >     >     SB role changed from leader to follower on 21:21:23
>         >     >     >
>         >     >     >
>         >     >     >     21:21:35 ESTABLISHED 233
>         >     >     >
>         >     >     >     21:21:35 Pending 0
>         >     >     >
>         >     >     >     21:21:45 ESTABLISHED 282
>         >     >     >
>         >     >     >     21:21:45 Pending 9
>         >     >     >
>         >     >     >     21:21:55 ESTABLISHED 330
>         >     >     >
>         >     >     >     21:21:55 Pending 1
>         >     >     >
>         >     >     >     21:22:05 ESTABLISHED 330
>         >     >     >
>         >     >     >     21:22:05 Pending 1
>         >     >     >
>         >     >     >
>         >     >     >
>         >     >     >     Node 10.0.2.152:
>         >     >     >
>         >     >     >     SB role changed from follower to candidate on 21:21:57
>         >     >     >
>         >     >     >     SB role changed from candidate to follower on 21:22:17
>         >     >     >
>         >     >     >
>         >     >     >     21:21:35 ESTABLISHED 211
>         >     >     >
>         >     >     >     21:21:35 Pending 0
>         >     >     >
>         >     >     >     21:21:45 ESTABLISHED 263
>         >     >     >
>         >     >     >     21:21:45 Pending 5
>         >     >     >
>         >     >     >     21:21:55 ESTABLISHED 316
>         >     >     >
>         >     >     >     21:21:55 Pending 0
>         >     >     >
>         >     >     >
>         >     >     >
>         >     >     >
>         >     >     >         Thanks,
>         >     >     >         Han
>         >     >     >
>         >     >     >
>         >     >     >
>         >     >     >     --
>         >     >     >     Winson
>         >     >     >
>         >     >     >
>         >     >     >
>         >     >     > --
>         >     >     > Winson
>         >     >
>         >     >
>         >     >
>         >     > --
>         >     > Winson
>         >
>         >
>         >
>         > --
>         > Winson
> 
> 
> 
>     -- 
>     Winson
> 
> 
> 
> -- 
> Winson


