[ovs-dev] [ovs-discuss] [OVN] ovn-northd takes much CPU when no configuration update

Numan Siddique numans at ovn.org
Tue Aug 4 07:38:03 UTC 2020


On Tue, Aug 4, 2020 at 9:02 AM Tony Liu <tonyliu0592 at hotmail.com> wrote:

> The probe awakes recomputing?
> There is probe every 5 seconds. Without any connection up/down or failover,
> ovn-northd will recompute everything every 5 seconds, no matter what?
> Really?
>
> Anyways, I will increase the probe interval for now, see if that helps.
>

I think we should optimise this case. I am planning to look into this.

Thanks
Numan


>
>
> Thanks!
>
> Tony
>
> > -----Original Message-----
> > From: Han Zhou <hzhou at ovn.org>
> > Sent: Monday, August 3, 2020 8:22 PM
> > To: Tony Liu <tonyliu0592 at hotmail.com>
> > Cc: Han Zhou <hzhou at ovn.org>; ovs-discuss <ovs-discuss at openvswitch.org>;
> > ovs-dev <ovs-dev at openvswitch.org>
> > Subject: Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no
> > configuration update
> >
> > Sorry that I didn't make it clear enough. The OVSDB probe itself doesn't
> > take much CPU, but the probe awakes ovn-northd main loop, which
> > recompute everything, which is why you see CPU spike.
> > It will be solved by incremental-processing, when only delta is
> > processed, and in case of probe handling, there is no change in
> > configuration, so the delta is zero.
> > For now, please follow the steps to adjust probe interval, if the CPU of
> > ovn-northd (when there is no configuration change) is a concern for you.
> > But please remember that this has no impact to the real CPU usage for
> > handling configuration changes.
> >
> >
> > Thanks,
> > Han
> >
> >
> > On Mon, Aug 3, 2020 at 8:11 PM Tony Liu <tonyliu0592 at hotmail.com
> > <mailto:tonyliu0592 at hotmail.com> > wrote:
> >
> >
> >       Health check (5 sec internal) taking 30%-100% CPU is definitely not
> > acceptable,
> >       if that's really the case. There must be some blocking (and not
> > yielding CPU)
> >       in coding, which is not supposed to be there.
> >
> >       Could you point me to the coding for such health check?
> >       Is it single thread? Does it use any event library?
> >
> >
> >       Thanks!
> >
> >       Tony
> >
> >       > -----Original Message-----
> >       > From: Han Zhou <hzhou at ovn.org <mailto:hzhou at ovn.org> >
> >       > Sent: Saturday, August 1, 2020 9:11 PM
> >       > To: Tony Liu <tonyliu0592 at hotmail.com
> > <mailto:tonyliu0592 at hotmail.com> >
> >       > Cc: ovs-discuss <ovs-discuss at openvswitch.org <mailto:ovs-
> > discuss at openvswitch.org> >; ovs-dev <ovs-
> >       > dev at openvswitch.org <mailto:dev at openvswitch.org> >
> >       > Subject: Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when
> > no
> >       > configuration update
> >       >
> >       >
> >       >
> >       > On Fri, Jul 31, 2020 at 4:14 PM Tony Liu <
> tonyliu0592 at hotmail.com
> > <mailto:tonyliu0592 at hotmail.com>
> >       > <mailto:tonyliu0592 at hotmail.com
> > <mailto:tonyliu0592 at hotmail.com> > > wrote:
> >       >
> >       >
> >       >       Hi,
> >       >
> >       >       I see the active ovn-northd takes much CPU (30% - 100%)
> > when there
> >       > is no
> >       >       configuration from OpenStack, nothing happening on all
> > chassis
> >       > nodes either.
> >       >
> >       >       Is this expected? What is it busy with?
> >       >
> >       >
> >       >
> >       >
> >       > Yes, this is expected. It is due to the OVSDB probe between ovn-
> > northd
> >       > and NB/SB OVSDB servers, which is used to detect the OVSDB
> > connection
> >       > failure.
> >       > Usually this is not a concern (unlike the probe with a large
> > number of
> >       > ovn-controller clients), because ovn-northd is a centralized
> > component
> >       > and the CPU cost when there is no configuration change doesn't
> > matter
> >       > that much. However, if it is a concern, the probe interval
> > (default 5
> >       > sec) can be changed.
> >       > If you change, remember to change on both server side and client
> > side.
> >       > For client side (ovn-northd), it is configured in the NB DB's
> > NB_Global
> >       > table's options:northd_probe_interval. See man page of ovn-nb(5).
> >       > For server side (NB and SB), it is configured in the NB and SB
> > DB's
> >       > Connection table's inactivity_probe column.
> >       >
> >       > Thanks,
> >       > Han
> >       >
> >       >
> >       >
> >       >       ========
> >       >       2020-07-31T23:08:09.511Z|04267|poll_loop|DBG|wakeup due to
> > [POLLIN]
> >       > on fd 8 (10.6.20.84:44358 <http://10.6.20.84:44358>
> > <http://10.6.20.84:44358> <->10.6.20.84:6641 <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> ) at lib/stream-fd.c:157 (68% CPU
> usage)
> >       >       2020-07-
> > 31T23:08:09.512Z|04268|jsonrpc|DBG|tcp:10.6.20.84:6641
> > <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> : received request, method="echo",
> > params=[],
> >       > id="echo"
> >       >       2020-07-
> > 31T23:08:09.512Z|04269|jsonrpc|DBG|tcp:10.6.20.84:6641
> > <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> : send reply, result=[], id="echo"
> >       >       2020-07-31T23:08:12.777Z|04270|poll_loop|DBG|wakeup due to
> > [POLLIN]
> >       > on fd 9 (10.6.20.84:49158 <http://10.6.20.84:49158>
> > <http://10.6.20.84:49158> <->10.6.20.85:6642 <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> ) at lib/stream-fd.c:157 (34% CPU
> usage)
> >       >       2020-07-
> > 31T23:08:12.777Z|04271|reconnect|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : idle 5002 ms, sending inactivity
> probe
> >       >       2020-07-
> > 31T23:08:12.777Z|04272|reconnect|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : entering IDLE
> >       >       2020-07-
> > 31T23:08:12.777Z|04273|jsonrpc|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : send request, method="echo",
> params=[],
> >       > id="echo"
> >       >       2020-07-
> > 31T23:08:12.777Z|04274|jsonrpc|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : received request, method="echo",
> > params=[],
> >       > id="echo"
> >       >       2020-07-
> > 31T23:08:12.777Z|04275|reconnect|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : entering ACTIVE
> >       >       2020-07-
> > 31T23:08:12.777Z|04276|jsonrpc|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : send reply, result=[], id="echo"
> >       >       2020-07-31T23:08:13.635Z|04277|poll_loop|DBG|wakeup due to
> > [POLLIN]
> >       > on fd 9 (10.6.20.84:49158 <http://10.6.20.84:49158>
> > <http://10.6.20.84:49158> <->10.6.20.85:6642 <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> ) at lib/stream-fd.c:157 (34% CPU
> usage)
> >       >       2020-07-
> > 31T23:08:13.635Z|04278|jsonrpc|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : received reply, result=[], id="echo"
> >       >       2020-07-31T23:08:14.480Z|04279|hmap|DBG|Dropped 129 log
> > messages in
> >       > last 5 seconds (most recently, 0 seconds ago) due to excessive
> > rate
> >       >       2020-07-31T23:08:14.480Z|04280|hmap|DBG|lib/shash.c:112: 2
> > buckets
> >       > with 6+ nodes, including 2 buckets with 6 nodes (32 nodes total
> > across
> >       > 32 buckets)
> >       >       2020-07-31T23:08:14.513Z|04281|poll_loop|DBG|wakeup due to
> > 27-ms
> >       > timeout at lib/reconnect.c:643 (34% CPU usage)
> >       >       2020-07-
> > 31T23:08:14.513Z|04282|reconnect|DBG|tcp:10.6.20.84:6641
> > <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> : idle 5001 ms, sending inactivity
> probe
> >       >       2020-07-
> > 31T23:08:14.513Z|04283|reconnect|DBG|tcp:10.6.20.84:6641
> > <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> : entering IDLE
> >       >       2020-07-
> > 31T23:08:14.513Z|04284|jsonrpc|DBG|tcp:10.6.20.84:6641
> > <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> : send request, method="echo",
> params=[],
> >       > id="echo"
> >       >       2020-07-31T23:08:15.370Z|04285|poll_loop|DBG|wakeup due to
> > [POLLIN]
> >       > on fd 8 (10.6.20.84:44358 <http://10.6.20.84:44358>
> > <http://10.6.20.84:44358> <->10.6.20.84:6641 <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> ) at lib/stream-fd.c:157 (34% CPU
> usage)
> >       >       2020-07-
> > 31T23:08:15.370Z|04286|jsonrpc|DBG|tcp:10.6.20.84:6641
> > <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> : received request, method="echo",
> > params=[],
> >       > id="echo"
> >       >       2020-07-
> > 31T23:08:15.370Z|04287|reconnect|DBG|tcp:10.6.20.84:6641
> > <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> : entering ACTIVE
> >       >       2020-07-
> > 31T23:08:15.370Z|04288|jsonrpc|DBG|tcp:10.6.20.84:6641
> > <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> : send reply, result=[], id="echo"
> >       >       2020-07-31T23:08:16.236Z|04289|poll_loop|DBG|wakeup due to
> > 0-ms
> >       > timeout at tcp:10.6.20.84:6641 <http://10.6.20.84:6641>
> > <http://10.6.20.84:6641>  (100% CPU usage)
> >       >       2020-07-
> > 31T23:08:16.236Z|04290|jsonrpc|DBG|tcp:10.6.20.84:6641
> > <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> : received reply, result=[], id="echo"
> >       >       2020-07-31T23:08:17.778Z|04291|poll_loop|DBG|wakeup due to
> > [POLLIN]
> >       > on fd 9 (10.6.20.84:49158 <http://10.6.20.84:49158>
> > <http://10.6.20.84:49158> <->10.6.20.85:6642 <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> ) at lib/stream-fd.c:157 (100% CPU
> usage)
> >       >       2020-07-
> > 31T23:08:17.778Z|04292|jsonrpc|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : received request, method="echo",
> > params=[],
> >       > id="echo"
> >       >       2020-07-
> > 31T23:08:17.778Z|04293|jsonrpc|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : send reply, result=[], id="echo"
> >       >       2020-07-31T23:08:20.372Z|04294|poll_loop|DBG|wakeup due to
> > [POLLIN]
> >       > on fd 8 (10.6.20.84:44358 <http://10.6.20.84:44358>
> > <http://10.6.20.84:44358> <->10.6.20.84:6641 <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> ) at lib/stream-fd.c:157 (41% CPU
> usage)
> >       >       2020-07-
> > 31T23:08:20.372Z|04295|reconnect|DBG|tcp:10.6.20.84:6641
> > <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> : idle 5002 ms, sending inactivity
> probe
> >       >       2020-07-
> > 31T23:08:20.372Z|04296|reconnect|DBG|tcp:10.6.20.84:6641
> > <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> : entering IDLE
> >       >       2020-07-
> > 31T23:08:20.372Z|04297|jsonrpc|DBG|tcp:10.6.20.84:6641
> > <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> : send request, method="echo",
> params=[],
> >       > id="echo"
> >       >       2020-07-
> > 31T23:08:20.372Z|04298|jsonrpc|DBG|tcp:10.6.20.84:6641
> > <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> : received request, method="echo",
> > params=[],
> >       > id="echo"
> >       >       2020-07-
> > 31T23:08:20.372Z|04299|reconnect|DBG|tcp:10.6.20.84:6641
> > <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> : entering ACTIVE
> >       >       2020-07-
> > 31T23:08:20.372Z|04300|jsonrpc|DBG|tcp:10.6.20.84:6641
> > <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> : send reply, result=[], id="echo"
> >       >       2020-07-31T23:08:20.376Z|04301|hmap|DBG|Dropped 181 log
> > messages in
> >       > last 6 seconds (most recently, 1 seconds ago) due to excessive
> > rate
> >       >       2020-07-31T23:08:20.376Z|04302|hmap|DBG|northd/ovn-
> > northd.c:595: 2
> >       > buckets with 6+ nodes, including 2 buckets with 6 nodes (256
> > nodes total
> >       > across 256 buckets)
> >       >       2020-07-31T23:08:21.222Z|04303|poll_loop|DBG|wakeup due to
> > [POLLIN]
> >       > on fd 8 (10.6.20.84:44358 <http://10.6.20.84:44358>
> > <http://10.6.20.84:44358> <->10.6.20.84:6641 <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> ) at lib/stream-fd.c:157 (41% CPU
> usage)
> >       >       2020-07-
> > 31T23:08:21.223Z|04304|jsonrpc|DBG|tcp:10.6.20.84:6641
> > <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> : received reply, result=[], id="echo"
> >       >       2020-07-31T23:08:22.779Z|04305|poll_loop|DBG|wakeup due to
> > 706-ms
> >       > timeout at lib/reconnect.c:643 (41% CPU usage)
> >       >       2020-07-
> > 31T23:08:22.779Z|04306|reconnect|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : idle 5001 ms, sending inactivity
> probe
> >       >       2020-07-
> > 31T23:08:22.779Z|04307|reconnect|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : entering IDLE
> >       >       2020-07-
> > 31T23:08:22.779Z|04308|jsonrpc|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : send request, method="echo",
> params=[],
> >       > id="echo"
> >       >       2020-07-31T23:08:23.632Z|04309|poll_loop|DBG|wakeup due to
> > [POLLIN]
> >       > on fd 9 (10.6.20.84:49158 <http://10.6.20.84:49158>
> > <http://10.6.20.84:49158> <->10.6.20.85:6642 <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> ) at lib/stream-fd.c:157 (79% CPU
> usage)
> >       >       2020-07-
> > 31T23:08:23.632Z|04310|jsonrpc|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : received request, method="echo",
> > params=[],
> >       > id="echo"
> >       >       2020-07-
> > 31T23:08:23.632Z|04311|reconnect|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : entering ACTIVE
> >       >       2020-07-
> > 31T23:08:23.632Z|04312|jsonrpc|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : send reply, result=[], id="echo"
> >       >       2020-07-31T23:08:24.481Z|04313|poll_loop|DBG|wakeup due to
> > 0-ms
> >       > timeout at tcp:10.6.20.85:6642 <http://10.6.20.85:6642>
> > <http://10.6.20.85:6642>  (79% CPU usage)
> >       >       2020-07-
> > 31T23:08:24.481Z|04314|jsonrpc|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : received reply, result=[], id="echo"
> >       >       2020-07-31T23:08:25.372Z|04315|poll_loop|DBG|wakeup due to
> > [POLLIN]
> >       > on fd 8 (10.6.20.84:44358 <http://10.6.20.84:44358>
> > <http://10.6.20.84:44358> <->10.6.20.84:6641 <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> ) at lib/stream-fd.c:157 (79% CPU
> usage)
> >       >       2020-07-
> > 31T23:08:25.372Z|04316|jsonrpc|DBG|tcp:10.6.20.84:6641
> > <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> : received request, method="echo",
> > params=[],
> >       > id="echo"
> >       >       2020-07-
> > 31T23:08:25.372Z|04317|jsonrpc|DBG|tcp:10.6.20.84:6641
> > <http://10.6.20.84:6641>
> >       > <http://10.6.20.84:6641> : send reply, result=[], id="echo"
> >       >       2020-07-31T23:08:28.634Z|04318|poll_loop|DBG|wakeup due to
> > [POLLIN]
> >       > on fd 9 (10.6.20.84:49158 <http://10.6.20.84:49158>
> > <http://10.6.20.84:49158> <->10.6.20.85:6642 <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> ) at lib/stream-fd.c:157 (50% CPU
> usage)
> >       >       2020-07-
> > 31T23:08:28.634Z|04319|reconnect|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : idle 5002 ms, sending inactivity
> probe
> >       >       2020-07-
> > 31T23:08:28.634Z|04320|reconnect|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : entering IDLE
> >       >       2020-07-
> > 31T23:08:28.634Z|04321|jsonrpc|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : send request, method="echo",
> params=[],
> >       > id="echo"
> >       >       2020-07-
> > 31T23:08:28.634Z|04322|jsonrpc|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : received request, method="echo",
> > params=[],
> >       > id="echo"
> >       >       2020-07-
> > 31T23:08:28.634Z|04323|reconnect|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : entering ACTIVE
> >       >       2020-07-
> > 31T23:08:28.634Z|04324|jsonrpc|DBG|tcp:10.6.20.85:6642
> > <http://10.6.20.85:6642>
> >       > <http://10.6.20.85:6642> : send reply, result=[], id="echo"
> >       >       2020-07-31T23:08:28.638Z|04325|hmap|DBG|Dropped 266 log
> > messages in
> >       > last 9 seconds (most recently, 3 seconds ago) due to excessive
> > rate
> >       >       2020-07-31T23:08:28.638Z|04326|hmap|DBG|northd/ovn-
> > northd.c:595: 2
> >       > buckets with 6+ nodes, including 2 buckets with 6 nodes (256
> > nodes total
> >       > across 256 buckets)
> >       >       ========
> >       >
> >       >       Thanks!
> >       >
> >       >       Tony
> >       >
> >       >       _______________________________________________
> >       >       discuss mailing list
> >       >       discuss at openvswitch.org <mailto:discuss at openvswitch.org>
> > <mailto:discuss at openvswitch.org <mailto:discuss at openvswitch.org> >
> >       >       https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> >       >
> >
> >
>
> _______________________________________________
> discuss mailing list
> discuss at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
>


More information about the dev mailing list