[ovs-dev] [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
Han Zhou
zhouhan at gmail.com
Fri Aug 7 19:38:35 UTC 2020
On Fri, Aug 7, 2020 at 12:35 PM Tony Liu <tonyliu0592 at hotmail.com> wrote:
> Inline...
>
> Thanks!
>
> Tony
> > -----Original Message-----
> > From: Han Zhou <zhouhan at gmail.com>
> > Sent: Friday, August 7, 2020 12:29 PM
> > To: Tony Liu <tonyliu0592 at hotmail.com>
> > Cc: ovs-discuss <ovs-discuss at openvswitch.org>; ovs-dev <ovs-
> > dev at openvswitch.org>
> > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no
> > changes in sb-db
> >
> >
> >
> > On Fri, Aug 7, 2020 at 12:19 PM Tony Liu <tonyliu0592 at hotmail.com
> > <mailto:tonyliu0592 at hotmail.com> > wrote:
> >
> >
> > ovn-controller is using UNIX socket connecting to local ovsdb-
> > server.
> >
> > From the log you were showing, you were using tcp:127.0.0.1:6640
>
> Sorry, what I meant was, given your advice, I just made the change for
> ovn-controller to use UNIX socket.
>
> Oh, I see, no worries.
> <http://127.0.0.1:6640> to connect the local ovsdb.
> > > 2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640
> > > <http://127.0.0.1:6640> <http://127.0.0.1:6640> : connection dropped
> > > (Broken pipe)
> >
> >
> > Inactivity probe doesn't seem to be the cause of high cpu usage.
> >
> > The wakeup on connection to sb-db is always followed by a
> > "unreasonably
> > long" warning. I guess the pollin event loop is stuck for too long,
> > like
> > 10s as below.
> > ========
> > 2020-08-07T18:46:49.301Z|00296|poll_loop|INFO|wakeup due to
> [POLLIN]
> > on fd 19 (10.6.20.91:60712 <http://10.6.20.91:60712> <->10.6.20.86:6642
> > <http://10.6.20.86:6642> ) at lib/stream-fd.c:157 (99% CPU usage)
> > 2020-08-07T18:46:59.460Z|00297|timeval|WARN|Unreasonably long
> > 10153ms poll interval (10075ms user, 1ms system)
> > ========
> >
> > Could that stuck loop be the cause of high cpu usage?
> > What is it polling in?
> > Why is it stuck, waiting for message from sb-db?
> > Isn't it supposed to release the cpu while waiting?
> >
> >
> >
> > This log means there are messages received from 10.6.20.86:6642
> > <http://10.6.20.86:6642> (the SB DB). Is there SB change? The CPU is
> > spent on handling the SB change. Some type of SB changes are not handled
> > incrementally.
>
> SB update is driven by ovn-northd in case anything changed in NB,
> and ovn-controller in case anything changed on chassis. No, there
> is nothing changed in NB, neither chassis.
>
> Should I bump logging level up to dbg? Is that going to show me
> what messages ovn-controller is handling?
>
> Yes, debug log should show the details.
> >
> > Thanks!
> >
> > Tony
> >
> > > -----Original Message-----
> > > From: Han Zhou <zhouhan at gmail.com <mailto:zhouhan at gmail.com> >
> > > Sent: Friday, August 7, 2020 10:32 AM
> > > To: Tony Liu <tonyliu0592 at hotmail.com
> > <mailto:tonyliu0592 at hotmail.com> >
> > > Cc: ovs-discuss <ovs-discuss at openvswitch.org <mailto:ovs-
> > discuss at openvswitch.org> >; ovs-dev <ovs-
> > > dev at openvswitch.org <mailto:dev at openvswitch.org> >
> > > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu
> > while no
> > > changes in sb-db
> > >
> > >
> > >
> > > On Fri, Aug 7, 2020 at 10:05 AM Tony Liu <
> tonyliu0592 at hotmail.com
> > <mailto:tonyliu0592 at hotmail.com>
> > > <mailto:tonyliu0592 at hotmail.com
> > <mailto:tonyliu0592 at hotmail.com> > > wrote:
> > >
> > >
> > > Hi,
> > >
> > > Here are some logging snippets from ovn-controller.
> > > ========
> > > 2020-08-07T16:38:04.020Z|29250|timeval|WARN|Unreasonably
> > long
> > > 8954ms poll interval (8895ms user, 0ms system)
> > > ========
> > > What's that mean? Is it harmless?
> > >
> > > ========
> > > 2020-08-07T16:38:04.021Z|29251|timeval|WARN|context
> > switches: 0
> > > voluntary, 6 involuntary
> > > 2020-08-07T16:38:04.022Z|29252|poll_loop|INFO|wakeup due to
> > [POLLIN]
> > > on fd 19 (10.6.20.91:60398 <http://10.6.20.91:60398>
> > <http://10.6.20.91:60398> <->10.6.20.86:6642 <http://10.6.20.86:6642>
> > > <http://10.6.20.86:6642> ) at lib/stream-fd.c:157 (99% CPU
> usage)
> > > ========
> > > Is this wakeup caused by changes in sb-db?
> > > Why is ovn-controller so busy?
> > >
> > > ========
> > > 2020-08-
> > 07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640
> > <http://127.0.0.1:6640>
> > > <http://127.0.0.1:6640> : connection dropped (Broken pipe)
> > > ========
> > > Connection to local ovsdb-server is dropped.
> > > Is this caused by the timeout of inactivity probe?
> > >
> > > ========
> > > 2020-08-07T16:38:04.035Z|29254|poll_loop|INFO|wakeup due to
> > [POLLIN]
> > > on fd 20 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-
> > fd.c:157
> > > (99% CPU usage)
> > > ========
> > > What causes this wakeup?
> > >
> > > ========
> > > 2020-08-07T16:38:04.048Z|29255|poll_loop|INFO|wakeup due to
> > 0-ms
> > > timeout at lib/ovsdb-idl.c:5391 (99% CPU usage)
> > > ========
> > > What's this 0-ms wakeup mean?
> > >
> > > ========
> > > 2020-08-07T16:38:05.022Z|29256|poll_loop|INFO|wakeup due to
> > 962-ms
> > > timeout at lib/reconnect.c:643 (99% CPU usage)
> > > 2020-08-
> > 07T16:38:05.023Z|29257|reconnect|INFO|tcp:127.0.0.1:6640
> > <http://127.0.0.1:6640>
> > > <http://127.0.0.1:6640> : connecting...
> > > 2020-08-07T16:38:05.041Z|29258|poll_loop|INFO|wakeup due to
> > > [POLLOUT] on fd 14 (127.0.0.1:51478 <http://127.0.0.1:51478>
> > <http://127.0.0.1:51478> <-
> > > >127.0.0.1:6640 <http://127.0.0.1:6640> <http://127.0.0.1:6640>
> )
> > at lib/stream-fd.c:153 (99%
> > > CPU usage)
> > > 2020-08-
> > 07T16:38:05.041Z|29259|reconnect|INFO|tcp:127.0.0.1:6640
> > <http://127.0.0.1:6640>
> > > <http://127.0.0.1:6640> : connected
> > > ================
> > > Retry to connect to local ovsdb-server. A pollout event is
> > > triggered
> > > right after connection is established. What's poolout?
> > >
> > > ovn-controller is taking 100% CPU now, and there is no
> > changes in
> > > sb-db (not busy). It seems that it's busy with local ovsdb-
> > server
> > > or vswitchd. I'd like to understand why ovn-controller is
> > so busy?
> > > All inactivity probe intervals are set to 30s.
> > >
> > >
> > >
> > >
> > > Is there change from the local ovsdb? You can enable dbg log to
> > see what
> > > is happening.
> > > For the local ovsdb probe, I have mentioned in the other thread
> > that
> > > UNIX socket is recommended (instead of tcp 127.0.0.1). Using UNIX
> > socket
> > > disables probe by default.
> > >
> > > Thanks,
> > > Han
> >
> >
>
>
More information about the dev
mailing list