[ovs-dev] [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db

Tony Liu tonyliu0592 at hotmail.com
Fri Aug 7 19:19:28 UTC 2020


ovn-controller is using UNIX socket connecting to local ovsdb-server.
Inactivity probe doesn't seem to be the cause of high cpu usage.

The wakeup on connection to sb-db is always followed by a "unreasonably
long" warning. I guess the pollin event loop is stuck for too long, like
10s as below.
========
2020-08-07T18:46:49.301Z|00296|poll_loop|INFO|wakeup due to [POLLIN] on fd 19 (10.6.20.91:60712<->10.6.20.86:6642) at lib/stream-fd.c:157 (99% CPU usage)
2020-08-07T18:46:59.460Z|00297|timeval|WARN|Unreasonably long 10153ms poll interval (10075ms user, 1ms system)
========

Could that stuck loop be the cause of high cpu usage?
What is it polling in?
Why is it stuck, waiting for message from sb-db?
Isn't it supposed to release the cpu while waiting?


Thanks!

Tony

> -----Original Message-----
> From: Han Zhou <zhouhan at gmail.com>
> Sent: Friday, August 7, 2020 10:32 AM
> To: Tony Liu <tonyliu0592 at hotmail.com>
> Cc: ovs-discuss <ovs-discuss at openvswitch.org>; ovs-dev <ovs-
> dev at openvswitch.org>
> Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no
> changes in sb-db
> 
> 
> 
> On Fri, Aug 7, 2020 at 10:05 AM Tony Liu <tonyliu0592 at hotmail.com
> <mailto:tonyliu0592 at hotmail.com> > wrote:
> 
> 
> 	Hi,
> 
> 	Here are some logging snippets from ovn-controller.
> 	========
> 	2020-08-07T16:38:04.020Z|29250|timeval|WARN|Unreasonably long
> 8954ms poll interval (8895ms user, 0ms system)
> 	========
> 	What's that mean? Is it harmless?
> 
> 	========
> 	2020-08-07T16:38:04.021Z|29251|timeval|WARN|context switches: 0
> voluntary, 6 involuntary
> 	2020-08-07T16:38:04.022Z|29252|poll_loop|INFO|wakeup due to [POLLIN]
> on fd 19 (10.6.20.91:60398 <http://10.6.20.91:60398> <->10.6.20.86:6642
> <http://10.6.20.86:6642> ) at lib/stream-fd.c:157 (99% CPU usage)
> 	========
> 	Is this wakeup caused by changes in sb-db?
> 	Why is ovn-controller so busy?
> 
> 	========
> 	2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640
> <http://127.0.0.1:6640> : connection dropped (Broken pipe)
> 	========
> 	Connection to local ovsdb-server is dropped.
> 	Is this caused by the timeout of inactivity probe?
> 
> 	========
> 	2020-08-07T16:38:04.035Z|29254|poll_loop|INFO|wakeup due to [POLLIN]
> on fd 20 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157
> (99% CPU usage)
> 	========
> 	What causes this wakeup?
> 
> 	========
> 	2020-08-07T16:38:04.048Z|29255|poll_loop|INFO|wakeup due to 0-ms
> timeout at lib/ovsdb-idl.c:5391 (99% CPU usage)
> 	========
> 	What's this 0-ms wakeup mean?
> 
> 	========
> 	2020-08-07T16:38:05.022Z|29256|poll_loop|INFO|wakeup due to 962-ms
> timeout at lib/reconnect.c:643 (99% CPU usage)
> 	2020-08-07T16:38:05.023Z|29257|reconnect|INFO|tcp:127.0.0.1:6640
> <http://127.0.0.1:6640> : connecting...
> 	2020-08-07T16:38:05.041Z|29258|poll_loop|INFO|wakeup due to
> [POLLOUT] on fd 14 (127.0.0.1:51478 <http://127.0.0.1:51478> <-
> >127.0.0.1:6640 <http://127.0.0.1:6640> ) at lib/stream-fd.c:153 (99%
> CPU usage)
> 	2020-08-07T16:38:05.041Z|29259|reconnect|INFO|tcp:127.0.0.1:6640
> <http://127.0.0.1:6640> : connected
> 	================
> 	Retry to connect to local ovsdb-server. A pollout event is
> triggered
> 	right after connection is established. What's poolout?
> 
> 	ovn-controller is taking 100% CPU now, and there is no changes in
> 	sb-db (not busy). It seems that it's busy with local ovsdb-server
> 	or vswitchd. I'd like to understand why ovn-controller is so busy?
> 	All inactivity probe intervals are set to 30s.
> 
> 
> 
> 
> Is there change from the local ovsdb? You can enable dbg log to see what
> is happening.
> For the local ovsdb probe, I have mentioned in the other thread that
> UNIX socket is recommended (instead of tcp 127.0.0.1). Using UNIX socket
> disables probe by default.
> 
> Thanks,
> Han



More information about the dev mailing list