[ovs-discuss] High CPU Usage on NETLINK_ROUTE Poll
Jason Huang @ Huajuan
akw at betaidc.com
Sat Jul 21 22:15:56 UTC 2018
Hi,
We've found that ovs-vswitchd is using 100% CPU. We checked the log,
and the problem seems to be related to NETLINK_ROUTE.
# OVS Config
Bridge "vmbr0"
    Port "vmbr0"
        Interface "vmbr0"
            type: internal
    Port "vlan3005"
        tag: 3005
        Interface "vlan3005"
            type: internal
    Port "vlan30"
        tag: 30
        Interface "vlan30"
            type: internal
    Port "bond0"
        Interface "enp1s0f0"
        Interface "enp1s0f1"
    Port "vlan3702"
        tag: 3702
        Interface "vlan3702"
            type: internal
    Port "vlan3502"
        tag: 3502
        Interface "vlan3502"
            type: internal
Bridge "vmbr1"
    Port "vmbr1"
        Interface "vmbr1"
            type: internal
ovs_version: "2.9.2"
# Log
2018-07-21T22:12:03.941Z|05606|netlink_notifier|WARN|netlink receive buffer overflowed
2018-07-21T22:12:04.995Z|01446|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting for main to quiesce
2018-07-21T22:12:05.995Z|01447|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting for main to quiesce
2018-07-21T22:12:07.711Z|05607|timeval|WARN|Unreasonably long 3787ms poll interval (1502ms user, 1706ms system)
2018-07-21T22:12:07.711Z|05608|timeval|WARN|context switches: 2824 voluntary, 15 involuntary
2018-07-21T22:12:07.711Z|05609|poll_loop|INFO|wakeup due to [POLLIN] on fd 12 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (86% CPU usage)
2018-07-21T22:12:08.798Z|01448|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting for main to quiesce
2018-07-21T22:12:09.798Z|01449|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting for main to quiesce
2018-07-21T22:12:11.068Z|05610|timeval|WARN|Unreasonably long 3357ms poll interval (1715ms user, 1591ms system)
2018-07-21T22:12:11.068Z|05611|timeval|WARN|context switches: 3770 voluntary, 9 involuntary
2018-07-21T22:12:12.218Z|01450|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting for main to quiesce
2018-07-21T22:12:13.218Z|01451|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting for main to quiesce
2018-07-21T22:12:14.475Z|05612|timeval|WARN|Unreasonably long 3407ms poll interval (1609ms user, 1718ms system)
2018-07-21T22:12:14.475Z|05613|timeval|WARN|faults: 2095 minor, 0 major
2018-07-21T22:12:14.475Z|05614|timeval|WARN|context switches: 3485 voluntary, 7 involuntary
2018-07-21T22:12:14.475Z|05615|poll_loop|INFO|Dropped 2 log messages in last 7 seconds (most recently, 3 seconds ago) due to excessive rate
2018-07-21T22:12:14.475Z|05616|poll_loop|INFO|wakeup due to [POLLIN] on fd 14 (NETLINK_ROUTE<->NETLINK_ROUTE) at lib/netlink-socket.c:1331 (97% CPU usage)
2018-07-21T22:12:15.706Z|01452|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting for main to quiesce
2018-07-21T22:12:16.707Z|01453|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting for main to quiesce
2018-07-21T22:12:18.004Z|05617|timeval|WARN|Unreasonably long 3529ms poll interval (1763ms user, 1663ms system)
2018-07-21T22:12:18.004Z|05618|timeval|WARN|context switches: 3232 voluntary, 9 involuntary
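For anyone who wants to summarize a longer capture of this log, here is a minimal Python sketch. It assumes the message formats are exactly as they appear in the lines above; the names `POLL_RE`, `WAKEUP_RE`, and `summarize` are made up for illustration and are not part of OVS.

```python
import re

# Format of the timeval "long poll interval" warning, as seen in the log above.
POLL_RE = re.compile(
    r"Unreasonably long (\d+)ms poll interval \((\d+)ms user, (\d+)ms system\)"
)
# Format of the poll_loop wakeup message: captures the fd description and CPU %.
WAKEUP_RE = re.compile(
    r"wakeup due to \[\w+\] on fd \d+ \((.*)\) at \S+ \((\d+)% CPU usage\)"
)


def summarize(log_lines):
    """Return (polls, wakeups) extracted from ovs-vswitchd log lines.

    polls   -- list of (total_ms, user_ms, system_ms) tuples
    wakeups -- list of (fd_description, cpu_percent) tuples
    """
    polls = []
    wakeups = []
    for line in log_lines:
        m = POLL_RE.search(line)
        if m:
            polls.append(tuple(int(g) for g in m.groups()))
            continue
        m = WAKEUP_RE.search(line)
        if m:
            wakeups.append((m.group(1), int(m.group(2))))
    return polls, wakeups
```

Feeding it the lines above shows each poll interval running over three seconds, with user and system time split roughly evenly, and the busiest wakeup being the one on the NETLINK_ROUTE socket.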
It takes a long time to wait for a NETLINK_ROUTE response.
We receive the full Internet routing table via BGP and import it into the kernel.
There seems to be a performance issue when the kernel holds 700k route prefixes.
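To confirm how many routes the kernel actually holds on a given host, a rough Python sketch follows. It assumes the routes are IPv4 and appear in /proc/net/route, which starts with a single header line; the function name `count_ipv4_routes` is hypothetical.

```python
def count_ipv4_routes(proc_net_route_text):
    """Count route entries in text read from /proc/net/route.

    Assumption: the first non-empty line is the column header
    (Iface, Destination, Gateway, ...) and every following
    non-empty line is one route.
    """
    lines = [ln for ln in proc_net_route_text.splitlines() if ln.strip()]
    return max(len(lines) - 1, 0)


# Usage on a Linux host (reading may itself be slow with a full table):
#     with open("/proc/net/route") as f:
#         print(count_ipv4_routes(f.read()))
```

Comparing this count before and after the BGP import would show whether the CPU usage tracks the size of the kernel routing table.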
----
Jason Huang