[ovs-discuss] High CPU Usage on NETLINK_ROUTE Poll

Jason Huang @ Huajuan akw at betaidc.com
Sat Jul 21 22:15:56 UTC 2018


Hi,


We've found ovs-vswitchd using 100% CPU. Judging from the log, the
problem seems to be related to NETLINK_ROUTE.


# OVS Config
    Bridge "vmbr0"
        Port "vmbr0"
            Interface "vmbr0"
                type: internal
        Port "vlan3005"
            tag: 3005
            Interface "vlan3005"
                type: internal
        Port "vlan30"
            tag: 30
            Interface "vlan30"
                type: internal
        Port "bond0"
            Interface "enp1s0f0"
            Interface "enp1s0f1"
        Port "vlan3702"
            tag: 3702
            Interface "vlan3702"
                type: internal
        Port "vlan3502"
            tag: 3502
            Interface "vlan3502"
                type: internal
    Bridge "vmbr1"
        Port "vmbr1"
            Interface "vmbr1"
                type: internal
    ovs_version: "2.9.2"


# Log
2018-07-21T22:12:03.941Z|05606|netlink_notifier|WARN|netlink receive buffer overflowed
2018-07-21T22:12:04.995Z|01446|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting for main to quiesce
2018-07-21T22:12:05.995Z|01447|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting for main to quiesce
2018-07-21T22:12:07.711Z|05607|timeval|WARN|Unreasonably long 3787ms poll interval (1502ms user, 1706ms system)
2018-07-21T22:12:07.711Z|05608|timeval|WARN|context switches: 2824 voluntary, 15 involuntary
2018-07-21T22:12:07.711Z|05609|poll_loop|INFO|wakeup due to [POLLIN] on fd 12 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (86% CPU usage)
2018-07-21T22:12:08.798Z|01448|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting for main to quiesce
2018-07-21T22:12:09.798Z|01449|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting for main to quiesce
2018-07-21T22:12:11.068Z|05610|timeval|WARN|Unreasonably long 3357ms poll interval (1715ms user, 1591ms system)
2018-07-21T22:12:11.068Z|05611|timeval|WARN|context switches: 3770 voluntary, 9 involuntary
2018-07-21T22:12:12.218Z|01450|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting for main to quiesce
2018-07-21T22:12:13.218Z|01451|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting for main to quiesce
2018-07-21T22:12:14.475Z|05612|timeval|WARN|Unreasonably long 3407ms poll interval (1609ms user, 1718ms system)
2018-07-21T22:12:14.475Z|05613|timeval|WARN|faults: 2095 minor, 0 major
2018-07-21T22:12:14.475Z|05614|timeval|WARN|context switches: 3485 voluntary, 7 involuntary
2018-07-21T22:12:14.475Z|05615|poll_loop|INFO|Dropped 2 log messages in last 7 seconds (most recently, 3 seconds ago) due to excessive rate
2018-07-21T22:12:14.475Z|05616|poll_loop|INFO|wakeup due to [POLLIN] on fd 14 (NETLINK_ROUTE<->NETLINK_ROUTE) at lib/netlink-socket.c:1331 (97% CPU usage)
2018-07-21T22:12:15.706Z|01452|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting for main to quiesce
2018-07-21T22:12:16.707Z|01453|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting for main to quiesce
2018-07-21T22:12:18.004Z|05617|timeval|WARN|Unreasonably long 3529ms poll interval (1763ms user, 1663ms system)
2018-07-21T22:12:18.004Z|05618|timeval|WARN|context switches: 3232 voluntary, 9 involuntary


It takes a long time to handle the NETLINK_ROUTE responses.
We receive the full Internet routing table via BGP and import it into the kernel.


It looks like there is a performance issue when the kernel holds 700k route prefixes.
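To check whether route churn from the BGP import is what keeps the NETLINK_ROUTE socket busy, one could listen to the same kind of kernel notifications outside OVS and count them. Below is a minimal Python sketch under these assumptions: Linux only, the function names `count_route_msgs` and `monitor` are hypothetical (not part of OVS), and the constants are taken from `<linux/netlink.h>` / `<linux/rtnetlink.h>`. It prints, roughly once per second, how many RTM_NEWROUTE/RTM_DELROUTE notifications arrived and how often its own receive buffer overflowed (ENOBUFS, which we assume corresponds to the "netlink receive buffer overflowed" warning in the log).

```python
import errno
import socket
import struct
import time

# struct nlmsghdr: nlmsg_len, nlmsg_type, nlmsg_flags, nlmsg_seq, nlmsg_pid
NLMSG_HDR = struct.Struct("=IHHII")
RTM_NEWROUTE = 24
RTM_DELROUTE = 25
RTMGRP_IPV4_ROUTE = 0x40   # legacy multicast group bitmask values
RTMGRP_IPV6_ROUTE = 0x400


def count_route_msgs(buf):
    """Count RTM_NEWROUTE / RTM_DELROUTE messages in one recv() buffer."""
    counts = {"new": 0, "del": 0}
    offset = 0
    while offset + NLMSG_HDR.size <= len(buf):
        length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(buf, offset)
        if length < NLMSG_HDR.size:
            break  # malformed header; stop parsing this buffer
        if msg_type == RTM_NEWROUTE:
            counts["new"] += 1
        elif msg_type == RTM_DELROUTE:
            counts["del"] += 1
        # Each message is padded to a 4-byte boundary (NLMSG_ALIGN).
        offset += (length + 3) & ~3
    return counts


def monitor(seconds=10):
    """Print per-window (about 1 s) route notification and overflow counts."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    # Port id 0 lets the kernel choose; second field is the group bitmask.
    sock.bind((0, RTMGRP_IPV4_ROUTE | RTMGRP_IPV6_ROUTE))
    sock.settimeout(1.0)
    try:
        end = time.monotonic() + seconds
        while time.monotonic() < end:
            window = {"new": 0, "del": 0, "overflows": 0}
            tick = time.monotonic() + 1.0
            while time.monotonic() < tick:
                try:
                    buf = sock.recv(65536)
                except socket.timeout:
                    break  # no notifications for a full second
                except OSError as exc:
                    if exc.errno == errno.ENOBUFS:
                        # Kernel dropped notifications: our buffer overflowed.
                        window["overflows"] += 1
                        continue
                    raise
                for key, n in count_route_msgs(buf).items():
                    window[key] += n
            print(window)
    finally:
        sock.close()
```

Running `monitor(60)` on the affected host while the BGP session is up would show whether thousands of route updates per second are arriving; a steady stream of overflows would suggest the same pressure ovs-vswitchd is seeing on fd 14.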


----
Jason Huang