[ovs-discuss] ovn-controller consuming lots of CPU

Kevin Lin kevin at kelda.io
Tue Dec 12 21:26:33 UTC 2017


Hi again,

We’re trying to scale up our OVN deployment and we’re seeing some worrying log messages. 
The topology is 32 containers connected to another 32 containers on 10 different ports. This is running on 17 machines (one machine runs ovn-northd and ovsdb-server, the other 16 run ovn-controller, ovs-vswitchd, and ovsdb-server). We’re using an address set for the source group, but not the destination group. We’re also creating a different ACL for each port. So the ACLs look like:
One address set for { container1, container2, … container32 }
addressSet -> container1 on port 80
addressSet -> container1 on port 81
…
addressSet -> container1 on port 90
addressSet -> container2 on port 80
…
addressSet -> container32 on port 90

The ovn-controller log:

2017-12-12T20:14:49Z|11878|timeval|WARN|Unreasonably long 1843ms poll interval (1840ms user, 0ms system)
2017-12-12T20:14:49Z|11879|timeval|WARN|disk: 0 reads, 16 writes
2017-12-12T20:14:49Z|11880|timeval|WARN|context switches: 0 voluntary, 21 involuntary
2017-12-12T20:14:49Z|11881|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (172.31.11.193:48460<->172.31.2.181:6640) at lib/stream-fd.c:157 (36% CPU usage)
2017-12-12T20:14:49Z|11882|poll_loop|DBG|wakeup due to [POLLIN] on fd 12 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (36% CPU usage)
2017-12-12T20:14:49Z|11883|jsonrpc|DBG|tcp:172.31.2.181:6640: received reply, result=[], id="echo"
2017-12-12T20:14:49Z|11884|netlink_socket|DBG|nl_sock_transact_multiple__ (Success): nl(len:36, type=38(family-defined), flags=9[REQUEST][ECHO], seq=b11, pid=2268452876
2017-12-12T20:14:49Z|11885|netlink_socket|DBG|nl_sock_recv__ (Success): nl(len:136, type=36(family-defined), flags=0, seq=b11, pid=2268452876
2017-12-12T20:14:49Z|11886|vconn|DBG|unix:/var/run/openvswitch/br-int.mgmt: received: OFPT_ECHO_REQUEST (OF1.3) (xid=0x0): 0 bytes of payload
2017-12-12T20:14:49Z|11887|vconn|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): OFPT_ECHO_REPLY (OF1.3) (xid=0x0): 0 bytes of payload
2017-12-12T20:14:51Z|11888|timeval|WARN|Unreasonably long 1851ms poll interval (1844ms user, 8ms system)
2017-12-12T20:14:51Z|11889|timeval|WARN|context switches: 0 voluntary, 11 involuntary
2017-12-12T20:14:52Z|11890|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (172.31.11.193:48460<->172.31.2.181:6640) at lib/stream-fd.c:157 (73% CPU usage)
2017-12-12T20:14:52Z|11891|jsonrpc|DBG|tcp:172.31.2.181:6640: received request, method="echo", params=[], id="echo"
2017-12-12T20:14:52Z|11892|jsonrpc|DBG|tcp:172.31.2.181:6640: send reply, result=[], id="echo"
2017-12-12T20:14:52Z|11893|netlink_socket|DBG|nl_sock_transact_multiple__ (Success): nl(len:36, type=38(family-defined), flags=9[REQUEST][ECHO], seq=b12, pid=2268452876
2017-12-12T20:14:52Z|11894|netlink_socket|DBG|nl_sock_recv__ (Success): nl(len:136, type=36(family-defined), flags=0, seq=b12, pid=2268452876
2017-12-12T20:14:52Z|11895|netdev_linux|DBG|Dropped 18 log messages in last 56 seconds (most recently, 3 seconds ago) due to excessive rate
2017-12-12T20:14:52Z|11896|netdev_linux|DBG|unknown qdisc "mq"
2017-12-12T20:14:54Z|11897|hmap|DBG|Dropped 15511 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2017-12-12T20:14:54Z|11898|hmap|DBG|ovn/lib/expr.c:2644: 6 nodes in bucket (128 nodes, 64 buckets)
2017-12-12T20:14:54Z|11899|timeval|WARN|Unreasonably long 1831ms poll interval (1828ms user, 4ms system)
2017-12-12T20:14:54Z|11900|timeval|WARN|context switches: 0 voluntary, 12 involuntary

The log messages show up continuously. The logs appear even when the network isn’t being used.

I poked around with Ethan Jackson and he noted that the hmap counters seem unusually high:
root at ip-172-31-11-193:/# ovs-appctl -t ovn-controller coverage/show
Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=d6ee5804:
hmap_pathological        2323.6/sec  2662.467/sec     2514.0069/sec   total: 9407536
hmap_expand              3596.8/sec  4121.283/sec     3890.8833/sec   total: 14604479
txn_unchanged              0.8/sec     0.917/sec        0.8658/sec   total: 5659
txn_incomplete             0.0/sec     0.000/sec        0.0008/sec   total: 33
txn_success                0.0/sec     0.000/sec        0.0006/sec   total: 24
poll_create_node           2.4/sec     2.750/sec        2.5986/sec   total: 18218
poll_zero_timeout          0.0/sec     0.000/sec        0.0100/sec   total: 71
rconn_queued               0.0/sec     0.050/sec        0.0531/sec   total: 252570
rconn_sent                 0.0/sec     0.050/sec        0.0531/sec   total: 252570
seq_change                 1.2/sec     1.383/sec        1.2992/sec   total: 8500
pstream_open               0.0/sec     0.000/sec        0.0000/sec   total: 1
stream_open                0.0/sec     0.000/sec        0.0000/sec   total: 6
unixctl_received           0.0/sec     0.000/sec        0.0019/sec   total: 7
unixctl_replied            0.0/sec     0.000/sec        0.0019/sec   total: 7
util_xalloc              2731550.2/sec 3129900.483/sec   569276.5414/sec   total: 11057381035
vconn_open                 0.0/sec     0.000/sec        0.0000/sec   total: 4
vconn_received             0.0/sec     0.050/sec        0.0444/sec   total: 201
vconn_sent                 0.0/sec     0.000/sec        0.0144/sec   total: 253535
netdev_get_ifindex         0.4/sec     0.467/sec        0.4328/sec   total: 2822
netlink_received           0.4/sec     0.467/sec        0.4328/sec   total: 2822
netlink_sent               0.4/sec     0.467/sec        0.4328/sec   total: 2822
cmap_expand                0.0/sec     0.000/sec        0.0000/sec   total: 2
47 events never hit

I’ve also attached the output of ovs-bugtool run from the machine running ovn-northd, and one of the machines running ovn-controller and ovs-vswitchd.

Thanks,
—Kevin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20171212/02bc2fed/attachment-0003.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ovn-controller-and-ovs-vswitchd-machine.tar.gz
Type: application/x-gzip
Size: 3561962 bytes
Desc: not available
URL: <http://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20171212/02bc2fed/attachment-0002.gz>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20171212/02bc2fed/attachment-0004.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ovn-northd-machine.tar.gz
Type: application/x-gzip
Size: 34828 bytes
Desc: not available
URL: <http://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20171212/02bc2fed/attachment-0003.gz>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20171212/02bc2fed/attachment-0005.html>


More information about the discuss mailing list