[ovs-discuss] ovn-controller consuming lots of CPU
Kevin Lin
kevin at kelda.io
Tue Dec 12 21:26:33 UTC 2017
Hi again,
We’re trying to scale up our OVN deployment and we’re seeing some worrying log messages.
The topology is 32 containers connected to another 32 containers on 10 different ports. This is running on 17 machines (one machine runs ovn-northd and ovsdb-server, the other 16 run ovn-controller, ovs-vswitchd, and ovsdb-server). We’re using an address set for the source group, but not the destination group. We’re also creating a different ACL for each port. So the ACLs look like:
One address set for { container1, container2, … container32 }
addressSet -> container1 on port 80
addressSet -> container1 on port 81
…
addressSet -> container1 on port 90
addressSet -> container2 on port 80
…
addressSet -> container32 on port 90
The ovn-controller log:
2017-12-12T20:14:49Z|11878|timeval|WARN|Unreasonably long 1843ms poll interval (1840ms user, 0ms system)
2017-12-12T20:14:49Z|11879|timeval|WARN|disk: 0 reads, 16 writes
2017-12-12T20:14:49Z|11880|timeval|WARN|context switches: 0 voluntary, 21 involuntary
2017-12-12T20:14:49Z|11881|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (172.31.11.193:48460<->172.31.2.181:6640) at lib/stream-fd.c:157 (36% CPU usage)
2017-12-12T20:14:49Z|11882|poll_loop|DBG|wakeup due to [POLLIN] on fd 12 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (36% CPU usage)
2017-12-12T20:14:49Z|11883|jsonrpc|DBG|tcp:172.31.2.181:6640: received reply, result=[], id="echo"
2017-12-12T20:14:49Z|11884|netlink_socket|DBG|nl_sock_transact_multiple__ (Success): nl(len:36, type=38(family-defined), flags=9[REQUEST][ECHO], seq=b11, pid=2268452876
2017-12-12T20:14:49Z|11885|netlink_socket|DBG|nl_sock_recv__ (Success): nl(len:136, type=36(family-defined), flags=0, seq=b11, pid=2268452876
2017-12-12T20:14:49Z|11886|vconn|DBG|unix:/var/run/openvswitch/br-int.mgmt: received: OFPT_ECHO_REQUEST (OF1.3) (xid=0x0): 0 bytes of payload
2017-12-12T20:14:49Z|11887|vconn|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): OFPT_ECHO_REPLY (OF1.3) (xid=0x0): 0 bytes of payload
2017-12-12T20:14:51Z|11888|timeval|WARN|Unreasonably long 1851ms poll interval (1844ms user, 8ms system)
2017-12-12T20:14:51Z|11889|timeval|WARN|context switches: 0 voluntary, 11 involuntary
2017-12-12T20:14:52Z|11890|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (172.31.11.193:48460<->172.31.2.181:6640) at lib/stream-fd.c:157 (73% CPU usage)
2017-12-12T20:14:52Z|11891|jsonrpc|DBG|tcp:172.31.2.181:6640: received request, method="echo", params=[], id="echo"
2017-12-12T20:14:52Z|11892|jsonrpc|DBG|tcp:172.31.2.181:6640: send reply, result=[], id="echo"
2017-12-12T20:14:52Z|11893|netlink_socket|DBG|nl_sock_transact_multiple__ (Success): nl(len:36, type=38(family-defined), flags=9[REQUEST][ECHO], seq=b12, pid=2268452876
2017-12-12T20:14:52Z|11894|netlink_socket|DBG|nl_sock_recv__ (Success): nl(len:136, type=36(family-defined), flags=0, seq=b12, pid=2268452876
2017-12-12T20:14:52Z|11895|netdev_linux|DBG|Dropped 18 log messages in last 56 seconds (most recently, 3 seconds ago) due to excessive rate
2017-12-12T20:14:52Z|11896|netdev_linux|DBG|unknown qdisc "mq"
2017-12-12T20:14:54Z|11897|hmap|DBG|Dropped 15511 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2017-12-12T20:14:54Z|11898|hmap|DBG|ovn/lib/expr.c:2644: 6 nodes in bucket (128 nodes, 64 buckets)
2017-12-12T20:14:54Z|11899|timeval|WARN|Unreasonably long 1831ms poll interval (1828ms user, 4ms system)
2017-12-12T20:14:54Z|11900|timeval|WARN|context switches: 0 voluntary, 12 involuntary
The log messages show up continuously. The logs appear even when the network isn’t being used.
I poked around with Ethan Jackson and he noted that the hmap counters seem unusually high:
root at ip-172-31-11-193:/# ovs-appctl -t ovn-controller coverage/show
Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=d6ee5804:
hmap_pathological 2323.6/sec 2662.467/sec 2514.0069/sec total: 9407536
hmap_expand 3596.8/sec 4121.283/sec 3890.8833/sec total: 14604479
txn_unchanged 0.8/sec 0.917/sec 0.8658/sec total: 5659
txn_incomplete 0.0/sec 0.000/sec 0.0008/sec total: 33
txn_success 0.0/sec 0.000/sec 0.0006/sec total: 24
poll_create_node 2.4/sec 2.750/sec 2.5986/sec total: 18218
poll_zero_timeout 0.0/sec 0.000/sec 0.0100/sec total: 71
rconn_queued 0.0/sec 0.050/sec 0.0531/sec total: 252570
rconn_sent 0.0/sec 0.050/sec 0.0531/sec total: 252570
seq_change 1.2/sec 1.383/sec 1.2992/sec total: 8500
pstream_open 0.0/sec 0.000/sec 0.0000/sec total: 1
stream_open 0.0/sec 0.000/sec 0.0000/sec total: 6
unixctl_received 0.0/sec 0.000/sec 0.0019/sec total: 7
unixctl_replied 0.0/sec 0.000/sec 0.0019/sec total: 7
util_xalloc 2731550.2/sec 3129900.483/sec 569276.5414/sec total: 11057381035
vconn_open 0.0/sec 0.000/sec 0.0000/sec total: 4
vconn_received 0.0/sec 0.050/sec 0.0444/sec total: 201
vconn_sent 0.0/sec 0.000/sec 0.0144/sec total: 253535
netdev_get_ifindex 0.4/sec 0.467/sec 0.4328/sec total: 2822
netlink_received 0.4/sec 0.467/sec 0.4328/sec total: 2822
netlink_sent 0.4/sec 0.467/sec 0.4328/sec total: 2822
cmap_expand 0.0/sec 0.000/sec 0.0000/sec total: 2
47 events never hit
I’ve also attached the output of ovs-bugtool run from the machine running ovn-northd, and one of the machines running ovn-controller and ovs-vswitchd.
Thanks,
—Kevin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20171212/02bc2fed/attachment-0003.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ovn-controller-and-ovs-vswitchd-machine.tar.gz
Type: application/x-gzip
Size: 3561962 bytes
Desc: not available
URL: <http://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20171212/02bc2fed/attachment-0002.gz>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20171212/02bc2fed/attachment-0004.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ovn-northd-machine.tar.gz
Type: application/x-gzip
Size: 34828 bytes
Desc: not available
URL: <http://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20171212/02bc2fed/attachment-0003.gz>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20171212/02bc2fed/attachment-0005.html>
More information about the discuss
mailing list