[ovs-discuss] ovs-vswitchd 100% CPU in OVN scale test

Han Zhou zhouhan at gmail.com
Wed Feb 7 20:47:31 UTC 2018


When doing scale testing for OVN (using
https://github.com/openvswitch/ovn-scale-test), we had some interesting
findings, and need some help here.

We ran the test "create and bind lports" against branch 2.9 and branch 2.6,
and we found that 2.6 was must faster. With some analysis, we found out the
reason is not because of OVN gets slower in 2.9, but because the bottleneck
of this test in branch 2.9 is ovs-vswitchd.

The testing was run in an environment with 20 farm nodes, each has 50
sandbox HVs (I will just mention them as HVs in short). Before the test,
there are already 9500 lports bound in 950 HVs on 19 farm nodes. The test
run against the last farm node to bind the lport on the 50 HVs there. The
steps in the test scenario are:

1. Create 5 new LSs in NB (so that the LSs will not be shared with any of
HVs on other farm nodes)
2. create 100 lports in NB on a LS
3. bind these lports on HVs, 2 for each HV. They are bound sequentially on
each HV, and for each HV the 2 ports are bound using one command together:
ovs-vsctl add-port <port 1> -- set Interface external-ids:...  -- add-port
<port 2> -- set Interface external-ids:... (the script didn't set type to
internal, but I hope it is not an issue for this test).
4. wait the port stated changed to up in NB for all the 100 lports (with a
single ovn-nbctl command)

These steps are repeated for 5 times, one for each LS. So in the end we got
500 more lports created and bound (the total scale is then 1k HVs and 10k
lports).

When running with 2.6, the ovn-controllers were taking most of the CPU
time. However, with 2.9, the CPU of ovn-controllers spikes but there is
always ovs-vswitchd on the top with 100% CPU. It means the ovs-vswitchd is
the bottleneck in this testing. There is only one ovs-vswitchd with 100% at
the same time and different ovs-vswitchd will spike one after another,
since the ports are bound sequentially on each HV. From the rally log, each
2 ports binding takes around 4 - 5 seconds. This is just the ovs-vsctl
command execution time. The 100% CPU of ovs-vswitchd explains the slowness.

So, based on this result, we can not using the total time to evaluate the
efficiency of OVN, instead we can evaluate by CPU cost of ovn-controller
processes. In fact, 2.9 ovn-controller costs around 70% less CPU than 2.6,
which I think is due to some optimization we made earlier. (With my
work-in-progress patch it saves much more, and I will post later as RFC).

However, I cannot explain why ovs-vswitchd is getting slower than 2.6 when
doing port-binding. We need expert suggestions here, for what could be the
possible reason of this slowness. We can do more testing with different
versions between 2.6 and 2.9 to find out related change, but with some
pointers it might save some effort. Below are some logs of ovs-vswitchd
when port binding is happening:

======
2018-02-07T00:12:54.558Z|01767|bridge|INFO|bridge br-int: added interface
lport_bc65cd_QFOU3v on port 1028
2018-02-07T00:12:55.629Z|01768|timeval|WARN|Unreasonably long 1112ms poll
interval (1016ms user, 4ms system)
2018-02-07T00:12:55.629Z|01769|timeval|WARN|faults: 336 minor, 0 major
2018-02-07T00:12:55.629Z|01770|timeval|WARN|context switches: 0 voluntary,
13 involuntary
2018-02-07T00:12:55.629Z|01771|coverage|INFO|Event coverage, avg rate over
last: 5 seconds, last minute, last hour,  hash=b256889c:
2018-02-07T00:12:55.629Z|01772|coverage|INFO|bridge_reconfigure
 0.0/sec     0.000/sec        0.0056/sec   total:
29
2018-02-07T00:12:55.629Z|01773|coverage|INFO|ofproto_flush
 0.0/sec     0.000/sec        0.0000/sec   total:
1
2018-02-07T00:12:55.629Z|01774|coverage|INFO|ofproto_packet_out
 0.0/sec     0.000/sec        0.0111/sec   total:
90
2018-02-07T00:12:55.629Z|01775|coverage|INFO|ofproto_recv_openflow
 0.2/sec     0.033/sec        0.4858/sec   total:
6673
2018-02-07T00:12:55.629Z|01776|coverage|INFO|ofproto_update_port
 0.0/sec     0.000/sec        5.5883/sec   total:
28258
2018-02-07T00:12:55.629Z|01777|coverage|INFO|rev_reconfigure
 0.0/sec     0.000/sec        0.0056/sec   total:
32
2018-02-07T00:12:55.629Z|01778|coverage|INFO|rev_port_toggled
 0.0/sec     0.000/sec        0.0011/sec   total:
6
2018-02-07T00:12:55.629Z|01779|coverage|INFO|rev_flow_table
 0.0/sec     0.000/sec        0.0078/sec   total:
124
2018-02-07T00:12:55.629Z|01780|coverage|INFO|rev_mac_learning
 0.0/sec     0.000/sec        0.0036/sec   total:
13
2018-02-07T00:12:55.629Z|01781|coverage|INFO|xlate_actions
 0.0/sec     0.000/sec        0.0111/sec   total:
90
2018-02-07T00:12:55.629Z|01782|coverage|INFO|ccmap_expand
 0.0/sec     0.000/sec        0.0050/sec   total:
36
2018-02-07T00:12:55.629Z|01783|coverage|INFO|ccmap_shrink
 0.0/sec     0.000/sec        0.0000/sec   total:
49
2018-02-07T00:12:55.629Z|01784|coverage|INFO|cmap_expand
 0.0/sec     0.000/sec        0.0217/sec   total:
257
2018-02-07T00:12:55.629Z|01785|coverage|INFO|cmap_shrink
 0.0/sec     0.000/sec        0.0000/sec   total:
141
2018-02-07T00:12:55.629Z|01786|coverage|INFO|dpif_port_add
 0.0/sec     0.000/sec        0.0022/sec   total:
1020
2018-02-07T00:12:55.629Z|01787|coverage|INFO|dpif_port_del
 0.0/sec     0.000/sec        0.0000/sec   total:
40
2018-02-07T00:12:55.629Z|01788|coverage|INFO|dpif_flow_flush
 0.0/sec     0.000/sec        0.0000/sec   total:
2
2018-02-07T00:12:55.629Z|01789|coverage|INFO|dpif_flow_get
 0.0/sec     0.000/sec        0.0000/sec   total:
20
2018-02-07T00:12:55.629Z|01790|coverage|INFO|dpif_flow_put
 0.0/sec     0.000/sec        0.0000/sec   total:
21
2018-02-07T00:12:55.629Z|01791|coverage|INFO|dpif_flow_del
 0.0/sec     0.000/sec        0.0000/sec   total:
20
2018-02-07T00:12:55.629Z|01792|coverage|INFO|dpif_execute
 0.0/sec     0.000/sec        0.0100/sec   total:
71
2018-02-07T00:12:55.629Z|01793|coverage|INFO|flow_extract
 0.0/sec     0.000/sec        0.0000/sec   total:
4
2018-02-07T00:12:55.629Z|01794|coverage|INFO|miniflow_malloc
 0.0/sec     0.000/sec        0.4361/sec   total:
6435
2018-02-07T00:12:55.629Z|01795|coverage|INFO|hindex_pathological
 0.0/sec     0.000/sec        0.0000/sec   total:
17
2018-02-07T00:12:55.629Z|01796|coverage|INFO|hindex_expand
 0.0/sec     0.000/sec        0.0000/sec   total:
9
2018-02-07T00:12:55.629Z|01797|coverage|INFO|hmap_pathological
 0.0/sec     0.000/sec        0.8131/sec   total:
13752
2018-02-07T00:12:55.630Z|01798|coverage|INFO|hmap_expand
21.0/sec    18.333/sec    11253.0167/sec   total:
55905271
2018-02-07T00:12:55.630Z|01799|coverage|INFO|hmap_shrink
 0.0/sec     0.000/sec        0.0022/sec   total:
1029
2018-02-07T00:12:55.630Z|01800|coverage|INFO|mac_learning_learned
 0.0/sec     0.000/sec        0.0022/sec   total:
18
2018-02-07T00:12:55.630Z|01801|coverage|INFO|mac_learning_expired
 0.0/sec     0.000/sec        0.0017/sec   total:
6
2018-02-07T00:12:55.630Z|01802|coverage|INFO|netdev_sent
 0.0/sec     0.000/sec        0.0400/sec   total:
350
2018-02-07T00:12:55.630Z|01803|coverage|INFO|netdev_get_stats
203.4/sec   203.400/sec      198.4978/sec   total:
798522
2018-02-07T00:12:55.630Z|01804|coverage|INFO|txn_unchanged
 0.2/sec     0.200/sec        0.1911/sec   total:
776
2018-02-07T00:12:55.630Z|01805|coverage|INFO|txn_incomplete
 0.0/sec     0.000/sec        0.0167/sec   total:
106
2018-02-07T00:12:55.630Z|01806|coverage|INFO|txn_success
 0.0/sec     0.000/sec        0.0111/sec   total:
48
2018-02-07T00:12:55.630Z|01807|coverage|INFO|poll_create_node
162.6/sec   157.267/sec      156.3664/sec   total:
636465
2018-02-07T00:12:55.630Z|01808|coverage|INFO|poll_zero_timeout
 0.2/sec     0.250/sec        0.4564/sec   total:
1936
2018-02-07T00:12:55.630Z|01809|coverage|INFO|rconn_queued
 0.2/sec     0.033/sec        0.0453/sec   total:
1214
2018-02-07T00:12:55.630Z|01810|coverage|INFO|rconn_sent
 0.2/sec     0.033/sec        0.0453/sec   total:
1214
2018-02-07T00:12:55.630Z|01811|coverage|INFO|seq_change
1098.8/sec  1099.167/sec     1097.4869/sec   total:
4468823
2018-02-07T00:12:55.630Z|01812|coverage|INFO|pstream_open
 0.0/sec     0.000/sec        0.0000/sec   total:
5
2018-02-07T00:12:55.630Z|01813|coverage|INFO|stream_open
 0.0/sec     0.000/sec        0.0000/sec   total:
1
2018-02-07T00:12:55.630Z|01814|coverage|INFO|util_xalloc
7223.8/sec  6527.317/sec    51569.9900/sec   total:
251999059
2018-02-07T00:12:55.630Z|01815|coverage|INFO|vconn_received
 0.2/sec     0.033/sec        0.4858/sec   total:
6675
2018-02-07T00:12:55.630Z|01816|coverage|INFO|vconn_sent
 0.2/sec     0.033/sec        0.0453/sec   total:
1216
2018-02-07T00:12:55.630Z|01817|coverage|INFO|netlink_received
 0.0/sec     0.000/sec        0.0000/sec   total:
3
2018-02-07T00:12:55.630Z|01818|coverage|INFO|netlink_sent
 0.0/sec     0.000/sec        0.0000/sec   total:
1
2018-02-07T00:12:55.630Z|01819|coverage|INFO|57 events never hit
2018-02-07T00:12:56.761Z|01820|timeval|WARN|Unreasonably long 1112ms poll
interval (948ms user, 4ms system)
2018-02-07T00:12:56.761Z|01821|timeval|WARN|faults: 7 minor, 0 major
2018-02-07T00:12:56.761Z|01822|timeval|WARN|context switches: 2 voluntary,
17 involuntary
2018-02-07T00:12:56.761Z|01823|coverage|INFO|Event coverage, avg rate over
last: 5 seconds, last minute, last hour,  hash=fcc57021:
2018-02-07T00:12:56.761Z|01824|coverage|INFO|bridge_reconfigure
 0.0/sec     0.000/sec        0.0056/sec   total:
30
2018-02-07T00:12:56.761Z|01825|coverage|INFO|ofproto_flush
 0.0/sec     0.000/sec        0.0000/sec   total:
1
2018-02-07T00:12:56.761Z|01826|coverage|INFO|ofproto_packet_out
 0.0/sec     0.000/sec        0.0111/sec   total:
90
2018-02-07T00:12:56.761Z|01827|coverage|INFO|ofproto_recv_openflow
 0.2/sec     0.033/sec        0.4858/sec   total:
6683
2018-02-07T00:12:56.761Z|01828|coverage|INFO|ofproto_update_port
 0.0/sec     0.000/sec        5.5883/sec   total:
29268
2018-02-07T00:12:56.761Z|01829|coverage|INFO|rev_reconfigure
 0.0/sec     0.000/sec        0.0056/sec   total:
33
2018-02-07T00:12:56.761Z|01830|coverage|INFO|rev_port_toggled
 0.0/sec     0.000/sec        0.0011/sec   total:
6
2018-02-07T00:12:56.761Z|01831|coverage|INFO|rev_flow_table
 0.0/sec     0.000/sec        0.0078/sec   total:
124
2018-02-07T00:12:56.761Z|01832|coverage|INFO|rev_mac_learning
 0.0/sec     0.000/sec        0.0036/sec   total:
13
2018-02-07T00:12:56.761Z|01833|coverage|INFO|xlate_actions
 0.0/sec     0.000/sec        0.0111/sec   total:
90
2018-02-07T00:12:56.761Z|01834|coverage|INFO|ccmap_expand
 0.0/sec     0.000/sec        0.0050/sec   total:
36
2018-02-07T00:12:56.761Z|01835|coverage|INFO|ccmap_shrink
 0.0/sec     0.000/sec        0.0000/sec   total:
49
2018-02-07T00:12:56.761Z|01836|coverage|INFO|cmap_expand
 0.0/sec     0.000/sec        0.0217/sec   total:
257
2018-02-07T00:12:56.761Z|01837|coverage|INFO|cmap_shrink
 0.0/sec     0.000/sec        0.0000/sec   total:
141
2018-02-07T00:12:56.761Z|01838|coverage|INFO|dpif_port_add
 0.0/sec     0.000/sec        0.0022/sec   total:
1020
2018-02-07T00:12:56.761Z|01839|coverage|INFO|dpif_port_del
 0.0/sec     0.000/sec        0.0000/sec   total:
40
2018-02-07T00:12:56.761Z|01840|coverage|INFO|dpif_flow_flush
 0.0/sec     0.000/sec        0.0000/sec   total:
2
2018-02-07T00:12:56.761Z|01841|coverage|INFO|dpif_flow_get
 0.0/sec     0.000/sec        0.0000/sec   total:
20
2018-02-07T00:12:56.761Z|01842|coverage|INFO|dpif_flow_put
 0.0/sec     0.000/sec        0.0000/sec   total:
21
2018-02-07T00:12:56.761Z|01843|coverage|INFO|dpif_flow_del
 0.0/sec     0.000/sec        0.0000/sec   total:
20
2018-02-07T00:12:56.761Z|01844|coverage|INFO|dpif_execute
 0.0/sec     0.000/sec        0.0100/sec   total:
71
2018-02-07T00:12:56.761Z|01845|coverage|INFO|flow_extract
 0.0/sec     0.000/sec        0.0000/sec   total:
4
2018-02-07T00:12:56.761Z|01846|coverage|INFO|miniflow_malloc
 0.0/sec     0.000/sec        0.4361/sec   total:
6440
2018-02-07T00:12:56.761Z|01847|coverage|INFO|hindex_pathological
 0.0/sec     0.000/sec        0.0000/sec   total:
17
2018-02-07T00:12:56.761Z|01848|coverage|INFO|hindex_expand
 0.0/sec     0.000/sec        0.0000/sec   total:
9
2018-02-07T00:12:56.761Z|01849|coverage|INFO|hmap_pathological
 0.0/sec     0.000/sec        0.8131/sec   total:
13825
2018-02-07T00:12:56.761Z|01850|coverage|INFO|hmap_expand
21.0/sec    18.333/sec    11253.0167/sec   total:
57949607
2018-02-07T00:12:56.761Z|01851|coverage|INFO|hmap_shrink
 0.0/sec     0.000/sec        0.0022/sec   total:
1029
2018-02-07T00:12:56.761Z|01852|coverage|INFO|mac_learning_learned
 0.0/sec     0.000/sec        0.0022/sec   total:
18
2018-02-07T00:12:56.761Z|01853|coverage|INFO|mac_learning_expired
 0.0/sec     0.000/sec        0.0017/sec   total:
6
2018-02-07T00:12:56.761Z|01854|coverage|INFO|netdev_sent
 0.0/sec     0.000/sec        0.0400/sec   total:
350
2018-02-07T00:12:56.761Z|01855|coverage|INFO|netdev_get_stats
203.4/sec   203.400/sec      198.4978/sec   total:
799540
2018-02-07T00:12:56.761Z|01856|coverage|INFO|txn_unchanged
 0.2/sec     0.200/sec        0.1911/sec   total:
777
2018-02-07T00:12:56.761Z|01857|coverage|INFO|txn_incomplete
 0.0/sec     0.000/sec        0.0167/sec   total:
108
2018-02-07T00:12:56.761Z|01858|coverage|INFO|txn_success
 0.0/sec     0.000/sec        0.0111/sec   total:
49
2018-02-07T00:12:56.761Z|01859|coverage|INFO|poll_create_node
162.6/sec   157.267/sec      156.3664/sec   total:
636681
2018-02-07T00:12:56.761Z|01860|coverage|INFO|poll_zero_timeout
 0.2/sec     0.250/sec        0.4564/sec   total:
1937
2018-02-07T00:12:56.761Z|01861|coverage|INFO|rconn_queued
 0.2/sec     0.033/sec        0.0453/sec   total:
1216
2018-02-07T00:12:56.761Z|01862|coverage|INFO|rconn_sent
 0.2/sec     0.033/sec        0.0453/sec   total:
1216
2018-02-07T00:12:56.761Z|01863|coverage|INFO|seq_change
1098.8/sec  1099.167/sec     1097.4869/sec   total:
4471434
2018-02-07T00:12:56.761Z|01864|coverage|INFO|pstream_open
 0.0/sec     0.000/sec        0.0000/sec   total:
5
2018-02-07T00:12:56.761Z|01865|coverage|INFO|stream_open
 0.0/sec     0.000/sec        0.0000/sec   total:
1
2018-02-07T00:12:56.761Z|01866|coverage|INFO|util_xalloc
7223.8/sec  6527.317/sec    51569.9900/sec   total:
260206006
2018-02-07T00:12:56.761Z|01867|coverage|INFO|vconn_received
 0.2/sec     0.033/sec        0.4858/sec   total:
6685
2018-02-07T00:12:56.761Z|01868|coverage|INFO|vconn_sent
 0.2/sec     0.033/sec        0.0453/sec   total:
1218
2018-02-07T00:12:56.761Z|01869|coverage|INFO|netlink_received
 0.0/sec     0.000/sec        0.0000/sec   total:
3
2018-02-07T00:12:56.761Z|01870|coverage|INFO|netlink_sent
 0.0/sec     0.000/sec        0.0000/sec   total:
1
2018-02-07T00:12:56.761Z|01871|coverage|INFO|57 events never hit
2018-02-07T00:12:57.054Z|01872|bridge|INFO|bridge br-int: added interface
lport_bc65cd_lykjnm on port 1029
2018-02-07T00:12:58.202Z|01873|timeval|WARN|Unreasonably long 1179ms poll
interval (1032ms user, 0ms system)
2018-02-07T00:12:58.202Z|01874|timeval|WARN|faults: 41 minor, 0 major
2018-02-07T00:12:58.202Z|01875|timeval|WARN|context switches: 0 voluntary,
19 involuntary
2018-02-07T00:12:58.203Z|01876|coverage|INFO|Event coverage, avg rate over
last: 5 seconds, last minute, last hour,  hash=668121b0:
2018-02-07T00:12:58.203Z|01877|coverage|INFO|bridge_reconfigure
 0.0/sec     0.000/sec        0.0056/sec   total:
31
2018-02-07T00:12:58.203Z|01878|coverage|INFO|ofproto_flush
 0.0/sec     0.000/sec        0.0000/sec   total:
1
2018-02-07T00:12:58.203Z|01879|coverage|INFO|ofproto_packet_out
 0.0/sec     0.000/sec        0.0111/sec   total:
90
2018-02-07T00:12:58.203Z|01880|coverage|INFO|ofproto_recv_openflow
 0.2/sec     0.033/sec        0.4858/sec   total:
6733
2018-02-07T00:12:58.203Z|01881|coverage|INFO|ofproto_update_port
 0.0/sec     0.000/sec        5.5883/sec   total:
30279
2018-02-07T00:12:58.203Z|01882|coverage|INFO|rev_reconfigure
 0.0/sec     0.000/sec        0.0056/sec   total:
34
2018-02-07T00:12:58.203Z|01883|coverage|INFO|rev_port_toggled
 0.0/sec     0.000/sec        0.0011/sec   total:
6
2018-02-07T00:12:58.203Z|01884|coverage|INFO|rev_flow_table
 0.0/sec     0.000/sec        0.0078/sec   total:
125
2018-02-07T00:12:58.203Z|01885|coverage|INFO|rev_mac_learning
 0.0/sec     0.000/sec        0.0036/sec   total:
13
2018-02-07T00:12:58.203Z|01886|coverage|INFO|xlate_actions
 0.0/sec     0.000/sec        0.0111/sec   total:
90
2018-02-07T00:12:58.203Z|01887|coverage|INFO|ccmap_expand
 0.0/sec     0.000/sec        0.0050/sec   total:
36
2018-02-07T00:12:58.203Z|01888|coverage|INFO|ccmap_shrink
 0.0/sec     0.000/sec        0.0000/sec   total:
49
2018-02-07T00:12:58.203Z|01889|coverage|INFO|cmap_expand
 0.0/sec     0.000/sec        0.0217/sec   total:
260
2018-02-07T00:12:58.203Z|01890|coverage|INFO|cmap_shrink
 0.0/sec     0.000/sec        0.0000/sec   total:
141
2018-02-07T00:12:58.203Z|01891|coverage|INFO|dpif_port_add
 0.0/sec     0.000/sec        0.0022/sec   total:
1021
2018-02-07T00:12:58.203Z|01892|coverage|INFO|dpif_port_del
 0.0/sec     0.000/sec        0.0000/sec   total:
40
2018-02-07T00:12:58.203Z|01893|coverage|INFO|dpif_flow_flush
 0.0/sec     0.000/sec        0.0000/sec   total:
2
2018-02-07T00:12:58.203Z|01894|coverage|INFO|dpif_flow_get
 0.0/sec     0.000/sec        0.0000/sec   total:
20
2018-02-07T00:12:58.203Z|01895|coverage|INFO|dpif_flow_put
 0.0/sec     0.000/sec        0.0000/sec   total:
21
2018-02-07T00:12:58.203Z|01896|coverage|INFO|dpif_flow_del
 0.0/sec     0.000/sec        0.0000/sec   total:
20
2018-02-07T00:12:58.203Z|01897|coverage|INFO|dpif_execute
 0.0/sec     0.000/sec        0.0100/sec   total:
71
2018-02-07T00:12:58.203Z|01898|coverage|INFO|flow_extract
 0.0/sec     0.000/sec        0.0000/sec   total:
4
2018-02-07T00:12:58.203Z|01899|coverage|INFO|miniflow_malloc
 0.0/sec     0.000/sec        0.4361/sec   total:
6489
2018-02-07T00:12:58.203Z|01900|coverage|INFO|hindex_pathological
 0.0/sec     0.000/sec        0.0000/sec   total:
17
2018-02-07T00:12:58.203Z|01901|coverage|INFO|hindex_expand
 0.0/sec     0.000/sec        0.0000/sec   total:
9
2018-02-07T00:12:58.203Z|01902|coverage|INFO|hmap_pathological
 0.0/sec     0.000/sec        0.8131/sec   total:
13945
2018-02-07T00:12:58.203Z|01903|coverage|INFO|hmap_expand
21.0/sec    18.333/sec    11253.0167/sec   total:
59998044
2018-02-07T00:12:58.203Z|01904|coverage|INFO|hmap_shrink
 0.0/sec     0.000/sec        0.0022/sec   total:
1030
2018-02-07T00:12:58.203Z|01905|coverage|INFO|mac_learning_learned
 0.0/sec     0.000/sec        0.0022/sec   total:
18
2018-02-07T00:12:58.203Z|01906|coverage|INFO|mac_learning_expired
 0.0/sec     0.000/sec        0.0017/sec   total:
6
2018-02-07T00:12:58.203Z|01907|coverage|INFO|netdev_sent
 0.0/sec     0.000/sec        0.0400/sec   total:
350
2018-02-07T00:12:58.203Z|01908|coverage|INFO|netdev_get_stats
203.4/sec   203.400/sec      198.4978/sec   total:
799541
2018-02-07T00:12:58.203Z|01909|coverage|INFO|txn_unchanged
 0.2/sec     0.200/sec        0.1911/sec   total:
777
2018-02-07T00:12:58.203Z|01910|coverage|INFO|txn_incomplete
 0.0/sec     0.000/sec        0.0167/sec   total:
110
2018-02-07T00:12:58.203Z|01911|coverage|INFO|txn_success
 0.0/sec     0.000/sec        0.0111/sec   total:
50
2018-02-07T00:12:58.203Z|01912|coverage|INFO|poll_create_node
162.6/sec   157.267/sec      156.3664/sec   total:
636974
2018-02-07T00:12:58.203Z|01913|coverage|INFO|poll_zero_timeout
 0.2/sec     0.250/sec        0.4564/sec   total:
1939
2018-02-07T00:12:58.203Z|01914|coverage|INFO|rconn_queued
 0.2/sec     0.033/sec        0.0453/sec   total:
1217
2018-02-07T00:12:58.203Z|01915|coverage|INFO|rconn_sent
 0.2/sec     0.033/sec        0.0453/sec   total:
1217
2018-02-07T00:12:58.203Z|01916|coverage|INFO|seq_change
1098.8/sec  1099.167/sec     1097.4869/sec   total:
4474605
2018-02-07T00:12:58.203Z|01917|coverage|INFO|pstream_open
 0.0/sec     0.000/sec        0.0000/sec   total:
5
2018-02-07T00:12:58.203Z|01918|coverage|INFO|stream_open
 0.0/sec     0.000/sec        0.0000/sec   total:
1
2018-02-07T00:12:58.203Z|01919|coverage|INFO|util_xalloc
7223.8/sec  6527.317/sec    51569.9900/sec   total:
268435993
2018-02-07T00:12:58.203Z|01920|coverage|INFO|vconn_received
 0.2/sec     0.033/sec        0.4858/sec   total:
6735
2018-02-07T00:12:58.203Z|01921|coverage|INFO|vconn_sent
 0.2/sec     0.033/sec        0.0453/sec   total:
1219
2018-02-07T00:12:58.203Z|01922|coverage|INFO|netlink_received
 0.0/sec     0.000/sec        0.0000/sec   total:
3
2018-02-07T00:12:58.203Z|01923|coverage|INFO|netlink_sent
 0.0/sec     0.000/sec        0.0000/sec   total:
1
2018-02-07T00:12:58.203Z|01924|coverage|INFO|57 events never hit
2018-02-07T00:12:58.221Z|01925|bridge|INFO|bridge br-int: added interface
patch-br-int-to-provnet-lswitch_bc65cd_Y7sqEw on port 1030
2018-02-07T00:12:58.221Z|01926|bridge|INFO|bridge br0: added interface
patch-provnet-lswitch_bc65cd_Y7sqEw-to-br-int on port 15
2018-02-07T00:12:59.313Z|01927|timeval|WARN|Unreasonably long 1111ms poll
interval (1092ms user, 0ms system)
2018-02-07T00:12:59.313Z|01928|timeval|WARN|faults: 45 minor, 0 major
2018-02-07T00:12:59.313Z|01929|timeval|WARN|context switches: 6 voluntary,
59 involuntary
2018-02-07T00:12:59.313Z|01930|coverage|INFO|Skipping details of duplicate
event coverage for hash=668121b0
2018-02-07T00:13:00.430Z|01931|timeval|WARN|Unreasonably long 1117ms poll
interval (1112ms user, 0ms system)
2018-02-07T00:13:00.430Z|01932|timeval|WARN|faults: 34 minor, 0 major
2018-02-07T00:13:00.430Z|01933|timeval|WARN|context switches: 2 voluntary,
11 involuntary
2018-02-07T00:13:00.431Z|01934|coverage|INFO|Event coverage, avg rate over
last: 5 seconds, last minute, last hour,  hash=a612a6e0:
2018-02-07T00:13:00.431Z|01935|coverage|INFO|bridge_reconfigure
 0.8/sec     0.067/sec        0.0067/sec   total:
33
2018-02-07T00:13:00.431Z|01936|coverage|INFO|ofproto_flush
 0.0/sec     0.000/sec        0.0000/sec   total:
1
2018-02-07T00:13:00.431Z|01937|coverage|INFO|ofproto_packet_out
 0.2/sec     0.017/sec        0.0114/sec   total:
92
2018-02-07T00:13:00.431Z|01938|coverage|INFO|ofproto_recv_openflow
32.2/sec     2.717/sec        0.5306/sec   total:
6935
2018-02-07T00:13:00.431Z|01939|coverage|INFO|ofproto_update_port
808.8/sec    67.400/sec        6.7117/sec   total:
32303
2018-02-07T00:13:00.431Z|01940|coverage|INFO|rev_reconfigure
 0.8/sec     0.067/sec        0.0067/sec   total:
36
2018-02-07T00:13:00.431Z|01941|coverage|INFO|rev_port_toggled
 0.0/sec     0.000/sec        0.0011/sec   total:
7
2018-02-07T00:13:00.431Z|01942|coverage|INFO|rev_flow_table
 0.4/sec     0.033/sec        0.0083/sec   total:
126
2018-02-07T00:13:00.431Z|01943|coverage|INFO|rev_mac_learning
 0.0/sec     0.000/sec        0.0036/sec   total:
13
2018-02-07T00:13:00.431Z|01944|coverage|INFO|xlate_actions
 0.2/sec     0.017/sec        0.0114/sec   total:
92
2018-02-07T00:13:00.431Z|01945|coverage|INFO|ccmap_expand
 0.0/sec     0.000/sec        0.0050/sec   total:
36
2018-02-07T00:13:00.431Z|01946|coverage|INFO|ccmap_shrink
 0.0/sec     0.000/sec        0.0000/sec   total:
49
2018-02-07T00:13:00.431Z|01947|coverage|INFO|cmap_expand
 2.0/sec     0.167/sec        0.0244/sec   total:
271
2018-02-07T00:13:00.431Z|01948|coverage|INFO|cmap_shrink
 0.0/sec     0.000/sec        0.0000/sec   total:
141
2018-02-07T00:13:00.431Z|01949|coverage|INFO|dpif_port_add
 0.4/sec     0.033/sec        0.0028/sec   total:
1021
2018-02-07T00:13:00.431Z|01950|coverage|INFO|dpif_port_del
 0.0/sec     0.000/sec        0.0000/sec   total:
40
2018-02-07T00:13:00.431Z|01951|coverage|INFO|dpif_flow_flush
 0.0/sec     0.000/sec        0.0000/sec   total:
2
2018-02-07T00:13:00.431Z|01952|coverage|INFO|dpif_flow_get
 0.0/sec     0.000/sec        0.0000/sec   total:
20
2018-02-07T00:13:00.431Z|01953|coverage|INFO|dpif_flow_put
 0.0/sec     0.000/sec        0.0000/sec   total:
21
2018-02-07T00:13:00.431Z|01954|coverage|INFO|dpif_flow_del
 0.0/sec     0.000/sec        0.0000/sec   total:
20
2018-02-07T00:13:00.431Z|01955|coverage|INFO|dpif_execute
 0.0/sec     0.000/sec        0.0100/sec   total:
72
2018-02-07T00:13:00.431Z|01956|coverage|INFO|flow_extract
 0.0/sec     0.000/sec        0.0000/sec   total:
4
2018-02-07T00:13:00.431Z|01957|coverage|INFO|miniflow_malloc
30.8/sec     2.567/sec        0.4789/sec   total:
6689
2018-02-07T00:13:00.431Z|01958|coverage|INFO|hindex_pathological
 0.0/sec     0.000/sec        0.0000/sec   total:
17
2018-02-07T00:13:00.431Z|01959|coverage|INFO|hindex_expand
 0.0/sec     0.000/sec        0.0000/sec   total:
9
2018-02-07T00:13:00.431Z|01960|coverage|INFO|hmap_pathological
76.8/sec     6.400/sec        0.9197/sec   total:
14149
2018-02-07T00:13:00.431Z|01961|coverage|INFO|hmap_expand
1637509.8/sec
136476.267/sec    13527.3358/sec   total: 64098893
2018-02-07T00:13:00.431Z|01962|coverage|INFO|hmap_shrink
 0.4/sec     0.033/sec        0.0028/sec   total:
1030
2018-02-07T00:13:00.431Z|01963|coverage|INFO|mac_learning_learned
 0.0/sec     0.000/sec        0.0022/sec   total:
19
2018-02-07T00:13:00.431Z|01964|coverage|INFO|mac_learning_expired
 0.0/sec     0.000/sec        0.0017/sec   total:
6
2018-02-07T00:13:00.431Z|01965|coverage|INFO|netdev_sent
 0.0/sec     0.000/sec        0.0400/sec   total:
359
2018-02-07T00:13:00.431Z|01966|coverage|INFO|netdev_get_stats
204.4/sec   203.483/sec      198.7817/sec   total:
799543
2018-02-07T00:13:00.431Z|01967|coverage|INFO|txn_unchanged
 0.2/sec     0.200/sec        0.1914/sec   total:
777
2018-02-07T00:13:00.431Z|01968|coverage|INFO|txn_incomplete
 1.4/sec     0.117/sec        0.0186/sec   total:
113
2018-02-07T00:13:00.431Z|01969|coverage|INFO|txn_success
 0.6/sec     0.050/sec        0.0119/sec   total:
51
2018-02-07T00:13:00.431Z|01970|coverage|INFO|poll_create_node
241.8/sec   164.833/sec      156.7022/sec   total:
637792
2018-02-07T00:13:00.431Z|01971|coverage|INFO|poll_zero_timeout
 1.8/sec     0.367/sec        0.4589/sec   total:
1944
2018-02-07T00:13:00.431Z|01972|coverage|INFO|rconn_queued
 1.2/sec     0.133/sec        0.0469/sec   total:
1219
2018-02-07T00:13:00.431Z|01973|coverage|INFO|rconn_sent
 1.2/sec     0.133/sec        0.0469/sec   total:
1219
2018-02-07T00:13:00.431Z|01974|coverage|INFO|seq_change
2679.2/sec  1230.183/sec     1101.2081/sec   total:
4481090
2018-02-07T00:13:00.431Z|01975|coverage|INFO|pstream_open
 0.0/sec     0.000/sec        0.0000/sec   total:
5
2018-02-07T00:13:00.431Z|01976|coverage|INFO|stream_open
 0.0/sec     0.000/sec        0.0000/sec   total:
1
2018-02-07T00:13:00.431Z|01977|coverage|INFO|util_xalloc
6573732.0/sec
553875.117/sec    60700.1733/sec   total: 284890464
2018-02-07T00:13:00.431Z|01978|coverage|INFO|vconn_received
32.2/sec     2.717/sec        0.5306/sec   total:
6937
2018-02-07T00:13:00.431Z|01979|coverage|INFO|vconn_sent
 1.2/sec     0.133/sec        0.0469/sec   total:
1221
2018-02-07T00:13:00.431Z|01980|coverage|INFO|netlink_received
 0.0/sec     0.000/sec        0.0000/sec   total:
3
2018-02-07T00:13:00.431Z|01981|coverage|INFO|netlink_sent
 0.0/sec     0.000/sec        0.0000/sec   total:
1
2018-02-07T00:13:00.431Z|01982|coverage|INFO|57 events never hit
2018-02-07T00:13:00.431Z|01983|poll_loop|INFO|Dropped 6 log messages in
last 217 seconds (most recently, 216 seconds ago) due to excessive rate
2018-02-07T00:13:00.431Z|01984|poll_loop|INFO|wakeup due to [POLLIN] on fd
112 (/root/sandbox-192.168.64.0/br-int.mgmt<->) at ../lib/stream-fd.c:157
(90% CPU usage)

I managed to capture a perf record when a ovs-vswitchd is spiking, not sure
if it helps:
Samples: 6K of event 'cycles:pp', Event count (approx.): 118132588562209
  Children      Self  Command        Shared
Object       Symbol
                        ◆

+   66.01%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
_start                                                                        ▒

+   66.01%     0.00%  ovs-vswitchd   libc-2.23.so        [.]
__libc_start_main                                                             ▒

+   66.01%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
main                                                                          ▒

+   66.01%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
bridge_run                                                                    ▒

+   66.01%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
bridge_run__                                                                  ▒

+   41.88%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
ofproto_type_run                                                              ▒

+   41.88%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
type_run                                                                      ▒

+   24.13%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
bridge_reconfigure                                                            ▒

+   24.13%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
ofproto_run                                                                   ▒

+   24.13%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
update_port                                                                   ▒

+   23.31%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
update_mtu                                                                    ▒

+   23.31%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
update_mtu_ofproto                                                            ▒

+   23.31%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
ofport_is_mtu_overridden                                                      ▒

+   23.27%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
xlate_txn_start                                                               ▒

+   23.27%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
xlate_xport_copy                                                              ▒

+   23.27%     0.00%  ovs-vswitchd   libjemalloc.so.1    [.]
calloc                                                                        ▒

+   23.27%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
xcalloc                                                                       ▒

+   23.27%    23.27%  ovs-vswitchd   libc-2.23.so        [.]
0x000000000008f294                                                            ▒

+   22.38%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
ofproto_class_find__                                                          ▒

+   22.38%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
ofproto_port_open_type                                                        ▒

+   20.96%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
dp_enumerate_types                                                            ▒

+   20.14%    20.14%  ovs-vswitchd   libjemalloc.so.1    [.]
malloc                                                                        ▒

+   19.43%     0.81%  ovs-vswitchd   ovs-vswitchd        [.]
ovs_mutex_lock_at                                                             ▒

+   19.33%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
sset_add__                                                                    ▒

+   19.33%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
xmalloc                                                                       ▒

+   18.62%    18.62%  ovs-vswitchd   libpthread-2.23.so  [.]
pthread_mutex_lock                                                            ▒

+   18.61%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
dpif_netdev_run                                                               ▒

+   18.61%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
dp_netdev_process_rxq_port                                                    ▒

+   18.61%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
netdev_rxq_recv                                                               ▒

+   18.61%     0.00%  ovs-vswitchd   ovs-vswitchd        [.]
netdev_dummy_rxq_recv                                                         ▒

+   13.03%     0.00%  revalidator66  [kernel.kallsyms]   [k]
entry_SYSCALL_64_fastpath                                                     ▒

+   13.03%     0.00%  revalidator66  libc-2.23.so        [.]
__clone                                                                       ▒

+   13.03%     0.00%  revalidator66  libpthread-2.23.so  [.]
start_thread                                                                  ▒


Thanks,
Han
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20180207/abfbf7e9/attachment-0001.html>


More information about the discuss mailing list