[ovs-discuss] ovs-vswitchd 100% CPU usage after hard reboot
Jamon Camisso
jamon.camisso at canonical.com
Tue May 7 13:43:39 UTC 2019
Following up on this since it is still an issue.
strace on the thread in question shows the following:
13:35:47 poll([{fd=23, events=POLLIN}], 1, 0) = 0 (Timeout) <0.000018>
13:35:47 epoll_wait(42, [{EPOLLIN, {u32=3, u64=3}}], 9, 0) = 1 <0.000018>
13:35:47 recvmsg(417, {msg_namelen=0}, MSG_DONTWAIT) = -1 EAGAIN
(Resource temporarily unavailable) <0.000018>
13:35:47 poll([{fd=11, events=POLLIN}, {fd=42, events=POLLIN}, {fd=23,
events=POLLIN}], 3, 2147483647) = 1 ([{fd=42, revents=POLLIN}]) <0.000019>
13:35:47 getrusage(RUSAGE_THREAD, {ru_utime={tv_sec=490842,
tv_usec=749026}, ru_stime={tv_sec=710657, tv_usec=442946}, ...}) = 0
<0.000018>
13:35:47 poll([{fd=23, events=POLLIN}], 1, 0) = 0 (Timeout) <0.000018>
13:35:47 epoll_wait(42, [{EPOLLIN, {u32=3, u64=3}}], 9, 0) = 1 <0.000019>
13:35:47 recvmsg(417, {msg_namelen=0}, MSG_DONTWAIT) = -1 EAGAIN
(Resource temporarily unavailable) <0.000018>
13:35:47 poll([{fd=11, events=POLLIN}, {fd=42, events=POLLIN}, {fd=23,
events=POLLIN}], 3, 2147483647) = 1 ([{fd=42, revents=POLLIN}]) <0.000019>
13:35:47 getrusage(RUSAGE_THREAD, {ru_utime={tv_sec=490842,
tv_usec=749108}, ru_stime={tv_sec=710657, tv_usec=442946}, ...}) = 0
<0.000017>
And if I strace with -c to collect a summary, after 4-5 seconds it shows
the following:
sudo strace -c -p 1658344
strace: Process 1658344 attached
^Cstrace: Process 1658344 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
0.00 0.000000 0 9 write
0.00 0.000000 0 56397 poll
0.00 0.000000 0 28198 28198 recvmsg
0.00 0.000000 0 28199 getrusage
0.00 0.000000 0 148 56 futex
0.00 0.000000 0 28199 epoll_wait
------ ----------- ----------- --------- --------- ----------------
100.00 0.000000 141150 28254 total
I'm really at a loss here as to what's happening, has anyone seen
behaviour like this?
Cheers, Jamon
On 02/05/2019 23:12, Jamon Camisso wrote:
> I'm seeing an identical issue to the one posted here a few months ago:
>
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047558.html
> - I'll include the bug report template at the end.
>
> The issue is an ovs-vswitchd thread consuming 100% CPU in a very lightly
> used Openstack Rocky cloud running on Bionic. Logs are filled with
> entries like this, about ~14000 per day:
>
> 2019-05-01T18:34:30.110Z|237220|poll_loop(handler89)|INFO|Dropped
> 1092844 log messages in last 6 seconds (most recently, 0 seconds ago)
> due to excessive rate
>
> 2019-05-01T18:34:30.110Z|237221|poll_loop(handler89)|INFO|wakeup due to
> [POLLIN] on fd 42 (unknown anon_inode:[eventpoll]) at
> ../lib/dpif-netlink.c:2786 (99% CPU usage)
>
> ovs-vswitchd is running alongside various neutron processes
> (lbaasv2-agent, metadata-agent, l3-agent, dhcp-agent, openvswitch-agent)
> inside an LXC container on a physical host. There is a single neutron
> router, and the entire environment including br-tun, br-ex, and br-int
> traffic barely goes over 200KiB/s TX/RX combined.
>
> If it is an issue with the Ubuntu packaged version (the other report is
> the same 2.10.0 package on Bionic which is suspicious), I've also filed
> a bug to track things here:
> https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1827264
>
> Thanks for any feedback or troubleshooting steps anyone can provide.
>
> Cheers, Jamon
>
>
>
>
>
> Bug template:
>
> What you did that make the problem appear.
> - host server was hard rebooted. lxc containers came back up fine, but
> ovs-vswitchd thread is spinning CPU and has remained that way for 10
> days
>
> What you expected to happen.
> - negligible CPU usage since the cloud isn't in production
>
> What actually happened.
> - a single ovs-vswitchd thread is spinning at 100% CPU and logs are
> populated with thousands of messages claiming a million+ messages
> are dropped every 6 seconds
>
> The Open vSwitch version number (as output by ovs-vswitchd --version).
> - ovs-vswitchd (Open vSwitch) 2.10.0 (the Ubuntu packaged version)
>
> The Git commit number (as output by git rev-parse HEAD)
> - N/A
>
> Any local patches or changes you have applied (if any).
> - N/A
>
> The kernel version on which Open vSwitch is running (from /proc/version)
> - Linux version 4.15.0-47-generic (buildd at lgw01-amd64-001) (gcc
> version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #50-Ubuntu SMP Wed Mar 13
> 10:44:52 UTC 2019
>
> The distribution and version number of your OS (e.g. “Centos 5.0”).
> - Ubuntu 18.04.2 LTS
>
> The contents of the vswitchd configuration database (usually
> /etc/openvswitch/conf.db).
> - See attached conf.db.txt
>
> The output of ovs-dpctl show.
> - See below:
>
> root at juju-df624b-4-lxd-10:~# ovs-dpctl show
> system at ovs-system:
> lookups: hit:223561120 missed:5768546 lost:798
> flows: 131
> masks: hit:2284286371 total:15 hit/pkt:9.96
> port 0: ovs-system (internal)
> port 1: br-ex (internal)
> port 2: eth1
> port 3: gre_sys (gre: packet_type=ptap)
> port 4: br-tun (internal)
> port 5: br-int (internal)
> port 6: tapa062d6f1-40
> port 7: tapf90e3ab6-13
> port 8: tap45ba891c-4c
>
> root at juju-df624b-4-lxd-10:~# ovs-ofctl show br-int
> OFPT_FEATURES_REPLY (xid=0x2): dpid:00003643edb09542
> n_tables:254, n_buffers:0
> capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
> actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src
> mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
> 1(int-br-ex): addr:4a:b4:cc:dd:aa:ac
> config: 0
> state: 0
> speed: 0 Mbps now, 0 Mbps max
> 2(patch-tun): addr:7a:36:76:1f:47:6e
> config: 0
> state: 0
> speed: 0 Mbps now, 0 Mbps max
> 3(tapa062d6f1-40): addr:92:d4:6c:26:55:0a
> config: 0
> state: 0
> current: 10GB-FD COPPER
> speed: 10000 Mbps now, 0 Mbps max
> 4(tapf90e3ab6-13): addr:9e:8b:4f:ae:8f:ba
> config: 0
> state: 0
> current: 10GB-FD COPPER
> speed: 10000 Mbps now, 0 Mbps max
> 5(tap45ba891c-4c): addr:76:45:39:c8:d7:e0
> config: 0
> state: 0
> current: 10GB-FD COPPER
> speed: 10000 Mbps now, 0 Mbps max
> LOCAL(br-int): addr:36:43:ed:b0:95:42
> config: PORT_DOWN
> state: LINK_DOWN
> speed: 0 Mbps now, 0 Mbps max
> OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
>
> root at juju-df624b-4-lxd-10:~# ovs-ofctl show br-tun
> OFPT_FEATURES_REPLY (xid=0x2): dpid:000092c6c8f4a545
> n_tables:254, n_buffers:0
> capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
> actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src
> mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
> 1(patch-int): addr:02:1d:bc:ec:0c:3b
> config: 0
> state: 0
> speed: 0 Mbps now, 0 Mbps max
> 2(gre-0a30029b): addr:fe:09:61:11:92:8b
> config: 0
> state: 0
> speed: 0 Mbps now, 0 Mbps max
> 3(gre-0a3002a2): addr:a2:64:93:48:e2:82
> config: 0
> state: 0
> speed: 0 Mbps now, 0 Mbps max
> 4(gre-0a30029d): addr:12:b1:73:32:19:d4
> config: 0
> state: 0
> speed: 0 Mbps now, 0 Mbps max
> 5(gre-0a3002ce): addr:ca:10:51:f2:0f:05
> config: 0
> state: 0
> speed: 0 Mbps now, 0 Mbps max
> 6(gre-0a3002a1): addr:de:fe:95:33:d9:67
> config: 0
> state: 0
> speed: 0 Mbps now, 0 Mbps max
> 7(gre-0a30029e): addr:5a:fa:ce:15:7b:5c
> config: 0
> state: 0
> speed: 0 Mbps now, 0 Mbps max
> 8(gre-0a3002c3): addr:b6:87:66:09:fc:04
> config: 0
> state: 0
> speed: 0 Mbps now, 0 Mbps max
> 10(gre-0a3002a0): addr:ce:79:f2:bf:a8:94
> config: 0
> state: 0
> speed: 0 Mbps now, 0 Mbps max
> 14(gre-0a30029f): addr:b6:b0:84:7d:9f:aa
> config: 0
> state: 0
> speed: 0 Mbps now, 0 Mbps max
> 18(gre-0a30029c): addr:da:14:12:54:1c:ec
> config: 0
> state: 0
> speed: 0 Mbps now, 0 Mbps max
> LOCAL(br-tun): addr:92:c6:c8:f4:a5:45
> config: PORT_DOWN
> state: LINK_DOWN
> speed: 0 Mbps now, 0 Mbps max
> OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
>
> root at juju-df624b-4-lxd-10:~# ovs-ofctl show br-ex
> OFPT_FEATURES_REPLY (xid=0x2): dpid:000000163e9a47a2
> n_tables:254, n_buffers:0
> capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
> actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src
> mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
> 1(eth1): addr:00:16:3e:9a:47:a2
> config: 0
> state: 0
> current: 10GB-FD COPPER
> speed: 10000 Mbps now, 0 Mbps max
> 2(phy-br-ex): addr:ce:a8:6a:25:55:0e
> config: 0
> state: 0
> speed: 0 Mbps now, 0 Mbps max
> LOCAL(br-ex): addr:00:16:3e:9a:47:a2
> config: 0
> state: 0
> speed: 0 Mbps now, 0 Mbps max
> OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
>
>
> _______________________________________________
> discuss mailing list
> discuss at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
More information about the discuss
mailing list