[ovs-discuss] ovs-vswitchd 100% CPU usage after hard reboot

Jamon Camisso jamon.camisso at canonical.com
Tue May 7 13:43:39 UTC 2019


Following up on this since it is still an issue.

strace on the thread in question shows  the following:

13:35:47 poll([{fd=23, events=POLLIN}], 1, 0) = 0 (Timeout) <0.000018>
13:35:47 epoll_wait(42, [{EPOLLIN, {u32=3, u64=3}}], 9, 0) = 1 <0.000018>
13:35:47 recvmsg(417, {msg_namelen=0}, MSG_DONTWAIT) = -1 EAGAIN
(Resource temporarily unavailable) <0.000018>
13:35:47 poll([{fd=11, events=POLLIN}, {fd=42, events=POLLIN}, {fd=23,
events=POLLIN}], 3, 2147483647) = 1 ([{fd=42, revents=POLLIN}]) <0.000019>
13:35:47 getrusage(RUSAGE_THREAD, {ru_utime={tv_sec=490842,
tv_usec=749026}, ru_stime={tv_sec=710657, tv_usec=442946}, ...}) = 0
<0.000018>
13:35:47 poll([{fd=23, events=POLLIN}], 1, 0) = 0 (Timeout) <0.000018>
13:35:47 epoll_wait(42, [{EPOLLIN, {u32=3, u64=3}}], 9, 0) = 1 <0.000019>
13:35:47 recvmsg(417, {msg_namelen=0}, MSG_DONTWAIT) = -1 EAGAIN
(Resource temporarily unavailable) <0.000018>
13:35:47 poll([{fd=11, events=POLLIN}, {fd=42, events=POLLIN}, {fd=23,
events=POLLIN}], 3, 2147483647) = 1 ([{fd=42, revents=POLLIN}]) <0.000019>
13:35:47 getrusage(RUSAGE_THREAD, {ru_utime={tv_sec=490842,
tv_usec=749108}, ru_stime={tv_sec=710657, tv_usec=442946}, ...}) = 0
<0.000017>


And if I strace with -c to collect a summary, after 4-5 seconds it shows
the following:

sudo strace -c -p 1658344
strace: Process 1658344 attached
^Cstrace: Process 1658344 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         9           write
  0.00    0.000000           0     56397           poll
  0.00    0.000000           0     28198     28198 recvmsg
  0.00    0.000000           0     28199           getrusage
  0.00    0.000000           0       148        56 futex
  0.00    0.000000           0     28199           epoll_wait
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                141150     28254 total

I'm really at a loss here as to what's happening, has anyone seen
behaviour like this?

Cheers, Jamon
On 02/05/2019 23:12, Jamon Camisso wrote:
> I'm seeing an identical issue to the one posted here a few months ago:
> 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047558.html
> - I'll include the bug report template at the end.
> 
> The issue is an ovs-vswitchd thread consuming 100% CPU in a very lightly
> used Openstack Rocky cloud running on Bionic. Logs are filled with
> entries like this, about ~14000 per day:
> 
> 2019-05-01T18:34:30.110Z|237220|poll_loop(handler89)|INFO|Dropped
> 1092844 log messages in last 6 seconds (most recently, 0 seconds ago)
> due to excessive rate
> 
> 2019-05-01T18:34:30.110Z|237221|poll_loop(handler89)|INFO|wakeup due to
> [POLLIN] on fd 42 (unknown anon_inode:[eventpoll]) at
> ../lib/dpif-netlink.c:2786 (99% CPU usage)
> 
> ovs-vswitchd is running alongside various neutron processes
> (lbaasv2-agent, metadata-agent, l3-agent, dhcp-agent, openvswitch-agent)
> inside an LXC container on a physical host. There is a single neutron
> router, and the entire environment including br-tun, br-ex, and br-int
> traffic barely goes over 200KiB/s TX/RX combined.
> 
> If it is an issue with the Ubuntu packaged version (the other report is
> the same 2.10.0 package on Bionic which is suspicious), I've also filed
> a bug to track things here:
> https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1827264
> 
> Thanks for any feedback or troubleshooting steps anyone can provide.
> 
> Cheers, Jamon
> 
> 
> 
> 
> 
> Bug template:
> 
> What you did that make the problem appear.
>   - host server was hard rebooted. lxc containers came back up fine, but
>     ovs-vswitchd thread is spinning CPU and has remained that way for 10
>     days
> 
> What you expected to happen.
>   - negligible CPU usage since the cloud isn't in production
> 
> What actually happened.
>   - a single ovs-vswitchd thread is spinning at 100% CPU and logs are
>     populated with thousands of messages claiming a million+ messages
>     are dropped every 6 seconds
> 
> The Open vSwitch version number (as output by ovs-vswitchd --version).
>   - ovs-vswitchd (Open vSwitch) 2.10.0 (the Ubuntu packaged version)
> 
> The Git commit number (as output by git rev-parse HEAD)
>   - N/A
> 
> Any local patches or changes you have applied (if any).
>   - N/A
> 
> The kernel version on which Open vSwitch is running (from /proc/version)
>   - Linux version 4.15.0-47-generic (buildd at lgw01-amd64-001) (gcc
>     version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #50-Ubuntu SMP Wed Mar 13
>     10:44:52 UTC 2019
> 
> The distribution and version number of your OS (e.g. “Centos 5.0”).
>   - Ubuntu 18.04.2 LTS
> 
> The contents of the vswitchd configuration database (usually
> /etc/openvswitch/conf.db).
>   - See attached conf.db.txt
> 
> The output of ovs-dpctl show.
>   - See below:
> 
> root at juju-df624b-4-lxd-10:~# ovs-dpctl show
> system at ovs-system:
>   lookups: hit:223561120 missed:5768546 lost:798
>   flows: 131
>   masks: hit:2284286371 total:15 hit/pkt:9.96
>   port 0: ovs-system (internal)
>   port 1: br-ex (internal)
>   port 2: eth1
>   port 3: gre_sys (gre: packet_type=ptap)
>   port 4: br-tun (internal)
>   port 5: br-int (internal)
>   port 6: tapa062d6f1-40
>   port 7: tapf90e3ab6-13
>   port 8: tap45ba891c-4c
> 
> root at juju-df624b-4-lxd-10:~# ovs-ofctl show br-int
> OFPT_FEATURES_REPLY (xid=0x2): dpid:00003643edb09542
> n_tables:254, n_buffers:0
> capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
> actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src
> mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
>  1(int-br-ex): addr:4a:b4:cc:dd:aa:ac
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  2(patch-tun): addr:7a:36:76:1f:47:6e
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  3(tapa062d6f1-40): addr:92:d4:6c:26:55:0a
>      config: 0
>      state: 0
>      current: 10GB-FD COPPER
>      speed: 10000 Mbps now, 0 Mbps max
>  4(tapf90e3ab6-13): addr:9e:8b:4f:ae:8f:ba
>      config: 0
>      state: 0
>      current: 10GB-FD COPPER
>      speed: 10000 Mbps now, 0 Mbps max
>  5(tap45ba891c-4c): addr:76:45:39:c8:d7:e0
>      config: 0
>      state: 0
>      current: 10GB-FD COPPER
>      speed: 10000 Mbps now, 0 Mbps max
>  LOCAL(br-int): addr:36:43:ed:b0:95:42
>      config: PORT_DOWN
>      state: LINK_DOWN
>      speed: 0 Mbps now, 0 Mbps max
> OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
> 
> root at juju-df624b-4-lxd-10:~# ovs-ofctl show br-tun
> OFPT_FEATURES_REPLY (xid=0x2): dpid:000092c6c8f4a545
> n_tables:254, n_buffers:0
> capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
> actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src
> mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
>  1(patch-int): addr:02:1d:bc:ec:0c:3b
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  2(gre-0a30029b): addr:fe:09:61:11:92:8b
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  3(gre-0a3002a2): addr:a2:64:93:48:e2:82
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  4(gre-0a30029d): addr:12:b1:73:32:19:d4
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  5(gre-0a3002ce): addr:ca:10:51:f2:0f:05
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  6(gre-0a3002a1): addr:de:fe:95:33:d9:67
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  7(gre-0a30029e): addr:5a:fa:ce:15:7b:5c
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  8(gre-0a3002c3): addr:b6:87:66:09:fc:04
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  10(gre-0a3002a0): addr:ce:79:f2:bf:a8:94
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  14(gre-0a30029f): addr:b6:b0:84:7d:9f:aa
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  18(gre-0a30029c): addr:da:14:12:54:1c:ec
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  LOCAL(br-tun): addr:92:c6:c8:f4:a5:45
>      config: PORT_DOWN
>      state: LINK_DOWN
>      speed: 0 Mbps now, 0 Mbps max
> OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
> 
> root at juju-df624b-4-lxd-10:~# ovs-ofctl show br-ex
> OFPT_FEATURES_REPLY (xid=0x2): dpid:000000163e9a47a2
> n_tables:254, n_buffers:0
> capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
> actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src
> mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
>  1(eth1): addr:00:16:3e:9a:47:a2
>      config: 0
>      state: 0
>      current: 10GB-FD COPPER
>      speed: 10000 Mbps now, 0 Mbps max
>  2(phy-br-ex): addr:ce:a8:6a:25:55:0e
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  LOCAL(br-ex): addr:00:16:3e:9a:47:a2
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
> OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
> 
> 
> _______________________________________________
> discuss mailing list
> discuss at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> 



More information about the discuss mailing list