[ovs-discuss] |timeval|WARN|Unreasonably long 2020ms poll interval (365882ms user, 464906ms system)

yangwanli yangwanli at jd.com
Sat May 23 04:54:16 UTC 2015


Thank you for your response!

My servers all run CentOS 6.6 (kernel 2.6.32-504.el6.x86_64) with 32 vCPUs and 256 GB RAM.
#lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Stepping:              2
CPU MHz:               2599.971
BogoMIPS:              5199.24
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31

The Open vSwitch kernel module (openvswitch.ko) and the user-space utilities are all built from the unmodified OpenvSwitch-2.3.1 source (http://openvswitch.org/releases/openvswitch-2.3.1.tar.gz).
#modinfo openvswitch
/lib/modules/2.6.32-504.el6.x86_64/kernel/net/openvswitch/openvswitch.ko
version:        2.3.1
license:        GPL
description:    Open vSwitch switching datapath
srcversion:     4D7CD38A83A9A4A782F73A1
depends:        libcrc32c
vermagic:       2.6.32-504.el6.x86_64 SMP mod_unload modversions 
parm:           vlan_tso:Enable TSO for VLAN packets (int)

I have 300 servers, all with the same hardware, OS, and OpenvSwitch-2.3.1.
Since May 15, 2015, 6 of these servers have shown this problem: when I execute ovs-ofctl dump-flows br-int, it does not always respond,
and the OVS ports stop transmitting data, so the virtual machines cannot be accessed.
When the problem appears, the server load is not high, and when I execute netstat -anp I see some unix sockets stuck in the CONNECTING state.
After I restart ovs-vswitchd and ovsdb-server, everything becomes normal again.
I suspect a thread deadlock in ovs-vswitchd.
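The next time the problem appears, I plan to capture per-thread CPU usage and the stacks of all ovs-vswitchd threads, for example (assuming gdb is installed on the host; these exact commands are just my own idea of how to check):
#top -H -p $(pidof ovs-vswitchd)
#gdb -p $(pidof ovs-vswitchd) -batch -ex 'thread apply all bt'
If one thread keeps the same stack across two such dumps while other threads are waiting on a lock, that would point to a deadlock rather than load.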

All servers have the following log info:
/var/log/openvswitch/ovs-vswitchd.log:
2015-05-18T12:13:04.491Z|00138|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=9e13ffc8:
2015-05-18T12:13:04.491Z|00139|coverage|INFO|nln_changed                0.0/sec     0.000/sec        0.0000/sec   total: 21
2015-05-18T12:13:04.491Z|00140|coverage|INFO|netlink_received          59.4/sec    53.850/sec       53.6072/sec   total: 1159232
2015-05-18T12:13:04.491Z|00141|coverage|INFO|netlink_recv_jumbo         3.8/sec     2.767/sec        2.6536/sec   total: 53239
2015-05-18T12:13:04.491Z|00142|coverage|INFO|netlink_sent              68.0/sec    62.783/sec       62.9394/sec   total: 1353375
2015-05-18T12:13:04.491Z|00143|coverage|INFO|netdev_set_policing        0.0/sec     0.000/sec        0.0000/sec   total: 12
2015-05-18T12:13:04.491Z|00144|coverage|INFO|netdev_get_ifindex         0.0/sec     0.000/sec        0.0000/sec   total: 6
2015-05-18T12:13:04.491Z|00145|coverage|INFO|netdev_get_hwaddr          0.0/sec     0.000/sec        0.0000/sec   total: 6
2015-05-18T12:13:04.491Z|00146|coverage|INFO|netdev_set_hwaddr          0.0/sec     0.000/sec        0.0000/sec   total: 3
2015-05-18T12:13:04.491Z|00147|coverage|INFO|netdev_get_ethtool         0.0/sec     0.000/sec        0.0000/sec   total: 22
2015-05-18T12:13:04.491Z|00148|coverage|INFO|netdev_set_ethtool         0.0/sec     0.000/sec        0.0000/sec   total: 4
2015-05-18T12:13:04.491Z|00149|coverage|INFO|vconn_received             2.4/sec     2.000/sec        1.9067/sec   total: 41642
2015-05-18T12:13:04.491Z|00150|coverage|INFO|vconn_sent                 1.8/sec     1.500/sec        1.4300/sec   total: 31228
2015-05-18T12:13:04.491Z|00151|coverage|INFO|util_xalloc              1987.2/sec  1867.150/sec     1817.8008/sec   total: 37808073
2015-05-18T12:13:04.491Z|00152|coverage|INFO|pstream_open               0.0/sec     0.000/sec        0.0000/sec   total: 5
2015-05-18T12:13:04.491Z|00153|coverage|INFO|stream_open                0.0/sec     0.000/sec        0.0000/sec   total: 1
2015-05-18T12:13:04.491Z|00154|coverage|INFO|rconn_queued               1.2/sec     1.000/sec        0.9533/sec   total: 20815
2015-05-18T12:13:04.491Z|00155|coverage|INFO|rconn_sent                 1.2/sec     1.000/sec        0.9533/sec   total: 20815
2015-05-18T12:13:04.491Z|00156|coverage|INFO|poll_fd_wait             537.2/sec   481.217/sec      462.1683/sec   total: 9762985
2015-05-18T12:13:04.491Z|00157|coverage|INFO|poll_zero_timeout          5.2/sec     4.333/sec        3.8581/sec   total: 83823
2015-05-18T12:13:04.491Z|00158|coverage|INFO|txn_unchanged              0.0/sec     0.000/sec        0.0000/sec   total: 11
2015-05-18T12:13:04.491Z|00159|coverage|INFO|txn_incomplete             8.8/sec     8.450/sec        8.3044/sec   total: 170811
2015-05-18T12:13:04.491Z|00160|coverage|INFO|txn_success                0.2/sec     0.200/sec        0.1989/sec   total: 4276
2015-05-18T12:13:04.491Z|00161|coverage|INFO|netdev_get_stats           1.6/sec     1.600/sec        1.5911/sec   total: 32833
2015-05-18T12:13:04.491Z|00162|coverage|INFO|mac_learning_learned       0.2/sec     0.133/sec        0.0686/sec   total: 1595
2015-05-18T12:13:04.491Z|00163|coverage|INFO|mac_learning_expired       0.4/sec     0.150/sec        0.0689/sec   total: 1518
2015-05-18T12:13:04.491Z|00164|coverage|INFO|hmap_pathological          0.0/sec     0.000/sec        0.0000/sec   total: 27
2015-05-18T12:13:04.491Z|00165|coverage|INFO|hmap_expand              102.4/sec    92.300/sec       88.6772/sec   total: 1853939
2015-05-18T12:13:04.491Z|00166|coverage|INFO|flow_extract              23.0/sec    19.883/sec       19.7067/sec   total: 416747
2015-05-18T12:13:04.491Z|00167|coverage|INFO|dpif_port_add              0.0/sec     0.000/sec        0.0000/sec   total: 6
2015-05-18T12:13:04.491Z|00168|coverage|INFO|dpif_flow_flush            0.0/sec     0.000/sec        0.0000/sec   total: 4
2015-05-18T12:13:04.491Z|00169|coverage|INFO|dpif_flow_put              0.0/sec     0.000/sec        0.0000/sec   total: 2
2015-05-18T12:13:04.491Z|00170|coverage|INFO|dpif_flow_del              0.0/sec     0.000/sec        0.0000/sec   total: 1
2015-05-18T12:13:04.491Z|00171|coverage|INFO|dpif_execute               0.0/sec     0.000/sec        0.0000/sec   total: 1
2015-05-18T12:13:04.491Z|00172|coverage|INFO|xlate_actions             38.8/sec    27.317/sec       22.9731/sec   total: 507271
2015-05-18T12:13:04.491Z|00173|coverage|INFO|rev_reconfigure            0.0/sec     0.000/sec        0.0000/sec   total: 20
2015-05-18T12:13:04.491Z|00174|coverage|INFO|rev_port_toggled           0.0/sec     0.000/sec        0.0000/sec   total: 3
2015-05-18T12:13:04.491Z|00175|coverage|INFO|rev_flow_table             0.0/sec     0.000/sec        0.0000/sec   total: 11
2015-05-18T12:13:04.491Z|00176|coverage|INFO|rev_mac_learning           0.6/sec     0.283/sec        0.1286/sec   total: 2813
2015-05-18T12:13:04.491Z|00177|coverage|INFO|ofproto_flush              0.0/sec     0.000/sec        0.0000/sec   total: 3
2015-05-18T12:13:04.491Z|00178|coverage|INFO|ofproto_recv_openflow      1.8/sec     1.500/sec        1.4300/sec   total: 31229
2015-05-18T12:13:04.491Z|00179|coverage|INFO|ofproto_update_port        0.0/sec     0.000/sec        0.0000/sec   total: 34
2015-05-18T12:13:04.491Z|00180|coverage|INFO|bridge_reconfigure         0.0/sec     0.000/sec        0.0000/sec   total: 15
2015-05-18T12:13:04.491Z|00181|coverage|INFO|49 events never hit
2015-05-18T14:59:05.014Z|00182|timeval|WARN|Unreasonably long 2019ms poll interval (43448ms user, 50465ms system)
2015-05-18T14:59:05.014Z|00183|timeval|WARN|faults: 840 minor, 0 major
2015-05-18T14:59:05.014Z|00184|timeval|WARN|disk: 0 reads, 120 writes
2015-05-18T14:59:05.014Z|00185|timeval|WARN|context switches: 2167382 voluntary, 529 involuntary
2015-05-18T14:59:05.014Z|00186|coverage|INFO|Skipping details of duplicate event coverage for hash=9e13ffc8
2015-05-18T17:46:50.578Z|00187|timeval|WARN|Unreasonably long 2018ms poll interval (56724ms user, 65745ms system)
2015-05-18T17:46:50.578Z|00188|timeval|WARN|faults: 841 minor, 0 major
2015-05-18T17:46:50.578Z|00189|timeval|WARN|disk: 0 reads, 128 writes
2015-05-18T17:46:50.578Z|00190|timeval|WARN|context switches: 2853218 voluntary, 538 involuntary
2015-05-18T17:46:50.578Z|00191|coverage|INFO|Skipping details of duplicate event coverage for hash=9e13ffc8
2015-05-18T20:29:36.092Z|00192|timeval|WARN|Unreasonably long 2018ms poll interval (69561ms user, 80775ms system)
2015-05-18T20:29:36.092Z|00193|timeval|WARN|faults: 841 minor, 0 major
2015-05-18T20:29:36.092Z|00194|timeval|WARN|disk: 0 reads, 136 writes
2015-05-18T20:29:36.092Z|00195|timeval|WARN|context switches: 3519924 voluntary, 544 involuntary
2015-05-18T20:29:36.092Z|00196|coverage|INFO|Skipping details of duplicate event coverage for hash=9e13ffc8
2015-05-18T23:21:51.729Z|00197|timeval|WARN|Unreasonably long 2021ms poll interval (83029ms user, 96728ms system)
2015-05-18T23:21:51.729Z|00198|timeval|WARN|faults: 841 minor, 0 major
2015-05-18T23:21:51.729Z|00199|timeval|WARN|disk: 0 reads, 144 writes
2015-05-18T23:21:51.729Z|00200|timeval|WARN|context switches: 4224452 voluntary, 553 involuntary
2015-05-18T23:21:51.729Z|00201|coverage|INFO|Skipping details of duplicate event coverage for hash=9e13ffc8



WanLi Yang
JD.COM 
*****************************************************************
 
From: Joe Stringer
Date: 2015-05-23 05:36
To: yangwanli
CC: bugs
Subject: Re: [ovs-discuss] |timeval|WARN|Unreasonably long 2020ms poll interval (365882ms user, 464906ms system)
On 19 May 2015 at 03:32, yangwanli <yangwanli at jd.com> wrote:
> Hi,
> I'm a cloud computing R&D engineer. We use OVS as the virtual switch; all virtual
> machines and containers communicate through OVS.
> I use OVS VLANs.
> OpenvSwitch Version 2.3.1 (nothing changed)
> CentOS 6.6
> In the course of using it, I have encountered the following problems.
> I need your help!
>
> 1. ovs-vswitchd.log always reports warnings:
>
> 2015-05-17T00:52:50.463Z|00575|timeval|WARN|Unreasonably long 2020ms poll
> interval (365882ms user, 464906ms system)
> 2015-05-17T00:52:50.463Z|00576|timeval|WARN|faults: 708 minor, 0 major
> 2015-05-17T00:52:50.463Z|00577|timeval|WARN|disk: 0 reads, 392 writes
> 2015-05-17T00:52:50.463Z|00578|timeval|WARN|context switches: 19571142
> voluntary, 2080 involuntary
> 2015-05-17T00:52:50.463Z|00579|coverage|INFO|Skipping details of duplicate
> event coverage for hash=35d02982
> 2015-05-17T03:26:36.336Z|00580|timeval|WARN|Unreasonably long 2020ms poll
> interval (380306ms user, 483259ms system)
> 2015-05-17T03:26:36.336Z|00581|timeval|WARN|faults: 708 minor, 0 major
> 2015-05-17T03:26:36.336Z|00582|timeval|WARN|disk: 0 reads, 400 writes
> 2015-05-17T03:26:36.336Z|00583|timeval|WARN|context switches: 20299734
> voluntary, 2174 involuntary
> 2015-05-17T03:26:36.336Z|00584|coverage|INFO|Skipping details of duplicate
> event coverage for hash=35d02982
 
The main loop of ovs-vswitchd usually expects to cycle at least once
per second. This message appears when this takes longer than usual,
which may indicate that ovs-vswitchd is under heavy load, or some sort
of bug.
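If you want to see how often this recurs and what the daemon has been doing in between, something along these lines (just a suggestion) can help:
  grep 'Unreasonably long' /var/log/openvswitch/ovs-vswitchd.log | tail
  ovs-appctl coverage/show
The second command prints the same coverage counters that appear in your log, but on demand, so you can compare them before and after a slow poll interval.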
 
>
> 2. I also encounter this problem:
>     when I execute
>     ovs-ofctl dump-flows br-int
>    it does not always respond, but ovs-vsctl show is OK!
 
The main thread of ovs-vswitchd assembles responses to OpenFlow
requests, and given the above messages it sounds like it's busy doing
other things. Ovs-vsctl on the other hand talks to ovsdb-server, which
looks like it is OK in this scenario.
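One quick way to confirm which daemon has stopped responding is to poke each one over its control socket, e.g.:
  ovs-appctl -t ovs-vswitchd version
  ovs-appctl -t ovsdb-server version
(Just a suggestion.) The ovs-vswitchd control socket is serviced from the same main loop that answers OpenFlow requests, so if the first command hangs while the second answers promptly, that would match the main thread being stuck or very busy.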
 
These both seem like symptoms, but it's not obvious to me what the
underlying cause is. Is there anything else in the logs? Is
ovs-vswitchd using a lot of CPU?
 
Are you using the OVS kernel module from CentOS or one distributed
with the openvswitch package? ('modinfo openvswitch' should give a
hint here)
 