[ovs-discuss] |timeval|WARN|Unreasonably long 2020ms poll interval (365882ms user, 464906ms system)

Joe Stringer joestringer at nicira.com
Tue May 26 21:05:06 UTC 2015


Are you using STP?
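
You can check with ovs-vsctl (assuming your integration bridge is named
br-int, as in the commands quoted below):

    ovs-vsctl get Bridge br-int stp_enable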

On 22 May 2015 at 21:54, yangwanli <yangwanli at jd.com> wrote:
> Thank you for your response!
>
> My servers all run CentOS 6.6, kernel version 2.6.32-504.el6.x86_64,
> with 32 vCPUs and 256 GB of RAM.
> #lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                32
> On-line CPU(s) list:   0-31
> Thread(s) per core:    2
> Core(s) per socket:    8
> Socket(s):             2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 63
> Stepping:              2
> CPU MHz:               2599.971
> BogoMIPS:              5199.24
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              20480K
> NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
> NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
>
> The Open vSwitch kernel module (openvswitch.ko) and the user-space
> utilities are both built from the unmodified Open vSwitch 2.3.1 source
> (http://openvswitch.org/releases/openvswitch-2.3.1.tar.gz):
> #modinfo openvswitch
> /lib/modules/2.6.32-504.el6.x86_64/kernel/net/openvswitch/openvswitch.ko
> version:        2.3.1
> license:        GPL
> description:    Open vSwitch switching datapath
> srcversion:     4D7CD38A83A9A4A782F73A1
> depends:        libcrc32c
> vermagic:       2.6.32-504.el6.x86_64 SMP mod_unload modversions
> parm:           vlan_tso:Enable TSO for VLAN packets (int)
>
> I have 300 servers, all with the same hardware, OS, and Open vSwitch 2.3.1.
> Since May 15, 2015, six of these servers have shown this problem: when I
> execute ovs-ofctl dump-flows br-int, it does not always respond, the OVS
> ports stop forwarding data, and the virtual machines become unreachable.
> When the problem occurs, the server load is not high, and when I run
> netstat -anp I find some unix sockets stuck in the CONNECTING state.
> After I restart ovs-vswitchd and ovsdb-server, everything returns to
> normal.
> I suspect a thread deadlock in ovs-vswitchd.
>
> All of the servers have the following log messages in
> /var/log/openvswitch/ovs-vswitchd.log:
> 2015-05-18T12:13:04.491Z|00138|coverage|INFO|Event coverage, avg rate over
> last: 5 seconds, last minute, last hour,  hash=9e13ffc8:
> 2015-05-18T12:13:04.491Z|00139|coverage|INFO|nln_changed
> 0.0/sec     0.000/sec        0.0000/sec   total: 21
> 2015-05-18T12:13:04.491Z|00140|coverage|INFO|netlink_received
> 59.4/sec    53.850/sec       53.6072/sec   total: 1159232
> 2015-05-18T12:13:04.491Z|00141|coverage|INFO|netlink_recv_jumbo
> 3.8/sec     2.767/sec        2.6536/sec   total: 53239
> 2015-05-18T12:13:04.491Z|00142|coverage|INFO|netlink_sent
> 68.0/sec    62.783/sec       62.9394/sec   total: 1353375
> 2015-05-18T12:13:04.491Z|00143|coverage|INFO|netdev_set_policing
> 0.0/sec     0.000/sec        0.0000/sec   total: 12
> 2015-05-18T12:13:04.491Z|00144|coverage|INFO|netdev_get_ifindex
> 0.0/sec     0.000/sec        0.0000/sec   total: 6
> 2015-05-18T12:13:04.491Z|00145|coverage|INFO|netdev_get_hwaddr
> 0.0/sec     0.000/sec        0.0000/sec   total: 6
> 2015-05-18T12:13:04.491Z|00146|coverage|INFO|netdev_set_hwaddr
> 0.0/sec     0.000/sec        0.0000/sec   total: 3
> 2015-05-18T12:13:04.491Z|00147|coverage|INFO|netdev_get_ethtool
> 0.0/sec     0.000/sec        0.0000/sec   total: 22
> 2015-05-18T12:13:04.491Z|00148|coverage|INFO|netdev_set_ethtool
> 0.0/sec     0.000/sec        0.0000/sec   total: 4
> 2015-05-18T12:13:04.491Z|00149|coverage|INFO|vconn_received
> 2.4/sec     2.000/sec        1.9067/sec   total: 41642
> 2015-05-18T12:13:04.491Z|00150|coverage|INFO|vconn_sent
> 1.8/sec     1.500/sec        1.4300/sec   total: 31228
> 2015-05-18T12:13:04.491Z|00151|coverage|INFO|util_xalloc
> 1987.2/sec  1867.150/sec     1817.8008/sec   total: 37808073
> 2015-05-18T12:13:04.491Z|00152|coverage|INFO|pstream_open
> 0.0/sec     0.000/sec        0.0000/sec   total: 5
> 2015-05-18T12:13:04.491Z|00153|coverage|INFO|stream_open
> 0.0/sec     0.000/sec        0.0000/sec   total: 1
> 2015-05-18T12:13:04.491Z|00154|coverage|INFO|rconn_queued
> 1.2/sec     1.000/sec        0.9533/sec   total: 20815
> 2015-05-18T12:13:04.491Z|00155|coverage|INFO|rconn_sent
> 1.2/sec     1.000/sec        0.9533/sec   total: 20815
> 2015-05-18T12:13:04.491Z|00156|coverage|INFO|poll_fd_wait
> 537.2/sec   481.217/sec      462.1683/sec   total: 9762985
> 2015-05-18T12:13:04.491Z|00157|coverage|INFO|poll_zero_timeout
> 5.2/sec     4.333/sec        3.8581/sec   total: 83823
> 2015-05-18T12:13:04.491Z|00158|coverage|INFO|txn_unchanged
> 0.0/sec     0.000/sec        0.0000/sec   total: 11
> 2015-05-18T12:13:04.491Z|00159|coverage|INFO|txn_incomplete
> 8.8/sec     8.450/sec        8.3044/sec   total: 170811
> 2015-05-18T12:13:04.491Z|00160|coverage|INFO|txn_success
> 0.2/sec     0.200/sec        0.1989/sec   total: 4276
> 2015-05-18T12:13:04.491Z|00161|coverage|INFO|netdev_get_stats
> 1.6/sec     1.600/sec        1.5911/sec   total: 32833
> 2015-05-18T12:13:04.491Z|00162|coverage|INFO|mac_learning_learned
> 0.2/sec     0.133/sec        0.0686/sec   total: 1595
> 2015-05-18T12:13:04.491Z|00163|coverage|INFO|mac_learning_expired
> 0.4/sec     0.150/sec        0.0689/sec   total: 1518
> 2015-05-18T12:13:04.491Z|00164|coverage|INFO|hmap_pathological
> 0.0/sec     0.000/sec        0.0000/sec   total: 27
> 2015-05-18T12:13:04.491Z|00165|coverage|INFO|hmap_expand
> 102.4/sec    92.300/sec       88.6772/sec   total: 1853939
> 2015-05-18T12:13:04.491Z|00166|coverage|INFO|flow_extract
> 23.0/sec    19.883/sec       19.7067/sec   total: 416747
> 2015-05-18T12:13:04.491Z|00167|coverage|INFO|dpif_port_add
> 0.0/sec     0.000/sec        0.0000/sec   total: 6
> 2015-05-18T12:13:04.491Z|00168|coverage|INFO|dpif_flow_flush
> 0.0/sec     0.000/sec        0.0000/sec   total: 4
> 2015-05-18T12:13:04.491Z|00169|coverage|INFO|dpif_flow_put
> 0.0/sec     0.000/sec        0.0000/sec   total: 2
> 2015-05-18T12:13:04.491Z|00170|coverage|INFO|dpif_flow_del
> 0.0/sec     0.000/sec        0.0000/sec   total: 1
> 2015-05-18T12:13:04.491Z|00171|coverage|INFO|dpif_execute
> 0.0/sec     0.000/sec        0.0000/sec   total: 1
> 2015-05-18T12:13:04.491Z|00172|coverage|INFO|xlate_actions
> 38.8/sec    27.317/sec       22.9731/sec   total: 507271
> 2015-05-18T12:13:04.491Z|00173|coverage|INFO|rev_reconfigure
> 0.0/sec     0.000/sec        0.0000/sec   total: 20
> 2015-05-18T12:13:04.491Z|00174|coverage|INFO|rev_port_toggled
> 0.0/sec     0.000/sec        0.0000/sec   total: 3
> 2015-05-18T12:13:04.491Z|00175|coverage|INFO|rev_flow_table
> 0.0/sec     0.000/sec        0.0000/sec   total: 11
> 2015-05-18T12:13:04.491Z|00176|coverage|INFO|rev_mac_learning
> 0.6/sec     0.283/sec        0.1286/sec   total: 2813
> 2015-05-18T12:13:04.491Z|00177|coverage|INFO|ofproto_flush
> 0.0/sec     0.000/sec        0.0000/sec   total: 3
> 2015-05-18T12:13:04.491Z|00178|coverage|INFO|ofproto_recv_openflow
> 1.8/sec     1.500/sec        1.4300/sec   total: 31229
> 2015-05-18T12:13:04.491Z|00179|coverage|INFO|ofproto_update_port
> 0.0/sec     0.000/sec        0.0000/sec   total: 34
> 2015-05-18T12:13:04.491Z|00180|coverage|INFO|bridge_reconfigure
> 0.0/sec     0.000/sec        0.0000/sec   total: 15
> 2015-05-18T12:13:04.491Z|00181|coverage|INFO|49 events never hit
> 2015-05-18T14:59:05.014Z|00182|timeval|WARN|Unreasonably long 2019ms poll
> interval (43448ms user, 50465ms system)
> 2015-05-18T14:59:05.014Z|00183|timeval|WARN|faults: 840 minor, 0 major
> 2015-05-18T14:59:05.014Z|00184|timeval|WARN|disk: 0 reads, 120 writes
> 2015-05-18T14:59:05.014Z|00185|timeval|WARN|context switches: 2167382
> voluntary, 529 involuntary
> 2015-05-18T14:59:05.014Z|00186|coverage|INFO|Skipping details of duplicate
> event coverage for hash=9e13ffc8
> 2015-05-18T17:46:50.578Z|00187|timeval|WARN|Unreasonably long 2018ms poll
> interval (56724ms user, 65745ms system)
> 2015-05-18T17:46:50.578Z|00188|timeval|WARN|faults: 841 minor, 0 major
> 2015-05-18T17:46:50.578Z|00189|timeval|WARN|disk: 0 reads, 128 writes
> 2015-05-18T17:46:50.578Z|00190|timeval|WARN|context switches: 2853218
> voluntary, 538 involuntary
> 2015-05-18T17:46:50.578Z|00191|coverage|INFO|Skipping details of duplicate
> event coverage for hash=9e13ffc8
> 2015-05-18T20:29:36.092Z|00192|timeval|WARN|Unreasonably long 2018ms poll
> interval (69561ms user, 80775ms system)
> 2015-05-18T20:29:36.092Z|00193|timeval|WARN|faults: 841 minor, 0 major
> 2015-05-18T20:29:36.092Z|00194|timeval|WARN|disk: 0 reads, 136 writes
> 2015-05-18T20:29:36.092Z|00195|timeval|WARN|context switches: 3519924
> voluntary, 544 involuntary
> 2015-05-18T20:29:36.092Z|00196|coverage|INFO|Skipping details of duplicate
> event coverage for hash=9e13ffc8
> 2015-05-18T23:21:51.729Z|00197|timeval|WARN|Unreasonably long 2021ms poll
> interval (83029ms user, 96728ms system)
> 2015-05-18T23:21:51.729Z|00198|timeval|WARN|faults: 841 minor, 0 major
> 2015-05-18T23:21:51.729Z|00199|timeval|WARN|disk: 0 reads, 144 writes
> 2015-05-18T23:21:51.729Z|00200|timeval|WARN|context switches: 4224452
> voluntary, 553 involuntary
> 2015-05-18T23:21:51.729Z|00201|coverage|INFO|Skipping details of duplicate
> event coverage for hash=9e13ffc8
>
> ________________________________
>
> WanLi Yang
>
> JD.COM
>
> *****************************************************************
>
>
> From: Joe Stringer
> Date: 2015-05-23 05:36
> To: yangwanli
> CC: bugs
> Subject: Re: [ovs-discuss] |timeval|WARN|Unreasonably long 2020ms poll
> interval (365882ms user, 464906ms system)
> On 19 May 2015 at 03:32, yangwanli <yangwanli at jd.com> wrote:
>> Hi,
>> I'm a cloud computing R&D engineer. We use OVS as the virtual switch;
>> all virtual machines and containers communicate through OVS, using
>> OVS VLANs.
>> Open vSwitch version 2.3.1 (unmodified)
>> CentOS 6.6
>> In the course of using it, I have encountered the following problems.
>> I need your help!
>>
>> 1. ovs-vswitchd.log constantly reports warnings like these:
>>
>> 2015-05-17T00:52:50.463Z|00575|timeval|WARN|Unreasonably long 2020ms poll
>> interval (365882ms user, 464906ms system)
>> 2015-05-17T00:52:50.463Z|00576|timeval|WARN|faults: 708 minor, 0 major
>> 2015-05-17T00:52:50.463Z|00577|timeval|WARN|disk: 0 reads, 392 writes
>> 2015-05-17T00:52:50.463Z|00578|timeval|WARN|context switches: 19571142
>> voluntary, 2080 involuntary
>> 2015-05-17T00:52:50.463Z|00579|coverage|INFO|Skipping details of duplicate
>> event coverage for hash=35d02982
>> 2015-05-17T03:26:36.336Z|00580|timeval|WARN|Unreasonably long 2020ms poll
>> interval (380306ms user, 483259ms system)
>> 2015-05-17T03:26:36.336Z|00581|timeval|WARN|faults: 708 minor, 0 major
>> 2015-05-17T03:26:36.336Z|00582|timeval|WARN|disk: 0 reads, 400 writes
>> 2015-05-17T03:26:36.336Z|00583|timeval|WARN|context switches: 20299734
>> voluntary, 2174 involuntary
>> 2015-05-17T03:26:36.336Z|00584|coverage|INFO|Skipping details of duplicate
>> event coverage for hash=35d02982
>
> The main loop of ovs-vswitchd usually expects to cycle at least once
> per second. This message appears when this takes longer than usual,
> which may indicate that ovs-vswitchd is under heavy load, or some sort
> of bug.
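>
> One way to narrow this down (a general suggestion, not specific to
> your setup) is to watch the per-thread CPU usage of the daemon:
>
>     top -H -p $(pidof ovs-vswitchd)
>
> If a single thread sits near 100% CPU, that thread is what is holding
> up the poll loop.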
>
>>
>> 2. I also encounter this problem:
>>    when I execute
>>    ovs-ofctl dump-flows br-int
>>    it does not always respond, but ovs-vsctl show works fine!
>
> The main thread of ovs-vswitchd assembles responses to OpenFlow
> requests, and given the above messages it sounds like it's busy doing
> other things. ovs-vsctl, on the other hand, talks to ovsdb-server,
> which appears to be fine in this scenario.
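>
> To tell "busy" from "wedged", you could also poke the daemon directly
> over its control socket, which bypasses OpenFlow entirely (assuming
> the default unixctl socket path):
>
>     ovs-appctl version
>     ovs-appctl coverage/show
>
> If those hang as well, the main thread is probably stuck, not just
> slow.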
>
> These both seem like symptoms, but it's not obvious to me what the
> underlying cause is. Is there anything else in the logs? Is
> ovs-vswitchd using a lot of CPU?
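>
> If you suspect a deadlock, a backtrace of every thread would help
> (illustrative only; note that attaching gdb briefly pauses the
> daemon):
>
>     gdb -batch -ex 'thread apply all bt' -p $(pidof ovs-vswitchd)
>
> Threads blocked in pthread_mutex_lock or futex calls would support
> the deadlock theory.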
>
> Are you using the OVS kernel module from CentOS or one distributed
> with the openvswitch package? ('modinfo openvswitch' should give a
> hint here)
>