[ovs-discuss] |timeval|WARN|Unreasonably long 2020ms poll interval (365882ms user, 464906ms system)

yangwanli yangwanli at jd.com
Thu May 28 02:58:38 UTC 2015


Thank you very much!
I upgraded 200 servers to the latest kernel, 2.6.32-504.16.2.el6.x86_64, and so far this problem (the ovs-vswitchd hang) has not recurred.
It seems to be related to this bug (http://bugs.centos.org/view.php?id=8703), but I'm not sure!


From: Joe Stringer
Date: 2015-05-28 09:55
To: yangwanli
CC: bugs
Subject: Re: Re: [ovs-discuss] |timeval|WARN|Unreasonably long 2020ms poll interval (365882ms user, 464906ms system)
> Hmm. I have a few different paths that might help:
 
> 1) Can you get dmesg and/or syslog around the time of the event?
  I did not find any errors or warnings about ovs-vswitchd in dmesg or syslog.
> 2) Does ovs-vswitchd crash or hang?
 A hang, I think! When I execute netstat -anp, I find some unix sockets in the CONNECTING state, and ovs-ofctl dump-flows br-int does not always respond.
 Recently, I found that the ovs-vswitchd process on one server returned to normal by itself after about 20 minutes, without me doing anything, and ovs-ofctl dump-flows br-int worked again.
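 (For reference, one way to see those stuck control sockets might be something like:

   # netstat -anp | grep ovs-vswitchd

 which lists the sockets belonging to the ovs-vswitchd process together with their state.)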
> A backtrace of the code that ovs-vswitchd is executing would be really
> useful. If it crashes, can you get a core dump and use gdb to get a
> backtrace? If it hangs, perhaps you could attach gdb and print a
> backtrace and post that?
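 (A minimal sketch of capturing that from the hung process, assuming gdb and matching debug symbols are installed:

   # gdb -p $(pidof ovs-vswitchd)
   (gdb) thread apply all bt
   (gdb) detach
   (gdb) quit

 or non-interactively:

   # gdb --batch -p $(pidof ovs-vswitchd) -ex 'thread apply all bt'

 Attaching briefly pauses the process but does not kill it.)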
 
> 3) If you're already compiling your own OVS-2.3, you could try
> downloading the latest branch-2.3 from here:
> https://github.com/openvswitch/ovs/tree/branch-2.3 
> There are a few bugfixes that have not made it into a new release yet.
 OK! I will try it. 
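 (Roughly, and assuming a build against the running kernel's headers, that would be something like:

   # git clone -b branch-2.3 https://github.com/openvswitch/ovs.git
   # cd ovs
   # ./boot.sh
   # ./configure --with-linux=/lib/modules/$(uname -r)/build
   # make && make install && make modules_install

 following the INSTALL file in that tree.)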
> 4) Is OVS running on the bare metal or is it running within a VM?
 Bare metal.
> 5) Lastly, it's interesting that there is a delay of around 2.5 hours
> (or 10,000 seconds) between the "unreasonably long poll interval"
> messages. Maybe this means something, maybe it's irrelevant.
 
On 26 May 2015 at 18:46, yangwanli <yangwanli at jd.com> wrote:
> I don't enable STP.
>
> #ovs-vsctl show
> 350424e8-f03c-4f5c-aaa9-fd78ec78f97b
>     Bridge br-int
>         fail_mode: secure
>         Port "int-br-eth1"
>             Interface "int-br-eth1"
>                 type: patch
>                 options: {peer="phy-br-eth1"}
>         Port "tap5c0d4002-26"
>             tag: 1
>             Interface "tap5c0d4002-26"
>         Port "tap4954ce53-06"
>             tag: 3
>             Interface "tap4954ce53-06"
>         Port "tap6631c08d-91"
>             tag: 3
>             Interface "tap6631c08d-91"
>         Port "tapdfd3962f-3a"
>             tag: 2
>             Interface "tapdfd3962f-3a"
>         Port br-int
>             Interface br-int
>                 type: internal
>         Port "tapb2cd1333-03"
>             tag: 1
>             Interface "tapb2cd1333-03"
>         Port "tap0c1153d7-bb"
>             tag: 1
>             Interface "tap0c1153d7-bb"
>     Bridge "br-eth1"
>         Port "br-eth1"
>             Interface "br-eth1"
>                 type: internal
>         Port "eth3"
>             Interface "eth3"
>         Port "phy-br-eth1"
>             Interface "phy-br-eth1"
>                 type: patch
>                 options: {peer="int-br-eth1"}
>     ovs_version: "2.3.1"
>
>
> eth3 is a physical NIC.
> The OVS bridges are created by:
> ovs-vsctl add-br br-int
> ovs-vsctl add-br br-eth1
> ovs-vsctl --timeout=10 -- set-fail-mode br-int secure
>
> ________________________________
>
> WanLi  Yang
>
> JD.COM
>
> *****************************************************************
>
>
> From: Joe Stringer
> Date: 2015-05-27 05:05
> To: yangwanli
> CC: bugs
> Subject: Re: Re: [ovs-discuss] |timeval|WARN|Unreasonably long 2020ms poll
> interval (365882ms user, 464906ms system)
> Are you using STP?
>
> On 22 May 2015 at 21:54, yangwanli <yangwanli at jd.com> wrote:
>> Thank you for your response!
>>
>> My servers all run CentOS 6.6, kernel version 2.6.32-504.el6.x86_64,
>> with 32 CPUs and 256G RAM.
>> #lscpu
>> Architecture:          x86_64
>> CPU op-mode(s):        32-bit, 64-bit
>> Byte Order:            Little Endian
>> CPU(s):                32
>> On-line CPU(s) list:   0-31
>> Thread(s) per core:    2
>> Core(s) per socket:    8
>> Socket(s):             2
>> NUMA node(s):          2
>> Vendor ID:             GenuineIntel
>> CPU family:            6
>> Model:                 63
>> Stepping:              2
>> CPU MHz:               2599.971
>> BogoMIPS:              5199.24
>> Virtualization:        VT-x
>> L1d cache:             32K
>> L1i cache:             32K
>> L2 cache:              256K
>> L3 cache:              20480K
>> NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
>> NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
>>
>> The Open vSwitch kernel module (openvswitch.ko) and the user-space utilities
>> are all built from the unmodified Open vSwitch 2.3.1 source
>> (http://openvswitch.org/releases/openvswitch-2.3.1.tar.gz)
>> #modinfo openvswitch
>> /lib/modules/2.6.32-504.el6.x86_64/kernel/net/openvswitch/openvswitch.ko
>> version:        2.3.1
>> license:        GPL
>> description:    Open vSwitch switching datapath
>> srcversion:     4D7CD38A83A9A4A782F73A1
>> depends:        libcrc32c
>> vermagic:       2.6.32-504.el6.x86_64 SMP mod_unload modversions
>> parm:           vlan_tso:Enable TSO for VLAN packets (int)
>>
>> I have 300 servers, all with the same hardware, OS, and Open vSwitch 2.3.1.
>> Since May 15, 2015, 6 of those servers have shown this problem: when I
>> execute ovs-ofctl dump-flows br-int, it does not always respond, and the
>> OVS ports stop forwarding data, so the virtual machines cannot be
>> accessed.
>> When the problem appears, the server load is not high, and when I execute
>> netstat -anp I find some unix sockets in the CONNECTING state.
>> After I restart ovs-vswitchd and ovsdb-server, everything becomes normal again.
>> I suspect a thread deadlock in ovs-vswitchd.
>>
>> All servers have the following log info:
>> /var/log/openvswitch/ovs-vswitchd.log:
>> 2015-05-18T12:13:04.491Z|00138|coverage|INFO|Event coverage, avg rate over
>> last: 5 seconds, last minute, last hour,  hash=9e13ffc8:
>> 2015-05-18T12:13:04.491Z|00139|coverage|INFO|nln_changed
>> 0.0/sec     0.000/sec        0.0000/sec   total: 21
>> 2015-05-18T12:13:04.491Z|00140|coverage|INFO|netlink_received
>> 59.4/sec    53.850/sec       53.6072/sec   total: 1159232
>> 2015-05-18T12:13:04.491Z|00141|coverage|INFO|netlink_recv_jumbo
>> 3.8/sec     2.767/sec        2.6536/sec   total: 53239
>> 2015-05-18T12:13:04.491Z|00142|coverage|INFO|netlink_sent
>> 68.0/sec    62.783/sec       62.9394/sec   total: 1353375
>> 2015-05-18T12:13:04.491Z|00143|coverage|INFO|netdev_set_policing
>> 0.0/sec     0.000/sec        0.0000/sec   total: 12
>> 2015-05-18T12:13:04.491Z|00144|coverage|INFO|netdev_get_ifindex
>> 0.0/sec     0.000/sec        0.0000/sec   total: 6
>> 2015-05-18T12:13:04.491Z|00145|coverage|INFO|netdev_get_hwaddr
>> 0.0/sec     0.000/sec        0.0000/sec   total: 6
>> 2015-05-18T12:13:04.491Z|00146|coverage|INFO|netdev_set_hwaddr
>> 0.0/sec     0.000/sec        0.0000/sec   total: 3
>> 2015-05-18T12:13:04.491Z|00147|coverage|INFO|netdev_get_ethtool
>> 0.0/sec     0.000/sec        0.0000/sec   total: 22
>> 2015-05-18T12:13:04.491Z|00148|coverage|INFO|netdev_set_ethtool
>> 0.0/sec     0.000/sec        0.0000/sec   total: 4
>> 2015-05-18T12:13:04.491Z|00149|coverage|INFO|vconn_received
>> 2.4/sec     2.000/sec        1.9067/sec   total: 41642
>> 2015-05-18T12:13:04.491Z|00150|coverage|INFO|vconn_sent
>> 1.8/sec     1.500/sec        1.4300/sec   total: 31228
>> 2015-05-18T12:13:04.491Z|00151|coverage|INFO|util_xalloc
>> 1987.2/sec  1867.150/sec     1817.8008/sec   total: 37808073
>> 2015-05-18T12:13:04.491Z|00152|coverage|INFO|pstream_open
>> 0.0/sec     0.000/sec        0.0000/sec   total: 5
>> 2015-05-18T12:13:04.491Z|00153|coverage|INFO|stream_open
>> 0.0/sec     0.000/sec        0.0000/sec   total: 1
>> 2015-05-18T12:13:04.491Z|00154|coverage|INFO|rconn_queued
>> 1.2/sec     1.000/sec        0.9533/sec   total: 20815
>> 2015-05-18T12:13:04.491Z|00155|coverage|INFO|rconn_sent
>> 1.2/sec     1.000/sec        0.9533/sec   total: 20815
>> 2015-05-18T12:13:04.491Z|00156|coverage|INFO|poll_fd_wait
>> 537.2/sec   481.217/sec      462.1683/sec   total: 9762985
>> 2015-05-18T12:13:04.491Z|00157|coverage|INFO|poll_zero_timeout
>> 5.2/sec     4.333/sec        3.8581/sec   total: 83823
>> 2015-05-18T12:13:04.491Z|00158|coverage|INFO|txn_unchanged
>> 0.0/sec     0.000/sec        0.0000/sec   total: 11
>> 2015-05-18T12:13:04.491Z|00159|coverage|INFO|txn_incomplete
>> 8.8/sec     8.450/sec        8.3044/sec   total: 170811
>> 2015-05-18T12:13:04.491Z|00160|coverage|INFO|txn_success
>> 0.2/sec     0.200/sec        0.1989/sec   total: 4276
>> 2015-05-18T12:13:04.491Z|00161|coverage|INFO|netdev_get_stats
>> 1.6/sec     1.600/sec        1.5911/sec   total: 32833
>> 2015-05-18T12:13:04.491Z|00162|coverage|INFO|mac_learning_learned
>> 0.2/sec     0.133/sec        0.0686/sec   total: 1595
>> 2015-05-18T12:13:04.491Z|00163|coverage|INFO|mac_learning_expired
>> 0.4/sec     0.150/sec        0.0689/sec   total: 1518
>> 2015-05-18T12:13:04.491Z|00164|coverage|INFO|hmap_pathological
>> 0.0/sec     0.000/sec        0.0000/sec   total: 27
>> 2015-05-18T12:13:04.491Z|00165|coverage|INFO|hmap_expand
>> 102.4/sec    92.300/sec       88.6772/sec   total: 1853939
>> 2015-05-18T12:13:04.491Z|00166|coverage|INFO|flow_extract
>> 23.0/sec    19.883/sec       19.7067/sec   total: 416747
>> 2015-05-18T12:13:04.491Z|00167|coverage|INFO|dpif_port_add
>> 0.0/sec     0.000/sec        0.0000/sec   total: 6
>> 2015-05-18T12:13:04.491Z|00168|coverage|INFO|dpif_flow_flush
>> 0.0/sec     0.000/sec        0.0000/sec   total: 4
>> 2015-05-18T12:13:04.491Z|00169|coverage|INFO|dpif_flow_put
>> 0.0/sec     0.000/sec        0.0000/sec   total: 2
>> 2015-05-18T12:13:04.491Z|00170|coverage|INFO|dpif_flow_del
>> 0.0/sec     0.000/sec        0.0000/sec   total: 1
>> 2015-05-18T12:13:04.491Z|00171|coverage|INFO|dpif_execute
>> 0.0/sec     0.000/sec        0.0000/sec   total: 1
>> 2015-05-18T12:13:04.491Z|00172|coverage|INFO|xlate_actions
>> 38.8/sec    27.317/sec       22.9731/sec   total: 507271
>> 2015-05-18T12:13:04.491Z|00173|coverage|INFO|rev_reconfigure
>> 0.0/sec     0.000/sec        0.0000/sec   total: 20
>> 2015-05-18T12:13:04.491Z|00174|coverage|INFO|rev_port_toggled
>> 0.0/sec     0.000/sec        0.0000/sec   total: 3
>> 2015-05-18T12:13:04.491Z|00175|coverage|INFO|rev_flow_table
>> 0.0/sec     0.000/sec        0.0000/sec   total: 11
>> 2015-05-18T12:13:04.491Z|00176|coverage|INFO|rev_mac_learning
>> 0.6/sec     0.283/sec        0.1286/sec   total: 2813
>> 2015-05-18T12:13:04.491Z|00177|coverage|INFO|ofproto_flush
>> 0.0/sec     0.000/sec        0.0000/sec   total: 3
>> 2015-05-18T12:13:04.491Z|00178|coverage|INFO|ofproto_recv_openflow
>> 1.8/sec     1.500/sec        1.4300/sec   total: 31229
>> 2015-05-18T12:13:04.491Z|00179|coverage|INFO|ofproto_update_port
>> 0.0/sec     0.000/sec        0.0000/sec   total: 34
>> 2015-05-18T12:13:04.491Z|00180|coverage|INFO|bridge_reconfigure
>> 0.0/sec     0.000/sec        0.0000/sec   total: 15
>> 2015-05-18T12:13:04.491Z|00181|coverage|INFO|49 events never hit
>> 2015-05-18T14:59:05.014Z|00182|timeval|WARN|Unreasonably long 2019ms poll
>> interval (43448ms user, 50465ms system)
>> 2015-05-18T14:59:05.014Z|00183|timeval|WARN|faults: 840 minor, 0 major
>> 2015-05-18T14:59:05.014Z|00184|timeval|WARN|disk: 0 reads, 120 writes
>> 2015-05-18T14:59:05.014Z|00185|timeval|WARN|context switches: 2167382
>> voluntary, 529 involuntary
>> 2015-05-18T14:59:05.014Z|00186|coverage|INFO|Skipping details of duplicate
>> event coverage for hash=9e13ffc8
>> 2015-05-18T17:46:50.578Z|00187|timeval|WARN|Unreasonably long 2018ms poll
>> interval (56724ms user, 65745ms system)
>> 2015-05-18T17:46:50.578Z|00188|timeval|WARN|faults: 841 minor, 0 major
>> 2015-05-18T17:46:50.578Z|00189|timeval|WARN|disk: 0 reads, 128 writes
>> 2015-05-18T17:46:50.578Z|00190|timeval|WARN|context switches: 2853218
>> voluntary, 538 involuntary
>> 2015-05-18T17:46:50.578Z|00191|coverage|INFO|Skipping details of duplicate
>> event coverage for hash=9e13ffc8
>> 2015-05-18T20:29:36.092Z|00192|timeval|WARN|Unreasonably long 2018ms poll
>> interval (69561ms user, 80775ms system)
>> 2015-05-18T20:29:36.092Z|00193|timeval|WARN|faults: 841 minor, 0 major
>> 2015-05-18T20:29:36.092Z|00194|timeval|WARN|disk: 0 reads, 136 writes
>> 2015-05-18T20:29:36.092Z|00195|timeval|WARN|context switches: 3519924
>> voluntary, 544 involuntary
>> 2015-05-18T20:29:36.092Z|00196|coverage|INFO|Skipping details of duplicate
>> event coverage for hash=9e13ffc8
>> 2015-05-18T23:21:51.729Z|00197|timeval|WARN|Unreasonably long 2021ms poll
>> interval (83029ms user, 96728ms system)
>> 2015-05-18T23:21:51.729Z|00198|timeval|WARN|faults: 841 minor, 0 major
>> 2015-05-18T23:21:51.729Z|00199|timeval|WARN|disk: 0 reads, 144 writes
>> 2015-05-18T23:21:51.729Z|00200|timeval|WARN|context switches: 4224452
>> voluntary, 553 involuntary
>> 2015-05-18T23:21:51.729Z|00201|coverage|INFO|Skipping details of duplicate
>> event coverage for hash=9e13ffc8
>>
>> ________________________________
>>
>> WanLi Yang
>>
>> JD.COM
>>
>> *****************************************************************
>>
>>
>> From: Joe Stringer
>> Date: 2015-05-23 05:36
>> To: yangwanli
>> CC: bugs
>> Subject: Re: [ovs-discuss] |timeval|WARN|Unreasonably long 2020ms poll
>> interval (365882ms user, 464906ms system)
>> On 19 May 2015 at 03:32, yangwanli <yangwanli at jd.com> wrote:
>>> Hi,
>>> I'm a cloud computing R&D engineer. We use OVS as the virtual switch; all
>>> virtual machines and containers communicate through OVS.
>>> We use OVS VLANs.
>>> Open vSwitch version 2.3.1 (unmodified)
>>> CentOS 6.6
>>> While using it, I have encountered the following problems.
>>> I need your help!
>>>
>>> 1. ovs-vswitchd.log constantly reports warnings like these:
>>>
>>> 2015-05-17T00:52:50.463Z|00575|timeval|WARN|Unreasonably long 2020ms poll
>>> interval (365882ms user, 464906ms system)
>>> 2015-05-17T00:52:50.463Z|00576|timeval|WARN|faults: 708 minor, 0 major
>>> 2015-05-17T00:52:50.463Z|00577|timeval|WARN|disk: 0 reads, 392 writes
>>> 2015-05-17T00:52:50.463Z|00578|timeval|WARN|context switches: 19571142
>>> voluntary, 2080 involuntary
>>> 2015-05-17T00:52:50.463Z|00579|coverage|INFO|Skipping details of
>>> duplicate
>>> event coverage for hash=35d02982
>>> 2015-05-17T03:26:36.336Z|00580|timeval|WARN|Unreasonably long 2020ms poll
>>> interval (380306ms user, 483259ms system)
>>> 2015-05-17T03:26:36.336Z|00581|timeval|WARN|faults: 708 minor, 0 major
>>> 2015-05-17T03:26:36.336Z|00582|timeval|WARN|disk: 0 reads, 400 writes
>>> 2015-05-17T03:26:36.336Z|00583|timeval|WARN|context switches: 20299734
>>> voluntary, 2174 involuntary
>>> 2015-05-17T03:26:36.336Z|00584|coverage|INFO|Skipping details of
>>> duplicate
>>> event coverage for hash=35d02982
>>
>> The main loop of ovs-vswitchd usually expects to cycle at least once
>> per second. This message appears when this takes longer than usual,
>> which may indicate that ovs-vswitchd is under heavy load, or some sort
>> of bug.
>>
>>>
>>> 2. I also encounter this problem:
>>>     when I execute
>>>     ovs-ofctl dump-flows br-int
>>>    it does not always respond, but ovs-vsctl show is OK!
>>
>> The main thread of ovs-vswitchd assembles responses to OpenFlow
>> requests, and given the above messages it sounds like it's busy doing
>> other things. Ovs-vsctl on the other hand talks to ovsdb-server, which
>> looks like it is OK in this scenario.
>>
>> These both seem like symptoms, but it's not obvious to me what the
>> underlying cause is. Is there anything else in the logs? Is
>> ovs-vswitchd using a lot of CPU?
>>
>> Are you using the OVS kernel module from CentOS or one distributed
>> with the openvswitch package? ('modinfo openvswitch' should give a
>> hint here)
>>