[ovs-discuss] [OVN] ovn-controller takes 100% cpu while no changes in sb-db
Han Zhou
zhouhan at gmail.com
Fri Aug 7 22:38:30 UTC 2020
Changes to external-ids are no longer monitored by ovn-controller as of
version 20.06. Are you perhaps still running an older version?
On Fri, Aug 7, 2020 at 3:11 PM Tony Liu <tonyliu0592 at hotmail.com> wrote:
> Are the changes made by the Neutron liveness check in the SB DB properly
> handled by inc_proc_eng?
>
> Given my testing scale, even with the frequency lowered by that fix,
> ovn-controller still takes quite a lot of CPU and time to recompute from
> the SB DB.
>
>
> Thanks!
>
> Tony
> > -----Original Message-----
> > From: Tony Liu <tonyliu0592 at hotmail.com>
> > Sent: Friday, August 7, 2020 1:25 PM
> > To: Tony Liu <tonyliu0592 at hotmail.com>; Han Zhou <zhouhan at gmail.com>
> > Cc: ovs-dev <ovs-dev at openvswitch.org>; ovs-discuss
> > <ovs-discuss at openvswitch.org>
> > Subject: RE: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while no
> > changes in sb-db
> >
> > Thanks for the hint!
> > ========
> > 2020-08-07T19:44:18.019Z|616614|jsonrpc|DBG|tcp:10.6.20.84:6642:
> > received notification, method="update3",
> > params=[["monid","OVN_Southbound"],"5002cb22-13e5-490a-9a64-5d48914138ca",
> > {"Chassis":{"0390b621-152b-48a0-a3d1-2973c0b823cc":{"modify":
> > {"external_ids":["map",[["neutron:liveness_check_at",
> > "2020-08-07T19:44:07.130233+00:00"]]]}}}}]
> > ========
> >
> > Nailed it...
> > https://bugs.launchpad.net/neutron/+bug/1883554
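The update3 notification above can be decoded mechanically when triaging which table and columns are churning. A minimal sketch (the helper name and the handling of the params layout are mine, based only on the message shown in this thread, not on any official parser):

```python
import json

def summarize_update3(params):
    """List (table, row-uuid, operation, columns) for an OVSDB 'update3'
    notification, so churny rows stand out when reading debug logs.
    params = [monitor-id, last-transaction-id, table-updates]."""
    changes = []
    for table, rows in params[2].items():
        for row_uuid, ops in rows.items():
            # ops maps "insert"/"modify"/"delete" to the affected columns.
            for op, columns in ops.items():
                changes.append((table, row_uuid, op, sorted(columns)))
    return changes

# The notification captured in the log above, reflowed as one JSON value.
raw = json.loads('''
[["monid", "OVN_Southbound"],
 "5002cb22-13e5-490a-9a64-5d48914138ca",
 {"Chassis": {"0390b621-152b-48a0-a3d1-2973c0b823cc":
   {"modify": {"external_ids": ["map",
     [["neutron:liveness_check_at", "2020-08-07T19:44:07.130233+00:00"]]]}}}}]
''')
print(summarize_update3(raw))
```

Run against the captured message, it reports a single `modify` of `external_ids` on one Chassis row, which is exactly the periodic Neutron liveness write.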
> >
> >
> > Tony
> > > -----Original Message-----
> > > From: dev <ovs-dev-bounces at openvswitch.org> On Behalf Of Tony Liu
> > > Sent: Friday, August 7, 2020 1:14 PM
> > > To: Han Zhou <zhouhan at gmail.com>
> > > Cc: ovs-dev <ovs-dev at openvswitch.org>; ovs-discuss
> > > <ovs-discuss at openvswitch.org>
> > > Subject: Re: [ovs-dev] [ovs-discuss] [OVN] ovn-controller takes 100%
> > > cpu while no changes in sb-db
> > >
> > >
> > > Here is the output.
> > > ================
> > > [root at gateway-1 ~]# docker exec ovn_controller ovs-appctl -t \
> > >     /run/ovn/ovn-controller.6.ctl coverage/show
> > > Event coverage, avg rate over last: 5 seconds, last minute, last hour,
> > > hash=e70a83c8:
> > > lflow_run 0.0/sec 0.083/sec 0.0725/sec
> > > total: 295
> > > miniflow_malloc 0.0/sec 44356.817/sec 44527.3975/sec
> > > total: 180635403
> > > hindex_pathological 0.0/sec 0.000/sec 0.0000/sec
> > > total: 7187
> > > hindex_expand 0.0/sec 0.000/sec 0.0000/sec
> > > total: 17
> > > hmap_pathological 0.0/sec 4.167/sec 4.1806/sec
> > > total: 25091
> > > hmap_expand 0.0/sec 5366.500/sec 5390.0800/sec
> > > total: 23680738
> > > txn_unchanged 0.0/sec 0.300/sec 0.3175/sec
> > > total: 11024
> > > txn_incomplete 0.0/sec 0.100/sec 0.0836/sec
> > > total: 974
> > > txn_success 0.0/sec 0.033/sec 0.0308/sec
> > > total: 129
> > > txn_try_again 0.0/sec 0.000/sec 0.0003/sec
> > > total: 1
> > > poll_create_node 0.4/sec 1.933/sec 1.9575/sec
> > > total: 55611
> > > poll_zero_timeout 0.0/sec 0.067/sec 0.0556/sec
> > > total: 241
> > > rconn_queued 0.0/sec 0.050/sec 0.0594/sec
> > > total: 1208720
> > > rconn_sent 0.0/sec 0.050/sec 0.0594/sec
> > > total: 1208720
> > > seq_change 0.2/sec 0.783/sec 0.7492/sec
> > > total: 13962
> > > pstream_open 0.0/sec 0.000/sec 0.0000/sec
> > > total: 1
> > > stream_open 0.0/sec 0.000/sec 0.0003/sec
> > > total: 5
> > > unixctl_received 0.0/sec 0.000/sec 0.0011/sec
> > > total: 4
> > > unixctl_replied 0.0/sec 0.000/sec 0.0011/sec
> > > total: 4
> > > util_xalloc 0.8/sec 1396586.967/sec 240916.6047/sec
> > > total: 5834154064
> > > vconn_open 0.0/sec 0.000/sec 0.0000/sec
> > > total: 2
> > > vconn_received 0.0/sec 0.050/sec 0.0594/sec
> > > total: 632
> > > vconn_sent 0.0/sec 0.050/sec 0.0494/sec
> > > total: 1213248
> > > netlink_received 0.0/sec 0.300/sec 0.2900/sec
> > > total: 1188
> > > netlink_recv_jumbo 0.0/sec 0.083/sec 0.0725/sec
> > > total: 296
> > > netlink_sent 0.0/sec 0.300/sec 0.2900/sec
> > > total: 1188
> > > cmap_expand 0.0/sec 0.000/sec 0.0000/sec
> > > total: 3
> > > 82 events never hit
> > > [root at gateway-1 ~]# docker exec ovn_controller ovs-appctl -t \
> > >     /run/ovn/ovn-controller.6.ctl coverage/show
> > > Event coverage, avg rate over last: 5 seconds, last minute, last hour,
> > > hash=d0107601:
> > > lflow_run 0.2/sec 0.083/sec 0.0717/sec
> > > total: 296
> > > miniflow_malloc 122834.2/sec 51180.917/sec 43930.2869/sec
> > > total: 181249574
> > > hindex_pathological 0.0/sec 0.000/sec 0.0000/sec
> > > total: 7187
> > > hindex_expand 0.0/sec 0.000/sec 0.0000/sec
> > > total: 17
> > > hmap_pathological 13.2/sec 4.967/sec 4.1264/sec
> > > total: 25157
> > > hmap_expand 14982.2/sec 6205.067/sec 5317.9547/sec
> > > total: 23755649
> > > txn_unchanged 1.4/sec 0.400/sec 0.3144/sec
> > > total: 11031
> > > txn_incomplete 0.4/sec 0.117/sec 0.0825/sec
> > > total: 976
> > > txn_success 0.2/sec 0.050/sec 0.0306/sec
> > > total: 130
> > > txn_try_again 0.0/sec 0.000/sec 0.0003/sec
> > > total: 1
> > > poll_create_node 7.6/sec 2.467/sec 1.9353/sec
> > > total: 55649
> > > poll_zero_timeout 0.4/sec 0.100/sec 0.0547/sec
> > > total: 243
> > > rconn_queued 0.4/sec 0.083/sec 0.0592/sec
> > > total: 1208722
> > > rconn_sent 0.4/sec 0.083/sec 0.0592/sec
> > > total: 1208722
> > > seq_change 2.2/sec 0.900/sec 0.7394/sec
> > > total: 13973
> > > pstream_open 0.0/sec 0.000/sec 0.0000/sec
> > > total: 1
> > > stream_open 0.0/sec 0.000/sec 0.0003/sec
> > > total: 5
> > > unixctl_received 0.2/sec 0.017/sec 0.0014/sec
> > > total: 5
> > > unixctl_replied 0.2/sec 0.017/sec 0.0014/sec
> > > total: 5
> > > util_xalloc 3870474.6/sec 1611767.833/sec 222119.2950/sec
> > > total: 5853506437
> > > vconn_open 0.0/sec 0.000/sec 0.0000/sec
> > > total: 2
> > > vconn_received 0.4/sec 0.083/sec 0.0592/sec
> > > total: 634
> > > vconn_sent 0.4/sec 0.083/sec 0.0492/sec
> > > total: 1213250
> > > netlink_received 0.8/sec 0.333/sec 0.2861/sec
> > > total: 1192
> > > netlink_recv_jumbo 0.2/sec 0.083/sec 0.0717/sec
> > > total: 297
> > > netlink_sent 0.8/sec 0.333/sec 0.2861/sec
> > > total: 1192
> > > cmap_expand 0.0/sec 0.000/sec 0.0000/sec
> > > total: 3
> > > 82 events never hit
> > > [root at gateway-1 ~]# docker exec ovn_controller ovs-appctl -t \
> > >     /run/ovn/ovn-controller.6.ctl coverage/show
> > > Event coverage, avg rate over last: 5 seconds, last minute, last hour,
> > > hash=069e0b25:
> > > lflow_run 0.2/sec 0.100/sec 0.0719/sec
> > > total: 297
> > > miniflow_malloc 122834.2/sec 61417.100/sec 44100.8900/sec
> > > total: 181863745
> > > hindex_pathological 0.0/sec 0.000/sec 0.0000/sec
> > > total: 7187
> > > hindex_expand 0.0/sec 0.000/sec 0.0000/sec
> > > total: 17
> > > hmap_pathological 10.0/sec 5.800/sec 4.1403/sec
> > > total: 25207
> > > hmap_expand 14756.6/sec 7434.783/sec 5338.4500/sec
> > > total: 23829432
> > > txn_unchanged 0.4/sec 0.433/sec 0.3150/sec
> > > total: 11033
> > > txn_incomplete 0.0/sec 0.117/sec 0.0825/sec
> > > total: 976
> > > txn_success 0.0/sec 0.050/sec 0.0306/sec
> > > total: 130
> > > txn_try_again 0.0/sec 0.000/sec 0.0003/sec
> > > total: 1
> > > poll_create_node 2.6/sec 2.650/sec 1.9389/sec
> > > total: 55662
> > > poll_zero_timeout 0.0/sec 0.100/sec 0.0547/sec
> > > total: 243
> > > rconn_queued 0.0/sec 0.083/sec 0.0592/sec
> > > total: 1208722
> > > rconn_sent 0.0/sec 0.083/sec 0.0592/sec
> > > total: 1208722
> > > seq_change 1.4/sec 1.000/sec 0.7414/sec
> > > total: 13980
> > > pstream_open 0.0/sec 0.000/sec 0.0000/sec
> > > total: 1
> > > stream_open 0.0/sec 0.000/sec 0.0003/sec
> > > total: 5
> > > unixctl_received 0.2/sec 0.033/sec 0.0017/sec
> > > total: 6
> > > unixctl_replied 0.2/sec 0.033/sec 0.0017/sec
> > > total: 6
> > > util_xalloc 3864890.0/sec 1933841.933/sec 227487.1978/sec
> > > total: 5872830887
> > > vconn_open 0.0/sec 0.000/sec 0.0000/sec
> > > total: 2
> > > vconn_received 0.0/sec 0.083/sec 0.0592/sec
> > > total: 634
> > > vconn_sent 0.0/sec 0.083/sec 0.0492/sec
> > > total: 1213250
> > > netlink_received 0.8/sec 0.400/sec 0.2872/sec
> > > total: 1196
> > > netlink_recv_jumbo 0.2/sec 0.100/sec 0.0719/sec
> > > total: 298
> > > netlink_sent 0.8/sec 0.400/sec 0.2872/sec
> > > total: 1196
> > > cmap_expand 0.0/sec 0.000/sec 0.0000/sec
> > > total: 3
> > > 82 events never hit
> > > [root at gateway-1 ~]# docker exec ovn_controller ovs-appctl -t \
> > >     /run/ovn/ovn-controller.6.ctl coverage/show
> > > Event coverage, avg rate over last: 5 seconds, last minute, last hour,
> > > hash=069e0b25:
> > > lflow_run 0.0/sec 0.083/sec 0.0719/sec
> > > total: 297
> > > miniflow_malloc 0.0/sec 51180.917/sec 44100.8900/sec
> > > total: 181863745
> > > hindex_pathological 0.0/sec 0.000/sec 0.0000/sec
> > > total: 7187
> > > hindex_expand 0.0/sec 0.000/sec 0.0000/sec
> > > total: 17
> > > hmap_pathological 2.4/sec 4.967/sec 4.1436/sec
> > > total: 25219
> > > hmap_expand 171.8/sec 6205.350/sec 5338.6886/sec
> > > total: 23830291
> > > txn_unchanged 1.2/sec 0.433/sec 0.3167/sec
> > > total: 11039
> > > txn_incomplete 0.0/sec 0.100/sec 0.0825/sec
> > > total: 976
> > > txn_success 0.0/sec 0.033/sec 0.0306/sec
> > > total: 130
> > > txn_try_again 0.0/sec 0.000/sec 0.0003/sec
> > > total: 1
> > > poll_create_node 4.6/sec 2.583/sec 1.9453/sec
> > > total: 55685
> > > poll_zero_timeout 0.0/sec 0.067/sec 0.0547/sec
> > > total: 243
> > > rconn_queued 0.2/sec 0.083/sec 0.0594/sec
> > > total: 1208723
> > > rconn_sent 0.2/sec 0.083/sec 0.0594/sec
> > > total: 1208723
> > > seq_change 1.0/sec 0.933/sec 0.7428/sec
> > > total: 13985
> > > pstream_open 0.0/sec 0.000/sec 0.0000/sec
> > > total: 1
> > > stream_open 0.0/sec 0.000/sec 0.0003/sec
> > > total: 5
> > > unixctl_received 0.2/sec 0.050/sec 0.0019/sec
> > > total: 7
> > > unixctl_replied 0.2/sec 0.050/sec 0.0019/sec
> > > total: 7
> > > util_xalloc 4345.0/sec 1611785.933/sec 227493.2325/sec
> > > total: 5872852612
> > > vconn_open 0.0/sec 0.000/sec 0.0000/sec
> > > total: 2
> > > vconn_received 0.2/sec 0.083/sec 0.0594/sec
> > > total: 635
> > > vconn_sent 0.2/sec 0.083/sec 0.0494/sec
> > > total: 1213251
> > > netlink_received 0.0/sec 0.333/sec 0.2872/sec
> > > total: 1196
> > > netlink_recv_jumbo 0.0/sec 0.083/sec 0.0719/sec
> > > total: 298
> > > netlink_sent 0.0/sec 0.333/sec 0.2872/sec
> > > total: 1196
> > > cmap_expand 0.0/sec 0.000/sec 0.0000/sec
> > > total: 3
> > > 82 events never hit
> > > ================
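When eyeballing several coverage/show snapshots like the ones above, diffing the `total:` counters makes the movers obvious. A small sketch (the parser is written against the two-line "name rates / total: N" layout pasted in this thread, not against any guaranteed output format):

```python
import re

def parse_coverage(text):
    """Parse `coverage/show` output into {event: total}, assuming the
    two-line 'name rates...' / 'total: N' layout seen in this thread."""
    counters, event = {}, None
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r'([a-z_]\w*)\s+[\d.]+/sec', line)
        if m:
            event = m.group(1)
            continue
        m = re.match(r'total:\s*(\d+)', line)
        if m and event:
            counters[event] = int(m.group(1))
            event = None
    return counters

def coverage_delta(before, after):
    """Counters that advanced between two snapshots, biggest first."""
    moved = {k: after[k] - before[k]
             for k in after if k in before and after[k] != before[k]}
    return sorted(moved.items(), key=lambda kv: -kv[1])

# Two counters excerpted from the first and second runs above.
snap1 = """miniflow_malloc  0.0/sec 44356.817/sec 44527.3975/sec
total: 180635403
hmap_expand      0.0/sec  5366.500/sec  5390.0800/sec
total: 23680738"""
snap2 = """miniflow_malloc  122834.2/sec 51180.917/sec 43930.2869/sec
total: 181249574
hmap_expand      14982.2/sec  6205.067/sec  5317.9547/sec
total: 23755649"""
print(coverage_delta(parse_coverage(snap1), parse_coverage(snap2)))
```

On the excerpted counters this shows miniflow_malloc advancing by ~614k and hmap_expand by ~75k between runs, which matches the flow recomputation the thread is chasing.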
> > >
> > >
> > > Thanks!
> > >
> > > Tony
> > > > -----Original Message-----
> > > > From: Han Zhou <zhouhan at gmail.com>
> > > > Sent: Friday, August 7, 2020 1:09 PM
> > > > To: Tony Liu <tonyliu0592 at hotmail.com>
> > > > Cc: ovs-discuss <ovs-discuss at openvswitch.org>; ovs-dev
> > > > <ovs-dev at openvswitch.org>
> > > > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu while
> > > > no changes in sb-db
> > > >
> > > >
> > > >
> > > > On Fri, Aug 7, 2020 at 12:57 PM Tony Liu <tonyliu0592 at hotmail.com> wrote:
> > > >
> > > >
> > > > Enabled debug logging, there are tons of messages.
> > > > Note there are 4353 datapath bindings and 13078 port bindings in SB,
> > > > and 4097 LS, 8470 LSP, 256 LR, and 4352 LRP in NB. Every 16 LS connect
> > > > to a router. All routers connect to the external network.
> > > >
> > > > The ovn-controller on the compute node is fine. The ovn-controller on
> > > > the gateway node is taking 100% CPU. Is it perhaps related to the
> > > > ports on the external network? Any specific messages I should check?
> > > >
> > > > Any hint to look into it is appreciated!
> > > >
> > > >
> > > >
> > > >
> > > > If it happens only on the gateway, it may be related to ARP handling.
> > > > Could you share the output of ovn-appctl -t ovn-controller
> > > > coverage/show, with 2 - 3 runs at a 5s interval?
> > > > For the debug log, I'd first check whether there is any OVSDB
> > > > notification from the SB DB, and if so, what the changes are.
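To capture those notifications, the jsonrpc module's logging can be raised to debug at runtime. A sketch of the commands, assuming the containerized control-socket path used elsewhere in this thread (adjust both to your deployment):

```shell
# Raise jsonrpc logging to debug so SB update3 notifications appear in
# the ovn-controller log file. Revert with jsonrpc:file:info when done.
docker exec ovn_controller ovs-appctl -t /run/ovn/ovn-controller.6.ctl \
    vlog/set jsonrpc:file:dbg
```

Debug-level jsonrpc logging is very verbose at this scale, so it is worth turning it back down as soon as the offending notification is captured.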
> > > >
> > > >
> > > >
> > > > Thanks!
> > > >
> > > > Tony
> > > > > -----Original Message-----
> > > > > From: Han Zhou <zhouhan at gmail.com>
> > > > > Sent: Friday, August 7, 2020 12:39 PM
> > > > > To: Tony Liu <tonyliu0592 at hotmail.com>
> > > > > Cc: ovs-discuss <ovs-discuss at openvswitch.org>; ovs-dev
> > > > > <ovs-dev at openvswitch.org>
> > > > > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu
> > > > > while no changes in sb-db
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Aug 7, 2020 at 12:35 PM Tony Liu <tonyliu0592 at hotmail.com> wrote:
> > > > >
> > > > >
> > > > > Inline...
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Tony
> > > > > > -----Original Message-----
> > > > > > From: Han Zhou <zhouhan at gmail.com>
> > > > > > Sent: Friday, August 7, 2020 12:29 PM
> > > > > > To: Tony Liu <tonyliu0592 at hotmail.com>
> > > > > > Cc: ovs-discuss <ovs-discuss at openvswitch.org>; ovs-dev
> > > > > > <ovs-dev at openvswitch.org>
> > > > > > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu
> > > > > > while no changes in sb-db
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Aug 7, 2020 at 12:19 PM Tony Liu <tonyliu0592 at hotmail.com> wrote:
> > > > > >
> > > > > >
> > > > > > ovn-controller is using a UNIX socket to connect to the local
> > > > > > ovsdb-server.
> > > > > >
> > > > > > From the log you were showing, you were using tcp:127.0.0.1:6640
> > > > >
> > > > > Sorry, what I meant was: given your advice, I just made the change
> > > > > for ovn-controller to use a UNIX socket.
> > > > >
> > > > >
> > > > >
> > > > > Oh, I see, no worries.
> > > > >
> > > > >
> > > > > > to connect the local ovsdb.
> > > > > > > 2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640:
> > > > > > > connection dropped (Broken pipe)
> > > > > >
> > > > > >
> > > > > > Inactivity probe doesn't seem to be the cause of high CPU usage.
> > > > > >
> > > > > > The wakeup on the connection to sb-db is always followed by an
> > > > > > "unreasonably long" warning. I guess the pollin event loop is
> > > > > > stuck for too long, like 10s as below.
> > > > > > ========
> > > > > > 2020-08-07T18:46:49.301Z|00296|poll_loop|INFO|wakeup due to [POLLIN]
> > > > > > on fd 19 (10.6.20.91:60712<->10.6.20.86:6642) at lib/stream-fd.c:157
> > > > > > (99% CPU usage)
> > > > > > 2020-08-07T18:46:59.460Z|00297|timeval|WARN|Unreasonably long
> > > > > > 10153ms poll interval (10075ms user, 1ms system)
> > > > > > ========
> > > > > >
> > > > > > Could that stuck loop be the cause of the high CPU usage?
> > > > > > What is it polling on?
> > > > > > Why is it stuck waiting for a message from sb-db?
> > > > > > Isn't it supposed to release the CPU while waiting?
> > > > > >
> > > > > >
> > > > > >
> > > > > > This log means there are messages received from 10.6.20.86:6642
> > > > > > (the SB DB). Is there an SB change? The CPU is spent on handling
> > > > > > the SB change. Some types of SB changes are not handled
> > > > > > incrementally.
> > > > >
> > > > > SB updates are driven by ovn-northd when anything changes in the
> > > > > NB, and by ovn-controller when anything changes on a chassis. No,
> > > > > nothing has changed in the NB, nor on any chassis.
> > > > >
> > > > > Should I bump the logging level up to dbg? Is that going to show me
> > > > > what messages ovn-controller is handling?
> > > > >
> > > > >
> > > > >
> > > > > Yes, debug log should show the details.
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > Tony
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Han Zhou <zhouhan at gmail.com>
> > > > > > > Sent: Friday, August 7, 2020 10:32 AM
> > > > > > > To: Tony Liu <tonyliu0592 at hotmail.com>
> > > > > > > Cc: ovs-discuss <ovs-discuss at openvswitch.org>; ovs-dev
> > > > > > > <ovs-dev at openvswitch.org>
> > > > > > > Subject: Re: [ovs-discuss] [OVN] ovn-controller takes 100% cpu
> > > > > > > while no changes in sb-db
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Aug 7, 2020 at 10:05 AM Tony Liu <tonyliu0592 at hotmail.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Here are some logging snippets from ovn-controller.
> > > > > > > ========
> > > > > > > 2020-08-07T16:38:04.020Z|29250|timeval|WARN|Unreasonably long
> > > > > > > 8954ms poll interval (8895ms user, 0ms system)
> > > > > > > ========
> > > > > > > What does that mean? Is it harmless?
> > > > > > >
> > > > > > > ========
> > > > > > > 2020-08-07T16:38:04.021Z|29251|timeval|WARN|context switches: 0
> > > > > > > voluntary, 6 involuntary
> > > > > > > 2020-08-07T16:38:04.022Z|29252|poll_loop|INFO|wakeup due to [POLLIN]
> > > > > > > on fd 19 (10.6.20.91:60398<->10.6.20.86:6642) at lib/stream-fd.c:157
> > > > > > > (99% CPU usage)
> > > > > > > ========
> > > > > > > Is this wakeup caused by changes in sb-db?
> > > > > > > Why is ovn-controller so busy?
> > > > > > >
> > > > > > > ========
> > > > > > > 2020-08-07T16:38:04.022Z|29253|reconnect|WARN|tcp:127.0.0.1:6640:
> > > > > > > connection dropped (Broken pipe)
> > > > > > > ========
> > > > > > > The connection to the local ovsdb-server is dropped.
> > > > > > > Is this caused by the timeout of the inactivity probe?
> > > > > > >
> > > > > > > ========
> > > > > > > 2020-08-07T16:38:04.035Z|29254|poll_loop|INFO|wakeup due to [POLLIN]
> > > > > > > on fd 20 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157
> > > > > > > (99% CPU usage)
> > > > > > > ========
> > > > > > > What causes this wakeup?
> > > > > > >
> > > > > > > ========
> > > > > > > 2020-08-07T16:38:04.048Z|29255|poll_loop|INFO|wakeup due to 0-ms
> > > > > > > timeout at lib/ovsdb-idl.c:5391 (99% CPU usage)
> > > > > > > ========
> > > > > > > What does this 0-ms wakeup mean?
> > > > > > >
> > > > > > > ========
> > > > > > > 2020-08-07T16:38:05.022Z|29256|poll_loop|INFO|wakeup due to 962-ms
> > > > > > > timeout at lib/reconnect.c:643 (99% CPU usage)
> > > > > > > 2020-08-07T16:38:05.023Z|29257|reconnect|INFO|tcp:127.0.0.1:6640:
> > > > > > > connecting...
> > > > > > > 2020-08-07T16:38:05.041Z|29258|poll_loop|INFO|wakeup due to
> > > > > > > [POLLOUT] on fd 14 (127.0.0.1:51478<->127.0.0.1:6640) at
> > > > > > > lib/stream-fd.c:153 (99% CPU usage)
> > > > > > > 2020-08-07T16:38:05.041Z|29259|reconnect|INFO|tcp:127.0.0.1:6640:
> > > > > > > connected
> > > > > > > ================
> > > > > > > Retrying the connection to the local ovsdb-server. A POLLOUT
> > > > > > > event is triggered right after the connection is established.
> > > > > > > What's POLLOUT?
> > > > > > >
> > > > > > > ovn-controller is taking 100% CPU now, and there are no changes
> > > > > > > in sb-db (not busy). It seems that it's busy with the local
> > > > > > > ovsdb-server or vswitchd. I'd like to understand why
> > > > > > > ovn-controller is so busy. All inactivity probe intervals are
> > > > > > > set to 30s.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Is there any change from the local ovsdb? You can enable dbg
> > > > > > > log to see what is happening.
> > > > > > > For the local ovsdb probe, I mentioned in the other thread that
> > > > > > > a UNIX socket is recommended (instead of tcp 127.0.0.1). Using a
> > > > > > > UNIX socket disables the probe by default.
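That switch can be sketched as below. The socket path is an assumption that varies by packaging (e.g. /var/run/openvswitch on some installs), so adjust it to your deployment:

```shell
# Point ovn-controller at the local ovsdb-server over a UNIX socket
# instead of tcp:127.0.0.1:6640; inactivity probes are disabled by
# default on UNIX sockets. Path below is an assumption -- adjust it.
ovn-controller unix:/run/openvswitch/db.sock
```

With the TCP transport, a missed inactivity probe drops and re-establishes the connection, which is the "Broken pipe" / "connecting..." churn visible in the logs earlier in this thread.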
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Han
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > > _______________________________________________
> > > dev mailing list
> > > dev at openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>