[ovs-discuss] High ovs-vswitchd CPU usage cause Neutron VRRP switch

yedc yedongcan at awcloud.com
Tue Jul 27 07:01:31 UTC 2021


Hi OVS-team,

We had just met a problem in our customer environments, Neutron HA router active/standby had switched after high ovs-vswitchd cpu usage.

From the ovs-vswitchd log:
2021-07-22T23:35:00.282Z|420006320|bridge|INFO|bridge br-int: deleted interface tapb3eee529-02 on port 1017
2021-07-22T23:35:00.318Z|420006321|bridge|INFO|bridge br-int: added interface tapc257bed3-fc on port 1024
2021-07-22T23:35:00.366Z|420006322|bridge|INFO|bridge br-int: added interface tap7c1963c9-32 on port 934
2021-07-22T23:35:00.403Z|420006323|bridge|INFO|bridge br-int: added interface tap5de0f4ed-70 on port 1094
2021-07-22T23:35:00.441Z|420006324|bridge|INFO|bridge br-int: added interface tap76421ff2-e1 on port 990
2021-07-22T23:35:00.487Z|420006325|bridge|INFO|bridge br-int: added interface tap71a4dae8-75 on port 1084
2021-07-22T23:35:00.516Z|420006326|bridge|INFO|bridge br-int: added interface tapd3177fe4-19 on port 956
2021-07-22T23:35:00.558Z|420006327|bridge|INFO|bridge br-int: added interface tapb3eee529-02 on port 1017
2021-07-22T23:35:00.880Z|420006328|bridge|INFO|bridge br-int: deleted interface tap64c1f4fe-8b on port 1024
2021-07-22T23:35:00.881Z|420006329|bridge|INFO|bridge br-int: deleted interface tap29b6ec20-65 on port 990
2021-07-22T23:35:00.882Z|420006330|bridge|INFO|bridge br-int: deleted interface tap7f5e65fd-59 on port 1084
2021-07-22T23:35:00.883Z|420006331|bridge|INFO|bridge br-int: deleted interface tape3046139-8c on port 934
2021-07-22T23:35:00.884Z|420006332|bridge|INFO|bridge br-int: deleted interface tap9eba01c2-54 on port 1017
2021-07-22T23:35:00.885Z|420006333|bridge|INFO|bridge br-int: deleted interface tap26652576-4e on port 1094
2021-07-22T23:35:00.885Z|420006334|bridge|INFO|bridge br-int: deleted interface tap7799343f-22 on port 956
2021-07-22T23:35:00.983Z|420006335|bridge|INFO|bridge br-int: added interface tap64c1f4fe-8b on port 1024
2021-07-22T23:35:01.056Z|420006336|bridge|INFO|bridge br-int: added interface tap29b6ec20-65 on port 990
2021-07-22T23:35:01.101Z|420006337|bridge|INFO|bridge br-int: added interface tap7f5e65fd-59 on port 1084
2021-07-22T23:35:01.206Z|420006338|bridge|INFO|bridge br-int: added interface tape3046139-8c on port 934
2021-07-22T23:35:01.297Z|420006339|bridge|INFO|bridge br-int: added interface tap9eba01c2-54 on port 1017
2021-07-22T23:35:01.657Z|420006340|bridge|INFO|bridge br-int: added interface tap26652576-4e on port 1094
2021-07-22T23:35:01.733Z|02155|ofproto_dpif_upcall(handler228)|WARN|Dropped 9793 log messages in last 62 seconds (most recently, 2 seconds ago) due to excessive rate
2021-07-22T23:35:01.733Z|02156|ofproto_dpif_upcall(handler228)|WARN|upcall: datapath flow limit reached
2021-07-22T23:35:02.395Z|420006341|bridge|INFO|bridge br-int: added interface tap7799343f-22 on port 956
2021-07-22T23:35:03.363Z|00328|ovs_rcu(urcu9)|WARN|blocked 1000 ms waiting for handler196 to quiesce
2021-07-22T23:35:03.398Z|420006342|ovs_rcu|WARN|blocked 1000 ms waiting for handler196 to quiesce
2021-07-22T23:35:03.413Z|138790|timeval(handler196)|WARN|Unreasonably long 1049ms poll interval (0ms user, 763ms system)
2021-07-22T23:35:03.413Z|138791|timeval(handler196)|WARN|context switches: 236 voluntary, 107 involuntary
2021-07-22T23:35:03.413Z|138792|coverage(handler196)|INFO|Skipping details of duplicate event coverage for hash=ec24845d
2021-07-22T23:35:05.222Z|420006343|ovs_rcu|WARN|blocked 1000 ms waiting for handler196 to quiesce
2021-07-22T23:35:05.224Z|00329|ovs_rcu(urcu9)|WARN|blocked 1001 ms waiting for handler196 to quiesce
2021-07-22T23:35:05.492Z|138793|timeval(handler196)|WARN|Unreasonably long 1272ms poll interval (0ms user, 982ms system)
2021-07-22T23:35:05.492Z|138794|timeval(handler196)|WARN|context switches: 240 voluntary, 104 involuntary
2021-07-22T23:35:05.492Z|138795|coverage(handler196)|INFO|Skipping details of duplicate event coverage for hash=ec24845d
2021-07-22T23:35:05.492Z|138796|poll_loop(handler196)|INFO|Dropped 1088 log messages in last 6 seconds (most recently, 4 seconds ago) due to excessive rate
2021-07-22T23:35:05.492Z|138797|poll_loop(handler196)|INFO|wakeup due to 0-ms timeout at ofproto/ofproto-dpif-upcall.c:752 (79% CPU usage)
2021-07-22T23:35:09.472Z|138798|poll_loop(handler196)|INFO|Dropped 6 log messages in last 4 seconds (most recently, 1 seconds ago) due to excessive rate
2021-07-22T23:35:09.473Z|138799|poll_loop(handler196)|INFO|wakeup due to 0-ms timeout at ofproto/ofproto-dpif-upcall.c:752 (78% CPU usage)
2021-07-22T23:35:09.473Z|138800|dpif_netlink(handler196)|WARN|system at ovs-system: lost packet on port channel 44 of handler 2
2021-07-22T23:35:09.501Z|420006344|bridge|INFO|bridge br-int: deleted interface tapc257bed3-fc on port 1024
2021-07-22T23:35:10.473Z|00330|ovs_rcu(urcu9)|WARN|blocked 1000 ms waiting for handler196 to quiesce
2021-07-22T23:35:10.501Z|420006345|ovs_rcu|WARN|blocked 1000 ms waiting for handler196 to quiesce
2021-07-22T23:35:11.473Z|00331|ovs_rcu(urcu9)|WARN|blocked 2000 ms waiting for handler196 to quiesce

In the time 23:35:05(UTC time), the handler196 thread cpu usage has high value 79%. Before that time the log displays blocked handler threads and involuntary context switches.
Also there is strange behavior, different tap devices added or deleted on the same ovs ofport, the same tap device occurs 50+ times in 1 minutes:
bridge br-int: added interface tapc257bed3-fc on port 1024
bridge br-int: deleted interface tap64c1f4fe-8b on port 1024

It’s no cule to find why the tap device continually added and deleted on the same ovs ofport, it seems had lasted for more than one month.


And then we observed in /var/log/messages, the keepalived VRRP switched:
Jul 23 07:35:06 NFDW5-TSTACK-NET-01 Keepalived_vrrp[64451]: VRRP_Instance(VR_1) Transition to MASTER STATE
Jul 23 07:35:06 NFDW5-TSTACK-NET-01 Keepalived_vrrp[69471]: VRRP_Instance(VR_1) Transition to MASTER STATE
Jul 23 07:35:06 NFDW5-TSTACK-NET-01 Keepalived_vrrp[66825]: VRRP_Instance(VR_1) Transition to MASTER STATE
Jul 23 07:35:06 NFDW5-TSTACK-NET-01 Keepalived_vrrp[55727]: VRRP_Instance(VR_1) Transition to MASTER STATE
Jul 23 07:35:06 NFDW5-TSTACK-NET-01 Keepalived_vrrp[55735]: VRRP_Instance(VR_1) Transition to MASTER STATE
Jul 23 07:35:06 NFDW5-TSTACK-NET-01 Keepalived_vrrp[55709]: VRRP_Instance(VR_1) Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[66784]: VRRP_Instance(VR_1) Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[68081]: VRRP_Instance(VR_192) Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[69471]: VRRP_Instance(VR_1) Received advert with higher priority 50, ours 50
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[69471]: VRRP_Instance(VR_1) Entering BACKUP STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[53128]: VRRP_Instance(VR_1) Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[64102]: VRRP_Instance(VR_1) Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[72616]: VRRP_Instance(VR_1) Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[66712]: VRRP_Instance(VR_1) Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[70117]: VRRP_Instance(VR_1) Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[64451]: VRRP_Instance(VR_1) Received advert with higher priority 50, ours 50
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[64451]: VRRP_Instance(VR_1) Entering BACKUP STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[63906]: VRRP_Instance(VR_1) Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[58249]: VRRP_Instance(VR_1) Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[186176]: VRRP_Instance(VR_220) Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[67535]: VRRP_Instance(VR_1) Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[58302]: VRRP_Instance(VR_1) Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[61177]: VRRP_Instance(VR_1) Transition to MASTER STATE
Jul 23 07:35:07 NFDW5-TSTACK-NET-01 Keepalived_vrrp[72059]: VRRP_Instance(VR_1) Transition to MASTER STATE


After keepalived VRRP switched, there are losts of lost packet and flow limit reached logs:
|WARN|system at ovs-system: lost packet on port channel
|WARN|upcall: datapath flow limit reached 




-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20210727/0d29d669/attachment.html>


More information about the discuss mailing list