[ovs-discuss] ovn-controller is taking 100% CPU all the time in one deployment
mmichels at redhat.com
Fri Aug 30 13:44:45 UTC 2019
On 8/30/19 5:39 AM, Daniel Alvarez Sanchez wrote:
> On Thu, Aug 29, 2019 at 10:01 PM Mark Michelson <mmichels at redhat.com> wrote:
>> On 8/29/19 2:39 PM, Numan Siddique wrote:
>>> Hello Everyone,
>>> In one of the OVN deployments, we are seeing 100% CPU usage by
>>> ovn-controllers all the time.
>>> After investigating, we found the following:
>>> - ovn-controller is taking more than 20 seconds to complete a full loop
>>> (mainly in the lflow_run() function).
>>> - The physical switch is sending GARPs periodically every 10 seconds.
>>> - ovn-bridge-mappings is configured, and these GARP packets
>>> reach br-int via the patch port.
>>> - We have a flow in the router pipeline which applies the put_arp
>>> action if the packet is an ARP packet.
>>> - The ovn-controller pinctrl thread receives these GARPs, stores the
>>> learnt MAC-IP bindings in the 'put_mac_bindings' hmap and notifies the
>>> ovn-controller main thread by incrementing the seq number.
>>> - In the ovn-controller main thread, after lflow_run() finishes,
>>> pinctrl_wait() is called. This function calls poll_immediate_wake()
>>> because the 'put_mac_bindings' hmap is not empty.
>>> - This causes the ovn-controller poll_block() to not sleep at all, and
>>> this repeats all the time, resulting in 100% CPU usage.
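To make the sequence above concrete, here is a toy model of the main loop. It is only a sketch under simplifying assumptions, not ovn-controller code: simulate() and its parameters are made-up names, a single boolean stands in for a non-empty 'put_mac_bindings' hmap, and the only wakeup source modelled is a GARP arrival.

```python
def simulate(loop_seconds, garp_interval, total_seconds):
    """Return (iterations, seconds_asleep) for a toy main loop.

    Hypothetical model: 'pending' stands in for a non-empty
    'put_mac_bindings' hmap, and GARPs are the only wakeup source.
    """
    now = 0.0
    next_garp = garp_interval  # arrival time of the next GARP
    pending = False
    iterations = 0
    asleep = 0.0
    while now < total_seconds:
        iterations += 1
        # The main thread writes the pending bindings to the SB DB and
        # flushes the table.
        pending = False
        # lflow_run(): the expensive part of the iteration.
        now += loop_seconds
        # GARPs that arrived in the meantime were queued by the pinctrl thread.
        while next_garp <= now:
            pending = True
            next_garp += garp_interval
        # pinctrl_wait(): a non-empty table means poll_immediate_wake(), so
        # poll_block() returns at once. Otherwise sleep until the next GARP.
        if not pending:
            asleep += next_garp - now
            now = next_garp
            pending = True
            next_garp += garp_interval
    return iterations, asleep


if __name__ == "__main__":
    print(simulate(20, 10, 600))  # slow loop: a GARP is always pending
    print(simulate(1, 10, 600))   # fast loop: idle between GARPs
```

In this model, a 20 second loop with a GARP every 10 seconds never reaches the sleeping branch, while a 1 second loop spends most of its time asleep. That is the same condition described in the list: the loop is slower than the GARP period, so there is always something pending when pinctrl_wait() runs.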
>>> The deployment has OVS/OVN 2.9. We have back ported the pinctrl_thread
>>> Some time back I had reported an issue about lflow_run() taking a lot of
>>> time - https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360414.html
>>> I think we need to improve the logical processing sooner or later.
>> I agree that this is very important. I know that logical flow processing
>> is the biggest bottleneck for ovn-controller, but 20 seconds is just
>> ridiculous. In your scale testing, you found that lflow_run() was taking
>> 10 seconds to complete.
> I support this statement 100% (20 seconds is just ridiculous). To be
> precise, in this deployment we see over 23 seconds for the main loop
> to process, and I've even seen 30 seconds at times. I've been talking
> to Numan these days about this issue and I support profiling this
> actual deployment so that we can figure out how incremental processing
> would help.
>> I'm curious if there are any factors in this particular deployment's
>> configuration that might contribute to this. For instance, does this
>> deployment have a glut of ACLs? Are they not using port groups?
> They're not using port groups because this is 2.9, which does not have them.
> However, I don't think port groups would make a big difference in
> terms of ovn-controller computation. I might be wrong but Port Groups
> help reduce the number of ACLs in the NB database while the # of
> Logical Flows would still remain the same. We'll try to get the
> contents of the NB database and figure out what's killing it.
You're right that port groups won't reduce the number of logical flows.
However, they can reduce the computation in ovn-controller. The reason is
that the logical flows generated by ACLs that use port groups may result
in conjunctive matches being used. If you want a bit more information,
see the "Port groups" section of this blog post I wrote:
The TL;DR is that with port groups, I saw the number of OpenFlow flows
generated by ovn-controller drop by 3 orders of magnitude. And that
meant that flow processing was 99% faster for large networks.
You may not see the same sort of improvement for this deployment, mainly
because my test case was tailored to illustrate how port groups help.
There may be other factors in this deployment that complicate flow
processing.
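As a rough illustration of why conjunctive matches shrink the flow table, the sketch below counts flows for a rule that matches any of N ports combined with any of M addresses. Without conjunction, every (port, address) pair needs its own flow. With conjunction, each dimension gets one flow per value, plus one flow that matches on the conjunction ID. The function names and the flow strings are schematic and made up for this example; they are not the flows ovn-controller actually emits. Only the conjunction(id, k/n) action syntax is taken from OVS.

```python
def flows_without_conjunction(ports, addresses):
    # One flow per (port, address) pair: the full cross product.
    return [
        f"inport={p},ip_src={a} actions=allow"
        for p in ports
        for a in addresses
    ]


def flows_with_conjunction(ports, addresses, conj_id=1):
    # One flow that fires when both clauses of the conjunction matched...
    flows = [f"conj_id={conj_id} actions=allow"]
    # ...plus one flow per value in each of the two dimensions.
    flows += [f"inport={p} actions=conjunction({conj_id},1/2)" for p in ports]
    flows += [f"ip_src={a} actions=conjunction({conj_id},2/2)" for a in addresses]
    return flows


if __name__ == "__main__":
    ports = [f"port{i}" for i in range(100)]
    addresses = [f"10.0.0.{i}" for i in range(50)]
    print(len(flows_without_conjunction(ports, addresses)))
    print(len(flows_with_conjunction(ports, addresses)))
```

The counts are N * M versus N + M + 1. For 1000 ports and 1000 addresses that is 1,000,000 flows versus 2,001.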
>> This particular deployment's configuration may give us a good scenario
>> for our testing to improve lflow processing time.
>>> But to fix this issue urgently, we are thinking of the below approach.
>>> - pinctrl_thread will locally cache the mac_binding entries (just like
>>> it caches the DNS entries). (Please note that pinctrl_thread cannot
>>> access the SB DB IDL.)
>>> - Upon receiving any ARP packet (via the put_arp action), pinctrl_thread
>>> will check the local mac_binding cache and will wake up the main
>>> ovn-controller thread only if a mac_binding update is required.
>>> This approach will solve the issue since the MAC sent by the physical
>>> switches will not change. So there is no need to wake up ovn-controller
>>> main thread.
>> I think this can work well. We have a lot of what's needed already in
>> pinctrl at this point. We have the hash table of mac bindings already.
>> Currently, we flush this table after we write the data to the southbound
>> database. Instead, we would keep the bindings in memory. We would need
>> to ensure that the in-memory MAC bindings eventually get deleted if they
>> become stale.
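A minimal sketch of the caching idea, including the stale-entry cleanup mentioned above. Everything here is an assumption made for illustration: the MacBindingCache class, the (logical_port, ip) key, and the max_age_seconds parameter are hypothetical and do not exist in pinctrl, and the real code would be C using the existing hmap.

```python
import time


class MacBindingCache:
    """Hypothetical in-memory cache of learnt MAC bindings."""

    def __init__(self, max_age_seconds=300.0):
        self.max_age = max_age_seconds
        self._entries = {}  # (logical_port, ip) -> (mac, last_seen)

    def needs_update(self, logical_port, ip, mac, now=None):
        """Record the binding; return True if the main thread must be woken."""
        now = time.monotonic() if now is None else now
        key = (logical_port, ip)
        cached = self._entries.get(key)
        self._entries[key] = (mac, now)
        if cached is None:
            return True  # never seen before
        old_mac, last_seen = cached
        # Wake up only if the MAC changed or the old entry had gone stale.
        return old_mac != mac or now - last_seen > self.max_age

    def expire(self, now=None):
        """Drop entries not refreshed within max_age; return how many."""
        now = time.monotonic() if now is None else now
        stale = [
            key
            for key, (_, seen) in self._entries.items()
            if now - seen > self.max_age
        ]
        for key in stale:
            del self._entries[key]
        return len(stale)


if __name__ == "__main__":
    cache = MacBindingCache(max_age_seconds=60)
    print(cache.needs_update("lrp0", "10.0.0.1", "aa:bb:cc:dd:ee:ff", now=0))
    print(cache.needs_update("lrp0", "10.0.0.1", "aa:bb:cc:dd:ee:ff", now=10))
```

With this shape, a repeated GARP carrying an unchanged MAC refreshes the timestamp but does not wake the main thread, and calling expire() periodically bounds the memory held by bindings that are no longer being refreshed. One thing this sketch does not handle is a binding that is removed from the southbound database while it is still cached.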
>>> In the present master/2.12 these GARPs will not cause this 100% cpu loop
>>> issue because incremental processing will not recompute flows.
>> Another mitigating factor for master is something I'm currently working
>> on. I've got the beginnings of a patch series going where I am
>> separating pinctrl into a separate process from ovn-controller:
>> It's in the early stages right now, so please don't judge :)
>> Separating pinctrl to its own process means that it cannot directly
>> cause ovn-controller to wake up like it currently might.
>>> Even though the above approach is not really required for master/2.12, I
>>> think it is still OK to have it there, as it does no harm.
>>> I would like to hear your comments and any concerns.
>> Hm, I don't really understand why we'd want to put this in master/2.12
>> if the problem doesn't exist there. The main concern I have is with
>> regards to cache lifetime. I don't want to introduce potential memory
>> growth concerns into a branch if it's not necessary.
>> Is there a way for us to get this included in 2.9-2.11 without having to
>> put it in master or 2.12? It's hard to classify this as a bug fix,
>> really, but it does prevent unwanted behavior in real-world setups.
>> Could we get an opinion from committers on this?
>>> discuss mailing list
>>> discuss at openvswitch.org