[ovs-discuss] ovn-controller is taking 100% CPU all the time in one deployment

Numan Siddique nusiddiq at redhat.com
Thu Aug 29 19:15:48 UTC 2019


On Fri, Aug 30, 2019 at 12:37 AM Han Zhou <zhouhan at gmail.com> wrote:

>
>
> On Thu, Aug 29, 2019 at 11:40 AM Numan Siddique <nusiddiq at redhat.com> wrote:
> >
> > Hello Everyone,
> >
> > In one of the OVN deployments, we are seeing 100% CPU usage by
> > ovn-controllers all the time.
> >
> > After investigating, we found the following:
> >
> >  - ovn-controller takes more than 20 seconds to complete one full loop
> >    iteration (mainly in the lflow_run() function).
> >
> >  - The physical switch is sending GARPs periodically every 10 seconds.
> >
> >  - ovn-bridge-mappings is configured, and these GARP packets reach
> >    br-int via the patch port.
> >
> >  - There is a flow in the router pipeline that applies the put_arp
> >    action to ARP packets.
> >
> >  - The ovn-controller pinctrl thread receives these GARPs, stores the
> >    learnt MAC/IP bindings in the 'put_mac_bindings' hmap, and notifies the
> >    ovn-controller main thread by incrementing the seq number.
> >
> >  - In the ovn-controller main thread, pinctrl_wait() is called after
> >    lflow_run() finishes. This function calls poll_immediate_wake() because
> >    the 'put_mac_bindings' hmap is not empty.
> >
> >  - As a result, poll_block() in ovn-controller never sleeps, and this
> >    repeats continuously, resulting in 100% CPU usage.
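The feedback loop in the list above can be illustrated with a toy, discrete-time model. This is a minimal sketch, not OVN code: garps_in() and count_sleeps() are hypothetical names, the 20-second pass and 10-second GARP period are the numbers from this report, and the model assumes that each pass starts by flushing the pending bindings to the SB DB, which (without incremental processing) forces a full lflow_run() recompute. Whenever at least one GARP lands during a pass, the map is non-empty when pinctrl_wait() runs, so the loop wakes immediately instead of sleeping.

```c
#include <assert.h>

/* Number of GARPs (arriving at t = period, 2*period, ...) in (start, end]. */
static int garps_in(int start, int end, int period)
{
    return end / period - start / period;
}

/* Hypothetical toy model of the ovn-controller 2.9 main loop.
 * pass_secs:   time one pass spends recomputing flows in lflow_run()
 * garp_period: seconds between GARPs from the physical switch
 * Returns how many of 'passes' iterations end with poll_block() sleeping. */
static int count_sleeps(int pass_secs, int garp_period, int passes)
{
    assert(pass_secs > 0 && garp_period > 0 && passes >= 0);

    int now = 0;      /* seconds since start; a pass begins at 'now' */
    int sleeps = 0;
    for (int i = 0; i < passes; i++) {
        /* Pending bindings are flushed to the SB DB at the start of the
         * pass; the resulting full recompute lasts 'pass_secs'. */
        int end = now + pass_secs;
        int pending = garps_in(now, end, garp_period);
        if (pending > 0) {
            /* pinctrl_wait(): map not empty -> poll_immediate_wake(). */
            now = end;
        } else {
            /* Map empty: poll_block() sleeps until the next GARP. */
            sleeps++;
            now = (end / garp_period + 1) * garp_period;
        }
    }
    return sleeps;
}
```

With count_sleeps(20, 10, 100) the model never sleeps, while a pass much shorter than the GARP period (for example count_sleeps(1, 10, 100)) sleeps after every pass. In this simplified model, the CPU spin follows from the loop being slower than the GARP period, not from the GARPs themselves.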
> >
> > The deployment has OVS/OVN 2.9. We have backported the pinctrl_thread
> > patch.
> >
> > Some time back I reported an issue about lflow_run() taking a lot of
> > time: https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360414.html
> >
> > I think we need to improve the logical flow processing sooner or later.
> >
> > But to fix this issue urgently, we are considering the following approach:
> >
> >  - pinctrl_thread will cache the mac_binding entries locally (just as it
> >    caches the DNS entries). (Note that pinctrl_thread cannot access the
> >    SB DB IDL.)
> >
> >  - Upon receiving an ARP packet (via the put_arp action), pinctrl_thread
> >    will check the local mac_binding cache and wake up the main
> >    ovn-controller thread only if a mac_binding update is required.
> >
> > This approach will solve the issue: the MAC sent by the physical
> > switches does not change, so there is no need to wake up the
> > ovn-controller main thread.
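The caching approach proposed above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual pinctrl code: struct mac_cache, mac_cache_learn() and the 'wakeups' counter are hypothetical names, a fixed-size linear array stands in for the hmap a real implementation would more likely use, and keying a binding by (datapath key, port key, IPv4 address) is an assumption. A repeated GARP carrying an unchanged MAC returns early without waking the main thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_CACHE_CAP 64  /* hypothetical; a real cache would be an hmap */

struct mac_cache_entry {
    uint32_t dp_key;    /* logical datapath key (assumed part of the key) */
    uint32_t port_key;  /* logical port key (assumed part of the key) */
    uint32_t ip;        /* IPv4 address */
    uint8_t mac[6];
};

struct mac_cache {
    struct mac_cache_entry entries[MAC_CACHE_CAP];
    size_t n;
    unsigned int wakeups;  /* stand-in for notifying the main thread */
};

static struct mac_cache_entry *
mac_cache_find(struct mac_cache *c, uint32_t dp_key, uint32_t port_key,
               uint32_t ip)
{
    for (size_t i = 0; i < c->n; i++) {
        struct mac_cache_entry *e = &c->entries[i];
        if (e->dp_key == dp_key && e->port_key == port_key && e->ip == ip) {
            return e;
        }
    }
    return NULL;
}

/* Called for each packet hitting put_arp. Returns true (and counts a
 * wakeup) only when the binding is new or its MAC has changed. */
static bool
mac_cache_learn(struct mac_cache *c, uint32_t dp_key, uint32_t port_key,
                uint32_t ip, const uint8_t mac[6])
{
    struct mac_cache_entry *e = mac_cache_find(c, dp_key, port_key, ip);
    if (e && !memcmp(e->mac, mac, 6)) {
        return false;  /* periodic GARP with an unchanged MAC: stay quiet */
    }
    if (!e) {
        assert(c->n < MAC_CACHE_CAP);
        e = &c->entries[c->n++];
        e->dp_key = dp_key;
        e->port_key = port_key;
        e->ip = ip;
    }
    memcpy(e->mac, mac, 6);
    c->wakeups++;  /* real code would queue the binding and wake the main thread */
    return true;
}
```

Presumably the cache also has to stay in sync with the SB MAC_Binding table (for example when rows are deleted); otherwise a stale entry could suppress an update that is actually needed.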
> >
> > In the current master/2.12, these GARPs do not cause this 100% CPU loop,
> > because incremental processing does not recompute the flows.
> >
> > Even though this approach is not really required for master/2.12, I
> > think it is still OK to have it, as it does no harm.
> >
> > I would like to hear your comments and any concerns.
> >
> > Thanks
> > Numan
> >
>
> Hi Numan,
>
> I think this approach should work. Just to make sure: to update the cache
> efficiently (and avoid another kind of recompute), it should use OVSDB
> change tracking to update it incrementally.
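Han's suggestion of incremental updates can be illustrated with a hypothetical batch of tracked row changes. In this sketch, struct tracked_change, apply_tracked_changes() and cache_get() are illustrative names, not the OVSDB IDL API. It assumes that the main thread, which owns the SB IDL, converts the IDL's tracked inserts, updates and deletions of MAC_Binding rows into such a batch and hands it to pinctrl_thread. Only the entries named in the batch are touched; the cache is never rebuilt from the full table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_CAP 64  /* hypothetical fixed capacity for the sketch */

enum change_kind { ROW_INSERTED, ROW_UPDATED, ROW_DELETED };

/* One tracked MAC_Binding change (simplified: keyed by IPv4 address only). */
struct tracked_change {
    enum change_kind kind;
    uint32_t ip;
    uint64_t mac;  /* 48-bit MAC in the low bits; unused for deletions */
};

struct binding_cache {
    uint32_t ip[CACHE_CAP];
    uint64_t mac[CACHE_CAP];
    size_t n;
};

/* Linear search keeps the sketch short; a real cache would use a hash map
 * so that each change costs O(1) regardless of the cache size. */
static ptrdiff_t cache_index(const struct binding_cache *c, uint32_t ip)
{
    for (size_t i = 0; i < c->n; i++) {
        if (c->ip[i] == ip) {
            return (ptrdiff_t) i;
        }
    }
    return -1;
}

static bool cache_get(const struct binding_cache *c, uint32_t ip, uint64_t *mac)
{
    ptrdiff_t i = cache_index(c, ip);
    if (i < 0) {
        return false;
    }
    *mac = c->mac[i];
    return true;
}

/* Applies a batch of tracked changes; entries not mentioned are untouched. */
static void apply_tracked_changes(struct binding_cache *c,
                                  const struct tracked_change *chg, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        ptrdiff_t i = cache_index(c, chg[k].ip);
        switch (chg[k].kind) {
        case ROW_INSERTED:
        case ROW_UPDATED:
            if (i < 0) {
                assert(c->n < CACHE_CAP);
                i = (ptrdiff_t) c->n++;
                c->ip[i] = chg[k].ip;
            }
            c->mac[i] = chg[k].mac;
            break;
        case ROW_DELETED:
            if (i >= 0) {
                /* Swap-remove: move the last entry into the freed slot. */
                c->n--;
                c->ip[i] = c->ip[c->n];
                c->mac[i] = c->mac[c->n];
            }
            break;
        }
    }
}
```

Treating an update for an unknown key as an insert keeps the cache correct even if a batch is applied before the initial insert was seen. The trade-off Han raises about memory is the size of this table, which grows with the number of MAC_Binding rows.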
>
> Regarding master/2.12, it is not harmful, except that it adds more code
> and increases the memory footprint. In our current use cases there can
> easily be tens of thousands of mac_bindings, but that may still be OK
> because each entry is very small. However, is there any benefit to doing
> this in master/2.12?
>

I don't see much benefit. But I can't submit a patch to branch 2.9 without
the fix being merged in master first, right?
Maybe once it is merged in branch 2.9, we can consider deleting it?

Thanks
Numan


>
> Thanks,
> Han
>
>