[ovs-dev] ovs-vswitchd too large memory consumption with OVN stateless ACL

Han Zhou zhouhan at gmail.com
Sat Nov 20 02:39:05 UTC 2021


On Fri, Nov 19, 2021 at 3:11 PM Ilya Maximets <i.maximets at ovn.org> wrote:
>
> On 11/19/21 19:12, Vladislav Odintsov wrote:
> > Hi,
> >
> > I’m testing OVN stateless ACL rules with `$port_group_ipVERSION` in
> > match portion.
> > There’s a strange behaviour and sometimes I got configuration, which
> > totally kills my transport nodes, where logical switch ports reside.
> > ovs-vswitchd and ovn-controller processes utilise 100% 1 core CPU each
> > and ovs-vswitchd consumes all free memory and repeatedly got killed by
> > OOM-killer. It consumes 5GB memory in 5-10 seconds!
> >
> > I reproduced this with OVS 2.13.4 & OVN main, but also tried with
> > actual OVS master branch and the problem still reproduces.
> >
> > Below are steps to reproduce:
>
> <snip>
>
> >
> > I couldn’t get any source of the problem except to find the steps to
> > reproduce.
> > Can somebody please take a look on this?
> > This looks like a potential serious problem for OVN transport nodes.
>
> This indeed looks like a serious issue.
> And thanks for the great detailed report!  That was really easy to
> reproduce.
>
> I think, I found the main problem.  Could you try the following patch:
>
> https://patchwork.ozlabs.org/project/openvswitch/patch/20211119230738.2765297-1-i.maximets@ovn.org/
> ?

Thanks Vladislav for reporting and thanks Ilya for the quick fix!
The fix looks good to me. However, I think there are more problems revealed
by this bug report to be addressed.
I could also reproduce it easily and I see at least 3 problems:

1) A simple ACL condition like this shouldn't generate such a huge number
of flows (>60k) in the first place. The ovn-controller expression parser
doesn't handle != for const sets efficiently. It can be optimized to combine
most of the matches. For the example in this report, I'd expect at most
hundreds of flows in total. I have some ideas but need to try them out.
2) The memory spike problem in OVS, as explained and fixed by Ilya. Really
great finding and fix! The fix is definitely required even if 1) is solved,
because there are real situations where a large number of flows will be
generated and installed at once.
3) What's left unclear to me, related to 2), is that after the bundle
processing is finished, the quiescent state should be entered, and the RCU
thread should free the temporarily allocated memory, right? But at least in
my test I don't see the memory go down. With 60K flows OVS has 3.3G RES,
which is unreasonable.
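
To illustrate why 1) can blow up, here is a minimal Python sketch. It is
not ovn-controller's actual parser: the 8-bit field width and all helper
names are assumptions made up for illustration. It relies only on the fact
that "x != v" holds when x differs from v in at least one bit, so each !=
becomes a disjunction of single-bit matches, and "x != {v1, ..., vn}" becomes
the cross product of those disjunctions when expanded naively.

```python
WIDTH = 8  # hypothetical small field width, chosen to keep the sets small


def not_equal_terms(value, width=WIDTH):
    """Expand `x != value` into one (mask, bits) match per bit position."""
    terms = []
    for i in range(width):
        mask = 1 << i
        # Match the bit that is the opposite of value's bit at position i.
        terms.append((mask, ~value & mask))
    return terms


def merge(a, b):
    """AND two masked matches; return None if they contradict each other."""
    (mask_a, bits_a), (mask_b, bits_b) = a, b
    common = mask_a & mask_b
    if (bits_a & common) != (bits_b & common):
        return None
    return (mask_a | mask_b, bits_a | bits_b)


def naive_not_in_set(values, width=WIDTH):
    """Expand `x != {v1, ..., vn}` by crossing the per-value disjunctions."""
    flows = {(0, 0)}  # a single match-everything entry
    for value in values:
        crossed = set()
        for flow in flows:
            for term in not_equal_terms(value, width):
                merged = merge(flow, term)
                if merged is not None:
                    crossed.add(merged)
        flows = crossed
    return flows


def matches(flows, x):
    """True if x is matched by at least one (mask, bits) entry."""
    return any((x & mask) == bits for mask, bits in flows)
```

Calling len(naive_not_in_set(values)) for sets of increasing size shows how
the number of masked matches behaves under this naive expansion; combining
overlapping matches is the kind of optimization meant in 1).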

Thanks,
Han


>
> Best regards, Ilya Maximets.
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

