[ovs-dev] [PATCH RFC ovn 1/1] RFC: Logical flow generation in ovn-controller

Han Zhou zhouhan at gmail.com
Fri Jul 16 18:53:04 UTC 2021


On Fri, Jul 16, 2021 at 7:15 AM Numan Siddique <numans at ovn.org> wrote:
>
> On Mon, Jul 12, 2021 at 11:09 AM Mark Michelson <mmichels at redhat.com>
wrote:
> >
> > Full disclosure: I have not looked at all the details in this patch,
> > since it is quite large. However, I felt I should comment on the idea.
>
> Hi Mark,
>
> Thanks for the review and the comments.
>
> >
> > The memory savings in ovn-northd and the southbound database are quite
> > nice. It's to be expected since the southbound database has fewer
> > logical flows, and ovn-northd doesn't have to calculate logical flows. I
> > think it's telling that the scale tests didn't show any noticeable
> > improvement in performance.
> >
> > I think one thing that could help would be to just skip logical flow
> > creation altogether in ovn-controller. It doesn't make much sense to
> > take the source data, translate it into logical flow syntax, just to run
> > it through the expression parser and translate it into OpenFlow. You
> > could presumably create OpenFlow matches and actions directly, saving
> > lots of processing. Of course, this takes this
> > semi-backwards-incompatible change and makes it wholly
> > backwards-incompatible :)
> >
> >
>
> I thought about skipping logical flow generation altogether.  But it has
its own
> challenges
>   -  We will break backward compatibility and many tools - ovn-sbctl,
> ovn-trace/ovn-detrace
>   -  And it is a lot of effort.
>
> If breaking those tools is Ok,  I guess we can explore this option.
> So far ovsdb-server
> has been a bottleneck in the scale environments and the goal of this
> approach was
> to relieve that pressure from ovsdb-servers.  Since Ilya's relay
> feature is merged we
> can definitely test that feature and see if it solves all the scale
> concerns.  If it
> solves,  then I think we can drop the idea of removing the logical
> flows from the
> Southbound db.  If not,  I think we may have to revisit this idea.  In
> my free cycles
> I'd like to experiment a bit and see if we can generate OF flows
> directly in ovn-controller.
>
> I think it is still good to discuss in this thread if generating OF
> flows directly can be a potential
> re-architecture or not.
>
> @Han - You mentioned in the irc meeting that you have some comments
> too.  It would be
> great if you can share your thoughts.

Sorry for my slow response. My comments regarding this RFC patch are
primarily the ones we discussed offline after the irc meeting. In fact most
of them are already covered by Mark's comments in his reply. Let me put it
here with some additional points:

1) I agree with Mark that this patch achieves its goal of reducing the size
of SB DB, without obvious e2e performance gains which is also expected.
While the SB DB reduction is significant, the benefit might not justify
this big architecture change. And I agree with you that let's see how the
OVSDB relay approach helps for the SB DB scale, and revisit this patch if
necessary.

2) I also mentioned the same idea a while ago for removing the logical flow
phase. If we remove logical flows from SB DB, it is more natural and
efficient to not generate them at all. However, I also thought about that
again recently and I feel not so sure now if it is the right way to
proceed. The open issues are:
    - First of all, as you mentioned, this is an even bigger change,
probably close to rewriting all the components, and the tools. (We might
look at directly using DDlog to achieve that, and hopefully some code in
the northd-ddlog can be reused.) Most importantly, it took a great amount
of effort for OVN to stabilize (well, to some extent) for production, so
I'd expect such a big change at this level, if implemented, would require a
really long period of time, probably very painful, to stabilize again.
    - We lose the benefit of logical-physical separation, which could make
trouble-shooting much more difficult. Nevertheless, the debuggability may
be achieved by other ways, and nothing prevents us from implementing tools
that generate human friendly logical flows for debugging purposes only, if
it is helpful.
    - It seems ovn-controller should connect to NB DB directly, but at the
same time maintain SB DB for port-bindings and chassis information. (This
is not a problem, but something to think about)
    - Although it avoids the cost of logical flow parsing, it doesn't
completely avoid the cost of parsing "match" strings, because the "match"
fields in ACL (and also Policy Routing, QoS) are still flexible strings
specified by users. If these objects are too many, specifically, if ACLs
contribute to a significant part of today's logical flow handling, then
even if logical flows are removed, parsing ACL matches could still be a
bottleneck, at least for some use cases. So we may need to lower our
expectations for the e2e performance gains after all the great efforts.

So I am still open to the idea of rearchitecting to remove logical flows.
It would be great to have a POC to evaluate the e2e performance gains,
which itself can take a really big effort, at the risk of being completely
discarded. Since there is an ongoing effort of rewriting ovn-controller
using DDlog, which I believe is a big effort, too, perhaps this can be
considered as an option together. @Ben Pfaff <blp at ovn.org>

3) Regarding the potential performance gains of this patch by using
incremental processing in ovn-controller, I don't think it is a strong
argument, because I believe our goal for ovn-northd is still making
ovn-northd-ddlog mature and replace the C version, so when comparing e2e
performance we should consider ovn-northd-ddlog which inherently does
incremental processing, instead of comparing with the current C version of
ovn-northd. (In case the direction of using ovn-northd-ddlog is somehow
changed, hopefully not, we need to raise it clearly in the community and
discuss it seriously.)

4) It seems we (the community) put more focus on scalability again recently
and a lot of improvements have been achieved, and more ongoing. Although
none of them are architectural changes, they turned out to be effective (or
with great potential). We could re-evaluate the longer term plan after we
complete these non-architectural improvements.

Thanks,
Han

>
> Few comments below
>
> >
> > I'm concerned about a few things in the patch as presented.
> >
> > First, moving logical flow generation from ovn-northd to ovn-controller
> > seems like it's just shifting the work from one place to another. And in
> > the case where datapaths have ports bound on many nodes, it means a lot
> > of the same work is being done on multiple ovn-controller instances. I-P
> > in ovn-controller could reduce the amount of work ovn-controller is
> > performing per iteration, but it also introduces error potential.
>
> It definitely adds some load to ovn-controllers. But I don't think it
> is significant.
> There are many logical flows which are generic (which datapath groups
> has already
> leveraged) and the proposed RFC patch generates those at start up.
>
> I do agree that there is potential for errors and adding new features
> could be a bit
> challenging.  But it should not be so hard.
>
>
> >
> > Second, is there a danger in moving logical flows out of the southbound
> > database? You modified `ovn-sbctl lflow-list` to do some magic to list
> > the logical flows, but are there any CMS's that access the Logical_Flow
> > table directly in the SB DB for any reason?
>
> I don't think CMS should rely on the logical_flow table.  I'm not
> aware of any CMS
> doing it.  Even if any CMS is doing it, I think it probably shouldn't.
>
> >
> > Third, does ovn-trace still work?
>
> In the proposed patch, ovn-trace doesn't work but it should be
straightforward
> to make it work.  In this proposed patch, a new command is added in
ovn-sbctl
> called - "ctrl-lflow-list"  and it calls the same functions to
> generate logical flows
> as ovn-controller.  ovn-trace can do the same.
>
> >
> > Overall, I'm fine with the idea of re-architecting things in OVN, but I
> > think it requires a bit more of a plan than this. A change like this is
> > doing its best to hide that anything has changed, when behind the scenes
> > things have changed a lot. I think a true rearchitecting will require
> > user-facing changes, too. If we still used major-minor versioning for
> > OVN, this would be the sort of thing that would result in a major
> > version bump.
>
> I agree.
>
> Thanks
> Numan
>
> >
> >
> > On 6/25/21 7:31 PM, numans at ovn.org wrote:
> > > From: Numan Siddique <numans at ovn.org>
> > >
> > > This is a an RFC patch to move the logical flow generation from
> > > ovn-northd to ovn-controller.
> > >
> > > This patch doesn't move all the generation to ovn-controller.
> > > ovn-northd still does the flow generation for
> > >    - ACLs/Port groups.
> > >    - DHCP options
> > >    - Multicast groups
> > >    - And few others.
> > >
> > > Other than ACLs and Port groups it would be possible to move
> > > flow generation for the above mentioned things.  But before doing
> > > all that, it is worth to evaluate if the proposed RFC makes sense.
> > >
> > > The main motivation for this RFC effort is to
> > >    - Address scale issues seen.  For large scale deployments
> > >      ovn-northd takes lot of CPU for the computation (ovn-northd-ddlog
> > >      should help here) and memory and so does Southbound
ovsdb-servers.
> > >
> > >    - Having a very huge southbound DB and logical flows affects the
> > >      raft consenses and it requires increasing the raft election
timers.
> > >
> > >    - Logical flows contributes majorly to the overall south bound DB.
> > >
> > > This RFC demonstrates that it is possible for each ovn-controller
> > > to generate logical flows.
> > >
> > > These are some of the findings with my general and scale testing.
> > >
> > > Below are the test findings with a huge pre-existing Northbound
database with
> > > datapath groups enabled with a size of 13 M
> > >    - Southbound DB size is:
> > >           * with ovn-northd-master - 35 M
> > >           * with ovn-northd-proposed-rfc - 12M
> > >
> > >    - Number of logical flows:
> > >           * with ovn-northd-master - 78581
> > >           * with ovn-northd-proposed-rfc - 7933
> > >
> > >    - RSS Memory consumption of
> > >          * ovn-northd-master -       441368 KiB
> > >          * ovn-northd-proposed-rfc - 115540 KiB
> > >
> > >          * ovn-controller-master -       1267716 KiB
> > >          * ovn-controller-proposed-rfc - 915876 KiB
> > >
> > >          * SB ovsdb-server-with-ovn-master -       612296 KiB
> > >          * SB ovsdb-server-with-proposed-rfc-ovn - 134680 KiB
> > >
> > > With the scale testing of 500 fake multinode nodes and each node
> > > creating having a few port bindings claimed,  the end result is
> > > almost similar.  No signifcant improvements seen with the proposed
> > > RFC patch.  The results are identical.
> > >
> > > I think more scale testing needs to be done to determine if
> > > the CPU usage and memory usage reduction in the ovn-northd and
> > > ovsdb-servers will have a major impact or not.  Testing with
> > > a real Kubernetes/Openstack deployments would help.
> > >
> > > Few Observations
> > >    -  It is possible to move the flow generation to each
ovn-controller.
> > >
> > >    -  Each ovn-controller only generates the logical flows if required
> > >       i.e if the datapath is in the 'local_datapaths'.
> > >
> > >    -  This RFC patch do complicate ovn-controller code which has
already
> > >       many complicated bits.
> > >
> > >    -  I was expecting the scale test results to improve and the
> > >       end-to-end time of a pod/VM creation would be quicker. But that
is
> > >       not the case, which is a disappointment.
> > >
> > > Submitting this RFC patch to get a feed back and have conversation
> > > if it is worth the effort.
> > >
>
> > >
> >
> >
> > _______________________________________________
> > dev mailing list
> > dev at openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


More information about the dev mailing list