[ovs-dev] ovn-controller CPU use and incremental processing dependencies

Thu Jun 7 23:51:34 UTC 2018

On Thu, Jun 07, 2018 at 04:32:10PM -0700, Han Zhou wrote:
> On Thu, Jun 7, 2018 at 11:15 AM, Ben Pfaff <blp at ovn.org> wrote:
> >
> > To review the problem we're trying to solve, ovn-controller uses too
> > much CPU in many circumstances because it recomputes everything on every
> > iteration on the main loop.  That includes when it wakes up for any
> > reason, e.g. in response to a timer, or to an "ovs-appctl" command, or
> > to respond to an "echo request" probing whether an OpenFlow or OVSDB
> > connection is still up.  In even a medium-sized network, this is a lot
> > of work and can easily lead to ovn-controller using 100% of a CPU
> > without doing significant valuable work.
> >
> > The solution, as you and others have noticed, is to recompute only when
> > an input has changed.  This can be done on at least three different
> > granularities:
> >
> > 1. Whole system.  Recompute everything if any input changed.
> >
> > 2. Coarsely granular.  Recompute intermediate data and output data,
> coarsely
> >    defined, if their respective inputs changed.
> >
> > 3. Finely granular.  Recompute individual rows (etc.) of intermediate or
> output
> >    data if their individual inputs changed.
> >
> > The 2016 patches from Ryan Moats were mostly #1 with some #2, and, Han,
> > as I see it your patch series is mostly #2 with some #3.
> >
> > Ryan Moats' series was eventually reverted because it was not correct and
> could
> > not be confidently fixed or maintained.  In turn, this was because it was
> too
> > difficult to figure out each calculation's actual dependencies.  This was
> in
> > fact difficult even for #1, let alone #2.
> >
> > Your new patch series uses a much more systematic approach to
> > dependencies, since it uses a formal incremental processing engine to
> > propagate changes through a DAG of computations.  However, assuming that
> > it is correct as is, I am concerned that it will eventually suffer from
> > the same problem.
> >
> > The way I see it, the incremental computation systems that have been
> proposed
> > so far are analogous to a Makefile that explicitly lists all of the .h
> files
> > that each .c file includes (directly or indirectly).  If you've ever
> maintained
> > a Makefile that works this way, you know that it's difficult to get it
> right at
> > the start and that it is essentially impossible to keep it correct as a
> project
> > evolves.  That's why every modern build system computes header file
> > dependencies automatically as a side effect of compilation.  Similarly, I
> don't
> > see a realistic way that we can keep these dependencies correct as
> > ovn-controller evolves.
> >
> > So, I've been spending some time thinking about how we can come up with
> the
> > ovn--controller equivalent of automatic dependency computation.  My
> current
> > thinking is that we need to adopt a more functional programming attitude
> here.
> > That is, any given part of the ovn-controller computation should only be
> passed
> > as input the data that it actually needs, and as output the data
> structures
> > that it produces.  This will give us the opportunity to make sure that the
> > input and the dependencies are in sync.  Ideally, we'd be able to tie
> those
> > together in the code, so that passing a piece of data into a computation
> > function automatically makes that a dependency from the engine's
> perspective.
> 
> Agree. While I don't think it is totally unrealistic, but it is indeed my
> biggest concern how to ensure the dependencies are correctly maintained in
> the longer term. It was a manual and tedious process for me (although it
> helped me to get better understanding of the whole implementation, as a
> personal benefit) to figure out the dependencies, by looking at the
> parameters passed to those xxx_run() functions, and also check carefully
> the global variables, and specially, the OVSDB tables they might be using.
> If I miss anything, or in the future some contributor miss updating the
> dependency graph when they should, it will result in problem. It would be
> great if we have a mechanism to self-describe the dependency from code and
> then deduce the dependencies automatically (or it might be good enough to
> still adding dependency manually, but relying on detecting and warning for
> missing dependencies during make). I have no concrete idea of the
> implementation though, any detailed suggestion would be helpful.

I've been playing with some ideas about that this week.  I hope to have
something to show soon.

> > Currently, we're pretty far from being able to do this.  Most of the state
> > computation functions in ovn-controller receive pointers to the OVSDB and
> OVNSB
> > IDLs (via struct controller_ctx), which means that they can read and write
> > anything in any part of those databases, and even if they don't now it's
> easy
> > to carelessly add new ones.  To pass only a function's actual
> dependencies into
> > it, we will need to pass, at a minimum, individual tables into a function
> > rather than entire databases.  This is more difficult than it sounds,
> because
> > the OVSDB IDL doesn't have a way to pass around a table.  Thus, some
> steps to
> > take here, starting from the easiest:
> >
> > 1. Add some ability for the IDL to pass around tables.  Exact mechanism
> not
> >    obvious yet.
> 
> We need to consider the index usage, too, which also requires passing the
> IDL.

I can see that, too.  I don't think it's too hard to adapt the indexes
in a similar way.

> > 2. Add some ability for the IDL to pass just part of a table, e.g.
> specific
> >    columns, to make dependencies more specific.  This would also make it
> >    possible to control dependencies related to hopping from one table to
> >    another via pointers in the table's columns.
> 
> I am not sure how would this help for the "hopping" case. If node N depends
> on column C1 and C2 of table T. Assume C2 refers to table T2. If node N's
> compute function accesses T2's data, how would this access be controlled by
> the mechanism?

Presumably, if N depends on C1 and C2, and C2 refers to table T2, then N
must also declare a dependency on T2 as well.  I think that this can be
enforced, by preventing a node from depending on a column without
depending on the table that the column references.

> > (Not every input comes from an OVSDB instance.  Other input sources tend
> to be
> > easier to deal with.  I don't know of specific problems there yet.)
> 
> It seems to me really no other input sources related to flow computing. Do
> you have anything in mind?

mff_ovn_geneve is the example that comes to mind.

> > There's another issue related to OVSDB.  Currently ovn-controller only
> composes
> > OVSDB transactions when there is not already a transaction in flight.
> This
> > causes trouble for the interaction between some parts of ovn-controller
> that
> > only do their work when they can compose a transaction and parts that
> need to
> > keep track of individual changes in tables ("change tracking") because
> change
> > tracking only keeps the changes for a single main loop iteration.  Thus,
> change
> > tracking changes can be missed if there's a transaction outstanding.  It
> seems
> > like there should be a way to solve this problem by always allowing a
> > transaction to be composed (and probably just discarding the transaction
> in
> > some cases).
> 
> Current patch handles this situation by forcing a full recompute in next
> iteration immediately. Being able to either always compose a transaction or
> keep track of changes across iterations would avoid the unnecessary
> recomputes. This is about a further performance improvement rather than
> correctness, so I don't see it urgent for now.

I think that's a good summary of the situation.

> > I'm currently working on step #1 above, to make dependencies easier to
> track by
> > passing individual tables into functions.  At that point, I think that the
> > incremental processing engine makes a lot of sense.
> 
> This is great! Looking forward to the updates and let me know anything else
> needed from my side. Thanks again.

I think I'm going to try to send out patches, even ones that are
incomplete solutions, to see if anyone has comments, pretty soon.