[ovs-dev] [RFC 00/14] ovn-controller incremental processing.

Sun Aug 5 19:11:35 UTC 2018

Hi Mark,

Thanks for the review and very valuable comments! (I was on vacation last
week so sorry for slow response).

On Tue, Jul 31, 2018 at 3:47 AM, Mark Michelson <mmichels at redhat.com> wrote:
>
> Hi Han,
>
> I've given this patchset a look, and I was following along pretty well
until I got to about patch 11. From that point on, I had to re-read the
code more times than I care to admit before I finally understood what was
going on :)
>
> What you have is a structure (lflow_ref_list_node) that is simultaneously
in two lists. These two lists are each the data in separate hmap nodes. The
hmap nodes are in two separate hmaps. One hmap uses the reference type and
name as a key, and the other uses the lflow UUID as a key. This way given
an address set name, you can find the associated logical flow UUIDs in the
ref_lflow_table. Or given a logical flow UUID, you can find address sets.
>
> I'm wondering if this can be simplified somehow.
>
> Right now, if logical flows change, the change handler has to update the
ref_lflow_table so that address sets no longer reference that logical flow.
If address sets change, then the lflow_ref_table is updated. In both cases
consider_logical_flow() gets called and realigns the tables as appropriate.
>
> The problem with this is that it reeks of cross-cutting concerns, and it
seems like it wouldn't scale well (consider a 3- or 4-chain of
dependencies). Ideally, the dependency chain would make sure that the
change handler for logical flows only deals with logical flows, and the
change handler for address sets only deals with address sets.

I agree that maintaining the cross reference has some overhead, but I don't
see a scaling issue in this case. Adding entries to the cross reference
table is a by-product of the consider_logical_flow() when parsing the
lflow, and deleting the entries is also efficient with O(N), N = number of
address sets used by a lflow, which in most cases should be a very small
number (correct me if I am wrong). As to memory consumption, it maintains
only the mapping between resource names and lflow uuids, so I don't expect
it to be a significant cost either. Could you explain a little more about
the "3- or 4-chain of dependencies" example?

For the "cross-cutting concern", I don't see it that way. I see it as a
pattern of change handler implementation. In general, output of an engine
node is the result of a "join" operation of its inputs. When there are
multiple inputs and one of them changes, for a change handler to compute
the output incrementally, we will need to use the changed row to probe all
the other inputs to update the final output. For the address-set and
port-group handlers, it is join between two tables only, and the cross
reference table is built to make the probing efficient in change handler.
The cross reference table is also generalized so that any resources
referenced by logical flows can reuse the same data structure and
interfaces, and now it is reused by both address-sets and port-groups. We
can make it more generalized to be used for other mappings if needed.

If we really wants to make it more generalized, I think the answer is the
datalog approach. I would be great if it can be implemented that way, but I
am pessimistic for it to be applied to ovn-controller in a practical time
line, given that ovn-controller is more complex in terms of both data
sources and processing logic compared with ovn-northd. And I think it is
practical and simple to implement the probing for most frequent scenarios
as demonstrated by this RFC.

>
> If we generalize things a bit, there are likely to be two ways
dependencies manifest in the database. In this particular case, text in one
row expands to data of a separate database row. The other case would be
where a database row contains the UUID (or list of UUIDs) of other database
rows.
>
> For the textual case, I think the easiest way to handle this is to
replace the text with what it expands to earlier than when we currently do
it. Consider that a logical flow references address set $foo. Currently,
the logical flow in the southbound database has the text "$foo" in it. If
$foo were replaced with the actual addresses from the address set, then
when an address set changes, the text of the logical flow would change as
well, thus resulting in a direct change of the logical flow. A less
disruptive version of this might be to use some reserved character
automatically in the logical flow match followed by a sequence number. So
for instance, if a logical flow were set up to reference address set $foo,
then the actual logical flow might be something like $foo?1. Then if
northbound address set foo changes, the logical flow could be updated to
$foo?2 by ovn-northd. Again, the textual change in the logical flow would
result in triggering the change tracker.
>

This proposal is interesting and I think it is a valid alternative. It is
trying to implement the probing without maintaining a cross reference table
in ovn-controller. In fact it moves the effort of building the reference
table from ovn-controller to maintaining the sequence number for each
address-set/port-group resources in ovn-northd. I am just not sure if this
makes the system simpler or more complex. I will need to think more about
it.

> For the database referencing case, it would be nice if the IDL change
tracking code could automatically do this for us. This way if record foo
has a column that references row bar, then if bar changes, we would be told
that foo also changed. This strikes me as difficult to implement and could
result in some interesting dependency graphs within the IDL code though.
>
> What do you think?
>

For the database referencing case, it seems not directly related to your
concern regarding address sets handling (or port-group handling). Please
correct me if I misunderstood something here. But I agree with idea of
utilizing and improving the IDL capability to build the dependency graph
for table references, and this is exactly in my TODO as mentioned in the
cover letter:

"For exposing the dependencies introduced by reference access, it is a big
TODO item and it is the major reason this patch series is RFC only."

Thanks,
Han