[ovs-dev] DDLog after one week

Dumitru Ceara dceara at redhat.com
Thu Sep 26 08:53:05 UTC 2019

On Thu, Sep 26, 2019 at 5:15 AM Leonid Ryzhyk <lryzhyk at vmware.com> wrote:
> Hi Dimitru,
> > Looking at IGMP I'm quite happy with the end result in DDlog. It
> > really allowed me to focus on the feature specific logic and spared me
> > the code complexity of custom hashtables and IDL lookups to implement
> > stable ID allocation for multicast groups. Also the code size was
> > significantly smaller for IGMP: ~500 lines in DDlog vs ~1500 lines in
> > C.
> Glad you feel this way despite the bumpy ride!
> > Q: How are the OutProxy_{TableName} relations used? e.g.
> > OutProxy_Datapath_Binding.
> > A: In the end I found the answer in the DDlog repo in the OVSDB
> > compiler implementation comments [1] but maybe we should add this part
> > to northd/docs.
> Agreed, these need to be documented. But first I am planning to refactor and simplify
> all these intermediate tables by taking advantage of this new feature proposed by Ben:
> https://patchwork.ozlabs.org/patch/1153455/
> If DDlog can explicitly assign UUIDs to output records, then a lot of the scaffolding
> should become unnecessary. Based on scale benchmarks kindly run by Han and Mark,
> these intermediate tables are responsible for a huge chunk of `ovn-northd-ddlog`'s
> CPU and memory usage.  And as you know they also introduce a lot of complexity,
> so simplifying them is at the top of my OVN TODO list.


> > Bug: "ovn-ctl stop_northd" doesn't work.
> Can you please submit an issue on this? Unfortunately this is not something I can easily fix without Justin's help.

It was actually quite easy to fix. I sent a PR:

> > Q: Why doesn't the ddlog compiler generate Out_IP_Multicast and only
> > "input IP_Multicast" in OVN_Southbound.dl.
> > A: For every table to which ovn-northd-ddlog needs to write we need to
> > update northd/automake.mk. Could we make this more robust or is it
> > fine to keep adding tables/keys/readonly fields in the makefile?
> I can see how this is error-prone, but I am not sure how to improve things other
> than adding documentation (which you already mostly did)
> Problem is, I cannot extract this information automatically from the OVSDB schema,
> so somehow the user has to tell the tool about all these attributes. We could store
> these attributes in a separate configuration file or even add them as annotations
> to the OVSDB schema, but I am not sure this will significanly improve the usability.
> I am of course open to suggestions.

I agree that it would be hard to automate so I see two options:
1. generate all the tables (including Proxy) for all OVSDB tables in
the schema. One issue with that would be the read-only columns and
keys which would still need to be manually specified. Is there any
other downside to this? Compile time increase?
2. add a new ovsdb2ddlog argument to pass a "spec" file with the
tables that need to be translated to ddlog. Mainly what we have now in
automake.mk but isolated in its own file to have a clearer view.

If it's not too much work I would go for option 2. Any thoughts?

> > Q: Is there more documentation on the ddlog standard library or just
> > the comments in std.dl [3]?
> Nope. Worse, our "standard library" is extremely ad hoc at the moment (so far)
> we've only added functions on-demand.  We have a "need help" issue on this.
> This is one area where contributions are very welcome; otherwise one of us
> will try to get around to working on this soonish:
> https://github.com/vmware/differential-datalog/issues/282#issuecomment-510574711

I see, i'll have a look if I get a chance.

> > Q: Why do some generated relations have uuid_name fields while others don't?
> > A: I guess to save memory but I only got (partial) confirmation after
> > looking at the compiler comments [1].
> This field is only needed in those relations that are referenced by at least one
> other table in OVSDB. The `ovsdb2ddlog` compiler automatically identifies such
> relations and generates the field for them. It does save some memory, but also
> eliminates the need to generate unique names for relations that are not referenced
> by anyone.

Got it, thanks.

> > Q: Is there a way to leave a field uninitialized in an output
> > relation? For example "eth_src" in sb.IP_Multicast is of type "string"
> > in the schema but I couldn't find a way to leave it uninitialized
> > (NULL) in DDlog.
> If a column is declared with multiplicity `"min": 0, "max": 1`, then it ends up being compiled
> into a DDlog `Set<>` (it should really be  `Option<>` , but sets made life a bit easier for me;
> this is one of the things I would like to improve in the future), and then an empty
> set can be used to represent `NULL`. I was under the impression that a column with
> multilplicity of exactly `1`, such as `sb.IP_Multicast` is not allowed to be NULL. Am I wrong?
> If so, when exactly are NULLs allowed by OVSDB?

Actually the definition of sb.IP_Multicast.eth_src is:

"eth_src": {"type": "string"}

I guess the only case when NULLs occur in OVSDB is for strings.

> > Issue: We always build for release (hardcoded) => no debug symbols.
> > Maybe we should rework northd/automake.mk to include debug symbols for
> > the rust files or enable debug builds.
> Good point. I created an issue for it: https://github.com/ovn-org/ovn/issues/13


> > ovn-northd-ddlog.c specific:
> > ----------------------------
> > Q: There are scenarios when ovn-northd-ddlog might flush the whole
> > SB/NB DB (e.g., reconnect)? What happens to tables that are not
> > populated by ovn-northd (e.g., IGMP_Group)?
> > A: ddlog_clear_sb() clears tables from the sb_input_relations array.
> > But it was a bit confusing in the beginning to me because I initially
> > had the impression that sb_input_relations are OVSDB tables from the
> > Southbound database that will be fed as input to the ddlog program.
> > However that's not true, ovsdb2ddlog generates input relations for all
> > tables in the OVSDB schema.
> > Q: I defined all my intermediate relations and rules to populate
> > sb.Out_<TableName>, dumping the table shows the records but why isn't
> > the data written to the database?
> > A: Update get_sb_ops() and get_nb_ops().
> Agreed, we need better documentation for these.
> > I also opened a pull request for a "how to add an OVN feature to
> > ovn-northd-ddlog" tutorial [2] where I tried to document the steps i
> > had to do for IGMP. Do you mind having a look at it when you have
> > time?
> The document looks great and will hopefully be really helpful for other
> OVN developers.  As discussed on github, I'd like to add some more
> details, which I will submit as a PR against your PR.
> > Some suggestions that I think would improve debugging without the need
> > of a full fledged debugger:
> > - It would be nice if we could use ddlog_dump_table for all tables as
> > now it works only for "output" relations. I tried to dig through the
> > DDlog implementation but couldn't figure out the reason why this is
> > so. Right now it's quite cumbersome because marking a relation as
> > "output" needs a rebuild which takes 5-10 minutes on my machine.
> The reason DDlog cannot dump non-output tables is that it simply does
> not have easy access to their contents. Some of these tables may not even
> be stored (tables are really just data streams that are only materialized
> when needed, e.g., to join it with another table),  while others are stored
> inside the differential dataflow engine in a form that is difficult to access
> (for performance reasons). Declaring a table as `output` tells DDlog to
> actually store its state, making it possible to dump the table, at the cost of
> some memory and CPU overhead.
> I will  add a flag to DDlog to implicitly label all tables as `output` for debugging
> purposes.  This will still require re-compilation, but only once, and will not
> require any changes to the source code:
> https://github.com/vmware/differential-datalog/issues/391

Works for me.

> > - Maybe the ddlog runtime could pass more information to the callback
> > that is registered in ddlog_run. Right now we can only dump the record
> > and deduce the operation from the weight argument. I didn't
> > investigate enough but it would be nice if we could filter what types
> > of records we want the callback to be called for.
> The callback also takes `table_id`, which can be used to filter out only
> records that belong to a subset of relations of interest. It can be used
> in conjunction with `ddlog_get_table_id`, which maps table name to id.
> We could also allow the user to subscribe to a subset of relations as you
> are suggesting. My only concern is that it will make the DDlog API more
> complex.

Ah, right, we have the table_id. I missed that. Is there also a
ddlog_get_table_name(table_id) API? I didn't see any when I looked

> Also, I want to mention that a better way to observe changes to relations is
> to use `ddlog_transaction_commit_dump_changes` instead of
> `ddlog_transaction_commit`.  It takes a callback with the same
> signature as `ddlog_run`, but this callback will be invoked at most once
> for every record modified during the transaction. The `ddlog_run`
> callback does not offer the same guarantee.

Interesting. But this will be called at commit time, right? I was
thinking that the advantage of a callback in ddlog_run would be that
the callback is called in the order the updates happen whereas at
commit time we lose that ordering. I'm not completely sure if that's
relevant, it might be just my imperative-programming style of

> > Also, I know you
> > mentioned that we can't answer provenance queries but I see that in
> > the lib.rs generated code the part where relations are created is
> > annotated with a comment which is actually the .dl rule definition.
> > That would be great to pass (if not .dl file and line number) to the
> > update_handler that gets called when we register a callback to
> > ddlog_run:
> This is a good idea, and we can probably do something like that. But there is a
> caveat that a record in DDlog can be derived multiple times by the same or
> different rules. Tracking all derivations can be very expensive, and, more
> importanly, changes the semantics of the DDlog program. One workaround
> is to make this derivation metadata a hidden field, so that records with
> the same value, but different derivations appear identical to DDlog.  This way
> we will end up observing a single, random, and possibly outdated, derivation
> of each output record, but that might still be helpful for debugging:
> https://github.com/vmware/differential-datalog/issues/103

I think that as long as we can see that the derivation is possibly
outdated it might still be helpful. Better than nothing anyway :)

> ----------
> PS.  We are working on some other DDlog improvements right now.
> I am hoping we will be done with them in a few days and will start implementing
> some of the features promised above.


More information about the dev mailing list