[ovs-dev] DDLog after one week

Leonid Ryzhyk lryzhyk at vmware.com
Thu Sep 26 03:14:59 UTC 2019

Hi Dumitru,

> Looking at IGMP I'm quite happy with the end result in DDlog. It
> really allowed me to focus on the feature specific logic and spared me
> the code complexity of custom hashtables and IDL lookups to implement
> stable ID allocation for multicast groups. Also the code size was
> significantly smaller for IGMP: ~500 lines in DDlog vs ~1500 lines in
> C.

Glad you feel this way despite the bumpy ride!

> Q: How are the OutProxy_{TableName} relations used? e.g.
> OutProxy_Datapath_Binding.
> A: In the end I found the answer in the DDlog repo in the OVSDB
> compiler implementation comments [1] but maybe we should add this part
> to northd/docs.

Agreed, these need to be documented. But first I am planning to refactor and simplify
all these intermediate tables by taking advantage of this new feature proposed by Ben:

If DDlog can explicitly assign UUIDs to output records, then a lot of the scaffolding
should become unnecessary. Based on scale benchmarks kindly run by Han and Mark,
these intermediate tables are responsible for a huge chunk of `ovn-northd-ddlog`'s
CPU and memory usage.  And as you know they also introduce a lot of complexity,
so simplifying them is at the top of my OVN TODO list.

> Bug: "ovn-ctl stop_northd" doesn't work.

Can you please submit an issue on this? Unfortunately this is not something I can easily fix without Justin's help.

> Q: Why doesn't the ddlog compiler generate Out_IP_Multicast and only
> "input IP_Multicast" in OVN_Southbound.dl.
> A: For every table to which ovn-northd-ddlog needs to write we need to
> update northd/automake.mk. Could we make this more robust or is it
> fine to keep adding tables/keys/readonly fields in the makefile?

I can see how this is error-prone, but I am not sure how to improve things other
than adding documentation (which you already mostly did) 😉

Problem is, I cannot extract this information automatically from the OVSDB schema,
so somehow the user has to tell the tool about all these attributes. We could store
these attributes in a separate configuration file or even add them as annotations
to the OVSDB schema, but I am not sure this will significantly improve the usability.
I am of course open to suggestions.

> Q: Is there more documentation on the ddlog standard library or just
> the comments in std.dl [3]?

Nope. Worse, our "standard library" is extremely ad hoc at the moment; so far
we've only added functions on demand.  We have a "need help" issue on this.
This is one area where contributions are very welcome; otherwise one of us
will try to get around to working on it soonish:


> Q: Why do some generated relations have uuid_name fields while others don't?
> A: I guess to save memory but I only got (partial) confirmation after
> looking at the compiler comments [1].

This field is only needed in those relations that are referenced by at least one
other table in OVSDB. The `ovsdb2ddlog` compiler automatically identifies such
relations and generates the field for them. It does save some memory, but also
eliminates the need to generate unique names for relations that are not referenced
by anyone.
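The rule described above can be sketched in a few lines. This is a toy model, not the real `ovsdb2ddlog` code, and the table names are illustrative: a table gets a `uuid_name` field iff some column in another table references it (OVSDB's "refTable").

```rust
// Toy sketch of the uuid_name rule: collect every table that is the
// target of at least one reference; only those need a uuid_name field.
use std::collections::{HashMap, HashSet};

/// `refs` maps each table to the list of tables its columns reference.
fn tables_needing_uuid_name<'a>(
    refs: &HashMap<&'a str, Vec<&'a str>>,
) -> HashSet<&'a str> {
    refs.values().flatten().copied().collect()
}

fn main() {
    let mut refs: HashMap<&str, Vec<&str>> = HashMap::new();
    // Logical_Switch.ports references Logical_Switch_Port; nothing
    // references IP_Multicast, so it would get no uuid_name field.
    refs.insert("Logical_Switch", vec!["Logical_Switch_Port"]);
    refs.insert("Logical_Switch_Port", vec![]);
    refs.insert("IP_Multicast", vec![]);

    let need = tables_needing_uuid_name(&refs);
    assert!(need.contains("Logical_Switch_Port"));
    assert!(!need.contains("IP_Multicast"));
    println!("ok");
}
```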

> Q: Is there a way to leave a field uninitialized in an output
> relation? For example "eth_src" in sb.IP_Multicast is of type "string"
> in the schema but I couldn't find a way to leave it uninitialized
> (NULL) in DDlog.

If a column is declared with multiplicity `"min": 0, "max": 1`, then it ends up being compiled
into a DDlog `Set<>` (it should really be `Option<>`, but sets made life a bit easier for me;
this is one of the things I would like to improve in the future), and then an empty
set can be used to represent `NULL`. I was under the impression that a column with
multiplicity of exactly `1`, such as `sb.IP_Multicast`, is not allowed to be NULL. Am I wrong?
If so, when exactly are NULLs allowed by OVSDB?
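A minimal sketch of the encoding described above, with illustrative field and struct names (the real generated types differ): an optional column becomes a set-valued field where the empty set plays the role of `NULL`.

```rust
// Sketch: a column with "min": 0, "max": 1 modeled as a set, where
// empty <=> NULL and a singleton <=> a present value. Option<> would be
// the cleaner model, as noted above.
use std::collections::BTreeSet;

#[derive(Debug)]
struct IpMulticastRow {
    datapath: String,
    eth_src: BTreeSet<String>, // empty set represents NULL
}

/// Read the optional column back as an Option for convenience.
fn eth_src(row: &IpMulticastRow) -> Option<&str> {
    row.eth_src.iter().next().map(|s| s.as_str())
}

fn main() {
    let absent = IpMulticastRow {
        datapath: "dp1".to_string(),
        eth_src: BTreeSet::new(),
    };
    let present = IpMulticastRow {
        datapath: "dp2".to_string(),
        eth_src: ["00:00:00:00:00:01".to_string()].into_iter().collect(),
    };
    assert_eq!(eth_src(&absent), None);
    assert_eq!(eth_src(&present), Some("00:00:00:00:00:01"));
    println!("ok");
}
```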

> Issue: We always build for release (hardcoded) => no debug symbols.
> Maybe we should rework northd/automake.mk to include debug symbols for
> the rust files or enable debug builds.

Good point. I created an issue for it: https://github.com/ovn-org/ovn/issues/13

> ovn-northd-ddlog.c specific:
> ----------------------------
> Q: There are scenarios when ovn-northd-ddlog might flush the whole
> SB/NB DB (e.g., reconnect)? What happens to tables that are not
> populated by ovn-northd (e.g., IGMP_Group)?
> A: ddlog_clear_sb() clears tables from the sb_input_relations array.
> But it was a bit confusing in the beginning to me because I initially
> had the impression that sb_input_relations are OVSDB tables from the
> Southbound database that will be fed as input to the ddlog program.
> However that's not true, ovsdb2ddlog generates input relations for all
> tables in the OVSDB schema.
> Q: I defined all my intermediate relations and rules to populate
> sb.Out_<TableName>, dumping the table shows the records but why isn't
> the data written to the database?
> A: Update get_sb_ops() and get_nb_ops().

Agreed, we need better documentation for these.

> I also opened a pull request for a "how to add an OVN feature to
> ovn-northd-ddlog" tutorial [2] where I tried to document the steps I
> had to do for IGMP. Do you mind having a look at it when you have
> time?

The document looks great and will hopefully be really helpful for other
OVN developers.  As discussed on github, I'd like to add some more
details, which I will submit as a PR against your PR.

> Some suggestions that I think would improve debugging without the need
> of a full fledged debugger:

> - It would be nice if we could use ddlog_dump_table for all tables as
> now it works only for "output" relations. I tried to dig through the
> DDlog implementation but couldn't figure out the reason why this is
> so. Right now it's quite cumbersome because marking a relation as
> "output" needs a rebuild which takes 5-10 minutes on my machine.

The reason DDlog cannot dump non-output tables is that it simply does
not have easy access to their contents. Some of these tables may not even
be stored (tables are really just data streams that are only materialized
when needed, e.g., to join them with another table), while others are stored
inside the differential dataflow engine in a form that is difficult to access
(for performance reasons). Declaring a table as `output` tells DDlog to
actually store its state, making it possible to dump the table, at the cost of
some memory and CPU overhead.

I will add a flag to DDlog to implicitly label all tables as `output` for debugging
purposes.  This will still require re-compilation, but only once, and will not
require any changes to the source code.

> - Maybe the ddlog runtime could pass more information to the callback
> that is registered in ddlog_run. Right now we can only dump the record
> and deduce the operation from the weight argument. I didn't
> investigate enough but it would be nice if we could filter what types
> of records we want the callback to be called for.

The callback also takes `table_id`, which can be used to filter out only
records that belong to a subset of relations of interest. It can be used
in conjunction with `ddlog_get_table_id`, which maps table name to id.
We could also allow the user to subscribe to a subset of relations as you
are suggesting. My only concern is that it will make the DDlog API more complicated.

Also, I want to mention that a better way to observe changes to relations is
to use `ddlog_transaction_commit_dump_changes` instead of
`ddlog_transaction_commit`.  It takes a callback with the same
signature as `ddlog_run`, but this callback will be invoked at most once
for every record modified during the transaction. The `ddlog_run`
callback does not offer the same guarantee.
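The `table_id`-based filtering described above can be sketched as follows. The DDlog types and the table-name lookup here are stand-ins for illustration only; the real C API passes `(arg, table_id, record, weight)` to the callback, with positive weight meaning insertion and negative meaning deletion.

```rust
// Mock sketch of filtering commit notifications by table id and
// decoding the weight argument. Table ids and names are invented.
use std::collections::HashSet;

type TableId = usize;

#[derive(Debug)]
struct Record(String);

/// Stand-in for ddlog_get_table_id(): map a relation name to its id.
fn get_table_id(name: &str) -> TableId {
    match name {
        "OVN_Southbound::Out_IP_Multicast" => 7,
        "OVN_Southbound::Out_Datapath_Binding" => 8,
        _ => usize::MAX,
    }
}

/// Ignore tables we did not subscribe to; otherwise decode the weight.
fn on_commit(
    interesting: &HashSet<TableId>,
    table: TableId,
    rec: &Record,
    weight: isize,
) -> Option<String> {
    if !interesting.contains(&table) {
        return None;
    }
    let op = if weight > 0 { "insert" } else { "delete" };
    Some(format!("{} {}", op, rec.0))
}

fn main() {
    let interesting: HashSet<TableId> =
        [get_table_id("OVN_Southbound::Out_IP_Multicast")]
            .into_iter()
            .collect();
    let rec = Record("igmp group".to_string());
    assert_eq!(
        on_commit(&interesting, 7, &rec, 1),
        Some("insert igmp group".to_string())
    );
    assert_eq!(on_commit(&interesting, 8, &rec, -1), None);
    println!("ok");
}
```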

> Also, I know you
> mentioned that we can't answer provenance queries but I see that in
> the lib.rs generated code the part where relations are created is
> annotated with a comment which is actually the .dl rule definition.
> That would be great to pass (if not .dl file and line number) to the
> update_handler that gets called when we register a callback to
> ddlog_run:

This is a good idea, and we can probably do something like that. But there is a
caveat that a record in DDlog can be derived multiple times by the same or
different rules. Tracking all derivations can be very expensive, and, more
importantly, changes the semantics of the DDlog program. One workaround
is to make this derivation metadata a hidden field, so that records with
the same value, but different derivations appear identical to DDlog.  This way
we will end up observing a single, random, and possibly outdated, derivation
of each output record, but that might still be helpful for debugging.
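The "hidden field" workaround can be sketched like this: equality and hashing deliberately ignore the provenance string, so records that differ only in how they were derived look identical to the dataflow, while the metadata stays available for debugging. All names here are illustrative.

```rust
// Sketch: a record carrying derivation metadata that is invisible to
// equality and hashing, so DDlog semantics would be unchanged.
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone)]
struct DerivedRecord {
    value: String,
    derived_by: &'static str, // hidden provenance, e.g. a .dl rule text
}

impl PartialEq for DerivedRecord {
    fn eq(&self, other: &Self) -> bool {
        self.value == other.value // derivation deliberately ignored
    }
}
impl Eq for DerivedRecord {}

impl Hash for DerivedRecord {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.value.hash(state); // must agree with PartialEq
    }
}

fn main() {
    let a = DerivedRecord { value: "flow1".into(), derived_by: "Rule1" };
    let b = DerivedRecord { value: "flow1".into(), derived_by: "Rule2" };
    assert_eq!(a, b); // same value, different derivation: still "equal"

    // Deduplicated storage keeps a single, arbitrary derivation,
    // matching the "single, random, possibly outdated" caveat above.
    let mut set = HashSet::new();
    set.insert(a);
    set.insert(b);
    assert_eq!(set.len(), 1);
    println!("ok");
}
```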



PS.  We are working on some other DDlog improvements right now.
I am hoping we will be done with them in a few days and will start implementing
some of the features promised above.
