[ovs-dev] DDLog after one week

Dumitru Ceara dceara at redhat.com
Fri Sep 27 07:31:32 UTC 2019


On Thu, Sep 26, 2019 at 6:45 PM Leonid Ryzhyk <lryzhyk at vmware.com> wrote:
>
> > > > Q: Why doesn't the ddlog compiler generate Out_IP_Multicast and only
> > > > "input IP_Multicast" in OVN_Southbound.dl.
>  > > > A: For every table to which ovn-northd-ddlog needs to write we need to
> > > > update northd/automake.mk. Could we make this more robust or is it
> > > > fine to keep adding tables/keys/readonly fields in the makefile?
> > >
> > > I can see how this is error-prone, but I am not sure how to improve things other
> > > than adding documentation (which you already mostly did)
> > >
> > > Problem is, I cannot extract this information automatically from the OVSDB schema,
> > > so somehow the user has to tell the tool about all these attributes. We could store
> > > these attributes in a separate configuration file or even add them as annotations
> > > to the OVSDB schema, but I am not sure this will significanly improve the usability.
> > > I am of course open to suggestions.
> >
> > I agree that it would be hard to automate so I see two options:
> > 1. generate all the tables (including Proxy) for all OVSDB tables in
> > the schema. One issue with that would be the read-only columns and
> > keys which would still need to be manually specified. Is there any
> > other downside to this? Compile time increase?
>
> Generating output tables for everything will actually break things. The way DDlog is setup for ovn-northd is that it compares all tables labeled as output (`-o` in `automake.mk`) generated by the DDlog program against the current state in OVSDB and generates an OVSDB transaction to sync OVSDB state with DDlog. If the DDlog program does not produce any records for an OVSDB table labeled as output, this will have the effect of deleting everything from that table.

I see, thanks for clearing it up for me.

>
> > 2. add a new ovsdb2ddlog argument to pass a "spec" file with the
> > tables that need to be translated to ddlog. Mainly what we have now in
> > automake.mk but isolated in its own file to have a clearer view.
> >
> > If it's not too much work I would go for option 2. Any thoughts?
>
> Agreed, this will also resolve https://github.com/ovn-org/ovn/issues/15
>
> I created an issue for this:
> https://github.com/vmware/differential-datalog/issues/393
>
> Also, just want to mention that, once we have the simplified OVSDB interface, we should no longer need to specify keys (the `-k` switch), since the `uuid` column will always act as a key.

Nice!

>
> > > > Q: Is there a way to leave a field uninitialized in an output
> > > > relation? For example "eth_src" in sb.IP_Multicast is of type "string"
> > > > in the schema but I couldn't find a way to leave it uninitialized
> > > > (NULL) in DDlog.
> > >
> > > If a column is declared with multiplicity `"min": 0, "max": 1`, then it ends up being compiled
> > > into a DDlog `Set<>` (it should really be  `Option<>` , but sets made life a bit easier for me;
> > > this is one of the things I would like to improve in the future), and then an empty
> > > set can be used to represent `NULL`. I was under the impression that a column with
> > > multilplicity of exactly `1`, such as `sb.IP_Multicast` is not allowed to be NULL. Am I wrong?
> > > If so, when exactly are NULLs allowed by OVSDB?
> >
> > Actually the definition of sb.IP_Multicast.eth_src is:
> >
> > "eth_src": {"type": "string"}
>
> Which, I think, implies multiplicity 1.
>
> > I guess the only case when NULLs occur in OVSDB is for strings.
>
> Do you know if this is specified anywhere? I could not find anything relevant in the OVSDB RFC.

I'm not really sure tbh. Maybe the others can comment on this too.

>
> > > > - Maybe the ddlog runtime could pass more information to the callback
> > > > that is registered in ddlog_run. Right now we can only dump the record
> > > > and deduce the operation from the weight argument. I didn't
> > > > investigate enough but it would be nice if we could filter what types
> > > > of records we want the callback to be called for.
> > >
> > > The callback also takes `table_id`, which can be used to filter out only
> > > records that belong to a subset of relations of interest. It can be used
> > > in conjunction with `ddlog_get_table_id`, which maps table name to id.
> > > We could also allow the user to subscribe to a subset of relations as you
> > > are suggesting. My only concern is that it will make the DDlog API more
> > > complex.
> >
> > Ah, right, we have the table_id. I missed that. Is there also a
> > ddlog_get_table_name(table_id) API? I didn't see any when I looked
> > last.
>
> Nope, but there has to be one: https://github.com/vmware/differential-datalog/issues/394

Ack.

>
> > > Also, I want to mention that a better way to observe changes to relations is
> > > to use `ddlog_transaction_commit_dump_changes` instead of
> > > `ddlog_transaction_commit`.  It takes a callback with the same
> > > signature as `ddlog_run`, but this callback will be invoked at most once
> > > for every record modified during the transaction. The `ddlog_run`
> > > callback does not offer the same guarantee.
> >
> > Interesting. But this will be called at commit time, right? I was
> > thinking that the advantage of a callback in ddlog_run would be that
> > the callback is called in the order the updates happen whereas at
> > commit time we lose that ordering. I'm not completely sure if that's
> > relevant, it might be just my imperative-programming style of
> > thinking.
>
> As a matter of fact, both callbacks are called at commit time. DDlog does not start computing until `ddlog_transaction_commit()` or `ddlog_transaction_commit_XXX()` is invoked. The difference is that the `ddlog_run()` callback is invoked instantly when DDlog derives a new output record or deletes an existing record.  The downside is, because of how differential dataflow works, you may see funny behaviors, e.g., the same record can be deleted and re-derived multiple times.  The `ddlog_transaction_commit_dump_changes()` callback, on the other hand, is invoked after all computation has completed and DDlog knows exactly what's changed.

Then maybe it's better for the developer to be able to decide where to
use the callback (either with ddlog_run() or with
ddlog_transaction_commit()). So it would be nice if the provenance
information we discussed about would be passed to the callbacks in
both cases.

Thanks,
Dumitru

>
> > > > Also, I know you
> > > > mentioned that we can't answer provenance queries but I see that in
> > > > the lib.rs generated code the part where relations are created is
> > > > annotated with a comment which is actually the .dl rule definition.
> > > > That would be great to pass (if not .dl file and line number) to the
> > > > update_handler that gets called when we register a callback to
> > > > ddlog_run:
> > >
> > > This is a good idea, and we can probably do something like that. But there is a
> > > caveat that a record in DDlog can be derived multiple times by the same or
> > > different rules. Tracking all derivations can be very expensive, and, more
> > > importanly, changes the semantics of the DDlog program. One workaround
> > > is to make this derivation metadata a hidden field, so that records with
> > > the same value, but different derivations appear identical to DDlog.  This way
> > > we will end up observing a single, random, and possibly outdated, derivation
> > > of each output record, but that might still be helpful for debugging:
> > >
> > > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fvmware%2Fdifferential-datalog%2Fissues%2F103&amp;data=02%7C01%7Clryzhyk%40vmware.com%7Cff2e7291b62e44b94b1808d7425ef9b7%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637050848019583600&amp;sdata=zLF4z9TrvDvpeLNRGDMjHD14JlxpvXWt7ix78D4Tnvo%3D&amp;reserved=0
> >
> > I think that as long as we can see that the derivation is possibly
> > outdated it might still be helpful. Better than nothing anyway :)
>
> Ack.
>
> Leonid


More information about the dev mailing list