[ovs-dev] DDLog after one week
mmichels at redhat.com
Mon Sep 23 13:04:08 UTC 2019
I'm adding the ovs-dev list since this discussion may be useful to the
general development community.
See my responses inline.
On 9/20/19 8:04 PM, Leonid Ryzhyk wrote:
> Hi Mark,
> Firstly, many thanks for giving DDlog a try! I am extremely impressed
> that you managed to get this far without any help from us. Hopefully,
> we will be able to help make life easier for you going forward.
> Happy to hear that you see the pros of using DDlog in OVN. Your summary
> of where it helps to improve things is exactly what we intended it for.
> Let's see if we can help with some of the problems!
> * The need for Rust code:
> It's certainly true that DDlog adds yet another language (Rust) in
> the mix. One of the things on our roadmap is to expand DDlog's libraries
> with a more complete set of string manipulation routines, so that many
> of the functions that currently must be implemented in Rust can be coded
> directly in DDlog. Having said that, it's interesting that you had to
> implement your string manipulation in Rust. In my experience with OVN,
> most string processing logic is already implemented in C, so in most
> cases all I had to do was to write a Rust wrapper around an existing C
Unfortunately, this wasn't possible without exporting some new C
functions. In this particular case, the string parsing is taking care of
the "addresses" field of logical switch ports. The field can look like
any of the following:
"<MAC> <IPv4 address>"
"<MAC> <IPv6 address>"
"dynamic <IPv4 address>"
"dynamic <IPv6 address>"
The first five of these can be done using an exported C function
(extract_addresses(), I believe). The final two are handled internally
within ovn-northd.c and not exported. Therefore, in the ddlog version,
these two are handled in northd/ovn.rs. The sixth version is handled
with scan_static_dynamic_ip(). The final version was not handled at all,
so I added scan_static_dynamic_ip6() to attempt to cover it.
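For illustration only, here is a minimal sketch of how the two "dynamic"
forms could be parsed in plain Rust. The names parse_dynamic_addr and
DynamicAddr are hypothetical. This is not the actual code of
scan_static_dynamic_ip() or scan_static_dynamic_ip6() in northd/ovn.rs,
and it assumes the string is exactly the keyword "dynamic" followed by
one IP address.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Parsed form of a "dynamic <IP>" addresses string.
/// Hypothetical type, not taken from northd/ovn.rs.
#[derive(Debug, PartialEq)]
pub enum DynamicAddr {
    V4(Ipv4Addr),
    V6(Ipv6Addr),
}

/// Parse "dynamic <IPv4 address>" or "dynamic <IPv6 address>".
/// Returns None for any other shape, including the "<MAC> ..." forms.
pub fn parse_dynamic_addr(s: &str) -> Option<DynamicAddr> {
    let mut tokens = s.split_whitespace();

    // The first token must be the literal keyword.
    if tokens.next()? != "dynamic" {
        return None;
    }

    // Exactly one address token must follow.
    let ip = tokens.next()?;
    if tokens.next().is_some() {
        return None;
    }

    // Try IPv4 first, then fall back to IPv6.
    if let Ok(v4) = ip.parse::<Ipv4Addr>() {
        return Some(DynamicAddr::V4(v4));
    }
    ip.parse::<Ipv6Addr>().ok().map(DynamicAddr::V6)
}
```

Returning an Option keeps malformed input from turning into a panic;
the caller decides whether to log and skip the port or reject it.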
> * Failure resilience:
> I think your summary is correct. I don't really understand
> `ovn-northd-ddlog.c`, but Justin did mention that he plans to clean up
> and harden it. DDlog itself should be able to handle intermittent
> failures, e.g., if we lose connection to Southbound DB and then restore
> it, DDlog can just pick up the new database state and generate the set
> of changes needed to bring it up to speed with the NB database.
Yes, my impression was that this wasn't really a fault of DDLog as
much as the program driving it. I'm guessing that the reason for not
using the OVSDB IDL is that it doesn't have the hooks necessary for
DDLog. Rather than modify the OVSDB IDL, ovn-northd-ddlog.c essentially
copies the logic of the IDL and inserts code where necessary for DDLog.
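To make the "pick up the new database state" idea concrete, here is a
rough sketch of the resync step after a reconnect: compare the state
the program wants with what the database reports, and emit only the
difference. This is an assumption-laden illustration in plain Rust.
Delta and resync are hypothetical names, and this is not how DDlog or
ovn-northd-ddlog.c implement it.

```rust
use std::collections::BTreeSet;

/// Changes needed to turn `actual` into `desired`.
/// Hypothetical type, used only for this illustration.
#[derive(Debug, PartialEq)]
pub struct Delta<T> {
    pub inserts: Vec<T>,
    pub deletes: Vec<T>,
}

/// Compute the rows to insert into and delete from the database so
/// that its contents match the desired state.
pub fn resync<T: Ord + Clone>(
    desired: &BTreeSet<T>,
    actual: &BTreeSet<T>,
) -> Delta<T> {
    Delta {
        // Rows we want that the database does not have yet.
        inserts: desired.difference(actual).cloned().collect(),
        // Rows the database has that we no longer want.
        deletes: actual.difference(desired).cloned().collect(),
    }
}
```

Rows present on both sides produce no change, so a reconnect that
finds the database already up to date results in an empty delta.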
> * Debuggability:
> I agree that debugging is a problem. We keep building new debugging
> tools, and it sounds like you've already tried some of them, but I can
> totally see that these may not always be enough. I think the best
> strategy is to go case-by-case and figure out how we can make it easier
> to troubleshoot every particular type of error in the future. Can you
> point me to your code that fails?
I imagine you'll take one look and immediately realize what's wrong :)
From my POV, what would have made this easiest to figure out was the
ability to open a debugger and insert break points into the DDlog to
determine the current values at various steps. I think the closest I'd
be able to do right now would be to step through the generated Rust code
instead. It's possible that I could figure out what I'm doing wrong that
way, but it would be so much better if it were possible to debug at the
> * Compile times
> Yep, those are driving me nuts, and sadly I don't think we will have
> a radical solution in the near future. Have you tried
> `--enable-ddlog-fast-build`? In my experience, this speeds up
> compilation by a factor of 2, at the cost of the compiled code being
> twice as slow, which is usually not a problem during development. Also,
> make sure you do _not_ use `--enable-ddlog-northd-cli` unless you need
> to do replay debugging with CLI. I think that with those two changes I
> get compilation times on the order of 5 minutes on my laptop.
I have not enabled the fast build option, nor have I enabled
ddlog-northd-cli. I'll give the fast build option a try. Is there a
reason it is not enabled by default?
> *From:* Mark Michelson <mmichels at redhat.com>
> *Sent:* Friday, September 20, 2019 2:31 PM
> *To:* Leonid Ryzhyk <lryzhyk at vmware.com>; Numan Siddique
> <nusiddiq at redhat.com>; Lorenzo Bianconi <lbiancon at redhat.com>; Dumitru
> Ceara <dceara at redhat.com>
> *Subject:* DDLog after one week
> Hi Leonid,
> After learning, reading, and using DDLog for a week, I figure I should
> give my thoughts.
> On the positive side, I think that once we get used to writing DDLog, it
> will be easier to maintain and add to than the current incremental
> engine we have in the C code. The language requires different thinking
> than we're used to, but that's not a bad thing.
> Regarding language features, I like the memory safety and strong type
> system. It reminds me of using a functional language (e.g. Haskell) in
> that regard. I also like the fact that its rules engine makes for easy
> incremental processing of the code. I only have to think about how a
> relation translates to another relation. I don't need to think about the
> deltas to the existing items or anything like that.
> On the not-exactly-positive-but-also-not-exactly-negative side, I was
> surprised about the amount of Rust I might have to know and use when
> using DDLog. For instance, the issue I chose to take on required more
> Rust coding than DDLog coding. That's because my change required
> changing some string parsing code, and that's offloaded to Rust.
> I have three main negatives to share:
> 1) The current implementation is not resilient in the face of failures.
> For example, if ovsdb transactions don't go properly, there is a chance
> that the code can enter a busy loop and no longer function. I get the
> feeling that ovn-northd-ddlog.c was written with the mindset of getting
> something working quickly. It doesn't use the ovsdb IDL, and as a
> result, much of the state machine is re-copied locally. I don't think
> failure scenarios are properly handled.
> 2) Debuggability. While I was making my changes, I found that what I had
> written had resulted in a database commit failing. However, even when
> replaying DDLog, dumping a relevant DDLog relation, and adding printfs,
> I still have no idea why my code is wrong. I'm kind of just stuck at
> this point. Even more of a concern is debugging live deployments.
> 3) Compile times. While working on my improvement, I found that making
> changes and recompiling resulted in 10+ minute builds every time. It
> didn't matter how small or large my changes were. When trying to debug
> why a test is failing, and incrementally adding more changes to the
> code, this is excruciating.
> I've included Numan, Dumitru, and Lorenzo on this message so they can
> chime in with their thoughts as well. Note that for them it's already
> night time on Friday, so they may not add their thoughts until after the
> weekend :)
> Mark Michelson