[ovs-dev] Let's talk the NB DB IDL Part I - things we've seen scaling the networking-ovn to NB DB connection

Wed Aug 3 01:57:25 UTC 2016

"dev" <dev-bounces at openvswitch.org> wrote on 08/02/2016 04:25:03 PM:

> From: Ryan Moats/Omaha/IBM at IBMUS
> To: ovs-dev <dev at openvswitch.org>
> Date: 08/02/2016 04:26 PM
> Subject: [ovs-dev] Let's talk the NB DB IDL Part I - things we've
> seen scaling the networking-ovn to NB DB connection
> Sent by: "dev" <dev-bounces at openvswitch.org>
>
>
> As folks know, we've been working on scaling up OVN coupled
> with OpenStack, and we are now starting to hit issues in the
> networking-ovn to NB DB connection.
>
> First, this code from ./ovsdb/execution.c (L693-707) looks
> a bit funny:
>
> if (!strcmp(json_string(until), "==") != equal) {
>     if (timeout && x->elapsed_msec >= timeout_msec) {
>         if (x->elapsed_msec) {
>             error = ovsdb_error("timed out",
>                                 "\"wait\" timed out after %lld ms",
>                                 x->elapsed_msec);
>         } else {
>             error = ovsdb_error("timed out", "\"wait\" timed out");
>         }
>     } else {
>         /* ovsdb_execute() will change this, if triggers really are
>          * supported. */
>         error = ovsdb_error("not supported", "triggers not supported");
>     }
> }
>
> Specifically, returning a "timed out" message when a
> timeout of 0 has been specified and the condition
> has not been met is misleading at best. I think it would make
> more sense to say "wait condition not met" or something like
> that.
>
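A minimal sketch of the suggested wording, written as a standalone C function so the branch logic can be exercised in isolation. The function `wait_failure_message()` and its parameters are hypothetical and not part of OVS; the real code in ovsdb/execution.c builds a `struct ovsdb_error` with `ovsdb_error()` rather than returning a string. The point is only that a zero timeout gets reported as an unmet condition instead of a timeout.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper (not in OVS): chooses the message for a "wait" whose
 * condition did not hold.  It follows the branch structure of the excerpt
 * above, but reports a zero timeout as an unmet condition rather than as a
 * timeout.  'buf' receives the text when the message needs a number. */
static const char *
wait_failure_message(bool has_timeout, long long timeout_msec,
                     long long elapsed_msec, char *buf, size_t size)
{
    if (has_timeout && elapsed_msec >= timeout_msec) {
        if (timeout_msec == 0) {
            /* The caller asked for an immediate check; nothing timed out. */
            return "\"wait\" condition not met";
        }
        snprintf(buf, size, "\"wait\" timed out after %lld ms", elapsed_msec);
        return buf;
    }
    /* No timeout, or not yet expired: the excerpt returns "not supported"
     * here, which its comment says ovsdb_execute() will change if triggers
     * are supported. */
    return "triggers not supported";
}
```

With this shape, a `"timeout": 0` wait that fails on its first evaluation reads as a condition mismatch, while a wait that genuinely exhausted a nonzero timeout still reports the elapsed time.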
> The second issue is a bit more serious.  Right now, before the
> networking-ovn IDL adds a new logical switch port and its
> associated ACLs (if memory serves, it creates seven
> for each port), there is a pair of wait clauses that have to
> be met.  The first of these requires that the ports and ACLs of
> the Logical Switch each match a particular list of UUIDs,
> and each entire list is sent in the transaction.
>
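To make the payload growth concrete, here is a sketch that builds one such wait operation as JSON text, following the "wait" operation format in RFC 7047 (section 5.2.6). The function `build_wait_op()` is hypothetical, the element UUIDs are synthetic placeholders, and `"timeout": 0` is assumed because that matches the zero-timeout case discussed above; this is not the literal output of networking-ovn or the OVS IDL.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Canonical UUID text is 36 characters: 32 hex digits plus 4 hyphens. */
#define UUID_LEN 36

/* Hypothetical builder for one "wait" operation asserting that 'column' of
 * row 'row_uuid' in 'table' still equals a set of 'n' UUIDs.  Element UUIDs
 * are placeholders derived from an index; a real client would send the
 * values it last read.  Returns a malloc'd string, or NULL on failure. */
static char *
build_wait_op(const char *table, const char *row_uuid, const char *column,
              size_t n)
{
    /* Each element is ["uuid","<36 chars>"] plus a separating comma. */
    size_t per_elem = strlen("[\"uuid\",\"\"],") + UUID_LEN;
    size_t cap = 256 + strlen(table) + strlen(row_uuid)
                 + 2 * strlen(column) + n * per_elem;
    char *buf = malloc(cap);
    if (!buf) {
        return NULL;
    }

    size_t off = (size_t) snprintf(
        buf, cap,
        "{\"op\":\"wait\",\"timeout\":0,\"table\":\"%s\","
        "\"where\":[[\"_uuid\",\"==\",[\"uuid\",\"%s\"]]],"
        "\"columns\":[\"%s\"],\"until\":\"==\","
        "\"rows\":[{\"%s\":[\"set\",[",
        table, row_uuid, column, column);
    for (size_t i = 0; i < n; i++) {
        /* Every element of the set is spelled out in full. */
        off += (size_t) snprintf(
            buf + off, cap - off,
            "%s[\"uuid\",\"%08zx-0000-0000-0000-000000000000\"]",
            i ? "," : "", i);
    }
    snprintf(buf + off, cap - off, "]]}]}");
    return buf;
}
```

Under this encoding each UUID costs 48 bytes of JSON (47 for the first), so the port and ACL lists at the scale described below (8000 + 56000 UUIDs) come to roughly 3 MB per transaction before any other operations are counted.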
> Is this strictly necessary?  I ask because I'm looking at
> scaling to a point where the wait condition will have
> 8000 UUIDs for the logical switch ports part and
> 56000 UUIDs for the ACLs part, and I haven't yet figured
> out why it is needed...

I've done some more reading of the code and docs, and I now understand
why the wait clauses are there (to ensure that the read before an
insert isn't dirty).  I'm wondering if we could reduce the amount of
data being sent by sending the number of elements in each list and
two independent hashes of the element data (why two? because I'm
paranoid about collisions).  In the scale case I'm looking at above,
that would reduce the amount of data from 64000*36 bytes to
9 + 2*(size of hash1) + 2*(size of hash2)...
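A sketch of how the proposed count-plus-two-hashes digest might be computed on the client. Nothing like this exists in OVSDB today: the server would have to compute the same digest over the stored column for a comparison to mean anything, so it implies a protocol or schema extension. The names `struct set_digest` and `digest_uuid_set()`, the choice of FNV-1a and a djb2-style hash as the two "independent" hashes, and the sort-before-hashing step (OVSDB sets are unordered) are all assumptions for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical compact stand-in for a full UUID list in a "wait" clause:
 * the element count plus two 64-bit hashes from different algorithms, so a
 * false match would need a collision in both at once. */
struct set_digest {
    size_t count;
    uint64_t h1;   /* FNV-1a, 64-bit */
    uint64_t h2;   /* djb2-style (h * 33 + c), 64-bit */
};

static int
compare_strings(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* Feeds one string, including its NUL terminator as a separator, into both
 * running hashes. */
static void
hash_string(const char *s, uint64_t *h1, uint64_t *h2)
{
    const unsigned char *p = (const unsigned char *) s;
    do {
        *h1 = (*h1 ^ *p) * UINT64_C(1099511628211);  /* FNV-1a prime */
        *h2 = *h2 * 33 + *p;                         /* djb2 step */
    } while (*p++);
}

/* Hashes a sorted copy of 'uuids' so the digest does not depend on the order
 * in which elements were read.  Returns 0 on success, -1 if out of memory. */
static int
digest_uuid_set(const char *const *uuids, size_t n, struct set_digest *out)
{
    out->count = n;
    out->h1 = UINT64_C(14695981039346656037);  /* FNV-1a offset basis */
    out->h2 = 5381;                            /* djb2 initial value */
    if (n == 0) {
        return 0;
    }

    const char **sorted = malloc(n * sizeof *sorted);
    if (!sorted) {
        return -1;
    }
    memcpy(sorted, uuids, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, compare_strings);
    for (size_t i = 0; i < n; i++) {
        hash_string(sorted[i], &out->h1, &out->h2);
    }
    free(sorted);
    return 0;
}
```

For scale: 64000 × 36 = 2,304,000 bytes of raw UUID text. Reading the 9 as the digits of the two counts (8000 and 56000), and sending each 64-bit hash as 16 hex digits, the digest form would be 9 + 2 × 16 + 2 × 16 = 73 bytes. Both hashes here are non-cryptographic; if deliberately crafted collisions were a concern, a cryptographic hash would be the safer choice at the cost of a larger digest.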