[ovs-dev] [PATCH 15/15] ovsdb: Introduce experimental support for clustered databases. (1/2)

Ben Pfaff blp at ovn.org
Fri Mar 23 16:30:43 UTC 2018


Thanks for the review.  I just went ahead and fixed most of this stuff
and so I'm only including below the bits that needed some kind of
comment.

On Fri, Feb 16, 2018 at 01:37:04AM -0800, Justin Pettit wrote:
> > On Dec 31, 2017, at 9:16 PM, Ben Pfaff <blp at ovn.org> wrote:
> > diff --git a/lib/jsonrpc.c b/lib/jsonrpc.c
> > index f8786f909ac8..0b8c1468b8d1 100644
> > --- a/lib/jsonrpc.c
> > +++ b/lib/jsonrpc.c
> > ...
> > +struct jsonrpc *
> > +jsonrpc_session_steal(struct jsonrpc_session *s)
> > +{
> > +    struct jsonrpc *rpc = s->rpc;
> > +    s->rpc = NULL;
> > +    jsonrpc_session_close(s);
> > +    return rpc;
> > +}
> 
> I don't think there are any external callers to this function, so it
> could be declared static.

Its only caller is in ovsdb-client.c.

> > diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
> > index c286ccbcaf8d..103261853952 100644
> > --- a/ovn/controller/ovn-controller.c
> > +++ b/ovn/controller/ovn-controller.c
> > @@ -615,6 +615,7 @@ main(int argc, char *argv[])
> >     char *ovnsb_remote = get_ovnsb_remote(ovs_idl_loop.idl);
> >     struct ovsdb_idl_loop ovnsb_idl_loop = OVSDB_IDL_LOOP_INITIALIZER(
> >         ovsdb_idl_create(ovnsb_remote, &sbrec_idl_class, true, true));
> > +    ovsdb_idl_set_leader_only(ovnsb_idl_loop.idl, false);
> > 
> >     create_ovnsb_indexes(ovnsb_idl_loop.idl);
> >     lport_init(ovnsb_idl_loop.idl);
> 
> Is there a reason ovn-controller doesn't try to connect the leader?
> My guess is that this will have better performance, and most of the
> writing it does is fairly siloed, but there are things such as MAC
> bindings that could be shared.  I suppose if there are multiple
> writers to the same address there may be more serious problems.

ovn-controller is the main daemon that will benefit from improved OVSDB
read performance, since it's the only one that connects to a central
OVSDB and runs on many chassis.  If there isn't a performance benefit
for ovn-controller then we might as well not bother with connecting to
anything other than the leader.

> > start_sb_ovsdb() {
> > -    # Check and eventually start ovsdb-server for Southbound DB
> > -    if ! pidfile_is_running $DB_SB_PID; then
> > +    # Check and eventually start ovsdb-server for Northbound DB
> 
> This should probably be "Southbound".

Thanks.

> Since the functions are basically identical in structure, the comments I mentioned in ovn_nb_ovsdb() apply here as well.

Oops.  These functions are identical except for s/nb/sb/, so I
refactored them into a single function.

> > @@ -494,6 +555,14 @@ File location options:
> >   --db-sb-sync-from-port=ADDR OVN Southbound active db tcp port (default: $DB_SB_SYNC_FROM_PORT)
> >   --db-sb-sync-from-proto=PROTO OVN Southbound active db transport (default: $DB_SB_SYNC_FROM_PROTO)
> >   --db-sb-create-insecure-remote=yes|no Create ptcp OVN Southbound remote (default: $DB_SB_CREATE_INSECURE_REMOTE)
> > +  --db-nb-cluster-local-addr=ADDR OVN_Northbound cluster local address \
> > +  (default: $DB_NB_CLUSTER_LOCAL_ADDR)
> > +  --db-nb-cluster-remote-addr=ADDR OVN_Northbound cluster remote address \
> > +  (default: $DB_NB_CLUSTER_REMOTE_ADDR)
> > +  --db-sb-cluster-local-addr=ADDR OVN_Northbound cluster local address \
> > +  (default: $DB_SB_CLUSTER_LOCAL_ADDR)
> > +  --db-sb-cluster-remote-addr=ADDR OVN_Northbound cluster remote address \
> > +  (default: $DB_SB_CLUSTER_REMOTE_ADDR)
> 
> I believe these last two should say "OVN_Southbound cluster".
> 
> These commands don't appear to be documented in the ovn-ctl man page.  It looks like that man page is generally missing a lot descriptions for options, but I'm wondering if some better guidance for starting up OVN in a cluster would be helpful.  For example, I think it would be a problem if one were to specify both a local and remote address.
> 
> > diff --git a/ovsdb/_server.xml b/ovsdb/_server.xml
> > index 8ef782fb97b2..e4536671ccbe 100644
> > --- a/ovsdb/_server.xml
> > +++ b/ovsdb/_server.xml
> > @@ -37,13 +37,13 @@
> > 
> >     <p>
> >       When a database is removed from the server, in addition to
> > +      <code>Database</code> table updates, the server sends
> > +      <code>canceled</code> messages, as described in RFC 7047 section 4.1.4,
> 
> Should that be "cancel" message instead of "canceled"?

No, the servers sends the "canceled" message described in RFC 7047
section 4.1.4 in this case.

> Also, from the description in 4.1.4, it sounds like just a client
> sends the "cancel" message to the server, not vice versa.

Yes, the client sends the "cancel" message.  The server sends the
"canceled" message.

> > diff --git a/ovsdb/file.c b/ovsdb/file.c
> > index dadb988d3088..d01e54fbe6ae 100644
> > --- a/ovsdb/file.c
> > +++ b/ovsdb/file.c
> > ...
> > struct json *
> > +ovsdb_to_txn_json(const struct ovsdb *db, const char *comment)
> > {
> 
> I think it would be nice to mention that 'comment' is not consumed.

It would be very unusual for a function to consume a const argument,
since that implies that it eventually frees it, which it can't without
casting away the const.  I don't think that it is worth noting the
contrary.

> > +struct json *
> > +ovsdb_file_txn_annotate(struct json *json, const char *comment)
> > {
> 
> Same here.

Ditto.

> > diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c
> > index df268cd4eedc..0df08da341be 100644
> > --- a/ovsdb/jsonrpc-server.c
> > +++ b/ovsdb/jsonrpc-server.c
> > 
> > @@ -1099,7 +1126,10 @@ ovsdb_jsonrpc_trigger_create(struct ovsdb_jsonrpc_session *s, struct ovsdb *db,
> >     }
> > 
> >     if (disconnect_all) {
> > -        ovsdb_jsonrpc_server_reconnect(s->remote->server, false);
> > +        ovsdb_jsonrpc_server_reconnect(s->remote->server, false,
> > +                                       xasprintf("committed %s database "
> > +                                                 "schema conversion",
> > +                                                 db->name));
> 
> Is this the correct comment for this circumstance?

This message is correct, but maybe it is surprising, so I added a
comment.

> > diff --git a/ovsdb/ovsdb-client.c b/ovsdb/ovsdb-client.c
> > index b00f04147d39..de23cc14bbb0 100644
> > --- a/ovsdb/ovsdb-client.c
> > +++ b/ovsdb/ovsdb-client.c
> > ...
> > +static int
> > +open_rpc(int min_args, enum args_needed need,
> > +         int argc, char *argv[], struct jsonrpc **rpcp, char **databasep)
> > +{
> > +    struct svec remotes = SVEC_EMPTY_INITIALIZER;
> > +    struct uuid cid = UUID_ZERO;
> > +
> > +    /* First figure out the remote(s).  If the first command-line argument has
> > +     * the form of a remote, use it, otherwise use the default. */
> 
> Do you think it's worth specifying what the default is?  I think it's
> "Open_vSwitch" if it's defined or the database that doesn't begin with
> "_" if there's a single database.

The default remote is unix:$RUNDIR/db.sock.

default_remote() is just a few lines up and it's a single line; I don't
think there's much value in copying that here.

> > diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
> > index f7bf1e270120..68584f396d10 100644
> > --- a/ovsdb/ovsdb-server.c
> > +++ b/ovsdb/ovsdb-server.c
> > 
> > @@ -201,10 +216,25 @@ main_loop(struct ovsdb_jsonrpc_server *jsonrpc, struct shash *all_dbs,
> > 
> > +        struct shash_node *next;
> > +        SHASH_FOR_EACH_SAFE (node, next, all_dbs) {
> >             struct db *db = node->data;
> >             if (ovsdb_trigger_run(db->db, time_msec())) {
> > +                ovsdb_jsonrpc_server_reconnect(
> > +                    jsonrpc, false,
> > +                    xasprintf("committed %s database schema conversion",
> > +                              db->db->name));
> 
> Is this the correct message?

Yes.  I added a comment.

> > diff --git a/ovsdb/ovsdb-tool.1.in b/ovsdb/ovsdb-tool.1.in
> > index 7b89ffeec8bf..3efa6a3e7032 100644
> > --- a/ovsdb/ovsdb-tool.1.in
> > +++ b/ovsdb/ovsdb-tool.1.in
> > @@ -15,6 +15,9 @@ ovsdb\-tool \- Open vSwitch database management utility
> > .IP "Database Creation Commands:"
> > \fBovsdb\-tool \fR[\fIoptions\fR] \fBcreate \fR[\fIdb\fR [\fIschema\fR]]
> > .br
> > +\fBovsdb\-tool \fR[\fIoptions\fR] \fBcreate\-cluster \fIdb contents address\fR
> > +.br
> > +\fBovsdb\-tool [\fB\-\-cid=\fIuuid\fR] \fBjoin\-cluster\fI db name local remote\fR...
>
> This isn't just a documentation issue, but the other ovsdb-tool
> commands allow the "db" and "schema" arguments to be optional, but
> none of the new cluster commands do.  Was that intentional?

The defaults for the other commands are oriented around the default
database being the ovs-vswitchd database and the default schema being
the ovs-vswitchd schema.  Those don't make sense here.

> > +.IP "\fBcreate\-cluster\fI db contents local"
> > ...
> > +point of failure.  It creates clustered database file \fIdb\fR and
> > +configures the server to listen on \fIlocal\fR, which must take the
> > +form \fIprotocol\fB:\fIip\fB:\fIport\fR, where \fIprotocol\fR is
> > +\fBtcp\fR or \fBssl\fR, \fIip\fR is the server's IP (either an IPv4
> > +address or an IPv6 address enclosed in square brackets), and
> > +\fIport\fR is a TCP port number.  Only one address is specified, for
> 
> > +\fIport\fR is a TCP port number.  Only one address is specified, for
> > +the first server in the cluster, ordinarily the one for the server
> > +running \fBcreate\-cluster\fR.  The address is used for communication
> 
> For "create-cluster", it sounds like "local" will set be set up in
> what we'd call a "passive" mode, but the argument is one that is in
> the style of "active".  I wonder if that will cause any confusion.

I have great faith in the OVS community to read the documentation.

> > +static void
> > +do_show_log_standalone(struct ovsdb_log *log)
> > +{
> > +    struct shash names = SHASH_INITIALIZER(&names);
> > ...
> > +    ovsdb_schema_destroy(schema);
> > +    /* XXX free 'names'. */
> > +}
> 
> This comment about freeing 'names' was in the previous version, too.  Should it be addressed?

That's a good idea but orthogonal.  I'll send a separate patch.

> > +static void
> > +print_raft_record(const struct raft_record *r,
> > +                  struct ovsdb_schema **schemap, struct shash *names)
> > +{
> > +    if (r->comment) {
> > +        printf(" comment: \"%s\"\n", r->comment);
> > +    }
> > +    if (r->term) {
> > +        printf(" term: %"PRIu64"\n", r->term);
> > +    }
> > +
> > +    switch (r->type) {
> > +    case RAFT_REC_ENTRY:
> > +        printf(" index: %"PRIu64"\n", r->entry.index);
> > +        print_servers("servers", r->entry.servers);
> > +        if (!uuid_is_zero(&r->entry.eid)) {
> > +            printf(" eid: %04x\n", uuid_prefix(&r->entry.eid, 4));
> > +        }
> > +        print_data("", r->entry.data, schemap, names);
> > +        break;
> > 
> 
> The record printouts seem to contain a lot of newlines.  I suspect (hope!) most people using the logs are trying to debug their applications and not Raft, so I'm wondering if it would be better if printing the log took up less vertical space.  Would it make sense to combine some of this information on the same line
> 
> > -void
> > ovsdb_destroy(struct ovsdb *db)
> > {
> 
> Should this function be freeing 'triggers'?

There must be no triggers at this point.  I added a comment and an
assertion.

> > +        ovsdb_tool db-local-address $DB_FILE
> > +        if [ "$?" = "1" ]; then
> > +            version=`ovsdb_tool db-version "$DB_FILE"`
> > +            cksum=`ovsdb_tool db-cksum "$DB_FILE" | awk '{print $1}'`
> > +            backup=$DB_FILE.backup$version-$cksum
> > +            action "Backing up database to $backup" cp "$DB_FILE" "$backup" || return 1
> > +
> > +            action "Creating cluster database $DB_FILE from existing one" \
> > +            ovsdb_tool create-cluster "$DB_FILE" "$backup" "$LOCAL_ADDR"
> > +        fi
> > +    fi
> > +}
> 
> Here are a few other random things that i think might be worth looking at:
> 
> 	- I'm seeing some reported memory leaks.  You may want to try running some of the unit tests under valgrind (e.g., "2242: insert row, query table, commit durably - cluster of 5").

Thanks for pointing that out.  There were several minor leaks.  I fixed
them.

> 	- checkpatch has quite a few errors and warnings.  A few of them actually look correct.

Thanks, I'll check into those.

> 	- It looks like support for the clustered database wasn't added to the Python IDL.  Do you plan on adding that?

I think it'll have to wait for another series.  It sounds like Han Zhou
might take it up.


More information about the dev mailing list