[ovs-dev] [PATCH 15/15] ovsdb: Introduce experimental support for clustered databases. (2/2)

Ben Pfaff blp at ovn.org
Sun Mar 25 00:46:26 UTC 2018


I applied this series to master, with the exception of ovn-ctl, which
needs some extra work and will come in a separate patch to be posted
later.

On Fri, Mar 23, 2018 at 03:27:24PM -0700, Ben Pfaff wrote:
> Thank you for the review.  As before, I am applying most of your
> comments and not bothering to respond to them, so below you can find
> what needed a reply.
> 
> On Sat, Feb 24, 2018 at 03:17:51PM -0800, Justin Pettit wrote:
> > > On Dec 31, 2017, at 9:16 PM, Ben Pfaff <blp at ovn.org> wrote:
> > > diff --git a/ovsdb/raft-private.c b/ovsdb/raft-private.c
> > > new file mode 100644
> > > index 000000000000..457d1292a949
> > > --- /dev/null
> > > +++ b/ovsdb/raft-private.c
> > > 
> > > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > > +raft_address_validate(const char *address)
> > > +{
> > > ...
> > > +        return ovsdb_error(NULL, "%s: expected \"tcp\" or \"ssl\" address",
> > > +                           address);
> > 
> > I assume you're downplaying "unix:", since it only makes sense for testing purposes?
> 
> Yes.  I don't want to encourage users to use it, since it isn't useful
> for distribution but only for testing..
> 
> > > +char *
> > > +raft_address_to_nickname(const char *address, const struct uuid *sid)
> > > +{
> > 
> > I think it may be worth explaining what a nickname is and how one gets one.
> 
> OK, I added a comment.
> 
> > > +void
> > > +raft_servers_format(const struct hmap *servers, struct ds *ds)
> > > +{
> > > +    int i = 0;
> > > +    const struct raft_server *s;
> > > +    HMAP_FOR_EACH (s, hmap_node, servers) {
> > > +        if (i++) {
> > > +            ds_put_cstr(ds, ", ");
> > > +        }
> > > +        ds_put_format(ds, SID_FMT"(%s)", SID_ARGS(&s->sid), s->address);
> > 
> > Do you think it's worth putting a space between the SID and parenthetical address?
> 
> It's actually more readable the way it is.  I spent a lot of time
> reading this stuff.
> 
> > > +static void
> > > +raft_append_request_from_jsonrpc(struct ovsdb_parser *p,
> > > +                                 struct raft_append_request *rq)
> > > +{
> > > ...
> > > +
> > > +    const struct json *log = ovsdb_parser_member(p, "log", OP_ARRAY);
> > > +    if (!log) {
> > > +        return;
> > > +    }
> > 
> > Should this parser argument include OP_OPTIONAL?
> 
> No, an append request always includes a log entry array (although I
> guess it could be an empty array).  Can you help me to understand what
> makes you think so?  Perhaps there is some inconsistency here that I do
> not yet see, that I should fix.
> 
> > > +static void
> > > +raft_append_reply_to_jsonrpc(const struct raft_append_reply *rpy,
> > > +                             struct json *args)
> > > +{
> > > ...
> > > +    json_object_put_string(args, "result",
> > > +                           raft_append_result_to_string(rpy->result));
> > 
> > I believe raft_append_result_to_string() can return NULL, which could
> > cause a problem here.
> 
> The code shouldn't try to send an invalid RPC; if it does, then a
> segfault will allow the problem to be diagnosed about as well as an
> assertion failure would.
> 
> > > +static void
> > > +raft_format_append_reply(const struct raft_append_reply *rpy, struct ds *s)
> > > +{
> > > ...
> > > +    ds_put_format(s, " result=\"%s\"",
> > > +                  raft_append_result_to_string(rpy->result));
> > 
> > I don't think this will crash, but it will print strange if
> > 'rpy->result' is not valid.
> 
> The code shouldn't try to print an invalid RPC; such an RPC, if
> received, would fail to demarshal before it got to that point.
> 
> > Is it intentional that "n_entries" and "prev_log_*" aren't printed?
> 
> They aren't very useful for debugging and clutter the output.
> 
> > > +void
> > > +raft_rpc_format(const union raft_rpc *rpc, struct ds *s)
> > > +{
> > > +    ds_put_format(s, "%s", raft_rpc_type_to_string(rpc->type));
> > > +    if (rpc->common.comment) {
> > > +        ds_put_format(s, " \"%s\"", rpc->common.comment);
> > > +    }
> > > +    ds_put_char(s, ':');
> > 
> > Is printing the SID not important?
> 
> It's better to let the caller print it, since it has more context that
> allows for more human-friendly names (e.g. those nicknames you asked
> about).
> 
> I added a comment.
> 
> > > +/* Creates a database file that represents a new server in an existing Raft
> > > + * cluster.
> > > + *
> > > + * Creates the local copy of the cluster's log in 'file_name', which must not
> > > + * already exist.  Gives it the name 'name', which must be the same name
> > > + * passed in to raft_create_cluster() earlier.
> > 
> > Is raft_create_cluster() supposed to be called before this?  I think
> > raft_create_cluster() would have already created the log file, which
> > the previous sentence said must not already exist.
> 
> Every server needs a log file, not just the first server in the cluster.
> 
> > > + * This only creates the on-disk file and does no network access.  Use
> > > + * raft_open() to start operating the new server.  (Until this happens, the
> > > + * new server has not joined the cluster.)
> > 
> > The name for this function (and the corresponding ovsdb-tool command)
> > seems a bit confusing in this way.  Not a big deal, but it doesn't
> > seem obvious.
> 
> I'm open to better names, can you suggest one?
> 
> > > + * Returns null if successful, otherwise an ovsdb_error describing the
> > > + * problem. */
> > > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > > +raft_join_cluster(const char *file_name,
> > > +                  const char *name, const char *local_address,
> > > +                  const struct sset *remote_addresses,
> > > +                  const struct uuid *cid)
> > > +{
> > > ...
> > > +    /* Write log file. */
> > > +    struct raft_header h = {
> > > +        .sid = uuid_random(),
> > > +        .cid = cid ? *cid : UUID_ZERO,
> > > +        .name = xstrdup(name),
> > > +        .local_address = xstrdup(local_address),
> > > +        .joining = true,
> > > +    };
> > > +    sset_clone(&h.remote_addresses, remote_addresses);
> > > +    error = ovsdb_log_write_and_free(log, raft_header_to_json(&h));
> > > +    raft_header_uninit(&h);
> > 
> > I don't think anything initializes 'h->snap', which seems like it
> > could be a problem for raft_header_to_json() and raft_header_uninit().
> 
> In C, an initializer always initializes every member, and those
> functions appear to properly handle nulls.
> 
> I added a comment.
> 
> > > +static struct json *
> > > +raft_servers_for_index(const struct raft *raft, uint64_t index)
> > 
> > I think it would be worth mentioning that the caller must free the
> > return value.
> 
> Good catch.  The existing caller didn't, so there was a bug here.
> Should be fixed now.
> 
> > > +static void
> > > +raft_add_conn(struct raft *raft, struct jsonrpc_session *js,
> > > +              const struct uuid *sid, bool incoming)
> > > +{
> > > ...
> > > +    if (sid) {
> > > +        conn->sid = *sid;
> > > +    }
> > > +    conn->nickname = raft_address_to_nickname(jsonrpc_session_get_name(js),
> > > +                                              &conn->sid);
> > 
> > The 'sid' argument can be null, which seems like it could be a problem with raft_address_to_nickname().
> 
> Yes, the nickname can end up being 0000 for a while, but it changes to a
> better one on the first received RPC.  It's not ideal, admittedly, but
> it's clear enough in the logs.
> 
> > > +static bool
> > > +raft_conn_receive(struct raft *raft, struct raft_conn *conn,
> > > +                  union raft_rpc *rpc)
> > > +{
> > > ...
> > > +    } else if (!uuid_equals(&conn->sid, &rpc->common.sid)) {
> > > +        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
> > > +        VLOG_WARN_RL(&rl,
> > > +                     "%s: remote server ID changed from "SID_FMT" to "SID_FMT,
> > > +                     jsonrpc_session_get_name(conn->js),
> > > +                     SID_ARGS(&conn->sid), SID_ARGS(&rpc->common.sid));
> > 
> > If the SID changed, shouldn't we update "conn->sid"?
> 
> It's more of an error message.  The sid shouldn't change.  I updated the
> code to make that clearer, and to discard the message.
> 
> > > +static void
> > > +raft_run_reconfigure(struct raft *raft)
> > > +{
> > > ...
> > > +    /* Remove a server, if one is scheduled for removal. */
> > > +    HMAP_FOR_EACH (s, hmap_node, &raft->servers) {
> > > +        if (s->phase == RAFT_PHASE_REMOVE) {
> > > +            hmap_remove(&raft->servers, &s->hmap_node);
> > > +            raft->remove_server = s;
> > 
> > Are we sure that 'raft->remove_server' will only be assigned once?  I
> > think it might be worth asserting that it's null before assigning
> > here.
> 
> We can be sure that it is null because we set it to null earlier in the
> function.
> 
> > > +/* Returns true if 'raft' has grown enough that reducing the log to a snapshot
> > > + * would be valuable, false otherwise.  When this function returns true, the
> > > + * client should consider using raft_store_snapshot() to reduce the log storage
> > > + * requirements. */
> > > +bool
> > > +raft_grew_lots(const struct raft *raft)
> > > +{
> > > +    return (!raft->joining
> > > +            && !raft->leaving
> > > +            && !raft->left
> > > +            && !raft->failed
> > > +            && raft->last_applied - raft->log_start >= 100
> > > +            && ovsdb_log_grew_lots(raft->log));
> > 
> > ovsdb_log_grew_lots() is based on the filesize.  Raft entries are
> > bigger.  It's probably not significant, but should this difference be
> > taken that into account?
> 
> I think that doubling the size of a file should be independent of the
> size of the individual entries.  It's meant to be, anyhow.
> 
> > > +/* Replaces the log for 'raft', up to the last log entry read, by
> > > + * 'new_snapshot_data'.  Returns NULL if successful, otherwise an error that
> > > + * the caller must eventually free. */
> > > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > > +raft_store_snapshot(struct raft *raft, const struct json *new_snapshot_data)
> > 
> > I think it would be worth mentioning that this function doesn't take
> > ownership of 'new_snapshot_data'.
> 
> I still think that 'const' is a good enough hint.  I always assume that
> const arguments don't transfer ownership, unless there's a comment to
> the contrary.
> 
> > Some final random things:
> > 
> > 	- It might be nice to add some tests that use "tcp" instead of "unix" sockets.  I thought I noticed some instances in the code that could be problematic.
> > 
> > 	- I think corner cases, such as 'raft->failed' getting set to true aren't handled in the client code.
> > 
> > 	- It's very difficult to determine whether an argument is consumed or not, or whether the return value should be freed.  Better documentation (particularly for public functions) would help.
> >
> > Thanks for implementing this important feature!
> > 
> > Acked-by: Justin Pettit <jpettit at ovn.org>
> 
> Thanks.  I'll look at those last things and polish this a bit more and
> then commit it to master.


More information about the dev mailing list