[ovs-dev] [PATCH 15/15] ovsdb: Introduce experimental support for clustered databases. (2/2)

aginwala aginwala at asu.edu
Thu Mar 22 20:50:31 UTC 2018


Test discussion @
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-March/046440.html

Tested-by: aginwala <aginwala at asu.edu>

On Sat, Feb 24, 2018 at 3:17 PM, Justin Pettit <jpettit at ovn.org> wrote:

> >
> > On Dec 31, 2017, at 9:16 PM, Ben Pfaff <blp at ovn.org> wrote:
> >
> > diff --git a/Documentation/ref/ovsdb.5.rst b/Documentation/ref/ovsdb.5.
> rst
> > index f3e50976b5c7..0ab888996eac 100644
> > --- a/Documentation/ref/ovsdb.5.rst
> > +++ b/Documentation/ref/ovsdb.5.rst
> >
> > +When the server successfully joins the cluster, the database file is
> replaced
> > +by one in the general format described earlier.
>
> Just to be crystal clear, I think it be good to reference that it's the
> one described in the "Clustered Format" section.
>
> > diff --git a/ovsdb/raft-private.c b/ovsdb/raft-private.c
> > new file mode 100644
> > index 000000000000..457d1292a949
> > --- /dev/null
> > +++ b/ovsdb/raft-private.c
> >
> > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > +raft_address_validate(const char *address)
> > +{
> > ...
> > +        return ovsdb_error(NULL, "%s: expected \"tcp\" or \"ssl\"
> address",
> > +                           address);
>
> I assume you're downplaying "unix:", since it only makes sense for testing
> purposes?
>
> > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > +raft_address_validate_json(const struct json *address)
>
> This function is only used within this file, so it could be declared
> static if you don't think it's generally useful.
>
> > +char *
> > +raft_address_to_nickname(const char *address, const struct uuid *sid)
> > +{
>
> I think it may be worth explaining what a nickname is and how one gets one.
>
> > +static struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > +raft_servers_from_json__(const struct json *json, struct hmap *servers)
> > +{
> > ...
> > +        if (!uuid_from_string(&sid, node->name)) {
> > +            return ovsdb_syntax_error(json, NULL, "%s is a not a UUID",
>
> I think you can drop the first "a" in that error message.
>
> > +void
> > +raft_servers_format(const struct hmap *servers, struct ds *ds)
> > +{
> > +    int i = 0;
> > +    const struct raft_server *s;
> > +    HMAP_FOR_EACH (s, hmap_node, servers) {
> > +        if (i++) {
> > +            ds_put_cstr(ds, ", ");
> > +        }
> > +        ds_put_format(ds, SID_FMT"(%s)", SID_ARGS(&s->sid), s->address);
>
> Do you think it's worth putting a space between the SID and parenthetical
> address?
>
> > +static void
> > +raft_header_from_json__(struct raft_header *h, struct ovsdb_parser *p)
> > +{
> > ...
> > +    /* Parse "remotes", if present.
>
> Should that be "remote_addresses"?
>
> > +static void
> > +raft_record_from_json__(struct raft_record *r, struct ovsdb_parser *p)
> > +{
> >
> > +    /* All remaining types of log records include "term", plus at most
> one of:
> > +     *
> > +     *     - "index" plus zero or more of "data" and "servers".  If
> "data" is
> > +     *       present then "eid" may also be present.
> > ...
> > +        r->entry.data = json_nullable_clone(
> > +            ovsdb_parser_member(p, "data",
> > +                                OP_OBJECT | OP_ARRAY | OP_OPTIONAL));
> > +        r->entry.eid = (r->entry.data
> > +                        ? raft_parse_required_uuid(p, "eid")
> > +                        : UUID_ZERO);
>
> The description of "index" makes it sound like "eid" is optional if "data"
> is present, but this code seems to expect it to be set.
>
> > diff --git a/ovsdb/raft-private.h b/ovsdb/raft-private.h
> > new file mode 100644
> > index 000000000000..6e147fadb0ac
> > --- /dev/null
> > +++ b/ovsdb/raft-private.h
> > ...
> > +    char *nickname;             /* 1ab3(s3) */
>
> It might be nice to put the example in quotes.
>
> > +const char *raft_servers_get_nickname__(const struct hmap *servers,
> > +                                        const struct uuid *sid);
> > ...
> > +bool raft_parse_uuid__(struct ovsdb_parser *, const char *name, bool
> optional,
> > +                       struct uuid *);
>
> The coding style states that functions with a "__" suffix are used to
> indicate "internal use only".  I suppose this entire file is supposed to be
> internal, but it still strikes me as funny to see these function in header
> files.  What do you think?
>
> > diff --git a/ovsdb/raft-rpc.c b/ovsdb/raft-rpc.c
> > new file mode 100644
> > index 000000000000..617d4aa4eaad
> > --- /dev/null
> > +++ b/ovsdb/raft-rpc.c
> >
> > ...
> > +static void
> > +raft_append_request_from_jsonrpc(struct ovsdb_parser *p,
> > +                                 struct raft_append_request *rq)
> > +{
> > ...
> > +
> > +    const struct json *log = ovsdb_parser_member(p, "log", OP_ARRAY);
> > +    if (!log) {
> > +        return;
> > +    }
>
> Should this parser argument include OP_OPTIONAL?
>
> > +static void
> > +raft_append_reply_to_jsonrpc(const struct raft_append_reply *rpy,
> > +                             struct json *args)
> > +{
> > ...
> > +    json_object_put_string(args, "result",
> > +                           raft_append_result_to_string(rpy->result));
>
> I believe raft_append_result_to_string() can return NULL, which could
> cause a problem here.
>
> > +static void
> > +raft_format_append_reply(const struct raft_append_reply *rpy, struct
> ds *s)
> > +{
> > ...
> > +    ds_put_format(s, " result=\"%s\"",
> > +                  raft_append_result_to_string(rpy->result));
>
> I don't think this will crash, but it will print strange if 'rpy->result'
> is not valid.
>
> Is it intentional that "n_entries" and "prev_log_*" aren't printed?
>
> > +static void
> > +raft_format_vote_request(const struct raft_vote_request *rq, struct ds
> *s)
> > +{
> > +    ds_put_format(s, " term=%"PRIu64, rq->term);
> > +    ds_put_format(s, " last_log_index=%"PRIu64, rq->last_log_index);
> > +    ds_put_format(s, " last_log_term=%"PRIu64, rq->last_log_term);
> > +}
>
> Should "leadership_transfer" be printed?
>
> > +static void
> > +raft_install_snapshot_request_to_jsonrpc(
> > +    const struct raft_install_snapshot_request *rq, struct json *args)
> > +{
> > ...
> > +    json_object_put(args, "last_servers", json_clone(rq->last_servers));
> > ...
> > +
> > +    json_object_put(args, "data", json_clone(rq->data));
>
> I believe "rq->last_servers" and 'rq->data' may be null from
> raft_install_snapshot_request_clone().
>
> > +static void
> > +raft_format_install_snapshot_reply(
> > +    const struct raft_install_snapshot_reply *rpy, struct ds *s)
> > +{
> > +    ds_put_format(s, " term=%"PRIu64, rpy->term);
> > +}
>
> Was it intentional to leave off "last_index" and "last_term"?
>
> > +static void
> > +raft_execute_command_request_to_jsonrpc(
> > +    const struct raft_execute_command_request *rq, struct json *args)
> > +{
> > +    json_object_put(args, "data", json_clone(rq->data));
>
> Can 'rq->data' be null from raft_execute_command_request_clone()?
>
> > +static void
> > +raft_format_execute_command_request(
> > +    const struct raft_execute_command_request *rq, struct ds *s)
> > +{
> > ...
> > +    json_to_ds(rq->data, JSSF_SORT, s);
>
> Same question about 'rq->data' being null.
>
> > +static void
> > +raft_execute_command_reply_to_jsonrpc(
> > +    const struct raft_execute_command_reply *rpy, struct json *args)
> > +{
> > +    json_object_put_format(args, "result", UUID_FMT,
> UUID_ARGS(&rpy->result));
> > +    json_object_put_string(args, "status",
> > +                           raft_command_status_to_string(rpy->status));
>
> I believe raft_command_status_to_string() can return null.
>
> > +static void
> > +raft_format_execute_command_reply(
> > +    const struct raft_execute_command_reply *rpy, struct ds *s)
> > +{
> > +    ds_put_format(s, " result="UUID_FMT, UUID_ARGS(&rpy->result));
> > +    ds_put_format(s, " status=\"%s\"",
> > +                  raft_command_status_to_string(rpy->status));
>
> Same concern, but this wouldn't cause a crash.
>
> > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > +raft_rpc_from_jsonrpc(struct uuid *cidp,
> > +                      const struct uuid *sid,
> > +                      const struct jsonrpc_msg *msg,
> > +                      union raft_rpc *rpc)
> > +{
>
> I think it would be worth adding a comment to this function; behavior such
> as how 'cidp' is used is not obvious unless you read the code.
>
> > +void
> > +raft_rpc_format(const union raft_rpc *rpc, struct ds *s)
> > +{
> > +    ds_put_format(s, "%s", raft_rpc_type_to_string(rpc->type));
> > +    if (rpc->common.comment) {
> > +        ds_put_format(s, " \"%s\"", rpc->common.comment);
> > +    }
> > +    ds_put_char(s, ':');
>
> Is printing the SID not important?
>
> > diff --git a/lib/jsonrpc.h b/lib/jsonrpc.h
> > index 969a6ed38cd6..a44114e8dcd9 100644
> > --- a/lib/jsonrpc.h
> > +++ b/lib/jsonrpc.h
> >
> > struct raft_rpc_common {
> >     enum raft_rpc_type type;
> >     struct uuid sid;            /* SID of peer server. */
>
> I found it surprising in the code that 'sid' could be the source or
> destination depending on whether we were converting to or from jsonrpc.  I
> think it could be made clearer here.
>
> > diff --git a/ovsdb/raft.c b/ovsdb/raft.c
> > new file mode 100644
> > index 000000000000..9c358bf35687
> > --- /dev/null
> > +++ b/ovsdb/raft.c
> > ...
> > +/* An action deferred until a log write commits to disk. */
> > +struct raft_waiter {
> > +    struct ovs_list list_node;
> > +    uint64_t commit_ticket;
> > +
> > +    enum raft_waiter_type type;
> > +    union {
> > +        /* RAFT_W_ENTRY.
> > +         *
> > +         * Waits for a RAFT_REC_ENTRY write to our local log to
> commit.  Upon
> > +         * completion, updates log_synced to indicate that the new log
> entry or
> > +         * entries are committed and, if we are leader, also updates
> our local
> > +         * match_index. */
> > +        struct {
> > +            uint64_t index;
> > +        } entry;
> > +
> > +        /* RAFT_W_TERM.
> > +         *
> > +         * Waits for a RAFT_REC_TERM or RAFT_REC_VOTE record write to
> commit.
> > +         * Upon completion, updates synced_term and synced_vote, which
> triggers
> > +         * sending RPCs deferred by the uncommitted term and vote. */
> > +        struct {
> > +            uint64_t term;
> > +            struct uuid vote;
> > +        } term;
> > +
> > +        /* RAFT_W_RPC.
> > +         *
> > +         * Sometimes, sending an RPC to a peer must be delayed until an
> entry,
> > +         * a term, or a vote mentioned in the RPC is synced to disk.
> This
> > +         * waiter keeps a copy of such an RPC until the previous
> waiters have
> > +         * committed. */
> > +        union raft_rpc *rpc;
> > +    };
> > +};
>
> Minor, but I think it's a little easier to read when single quotes are
> used around references to struct members.
>
> > +/* The Raft state machine. */
> > +struct raft {
> > ...
> > +/* Persistent derived state.
> > + *
> > + * This must be updated on stable storage before responding to RPCs.
> It can be
> > + * derived from the header, snapshot, and log in 'log'. */
> > +
> > +    struct uuid cid;            /* Cluster ID (immutable for the
> cluster). */
> > +    struct uuid sid;            /* Server ID (immutable for the
> server). */
> > +    char *local_address;        /* Local address (immutable for the
> server). */
> > +    char *local_nickname;       /* Used for local server in log
> messages. */
> > +    char *name;                 /* Cluster name (immutable for the
> cluster). */
>
> I think 'name' is the schema name based on other references, including
> raft_get_name(), but this makes it sound different to me.
>
> > +/* Persistent state on all servers.
> > + *
> > + * Must be updated on stable storage before responding to RPCs. */
> > +
> > +    /* Current term and vote, which might be on the way to disk now. */
> > +    uint64_t term;              /* Initialized to 0 and only increases.
> */
> > +    struct uuid vote;           /* In 'term', or all-zeros if none. */
>
> I found the 'vote' description confusing.
>
> > +/* Creates an on-disk file that represents a new Raft cluster and
> initializes
> > + * it to consist of a single server, the one on which this function is
> called.
> > + *
> > + * Creates the local copy of the cluster's log in 'file_name', which
> must not
> > + * already exist.  Gives it the name 'name', which should be the
> database
> > + * schema name and which is used only to match up this database with
> server
> > + * added to the cluster later if the cluster ID is unavailable.
>
> "with *the* server"
>
> > + * The new server is located at 'local_address', which must take one of
> the
> > + * forms "tcp:IP[:PORT]" or "ssl:IP[:PORT]", where IP is an IPv4
> address or a
> > + * square bracket enclosed IPv6 address.  PORT, if present, is a port
> number
> > + * that defaults to RAFT_PORT.
>
> I think the port must be explicitly set.  I don't think there is a
> "RAFT_PORT" default defined.
>
> > +/* Creates a database file that represents a new server in an existing
> Raft
> > + * cluster.
> > + *
> > + * Creates the local copy of the cluster's log in 'file_name', which
> must not
> > + * already exist.  Gives it the name 'name', which must be the same name
> > + * passed in to raft_create_cluster() earlier.
>
> Is raft_create_cluster() supposed to be called before this?  I think
> raft_create_cluster() would have already created the log file, which the
> previous sentence said must not already exist.
>
> > + * 'cid' is optional.  If specified, the new server will join only the
> cluster
> > + * with the given cluster ID.
> > + *
> > + * The new server is located at 'local_address', which must take one of
> the
> > + * forms "tcp:IP[:PORT]" or "ssl:IP[:PORT]", where IP is an IPv4
> address or a
> > + * square bracket enclosed IPv6 address.  PORT, if present, is a port
> number
> > + * that defaults to RAFT_PORT.
>
> Same comment about explicitly setting the port and "RAFT_PORT" not being
> defined.
>
> > + *
> > + * Joining the cluster requiring contacting it.  Thus,
> 'remote_addresses'
>
> s/requiring/requires/
>
> > + * This only creates the on-disk file and does no network access.  Use
> > + * raft_open() to start operating the new server.  (Until this happens,
> the
> > + * new server has not joined the cluster.)
>
> The name for this function (and the corresponding ovsdb-tool command)
> seems a bit confusing in this way.  Not a big deal, but it doesn't seem
> obvious.
>
> > + * Returns null if successful, otherwise an ovsdb_error describing the
> > + * problem. */
> > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > +raft_join_cluster(const char *file_name,
> > +                  const char *name, const char *local_address,
> > +                  const struct sset *remote_addresses,
> > +                  const struct uuid *cid)
> > +{
> > ...
> > +    /* Write log file. */
> > +    struct raft_header h = {
> > +        .sid = uuid_random(),
> > +        .cid = cid ? *cid : UUID_ZERO,
> > +        .name = xstrdup(name),
> > +        .local_address = xstrdup(local_address),
> > +        .joining = true,
> > +    };
> > +    sset_clone(&h.remote_addresses, remote_addresses);
> > +    error = ovsdb_log_write_and_free(log, raft_header_to_json(&h));
> > +    raft_header_uninit(&h);
>
> I don't think anything initializes 'h->snap', which seems like it could be
> a problem for raft_header_to_json() and raft_header_uninit().
>
> > +/* Reads the initial header record from 'log', which must be a Raft
> clustered
> > + * database log, and populates '*md' with the information read from
> it.  The
> > + * caller must eventually destroy 'md'.
>
> You might mention that they do this by calling raft_metadata_destroy().
>
> > + *
> > + * On success, returns NULL.  On failure, returns a error that the
> caller must
> > + * eventually destroy and zeros '*md'. */
>
> s/a error/an error/
>
> > +static struct json *
> > +raft_servers_for_index(const struct raft *raft, uint64_t index)
>
> I think it would be worth mentioning that the caller must free the return
> value.
>
> > +static struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > +raft_read_header(struct raft *raft)
> > +{
> > +    /* Read header record. */
> > +    struct json *json;
> > +    struct ovsdb_error *error = ovsdb_log_read(raft->log, &json);
>
> I don't think anything free 'json'.
>
> > +static void
> > +raft_add_conn(struct raft *raft, struct jsonrpc_session *js,
> > +              const struct uuid *sid, bool incoming)
> > +{
> > ...
> > +    if (sid) {
> > +        conn->sid = *sid;
> > +    }
> > +    conn->nickname = raft_address_to_nickname(
> jsonrpc_session_get_name(js),
> > +                                              &conn->sid);
>
> The 'sid' argument can be null, which seems like it could be a problem
> with raft_address_to_nickname().
>
> > +/* Attempts to start 'raft' leaving its cluster.  The caller can check
> progress
> > + * using raft_is_leaving() and raft_left(). */
> > +void
> > +raft_leave(struct raft *raft)
> > +{
> > ...
> > +    raft->leaving = true;
> > +    raft_transfer_leadership(raft, "this server is leaving the
> cluster");
> > +    raft_become_follower(raft);
> > +    raft_send_remove_server_requests(raft);
> > +    raft->leave_timeout = time_msec() + ELECTION_BASE_MSEC;
> > +    raft->leaving = true;
>
> This sets 'raft->leaving' twice.  Was that intentional?
>
> > +/* Returns true if 'raft' is successfully left its cluster. */
> > +bool
> > +raft_left(const struct raft *raft)
>
> I think you can drop "is" in the comment describing the function.
>
> > +static void
> > +raft_close__(struct raft *raft)
> > +{
>
> I think it would be worth explaining why this would be called separately
> from raft_close().
>
> > +    struct raft_server *rs = raft->remove_server;
> > +    if (rs) {
> > +        raft_send_remove_server_reply__(raft, &rs->sid,
> &rs->requester_sid,
> > +                                        rs->requester_conn, false,
> > +                                        RAFT_SERVER_SHUTDOWN);
> > +        raft_server_destroy(raft->remove_server);
> > +    }
>
> I think it might be safer to set 'raft->remove_server' to null in case
> this gets called, and then raft_close() is called.
>
> > void
> > raft_close(struct raft *raft)
> > {
>
> Should something free the 'raft->waiters' entries?
>
> > +static bool
> > +raft_conn_receive(struct raft *raft, struct raft_conn *conn,
> > +                  union raft_rpc *rpc)
> > +{
> > ...
> > +    } else if (!uuid_equals(&conn->sid, &rpc->common.sid)) {
> > +        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
> > +        VLOG_WARN_RL(&rl,
> > +                     "%s: remote server ID changed from "SID_FMT" to
> "SID_FMT,
> > +                     jsonrpc_session_get_name(conn->js),
> > +                     SID_ARGS(&conn->sid), SID_ARGS(&rpc->common.sid));
>
> If the SID changed, shouldn't we update "conn->sid"?
>
> > +/* Returns true if 'conn' should stay open, 'conn' if it should be
> closed. */
> > +static bool
> > +raft_conn_should_stay_open(struct raft *raft, struct raft_conn *conn)
>
> I assume "false" if it should be closed.  :-)
>
> > +/* Allows 'raft' to maintain the distributed log.  Call this function
> as part
> > + * of the process's main loop. */
> > +void
> > +raft_run(struct raft *raft)
> > +{
> > ...
> > +    /* Close unneeded sessions. */
> > +    struct raft_conn *next;
> > +    LIST_FOR_EACH_SAFE (conn, next, list_node, &raft->conns) {
> > +        if (!raft_conn_should_stay_open(raft, conn)) {
> > +            jsonrpc_session_close(conn->js);
> > +            ovs_list_remove(&conn->list_node);
> > +            free(conn);
>
> I think 'conn->nickname' needs to be freed, too.  Maybe a helper function,
> since there are a couple of places that free 'conn'
>
> > +    if (raft->joining && time_msec() >= raft->join_timeout) {
> > +        raft->join_timeout = time_msec() + 1000;
> > +        struct raft_conn *conn;
>
> This 'conn' shadows a previous declaration.  I think you can just drop
> declaring this one.
>
> > +    if (time_msec() >= raft->ping_timeout) {
> > +        if (raft->role == RAFT_LEADER) {
> > +            raft_send_heartbeats(raft);
> > +        } else {
> > +            long long int now = time_msec();
> > +            struct raft_command *cmd, *next;
>
> This 'next' shadows a 'next' of a different type.
>
> > +/* Frees 'cmd'. */
> > +void
> > +raft_command_unref(struct raft_command *cmd)
> > +{
>
> It seems like a more accurate description would be along the lines of
> "Decrements the ref count for 'cmd' and frees it if zero."
>
> > +/* 'sid' is the server being added or removed. */
> > +static void
> > +raft_send_add_server_reply__(struct raft *raft, const struct uuid *sid,
> > +                             const char *address,
> > +                             bool success, const char *comment)
>
> The description makes it sound like a server could be removed using this
> function, but I don't think that's the case.
>
> > +static void
> > +raft_server_init_leader(struct raft *raft, struct raft_server *s)
> > +{
> > +    s->next_index = raft->log_end;
> > +    s->match_index = 0;
> > +    s->phase = RAFT_PHASE_STABLE;
> > +}
>
> The name of this function is a little surprising to me.  It seems like
> it's initializing a server, not necessarily a leader.
>
> > +/* Processes term 'term' received as part of RPC 'common'.  Returns
> true if the
> > + * caller should continue processing the RPC, false if the caller
> should reject
> > + * it due to a stale term. */
> > +static bool
> > +raft_receive_term__(struct raft *raft, const struct raft_rpc_common
> *common,
> > +                    uint64_t term)
> > +{
> > +    /* Section 3.3 says:
> > +     *
> > +     *     Current terms are exchanged whenever servers communicate; if
> one
> > +     *     server’s current term is smaller than the other’s, then it
> updates
> > +     *     its current term to the larger value.  If a candidate or
> leader
> > +     *     discovers that its term is out of date, it immediately
> reverts to
> > +     *     follower state.  If a server receives a request with a stale
> term
> > +     *     number, it rejects the request.
> > +     */
> > +    if (term > raft->term) {
> > +        if (!raft_set_term(raft, term, NULL)) {
> > +            return false;
>
> I don't understand how the result of raft_set_term() plays into this
> decision to return false.
>
> > +static const struct json *
> > +raft_peek_next_entry(struct raft *raft, struct uuid *eid)
> > +{
> > +    /* Invariant: log_start - 2 <= last_applied <= commit_index <
> log_end. */
> > +    ovs_assert(raft->log_start <= raft->last_applied + 2);
> > +    ovs_assert(raft->last_applied <= raft->commit_index);
> > +    ovs_assert(raft->commit_index < raft->log_end);
> > +
> > +    if (raft->joining || raft->failed) {        /* XXX needed? */
> > +        return NULL;
> > +    }
>
> Did you want to check this "XXX"?
>
> > +/* If 'prev_log_index' exists in 'raft''s log, term 'prev_log_term',
> returns
> > + * NULL.  Otherwise, returns an explanation for the mismatch.  */
> > +static const char *
> > +match_index_and_term(const struct raft *raft,
> > +                     uint64_t prev_log_index, uint64_t prev_log_term)
>
> The description of 'prev_log_term' was confusing to me.  I think it may be
> missing some text about it matching the term pointed to by 'prev_log_index'.
>
> > +/* Returns NULL on success, RAFT_IN_PROGRESS for an operation in
> progress,
> > + * otherwise a brief comment explaining failure. */
> > +static void
> > +raft_handle_append_entries(struct raft *raft,
> > +                           const struct raft_append_request *rq,
> > +                           uint64_t prev_log_index, uint64_t
> prev_log_term,
> > +                           const struct raft_entry *entries,
> > +                           unsigned int n_entries)
>
> This description doesn't seem to match the function.
>
> > +        if (raft->entries[log_index - raft->log_start].term
> > +            != entries[i].term) {
> > +            if (raft_truncate(raft, log_index)) {
> > +                servers_changed = true;
>
> Does this always mean the servers changed?
>
> > +static bool
> > +raft_update_leader(struct raft *raft, const struct uuid *sid)
> > +{
> > +    if (raft->role == RAFT_LEADER && !uuid_equals(sid, &raft->sid)) {
> > +        char buf[SID_LEN + 1];
> > +        VLOG_ERR("this server is leader but server %s claims to be",
> > +                 raft_get_nickname(raft, sid, buf, sizeof buf));
> > +        return false;
>
> If 'sid' and 'raft->sid' were equal, wouldn't that mean that this other
> system is using the same sid?  That also seems like it would be a problem.
>
> > +static void
> > +raft_handle_append_request__(struct raft *raft,
> > +                             const struct raft_append_request *rq)
> > +{
> > ...
> > +    /*
> > +     * We now know that the data in rq->entries[] overlaps the data in
> > +     * raft->entries[], as shown below, with some positive 'ofs':
> > +     *
> > +     *     rq->prev_log_index
> > +     *       | first_entry_index
> > +     *       |   |             nth_entry_index
> > +     *       |   |               |
> > +     *       v   v               v
> > +     *         +---+---+---+---+---+
> > +     *       T | T | T | T | T | T |
> > +     *         +---+-------+---+---+
> > +     *                     +---+---+---+---+
> > +     *                   T | T | T | T | T |
> > +     *                     +---+---+---+---+
> > +     *                       ^               ^
> > +     *                       |               |
> > +     *                   log_start        log_end
> > +     *
> > +     *           |<-- ofs -->|
>
> I think 'ofs' points to the middle of the 'log_start' record.  I think it
> should maybe be moved a couple of spaces to make it clearer.
>
> > +/* Returns the next log entry or snapshot from 'raft', or NULL if there
> are
> > + * none left to read..  Stores the entry ID of the log entry in
> '*eid'.  Stores
> > + * true in '*is_snapshot' if the returned data is a snapshot, false if
> it is a
> > + * log entry.*/
> > +const struct json *
> > +raft_next_entry(struct raft *raft, struct uuid *eid, bool *is_snapshot)
> > +{
>
> There's an extra period after "read".
>
> > static void
> > raft_handle_append_request(struct raft *raft,
> >                            const struct raft_append_request *rq)
> > {
> >     raft_handle_append_request__(raft, rq);
> > }
>
> Is there a reason not to fold in raft_handle_append_request__()?  There
> don't seem to be any other callers to it.
>
> > +static void
> > +raft_log_reconfiguration(struct raft *raft)
> > +{
> > +    /* Add the reconfiguration to the log.
> > +     *
> > +     * We ignore any */
>
> Is this comment not finished?
>
> > +static void
> > +raft_run_reconfigure(struct raft *raft)
> > +{
> > ...
> > +    /* Remove a server, if one is scheduled for removal. */
> > +    HMAP_FOR_EACH (s, hmap_node, &raft->servers) {
> > +        if (s->phase == RAFT_PHASE_REMOVE) {
> > +            hmap_remove(&raft->servers, &s->hmap_node);
> > +            raft->remove_server = s;
>
> Are we sure that 'raft->remove_server' will only be assigned once?  I
> think it might be worth asserting that it's null before assigning here.
>
> > +static void
> > +raft_handle_remove_server_reply(struct raft *raft,
> > +                                const struct raft_remove_server_reply
> *rpc)
> > +{
> > +    if (rpc->success) {
> > +        VLOG_INFO("%04x: finished leaving cluster %04x",
> > +                  uuid_prefix(&raft->sid, 4), uuid_prefix(&raft->cid,
> 4));
>
> Do you want to use SID_FMT and CID_FMT here?
>
> > +/* Returns true if 'raft' has grown enough that reducing the log to a
> snapshot
> > + * would be valuable, false otherwise.  When this function returns
> true, the
> > + * client should consider using raft_store_snapshot() to reduce the log
> storage
> > + * requirements. */
> > +bool
> > +raft_grew_lots(const struct raft *raft)
> > +{
> > +    return (!raft->joining
> > +            && !raft->leaving
> > +            && !raft->left
> > +            && !raft->failed
> > +            && raft->last_applied - raft->log_start >= 100
> > +            && ovsdb_log_grew_lots(raft->log));
>
> ovsdb_log_grew_lots() is based on the filesize.  Raft entries are bigger.
> It's probably not significant, but should this difference be taken that
> into account?
>
> > +/* Replaces the log for 'raft', up to the last log entry read, by
> > + * 'new_snapshot_data'.  Returns NULL if successful, otherwise an error
> that
> > + * the caller must eventually free. */
> > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > +raft_store_snapshot(struct raft *raft, const struct json
> *new_snapshot_data)
>
> I think it would be worth mentioning that this function doesn't take
> ownership of 'new_snapshot_data'.
>
> > +    const struct raft_entry new_snapshot = {
> > +        .term = raft_get_term(raft, new_log_start - 1),
> > +        .data = CONST_CAST(struct json *, new_snapshot_data),
> > +        .eid = *raft_get_eid(raft, new_log_start - 1),
> > +        .servers = raft_servers_for_index(raft, new_log_start - 1),
> > +    };
> > +    struct ovsdb_error *error = raft_save_snapshot(raft, new_log_start,
> > +                                                   &new_snapshot);
> > +    if (error) {
> > +        return error;
> > +    }
>
> I think this may leak 'new_snapshot.servers'.  If it doesn't under the
> normal circumstance, I think it can under an error.
>
> > +static void
> > +raft_unixctl_status(struct unixctl_conn *conn,
> > +                    int argc OVS_UNUSED, const char *argv[],
> > +                    void *aux OVS_UNUSED)
> > +{
> > ...
> > +    ds_put_format(&s, "Cluster ID: ");
> > +    if (!uuid_is_zero(&raft->cid)) {
> > +        ds_put_format(&s, UUID_FMT"\n", UUID_ARGS(&raft->cid));
> > +    } else {
> > +        ds_put_format(&s, "not yet known\n");
> > +    }
> > +    ds_put_format(&s, "Server ID: "SID_FMT" ("UUID_FMT")\n",
> > +                  SID_ARGS(&raft->sid), UUID_ARGS(&raft->sid));
>
> Did you intentionally not use CID_FMT like you did for SID?
>
> > +    ds_put_cstr(&s, "Servers:\n");
> > +    struct raft_server *server;
> > +    HMAP_FOR_EACH (server, hmap_node, &raft->servers) {
> > +        ds_put_format(&s, "    %s ("SID_FMT" at %s)",
> > +                      server->nickname,
> > +                      SID_ARGS(&server->sid), server->address);
> > +        if (uuid_equals(&server->sid, &raft->sid)) {
> > +            ds_put_cstr(&s, " (me)");
>
> raft_put_sid() seems to use "self" to describe its own SID.  Do you think
> it's worth using the same terminology?
>
> > +static void
> > +raft_unixctl_leave__(struct unixctl_conn *conn, struct raft *raft)
> > +{
> > +    if (raft_left(raft)) {
> > +        unixctl_command_reply(conn, NULL);
> > ...
> > +    } else {
> > +        raft_leave(raft);
> > +        unixctl_command_reply(conn, NULL);
> > +    }
> > +}
>
> It doesn't seem like this makes a distinction between starting to leave
> and already having left.  It seems like it would be helpful to provide a
> reply that indicates the difference.
>
> > +static void
> > +raft_unixctl_leave(struct unixctl_conn *conn, int argc, const char
> *argv[],
> > +                   void *aux OVS_UNUSED)
> > +{
> > +    bool force = argc > 2 && !strcmp(argv[1], "--force");
>
> It doesn't appear that "--force" actually does anything.
>
> Some final random things:
>
>         - It might be nice to add some tests that use "tcp" instead of
> "unix" sockets.  I thought I noticed some instances in the code that could
> be problematic.
>
>         - I think corner cases, such as 'raft->failed' getting set to true
> aren't handled in the client code.
>
>         - It's very difficult to determine whether an argument is consumed
> or not, or whether the return value should be freed.  Better documentation
> (particularly for public functions) would help.
>
>         - Should the copyright dates just be 2017 and 2018 for new files?
> A lot of them reference older years.
>
> Thanks for implementing this important feature!
>
> Acked-by: Justin Pettit <jpettit at ovn.org>
>
> --Justin
>
>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>


More information about the dev mailing list