[ovs-dev] [PATCH 15/15] ovsdb: Introduce experimental support for clustered databases. (2/2)
aginwala
aginwala at asu.edu
Thu Mar 22 20:50:31 UTC 2018
Test discussion @
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-March/046440.html
Tested-by: aginwala <aginwala at asu.edu>
On Sat, Feb 24, 2018 at 3:17 PM, Justin Pettit <jpettit at ovn.org> wrote:
> >
> > On Dec 31, 2017, at 9:16 PM, Ben Pfaff <blp at ovn.org> wrote:
> >
> > diff --git a/Documentation/ref/ovsdb.5.rst b/Documentation/ref/ovsdb.5.
> rst
> > index f3e50976b5c7..0ab888996eac 100644
> > --- a/Documentation/ref/ovsdb.5.rst
> > +++ b/Documentation/ref/ovsdb.5.rst
> >
> > +When the server successfully joins the cluster, the database file is
> replaced
> > +by one in the general format described earlier.
>
> Just to be crystal clear, I think it be good to reference that it's the
> one described in the "Clustered Format" section.
>
> > diff --git a/ovsdb/raft-private.c b/ovsdb/raft-private.c
> > new file mode 100644
> > index 000000000000..457d1292a949
> > --- /dev/null
> > +++ b/ovsdb/raft-private.c
> >
> > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > +raft_address_validate(const char *address)
> > +{
> > ...
> > + return ovsdb_error(NULL, "%s: expected \"tcp\" or \"ssl\"
> address",
> > + address);
>
> I assume you're downplaying "unix:", since it only makes sense for testing
> purposes?
>
> > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > +raft_address_validate_json(const struct json *address)
>
> This function is only used within this file, so it could be declared
> static if you don't think it's generally useful.
>
> > +char *
> > +raft_address_to_nickname(const char *address, const struct uuid *sid)
> > +{
>
> I think it may be worth explaining what a nickname is and how one gets one.
>
> > +static struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > +raft_servers_from_json__(const struct json *json, struct hmap *servers)
> > +{
> > ...
> > + if (!uuid_from_string(&sid, node->name)) {
> > + return ovsdb_syntax_error(json, NULL, "%s is a not a UUID",
>
> I think you can drop the first "a" in that error message.
>
> > +void
> > +raft_servers_format(const struct hmap *servers, struct ds *ds)
> > +{
> > + int i = 0;
> > + const struct raft_server *s;
> > + HMAP_FOR_EACH (s, hmap_node, servers) {
> > + if (i++) {
> > + ds_put_cstr(ds, ", ");
> > + }
> > + ds_put_format(ds, SID_FMT"(%s)", SID_ARGS(&s->sid), s->address);
>
> Do you think it's worth putting a space between the SID and parenthetical
> address?
>
> > +static void
> > +raft_header_from_json__(struct raft_header *h, struct ovsdb_parser *p)
> > +{
> > ...
> > + /* Parse "remotes", if present.
>
> Should that be "remote_addresses"?
>
> > +static void
> > +raft_record_from_json__(struct raft_record *r, struct ovsdb_parser *p)
> > +{
> >
> > + /* All remaining types of log records include "term", plus at most
> one of:
> > + *
> > + * - "index" plus zero or more of "data" and "servers". If
> "data" is
> > + * present then "eid" may also be present.
> > ...
> > + r->entry.data = json_nullable_clone(
> > + ovsdb_parser_member(p, "data",
> > + OP_OBJECT | OP_ARRAY | OP_OPTIONAL));
> > + r->entry.eid = (r->entry.data
> > + ? raft_parse_required_uuid(p, "eid")
> > + : UUID_ZERO);
>
> The description of "index" makes it sound like "eid" is optional if "data"
> is present, but this code seems to expect it to be set.
>
> > diff --git a/ovsdb/raft-private.h b/ovsdb/raft-private.h
> > new file mode 100644
> > index 000000000000..6e147fadb0ac
> > --- /dev/null
> > +++ b/ovsdb/raft-private.h
> > ...
> > + char *nickname; /* 1ab3(s3) */
>
> It might be nice to put the example in quotes.
>
> > +const char *raft_servers_get_nickname__(const struct hmap *servers,
> > + const struct uuid *sid);
> > ...
> > +bool raft_parse_uuid__(struct ovsdb_parser *, const char *name, bool
> optional,
> > + struct uuid *);
>
> The coding style states that functions with a "__" suffix are used to
> indicate "internal use only". I suppose this entire file is supposed to be
> internal, but it still strikes me as funny to see these function in header
> files. What do you think?
>
> > diff --git a/ovsdb/raft-rpc.c b/ovsdb/raft-rpc.c
> > new file mode 100644
> > index 000000000000..617d4aa4eaad
> > --- /dev/null
> > +++ b/ovsdb/raft-rpc.c
> >
> > ...
> > +static void
> > +raft_append_request_from_jsonrpc(struct ovsdb_parser *p,
> > + struct raft_append_request *rq)
> > +{
> > ...
> > +
> > + const struct json *log = ovsdb_parser_member(p, "log", OP_ARRAY);
> > + if (!log) {
> > + return;
> > + }
>
> Should this parser argument include OP_OPTIONAL?
>
> > +static void
> > +raft_append_reply_to_jsonrpc(const struct raft_append_reply *rpy,
> > + struct json *args)
> > +{
> > ...
> > + json_object_put_string(args, "result",
> > + raft_append_result_to_string(rpy->result));
>
> I believe raft_append_result_to_string() can return NULL, which could
> cause a problem here.
>
> > +static void
> > +raft_format_append_reply(const struct raft_append_reply *rpy, struct
> ds *s)
> > +{
> > ...
> > + ds_put_format(s, " result=\"%s\"",
> > + raft_append_result_to_string(rpy->result));
>
> I don't think this will crash, but it will print strange if 'rpy->result'
> is not valid.
>
> Is it intentional that "n_entries" and "prev_log_*" aren't printed?
>
> > +static void
> > +raft_format_vote_request(const struct raft_vote_request *rq, struct ds
> *s)
> > +{
> > + ds_put_format(s, " term=%"PRIu64, rq->term);
> > + ds_put_format(s, " last_log_index=%"PRIu64, rq->last_log_index);
> > + ds_put_format(s, " last_log_term=%"PRIu64, rq->last_log_term);
> > +}
>
> Should "leadership_transfer" be printed?
>
> > +static void
> > +raft_install_snapshot_request_to_jsonrpc(
> > + const struct raft_install_snapshot_request *rq, struct json *args)
> > +{
> > ...
> > + json_object_put(args, "last_servers", json_clone(rq->last_servers));
> > ...
> > +
> > + json_object_put(args, "data", json_clone(rq->data));
>
> I believe "rq->last_servers" and 'rq->data' may be null from
> raft_install_snapshot_request_clone().
>
> > +static void
> > +raft_format_install_snapshot_reply(
> > + const struct raft_install_snapshot_reply *rpy, struct ds *s)
> > +{
> > + ds_put_format(s, " term=%"PRIu64, rpy->term);
> > +}
>
> Was it intentional to leave off "last_index" and "last_term"?
>
> > +static void
> > +raft_execute_command_request_to_jsonrpc(
> > + const struct raft_execute_command_request *rq, struct json *args)
> > +{
> > + json_object_put(args, "data", json_clone(rq->data));
>
> Can 'rq->data' be null from raft_execute_command_request_clone()?
>
> > +static void
> > +raft_format_execute_command_request(
> > + const struct raft_execute_command_request *rq, struct ds *s)
> > +{
> > ...
> > + json_to_ds(rq->data, JSSF_SORT, s);
>
> Same question about 'rq->data' being null.
>
> > +static void
> > +raft_execute_command_reply_to_jsonrpc(
> > + const struct raft_execute_command_reply *rpy, struct json *args)
> > +{
> > + json_object_put_format(args, "result", UUID_FMT,
> UUID_ARGS(&rpy->result));
> > + json_object_put_string(args, "status",
> > + raft_command_status_to_string(rpy->status));
>
> I believe raft_command_status_to_string() can return null.
>
> > +static void
> > +raft_format_execute_command_reply(
> > + const struct raft_execute_command_reply *rpy, struct ds *s)
> > +{
> > + ds_put_format(s, " result="UUID_FMT, UUID_ARGS(&rpy->result));
> > + ds_put_format(s, " status=\"%s\"",
> > + raft_command_status_to_string(rpy->status));
>
> Same concern, but this wouldn't cause a crash.
>
> > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > +raft_rpc_from_jsonrpc(struct uuid *cidp,
> > + const struct uuid *sid,
> > + const struct jsonrpc_msg *msg,
> > + union raft_rpc *rpc)
> > +{
>
> I think it would be worth adding a comment to this function; behavior such
> as how 'cidp' is used is not obvious unless you read the code.
>
> > +void
> > +raft_rpc_format(const union raft_rpc *rpc, struct ds *s)
> > +{
> > + ds_put_format(s, "%s", raft_rpc_type_to_string(rpc->type));
> > + if (rpc->common.comment) {
> > + ds_put_format(s, " \"%s\"", rpc->common.comment);
> > + }
> > + ds_put_char(s, ':');
>
> Is printing the SID not important?
>
> > diff --git a/lib/jsonrpc.h b/lib/jsonrpc.h
> > index 969a6ed38cd6..a44114e8dcd9 100644
> > --- a/lib/jsonrpc.h
> > +++ b/lib/jsonrpc.h
> >
> > struct raft_rpc_common {
> > enum raft_rpc_type type;
> > struct uuid sid; /* SID of peer server. */
>
> I found it surprising in the code that 'sid' could be the source or
> destination depending on whether we were converting to or from jsonrpc. I
> think it could be made clearer here.
>
> > diff --git a/ovsdb/raft.c b/ovsdb/raft.c
> > new file mode 100644
> > index 000000000000..9c358bf35687
> > --- /dev/null
> > +++ b/ovsdb/raft.c
> > ...
> > +/* An action deferred until a log write commits to disk. */
> > +struct raft_waiter {
> > + struct ovs_list list_node;
> > + uint64_t commit_ticket;
> > +
> > + enum raft_waiter_type type;
> > + union {
> > + /* RAFT_W_ENTRY.
> > + *
> > + * Waits for a RAFT_REC_ENTRY write to our local log to
> commit. Upon
> > + * completion, updates log_synced to indicate that the new log
> entry or
> > + * entries are committed and, if we are leader, also updates
> our local
> > + * match_index. */
> > + struct {
> > + uint64_t index;
> > + } entry;
> > +
> > + /* RAFT_W_TERM.
> > + *
> > + * Waits for a RAFT_REC_TERM or RAFT_REC_VOTE record write to
> commit.
> > + * Upon completion, updates synced_term and synced_vote, which
> triggers
> > + * sending RPCs deferred by the uncommitted term and vote. */
> > + struct {
> > + uint64_t term;
> > + struct uuid vote;
> > + } term;
> > +
> > + /* RAFT_W_RPC.
> > + *
> > + * Sometimes, sending an RPC to a peer must be delayed until an
> entry,
> > + * a term, or a vote mentioned in the RPC is synced to disk.
> This
> > + * waiter keeps a copy of such an RPC until the previous
> waiters have
> > + * committed. */
> > + union raft_rpc *rpc;
> > + };
> > +};
>
> Minor, but I think it's a little easier to read when single quotes are
> used around references to struct members.
>
> > +/* The Raft state machine. */
> > +struct raft {
> > ...
> > +/* Persistent derived state.
> > + *
> > + * This must be updated on stable storage before responding to RPCs.
> It can be
> > + * derived from the header, snapshot, and log in 'log'. */
> > +
> > + struct uuid cid; /* Cluster ID (immutable for the
> cluster). */
> > + struct uuid sid; /* Server ID (immutable for the
> server). */
> > + char *local_address; /* Local address (immutable for the
> server). */
> > + char *local_nickname; /* Used for local server in log
> messages. */
> > + char *name; /* Cluster name (immutable for the
> cluster). */
>
> I think 'name' is the schema name based on other references, including
> raft_get_name(), but this makes it sound different to me.
>
> > +/* Persistent state on all servers.
> > + *
> > + * Must be updated on stable storage before responding to RPCs. */
> > +
> > + /* Current term and vote, which might be on the way to disk now. */
> > + uint64_t term; /* Initialized to 0 and only increases.
> */
> > + struct uuid vote; /* In 'term', or all-zeros if none. */
>
> I found the 'vote' description confusing.
>
> > +/* Creates an on-disk file that represents a new Raft cluster and
> initializes
> > + * it to consist of a single server, the one on which this function is
> called.
> > + *
> > + * Creates the local copy of the cluster's log in 'file_name', which
> must not
> > + * already exist. Gives it the name 'name', which should be the
> database
> > + * schema name and which is used only to match up this database with
> server
> > + * added to the cluster later if the cluster ID is unavailable.
>
> "with *the* server"
>
> > + * The new server is located at 'local_address', which must take one of
> the
> > + * forms "tcp:IP[:PORT]" or "ssl:IP[:PORT]", where IP is an IPv4
> address or a
> > + * square bracket enclosed IPv6 address. PORT, if present, is a port
> number
> > + * that defaults to RAFT_PORT.
>
> I think the port must be explicitly set. I don't think there is a
> "RAFT_PORT" default defined.
>
> > +/* Creates a database file that represents a new server in an existing
> Raft
> > + * cluster.
> > + *
> > + * Creates the local copy of the cluster's log in 'file_name', which
> must not
> > + * already exist. Gives it the name 'name', which must be the same name
> > + * passed in to raft_create_cluster() earlier.
>
> Is raft_create_cluster() supposed to be called before this? I think
> raft_create_cluster() would have already created the log file, which the
> previous sentence said must not already exist.
>
> > + * 'cid' is optional. If specified, the new server will join only the
> cluster
> > + * with the given cluster ID.
> > + *
> > + * The new server is located at 'local_address', which must take one of
> the
> > + * forms "tcp:IP[:PORT]" or "ssl:IP[:PORT]", where IP is an IPv4
> address or a
> > + * square bracket enclosed IPv6 address. PORT, if present, is a port
> number
> > + * that defaults to RAFT_PORT.
>
> Same comment about explicitly setting the port and "RAFT_PORT" not being
> defined.
>
> > + *
> > + * Joining the cluster requiring contacting it. Thus,
> 'remote_addresses'
>
> s/requiring/requires/
>
> > + * This only creates the on-disk file and does no network access. Use
> > + * raft_open() to start operating the new server. (Until this happens,
> the
> > + * new server has not joined the cluster.)
>
> The name for this function (and the corresponding ovsdb-tool command)
> seems a bit confusing in this way. Not a big deal, but it doesn't seem
> obvious.
>
> > + * Returns null if successful, otherwise an ovsdb_error describing the
> > + * problem. */
> > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > +raft_join_cluster(const char *file_name,
> > + const char *name, const char *local_address,
> > + const struct sset *remote_addresses,
> > + const struct uuid *cid)
> > +{
> > ...
> > + /* Write log file. */
> > + struct raft_header h = {
> > + .sid = uuid_random(),
> > + .cid = cid ? *cid : UUID_ZERO,
> > + .name = xstrdup(name),
> > + .local_address = xstrdup(local_address),
> > + .joining = true,
> > + };
> > + sset_clone(&h.remote_addresses, remote_addresses);
> > + error = ovsdb_log_write_and_free(log, raft_header_to_json(&h));
> > + raft_header_uninit(&h);
>
> I don't think anything initializes 'h->snap', which seems like it could be
> a problem for raft_header_to_json() and raft_header_uninit().
>
> > +/* Reads the initial header record from 'log', which must be a Raft
> clustered
> > + * database log, and populates '*md' with the information read from
> it. The
> > + * caller must eventually destroy 'md'.
>
> You might mention that they do this by calling raft_metadata_destroy().
>
> > + *
> > + * On success, returns NULL. On failure, returns a error that the
> caller must
> > + * eventually destroy and zeros '*md'. */
>
> s/a error/an error/
>
> > +static struct json *
> > +raft_servers_for_index(const struct raft *raft, uint64_t index)
>
> I think it would be worth mentioning that the caller must free the return
> value.
>
> > +static struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > +raft_read_header(struct raft *raft)
> > +{
> > + /* Read header record. */
> > + struct json *json;
> > + struct ovsdb_error *error = ovsdb_log_read(raft->log, &json);
>
> I don't think anything free 'json'.
>
> > +static void
> > +raft_add_conn(struct raft *raft, struct jsonrpc_session *js,
> > + const struct uuid *sid, bool incoming)
> > +{
> > ...
> > + if (sid) {
> > + conn->sid = *sid;
> > + }
> > + conn->nickname = raft_address_to_nickname(
> jsonrpc_session_get_name(js),
> > + &conn->sid);
>
> The 'sid' argument can be null, which seems like it could be a problem
> with raft_address_to_nickname().
>
> > +/* Attempts to start 'raft' leaving its cluster. The caller can check
> progress
> > + * using raft_is_leaving() and raft_left(). */
> > +void
> > +raft_leave(struct raft *raft)
> > +{
> > ...
> > + raft->leaving = true;
> > + raft_transfer_leadership(raft, "this server is leaving the
> cluster");
> > + raft_become_follower(raft);
> > + raft_send_remove_server_requests(raft);
> > + raft->leave_timeout = time_msec() + ELECTION_BASE_MSEC;
> > + raft->leaving = true;
>
> This sets 'raft->leaving' twice. Was that intentional?
>
> > +/* Returns true if 'raft' is successfully left its cluster. */
> > +bool
> > +raft_left(const struct raft *raft)
>
> I think you can drop "is" in the comment describing the function.
>
> > +static void
> > +raft_close__(struct raft *raft)
> > +{
>
> I think it would be worth explaining why this would be called separately
> from raft_close().
>
> > + struct raft_server *rs = raft->remove_server;
> > + if (rs) {
> > + raft_send_remove_server_reply__(raft, &rs->sid,
> &rs->requester_sid,
> > + rs->requester_conn, false,
> > + RAFT_SERVER_SHUTDOWN);
> > + raft_server_destroy(raft->remove_server);
> > + }
>
> I think it might be safer to set 'raft->remove_server' to null in case
> this gets called, and then raft_close() is called.
>
> > void
> > raft_close(struct raft *raft)
> > {
>
> Should something free the 'raft->waiters' entries?
>
> > +static bool
> > +raft_conn_receive(struct raft *raft, struct raft_conn *conn,
> > + union raft_rpc *rpc)
> > +{
> > ...
> > + } else if (!uuid_equals(&conn->sid, &rpc->common.sid)) {
> > + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
> > + VLOG_WARN_RL(&rl,
> > + "%s: remote server ID changed from "SID_FMT" to
> "SID_FMT,
> > + jsonrpc_session_get_name(conn->js),
> > + SID_ARGS(&conn->sid), SID_ARGS(&rpc->common.sid));
>
> If the SID changed, shouldn't we update "conn->sid"?
>
> > +/* Returns true if 'conn' should stay open, 'conn' if it should be
> closed. */
> > +static bool
> > +raft_conn_should_stay_open(struct raft *raft, struct raft_conn *conn)
>
> I assume "false" if it should be closed. :-)
>
> > +/* Allows 'raft' to maintain the distributed log. Call this function
> as part
> > + * of the process's main loop. */
> > +void
> > +raft_run(struct raft *raft)
> > +{
> > ...
> > + /* Close unneeded sessions. */
> > + struct raft_conn *next;
> > + LIST_FOR_EACH_SAFE (conn, next, list_node, &raft->conns) {
> > + if (!raft_conn_should_stay_open(raft, conn)) {
> > + jsonrpc_session_close(conn->js);
> > + ovs_list_remove(&conn->list_node);
> > + free(conn);
>
> I think 'conn->nickname' needs to be freed, too. Maybe a helper function,
> since there are a couple of places that free 'conn'
>
> > + if (raft->joining && time_msec() >= raft->join_timeout) {
> > + raft->join_timeout = time_msec() + 1000;
> > + struct raft_conn *conn;
>
> This 'conn' shadows a previous declaration. I think you can just drop
> declaring this one.
>
> > + if (time_msec() >= raft->ping_timeout) {
> > + if (raft->role == RAFT_LEADER) {
> > + raft_send_heartbeats(raft);
> > + } else {
> > + long long int now = time_msec();
> > + struct raft_command *cmd, *next;
>
> This 'next' shadows a 'next' of a different type.
>
> > +/* Frees 'cmd'. */
> > +void
> > +raft_command_unref(struct raft_command *cmd)
> > +{
>
> It seems like a more accurate description would be along the lines of
> "Decrements the ref count for 'cmd' and frees it if zero."
>
> > +/* 'sid' is the server being added or removed. */
> > +static void
> > +raft_send_add_server_reply__(struct raft *raft, const struct uuid *sid,
> > + const char *address,
> > + bool success, const char *comment)
>
> The description makes it sound like a server could be removed using this
> function, but I don't think that's the case.
>
> > +static void
> > +raft_server_init_leader(struct raft *raft, struct raft_server *s)
> > +{
> > + s->next_index = raft->log_end;
> > + s->match_index = 0;
> > + s->phase = RAFT_PHASE_STABLE;
> > +}
>
> The name of this function is a little surprising to me. It seems like
> it's initializing a server, not necessarily a leader.
>
> > +/* Processes term 'term' received as part of RPC 'common'. Returns
> true if the
> > + * caller should continue processing the RPC, false if the caller
> should reject
> > + * it due to a stale term. */
> > +static bool
> > +raft_receive_term__(struct raft *raft, const struct raft_rpc_common
> *common,
> > + uint64_t term)
> > +{
> > + /* Section 3.3 says:
> > + *
> > + * Current terms are exchanged whenever servers communicate; if
> one
> > + * server’s current term is smaller than the other’s, then it
> updates
> > + * its current term to the larger value. If a candidate or
> leader
> > + * discovers that its term is out of date, it immediately
> reverts to
> > + * follower state. If a server receives a request with a stale
> term
> > + * number, it rejects the request.
> > + */
> > + if (term > raft->term) {
> > + if (!raft_set_term(raft, term, NULL)) {
> > + return false;
>
> I don't understand how the result of raft_set_term() plays into this
> decision to return false.
>
> > +static const struct json *
> > +raft_peek_next_entry(struct raft *raft, struct uuid *eid)
> > +{
> > + /* Invariant: log_start - 2 <= last_applied <= commit_index <
> log_end. */
> > + ovs_assert(raft->log_start <= raft->last_applied + 2);
> > + ovs_assert(raft->last_applied <= raft->commit_index);
> > + ovs_assert(raft->commit_index < raft->log_end);
> > +
> > + if (raft->joining || raft->failed) { /* XXX needed? */
> > + return NULL;
> > + }
>
> Did you want to check this "XXX"?
>
> > +/* If 'prev_log_index' exists in 'raft''s log, term 'prev_log_term',
> returns
> > + * NULL. Otherwise, returns an explanation for the mismatch. */
> > +static const char *
> > +match_index_and_term(const struct raft *raft,
> > + uint64_t prev_log_index, uint64_t prev_log_term)
>
> The description of 'prev_log_term' was confusing to me. I think it may be
> missing some text about it matching the term pointed to by 'prev_log_index'.
>
> > +/* Returns NULL on success, RAFT_IN_PROGRESS for an operation in
> progress,
> > + * otherwise a brief comment explaining failure. */
> > +static void
> > +raft_handle_append_entries(struct raft *raft,
> > + const struct raft_append_request *rq,
> > + uint64_t prev_log_index, uint64_t
> prev_log_term,
> > + const struct raft_entry *entries,
> > + unsigned int n_entries)
>
> This description doesn't seem to match the function.
>
> > + if (raft->entries[log_index - raft->log_start].term
> > + != entries[i].term) {
> > + if (raft_truncate(raft, log_index)) {
> > + servers_changed = true;
>
> Does this always mean the servers changed?
>
> > +static bool
> > +raft_update_leader(struct raft *raft, const struct uuid *sid)
> > +{
> > + if (raft->role == RAFT_LEADER && !uuid_equals(sid, &raft->sid)) {
> > + char buf[SID_LEN + 1];
> > + VLOG_ERR("this server is leader but server %s claims to be",
> > + raft_get_nickname(raft, sid, buf, sizeof buf));
> > + return false;
>
> If 'sid' and 'raft->sid' were equal, wouldn't that mean that this other
> system is using the same sid? That also seems like it would be a problem.
>
> > +static void
> > +raft_handle_append_request__(struct raft *raft,
> > + const struct raft_append_request *rq)
> > +{
> > ...
> > + /*
> > + * We now know that the data in rq->entries[] overlaps the data in
> > + * raft->entries[], as shown below, with some positive 'ofs':
> > + *
> > + * rq->prev_log_index
> > + * | first_entry_index
> > + * | | nth_entry_index
> > + * | | |
> > + * v v v
> > + * +---+---+---+---+---+
> > + * T | T | T | T | T | T |
> > + * +---+-------+---+---+
> > + * +---+---+---+---+
> > + * T | T | T | T | T |
> > + * +---+---+---+---+
> > + * ^ ^
> > + * | |
> > + * log_start log_end
> > + *
> > + * |<-- ofs -->|
>
> I think 'ofs' points to the middle of the 'log_start' record. I think it
> should maybe be moved a couple of spaces to make it clearer.
>
> > +/* Returns the next log entry or snapshot from 'raft', or NULL if there
> are
> > + * none left to read.. Stores the entry ID of the log entry in
> '*eid'. Stores
> > + * true in '*is_snapshot' if the returned data is a snapshot, false if
> it is a
> > + * log entry.*/
> > +const struct json *
> > +raft_next_entry(struct raft *raft, struct uuid *eid, bool *is_snapshot)
> > +{
>
> There's an extra period after "read".
>
> > static void
> > raft_handle_append_request(struct raft *raft,
> > const struct raft_append_request *rq)
> > {
> > raft_handle_append_request__(raft, rq);
> > }
>
> Is there a reason not to fold in raft_handle_append_request__()? There
> don't seem to be any other callers to it.
>
> > +static void
> > +raft_log_reconfiguration(struct raft *raft)
> > +{
> > + /* Add the reconfiguration to the log.
> > + *
> > + * We ignore any */
>
> Is this comment not finished?
>
> > +static void
> > +raft_run_reconfigure(struct raft *raft)
> > +{
> > ...
> > + /* Remove a server, if one is scheduled for removal. */
> > + HMAP_FOR_EACH (s, hmap_node, &raft->servers) {
> > + if (s->phase == RAFT_PHASE_REMOVE) {
> > + hmap_remove(&raft->servers, &s->hmap_node);
> > + raft->remove_server = s;
>
> Are we sure that 'raft->remove_server' will only be assigned once? I
> think it might be worth asserting that it's null before assigning here.
>
> > +static void
> > +raft_handle_remove_server_reply(struct raft *raft,
> > + const struct raft_remove_server_reply
> *rpc)
> > +{
> > + if (rpc->success) {
> > + VLOG_INFO("%04x: finished leaving cluster %04x",
> > + uuid_prefix(&raft->sid, 4), uuid_prefix(&raft->cid,
> 4));
>
> Do you want to use SID_FMT and CID_FMT here?
>
> > +/* Returns true if 'raft' has grown enough that reducing the log to a
> snapshot
> > + * would be valuable, false otherwise. When this function returns
> true, the
> > + * client should consider using raft_store_snapshot() to reduce the log
> storage
> > + * requirements. */
> > +bool
> > +raft_grew_lots(const struct raft *raft)
> > +{
> > + return (!raft->joining
> > + && !raft->leaving
> > + && !raft->left
> > + && !raft->failed
> > + && raft->last_applied - raft->log_start >= 100
> > + && ovsdb_log_grew_lots(raft->log));
>
> ovsdb_log_grew_lots() is based on the filesize. Raft entries are bigger.
> It's probably not significant, but should this difference be taken that
> into account?
>
> > +/* Replaces the log for 'raft', up to the last log entry read, by
> > + * 'new_snapshot_data'. Returns NULL if successful, otherwise an error
> that
> > + * the caller must eventually free. */
> > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > +raft_store_snapshot(struct raft *raft, const struct json
> *new_snapshot_data)
>
> I think it would be worth mentioning that this function doesn't take
> ownership of 'new_snapshot_data'.
>
> > + const struct raft_entry new_snapshot = {
> > + .term = raft_get_term(raft, new_log_start - 1),
> > + .data = CONST_CAST(struct json *, new_snapshot_data),
> > + .eid = *raft_get_eid(raft, new_log_start - 1),
> > + .servers = raft_servers_for_index(raft, new_log_start - 1),
> > + };
> > + struct ovsdb_error *error = raft_save_snapshot(raft, new_log_start,
> > + &new_snapshot);
> > + if (error) {
> > + return error;
> > + }
>
> I think this may leak 'new_snapshot.servers'. If it doesn't under the
> normal circumstance, I think it can under an error.
>
> > +static void
> > +raft_unixctl_status(struct unixctl_conn *conn,
> > + int argc OVS_UNUSED, const char *argv[],
> > + void *aux OVS_UNUSED)
> > +{
> > ...
> > + ds_put_format(&s, "Cluster ID: ");
> > + if (!uuid_is_zero(&raft->cid)) {
> > + ds_put_format(&s, UUID_FMT"\n", UUID_ARGS(&raft->cid));
> > + } else {
> > + ds_put_format(&s, "not yet known\n");
> > + }
> > + ds_put_format(&s, "Server ID: "SID_FMT" ("UUID_FMT")\n",
> > + SID_ARGS(&raft->sid), UUID_ARGS(&raft->sid));
>
> Did you intentionally not use CID_FMT like you did for SID?
>
> > + ds_put_cstr(&s, "Servers:\n");
> > + struct raft_server *server;
> > + HMAP_FOR_EACH (server, hmap_node, &raft->servers) {
> > + ds_put_format(&s, " %s ("SID_FMT" at %s)",
> > + server->nickname,
> > + SID_ARGS(&server->sid), server->address);
> > + if (uuid_equals(&server->sid, &raft->sid)) {
> > + ds_put_cstr(&s, " (me)");
>
> raft_put_sid() seems to use "self" to describe its own SID. Do you think
> it's worth using the same terminology?
>
> > +static void
> > +raft_unixctl_leave__(struct unixctl_conn *conn, struct raft *raft)
> > +{
> > + if (raft_left(raft)) {
> > + unixctl_command_reply(conn, NULL);
> > ...
> > + } else {
> > + raft_leave(raft);
> > + unixctl_command_reply(conn, NULL);
> > + }
> > +}
>
> It doesn't seem like this makes a distinction between starting to leave
> and already having left. It seems like it would be helpful to provide a
> reply that indicates the difference.
>
> > +static void
> > +raft_unixctl_leave(struct unixctl_conn *conn, int argc, const char
> *argv[],
> > + void *aux OVS_UNUSED)
> > +{
> > + bool force = argc > 2 && !strcmp(argv[1], "--force");
>
> It doesn't appear that "--force" actually does anything.
>
> Some final random things:
>
> - It might be nice to add some tests that use "tcp" instead of
> "unix" sockets. I thought I noticed some instances in the code that could
> be problematic.
>
> - I think corner cases, such as 'raft->failed' getting set to true
> aren't handled in the client code.
>
> - It's very difficult to determine whether an argument is consumed
> or not, or whether the return value should be freed. Better documentation
> (particularly for public functions) would help.
>
> - Should the copyright dates just be 2017 and 2018 for new files?
> A lot of them reference older years.
>
> Thanks for implementing this important feature!
>
> Acked-by: Justin Pettit <jpettit at ovn.org>
>
> --Justin
>
>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
More information about the dev
mailing list