[ovs-dev] [PATCH 15/15] ovsdb: Introduce experimental support for clustered databases. (2/2)
Justin Pettit
jpettit at ovn.org
Sat Feb 24 23:17:51 UTC 2018
>
> On Dec 31, 2017, at 9:16 PM, Ben Pfaff <blp at ovn.org> wrote:
>
> diff --git a/Documentation/ref/ovsdb.5.rst b/Documentation/ref/ovsdb.5.rst
> index f3e50976b5c7..0ab888996eac 100644
> --- a/Documentation/ref/ovsdb.5.rst
> +++ b/Documentation/ref/ovsdb.5.rst
>
> +When the server successfully joins the cluster, the database file is replaced
> +by one in the general format described earlier.
Just to be crystal clear, I think it be good to reference that it's the one described in the "Clustered Format" section.
> diff --git a/ovsdb/raft-private.c b/ovsdb/raft-private.c
> new file mode 100644
> index 000000000000..457d1292a949
> --- /dev/null
> +++ b/ovsdb/raft-private.c
>
> +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> +raft_address_validate(const char *address)
> +{
> ...
> + return ovsdb_error(NULL, "%s: expected \"tcp\" or \"ssl\" address",
> + address);
I assume you're downplaying "unix:", since it only makes sense for testing purposes?
> +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> +raft_address_validate_json(const struct json *address)
This function is only used within this file, so it could be declared static if you don't think it's generally useful.
> +char *
> +raft_address_to_nickname(const char *address, const struct uuid *sid)
> +{
I think it may be worth explaining what a nickname is and how one gets one.
> +static struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> +raft_servers_from_json__(const struct json *json, struct hmap *servers)
> +{
> ...
> + if (!uuid_from_string(&sid, node->name)) {
> + return ovsdb_syntax_error(json, NULL, "%s is a not a UUID",
I think you can drop the first "a" in that error message.
> +void
> +raft_servers_format(const struct hmap *servers, struct ds *ds)
> +{
> + int i = 0;
> + const struct raft_server *s;
> + HMAP_FOR_EACH (s, hmap_node, servers) {
> + if (i++) {
> + ds_put_cstr(ds, ", ");
> + }
> + ds_put_format(ds, SID_FMT"(%s)", SID_ARGS(&s->sid), s->address);
Do you think it's worth putting a space between the SID and parenthetical address?
> +static void
> +raft_header_from_json__(struct raft_header *h, struct ovsdb_parser *p)
> +{
> ...
> + /* Parse "remotes", if present.
Should that be "remote_addresses"?
> +static void
> +raft_record_from_json__(struct raft_record *r, struct ovsdb_parser *p)
> +{
>
> + /* All remaining types of log records include "term", plus at most one of:
> + *
> + * - "index" plus zero or more of "data" and "servers". If "data" is
> + * present then "eid" may also be present.
> ...
> + r->entry.data = json_nullable_clone(
> + ovsdb_parser_member(p, "data",
> + OP_OBJECT | OP_ARRAY | OP_OPTIONAL));
> + r->entry.eid = (r->entry.data
> + ? raft_parse_required_uuid(p, "eid")
> + : UUID_ZERO);
The description of "index" makes it sound like "eid" is optional if "data" is present, but this code seems to expect it to be set.
> diff --git a/ovsdb/raft-private.h b/ovsdb/raft-private.h
> new file mode 100644
> index 000000000000..6e147fadb0ac
> --- /dev/null
> +++ b/ovsdb/raft-private.h
> ...
> + char *nickname; /* 1ab3(s3) */
It might be nice to put the example in quotes.
> +const char *raft_servers_get_nickname__(const struct hmap *servers,
> + const struct uuid *sid);
> ...
> +bool raft_parse_uuid__(struct ovsdb_parser *, const char *name, bool optional,
> + struct uuid *);
The coding style states that functions with a "__" suffix are used to indicate "internal use only". I suppose this entire file is supposed to be internal, but it still strikes me as funny to see these function in header files. What do you think?
> diff --git a/ovsdb/raft-rpc.c b/ovsdb/raft-rpc.c
> new file mode 100644
> index 000000000000..617d4aa4eaad
> --- /dev/null
> +++ b/ovsdb/raft-rpc.c
>
> ...
> +static void
> +raft_append_request_from_jsonrpc(struct ovsdb_parser *p,
> + struct raft_append_request *rq)
> +{
> ...
> +
> + const struct json *log = ovsdb_parser_member(p, "log", OP_ARRAY);
> + if (!log) {
> + return;
> + }
Should this parser argument include OP_OPTIONAL?
> +static void
> +raft_append_reply_to_jsonrpc(const struct raft_append_reply *rpy,
> + struct json *args)
> +{
> ...
> + json_object_put_string(args, "result",
> + raft_append_result_to_string(rpy->result));
I believe raft_append_result_to_string() can return NULL, which could cause a problem here.
> +static void
> +raft_format_append_reply(const struct raft_append_reply *rpy, struct ds *s)
> +{
> ...
> + ds_put_format(s, " result=\"%s\"",
> + raft_append_result_to_string(rpy->result));
I don't think this will crash, but it will print strange if 'rpy->result' is not valid.
Is it intentional that "n_entries" and "prev_log_*" aren't printed?
> +static void
> +raft_format_vote_request(const struct raft_vote_request *rq, struct ds *s)
> +{
> + ds_put_format(s, " term=%"PRIu64, rq->term);
> + ds_put_format(s, " last_log_index=%"PRIu64, rq->last_log_index);
> + ds_put_format(s, " last_log_term=%"PRIu64, rq->last_log_term);
> +}
Should "leadership_transfer" be printed?
> +static void
> +raft_install_snapshot_request_to_jsonrpc(
> + const struct raft_install_snapshot_request *rq, struct json *args)
> +{
> ...
> + json_object_put(args, "last_servers", json_clone(rq->last_servers));
> ...
> +
> + json_object_put(args, "data", json_clone(rq->data));
I believe "rq->last_servers" and 'rq->data' may be null from raft_install_snapshot_request_clone().
> +static void
> +raft_format_install_snapshot_reply(
> + const struct raft_install_snapshot_reply *rpy, struct ds *s)
> +{
> + ds_put_format(s, " term=%"PRIu64, rpy->term);
> +}
Was it intentional to leave off "last_index" and "last_term"?
> +static void
> +raft_execute_command_request_to_jsonrpc(
> + const struct raft_execute_command_request *rq, struct json *args)
> +{
> + json_object_put(args, "data", json_clone(rq->data));
Can 'rq->data' be null from raft_execute_command_request_clone()?
> +static void
> +raft_format_execute_command_request(
> + const struct raft_execute_command_request *rq, struct ds *s)
> +{
> ...
> + json_to_ds(rq->data, JSSF_SORT, s);
Same question about 'rq->data' being null.
> +static void
> +raft_execute_command_reply_to_jsonrpc(
> + const struct raft_execute_command_reply *rpy, struct json *args)
> +{
> + json_object_put_format(args, "result", UUID_FMT, UUID_ARGS(&rpy->result));
> + json_object_put_string(args, "status",
> + raft_command_status_to_string(rpy->status));
I believe raft_command_status_to_string() can return null.
> +static void
> +raft_format_execute_command_reply(
> + const struct raft_execute_command_reply *rpy, struct ds *s)
> +{
> + ds_put_format(s, " result="UUID_FMT, UUID_ARGS(&rpy->result));
> + ds_put_format(s, " status=\"%s\"",
> + raft_command_status_to_string(rpy->status));
Same concern, but this wouldn't cause a crash.
> +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> +raft_rpc_from_jsonrpc(struct uuid *cidp,
> + const struct uuid *sid,
> + const struct jsonrpc_msg *msg,
> + union raft_rpc *rpc)
> +{
I think it would be worth adding a comment to this function; behavior such as how 'cidp' is used is not obvious unless you read the code.
> +void
> +raft_rpc_format(const union raft_rpc *rpc, struct ds *s)
> +{
> + ds_put_format(s, "%s", raft_rpc_type_to_string(rpc->type));
> + if (rpc->common.comment) {
> + ds_put_format(s, " \"%s\"", rpc->common.comment);
> + }
> + ds_put_char(s, ':');
Is printing the SID not important?
> diff --git a/lib/jsonrpc.h b/lib/jsonrpc.h
> index 969a6ed38cd6..a44114e8dcd9 100644
> --- a/lib/jsonrpc.h
> +++ b/lib/jsonrpc.h
>
> struct raft_rpc_common {
> enum raft_rpc_type type;
> struct uuid sid; /* SID of peer server. */
I found it surprising in the code that 'sid' could be the source or destination depending on whether we were converting to or from jsonrpc. I think it could be made clearer here.
> diff --git a/ovsdb/raft.c b/ovsdb/raft.c
> new file mode 100644
> index 000000000000..9c358bf35687
> --- /dev/null
> +++ b/ovsdb/raft.c
> ...
> +/* An action deferred until a log write commits to disk. */
> +struct raft_waiter {
> + struct ovs_list list_node;
> + uint64_t commit_ticket;
> +
> + enum raft_waiter_type type;
> + union {
> + /* RAFT_W_ENTRY.
> + *
> + * Waits for a RAFT_REC_ENTRY write to our local log to commit. Upon
> + * completion, updates log_synced to indicate that the new log entry or
> + * entries are committed and, if we are leader, also updates our local
> + * match_index. */
> + struct {
> + uint64_t index;
> + } entry;
> +
> + /* RAFT_W_TERM.
> + *
> + * Waits for a RAFT_REC_TERM or RAFT_REC_VOTE record write to commit.
> + * Upon completion, updates synced_term and synced_vote, which triggers
> + * sending RPCs deferred by the uncommitted term and vote. */
> + struct {
> + uint64_t term;
> + struct uuid vote;
> + } term;
> +
> + /* RAFT_W_RPC.
> + *
> + * Sometimes, sending an RPC to a peer must be delayed until an entry,
> + * a term, or a vote mentioned in the RPC is synced to disk. This
> + * waiter keeps a copy of such an RPC until the previous waiters have
> + * committed. */
> + union raft_rpc *rpc;
> + };
> +};
Minor, but I think it's a little easier to read when single quotes are used around references to struct members.
> +/* The Raft state machine. */
> +struct raft {
> ...
> +/* Persistent derived state.
> + *
> + * This must be updated on stable storage before responding to RPCs. It can be
> + * derived from the header, snapshot, and log in 'log'. */
> +
> + struct uuid cid; /* Cluster ID (immutable for the cluster). */
> + struct uuid sid; /* Server ID (immutable for the server). */
> + char *local_address; /* Local address (immutable for the server). */
> + char *local_nickname; /* Used for local server in log messages. */
> + char *name; /* Cluster name (immutable for the cluster). */
I think 'name' is the schema name based on other references, including raft_get_name(), but this makes it sound different to me.
> +/* Persistent state on all servers.
> + *
> + * Must be updated on stable storage before responding to RPCs. */
> +
> + /* Current term and vote, which might be on the way to disk now. */
> + uint64_t term; /* Initialized to 0 and only increases. */
> + struct uuid vote; /* In 'term', or all-zeros if none. */
I found the 'vote' description confusing.
> +/* Creates an on-disk file that represents a new Raft cluster and initializes
> + * it to consist of a single server, the one on which this function is called.
> + *
> + * Creates the local copy of the cluster's log in 'file_name', which must not
> + * already exist. Gives it the name 'name', which should be the database
> + * schema name and which is used only to match up this database with server
> + * added to the cluster later if the cluster ID is unavailable.
"with *the* server"
> + * The new server is located at 'local_address', which must take one of the
> + * forms "tcp:IP[:PORT]" or "ssl:IP[:PORT]", where IP is an IPv4 address or a
> + * square bracket enclosed IPv6 address. PORT, if present, is a port number
> + * that defaults to RAFT_PORT.
I think the port must be explicitly set. I don't think there is a "RAFT_PORT" default defined.
> +/* Creates a database file that represents a new server in an existing Raft
> + * cluster.
> + *
> + * Creates the local copy of the cluster's log in 'file_name', which must not
> + * already exist. Gives it the name 'name', which must be the same name
> + * passed in to raft_create_cluster() earlier.
Is raft_create_cluster() supposed to be called before this? I think raft_create_cluster() would have already created the log file, which the previous sentence said must not already exist.
> + * 'cid' is optional. If specified, the new server will join only the cluster
> + * with the given cluster ID.
> + *
> + * The new server is located at 'local_address', which must take one of the
> + * forms "tcp:IP[:PORT]" or "ssl:IP[:PORT]", where IP is an IPv4 address or a
> + * square bracket enclosed IPv6 address. PORT, if present, is a port number
> + * that defaults to RAFT_PORT.
Same comment about explicitly setting the port and "RAFT_PORT" not being defined.
> + *
> + * Joining the cluster requiring contacting it. Thus, 'remote_addresses'
s/requiring/requires/
> + * This only creates the on-disk file and does no network access. Use
> + * raft_open() to start operating the new server. (Until this happens, the
> + * new server has not joined the cluster.)
The name for this function (and the corresponding ovsdb-tool command) seems a bit confusing in this way. Not a big deal, but it doesn't seem obvious.
> + * Returns null if successful, otherwise an ovsdb_error describing the
> + * problem. */
> +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> +raft_join_cluster(const char *file_name,
> + const char *name, const char *local_address,
> + const struct sset *remote_addresses,
> + const struct uuid *cid)
> +{
> ...
> + /* Write log file. */
> + struct raft_header h = {
> + .sid = uuid_random(),
> + .cid = cid ? *cid : UUID_ZERO,
> + .name = xstrdup(name),
> + .local_address = xstrdup(local_address),
> + .joining = true,
> + };
> + sset_clone(&h.remote_addresses, remote_addresses);
> + error = ovsdb_log_write_and_free(log, raft_header_to_json(&h));
> + raft_header_uninit(&h);
I don't think anything initializes 'h->snap', which seems like it could be a problem for raft_header_to_json() and raft_header_uninit().
> +/* Reads the initial header record from 'log', which must be a Raft clustered
> + * database log, and populates '*md' with the information read from it. The
> + * caller must eventually destroy 'md'.
You might mention that they do this by calling raft_metadata_destroy().
> + *
> + * On success, returns NULL. On failure, returns a error that the caller must
> + * eventually destroy and zeros '*md'. */
s/a error/an error/
> +static struct json *
> +raft_servers_for_index(const struct raft *raft, uint64_t index)
I think it would be worth mentioning that the caller must free the return value.
> +static struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> +raft_read_header(struct raft *raft)
> +{
> + /* Read header record. */
> + struct json *json;
> + struct ovsdb_error *error = ovsdb_log_read(raft->log, &json);
I don't think anything free 'json'.
> +static void
> +raft_add_conn(struct raft *raft, struct jsonrpc_session *js,
> + const struct uuid *sid, bool incoming)
> +{
> ...
> + if (sid) {
> + conn->sid = *sid;
> + }
> + conn->nickname = raft_address_to_nickname(jsonrpc_session_get_name(js),
> + &conn->sid);
The 'sid' argument can be null, which seems like it could be a problem with raft_address_to_nickname().
> +/* Attempts to start 'raft' leaving its cluster. The caller can check progress
> + * using raft_is_leaving() and raft_left(). */
> +void
> +raft_leave(struct raft *raft)
> +{
> ...
> + raft->leaving = true;
> + raft_transfer_leadership(raft, "this server is leaving the cluster");
> + raft_become_follower(raft);
> + raft_send_remove_server_requests(raft);
> + raft->leave_timeout = time_msec() + ELECTION_BASE_MSEC;
> + raft->leaving = true;
This sets 'raft->leaving' twice. Was that intentional?
> +/* Returns true if 'raft' is successfully left its cluster. */
> +bool
> +raft_left(const struct raft *raft)
I think you can drop "is" in the comment describing the function.
> +static void
> +raft_close__(struct raft *raft)
> +{
I think it would be worth explaining why this would be called separately from raft_close().
> + struct raft_server *rs = raft->remove_server;
> + if (rs) {
> + raft_send_remove_server_reply__(raft, &rs->sid, &rs->requester_sid,
> + rs->requester_conn, false,
> + RAFT_SERVER_SHUTDOWN);
> + raft_server_destroy(raft->remove_server);
> + }
I think it might be safer to set 'raft->remove_server' to null in case this gets called, and then raft_close() is called.
> void
> raft_close(struct raft *raft)
> {
Should something free the 'raft->waiters' entries?
> +static bool
> +raft_conn_receive(struct raft *raft, struct raft_conn *conn,
> + union raft_rpc *rpc)
> +{
> ...
> + } else if (!uuid_equals(&conn->sid, &rpc->common.sid)) {
> + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
> + VLOG_WARN_RL(&rl,
> + "%s: remote server ID changed from "SID_FMT" to "SID_FMT,
> + jsonrpc_session_get_name(conn->js),
> + SID_ARGS(&conn->sid), SID_ARGS(&rpc->common.sid));
If the SID changed, shouldn't we update "conn->sid"?
> +/* Returns true if 'conn' should stay open, 'conn' if it should be closed. */
> +static bool
> +raft_conn_should_stay_open(struct raft *raft, struct raft_conn *conn)
I assume "false" if it should be closed. :-)
> +/* Allows 'raft' to maintain the distributed log. Call this function as part
> + * of the process's main loop. */
> +void
> +raft_run(struct raft *raft)
> +{
> ...
> + /* Close unneeded sessions. */
> + struct raft_conn *next;
> + LIST_FOR_EACH_SAFE (conn, next, list_node, &raft->conns) {
> + if (!raft_conn_should_stay_open(raft, conn)) {
> + jsonrpc_session_close(conn->js);
> + ovs_list_remove(&conn->list_node);
> + free(conn);
I think 'conn->nickname' needs to be freed, too. Maybe a helper function, since there are a couple of places that free 'conn'
> + if (raft->joining && time_msec() >= raft->join_timeout) {
> + raft->join_timeout = time_msec() + 1000;
> + struct raft_conn *conn;
This 'conn' shadows a previous declaration. I think you can just drop declaring this one.
> + if (time_msec() >= raft->ping_timeout) {
> + if (raft->role == RAFT_LEADER) {
> + raft_send_heartbeats(raft);
> + } else {
> + long long int now = time_msec();
> + struct raft_command *cmd, *next;
This 'next' shadows a 'next' of a different type.
> +/* Frees 'cmd'. */
> +void
> +raft_command_unref(struct raft_command *cmd)
> +{
It seems like a more accurate description would be along the lines of "Decrements the ref count for 'cmd' and frees it if zero."
> +/* 'sid' is the server being added or removed. */
> +static void
> +raft_send_add_server_reply__(struct raft *raft, const struct uuid *sid,
> + const char *address,
> + bool success, const char *comment)
The description makes it sound like a server could be removed using this function, but I don't think that's the case.
> +static void
> +raft_server_init_leader(struct raft *raft, struct raft_server *s)
> +{
> + s->next_index = raft->log_end;
> + s->match_index = 0;
> + s->phase = RAFT_PHASE_STABLE;
> +}
The name of this function is a little surprising to me. It seems like it's initializing a server, not necessarily a leader.
> +/* Processes term 'term' received as part of RPC 'common'. Returns true if the
> + * caller should continue processing the RPC, false if the caller should reject
> + * it due to a stale term. */
> +static bool
> +raft_receive_term__(struct raft *raft, const struct raft_rpc_common *common,
> + uint64_t term)
> +{
> + /* Section 3.3 says:
> + *
> + * Current terms are exchanged whenever servers communicate; if one
> + * server’s current term is smaller than the other’s, then it updates
> + * its current term to the larger value. If a candidate or leader
> + * discovers that its term is out of date, it immediately reverts to
> + * follower state. If a server receives a request with a stale term
> + * number, it rejects the request.
> + */
> + if (term > raft->term) {
> + if (!raft_set_term(raft, term, NULL)) {
> + return false;
I don't understand how the result of raft_set_term() plays into this decision to return false.
> +static const struct json *
> +raft_peek_next_entry(struct raft *raft, struct uuid *eid)
> +{
> + /* Invariant: log_start - 2 <= last_applied <= commit_index < log_end. */
> + ovs_assert(raft->log_start <= raft->last_applied + 2);
> + ovs_assert(raft->last_applied <= raft->commit_index);
> + ovs_assert(raft->commit_index < raft->log_end);
> +
> + if (raft->joining || raft->failed) { /* XXX needed? */
> + return NULL;
> + }
Did you want to check this "XXX"?
> +/* If 'prev_log_index' exists in 'raft''s log, term 'prev_log_term', returns
> + * NULL. Otherwise, returns an explanation for the mismatch. */
> +static const char *
> +match_index_and_term(const struct raft *raft,
> + uint64_t prev_log_index, uint64_t prev_log_term)
The description of 'prev_log_term' was confusing to me. I think it may be missing some text about it matching the term pointed to by 'prev_log_index'.
> +/* Returns NULL on success, RAFT_IN_PROGRESS for an operation in progress,
> + * otherwise a brief comment explaining failure. */
> +static void
> +raft_handle_append_entries(struct raft *raft,
> + const struct raft_append_request *rq,
> + uint64_t prev_log_index, uint64_t prev_log_term,
> + const struct raft_entry *entries,
> + unsigned int n_entries)
This description doesn't seem to match the function.
> + if (raft->entries[log_index - raft->log_start].term
> + != entries[i].term) {
> + if (raft_truncate(raft, log_index)) {
> + servers_changed = true;
Does this always mean the servers changed?
> +static bool
> +raft_update_leader(struct raft *raft, const struct uuid *sid)
> +{
> + if (raft->role == RAFT_LEADER && !uuid_equals(sid, &raft->sid)) {
> + char buf[SID_LEN + 1];
> + VLOG_ERR("this server is leader but server %s claims to be",
> + raft_get_nickname(raft, sid, buf, sizeof buf));
> + return false;
If 'sid' and 'raft->sid' were equal, wouldn't that mean that this other system is using the same sid? That also seems like it would be a problem.
> +static void
> +raft_handle_append_request__(struct raft *raft,
> + const struct raft_append_request *rq)
> +{
> ...
> + /*
> + * We now know that the data in rq->entries[] overlaps the data in
> + * raft->entries[], as shown below, with some positive 'ofs':
> + *
> + * rq->prev_log_index
> + * | first_entry_index
> + * | | nth_entry_index
> + * | | |
> + * v v v
> + * +---+---+---+---+---+
> + * T | T | T | T | T | T |
> + * +---+-------+---+---+
> + * +---+---+---+---+
> + * T | T | T | T | T |
> + * +---+---+---+---+
> + * ^ ^
> + * | |
> + * log_start log_end
> + *
> + * |<-- ofs -->|
I think 'ofs' points to the middle of the 'log_start' record. I think it should maybe be moved a couple of spaces to make it clearer.
> +/* Returns the next log entry or snapshot from 'raft', or NULL if there are
> + * none left to read.. Stores the entry ID of the log entry in '*eid'. Stores
> + * true in '*is_snapshot' if the returned data is a snapshot, false if it is a
> + * log entry.*/
> +const struct json *
> +raft_next_entry(struct raft *raft, struct uuid *eid, bool *is_snapshot)
> +{
There's an extra period after "read".
> static void
> raft_handle_append_request(struct raft *raft,
> const struct raft_append_request *rq)
> {
> raft_handle_append_request__(raft, rq);
> }
Is there a reason not to fold in raft_handle_append_request__()? There don't seem to be any other callers to it.
> +static void
> +raft_log_reconfiguration(struct raft *raft)
> +{
> + /* Add the reconfiguration to the log.
> + *
> + * We ignore any */
Is this comment not finished?
> +static void
> +raft_run_reconfigure(struct raft *raft)
> +{
> ...
> + /* Remove a server, if one is scheduled for removal. */
> + HMAP_FOR_EACH (s, hmap_node, &raft->servers) {
> + if (s->phase == RAFT_PHASE_REMOVE) {
> + hmap_remove(&raft->servers, &s->hmap_node);
> + raft->remove_server = s;
Are we sure that 'raft->remove_server' will only be assigned once? I think it might be worth asserting that it's null before assigning here.
> +static void
> +raft_handle_remove_server_reply(struct raft *raft,
> + const struct raft_remove_server_reply *rpc)
> +{
> + if (rpc->success) {
> + VLOG_INFO("%04x: finished leaving cluster %04x",
> + uuid_prefix(&raft->sid, 4), uuid_prefix(&raft->cid, 4));
Do you want to use SID_FMT and CID_FMT here?
> +/* Returns true if 'raft' has grown enough that reducing the log to a snapshot
> + * would be valuable, false otherwise. When this function returns true, the
> + * client should consider using raft_store_snapshot() to reduce the log storage
> + * requirements. */
> +bool
> +raft_grew_lots(const struct raft *raft)
> +{
> + return (!raft->joining
> + && !raft->leaving
> + && !raft->left
> + && !raft->failed
> + && raft->last_applied - raft->log_start >= 100
> + && ovsdb_log_grew_lots(raft->log));
ovsdb_log_grew_lots() is based on the filesize. Raft entries are bigger. It's probably not significant, but should this difference be taken that into account?
> +/* Replaces the log for 'raft', up to the last log entry read, by
> + * 'new_snapshot_data'. Returns NULL if successful, otherwise an error that
> + * the caller must eventually free. */
> +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> +raft_store_snapshot(struct raft *raft, const struct json *new_snapshot_data)
I think it would be worth mentioning that this function doesn't take ownership of 'new_snapshot_data'.
> + const struct raft_entry new_snapshot = {
> + .term = raft_get_term(raft, new_log_start - 1),
> + .data = CONST_CAST(struct json *, new_snapshot_data),
> + .eid = *raft_get_eid(raft, new_log_start - 1),
> + .servers = raft_servers_for_index(raft, new_log_start - 1),
> + };
> + struct ovsdb_error *error = raft_save_snapshot(raft, new_log_start,
> + &new_snapshot);
> + if (error) {
> + return error;
> + }
I think this may leak 'new_snapshot.servers'. If it doesn't under the normal circumstance, I think it can under an error.
> +static void
> +raft_unixctl_status(struct unixctl_conn *conn,
> + int argc OVS_UNUSED, const char *argv[],
> + void *aux OVS_UNUSED)
> +{
> ...
> + ds_put_format(&s, "Cluster ID: ");
> + if (!uuid_is_zero(&raft->cid)) {
> + ds_put_format(&s, UUID_FMT"\n", UUID_ARGS(&raft->cid));
> + } else {
> + ds_put_format(&s, "not yet known\n");
> + }
> + ds_put_format(&s, "Server ID: "SID_FMT" ("UUID_FMT")\n",
> + SID_ARGS(&raft->sid), UUID_ARGS(&raft->sid));
Did you intentionally not use CID_FMT like you did for SID?
> + ds_put_cstr(&s, "Servers:\n");
> + struct raft_server *server;
> + HMAP_FOR_EACH (server, hmap_node, &raft->servers) {
> + ds_put_format(&s, " %s ("SID_FMT" at %s)",
> + server->nickname,
> + SID_ARGS(&server->sid), server->address);
> + if (uuid_equals(&server->sid, &raft->sid)) {
> + ds_put_cstr(&s, " (me)");
raft_put_sid() seems to use "self" to describe its own SID. Do you think it's worth using the same terminology?
> +static void
> +raft_unixctl_leave__(struct unixctl_conn *conn, struct raft *raft)
> +{
> + if (raft_left(raft)) {
> + unixctl_command_reply(conn, NULL);
> ...
> + } else {
> + raft_leave(raft);
> + unixctl_command_reply(conn, NULL);
> + }
> +}
It doesn't seem like this makes a distinction between starting to leave and already having left. It seems like it would be helpful to provide a reply that indicates the difference.
> +static void
> +raft_unixctl_leave(struct unixctl_conn *conn, int argc, const char *argv[],
> + void *aux OVS_UNUSED)
> +{
> + bool force = argc > 2 && !strcmp(argv[1], "--force");
It doesn't appear that "--force" actually does anything.
Some final random things:
- It might be nice to add some tests that use "tcp" instead of "unix" sockets. I thought I noticed some instances in the code that could be problematic.
- I think corner cases, such as 'raft->failed' getting set to true aren't handled in the client code.
- It's very difficult to determine whether an argument is consumed or not, or whether the return value should be freed. Better documentation (particularly for public functions) would help.
- Should the copyright dates just be 2017 and 2018 for new files? A lot of them reference older years.
Thanks for implementing this important feature!
Acked-by: Justin Pettit <jpettit at ovn.org>
--Justin
More information about the dev
mailing list