[ovs-dev] [PATCH 15/15] ovsdb: Introduce experimental support for clustered databases. (2/2)
Ben Pfaff
blp at ovn.org
Fri Mar 23 22:27:24 UTC 2018
Thank you for the review. As before, I am applying most of your
comments and not bothering to respond to them, so below you can find
what needed a reply.
On Sat, Feb 24, 2018 at 03:17:51PM -0800, Justin Pettit wrote:
> > On Dec 31, 2017, at 9:16 PM, Ben Pfaff <blp at ovn.org> wrote:
> > diff --git a/ovsdb/raft-private.c b/ovsdb/raft-private.c
> > new file mode 100644
> > index 000000000000..457d1292a949
> > --- /dev/null
> > +++ b/ovsdb/raft-private.c
> >
> > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > +raft_address_validate(const char *address)
> > +{
> > ...
> > + return ovsdb_error(NULL, "%s: expected \"tcp\" or \"ssl\" address",
> > + address);
>
> I assume you're downplaying "unix:", since it only makes sense for testing purposes?
Yes. I don't want to encourage users to use it, since it isn't useful
for distribution but only for testing..
> > +char *
> > +raft_address_to_nickname(const char *address, const struct uuid *sid)
> > +{
>
> I think it may be worth explaining what a nickname is and how one gets one.
OK, I added a comment.
> > +void
> > +raft_servers_format(const struct hmap *servers, struct ds *ds)
> > +{
> > + int i = 0;
> > + const struct raft_server *s;
> > + HMAP_FOR_EACH (s, hmap_node, servers) {
> > + if (i++) {
> > + ds_put_cstr(ds, ", ");
> > + }
> > + ds_put_format(ds, SID_FMT"(%s)", SID_ARGS(&s->sid), s->address);
>
> Do you think it's worth putting a space between the SID and parenthetical address?
It's actually more readable the way it is. I spent a lot of time
reading this stuff.
> > +static void
> > +raft_append_request_from_jsonrpc(struct ovsdb_parser *p,
> > + struct raft_append_request *rq)
> > +{
> > ...
> > +
> > + const struct json *log = ovsdb_parser_member(p, "log", OP_ARRAY);
> > + if (!log) {
> > + return;
> > + }
>
> Should this parser argument include OP_OPTIONAL?
No, an append request always includes a log entry array (although I
guess it could be an empty array). Can you help me to understand what
makes you think so? Perhaps there is some inconsistency here that I do
not yet see, that I should fix.
> > +static void
> > +raft_append_reply_to_jsonrpc(const struct raft_append_reply *rpy,
> > + struct json *args)
> > +{
> > ...
> > + json_object_put_string(args, "result",
> > + raft_append_result_to_string(rpy->result));
>
> I believe raft_append_result_to_string() can return NULL, which could
> cause a problem here.
The code shouldn't try to send an invalid RPC; if it does, then a
segfault will allow the problem to be diagnosed about as well as an
assertion failure would.
> > +static void
> > +raft_format_append_reply(const struct raft_append_reply *rpy, struct ds *s)
> > +{
> > ...
> > + ds_put_format(s, " result=\"%s\"",
> > + raft_append_result_to_string(rpy->result));
>
> I don't think this will crash, but it will print strange if
> 'rpy->result' is not valid.
The code shouldn't try to print an invalid RPC; such an RPC, if
received, would fail to demarshal before it got to that point.
> Is it intentional that "n_entries" and "prev_log_*" aren't printed?
They aren't very useful for debugging and clutter the output.
> > +void
> > +raft_rpc_format(const union raft_rpc *rpc, struct ds *s)
> > +{
> > + ds_put_format(s, "%s", raft_rpc_type_to_string(rpc->type));
> > + if (rpc->common.comment) {
> > + ds_put_format(s, " \"%s\"", rpc->common.comment);
> > + }
> > + ds_put_char(s, ':');
>
> Is printing the SID not important?
It's better to let the caller print it, since it has more context that
allows for more human-friendly names (e.g. those nicknames you asked
about).
I added a comment.
> > +/* Creates a database file that represents a new server in an existing Raft
> > + * cluster.
> > + *
> > + * Creates the local copy of the cluster's log in 'file_name', which must not
> > + * already exist. Gives it the name 'name', which must be the same name
> > + * passed in to raft_create_cluster() earlier.
>
> Is raft_create_cluster() supposed to be called before this? I think
> raft_create_cluster() would have already created the log file, which
> the previous sentence said must not already exist.
Every server needs a log file, not just the first server in the cluster.
> > + * This only creates the on-disk file and does no network access. Use
> > + * raft_open() to start operating the new server. (Until this happens, the
> > + * new server has not joined the cluster.)
>
> The name for this function (and the corresponding ovsdb-tool command)
> seems a bit confusing in this way. Not a big deal, but it doesn't
> seem obvious.
I'm open to better names, can you suggest one?
> > + * Returns null if successful, otherwise an ovsdb_error describing the
> > + * problem. */
> > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > +raft_join_cluster(const char *file_name,
> > + const char *name, const char *local_address,
> > + const struct sset *remote_addresses,
> > + const struct uuid *cid)
> > +{
> > ...
> > + /* Write log file. */
> > + struct raft_header h = {
> > + .sid = uuid_random(),
> > + .cid = cid ? *cid : UUID_ZERO,
> > + .name = xstrdup(name),
> > + .local_address = xstrdup(local_address),
> > + .joining = true,
> > + };
> > + sset_clone(&h.remote_addresses, remote_addresses);
> > + error = ovsdb_log_write_and_free(log, raft_header_to_json(&h));
> > + raft_header_uninit(&h);
>
> I don't think anything initializes 'h->snap', which seems like it
> could be a problem for raft_header_to_json() and raft_header_uninit().
In C, an initializer always initializes every member, and those
functions appear to properly handle nulls.
I added a comment.
> > +static struct json *
> > +raft_servers_for_index(const struct raft *raft, uint64_t index)
>
> I think it would be worth mentioning that the caller must free the
> return value.
Good catch. The existing caller didn't, so there was a bug here.
Should be fixed now.
> > +static void
> > +raft_add_conn(struct raft *raft, struct jsonrpc_session *js,
> > + const struct uuid *sid, bool incoming)
> > +{
> > ...
> > + if (sid) {
> > + conn->sid = *sid;
> > + }
> > + conn->nickname = raft_address_to_nickname(jsonrpc_session_get_name(js),
> > + &conn->sid);
>
> The 'sid' argument can be null, which seems like it could be a problem with raft_address_to_nickname().
Yes, the nickname can end up being 0000 for a while, but it changes to a
better one on the first received RPC. It's not ideal, admittedly, but
it's clear enough in the logs.
> > +static bool
> > +raft_conn_receive(struct raft *raft, struct raft_conn *conn,
> > + union raft_rpc *rpc)
> > +{
> > ...
> > + } else if (!uuid_equals(&conn->sid, &rpc->common.sid)) {
> > + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
> > + VLOG_WARN_RL(&rl,
> > + "%s: remote server ID changed from "SID_FMT" to "SID_FMT,
> > + jsonrpc_session_get_name(conn->js),
> > + SID_ARGS(&conn->sid), SID_ARGS(&rpc->common.sid));
>
> If the SID changed, shouldn't we update "conn->sid"?
It's more of an error message. The sid shouldn't change. I updated the
code to make that clearer, and to discard the message.
> > +static void
> > +raft_run_reconfigure(struct raft *raft)
> > +{
> > ...
> > + /* Remove a server, if one is scheduled for removal. */
> > + HMAP_FOR_EACH (s, hmap_node, &raft->servers) {
> > + if (s->phase == RAFT_PHASE_REMOVE) {
> > + hmap_remove(&raft->servers, &s->hmap_node);
> > + raft->remove_server = s;
>
> Are we sure that 'raft->remove_server' will only be assigned once? I
> think it might be worth asserting that it's null before assigning
> here.
We can be sure that it is null because we set it to null earlier in the
function.
> > +/* Returns true if 'raft' has grown enough that reducing the log to a snapshot
> > + * would be valuable, false otherwise. When this function returns true, the
> > + * client should consider using raft_store_snapshot() to reduce the log storage
> > + * requirements. */
> > +bool
> > +raft_grew_lots(const struct raft *raft)
> > +{
> > + return (!raft->joining
> > + && !raft->leaving
> > + && !raft->left
> > + && !raft->failed
> > + && raft->last_applied - raft->log_start >= 100
> > + && ovsdb_log_grew_lots(raft->log));
>
> ovsdb_log_grew_lots() is based on the filesize. Raft entries are
> bigger. It's probably not significant, but should this difference be
> taken that into account?
I think that doubling the size of a file should be independent of the
size of the individual entries. It's meant to be, anyhow.
> > +/* Replaces the log for 'raft', up to the last log entry read, by
> > + * 'new_snapshot_data'. Returns NULL if successful, otherwise an error that
> > + * the caller must eventually free. */
> > +struct ovsdb_error * OVS_WARN_UNUSED_RESULT
> > +raft_store_snapshot(struct raft *raft, const struct json *new_snapshot_data)
>
> I think it would be worth mentioning that this function doesn't take
> ownership of 'new_snapshot_data'.
I still think that 'const' is a good enough hint. I always assume that
const arguments don't transfer ownership, unless there's a comment to
the contrary.
> Some final random things:
>
> - It might be nice to add some tests that use "tcp" instead of "unix" sockets. I thought I noticed some instances in the code that could be problematic.
>
> - I think corner cases, such as 'raft->failed' getting set to true aren't handled in the client code.
>
> - It's very difficult to determine whether an argument is consumed or not, or whether the return value should be freed. Better documentation (particularly for public functions) would help.
>
> Thanks for implementing this important feature!
>
> Acked-by: Justin Pettit <jpettit at ovn.org>
Thanks. I'll look at those last things and polish this a bit more and
then commit it to master.
More information about the dev
mailing list