[ovs-dev] [PATCH] raft: Reintroduce jsonrpc inactivity probes.

Han Zhou hzhou at ovn.org
Thu Feb 25 07:06:57 UTC 2021


On Tue, Feb 23, 2021 at 5:15 AM Ilya Maximets <i.maximets at ovn.org> wrote:
>
> It's not enough to just have heartbeats.
>
> RAFT heartbeats are unidirectional, i.e. the leader sends them to
> followers but not the other way around.  Missing heartbeats provoke
> followers to start an election, but if the leader does not receive any
> replies it will not do anything while there is a quorum, i.e. there
> are enough other servers to make decisions.
>
> This leads to a situation where, while the TCP connection is
> established, the leader will continue to blindly send messages to it.
> In our case this leads to a growing send backlog.  The connection will
> be terminated eventually due to excessive send backlog, but this might
> take a lot of time and waste process memory.  At the same time the
> 'candidate' will continue to send vote requests to the dead connection
> on its side.
>
> To fix that we need to reintroduce inactivity probes that will drop
> the connection if there was no incoming traffic for a long time and
> the remote server doesn't reply to the "echo" request.  The probe
> interval might be chosen based on the election timeout to avoid the
> issues described in commit db5a066c17bd.
>
> Fixes: db5a066c17bd ("raft: Disable RAFT jsonrpc inactivity probe.")
> Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
> ---
>  ovsdb/raft.c | 32 +++++++++++++++++++++++++++++++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/ovsdb/raft.c b/ovsdb/raft.c
> index ea91d1fdb..0fb1420fb 100644
> --- a/ovsdb/raft.c
> +++ b/ovsdb/raft.c
> @@ -940,6 +940,34 @@ raft_reset_ping_timer(struct raft *raft)
>      raft->ping_timeout = time_msec() + raft->election_timer / 3;
>  }
>
> +static void
> +raft_conn_update_probe_interval(struct raft *raft, struct raft_conn *r_conn)
> +{
> +    /* Inactivity probe will be sent if connection will remain idle for the
> +     * time of an election timeout.  Connection will be dropped if inactivity
> +     * will last twice that time.
> +     *
> +     * It's not enough to just have heartbeats if connection is still
> +     * established, but no packets received from the other side.  Without
> +     * inactivity probe follower will just try to initiate election
> +     * indefinitely staying in 'candidate' role.  And the leader will continue
> +     * to send heartbeats to the dead connection thinking that remote server
> +     * is still part of the cluster. */
> +    int probe_interval = raft->election_timer + ELECTION_RANGE_MSEC;
> +
> +    jsonrpc_session_set_probe_interval(r_conn->js, probe_interval);
> +}
> +
> +static void
> +raft_update_probe_intervals(struct raft *raft)
> +{
> +    struct raft_conn *r_conn;
> +
> +    LIST_FOR_EACH (r_conn, list_node, &raft->conns) {
> +        raft_conn_update_probe_interval(raft, r_conn);
> +    }
> +}
> +
>  static void
>  raft_add_conn(struct raft *raft, struct jsonrpc_session *js,
>                const struct uuid *sid, bool incoming)
> @@ -954,7 +982,7 @@ raft_add_conn(struct raft *raft, struct jsonrpc_session *js,
>                                                &conn->sid);
>      conn->incoming = incoming;
>      conn->js_seqno = jsonrpc_session_get_seqno(conn->js);
> -    jsonrpc_session_set_probe_interval(js, 0);
> +    raft_conn_update_probe_interval(raft, conn);
>      jsonrpc_session_set_backlog_threshold(js, raft->conn_backlog_max_n_msgs,
>                                                raft->conn_backlog_max_n_bytes);
>  }
> @@ -2804,6 +2832,7 @@ raft_update_commit_index(struct raft *raft, uint64_t new_commit_index)
>                            raft->election_timer, e->election_timer);
>                  raft->election_timer = e->election_timer;
>                  raft->election_timer_new = 0;
> +                raft_update_probe_intervals(raft);
>              }
>              if (e->servers) {
>                  /* raft_run_reconfigure() can write a new Raft entry, which can
> @@ -2820,6 +2849,7 @@ raft_update_commit_index(struct raft *raft, uint64_t new_commit_index)
>                  VLOG_INFO("Election timer changed from %"PRIu64" to %"PRIu64,
>                            raft->election_timer, e->election_timer);
>                  raft->election_timer = e->election_timer;
> +                raft_update_probe_intervals(raft);
>              }
>          }
>          /* Check if any pending command can be completed, and complete it.
> --
> 2.26.2
>

Thanks Ilya.
Acked-by: Han Zhou <hzhou at ovn.org>
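
For anyone following along, the timing that the patch sets up can be
sketched roughly as below.  This is only an illustration, not code from
the tree: the sketch_* names are hypothetical, the value used for
ELECTION_RANGE_MSEC is an assumption (the real constant is defined in
ovsdb/raft.c), and the "send echo after one interval, drop after two"
behaviour is taken from the comment in the patch rather than from the
jsonrpc/reconnect code itself.

```c
#include <stdint.h>

/* Assumed value, for illustration only; see ovsdb/raft.c for the real one. */
#define SKETCH_ELECTION_RANGE_MSEC 1000

enum sketch_probe_action {
    SKETCH_PROBE_NONE,       /* Connection is not considered idle yet. */
    SKETCH_PROBE_SEND_ECHO,  /* Idle for one interval: send "echo" request. */
    SKETCH_PROBE_DROP        /* Idle for two intervals: drop the connection. */
};

/* Same arithmetic as raft_conn_update_probe_interval() in the patch:
 * election timer plus the election range, both in milliseconds. */
static int
sketch_probe_interval(uint64_t election_timer_msec)
{
    return (int) election_timer_msec + SKETCH_ELECTION_RANGE_MSEC;
}

/* Hypothetical helper: decides what a prober would do after 'idle_msec'
 * milliseconds without any incoming traffic on the connection. */
static enum sketch_probe_action
sketch_probe_action(int probe_interval, long long idle_msec)
{
    if (probe_interval <= 0) {
        /* An interval of 0 disables probing, as the code did before
         * this patch. */
        return SKETCH_PROBE_NONE;
    } else if (idle_msec >= 2LL * probe_interval) {
        return SKETCH_PROBE_DROP;
    } else if (idle_msec >= probe_interval) {
        return SKETCH_PROBE_SEND_ECHO;
    }
    return SKETCH_PROBE_NONE;
}
```

With an example election timer of 1000 ms and the assumed range above,
the interval comes out as 2000 ms, so under these assumptions a silent
peer would be probed after about 2 seconds and dropped after about 4,
instead of lingering until the send backlog limit is hit.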

