[ovs-dev] [PATCH 2/2] ovsdb raft: Avoid unnecessary reconnecting during leader election.

Han Zhou zhouhan at gmail.com
Fri Apr 19 19:17:47 UTC 2019


From: Han Zhou <hzhou8 at ebay.com>

If a server claims itself as "disconnected", all clients connected
to that server will try to reconnect to a new server in the cluster.

However, currently a server would claim itself as disconnected even
when itself is the candidate and try to become the new leader (most
likely it will be), and all its clients will reconnect to another
node.

During a leader fail-over (e.g. due to a leader failure), it is
expected that all clients of the old leader will have to reconnect
to other nodes in the cluster, but it is unnecessary for all the
clients of a healthy node to reconnect, which could cause more
disturbance in a large scale environment.

This patch fixes the problem by slightly change the condition that
a server regards itself as disconnected: if its role is candidate,
it is regarded as disconnected only if the election didn't succeed
at the first attempt. Related failure test cases are also unskipped
and all passed with this patch.

Signed-off-by: Han Zhou <hzhou8 at ebay.com>
---
 ovsdb/raft.c           | 13 +++++++++++--
 tests/ovsdb-cluster.at |  6 ------
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index 77ad365..c60ef41 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -286,6 +286,8 @@ struct raft {
 
     /* Candidates only.  Reinitialized at start of election. */
     int n_votes;                /* Number of votes for me. */
+    bool candidate_retrying;    /* The first round of election timed-out and it
+                                   is now retrying. */
 };
 
 /* All Raft structures. */
@@ -985,11 +987,17 @@ raft_get_sid(const struct raft *raft)
 /* Returns true if 'raft' has completed joining its cluster, has not left or
  * initiated leaving the cluster, does not have failed disk storage, and is
  * apparently connected to the leader in a healthy way (or is itself the
- * leader).*/
+ * leader).
+ *
+ * If 'raft' is candidate:
+ * a) if it is the first round of election, consider it as connected, hoping
+ *    it will successfully elect a new leader soon.
+ * b) if it is already retrying, consider it as disconnected (so that clients
+ *    may decide to reconnect to other members). */
 bool
 raft_is_connected(const struct raft *raft)
 {
-    return (raft->role != RAFT_CANDIDATE
+    return (!(raft->role == RAFT_CANDIDATE && raft->candidate_retrying)
             && !raft->joining
             && !raft->leaving
             && !raft->left
@@ -1608,6 +1616,7 @@ raft_start_election(struct raft *raft, bool leadership_transfer)
     }
 
     ovs_assert(raft->role != RAFT_LEADER);
+    raft->candidate_retrying = (raft->role == RAFT_CANDIDATE);
     raft->role = RAFT_CANDIDATE;
 
     raft->n_votes = 0;
diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at
index 931da6e..4701272 100644
--- a/tests/ovsdb-cluster.at
+++ b/tests/ovsdb-cluster.at
@@ -148,8 +148,6 @@ AT_CLEANUP
 
 AT_SETUP([OVSDB cluster - txn on follower-2, leader crash before sending execRep, follower-2 becomes leader])
 AT_KEYWORDS([ovsdb server negative unix cluster pending-txn])
-# XXX: fix bug before enabling this test
-AT_CHECK([exit 77])
 ovsdb_cluster_failure_test 2 3 1 crash-before-sending-execute-command-reply 2
 AT_CLEANUP
 
@@ -160,8 +158,6 @@ AT_CLEANUP
 
 AT_SETUP([OVSDB cluster - txn on follower-2, leader crash after sending execRep, follower-2 becomes leader])
 AT_KEYWORDS([ovsdb server negative unix cluster pending-txn])
-# XXX: fix bug before enabling this test
-AT_CHECK([exit 77])
 ovsdb_cluster_failure_test 2 3 1 crash-after-sending-execute-command-reply 2
 AT_CLEANUP
 
@@ -172,8 +168,6 @@ AT_CLEANUP
 
 AT_SETUP([OVSDB cluster - txn on leader, leader crash before sending appendReq, follower-2 becomes leader])
 AT_KEYWORDS([ovsdb server negative unix cluster pending-txn])
-# XXX: fix bug before enabling this test
-AT_CHECK([exit 77])
 ovsdb_cluster_failure_test 1 2 1 crash-before-sending-append-request 2
 AT_CLEANUP
 
-- 
2.1.0



More information about the dev mailing list