[ovs-discuss] [HELP] Question about Raft

Han Zhou hzhou at ovn.org
Sat Mar 7 20:02:54 UTC 2020

On Sat, Mar 7, 2020 at 2:33 AM txfh2007 <txfh2007 at aliyun.com> wrote:
> Hi Han:
>     Thanks for your reply ! There is one point that I can't agree with
you: "If S2 or S3 already becomes leader, their term won't be lower than
S2. " In my test , in step 3, S3 is leader and its term is lower than S2.
The reason is when S2 disconnected from S1 and S3, S2 will add its term and
send vote req until its connection recovered. At the same time ,S3 becomes
leader and won't add its term. So it is possible that S2's term is larger
than S3's,  and that's why in Step 3, S2 replies "stale term" to S3's
append entry request.

Hi Timo,

Sorry that my answer wasn't accurate enough and caused confusion. My answer
was focusing on the "candidate forever" scenario as you reported, so I
didn't take the more common scenario (that a reconnected server can have a
larger term) into account, but of course the more common scenario does exist.
Please see my rephrased answer below, and let me know if it solves the

> Timo
> On Fri, Mar 6, 2020 at 1:13 AM txfh2007 via discuss <
ovs-discuss at openvswitch.org> wrote:
> >
> > Hi Han && all:
> >
> >     I have a question about RAFT: I have tried the latest OVN-2.30, and
have found in some condition, there is one node whose role is always
"Candidate" (got by cluster/status cmd), but act as a Follower. My cluster
still works well, but it seems odd that a server's role is always
Candidate. As far as I know, server's role is normally Follower or Leader.
> Hi Timo, I happened to fix the problem yesterday and here is the patch:
https://patchwork.ozlabs.org/patch/1250116/. Details of my analysis are in
the commit message and a test case is added to cover this scenario.
> >     After digging into related code, I think I can try to describe how
to reproduce this scenario:
> >         1. It is three servers cluster: One Leader(S2), Two
> >         2. Try to disconnect Leader(S2) from other two servers,so S2
would add term and send vote request, and meanwhile S1 and S3 would choose
a new Leader(Let's say it's S3)
> When S1 and S3 choose a new leader, they (one of them, or both) would
have to increase the term, too.
> >         3. Recover connection between S2 and other two nodes, then if
S2 receives append entry req from S3, as S3's term is lower, so S2 will
reply "stale term"
> If S2 or S3 already becomes leader, their term won't be lower than S2.
From this point on, the below steps shouldn't happen. But instead, it is
possible that when S2 receives append-request from the new leader, it has
the same term, and it updates the leader without switching from candidate
to follower, thus resulting in the candidate state forever.


If S1 (not S2, sorry for the typo above) or S3 has already become leader, it
is possible that its term is the same as S2's when S2's connection is
restored. When S2 then receives an append-request from the new leader, it
observes the same term and updates the leader without switching from
candidate to follower (this is a bug in the implementation, fixed in the
patch I posted, which was merged yesterday), thus staying in the candidate
state forever. In this situation, the candidate no longer increases its term
or initiates vote-requests, because it regularly receives append-requests
(heartbeats) and responds to them, like a follower. The only difference is
that it announces itself as "disconnected from cluster" to its clients, so
all the clients will be disconnected from it.
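
To illustrate the same-term case, here is a minimal Python sketch. It is not
the actual OVS C code: the Server class, on_append_request() and the
step_down_on_same_term flag are names made up for this illustration, and the
term number 5 is arbitrary. The flag only stands in for "before the patch"
(False) versus "after the patch" (True).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


@dataclass
class Server:
    name: str
    term: int
    role: Role
    leader: Optional[str] = None


def on_append_request(server, sender, sender_term, step_down_on_same_term):
    """Handle an append-request (heartbeat) and return the reply status."""
    if sender_term < server.term:
        # The sender is behind; reject without changing local state.
        return "stale term"
    if sender_term > server.term:
        # A newer term always sends the receiver back to follower.
        server.term = sender_term
        server.role = Role.FOLLOWER
    elif server.role is Role.CANDIDATE and step_down_on_same_term:
        # Same term: a leader already exists for it, so stop campaigning.
        server.role = Role.FOLLOWER
    # In every accepted case the sender is recorded as the leader.
    server.leader = sender
    return "ok"


# S2 is a candidate in term 5 when S3's heartbeat for term 5 arrives.
buggy = Server("S2", term=5, role=Role.CANDIDATE)
on_append_request(buggy, "S3", 5, step_down_on_same_term=False)

fixed = Server("S2", term=5, role=Role.CANDIDATE)
on_append_request(fixed, "S3", 5, step_down_on_same_term=True)

print(buggy.role.value, buggy.leader)  # candidate S3
print(fixed.role.value, fixed.leader)  # follower S3
```

In the first case S2 knows the leader and keeps answering heartbeats, but
its role never leaves candidate, which matches what cluster/status showed.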

On the other hand, if S2's connection is restored after more election timer
timeouts, its term can be larger than the new leader's. In this case, it
won't trigger the "candidate forever" problem. First, the candidate will
send a vote-request with a larger term, but the new leader will reject the
vote-request because it is the leader itself, and the follower will also
reject it because of the logic of
"raft_should_suppress_disruptive_server()". However, the candidate will
receive an append-request from the new leader, which has a smaller term. It
replies with an append-reply with reason "stale term" but carrying its own
term number. When the leader receives this reply, it sees a larger term
number than its own, so it updates its term to the larger term and steps
down to follower. The cluster will then start an election again, which will
end up with one leader and two followers as usual.
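
The larger-term exchange can be sketched the same way. Again this is only a
simplified Python model, not the OVS implementation: Node,
on_append_request() and on_append_reply() are hypothetical names, and the
terms 9 and 6 are arbitrary values chosen so that S2 is ahead of S3.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    term: int
    role: str  # "leader", "follower" or "candidate"


def on_append_request(node, leader_term):
    """Receiver side: return (status, term) for the append-reply."""
    if leader_term < node.term:
        # Reject, but report our own (larger) term back to the sender.
        return ("stale term", node.term)
    node.term = leader_term
    node.role = "follower"
    return ("ok", node.term)


def on_append_reply(leader, status, reply_term):
    """Leader side: a reply carrying a larger term forces a step-down."""
    if reply_term > leader.term:
        leader.term = reply_term
        leader.role = "follower"


# S2 was isolated for several election timeouts, so its term ran ahead.
s2 = Node("S2", term=9, role="candidate")
s3 = Node("S3", term=6, role="leader")

status, reply_term = on_append_request(s2, s3.term)
on_append_reply(s3, status, reply_term)

print(status, s3.term, s3.role)  # stale term 9 follower
```

After this exchange nobody is leader in term 9, so a new election follows
once an election timer fires.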

> >         4. After S3 gets S2's reply, S3 will change its term to S2's
value and change its role to follower and then candidate(at the same time ,
S1/S2/S3 are all candidate role)
> >         5.Then if S2 got S3's vote request and vote for S3, S3 will
become new leader, but S2's role is still candidate

If all 3 ended up as candidates in the same term, as mentioned in your step
4, each of them votes only for itself, so no leader will be elected in that
term and they will have to increase the term (at random times) and
re-elect. To my understanding, the only way to end up with a candidate
forever is when 2 servers become candidates competing in the *same

> >
> >     I guess The reason is term of S3's vote request is equal to S2's
term, For S2, it will change to follower only if receiving vote request
whose term value is larger than it own .
> >     Am I right? and the candidate role(but actually is a follower) is
reasonable ?
> >
> > Thanks
> > Timo
> >
> Hi Timo,