[ovs-discuss] Re: [HELP] Question about Raft
txfh2007 at aliyun.com
Sat Mar 7 10:33:21 UTC 2020
Thanks for your reply! There is one point on which I can't agree with you: "If S2 or S3 already becomes leader, their term won't be lower than S2." In my test, in step 3, S3 is the leader and its term is lower than S2's. The reason is that while S2 is disconnected from S1 and S3, S2 keeps incrementing its term and sending vote requests until its connection recovers. Meanwhile, S3 becomes leader and does not increment its term any further. So it is possible for S2's term to be larger than S3's, and that is why, in step 3, S2 replies "stale term" to S3's append entry request.
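To make the term arithmetic concrete, here is a minimal sketch of the partition described above. It is a toy model, not OVS code: the Server class, the append_reply helper, the starting term of 1 and the count of three failed elections on S2 are all assumptions chosen for illustration.

```python
# Toy model of the partition scenario; none of these names come from OVS.

class Server:
    def __init__(self, name, term=1, role="follower"):
        self.name = name
        self.term = term
        self.role = role

    def start_election(self):
        # A server that hears from no leader bumps its term and becomes candidate.
        self.term += 1
        self.role = "candidate"


def append_reply(receiver, leader):
    # A receiver rejects an append request carrying a term lower than its own.
    return "stale term" if leader.term < receiver.term else "ok"


s1 = Server("S1")
s2 = Server("S2", role="leader")
s3 = Server("S3")

# S2 is cut off. S3 times out once and wins the election with S1's vote.
s3.start_election()
s1.term = s3.term      # S1 adopts the new term when it grants its vote
s3.role = "leader"     # as leader, S3 stops incrementing its term

# Assumption: S2 runs three elections that cannot succeed while isolated.
ASSUMED_FAILED_ELECTIONS = 3
for _ in range(ASSUMED_FAILED_ELECTIONS):
    s2.start_election()

# Connectivity returns and S3 sends an append request to S2.
reply = append_reply(s2, s3)
print(s2.term, s3.term, reply)
```

With these assumed numbers S2 ends up at a higher term than S3, so the first append request from S3 after the partition heals is rejected as stale.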
On Fri, Mar 6, 2020 at 1:13 AM txfh2007 via discuss <ovs-discuss at openvswitch.org> wrote:
> Hi Han && all:
> I have a question about RAFT: I have tried the latest OVN-2.30 and found that under some conditions there is one node whose role is always "Candidate" (as reported by the cluster/status command), but which acts as a Follower. My cluster still works well, but it seems odd that a server's role is always Candidate. As far as I know, a server's role is normally Follower or Leader.
Hi Timo, I happened to fix the problem yesterday and here is the patch: https://patchwork.ozlabs.org/patch/1250116/. Details of my analysis are in the commit message, and a test case is added to cover this scenario.
> After digging into the related code, I think I can describe how to reproduce this scenario:
> 1. It is a three-server cluster: one Leader (S2), two Followers (S1, S3)
> 2. Disconnect the Leader (S2) from the other two servers, so S2 will increment its term and send vote requests, and meanwhile S1 and S3 will choose a new Leader (let's say it's S3)
When S1 and S3 choose a new leader, they (one of them, or both) would have to increase the term, too.
> 3. Recover the connection between S2 and the other two nodes. Then, if S2 receives an append entry request from S3, S2 will reply "stale term", because S3's term is lower
If S2 or S3 already becomes leader, their term won't be lower than S2. From this point on, the steps below shouldn't happen. Instead, it is possible that when S2 receives an append request from the new leader, it has the same term, and it updates the leader without switching from candidate to follower, thus remaining in the candidate state forever.
> 4. After S3 gets S2's reply, S3 will change its term to S2's value and change its role to follower and then candidate (at this point, S1/S2/S3 are all in the candidate role)
> 5. Then, if S2 gets S3's vote request and votes for S3, S3 will become the new leader, but S2's role is still candidate
> I guess the reason is that the term of S3's vote request is equal to S2's term. S2 will change to follower only if it receives a vote request whose term is larger than its own.
> Am I right? And is the candidate role (when the server is actually a follower) reasonable?
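The same-term case described above can be sketched as follows. This is a simplified illustration of the general Raft rule that a candidate steps down when it accepts an append request from a leader in its own term. It is not the code from the linked patch: the Node class, on_append_request and the step_down_on_equal_term flag are hypothetical names used only to contrast the two behaviors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    term: int
    role: str                    # "follower", "candidate" or "leader"
    leader: Optional[str] = None


def on_append_request(node, sender, sender_term, step_down_on_equal_term=True):
    """Handle an append request; the flag is hypothetical and toggles the step-down."""
    if sender_term < node.term:
        return "stale term"              # sender is behind, reject
    if sender_term > node.term:
        node.term = sender_term          # a higher term always forces follower
        node.role = "follower"
    elif node.role == "candidate" and step_down_on_equal_term:
        node.role = "follower"           # equal term: accept the leader, stop campaigning
    node.leader = sender
    return "ok"


# Behavior reported in the thread: leader is recorded, role is never reset.
stuck = Node(term=5, role="candidate")
on_append_request(stuck, "S3", 5, step_down_on_equal_term=False)
print(stuck.role, stuck.leader)

# Expected behavior: the candidate becomes a follower of S3.
healthy = Node(term=5, role="candidate")
on_append_request(healthy, "S3", 5)
print(healthy.role, healthy.leader)
```

In the first case the node follows S3 in practice while still reporting itself as a candidate, which matches the symptom seen in cluster/status.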