[ovs-git] [openvswitch/ovs] 877618: raft: Fix next_index in install_snapshot reply han...

Han Zhou noreply at github.com
Fri Mar 6 23:02:13 UTC 2020


  Branch: refs/heads/branch-2.13
  Home:   https://github.com/openvswitch/ovs
  Commit: 877618fc833273d1e29e012b5e925d51cba80ff5
      https://github.com/openvswitch/ovs/commit/877618fc833273d1e29e012b5e925d51cba80ff5
  Author: Han Zhou <hzhou at ovn.org>
  Date:   2020-03-06 (Fri, 06 Mar 2020)

  Changed paths:
    M ovsdb/raft.c

  Log Message:
  -----------
  raft: Fix next_index in install_snapshot reply handling.

When a leader handles an install_snapshot reply, the next_index for
the follower should be log_start instead of log_end, because new
entries can be added to the leader's log after it initiates the
install_snapshot procedure.  Also, the leader should send all the
accumulated entries to the follower in the following append-request
message, instead of sending 0 entries, to speed up convergence.

Without this fix, there is no functional problem, but it takes
unnecessary extra rounds of append-requests, each answered with
"inconsistency" by the follower, before the logs finally converge.
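
The reasoning above can be illustrated with a minimal sketch.  This is
not the code in ovsdb/raft.c: the struct names, field layout, and
handle_install_snapshot_reply() are hypothetical, and the sketch assumes
that log_start is the index of the first entry after the snapshot and
that log_end is one past the last entry in the leader's log.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of the leader's log bounds. */
struct leader_log {
    uint64_t log_start;   /* First index after the snapshot (assumed). */
    uint64_t log_end;     /* One past the last entry in the log (assumed). */
};

/* Hypothetical per-follower state kept by the leader. */
struct follower_state {
    uint64_t next_index;  /* Next log index to send to this follower. */
};

/* Called when the follower acknowledges the snapshot.  Sets next_index
 * to log_start, so that entries appended while the snapshot was in
 * flight are not skipped, and returns how many entries the following
 * append-request should carry. */
uint64_t
handle_install_snapshot_reply(const struct leader_log *log,
                              struct follower_state *f)
{
    assert(log->log_start <= log->log_end);

    f->next_index = log->log_start;        /* Not log_end. */
    return log->log_end - f->next_index;   /* All accumulated entries. */
}
```

In this sketch, a leader with log_start 10 and log_end 13 sets
next_index to 10 and sends 3 entries in one append-request, rather than
sending 0 entries from index 13 and waiting for the follower to report
an inconsistency.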

Signed-off-by: Han Zhou <hzhou at ovn.org>
Signed-off-by: Ben Pfaff <blp at ovn.org>


  Commit: 25a7e5547f1e107db0f032ad269f447c57401531
      https://github.com/openvswitch/ovs/commit/25a7e5547f1e107db0f032ad269f447c57401531
  Author: Han Zhou <hzhou at ovn.org>
  Date:   2020-03-06 (Fri, 06 Mar 2020)

  Changed paths:
    M ovsdb/raft.c
    M tests/ovsdb-cluster.at

  Log Message:
  -----------
  raft: Fix the problem of stuck in candidate role forever.

Sometimes a server can stay in the candidate role forever, even if the
server already sees the new leader and handles append-requests normally.
However, because of the wrong role, it appears to be disconnected from
the cluster, and so the clients are disconnected.

This problem happens when 2 servers become candidates in the same
term, and one of them is elected as leader in that term. It can be
reproduced by the test cases added in this patch.

The root cause is that the current implementation only changes the role
to follower when a bigger term is observed (in raft_receive_term__()).
According to the Raft paper, if another candidate becomes leader in
the same term, the candidate should change to follower.

This patch fixes it by changing the role to follower when the leader
is updated in raft_update_leader().
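
The difference between the two transitions can be sketched as follows.
This is a simplified illustration, not the actual patch: struct server,
receive_term(), and update_leader() are hypothetical stand-ins for the
logic in raft_receive_term__() and raft_update_leader(), and server IDs
are plain integers here.

```c
#include <assert.h>
#include <stdint.h>

enum role { FOLLOWER, CANDIDATE, LEADER };

/* Hypothetical, simplified server state. */
struct server {
    enum role role;
    uint64_t term;
    int self_id;
    int leader_id;        /* -1 if no leader is known. */
};

/* Stand-in for the term check: only a strictly bigger term makes the
 * server step down, so a message in the same term changes nothing. */
void
receive_term(struct server *s, uint64_t term)
{
    if (term > s->term) {
        s->term = term;
        s->role = FOLLOWER;
        s->leader_id = -1;
    }
}

/* Stand-in for the leader update: when another server is recognized as
 * leader, a candidate gives up its candidacy even though the term is
 * unchanged. */
void
update_leader(struct server *s, int leader_id)
{
    assert(leader_id != s->self_id);

    s->leader_id = leader_id;
    if (s->role == CANDIDATE) {
        s->role = FOLLOWER;
    }
}
```

With only the term check, a candidate in term 5 that receives an
append-request from the term-5 leader stays a candidate.  Adding the
role change to the leader update covers the same-term case.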

Signed-off-by: Han Zhou <hzhou at ovn.org>
Signed-off-by: Ben Pfaff <blp at ovn.org>


Compare: https://github.com/openvswitch/ovs/compare/3ae90e1899c5...25a7e5547f1e

