[ovs-git] [openvswitch/ovs] e362ad: raft: Reintroduce jsonrpc inactivity probes.

Ilya Maximets noreply at github.com
Mon Mar 1 20:18:24 UTC 2021


  Branch: refs/heads/branch-2.12
  Home:   https://github.com/openvswitch/ovs
  Commit: e362ad4d12b97f1881ecf696c3e2adb46786cea6
      https://github.com/openvswitch/ovs/commit/e362ad4d12b97f1881ecf696c3e2adb46786cea6
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2021-03-01 (Mon, 01 Mar 2021)

  Changed paths:
    M ovsdb/raft.c

  Log Message:
  -----------
  raft: Reintroduce jsonrpc inactivity probes.

It's not enough to just have heartbeats.

RAFT heartbeats are unidirectional, i.e. the leader sends them to
followers but not the other way around.  Missing heartbeats provoke
followers to start an election, but if the leader does not receive any
replies, it will not do anything as long as there is a quorum, i.e.
there are enough other servers to make decisions.

As a result, while the TCP connection is established, the leader will
continue to blindly send messages to it.  In our case this leads to a
growing send backlog.  The connection will eventually be terminated
due to excessive send backlog, but this might take a lot of time and
waste process memory.  At the same time the 'candidate' will continue
to send vote requests to the dead connection on its side.

To fix that we need to reintroduce inactivity probes that will drop
the connection if there was no incoming traffic for a long time and
the remote server doesn't reply to the "echo" request.  The probe
interval might be chosen based on the election timeout to avoid the
issues described in commit db5a066c17bd.

Reported-by: Carlos Goncalves <cgoncalves at redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1929690
Fixes: db5a066c17bd ("raft: Disable RAFT jsonrpc inactivity probe.")
Acked-by: Han Zhou <hzhou at ovn.org>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
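
The probe behaviour described in the log message above can be sketched
roughly as follows.  This is a minimal, self-contained illustration and
not the code from ovsdb/raft.c: all names (probe_conn, probe_conn_run,
and so on) are hypothetical, and setting the probe interval equal to
the election timer is an assumption made only for the example; the
commit message says only that the interval "might be chosen based on"
the election timeout.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not identifiers from ovsdb/raft.c. */
enum probe_state {
    PROBE_ACTIVE,       /* Traffic was seen recently. */
    PROBE_IDLE,         /* "echo" request sent, waiting for any reply. */
    PROBE_DISCONNECT    /* Peer stayed silent; connection should be dropped. */
};

struct probe_conn {
    enum probe_state state;
    long long last_rx_ms;     /* Time of the last incoming message. */
    long long echo_sent_ms;   /* Time the "echo" request was sent. */
    int probe_interval_ms;    /* 0 disables probing. */
};

/* Assumption for this sketch: probe after one election timeout of silence. */
static int
probe_interval_from_election_timer(int election_timer_ms)
{
    return election_timer_ms;
}

/* Any incoming message, including an "echo" reply, proves the peer is alive. */
static void
probe_conn_received(struct probe_conn *c, long long now_ms)
{
    c->last_rx_ms = now_ms;
    c->state = PROBE_ACTIVE;
}

/* Advances the state machine.  Returns true if the caller should send an
 * "echo" request now.  The caller drops the connection once the state
 * becomes PROBE_DISCONNECT. */
static bool
probe_conn_run(struct probe_conn *c, long long now_ms)
{
    if (!c->probe_interval_ms) {
        return false;
    }

    switch (c->state) {
    case PROBE_ACTIVE:
        if (now_ms - c->last_rx_ms >= c->probe_interval_ms) {
            c->state = PROBE_IDLE;
            c->echo_sent_ms = now_ms;
            return true;
        }
        break;
    case PROBE_IDLE:
        if (now_ms - c->echo_sent_ms >= c->probe_interval_ms) {
            c->state = PROBE_DISCONNECT;
        }
        break;
    case PROBE_DISCONNECT:
        break;
    }
    return false;
}
```

With such a scheme a silent peer is detected after roughly two probe
intervals, instead of only once the send backlog grows large enough to
terminate the connection.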


  Commit: 6bf6378922e591f11f3b61ee65159ebe6b9ea7b8
      https://github.com/openvswitch/ovs/commit/6bf6378922e591f11f3b61ee65159ebe6b9ea7b8
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2021-03-01 (Mon, 01 Mar 2021)

  Changed paths:
    M ovsdb/raft.c

  Log Message:
  -----------
  raft: Report disconnected in cluster/status if candidate retries election.

If an election times out for a server in the 'candidate' role, it sets
the 'candidate_retrying' flag, which signals that the storage is
disconnected and the client should reconnect.  However, the
cluster/status command still reports 'Status: cluster member', which is
misleading.  Report "disconnected from the cluster (election timeout)"
instead.

Reported-by: Carlos Goncalves <cgoncalves at redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1929690
Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Acked-by: Han Zhou <hzhou at ovn.org>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>
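
The reporting change described in the log message above amounts to
checking the retry flag before falling back to the generic status
string.  The sketch below illustrates that idea only; the struct and
function names are hypothetical and the real status logic in
ovsdb/raft.c handles more states than shown here.  The flag name
'candidate_retrying' and both status strings are taken from the log
message.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of a server's state for illustration. */
struct sketch_server {
    bool candidate_retrying;  /* Set when an election timed out as candidate. */
};

/* Returns the text that would follow "Status: " in cluster/status. */
static const char *
sketch_cluster_status(const struct sketch_server *s)
{
    if (s->candidate_retrying) {
        /* Storage is considered disconnected; clients should reconnect. */
        return "disconnected from the cluster (election timeout)";
    }
    return "cluster member";
}
```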


Compare: https://github.com/openvswitch/ovs/compare/7fa0206754c1...6bf6378922e5

