[ovs-git] [openvswitch/ovs] 290ae0: raft: Reintroduce jsonrpc inactivity probes.

William Tu noreply at github.com
Mon Mar 1 20:18:02 UTC 2021


  Branch: refs/heads/branch-2.14
  Home:   https://github.com/openvswitch/ovs
  Commit: 290ae09d5307f0cfebe765fba9e01d94ac8877b5
      https://github.com/openvswitch/ovs/commit/290ae09d5307f0cfebe765fba9e01d94ac8877b5
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2021-03-01 (Mon, 01 Mar 2021)

  Changed paths:
    M ovsdb/raft.c

  Log Message:
  -----------
  raft: Reintroduce jsonrpc inactivity probes.

It's not enough to just have heartbeats.

RAFT heartbeats are unidirectional, i.e. the leader sends them to
followers but not the other way around.  Missing heartbeats provoke
followers to start an election, but if the leader does not receive any
replies it will not do anything as long as there is a quorum, i.e.
there are enough other servers to make decisions.

This leads to a situation where, as long as the TCP connection is
established, the leader will continue to blindly send messages to it.
In our case this leads to a growing send backlog.  The connection will
eventually be terminated due to excessive send backlog, but this might
take a lot of time and waste process memory.  At the same time the
'candidate' will continue to send vote requests to the dead connection
on its side.

To fix that we need to reintroduce inactivity probes that will drop
the connection if there was no incoming traffic for a long time and the
remote server doesn't reply to the "echo" request.  The probe interval
might be chosen based on the election timeout to avoid the issues
described in commit db5a066c17bd.
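
The probe behaviour described above can be modelled roughly as follows.
This is only an illustrative sketch, not the code in ovsdb/raft.c: the
class, the method names and the "election timeout plus margin" rule are
assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


def probe_interval_ms(election_timeout_ms: int, margin_ms: int = 1000) -> int:
    """Hypothetical rule: probe less often than elections can time out."""
    return election_timeout_ms + margin_ms


@dataclass
class ProbedConnection:
    """Toy model of a connection guarded by an inactivity probe."""
    interval_ms: int
    last_rx_ms: int = 0
    echo_sent_ms: Optional[int] = None
    connected: bool = True

    def on_receive(self, now_ms: int) -> None:
        # Any incoming traffic (including an echo reply) resets the probe.
        self.last_rx_ms = now_ms
        self.echo_sent_ms = None

    def run(self, now_ms: int) -> str:
        if not self.connected:
            return "disconnected"
        if self.echo_sent_ms is None:
            # Nothing received for a whole interval: send an "echo" request.
            if now_ms - self.last_rx_ms >= self.interval_ms:
                self.echo_sent_ms = now_ms
                return "echo"
            return "idle"
        # Echo is outstanding: drop the connection if it stays unanswered.
        if now_ms - self.echo_sent_ms >= self.interval_ms:
            self.connected = False
            return "disconnected"
        return "waiting"
```

In this model a silent peer is dropped after about two intervals, so a
leader stops queueing messages for it long before the send backlog
limit would be reached.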

Reported-by: Carlos Goncalves <cgoncalves at redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1929690
Fixes: db5a066c17bd ("raft: Disable RAFT jsonrpc inactivity probe.")
Acked-by: Han Zhou <hzhou at ovn.org>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>


  Commit: 94e4395a3080a90e7d19ef6205756e082f19f8aa
      https://github.com/openvswitch/ovs/commit/94e4395a3080a90e7d19ef6205756e082f19f8aa
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2021-03-01 (Mon, 01 Mar 2021)

  Changed paths:
    M ovsdb/raft.c

  Log Message:
  -----------
  raft: Report disconnected in cluster/status if candidate retries election.

If an election times out for a server in the 'candidate' role, it sets
the 'candidate_retrying' flag, which signals that storage is
disconnected and the client should re-connect.  However, the
cluster/status command reports 'Status: cluster member', which is
misleading.  Report "disconnected from the cluster (election timeout)"
instead.
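
A minimal sketch of the reporting change, assuming a hypothetical
helper that only looks at the 'candidate_retrying' flag (the real
command in ovsdb/raft.c also handles other states that are not covered
here):

```python
def cluster_status_line(candidate_retrying: bool) -> str:
    """Return the 'Status:' line for the two cases named in the commit."""
    if candidate_retrying:
        # Election timed out; clients should treat storage as disconnected.
        return "Status: disconnected from the cluster (election timeout)"
    return "Status: cluster member"
```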

Reported-by: Carlos Goncalves <cgoncalves at redhat.com>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1929690
Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Acked-by: Han Zhou <hzhou at ovn.org>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>


  Commit: a7ffe403fee3ea45f6ec5f64247e3ebee22c8f9e
      https://github.com/openvswitch/ovs/commit/a7ffe403fee3ea45f6ec5f64247e3ebee22c8f9e
  Author: William Tu <u9012063 at gmail.com>
  Date:   2021-03-01 (Mon, 01 Mar 2021)

  Changed paths:
    M Documentation/topics/dpdk/qos.rst
    M vswitchd/vswitch.xml

  Log Message:
  -----------
  Documentation: Fix DPDK qos example.

Fix the example use case based on the description.
EIR and CIR are measured in bytes/sec, and the example considers the
size of 64-byte IP packets without the 14-byte Ethernet header.
So fix the 1000pps example to: (64 - 14) * 1000 = 50,000
If the frame includes the 4-byte FCS, then it's
(64 - 14 - 4) * 1000 = 46,000
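
The arithmetic above can be checked with a small helper.  The function
and parameter names are made up for this example and are not part of
OVS or DPDK:

```python
def policer_rate_bytes_per_sec(pps: int, frame_bytes: int,
                               eth_header_bytes: int = 14,
                               fcs_bytes: int = 0) -> int:
    """Bytes/sec counted by the policer for a given packet rate.

    The Ethernet header (and the FCS, if the frame size includes it)
    is subtracted from each frame before multiplying by the rate.
    """
    return (frame_bytes - eth_header_bytes - fcs_bytes) * pps


print(policer_rate_bytes_per_sec(1000, 64))               # 50000
print(policer_rate_bytes_per_sec(1000, 64, fcs_bytes=4))  # 46000
```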

Fixes: e61bdffc2a98 ("netdev-dpdk: Add new DPDK RFC 4115 egress policer")
Signed-off-by: William Tu <u9012063 at gmail.com>
Acked-by: Eelco Chaudron <echaudro at redhat.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>


Compare: https://github.com/openvswitch/ovs/compare/338285cb08da...a7ffe403fee3

