[ovs-git] [openvswitch/ovs] a27bdf: ovsdb-idl.c: Allows retry even when using a single...

Han Zhou noreply at github.com
Wed Aug 21 20:48:36 UTC 2019


  Branch: refs/heads/branch-2.12
  Home:   https://github.com/openvswitch/ovs
  Commit: a27bdf888520f83e4a82832ec0efd161f9df2c6a
      https://github.com/openvswitch/ovs/commit/a27bdf888520f83e4a82832ec0efd161f9df2c6a
  Author: Han Zhou <hzhou8 at ebay.com>
  Date:   2019-08-21 (Wed, 21 Aug 2019)

  Changed paths:
    M lib/ovsdb-idl.c
    M tests/ovsdb-cluster.at
    M tests/test-ovsdb.c

  Log Message:
  -----------
  ovsdb-idl.c: Allows retry even when using a single remote.

When clustered mode is used, the client needs to retry connecting
to a new server when certain failures happen. Today a retry is
attempted only if multiple remotes are configured, which prevents
using an LB VIP in front of the clustered nodes. This patch makes sure
the retry logic also works with an LB VIP: although the same IP is used
for the retry, the LB can redirect the connection to a different node.

Signed-off-by: Han Zhou <hzhou8 at ebay.com>
Signed-off-by: Ben Pfaff <blp at ovn.org>
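
For readers skimming the log, the behavioural change described above can be
sketched roughly as follows. This is only an illustration under stated
assumptions: the struct, its fields, and both function names are hypothetical
and are not taken from lib/ovsdb-idl.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a client session.  None of these
 * names come from lib/ovsdb-idl.c. */
struct sketch_session {
    size_t n_remotes;       /* Number of configured remotes (>= 1). */
    bool clustered;         /* Database runs in clustered mode. */
    bool server_unusable;   /* Current server reported a failure that
                             * calls for trying another server. */
};

/* Behaviour as described before the patch: retry only when there is
 * more than one remote to choose from. */
bool
sketch_should_retry_old(const struct sketch_session *s)
{
    assert(s->n_remotes >= 1);
    return s->clustered && s->server_unusable && s->n_remotes > 1;
}

/* Behaviour as described after the patch: retry even with a single
 * remote, because that remote may be an LB VIP that forwards the new
 * connection to a different cluster node. */
bool
sketch_should_retry_new(const struct sketch_session *s)
{
    assert(s->n_remotes >= 1);
    return s->clustered && s->server_unusable;
}
```

With a single remote, a clustered database, and an unusable server, the first
function declines to retry and the second one retries.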


  Commit: fd8b1feed609cc7d4654fb8f317cef6bd0636385
      https://github.com/openvswitch/ovs/commit/fd8b1feed609cc7d4654fb8f317cef6bd0636385
  Author: Han Zhou <hzhou8 at ebay.com>
  Date:   2019-08-21 (Wed, 21 Aug 2019)

  Changed paths:
    M ovsdb/raft-private.h
    M ovsdb/raft.c
    M tests/ovsdb-cluster.at

  Log Message:
  -----------
  raft.c: Stale leader should disconnect from cluster.

As mentioned in the Raft paper, section 6.2:

Leaders: A server might be in the leader state, but if it isn’t the current
leader, it could be needlessly delaying client requests. For example, suppose a
leader is partitioned from the rest of the cluster, but it can still
communicate with a particular client. Without additional mechanism, it could
delay a request from that client forever, being unable to replicate a log entry
to any other servers. Meanwhile, there might be another leader of a newer term
that is able to communicate with a majority of the cluster and would be able to
commit the client’s request. Thus, a leader in Raft steps down if an election
timeout elapses without a successful round of heartbeats to a majority of its
cluster; this allows clients to retry their requests with another server.

Reported-by: Aliasgar Ginwala <aginwala at ebay.com>
Tested-by: Aliasgar Ginwala <aginwala at ebay.com>
Signed-off-by: Han Zhou <hzhou8 at ebay.com>
Signed-off-by: Ben Pfaff <blp at ovn.org>
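
The step-down rule quoted above can be sketched as a small quorum check run
each time an election timeout elapses on a leader. This is a minimal model
under stated assumptions: the sketch_* types and function are hypothetical and
do not reflect the actual structures in ovsdb/raft.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the structures in ovsdb/raft.c. */
struct sketch_peer {
    bool replied;   /* Answered a heartbeat since the last check? */
};

struct sketch_leader {
    struct sketch_peer *peers;  /* Other servers, excluding ourselves. */
    size_t n_peers;
    bool is_leader;
};

/* Called when an election timeout elapses on a leader.  Returns true if
 * the leader heard from a majority of the cluster (counting itself) and
 * stays leader.  Otherwise it steps down and the function returns false.
 * The per-peer reply flags are reset for the next interval. */
bool
sketch_leader_check_quorum(struct sketch_leader *l)
{
    assert(l->is_leader);

    size_t count = 1;   /* The leader always counts itself. */
    for (size_t i = 0; i < l->n_peers; i++) {
        if (l->peers[i].replied) {
            count++;
        }
        l->peers[i].replied = false;
    }

    size_t cluster_size = l->n_peers + 1;
    if (count * 2 > cluster_size) {
        return true;
    }

    /* No majority reachable: step down so clients can retry elsewhere. */
    l->is_leader = false;
    return false;
}
```

In a three-server cluster, one reply plus the leader itself is a majority, so
the leader stays. With no replies in the next interval it steps down.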


  Commit: d61ce8bc953115cb662044ea1de7f8627d5c9305
      https://github.com/openvswitch/ovs/commit/d61ce8bc953115cb662044ea1de7f8627d5c9305
  Author: Han Zhou <hzhou8 at ebay.com>
  Date:   2019-08-21 (Wed, 21 Aug 2019)

  Changed paths:
    M ovsdb/raft.c
    M tests/ovsdb-cluster.at

  Log Message:
  -----------
  raft.c: Set candidate_retrying if no leader elected since last election.

candidate_retrying is used to determine if the current node is disconnected
from the cluster when the node is in the candidate role. However, a node
can flap between the candidate and follower roles before a leader is elected
when a majority of the cluster is down, so is_connected() will flap, too,
which confuses clients.

This patch avoids the flapping with the help of a new member, had_leader:
if no leader has been elected since the last election, we know we are
still retrying and remain disconnected from the cluster.

Signed-off-by: Han Zhou <hzhou8 at ebay.com>
Signed-off-by: Ben Pfaff <blp at ovn.org>
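
The had_leader idea described above can be sketched as a tiny state machine.
The member names had_leader and candidate_retrying come from the log message.
Everything else (the sketch_* types and functions, and the reduction of
is_connected() to a single flag) is a simplifying assumption and not the
actual code in ovsdb/raft.c.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_role { SKETCH_FOLLOWER, SKETCH_CANDIDATE, SKETCH_LEADER };

/* Hypothetical, simplified node state. */
struct sketch_node {
    enum sketch_role role;
    bool had_leader;          /* A leader was known since the last
                               * election started. */
    bool candidate_retrying;  /* Elections keep failing to produce a
                               * leader. */
};

/* Start a new election.  If no leader appeared since the previous
 * election, this election is a retry. */
void
sketch_start_election(struct sketch_node *n)
{
    assert(n->role != SKETCH_LEADER);
    n->role = SKETCH_CANDIDATE;
    n->candidate_retrying = !n->had_leader;
    n->had_leader = false;
}

/* Falling back to follower deliberately leaves candidate_retrying
 * untouched, which is what prevents the flapping. */
void
sketch_become_follower(struct sketch_node *n)
{
    n->role = SKETCH_FOLLOWER;
}

/* A leader became known: the retry state ends. */
void
sketch_set_leader(struct sketch_node *n)
{
    n->had_leader = true;
    n->candidate_retrying = false;
}

/* Simplified connectivity check based only on the retry flag. */
bool
sketch_is_connected(const struct sketch_node *n)
{
    return !n->candidate_retrying;
}
```

In this model, a node that alternates between candidate and follower without
ever learning of a leader reports disconnected from its second election
onward, and reports connected again only once a leader is set.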


Compare: https://github.com/openvswitch/ovs/compare/689a519c2d21...d61ce8bc9531

