[ovs-git] [openvswitch/ovs] ca367f: ovsdb-idl.c: Allows retry even when using a single...

Wed Aug 21 18:58:40 UTC 2019

  Branch: refs/heads/master
  Home:   https://github.com/openvswitch/ovs
  Commit: ca367fa5f8bb40a5d0b695df9ca25c26974b792f
      https://github.com/openvswitch/ovs/commit/ca367fa5f8bb40a5d0b695df9ca25c26974b792f
  Author: Han Zhou <hzhou8 at ebay.com>
  Date:   2019-08-21 (Wed, 21 Aug 2019)

  Changed paths:
    M lib/ovsdb-idl.c
    M tests/ovsdb-cluster.at
    M tests/test-ovsdb.c

  Log Message:
  -----------
  ovsdb-idl.c: Allows retry even when using a single remote.

When clustered mode is used, the client needs to retry connecting
to new servers when certain failures happen. Today, retrying with a new
connection is allowed only if multiple remotes are used, which prevents
using an LB VIP in front of the clustered nodes. This patch makes sure the
retry logic also works when using an LB VIP: although the same IP is used
for retrying, the LB can redirect the connection to a different node.
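As a minimal sketch of the retry decision described above, assuming a hypothetical `struct remote_list` and helper functions (illustrative names only, not the real ovsdb-idl.c code): the old rule permitted a retry only when more than one remote was configured, while the new rule also permits it for a single remote, reconnecting to the same address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an IDL client's remote list; not the actual
 * ovsdb-idl.c data structures. */
struct remote_list {
    const char **remotes;   /* e.g. "tcp:10.0.0.100:6641" */
    size_t n_remotes;
    size_t current;         /* Index of the remote currently in use. */
};

/* Old behavior: retrying against a new server is possible only when
 * more than one remote is configured. */
static bool
can_retry_old(const struct remote_list *rl)
{
    return rl->n_remotes > 1;
}

/* New behavior: always allow a retry.  With a single remote (such as an
 * LB VIP) the same address is reused, and the LB may route the new
 * connection to a different cluster member. */
static bool
can_retry_new(const struct remote_list *rl)
{
    return rl->n_remotes > 0;
}

/* Advances to the next remote, wrapping around; with one remote this
 * returns the same address again. */
static const char *
next_remote(struct remote_list *rl)
{
    assert(rl->n_remotes > 0);
    rl->current = (rl->current + 1) % rl->n_remotes;
    return rl->remotes[rl->current];
}
```

With a single VIP remote, `next_remote()` simply hands back the same address; whether the new connection lands on a different cluster member is up to the load balancer.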

Signed-off-by: Han Zhou <hzhou8 at ebay.com>
Signed-off-by: Ben Pfaff <blp at ovn.org>

  Commit: 89771c1e65304b815ec01ec0f11affac01d62169
      https://github.com/openvswitch/ovs/commit/89771c1e65304b815ec01ec0f11affac01d62169
  Author: Han Zhou <hzhou8 at ebay.com>
  Date:   2019-08-21 (Wed, 21 Aug 2019)

  Changed paths:
    M ovsdb/raft-private.h
    M ovsdb/raft.c
    M tests/ovsdb-cluster.at

  Log Message:
  -----------
  raft.c: Stale leader should disconnect from cluster.

As mentioned in the Raft paper, section 6.2:

Leaders: A server might be in the leader state, but if it isn’t the current
leader, it could be needlessly delaying client requests. For example, suppose a
leader is partitioned from the rest of the cluster, but it can still
communicate with a particular client. Without additional mechanism, it could
delay a request from that client forever, being unable to replicate a log entry
to any other servers. Meanwhile, there might be another leader of a newer term
that is able to communicate with a majority of the cluster and would be able to
commit the client’s request. Thus, a leader in Raft steps down if an election
timeout elapses without a successful round of heartbeats to a majority of its
cluster; this allows clients to retry their requests with another server.
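The step-down rule quoted above can be sketched as a majority check over recent heartbeat replies. This is a simplified illustration under assumed names (`struct peer` and `leader_should_step_down()` are hypothetical, not taken from raft.c); the real implementation may track replies differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-follower bookkeeping kept by a leader. */
struct peer {
    long long last_ack_ms;   /* Time of the last successful heartbeat reply. */
};

/* Returns true if the leader should step down: within the last election
 * timeout it has not heard back from enough followers to form a majority
 * of the cluster (the leader counts itself). */
static bool
leader_should_step_down(const struct peer *peers, size_t n_peers,
                        long long now_ms, long long election_timeout_ms)
{
    size_t cluster_size = n_peers + 1;   /* Followers plus the leader. */
    size_t reachable = 1;                /* The leader always counts itself. */

    for (size_t i = 0; i < n_peers; i++) {
        if (now_ms - peers[i].last_ack_ms < election_timeout_ms) {
            reachable++;
        }
    }
    /* A majority means strictly more than half of the cluster. */
    return reachable <= cluster_size / 2;
}
```

In a 3-node cluster the leader needs one follower's reply per election timeout to stay leader; a partitioned leader hears from none and steps down, letting clients retry elsewhere.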

Reported-by: Aliasgar Ginwala <aginwala at ebay.com>
Tested-by: Aliasgar Ginwala <aginwala at ebay.com>
Signed-off-by: Han Zhou <hzhou8 at ebay.com>
Signed-off-by: Ben Pfaff <blp at ovn.org>

  Commit: 923f01cad678228224ae4fe86466e2f61ab2c9d0
      https://github.com/openvswitch/ovs/commit/923f01cad678228224ae4fe86466e2f61ab2c9d0
  Author: Han Zhou <hzhou8 at ebay.com>
  Date:   2019-08-21 (Wed, 21 Aug 2019)

  Changed paths:
    M ovsdb/raft.c
    M tests/ovsdb-cluster.at

  Log Message:
  -----------
  raft.c: Set candidate_retrying if no leader elected since last election.

candidate_retrying is used to determine whether the current node is
disconnected from the cluster when the node is in the candidate role.
However, a node can flap between the candidate and follower roles before a
leader is elected when a majority of the cluster is down, so is_connected()
flaps too, which confuses clients.

This patch avoids the flapping with the help of a new member, had_leader:
if no leader has been elected since the last election, we know we are
still retrying, and the node remains disconnected from the cluster.
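A simplified model of the had_leader logic, assuming a hypothetical `struct node` that keeps only the fields relevant here (the function names are illustrative, not the actual raft.c ones):

```c
#include <stdbool.h>

enum role { FOLLOWER, CANDIDATE, LEADER };

/* Hypothetical, heavily simplified Raft node state. */
struct node {
    enum role role;
    bool had_leader;          /* A leader was seen since the last election. */
    bool candidate_retrying;  /* Still trying to get a leader elected. */
};

/* Called when this node's election timer fires and it starts an election. */
static void
start_election(struct node *n)
{
    /* If no leader emerged since the previous election, we are still
     * retrying, whether we were a follower or a candidate before. */
    n->candidate_retrying = !n->had_leader;
    n->had_leader = false;
    n->role = CANDIDATE;
}

/* Called when a leader is known (this node won, or heard from a leader). */
static void
leader_elected(struct node *n, bool self_is_leader)
{
    n->had_leader = true;
    n->candidate_retrying = false;
    n->role = self_is_leader ? LEADER : FOLLOWER;
}

/* Called when the node falls back to follower without a known leader,
 * e.g. after seeing a higher term from another candidate. */
static void
become_follower_no_leader(struct node *n)
{
    n->role = FOLLOWER;   /* candidate_retrying is deliberately unchanged. */
}

static bool
is_connected(const struct node *n)
{
    return !n->candidate_retrying;
}
```

The point of the sketch is that `candidate_retrying` is derived from whether a leader appeared since the previous election rather than from the current role, so dropping back to follower does not make the node look connected again.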

Signed-off-by: Han Zhou <hzhou8 at ebay.com>
Signed-off-by: Ben Pfaff <blp at ovn.org>

  Commit: 8e35461419a63fc5ca1e492994d08c02295137e1
      https://github.com/openvswitch/ovs/commit/8e35461419a63fc5ca1e492994d08c02295137e1
  Author: Han Zhou <hzhou8 at ebay.com>
  Date:   2019-08-21 (Wed, 21 Aug 2019)

  Changed paths:
    M Documentation/ref/ovsdb.5.rst
    M ovsdb/ovsdb-server.1.in
    M ovsdb/raft-private.c
    M ovsdb/raft-private.h
    M ovsdb/raft-rpc.h
    M ovsdb/raft.c
    M tests/ovsdb-cluster.at

  Log Message:
  -----------
  ovsdb raft: Support leader election time change online.

A new unixctl command, cluster/change-election-timer, is implemented to
change the base value of the leader election timeout according to scale
needs.

The change takes effect upon consensus of the cluster, implemented through
the append-request RPC.  A new field, "election-timer", is added to the Raft
log entry for this purpose.
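The "takes effect upon consensus" behavior can be sketched as applying committed log entries in order and updating the timer only when an entry carrying a new value is applied. The structures below are hypothetical simplifications, not the actual raft.c log format:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log entry: besides normal data, an entry may carry a new
 * election timer base value (0 means "no change"). */
struct log_entry {
    uint64_t term;
    uint64_t election_timer_ms;
};

struct server {
    struct log_entry *log;
    size_t n_log;
    size_t commit_index;        /* Number of entries known to be committed. */
    size_t applied;             /* Number of entries already applied. */
    uint64_t election_timer_ms; /* Base value currently in effect. */
};

/* Applies newly committed entries in log order.  A new election timer
 * takes effect only here, i.e. once a majority has accepted the entry. */
static void
apply_committed(struct server *s)
{
    assert(s->commit_index <= s->n_log);
    while (s->applied < s->commit_index) {
        const struct log_entry *e = &s->log[s->applied++];
        if (e->election_timer_ms) {
            s->election_timer_ms = e->election_timer_ms;
        }
    }
}
```

An entry that has been appended but not yet committed leaves the old timer in place; see ovsdb-server(1) for the arguments of cluster/change-election-timer itself.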

Signed-off-by: Han Zhou <hzhou8 at ebay.com>
Signed-off-by: Ben Pfaff <blp at ovn.org>

Compare: https://github.com/openvswitch/ovs/compare/f4bef8b39a36...8e35461419a6