[ovs-git] [openvswitch/ovs] 78c801: raft-rpc: Fix message format.

Han Zhou noreply at github.com
Fri Mar 6 23:00:29 UTC 2020


  Branch: refs/heads/branch-2.13
  Home:   https://github.com/openvswitch/ovs
  Commit: 78c8011f58daec41ec97440f2e42795699322742
      https://github.com/openvswitch/ovs/commit/78c8011f58daec41ec97440f2e42795699322742
  Author: Han Zhou <hzhou at ovn.org>
  Date:   2020-03-06 (Fri, 06 Mar 2020)

  Changed paths:
    M ovsdb/raft-rpc.c

  Log Message:
  -----------
  raft-rpc: Fix message format.

Signed-off-by: Han Zhou <hzhou at ovn.org>
Signed-off-by: Ben Pfaff <blp at ovn.org>


  Commit: f0c8b44c5832c36989fad78927407fc14e64ce46
      https://github.com/openvswitch/ovs/commit/f0c8b44c5832c36989fad78927407fc14e64ce46
  Author: Han Zhou <hzhou at ovn.org>
  Date:   2020-03-06 (Fri, 06 Mar 2020)

  Changed paths:
    M ovsdb/ovsdb-server.c
    M tests/ovsdb-cluster.at

  Log Message:
  -----------
  ovsdb-server: Don't disconnect clients after raft install_snapshot.

When a "schema" field is found in read_db(), there are two possible cases:
1. The schema of the clustered DB has changed, and "schema" is the new one.
2. An install_snapshot RPC has occurred, which caused log compaction on the
server, and the next log entry is just the snapshot. A snapshot always contains
a "schema" field, even though the schema has not changed.

The current implementation doesn't handle case 2 and always assumes that the
schema has changed, so it disconnects all clients of the server. In a
large-scale environment, where a large number of clients are connected when
this happens, the mass disconnection can cause stability problems.
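A minimal sketch of the distinction between the two cases, assuming a schema can be represented by its serialized JSON text. The type and function names (schema_text, schema_really_changed, handle_schema_record) are hypothetical and do not come from ovsdb-server.c; the point is only that clients are disconnected when the schema in the log record actually differs from the one in use.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a parsed schema: here a schema is
 * just its serialized JSON text.  This is an illustration only, not the
 * schema representation used by ovsdb-server. */
struct schema_text {
    const char *json;
};

/* Returns true only for case 1, a real schema change.  A snapshot delivered
 * by install_snapshot (case 2) repeats the schema already in use, so the
 * comparison finds no difference.  With no current schema (e.g. at
 * startup), any schema counts as new. */
static bool
schema_really_changed(const struct schema_text *current,
                      const struct schema_text *from_log)
{
    if (!current || !current->json) {
        return true;
    }
    return strcmp(current->json, from_log->json) != 0;
}

/* Hypothetical handler for a log record that carries a "schema" field.
 * Returns how many client sessions get disconnected. */
static int
handle_schema_record(const struct schema_text *current,
                     const struct schema_text *from_log,
                     int n_clients)
{
    if (schema_really_changed(current, from_log)) {
        /* Case 1: clients must reconnect to pick up the new schema. */
        return n_clients;
    }
    /* Case 2: snapshot only, so every client stays connected. */
    return 0;
}
```

Comparing raw text is a simplification: two serializations of the same schema can differ in key order or whitespace, so a real implementation would more likely compare parsed schema objects.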

Signed-off-by: Han Zhou <hzhou at ovn.org>
Signed-off-by: Ben Pfaff <blp at ovn.org>


  Commit: adc64ab057345f7004c44bf92363b9adda862134
      https://github.com/openvswitch/ovs/commit/adc64ab057345f7004c44bf92363b9adda862134
  Author: Han Zhou <hzhou at ovn.org>
  Date:   2020-03-06 (Fri, 06 Mar 2020)

  Changed paths:
    M ovsdb/raft.c
    M tests/ovsdb-cluster.at

  Log Message:
  -----------
  raft: Fix raft_is_connected() when there is no leader yet.

If the current server has never known a leader, its status should be
"disconnected" from the cluster. Without this patch, when a server in the
cluster is restarted, it appears as connected before it has successfully
reconnected to the cluster, which is wrong.
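The rule can be sketched as a connectivity check that also requires the server to have learned of a leader at least once. The struct and its field names (including ever_had_leader) are illustrative assumptions and need not match ovsdb/raft.c.

```c
#include <stdbool.h>

/* Hypothetical, simplified subset of a Raft server's state.  Field names
 * are illustrative and need not match ovsdb/raft.c. */
struct raft_state {
    bool joining;          /* Still in the process of joining. */
    bool leaving;          /* Leaving (or has left) the cluster. */
    bool failed;           /* Hit an unrecoverable error. */
    bool ever_had_leader;  /* Set once any leader has been learned of. */
};

/* Called whenever the server learns who the leader is, e.g. from a
 * leader's AppendEntries request. */
static void
raft_state_note_leader(struct raft_state *r)
{
    r->ever_had_leader = true;
}

/* Reports "connected" only if the server is a normal cluster member AND
 * it has known a leader at some point.  A freshly restarted server starts
 * with ever_had_leader == false, so it reports "disconnected" until it
 * actually hears from the cluster. */
static bool
raft_state_is_connected(const struct raft_state *r)
{
    return !r->joining && !r->leaving && !r->failed && r->ever_had_leader;
}
```

Keeping the leader check as a separate sticky flag means a restarted server cannot look connected merely because none of the error conditions has been set yet.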

Signed-off-by: Han Zhou <hzhou at ovn.org>
Signed-off-by: Ben Pfaff <blp at ovn.org>


  Commit: 3ae90e1899c5a05148ea1870d9bb4ac3c05e3a19
      https://github.com/openvswitch/ovs/commit/3ae90e1899c5a05148ea1870d9bb4ac3c05e3a19
  Author: Han Zhou <hzhou at ovn.org>
  Date:   2020-03-06 (Fri, 06 Mar 2020)

  Changed paths:
    M ovsdb/ovsdb.c
    M ovsdb/ovsdb.h
    M ovsdb/transaction.c
    M ovsdb/trigger.c

  Log Message:
  -----------
  raft: Avoid busy loop during leader election.

When a server doesn't see a leader yet, e.g. during leader re-election,
a transaction from a client causes a busy loop with 100% CPU usage.
With debug logging enabled, it looks like:

2020-02-28T04:04:35.631Z|00059|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
2020-02-28T04:04:35.631Z|00062|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
2020-02-28T04:04:35.631Z|00065|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
2020-02-28T04:04:35.631Z|00068|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
2020-02-28T04:04:35.631Z|00071|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
2020-02-28T04:04:35.631Z|00074|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
2020-02-28T04:04:35.631Z|00077|poll_loop|DBG|wakeup due to 0-ms timeout at ../ovsdb/trigger.c:164
...

The problem is that ovsdb_trigger_try() treats all cluster errors as
temporary errors and retries immediately. This patch fixes it by introducing
'run_triggers_now', which indicates whether an immediate retry is needed. When
the cluster error has the detail 'not leader', we don't retry immediately, but
wait for the next poll event to trigger the retry. When the 'not leader'
status changes, there must be an event, i.e. a raft RPC that changes the
status, so the trigger is guaranteed to run, without a busy loop.
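A minimal sketch of the retry decision, assuming a database handle with two flags: 'run_triggers' (retry the pending trigger at the next wakeup) and 'run_triggers_now' (force that wakeup immediately). Apart from the name 'run_triggers_now', the struct, functions, and string-based error detail are hypothetical simplifications, not the code in ovsdb/trigger.c. The timeout helper follows the poll(2) convention: 0 ms wakes immediately (the "0-ms timeout" in the log above), -1 blocks until some other event arrives.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified database state holding the retry flags. */
struct db_state {
    bool run_triggers;      /* A pending trigger should be retried. */
    bool run_triggers_now;  /* ...and the retry should not wait for an event. */
};

/* Called when a transaction fails with a (temporary) cluster error whose
 * detail text is 'details'. */
static void
note_cluster_error(struct db_state *db, const char *details)
{
    /* Every cluster error is temporary: keep the trigger pending. */
    db->run_triggers = true;

    /* Only errors other than "not leader" justify an immediate retry.
     * Without a leader, retrying now fails the same way; instead, wait for
     * an event such as a raft RPC that changes the leadership status. */
    if (!details || !strstr(details, "not leader")) {
        db->run_triggers_now = true;
    }
}

/* Poll timeout for the main loop: 0 ms wakes up immediately, -1 means
 * block until some other event (e.g. a raft RPC) arrives. */
static int
trigger_wait_timeout_ms(const struct db_state *db)
{
    return db->run_triggers_now ? 0 : -1;
}
```

With only the 'not leader' error seen, the trigger stays pending but the loop sleeps instead of spinning; any other cluster error still gets the immediate retry.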

Signed-off-by: Han Zhou <hzhou at ovn.org>
Signed-off-by: Ben Pfaff <blp at ovn.org>


Compare: https://github.com/openvswitch/ovs/compare/f0de49125707...3ae90e1899c5

