[ovs-dev] [PATCH 2/2] ovsdb raft: Precheck prereq before proposing commit.

Ben Pfaff blp at ovn.org
Mon Mar 4 21:31:41 UTC 2019


On Fri, Mar 01, 2019 at 10:56:37AM -0800, Han Zhou wrote:
> From: Han Zhou <hzhou8 at ebay.com>
> 
> In the current OVSDB Raft design, when multiple transactions are
> pending, whether from the same server node or from different nodes in
> the cluster, only the first one can succeed at a time. The following
> ones fail the prerequisite check on the leader node, because the
> first one updates the expected prerequisite eid on the leader, while
> the prerequisite used for proposing a commit has to be a committed
> eid. A node therefore cannot use the latest prerequisite expected by
> the leader to propose a commit until the leader has committed the
> latest transaction and the committed_index on the node has been
> updated.
> 
> The current implementation proposes the commit as soon as the client
> requests the transaction, which results in continuous retries that
> cause high CPU load and waste.
> 
> In particular, even if all clients use leader_only to connect only to
> the leader, the prereq check failure still happens a lot when a batch
> of transactions is pending on the leader node. The leader node
> proposes a batch of commits using the same committed eid as the
> prerequisite, and it updates the expected prereq as soon as the first
> one is in progress. It then needs time to append to followers and
> wait until a majority replies before updating the committed_index,
> which results in continuous useless retries of the following
> transactions proposed by the leader itself.
> 
> This patch doesn't change the design but simply pre-checks whether the
> current eid is the same as the prereq before proposing the commit, to
> avoid wasting CPU cycles on both leader and followers. When clients
> use leader_only mode, this patch completely eliminates the prereq
> check failures.
> 
> In a scale test of OVN with 1k HVs, creating and binding 10k lports,
> the patch resulted in a 90% CPU cost reduction on the leader and a
> >80% CPU cost reduction on followers. (The test was run with the
> leader election base time set to 10000ms, because otherwise frequent
> leader re-election prevented the test from completing.)
> 
> This is just one of the related performance problems of the prereq
> checking mechanism discussed at:
> 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2019-February/048243.html
> Signed-off-by: Han Zhou <hzhou8 at ebay.com>

I *think* that this patch is going to be unreliable.  It appears to me
that what it does is wait until the current eid presented by the raft
storage is the one that we want.  But I don't think it's guaranteed that
that will ever happen.  What if we lose the raft connection, reconnect,
and skip past that particular eid?  I think in that kind of a case we'd
keep the trigger around forever and never discard it.
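
The concern above can be illustrated with a toy model. This is only a
sketch under stated assumptions, not the OVS implementation: ToyStorage,
run_trigger, and the eid strings are hypothetical names invented for the
example, and the model assumes a trigger that proposes its commit only
when the eid it currently observes equals its prereq.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyStorage:
    """Hypothetical stand-in for raft-backed storage; not the OVS API."""
    history: List[str]  # eids this node will observe, in order
    pos: int = 0

    @property
    def current_eid(self) -> str:
        return self.history[self.pos]

    def advance(self) -> bool:
        """Move to the next observed eid; return False if none remain."""
        if self.pos + 1 >= len(self.history):
            return False
        self.pos += 1
        return True


def run_trigger(storage: ToyStorage, prereq: str,
                max_steps: int = 100) -> Optional[int]:
    """Return the step at which the commit is proposed, or None if never.

    Models the assumed precheck: propose only when the observed eid
    matches the prereq exactly, otherwise keep waiting.
    """
    for step in range(max_steps):
        if storage.current_eid == prereq:
            return step
        if not storage.advance():
            return None
    return None


# The node observes every eid in sequence: the trigger fires at "e2".
in_order = run_trigger(ToyStorage(["e1", "e2", "e3"]), "e2")

# The node reconnects and jumps from "e1" straight to "e3": the trigger
# never sees "e2", so under this model it would wait indefinitely.
skipped = run_trigger(ToyStorage(["e1", "e3"]), "e2")
```

In this model, in_order is 1 and skipped is None. Whether the real code
can actually skip an eid in this way depends on how the storage layer
reports updates after a reconnect, which the sketch does not attempt to
model.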

