[ovs-dev] [PATCH v5] ovsdb raft: Sync commit index to followers without delay.

Han Zhou zhouhan at gmail.com
Mon Mar 25 23:24:11 UTC 2019


On Mon, Mar 25, 2019 at 2:00 PM Ben Pfaff <blp at ovn.org> wrote:
>
> On Sat, Mar 23, 2019 at 09:44:26AM -0700, Han Zhou wrote:
> > From: Han Zhou <hzhou8 at ebay.com>
> >
> > When an update is requested via a follower, the leader sends an AppendRequest
> > to all followers and waits until an AppendReply is received from a majority,
> > and then it updates the commit index - the new entry is regarded as committed
> > in the raft log. However, this commit is not communicated to the followers
> > (including the one that initiated the request) until the next heartbeat (ping
> > timeout), if there are no other pending requests. This results in long latency
> > for updates made through followers, especially when a batch of updates is
> > requested through the same follower.
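
To make the delay concrete, here is a minimal sketch of the leader side
(the class and names are invented for illustration; this is not the
actual ovsdb/raft.c code):

class Leader:
    def __init__(self, servers):
        # match_index maps every server, including the leader itself, to
        # the highest log index known to be replicated on that server.
        self.match_index = {s: 0 for s in servers}
        self.commit_index = 0

    def on_append_reply(self, server, last_index):
        self.match_index[server] = last_index
        acked = sorted(self.match_index.values(), reverse=True)
        majority_index = acked[len(acked) // 2]  # held by a majority
        if majority_index > self.commit_index:
            self.commit_index = majority_index
            # Without the patch nothing more is sent at this point:
            # followers only learn the new commit_index from the next
            # periodic heartbeat, so a follower-initiated update waits
            # up to one ping interval before it sees its commit.  With
            # the patch, the leader notifies the followers of the new
            # commit_index immediately.
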
>
> The tests pass now, but each one of them ends up with several ovn-sbctl
> and ovsdb-server processes all trying to use 100% of a CPU.  If I run
> the tests with 10-way parallelism, the load average goes above 100.
> Surely something has to be wrong in the implementation here.
>
> Each of the ovn-sbctl processes is trying to push through only a single
> transaction; after that, it exits.  If we think of ovsdb-server as
> giving each of its clients one chance to execute an RPC in round-robin
> order, which is approximately correct, then one of those transactions
> should succeed per round.  I don't understand why, if this model is
> correct, the ovn-sbctls would burn so much CPU.  If the model is wrong,
> then we need to understand why it is wrong and how we can fix it.  Maybe
> the ovn-sbctl processes are retrying blindly without waiting for an
> update from the server; if so, that's a bug and it should be fixed.

Hi Ben, to make the test case more effective, in v4/v5 I enlarged the
size of each transaction by 50 times, specified by the variable "n3".
This is just to create load in the test and slow down transaction
execution, to ensure there are enough parallel requests ongoing, and
to ensure we can test server membership changes during this period.
Without this change (in the test case only), the CPU usage was low,
but it made the torture test not torturing at all (either the test was
skipped because execution was too fast, or, if a sleep was added, the
parallelism was lost). And the test case change proved to be more
effective - it found an existing bug, as mentioned in the notes.
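
Roughly, the idea of the test change is like this (apart from n3, the
names here are made up for illustration and are not the actual test
code):

n3 = 50  # entries added per transaction, the knob mentioned above

def new_entries(client_id, iteration):
    # Each client now appends n3 entries per transaction instead of one,
    # so each transaction is slow enough that requests from different
    # clients overlap, and membership changes can happen while
    # transactions are still in flight.
    return ["%s-%s-%s" % (client_id, iteration, i) for i in range(n3)]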

If your question is why increasing the size of each transaction causes
high CPU load, it seems expected if we look at what each client is
doing:
- download the initial data, which grows as more and more transactions
are executed, each adding 50 entries (10 * 5 * 50 = 2500 entries in
the end), so later clients take more CPU because of this.
- send the transaction with two operations, wait + update, which
carries double the size of the original data, plus the newly added 50
entries (see the sketch after this list).
- there is some chance that a client hits a conflicting update (the
wait operation fails), which triggers a retry. This doesn't happen a
lot (judging by the logs), and I think it is expected given the
purpose of the test - to increase parallelism.
- process update notifications, at least once for its own update, but
possibly also for other clients' updates. (This step is not CPU
costly, since each update notification is not big.)
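
For the wait + update item above, the request looks roughly like this
(a sketch of an RFC 7047 "transact" call; the table and column names
are made up and this is not the exact transaction the test generates):

import json

def build_transact(db, old_value, new_entries):
    # "wait" re-sends the value the client read; it makes the whole
    # transaction fail if another client changed the row in the
    # meantime, which is what triggers the occasional retry mentioned
    # above.
    wait_op = {"op": "wait", "table": "Simple",
               "where": [["name", "==", "torture"]],
               "columns": ["value"], "until": "==",
               "rows": [{"value": old_value}]}
    # "update" then writes the old data plus the new entries, so the
    # request carries the original data twice (once in wait, once in
    # update) plus the 50 new entries.
    update_op = {"op": "update", "table": "Simple",
                 "where": [["name", "==", "torture"]],
                 "row": {"value": old_value + new_entries}}
    return json.dumps({"method": "transact",
                       "params": [db, wait_op, update_op], "id": 0})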

So I think higher CPU from increasing the transaction size is expected
due to these operations, considering the JSON-RPC cost. I ran the
tests many times and I didn't observe any client occupying the CPU for
a long time - there were short spikes (not 100%) shown by "top", but I
never saw the same client there twice. If you see any behavior that
looks like continuous retrying, could you please share the x-y.log
file of that client?

I agree there could be improvements to the RPC performance of
ovsdb/IDL, but that is not related to this patch (or maybe we can say
the test in this patch takes advantage of this slowness).

Thanks,
Han

