[ovs-dev] [PATCH v5] ovsdb raft: Sync commit index to followers without delay.

Han Zhou zhouhan at gmail.com
Wed Apr 10 01:48:09 UTC 2019


On Mon, Apr 8, 2019 at 4:08 PM Han Zhou <zhouhan at gmail.com> wrote:
>
>
>
> On Wed, Mar 27, 2019 at 4:10 PM Han Zhou <zhouhan at gmail.com> wrote:
> >
> > On Mon, Mar 25, 2019 at 4:24 PM Han Zhou <zhouhan at gmail.com> wrote:
> > >
> > > On Mon, Mar 25, 2019 at 2:00 PM Ben Pfaff <blp at ovn.org> wrote:
> > > >
> > > > On Sat, Mar 23, 2019 at 09:44:26AM -0700, Han Zhou wrote:
> > > > > From: Han Zhou <hzhou8 at ebay.com>
> > > > >
> > > > > When an update is requested from a follower, the leader sends
> > > > > AppendRequest to all followers and waits until AppendReply is
> > > > > received from the majority; then it updates the commit index -
> > > > > the new entry is regarded as committed in the raft log.
> > > > > However, this commit is not notified to the followers
> > > > > (including the one that initiated the request) until the next
> > > > > heartbeat (ping timeout), if there are no other pending
> > > > > requests. This results in long latency for updates made
> > > > > through followers, especially when a batch of updates is
> > > > > requested through the same follower.
> > > >
> > > > The tests pass now, but each one of them ends up with several
> > > > ovn-sbctl and ovsdb-server processes all trying to use 100% of a
> > > > CPU.  If I run the tests with 10-way parallelism, the load
> > > > average goes above 100.  Surely something has to be wrong in the
> > > > implementation here.
> > > >
> > > > Each of the ovn-sbctl processes is trying to push through only a
> > > > single transaction; after that, it exits.  If we think of
> > > > ovsdb-server as giving each of its clients one chance to execute
> > > > an RPC in round-robin order, which is approximately correct, then
> > > > one of those transactions should succeed per round.  I don't
> > > > understand why, if this model is correct, the ovn-sbctls would
> > > > burn so much CPU.  If the model is wrong, then we need to
> > > > understand why it is wrong and how we can fix it.  Maybe the
> > > > ovn-sbctl processes are retrying blindly without waiting for an
> > > > update from the server; if so, that's a bug and it should be
> > > > fixed.
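[Editorial note: the round-robin model above, and the blind-retry behavior suspected in the last sentence, can be contrasted with a small sketch. This is a hypothetical simplification, not ovsdb-server's real scheduler: if clients wait for a server update before resending, total attempts stay linear; if every still-pending client resends each round, attempts grow quadratically.]

```python
def total_rpc_attempts(num_clients, blind_retry):
    """Count RPC attempts until every client's single transaction
    commits, assuming exactly one transaction succeeds per round.

    Well-behaved clients send once and then wait for an update, so
    attempts are linear in the client count; blind retriers resend
    every round while still pending, making attempts quadratic."""
    attempts = 0
    pending = num_clients
    while pending:
        attempts += pending if blind_retry else 1
        pending -= 1  # one transaction succeeds per round
    return attempts


# 10 clients: 10 attempts if they wait for updates, but
# 55 (10+9+...+1) if they retry blindly every round.
```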
> > >
> > > Hi Ben, to make the test case more effective, in v4/v5 I enlarged
> > > the size of each transaction 50 times, specified by the variable
> > > "n3". This is just to create load in the test and slow down
> > > transaction execution, to ensure there are enough parallel requests
> > > ongoing and that we can test server membership changes during this
> > > period. Without this change (in the test case only) the CPU load
> > > was low, but the torture test was not torturing at all (it was
> > > either skipped because execution was too fast, or, with a sleep
> > > added, it lost the parallelism). The test case change proved to be
> > > more effective - it found an existing bug, as mentioned in the
> > > notes.
> > >
> > > If your question is why increasing the size of the transaction
> > > causes high CPU load, it seems expected if we look at what each
> > > client does:
> > > - download the initial data, which grows as more and more
> > > transactions execute, each adding 50 entries (10 * 5 * 50 = 2500);
> > > later clients take more CPU because of this.
> > > - send a transaction with the operations wait + update, which has
> > > double the size of the original data, plus the newly added 50
> > > entries.
> > > - occasionally hit a conflicting update (the wait operation
> > > fails), which triggers a retry. This doesn't happen a lot (from
> > > checking the logs), and I think it is expected given the purpose
> > > of the test - to increase parallelism.
> > > - process updates: at least once for its own update, but it could
> > > also process other clients' updates. (This step is not CPU costly
> > > since each update notification is not big.)
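[Editorial note: the wait + update transaction described above, whose request is roughly double the size of the data already present, can be sketched as follows. The JSON shapes are hypothetical, loosely modeled on OVSDB's transact format and the SB_Global external_ids map; this is not the exact output of the IDL.]

```python
import json


def build_wait_update_txn(current_ids, new_ids):
    """Build a transact request whose 'wait' op guards on the rows the
    client last read; if another client changed external_ids in the
    meantime, the wait fails and the client retries the transaction."""
    merged = dict(current_ids)
    merged.update(new_ids)
    ops = [
        {"op": "wait", "table": "SB_Global", "where": [],
         "columns": ["external_ids"], "until": "==",
         "rows": [{"external_ids": ["map", sorted(current_ids.items())]}]},
        {"op": "update", "table": "SB_Global", "where": [],
         "row": {"external_ids": ["map", sorted(merged.items())]}},
    ]
    return json.dumps(["OVN_Southbound"] + ops)


# The wait op carries a full copy of the current data and the update op
# carries current + new, which is why the request is roughly double the
# size of the data already in the database.
```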
> > >
> > > So I think the higher CPU usage from increasing the transaction
> > > size is expected due to these operations, considering the JSON-RPC
> > > cost. I ran the tests a lot of times and I didn't observe any
> > > client occupying the CPU for a long time - there were short spikes
> > > (not 100%) shown by "top", but I never saw the same client there
> > > twice. If you see any behavior like continuous retrying, could you
> > > please share the x-y.log file of that client?
> > >
> > > I agree the RPC performance of ovsdb/IDL could be improved, but
> > > that is not related to this patch (or maybe we can say the test in
> > > this patch exploits this slowness).
> > >
> >
> > Hi Ben, does my explanation make sense? To show that the CPU cost
> > is expected (and at least not caused by this patch), I compared with
> > the master branch using leader_only = true for the ovn-sbctl command
> > in the torture test, executing the same updated torture test case
> > (i.e. the same numbers n1 = 10, n2 = 5, and n3 = 50). With
> > leader_only = true there is no latency in transaction execution even
> > without this patch, because there is no idle waiting for the sync
> > from leader to followers, so all transactions execute at full speed
> > and the performance is comparable.
> >
> > time make check TESTSUITEFLAGS="2529"
> > (2529: OVSDB 3-server torture test - kill/restart leader)
> >
> > - *Master branch* with new test case, and use leader_only=true:
> > real    0m15.448s
> > user    0m15.384s
> > sys    0m13.687s
> >
> > - With this patch (leader_only=false):
> > real    0m13.904s
> > user    0m15.578s
> > sys    0m11.817s
> >
> > - With this patch but leader_only=true (exactly same test case as for
> > *Master branch*):
> > real    0m14.419s
> > user    0m14.972s
> > sys    0m12.467s
> >
> > As we can see, there is no obvious difference. During the tests I
> > was also watching the "top" output and I didn't notice any
> > difference either. The "leader_only" change on top of this patch is
> > simply:
> > ------8><------------------------------------------><8-----------
> > diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at
> > index 5550a19..94ec9c7 100644
> > --- a/tests/ovsdb-cluster.at
> > +++ b/tests/ovsdb-cluster.at
> > @@ -140,7 +140,7 @@ ovsdb|WARN|schema: changed 2 columns in 'OVN_Southbound' database from ephemeral
> >               for k in $(seq $n3); do
> >                   txn="$txn -- add SB_Global . external_ids $i-$j-$k=$i-$j-$k"
> >               done
> > -             run_as "ovn-sbctl($i-$j)" ovn-sbctl "-vPATTERN:console:ovn-sbctl($i-$j)|%D{%H:%M:%S}|%05N|%c|%p|%m" --log-file=$i-$j.log -vfile -vsyslog:off -vtimeval:off --timeout=120 --no-leader-only $txn
> > +             run_as "ovn-sbctl($i-$j)" ovn-sbctl "-vPATTERN:console:ovn-sbctl($i-$j)|%D{%H:%M:%S}|%05N|%c|%p|%m" --log-file=$i-$j.log -vfile -vsyslog:off -vtimeval:off --timeout=120 $txn
> >               status=$?
> >               if test $status != 0; then
> >                   echo "$i-$j exited with status $status" > $i-$j:$status
> >
> > --------------------------------------------------------------------------
> > Please try it and let me know if you still have concerns.
> >
> > Thanks,
> > Han
>
> Hi Ben, this one is small but important; it is just waiting for your
> final approval. Sorry for pushing this, but I am also working on UTs
> for different failure scenarios of Raft, and this patch helps with
> some of the test cases, so I hope it can be merged before I send out
> the UT series.
>
> Thanks,
> Han

I embedded this one as the first patch in a new series:
https://patchwork.ozlabs.org/project/openvswitch/list/?series=101873
The new series is mainly for raft UTs and one bug fix that is not
closely related to this patch but depends on it.

