[ovs-discuss] On TCP_CRR test setup in "Accelerating Open vSwitch to “Ludicrous Speed”"

Ben Pfaff blp at nicira.com
Wed Jul 15 05:30:17 UTC 2015


On Wed, Jul 15, 2015 at 11:09:43AM +0800, Yousong Zhou wrote:
> On 15 July 2015 at 06:12, Ben Pfaff <blp at nicira.com> wrote:
> > On Fri, Jul 10, 2015 at 08:54:03PM +0800, Yousong Zhou wrote:
> >> On 8 July 2015 at 21:39, Yousong Zhou <yszhou4tech at gmail.com> wrote:
> >> > Hello, list
> >> >
> >> > I am doing some performance tests in preparation for upgrading
> >> > Open vSwitch from 1.11.0 to 2.3.2.  However, with TCP_CRR, I can only
> >> > achieve about 130k tps (last time I got only 40k because of a .debug
> >> > type kernel), not even close to the reported 680k from the blog post
> >> > [0].  I also found other available reports [1, 2] but those results
> >> > were even worse and not consistent with each other.
> >> >
> >>
> >> Hi, I just found the 680k tps TCP_CRR test result in the NSDI 2015
> >> paper "The Design and Implementation of Open vSwitch" [1].  Hmm, the
> >> 120k tps in section "Cache layer performance" is similar to what I
> >> got.  But how was it boosted to 688k for both the Linux bridge and
> >> Open vSwitch in section "Comparison to in-kernel switch"?
> >
> > I think that the configuration we used is described in that paper under
> > "Cache layer performance":
> >
> >     In all following tests, Open vSwitch ran on a Linux server with two
> >     8-core, 2.0 GHz Xeon processors and two Intel 10-Gb NICs. To generate
> >     many connections, we used Netperf’s TCP CRR test [25], which repeatedly
> >     establishes a TCP connection, sends and receives one byte of traffic,
> >     and disconnects.  The results are reported in transactions per second
> >     (tps).  Netperf only makes one connection attempt at a time, so we ran
> >     400 Netperf sessions in parallel and reported the sum.
> 
> I already read that part.  The hardware configuration seems to be
> comparable [1].  In our tests, 32 netperf instances were more or less
> enough to reach 130k tps, and increasing the number of netperf pairs
> to 127 brought no obvious improvement.
> 
> But when we read that the performance can be as high as 680k with both
> the Linux bridge and Open vSwitch, we thought there must be something we
> had overlooked, e.g. system parameter tuning or kernel configuration.
> 
> I noticed that in section "Cache layer performance" the best result
> was about 120k tps with all optimisations on.  But the result was more
> than 680k tps in section "Comparison to in-kernel switch".  How was
> this boost achieved?

I am surprised that the paper doesn't seem to clearly state the
flow table used in each case.  Based on the numbers of datapath flows
and masks listed in Table 1, though, I would guess that the OpenFlow
flow table in this case was sophisticated (that is, several OpenFlow
tables with nontrivial classifications).  I don't recall the exact
experiment setup, but I suspect that we were using the VMware NVP (aka
NSX-MH) controller.

For the comparison to the Linux bridge, though, I imagine that for a
fair comparison we used a simple Open vSwitch flow table (one that just
executes the OpenFlow "normal switching" action), since that is close in
effect to what the Linux bridge does.
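
For reference, that single-rule setup is only a couple of commands.  A
minimal sketch in Python follows; the bridge and port names (br0, eth1,
eth2) are placeholders for illustration, not the names from the actual
setup:

    #!/usr/bin/env python
    # Minimal sketch of a "normal switching" Open vSwitch configuration
    # for a fair comparison against the Linux bridge: a single OpenFlow
    # rule whose action is NORMAL (MAC-learning L2 forwarding).  The
    # bridge and port names below are placeholders.
    import subprocess

    def run(*cmd):
        subprocess.check_call(cmd)

    run('ovs-vsctl', '--may-exist', 'add-br', 'br0')
    for port in ('eth1', 'eth2'):
        run('ovs-vsctl', '--may-exist', 'add-port', 'br0', port)

    # Start from an empty flow table, then install the one NORMAL rule.
    run('ovs-ofctl', 'del-flows', 'br0')
    run('ovs-ofctl', 'add-flow', 'br0', 'actions=NORMAL')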

If I recall correctly, the main problem we had with that comparison was
test load generation.  I don't think that we maxed out the performance;
with more, or more efficient, test load generation machines, I think it
could be faster.
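
For what it's worth, the load generation harness amounts to something
like the sketch below: start many netperf TCP_CRR sessions in parallel
and sum the per-session transaction rates.  The netserver address,
session count, and test length here are placeholders, not the values
we actually used:

    #!/usr/bin/env python
    # Rough sketch: run many netperf TCP_CRR instances in parallel and
    # sum the transactions/sec each one reports.  SERVER, SESSIONS, and
    # DURATION are placeholder values.
    import subprocess

    SERVER = '10.0.0.2'     # host running netserver
    SESSIONS = 400          # concurrent netperf instances
    DURATION = 60           # seconds per test

    procs = []
    for _ in range(SESSIONS):
        # -P 0 suppresses the banner; -v 0 asks netperf to print only the
        # test's primary metric (transactions/sec for TCP_CRR).  Exact
        # output formatting varies a little between netperf versions.
        procs.append(subprocess.Popen(
            ['netperf', '-H', SERVER, '-t', 'TCP_CRR',
             '-l', str(DURATION), '-P', '0', '-v', '0'],
            stdout=subprocess.PIPE))

    total_tps = sum(float(p.communicate()[0].decode().split()[-1])
                    for p in procs)
    print('aggregate: %.0f tps' % total_tps)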


