[ovs-discuss] ovsdb behavior under ovn management plane scaling

Ryan Moats rmoats at us.ibm.com
Thu Jan 28 20:36:40 UTC 2016



Russell Bryant <russell at ovn.org> wrote on 01/28/2016 02:12:34 PM:

> From: Russell Bryant <russell at ovn.org>
> To: Ryan Moats/Omaha/IBM at IBMUS, discuss at openvswitch.org
> Date: 01/28/2016 02:12 PM
> Subject: Re: [ovs-discuss] ovsdb behavior under ovn management plane scaling
>
> On 01/28/2016 02:35 PM, Ryan Moats wrote:
> > As promised on today's OVN IRC meeting...
> >
> > We're in the process of testing OVN at scale as well as looking at the
> > performance of the various planes of OVN.
> >
> > One of the initial management plane scaling tests is to
> >
> > 1. Create OpenStack external network x1 with IPv4 subnet xs1 (we aren't
> > testing the data plane, so we don't have to worry about NAT)
> > 2. Replicate the following template 400 times
> > a. Create OpenStack project p(i) and then in p(i):
> > b. create a network n1 and assign an IPv4 subnet s1 to it
> > c. create a router, assign an interface to s1 and set the router's
> > external gateway to xs1
> > d. launch a compute instance i1, attached to n1
> >
> > [in a network line diagram: i1 -- n1(s1) -- r1 -- x1(xs1)]
> >
> > When doing this on a four-VM cloud (each VM has four CPU cores and
> > 16 GB of memory), we are seeing the steps "assign an interface to s1"
> > and "set the router's external gateway to xs1" take progressively
> > longer as the number of templates increases.
> >
> > Looking at the ovsdb server logs during this test, one can break down
> > OVN_Northbound operations into three buckets:
> > (1) pure insert operations
> > (2) operations that combine an insert and an update
> > (3) pure update operations
> >
> > Data from buckets (1) and (2) were combined and plotted in
> > http://ibin.co/2V2VVrQYDKyI - The vertical axis is in seconds, and the
> > horizontal axis is "transaction during the test", so while I can't tell
> > you exactly where in the test a particular point occurred, one can look
> > at the graph and say with some level of confidence that inserting rows
> > into a table isn't all that expensive an operation.
> >
> > Data from bucket (3) was plotted as http://ibin.co/2V2Vjb9rVqUK -
> > Again, the vertical axis is in seconds, and the horizontal axis is
> > "transaction during the test". All of these operations are updates to
> > port state in the Logical_Ports table and I read this plot as saying
> > that as we have more and more ports in the Logical_Ports table, update
> > operations can take longer and longer. Given the OVN scale I am looking
> > at (the current test cloud is 125 hypervisors), any linearity in time
> > (even via increased variability) is something I'd like to see if we can
> > improve...
>
> Thanks a lot for your work and for sharing your results.
>
> When it comes to working on OVN performance, I'd like to make sure we're
> working on the most important things.  Analysis and optimization work
> can take a lot of time, so it's best if we focus on the bottlenecks.
>
> My understanding based on the IBM scale testing so far was that it
> appeared ovn-controller was the first bottleneck hit.  We also dug into
> it and determined it was logical flow processing taking the bulk of the
> time.  I was under the impression that ovsdb-server was actually doing
> great.  Is your experience different?  How do your results relate to the
> performance of the rest of the system?

Yes, that was the first bottleneck we hit, and we've taken the work that
led to your RFC and gone looking for the next bottleneck, which now appears
to be communications between the networking-ovn plugin and ovn-northd. The
first step in that path is from the plugin to ovsdb-server, so I view my
initial post as one facet of the problem...

Ryan
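
For readers who want to reproduce the three-bucket breakdown described in
the quoted post, here is a minimal sketch. It assumes each transaction is
available as the "params" array of an OVSDB "transact" request (database
name followed by operation objects, as in RFC 7047). How those arrays are
extracted from the ovsdb-server logs is not shown in the thread, so the
function names and the sample data below are hypothetical.

```python
from collections import Counter


def classify_transaction(params):
    """Bucket one OVSDB "transact" params array by the operations it holds.

    params[0] is the database name; the remaining items are operation
    objects, each with an "op" member such as "insert" or "update".
    Transactions with neither op (e.g. only "mutate" or "select") fall
    into "other", a bucket added here that the post does not mention.
    """
    kinds = {op.get("op") for op in params[1:] if isinstance(op, dict)}
    has_insert = "insert" in kinds
    has_update = "update" in kinds
    if has_insert and has_update:
        return "insert+update"   # bucket (2) in the post
    if has_insert:
        return "insert"          # bucket (1)
    if has_update:
        return "update"          # bucket (3)
    return "other"


def bucket_counts(transactions):
    """Count how many transactions land in each bucket."""
    return Counter(classify_transaction(t) for t in transactions)


# Hypothetical sample data; the table name is written as it is in the post.
SAMPLE = [
    ["OVN_Northbound",
     {"op": "insert", "table": "Logical_Ports", "row": {"name": "p1"}}],
    ["OVN_Northbound",
     {"op": "insert", "table": "Logical_Ports", "row": {"name": "p2"}},
     {"op": "update", "table": "Logical_Ports",
      "where": [["name", "==", "p1"]], "row": {"up": True}}],
    ["OVN_Northbound",
     {"op": "update", "table": "Logical_Ports",
      "where": [["name", "==", "p2"]], "row": {"up": True}}],
]

if __name__ == "__main__":
    for bucket, count in sorted(bucket_counts(SAMPLE).items()):
        print(bucket, count)
```

Pairing each bucket label with the transaction's measured duration would
give the per-bucket series that the two plots show.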