[ovs-discuss] ovsdb behavior under ovn management plane scaling

Russell Bryant russell at ovn.org
Thu Jan 28 20:12:34 UTC 2016


On 01/28/2016 02:35 PM, Ryan Moats wrote:
> As promised on today's OVN IRC meeting...
> 
> We're in the process of testing OVN at scale as well as looking at the
> performance of the various planes of OVN.
> 
> One of the initial management plane scaling tests is to
> 
> 1. Create OpenStack external network x1 with IPv4 subnet xs1 (we aren't
> testing the data plane, so we don't have to worry about NAT)
> 2. Replicate the following template 400 times:
>    a. create OpenStack project p(i) and then in p(i):
>    b. create a network n1 and assign an IPv4 subnet s1 to it
>    c. create a router r1, assign an interface to s1 and set the router's
>       external gateway to xs1
>    d. launch a compute instance i1, attached to n1
> 
> [in a network line diagram: i1 -- n1(s1) -- r1 -- x1(xs1)]
> 
> When doing this on a four-VM cloud (each VM has four CPU cores and
> 16 GB of memory), we are seeing the steps "assign an interface to s1"
> and "set the router's external gateway to xs1" take progressively
> longer as the number of instantiated templates increases.
> 
> Looking at the ovsdb server logs during this test, one can break down
> OVN_Northbound operations into three buckets:
> (1) pure insert operations
> (2) operations that combine an insert and an update
> (3) pure update operations
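
A minimal sketch of the three-bucket breakdown above, for anyone who
wants to repeat it on their own logs. It assumes each transaction has
already been parsed into a list of OVSDB operation objects, each with
an "op" member as in the OVSDB "transact" method; how those objects
are extracted from the ovsdb-server log is not shown, and the function
name and bucket labels are hypothetical.

```python
from collections import Counter


def classify_transaction(ops):
    """Place one transaction into a bucket based on the ops it contains.

    `ops` is a list of dicts, each with an "op" key such as "insert"
    or "update". Other op kinds (wait, select, ...) are ignored when
    choosing the bucket.
    """
    kinds = {op.get("op") for op in ops}
    has_insert = "insert" in kinds
    has_update = "update" in kinds
    if has_insert and has_update:
        return "insert+update"   # bucket (2)
    if has_insert:
        return "insert"          # bucket (1)
    if has_update:
        return "update"          # bucket (3)
    return "other"               # neither insert nor update


def bucket_counts(transactions):
    """Count how many transactions fall into each bucket."""
    return Counter(classify_transaction(ops) for ops in transactions)
```

Pairing each bucket label with the transaction's measured duration
gives the per-bucket series that can then be plotted.
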
> 
> Data from buckets (1) and (2) were combined and plotted in
> http://ibin.co/2V2VVrQYDKyI - The vertical axis is in seconds, and the
> horizontal axis is "transaction during the test", so while I can't tell
> you exactly where in the test a particular point occurred, one can look
> at the graph and say with some level of confidence that inserting rows
> into a table isn't all that expensive an operation.
> 
> Data from bucket (3) was plotted as http://ibin.co/2V2Vjb9rVqUK - Again,
> the vertical axis is in seconds, and the horizontal axis is "transaction
> during the test". All of these operations are updates to port state in
> the Logical_Ports table, and I read this plot as saying that as the
> number of ports in the Logical_Ports table grows, update operations
> take longer and longer. Given the OVN scale I am looking at (the current
> test cloud is 125 hypervisors), any linear growth in update time (even
> if it shows up only as increased variability) is something I'd like to
> see if we can improve...
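
One way to put a number on "linearity in time" is to fit a straight
line to update duration against transaction index and look at the
slope. The sketch below is an ordinary least-squares fit in plain
Python. The function name is hypothetical, and the input is assumed to
be the list of update durations in seconds, in the order the
transactions occurred; it does not use the data behind the plots above.

```python
def fit_line(durations):
    """Least-squares fit of duration against transaction index.

    Returns (slope, intercept), where slope is in seconds per
    transaction. A slope near zero means update time is not growing
    with the number of transactions seen so far.
    """
    n = len(durations)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    mean_x = (n - 1) / 2                 # mean of indices 0..n-1
    mean_y = sum(durations) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(durations))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A slope alone will not capture growth that appears only as increased
variability, so it is worth looking at the spread of the residuals
over time as well.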

Thanks a lot for your work and for sharing your results.

When it comes to working on OVN performance, I'd like to make sure we're
working on the most important things.  Analysis and optimization work
can take a lot of time, so it's best if we focus on the bottlenecks.

My understanding, based on the IBM scale testing so far, was that
ovn-controller was the first bottleneck hit.  We also dug into
it and determined it was logical flow processing taking the bulk of the
time.  I was under the impression that ovsdb-server was actually doing
great.  Is your experience different?  How do your results relate to the
performance of the rest of the system?

Thanks,

-- 
Russell Bryant


