[ovs-dev] [OVN] Potential scalability bug in ovn-northd on creating and binding large number of lports

Russell Bryant russell at ovn.org
Fri Jun 24 13:34:17 UTC 2016


On Fri, Jun 24, 2016 at 8:12 AM, Ryan Moats <rmoats at us.ibm.com> wrote:

> "dev" <dev-bounces at openvswitch.org> wrote on 06/23/2016 12:56:59 PM:
>
> > From: Hui Kang/Watson/IBM at IBMUS
> > To: dev at openvswitch.org
> > Date: 06/23/2016 12:57 PM
> > Subject: [ovs-dev] [OVN] Potential scalability bug in ovn-northd on
> > creating and binding large number of lports
> > Sent by: "dev" <dev-bounces at openvswitch.org>
> >
> >
> > Hi,
> > In our scalability test for OVN, we observed non-scalable behaviour in
> > the ovn-northd process: the time to bind a logical port increases as the
> > number of logical ports increases, regardless of whether the logical ports
> > belong to the same logical switch. The function we most suspect of causing
> > this issue is build_ports(), called by ovnnb_db_run() [1], as described
> > below.
> >
> > Test description:
> >     Step 1: Create 6 logical switches. For each logical switch, create 200
> >             logical ports.
> >     Step 2: Bind the 200 lports of each logical switch on an OVN chassis.
> >
> > Test results for step 2:
> >
> >     # of ports | # of ovn_ports allocated | CPU cycles spent in
> >                | in build_ports()         | build_ports() (millions)
> >     -----------+--------------------------+-------------------------
> >            200 |                      200 |                       25
> >            400 |                      400 |                       50
> >            600 |                      600 |                       75
> >            800 |                      800 |                       93
> >           1000 |                     1000 |                      108
> >           1200 |                     1200 |                      125
> >
> > We see that each time a logical port is bound on a hypervisor,
> > join_logical_ports() in build_ports() allocates a struct ovn_port for
> > every existing port in the southbound database [2], which accounts for the
> > accumulated CPU cycles.
> >
> > My question is whether there is a particular reason to allocate that many
> > struct ovn_port instances. It seems to me there is room to optimize this
> > code for performance. Thanks.
> >
> > - Hui
> >
> >
> > [1] https://github.com/openvswitch/ovs/blob/master/ovn/northd/ovn-northd.c#L2529
> > [2] https://github.com/openvswitch/ovs/blob/master/ovn/northd/ovn-northd.c#L571
>
> Hui, ovn-northd's current design processes the entire OVN NB DB on every
> computation cycle, so what you are seeing is expected. This is the argument
> for converting ovn-northd to an incremental processing model.
>
> Ben, Justin, Yusheng: can one of you give an ETA for when the nlog
> ovn-northd code base will start to land in the review queue? That will help
> us decide whether an interim patch series is worth the effort.
>

Incremental processing is one angle. The other one we need on the roadmap is
some form of sharding, which is also important for HA. We need to be able to
run multiple instances of ovn-northd that each operate on only a subset of the
full data set.

-- 
Russell Bryant


