[ovs-dev] [OVN] Potential scalability bug in ovn-northd on creating and binding large number of lports

Hui Kang kangh at us.ibm.com
Sat Jun 25 03:28:47 UTC 2016



Ryan Moats/Omaha/IBM wrote on 06/24/2016 10:51:10 PM:

> From: Ryan Moats/Omaha/IBM
> To: Hui Kang/Watson/IBM at IBMUS
> Cc: dev at openvswitch.org
> Date: 06/24/2016 10:51 PM
> Subject: Re: [ovs-dev] [OVN] Potential scalability bug in ovn-northd
> on creating and binding large number of lports
>
> "dev" <dev-bounces at openvswitch.org> wrote on 06/23/2016 12:56:59 PM:
>
> > From: Hui Kang/Watson/IBM at IBMUS
> > To: dev at openvswitch.org
> > Date: 06/23/2016 12:57 PM
> > Subject: [ovs-dev] [OVN] Potential scalability bug in ovn-northd on
> > creating and binding large number of lports
> > Sent by: "dev" <dev-bounces at openvswitch.org>
> >
> >
> > Hi,
> > In our scalability test for OVN, we observed unscalable behaviour in the
> > ovn-northd process: the time to bind a logical port increases as the
> > number of logical ports increases, regardless of whether the logical
> > ports belong to the same logical switch. The most suspicious function
> > causing this issue is build_ports(), called by ovnnb_db_run() [1], as
> > described below.
> >
> > Test description:
> >     step 1: Create 6 logical switches. For each logical switch, create
> >             200 logical ports.
> >     step 2: Bind 200 lports from each logical switch on an OVN chassis.
> >
> > Test results for step 2:
> >
> >     # of ports  |  # of ovn_ports             |  CPU cycles spent in        |
> >                 |  allocated in build_ports() |  build_ports(), in millions |
> >             200 |                         200 |                          25 |
> >             400 |                         400 |                          50 |
> >             600 |                         600 |                          75 |
> >             800 |                         800 |                          93 |
> >            1000 |                        1000 |                         108 |
> >            1200 |                        1200 |                         125 |
> >
> > We see that on binding each logical port on a hypervisor,
> > join_logical_ports() in build_ports() allocates a (struct ovn_port) for
> > every existing port in the southbound database [2], which is what causes
> > the accumulated CPU cycles.
> >
> > My question is whether there is any particular reason to allocate that
> > many (struct ovn_port)s? It seems to me there is room to optimize this
> > code for performance. Thanks.
> >
> > - Hui
> >
> >
> > [1] https://github.com/openvswitch/ovs/blob/master/ovn/northd/ovn-northd.c#L2529
> >
> > [2] https://github.com/openvswitch/ovs/blob/master/ovn/northd/ovn-northd.c#L571
>
> Hui-
>
> Since it looks like a simple short-term optimization of northd is a
> "Good Thing" (TM) [1], is the above the "long pole in the tent" or
> are there other hot spots of similar brightness?
>
> I'm planning to look at what might be possible, and want to know if
> there are other spots that should be included in the pass.

Hi Ryan,
Thanks for your attention to this problem. As of now, our profiling data
shows that build_ports() (including its callees such as join_logical_ports())
is the hot spot.
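
To make the accumulation concrete, here is a rough, self-contained sketch of
the pattern (not the actual ovn-northd code; fake_port and rebuild_ports are
made-up names): every run allocates a fresh struct for every port that
already exists, so if each of N bindings triggers at least one run, the
per-run cost is linear in the total port count and the summed cost over all
bindings is quadratic.

#include <stdio.h>
#include <stdlib.h>

/* Stand-in for struct ovn_port; illustrative only. */
struct fake_port {
    int key;                      /* stands in for the lport name/key */
    struct fake_port *next;
};

/* Rebuild the whole port set from scratch, as happens on every run today. */
static struct fake_port *
rebuild_ports(int n_ports, long *allocations)
{
    struct fake_port *head = NULL;
    for (int i = 0; i < n_ports; i++) {
        struct fake_port *p = malloc(sizeof *p);
        p->key = i;
        p->next = head;
        head = p;
        (*allocations)++;
    }
    return head;
}

static void
free_ports(struct fake_port *head)
{
    while (head) {
        struct fake_port *next = head->next;
        free(head);
        head = next;
    }
}

int
main(void)
{
    long allocations = 0;

    /* Bind 1200 ports one at a time, assuming each binding triggers at
     * least one run that rebuilds state for every existing port. */
    for (int bound = 1; bound <= 1200; bound++) {
        struct fake_port *ports = rebuild_ports(bound, &allocations);
        free_ports(ports);
    }

    printf("total port-struct allocations: %ld\n", allocations);  /* 720600 */
    return 0;
}

The linear growth per run is consistent with the numbers in the table above;
it is the sum over all the bindings that hurts.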

We can discuss on IRC or Slack how to optimize it.
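
As one possible starting point for that discussion (purely a sketch of an
idea, not a patch; port_cache and port_cache_get are hypothetical names): if
the per-lport state were kept across runs and entries were only allocated for
lports northd has not seen before, the allocation work per run would stop
growing with the total number of ports. A real version would presumably reuse
the existing hmap keyed on the lport name; the linked list below is only to
keep the sketch self-contained.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-lport state that survives across runs, so unchanged
 * ports are not re-allocated every time.  Illustrative only. */
struct cached_port {
    char *name;                   /* lport name */
    struct cached_port *next;
};

struct port_cache {
    struct cached_port *head;     /* a real version would use an hmap */
};

/* Return the cached entry for 'name', allocating it only on first sight.
 * '*created' tells the caller whether this port still needs processing. */
static struct cached_port *
port_cache_get(struct port_cache *cache, const char *name, bool *created)
{
    for (struct cached_port *p = cache->head; p; p = p->next) {
        if (!strcmp(p->name, name)) {
            *created = false;
            return p;
        }
    }

    struct cached_port *p = malloc(sizeof *p);
    p->name = strdup(name);
    p->next = cache->head;
    cache->head = p;
    *created = true;
    return p;
}

int
main(void)
{
    struct port_cache cache = { NULL };
    bool created;

    port_cache_get(&cache, "lport-1", &created);   /* allocates: created == true  */
    port_cache_get(&cache, "lport-1", &created);   /* reuses:    created == false */
    printf("second lookup allocated: %s\n", created ? "yes" : "no");
    return 0;
}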

Regards,
- Hui

>
> Thanks in advance,
> Ryan
>
> [1] http://openvswitch.org/pipermail/dev/2016-June/073574.html


