[ovs-discuss] OpenStack profiling with networking-ovn - port creation is slow

Daniel Alvarez Sanchez dalvarez at redhat.com
Thu Feb 15 21:56:34 UTC 2018


On Wed, Feb 14, 2018 at 9:34 PM, Han Zhou <zhouhan at gmail.com> wrote:

>
>
> On Wed, Feb 14, 2018 at 9:45 AM, Ben Pfaff <blp at ovn.org> wrote:
> >
> > On Wed, Feb 14, 2018 at 11:27:11AM +0100, Daniel Alvarez Sanchez wrote:
> > > Thanks for your input. I need to look more carefully at the patch you
> > > submitted, but it looks like, at the very least, we'll be reducing the
> > > number of calls to Datum.__cmp__, which should be good.
> >
> > Thanks.  Please do take a look.  It's a micro-optimization but maybe
> > it'll help?
> >
> > > I probably didn't explain it very well. Right now we have N processes
> > > for the Neutron server (on every node). Each of those opens a
> > > connection to the NB db and subscribes to updates from certain tables.
> > > Each time a change happens, ovsdb-server sends N update2 messages that
> > > have to be processed in this "expensive" way by each of those N
> > > processes. My proposal (yet to be refined) would be to open N+1
> > > connections to ovsdb-server and subscribe to notifications on only 1
> > > of those. Every time a new change happens, ovsdb-server would then send
> > > 1 update2 message. This message would be processed once (using the Py
> > > IDL as we do now) and, once processed, sent (via multicast, maybe?) to
> > > the remaining N processes. This message could simply be a serialized
> > > Python object, and we'd save all the Datum, Atom, etc. processing by
> > > doing it just once.
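A minimal sketch of the fan-out idea above, assuming a single subscriber that receives raw update2 notifications as JSON strings. `parse_update2`, `UpdateFanOut` and the in-memory inboxes are hypothetical: JSON decoding stands in for the real Py IDL work, and list appends stand in for whatever IPC would carry the serialized object (the proposal mentions multicast). What it shows is that parsing happens once per change rather than once per Neutron process.

```python
import json
import pickle
from typing import Callable, List


def parse_update2(raw: str) -> dict:
    """Decode an update2 notification and return its table-updates object.

    Stand-in for the expensive Py IDL processing (Datum/Atom construction);
    here it only decodes JSON."""
    msg = json.loads(raw)
    # Notification shape: {"method": "update2", "params": [monitor_id, table_updates]}
    return msg["params"][1]


class UpdateFanOut:
    """Hypothetical helper: the single subscribed connection parses each
    update once and forwards a serialized copy to every worker."""

    def __init__(self) -> None:
        self.workers: List[Callable[[bytes], None]] = []
        self.parse_count = 0

    def register(self, deliver: Callable[[bytes], None]) -> None:
        # `deliver` stands in for the real IPC (a pipe, a socket or multicast).
        self.workers.append(deliver)

    def on_update2(self, raw: str) -> None:
        table_updates = parse_update2(raw)
        self.parse_count += 1
        payload = pickle.dumps(table_updates)  # serialize once, deliver N times
        for deliver in self.workers:
            deliver(payload)


# Usage: four in-process "workers" whose inboxes are plain lists.
fan_out = UpdateFanOut()
inboxes: List[List[bytes]] = [[] for _ in range(4)]
for inbox in inboxes:
    fan_out.register(inbox.append)

update = {"Logical_Switch_Port": {"11111111-2222-3333-4444-555555555555":
                                  {"insert": {"name": "port1"}}}}
fan_out.on_update2(json.dumps({"id": None, "method": "update2",
                               "params": [None, update]}))
received = [pickle.loads(inbox[0]) for inbox in inboxes]
```

pickle is used only because the proposal suggests sending a serialized Python object; pickled data should only ever be loaded from trusted peers.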
> >
> Daniel, I understand that sending the update2 messages consumes NB
> ovsdb-server CPU and processing those updates consumes neutron server
> process CPU. However, are we sure that this is the bottleneck for port
> creation?
>
> From the ovsdb-server point of view, sending updates to tens of clients
> should not be the bottleneck, considering that the SB ovsdb-server has
> many more clients, on the HVs.
>
> From the clients' point of view, I think it is more of a memory overhead
> than a CPU one, and it also depends on how many neutron processes are
> running on the same node. I didn't find the neutron process CPU usage in
> your charts. I hesitate to make such a big change before we are clear
> about the bottleneck. The chart of port creation time is very nice, but
> do we know which part of the code contributes to the linear growth? Do we
> have profiling for the time spent in ovn_client.add_acls()?
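One way to get the per-call profile asked for here is the standard-library `cProfile`/`pstats` pair. The sketch below wraps a single call and returns a text report sorted by cumulative time; `add_acls_stub` is a hypothetical placeholder for `ovn_client.add_acls()`, whose real signature is not shown in this thread.

```python
import cProfile
import io
import pstats


def add_acls_stub(n_rules: int) -> list:
    # Hypothetical stand-in for ovn_client.add_acls(); swap in the real call.
    return [{"match": f"rule{i}", "action": "allow-related"} for i in range(n_rules)]


def profile_call(func, *args, top: int = 10, **kwargs):
    """Run func under cProfile; return (result, report) where report lists
    the `top` entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


result, report = profile_call(add_acls_stub, 50, top=5)
print(report)
```

Running it once per port-creation step (for example, at increasing port counts) is one way to see which call grows with the number of ports.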
>

Here we are [0]. We see some spikes, which grow larger as the number of
ports increases, but it looks like the actual bottleneck is when we commit
the transaction [1]. I'll dig further, though.

[0] https://imgur.com/a/TmwbC
[1] https://github.com/openvswitch/ovs/blob/master/python/ovs/db/idl.py#L1158

>
> > OK.  It's an optimization that does the work in one place rather than N
> > places, so it's definitely a win from a CPU cost point of view, but it
> > buys that performance at the cost of increased complexity.  It sounds
> > like performance is really important, so maybe the increased complexity
> > is a fair trade.
> >
> > We might also be able to improve performance by using native code for
> > some of the work.  Were these tests done with the native-code JSON
> > parser that comes with OVS?  It is dramatically faster than the Python
> > code.
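A quick way to check which parser is in play is to ask whether the native extension is importable in the interpreter running Neutron. This sketch assumes the extension is built as the module `ovs._json` (compiled from `python/ovs/_json.c` in the OVS tree); when it is missing, the Python bindings presumably fall back to the pure-Python parser.

```python
import importlib.util


def ovs_c_json_available() -> bool:
    """Report whether the OVS native JSON extension can be imported.

    Assumption: the extension module is named ``ovs._json``."""
    try:
        return importlib.util.find_spec("ovs._json") is not None
    except ImportError:
        # The ``ovs`` Python package itself is not installed (or failed to import).
        return False


print("native JSON parser available:", ovs_c_json_available())
```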
> >
> > > On Tue, Feb 13, 2018 at 8:32 PM, Ben Pfaff <blp at ovn.org> wrote:
> > >
> > > > Can you sketch the rows that are being inserted or modified when a
> > > > port is added?  I would expect something like this as a minimum:
> > > >
> > > >         * Insert one Logical_Switch_Port row.
> > > >
> > > >         * Add pointer to Logical_Switch_Port to ports column in one
> > > >           row in Logical_Switch.
> > > >
> > > > In addition it sounds like currently we're seeing:
> > > >
> > > >         * Add one ACL row per security group rule.
> > > >
> > > >         * Add pointers to ACL rows to acls column in one row in
> > > >           Logical_Switch.
> > > >
> > > This is what happens when we create a port in OpenStack (without
> > > binding it) that belongs to an SG which allows ICMP and SSH traffic
> > > and drops the rest [0].
> > >
> > > Basically, you were right, and the only thing missing was adding the
> > > new address to the Address_Set table.
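To make the list above concrete, here is a hedged sketch of the OVSDB `transact` operations (RFC 7047 JSON) that one port creation would produce: one Logical_Switch_Port insert, one mutation of Logical_Switch.ports, one ACL insert per security group rule, one mutation of Logical_Switch.acls, and one Address_Set mutation. The function, the Address_Set name, the ACL priority and the match strings are illustrative assumptions, not what networking-ovn actually emits (for instance, default drop ACLs are not modeled).

```python
import json
from typing import List


def port_create_ops(ls_uuid: str, port: str, mac: str, ip: str,
                    rule_matches: List[str], addr_set: str) -> List[dict]:
    """Hypothetical sketch: OVSDB "transact" operations for creating one port
    whose security group has len(rule_matches) rules."""
    on_switch = [["_uuid", "==", ["uuid", ls_uuid]]]
    ops: List[dict] = [
        # Insert one Logical_Switch_Port row.
        {"op": "insert", "table": "Logical_Switch_Port", "uuid-name": "lsp",
         "row": {"name": port, "addresses": f"{mac} {ip}"}},
        # Add a pointer to it in Logical_Switch.ports.
        {"op": "mutate", "table": "Logical_Switch", "where": on_switch,
         "mutations": [["ports", "insert", ["named-uuid", "lsp"]]]},
    ]
    acl_refs = []
    for i, frag in enumerate(rule_matches):
        name = f"acl{i}"
        # One ACL row per security group rule (match format is illustrative).
        ops.append({"op": "insert", "table": "ACL", "uuid-name": name,
                    "row": {"direction": "to-lport", "priority": 1002,
                            "match": f'outport == "{port}" && ip4 && {frag}',
                            "action": "allow-related"}})
        acl_refs.append(["named-uuid", name])
    # Add pointers to the new ACL rows in Logical_Switch.acls.
    ops.append({"op": "mutate", "table": "Logical_Switch", "where": on_switch,
                "mutations": [["acls", "insert", ["set", acl_refs]]]})
    # Add the port's IP address to the security group's Address_Set.
    ops.append({"op": "mutate", "table": "Address_Set",
                "where": [["name", "==", addr_set]],
                "mutations": [["addresses", "insert", ip]]})
    return ops


ops = port_create_ops("11111111-2222-3333-4444-555555555555", "port1",
                      "fa:16:3e:00:00:01", "10.0.0.5",
                      ["icmp4", "tcp.dst == 22"], "as_ip4_sg1")
params = ["OVN_Northbound", *ops]  # params of a JSON-RPC "transact" request
```

The operation count is 4 + R for R rules, so every new port adds R ACL rows, which is where the R*P growth in the ACL table comes from.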
> >
> > OK.
> >
> > It sounds like the real scaling problem here is that for R security
> > group rules and P ports, we have R*P rows in the ACL table.  Is that
> > correct?  Should we aim to solve that problem?
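The arithmetic behind the R*P versus R + P comparison, as a sketch. `acl_rows` is a hypothetical helper; the grouped branch assumes a scheme like the ACL-group patch mentioned in this thread, where each port adds one membership entry instead of R ACL rows.

```python
def acl_rows(rules: int, ports: int, grouped: bool = False) -> int:
    """Number of ACL-related entries for `ports` ports sharing a security
    group with `rules` rules."""
    if grouped:
        # Hypothetical grouping scheme: R shared ACL rows plus one group
        # membership entry per port.
        return rules + ports
    # Per-port ACLs: one ACL row per (rule, port) pair.
    return rules * ports


for p in (100, 1000, 10000):
    print(p, acl_rows(5, p), acl_rows(5, p, grouped=True))
# 100 500 105
# 1000 5000 1005
# 10000 50000 10005
```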
>
> I think this might be the most valuable thing to optimize for the
> create_port scenario in Neutron.
> I remember there was a patch for ACL groups in OVN, so that instead of R*P
> rows we would have only R + P rows, but I didn't see it get merged.
> Is this also a good use case for conjunction?
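In OpenFlow terms, conjunctive match (the `conjunction` action and `conj_id` field described in ovs-fields(7)) turns an |A| * |P| cross product of flows into |A| + |P| + 1 flows. The flow strings below are illustrative ovs-ofctl syntax produced by hypothetical helpers; whether OVN's ACL translation can exploit this for the per-port rows discussed above is the open question.

```python
from typing import List


def cross_product_flows(addrs: List[str], ports: List[int]) -> List[str]:
    # One flow per (source address, destination port) pair: |A| * |P| flows.
    return [f"priority=100,tcp,nw_src={a},tp_dst={p},actions=normal"
            for a in addrs for p in ports]


def conjunctive_flows(addrs: List[str], ports: List[int],
                      conj_id: int = 1) -> List[str]:
    # Dimension 1 of 2: one flow per source address.
    flows = [f"priority=100,ip,nw_src={a},actions=conjunction({conj_id},1/2)"
             for a in addrs]
    # Dimension 2 of 2: one flow per destination port.
    flows += [f"priority=100,tcp,tp_dst={p},actions=conjunction({conj_id},2/2)"
              for p in ports]
    # A packet matching one flow in every dimension then matches conj_id.
    flows.append(f"priority=100,conj_id={conj_id},actions=normal")
    return flows  # |A| + |P| + 1 flows


addrs = [f"10.0.0.{i}" for i in range(1, 101)]
ports = [22, 80, 443]
cross = cross_product_flows(addrs, ports)
conj = conjunctive_flows(addrs, ports)
print(len(cross), len(conj))
# 300 104
```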
>
> > _______________________________________________
> > discuss mailing list
> > discuss at openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
>