[ovs-discuss] OpenStack profiling with networking-ovn - port creation is slow

Han Zhou zhouhan at gmail.com
Wed Feb 14 20:34:19 UTC 2018


On Wed, Feb 14, 2018 at 9:45 AM, Ben Pfaff <blp at ovn.org> wrote:
>
> On Wed, Feb 14, 2018 at 11:27:11AM +0100, Daniel Alvarez Sanchez wrote:
> > Thanks for your inputs. I need to look more carefully into the patch you
> > submitted but it looks like, at least, we'll be reducing the number of
> > calls to Datum.__cmp__ which should be good.
>
> Thanks.  Please do take a look.  It's a micro-optimization but maybe
> it'll help?
>
> > I probably didn't explain it very well. Right now we have N processes
> > for Neutron server (in every node). Each of those opens a connection
> > to NB db and they subscribe to updates from certain tables. Each time
> > a change happens, ovsdb-server will send N update2 messages that have
> > to be processed in this "expensive" way by each of those N
> > processes. My proposal (yet to be refined) would be to now open N+1
> > connections to ovsdb-server and only subscribe to notifications from 1
> > of those. So every time a new change happens, ovsdb-server will send 1
> > update2 message. This message will be processed (using Py IDL as we do
> > now) and once processed, send it (mcast maybe?) to the rest N
> > processes. This msg could be simply a Python object serialized and
> > we'd be saving all this Datum, Atom, etc. processing by doing it just
> > once.
>
Daniel, I understand that sending the update2 messages consumes CPU on the
NB ovsdb-server, and that processing those updates consumes CPU in the
neutron server processes. However, are we sure this is the bottleneck for
port creation?
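
For reference, the proposal quoted above (parse each update once, then hand
a serialized Python object to the other processes) could be sketched roughly
as below. This is only an illustration under assumptions: expensive_parse()
and fan_out() are hypothetical names, the JSON payload is a simplified
stand-in rather than the real update2 format, and the workers are simulated
in-process instead of being separate neutron-server processes.

```python
import json
import pickle

PARSE_CALLS = 0  # counts how often the "expensive" parse runs


def expensive_parse(raw_update: str) -> dict:
    """Hypothetical stand-in for the Python IDL's Datum/Atom processing."""
    global PARSE_CALLS
    PARSE_CALLS += 1
    return json.loads(raw_update)


def fan_out(raw_update: str, n_workers: int) -> list:
    """Parse once in the single subscriber, then give each worker a copy."""
    parsed = expensive_parse(raw_update)
    payload = pickle.dumps(parsed)  # serialized once
    # Each simulated worker only pays for pickle.loads, not the full parse.
    return [pickle.loads(payload) for _ in range(n_workers)]


# Simplified, made-up update payload (not the real update2 wire format).
raw = json.dumps({"Logical_Switch_Port": {"uuid-1": {"name": "port1"}}})
copies = fan_out(raw, n_workers=4)
```

The point of the sketch is only that the parse count stays at one no matter
how many workers there are; whether that saving matters depends on where the
time is actually spent.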

From the ovsdb-server point of view, sending updates to tens of clients
should not be the bottleneck, considering that we have a lot more clients on
HVs for the SB ovsdb-server.

From the clients' point of view, I think it is more of a memory overhead
than a CPU one, and it also depends on how many neutron processes are
running on the same node. I didn't find neutron process CPU usage in your
charts. I am hesitant to make such a big change before we are clear about
the bottleneck. The chart of port creation time is very nice, but do we know
which part of the code contributed to the linear growth? Do we have
profiling for the time spent in ovn_client.add_acls()?
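
One way to get that kind of per-function timing is Python's standard
cProfile module. The sketch below is a minimal example under assumptions:
the add_acls() here is a hypothetical stand-in, not the networking-ovn
method, and profile_call() is just a helper name chosen for illustration.

```python
import cProfile
import io
import pstats


def add_acls(n_rules):
    """Hypothetical stand-in for ovn_client.add_acls(): one dict per rule."""
    return [{"priority": 1000 + i, "action": "allow"} for i in range(n_rules)]


def profile_call(func, *args):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(5)  # show the top 5 entries
    return result, out.getvalue()


acls, report = profile_call(add_acls, 100)
print(report)
```

Wrapping the real call the same way for increasing numbers of existing ports
would show whether its cumulative time grows along with the port creation
time in the chart.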

> OK.  It's an optimization that does the work in one place rather than N
> places, so definitely a win from a CPU cost point of view, but it trades
> performance for increased complexity.  It sounds like performance is
> really important so maybe the increased complexity is a fair trade.
>
> We might also be able to improve performance by using native code for
> some of the work.  Were these tests done with the native code JSON
> parser that comes with OVS?  It is dramatically faster than the Python
> code.
>
> > On Tue, Feb 13, 2018 at 8:32 PM, Ben Pfaff <blp at ovn.org> wrote:
> >
> > > > Can you sketch the rows that are being inserted or modified when a
> > > > port is added?  I would expect something like this as a minimum:
> > >
> > >         * Insert one Logical_Switch_Port row.
> > >
> > > >         * Add pointer to Logical_Switch_Port to ports column in one
> > > >           row in Logical_Switch.
> > >
> > > In addition it sounds like currently we're seeing:
> > >
> > >         * Add one ACL row per security group rule.
> > >
> > >         * Add pointers to ACL rows to acls column in one row in
> > >           Logical_Switch.
> > >
> > This is what happens when we create a port in OpenStack (without
> > binding it) which belongs to a SG which allows ICMP and SSH traffic
> > and drops the rest [0]
> >
> > Basically, you were right and only thing missing was adding the new
> > address to the Address_Set table.
>
> OK.
>
> It sounds like the real scaling problem here is that for R security
> group rules and P ports, we have R*P rows in the ACL table.  Is that
> correct?  Should we aim to solve that problem?

I think this might be the most valuable point to optimize for the
create_port scenario from Neutron.
I remember there was a patch for ACL groups in OVN, so that instead of R*P
rows we would have only R + P rows, but I didn't see it go through.
Is this also a good use case for conjunction?
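
To put rough numbers on R*P versus R + P, here is a small arithmetic sketch.
The function names and the sample values of R and P are made up for
illustration, and "R + P" is taken at face value from the description above
rather than from any particular patch.

```python
def acl_rows_per_port(rules: int, ports: int) -> int:
    """Current scheme as described: one ACL row per (rule, port) pair."""
    return rules * ports


def acl_rows_grouped(rules: int, ports: int) -> int:
    """Grouped scheme as described: R rule rows plus P per-port entries."""
    return rules + ports


R = 10  # hypothetical number of security group rules
for P in (10, 100, 1000):
    print(P, acl_rows_per_port(R, P), acl_rows_grouped(R, P))
```

With 10 rules, going from 100 to 1000 ports takes the per-port scheme from
1000 to 10000 rows, while the grouped scheme goes from 110 to 1010.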

> _______________________________________________
> discuss mailing list
> discuss at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss