[ovs-discuss] OpenStack profiling with networking-ovn - port creation is slow

Daniel Alvarez Sanchez dalvarez at redhat.com
Tue Feb 13 11:39:56 UTC 2018


Hi folks,

As we're doing some performance tests in OpenStack using OVN,
we noticed that as we keep creating ports, the time to create a
single port increases. Also, ovn-northd CPU consumption is quite
high (see [0], which shows the CPU consumption while creating
1000 ports and then deleting them; the last part, where CPU is at
100%, is when all the ports get deleted).

With 500 ports already in the same Logical Switch, I did some profiling
of OpenStack neutron-server while adding 10 more ports to that Logical
Switch. Currently, neutron-server spawns several API workers
(separate processes), each of which opens its own connection to the
OVN NB database, so every update message sent by ovsdb-server is
processed by all of them.

In my profiling, I used GreenletProfiler in all of those processes to
produce a trace file per process and then merged them together to
aggregate the results. In these tests I used the OVS master branch,
compiled with shared libraries so that the JSON C parser is used.
Still, most of the time is spent in the following two modules:

- python/ovs/db/data.py:  33%
- uuid.py:  21%

For the data.py module, this is the usage (self time):

Atom.__lt__       16.25%     8283 calls
from_json:118      6.18%   406935 calls
Atom.__hash__      3.48%  1623832 calls
from_json:328      2.01%     5040 calls

While for the uuid module:

UUID.__cmp__       12.84%  3570975 calls
UUID.__init__       4.06%   362541 calls
UUID.__hash__       2.96%     1800 calls
UUID.__str__        1.03%   355016 calls
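
For reference, each worker was instrumented roughly like the sketch
below (the clock type, file name and save format here are just
illustrative; GreenletProfiler follows the yappi API):

    import os

    import GreenletProfiler

    # Profile CPU time so that idle greenlets don't inflate the numbers.
    GreenletProfiler.set_clock_type('cpu')
    GreenletProfiler.start()

    # ... let the API worker serve the port-create requests ...

    GreenletProfiler.stop()
    stats = GreenletProfiler.get_func_stats()
    # One trace file per process; these are merged afterwards to get
    # the aggregated view shown above.
    stats.save('worker-%d.callgrind' % os.getpid(), type='callgrind')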

Most of the calls to Atom.__lt__ come from the chain
BaseOvnIdl.__process_update2 (idl.py) -> BaseOvnIdl.__row_update
(idl.py) -> Datum.__cmp__ (data.py) -> Atom.__cmp__ (data.py).

The aggregated number of calls to BaseOvnIdl.__process_update2 is
1400 (and we're only updating 10 ports!!), while the total number of
connections opened to the NB database is 10:

# netstat -np | grep 6641 | grep python | wc -l
10

* Bear in mind that those results above were aggregated across all
processes.
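
Just to put those numbers in perspective, a quick back-of-envelope
(assuming the 1400 calls are spread evenly over the 10 connections):

    workers = 10          # NB connections / API workers (netstat above)
    update2_calls = 1400  # aggregated across all processes
    ports_created = 10

    per_worker = update2_calls / workers    # 140 __process_update2 calls
    per_port = per_worker / ports_created   # ~14 row updates per new
                                            # port, each parsed once per
                                            # worker

So the same row updates get parsed and compared once per API worker,
which multiplies the Python-side work by the number of workers.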

The main culprit for this explosion looks to be the way we handle
ACLs: every time we create a port, it belongs to a Neutron security
group (an OVN Address Set) and we add a new ACL for every Neutron
security group rule. If we patch the code to skip the ACL part,
the time to create a port remains stable over time.
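
A very rough model of why the per-port cost keeps growing (the numbers
below are made up; the assumption is one ACL row per security-group
rule per port, all compared again by every worker on each update):

    RULES_PER_PORT = 10   # hypothetical security-group rules per port
    WORKERS = 10

    def acl_rows(ports):
        # One ACL row per rule per port in the Logical Switch.
        return ports * RULES_PER_PORT

    # The work to process the next port creation scales with the ACL
    # rows that already exist, repeated in every API worker.
    for existing in (100, 250, 500):
        print(existing, acl_rows(existing) * WORKERS)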

From the comparison tests against ML2/OVS (the reference
implementation), OVN outperforms it in most operations except for port
creation, where we can see it can become a bottleneck.

Before optimizing/redesigning the ACL part, we could make some other
changes to the way we handle notifications from OVSDB: e.g., instead of
having multiple processes receive *all* notifications, we could have a
single process subscribed to those notifications that sends a more
optimized (already parsed) multicast notification to all listening
processes so that they can keep their own in-memory copies of the DB
up to date. All processes would still connect to the NB database in
"write-only" mode to commit their own transactions, however.
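
As a toy sketch of that idea (a skeleton only; in reality the reader
process would run the ovs.db.idl client against the NB database and
the updates it fans out would carry the already-parsed row data):

    import multiprocessing as mp

    def reader(queues):
        # Single process subscribed to OVSDB notifications. The updates
        # here are faked; they'd really come from the IDL connection.
        for seq in range(3):
            update = {'table': 'Logical_Switch_Port', 'seq': seq}
            for q in queues:   # fan the parsed update out to every worker
                q.put(update)
        for q in queues:
            q.put(None)        # shutdown marker

    def worker(name, q):
        # Each API worker keeps its in-memory copy up to date from the
        # pre-parsed updates, and still writes to NB directly on its own.
        for update in iter(q.get, None):
            print(name, 'applied', update)

    if __name__ == '__main__':
        queues = [mp.Queue() for _ in range(2)]
        procs = [mp.Process(target=worker, args=('worker-%d' % i, q))
                 for i, q in enumerate(queues)]
        for p in procs:
            p.start()
        reader(queues)
        for p in procs:
            p.join()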

Even though this last paragraph would best fit on the OpenStack ML, I
want to raise it here for feedback and see if someone can spot some
"immediate" optimization for the way we're processing notifications
from OVSDB. Maybe some Python binding to do it in C? :)

Any feedback, comments or suggestions are highly appreciated :)

Best,
Daniel Alvarez

[0]
https://snapshot.raintank.io/dashboard/snapshot/dwbhn0Z1zVTh9kI5j6mCVySx8TvrP45m?orgId=2