[ovs-discuss] OpenStack profiling with networking-ovn - port creation is slow
Daniel Alvarez Sanchez
dalvarez at redhat.com
Mon Feb 19 17:50:38 UTC 2018
On Tue, Feb 13, 2018 at 8:32 PM, Ben Pfaff <blp at ovn.org> wrote:
> On Tue, Feb 13, 2018 at 12:39:56PM +0100, Daniel Alvarez Sanchez wrote:
> > Hi folks,
> >
> > As we're doing some performance tests in OpenStack using OVN,
> > we noticed that as we keep creating ports, the time for creating a
> > single port increases. Also, ovn-northd CPU consumption is quite
> > high (see [0] which shows the CPU consumption when creating
> > 1000 ports and deleting them. Last part where CPU is at 100% is
> > when all the ports get deleted).
> >
> > With 500 ports in the same Logical Switch, I did some profiling
> > of OpenStack neutron-server adding 10 more ports to that Logical
> > Switch. Currently, neutron-server spawns different API workers
> > (separate processes) which open connections to OVN NB so every
> > time an update message is sent from ovsdb-server it'll be processed
> > by all of them.
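To make the cost of that fan-out concrete, here is a minimal sketch (the worker count and the update message are made-up illustrations, not taken from the deployment): every worker holds its own NB connection, so one logical change is parsed once per worker.

```python
import json

NUM_WORKERS = 10  # hypothetical number of neutron-server API workers

# Each worker has its own connection to OVN NB, so a single update
# notification from ovsdb-server is delivered to, and parsed by, every
# worker independently.
update = json.dumps(
    {"Logical_Switch_Port": {"uuid-1": {"modify": {"up": True}}}})

parses = 0
for _ in range(NUM_WORKERS):
    json.loads(update)  # every worker re-parses the same message
    parses += 1

print(parses)  # one logical change, ten parses
```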
> >
> > In my profiling, I used GreenletProfiler in all those processes to
> > produce a trace file and then merged them together to aggregate the
> > results. In those tests I used the OVS master branch, compiled with
> > shared libraries to make use of the JSON C parser. Still, I've seen
> > that most of the time is spent in the following two modules:
> >
> > - python/ovs/db/data.py: 33%
> > - uuid.py: 21%
> >
> > For the data.py module, this is the usage (self time):
> >
> > Atom.__lt__ 16.25% 8283 calls
> > from_json:118 6.18% 406935 calls
> > Atom.__hash__ 3.48% 1623832 calls
> > from_json:328 2.01% 5040 calls
> >
> > While for the uuid module:
> >
> > UUID.__cmp__ 12.84% 3570975 calls
> > UUID.__init__ 4.06% 362541 calls
> > UUID.__hash__ 2.96% 1800 calls
> > UUID.__str__ 1.03% 355016 calls
> >
> > Most of the calls to Atom.__lt__ come from
> > BaseOvnIdl.__process_update2(idl.py)
> > -> BaseOvnIdl.__row_update(idl.py) -> Datum.__cmp__(data.py) ->
> > Atom.__cmp__(data.py).
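For context on the UUID numbers above (a hedged illustration, not taken from the profile itself): Python's uuid.UUID defines its ordering in terms of the 128-bit .int value, so each of those ~3.5M comparisons pays method dispatch and attribute lookups on top of a plain integer compare.

```python
import uuid

a = uuid.UUID(int=1)
b = uuid.UUID(int=2)

# UUID ordering is defined by the underlying 128-bit integer, so
# comparing the .int fields directly gives the same answer without the
# method-dispatch overhead of comparing the UUID objects themselves.
assert (a < b) == (a.int < b.int)
print(a < b)  # True
```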
>
> I don't know Python well enough to say whether these are "real" or
> correct optimizations, but the following strike me as possible
> low-hanging-fruit micro-optimizations:
>
> diff --git a/python/ovs/db/data.py b/python/ovs/db/data.py
> index 9e57595f7513..dc816f64708f 100644
> --- a/python/ovs/db/data.py
> +++ b/python/ovs/db/data.py
> @@ -76,12 +76,12 @@ class Atom(object):
> def __eq__(self, other):
> if not isinstance(other, Atom) or self.type != other.type:
> return NotImplemented
> - return True if self.value == other.value else False
> + return self.value == other.value
>
> def __lt__(self, other):
> if not isinstance(other, Atom) or self.type != other.type:
> return NotImplemented
> - return True if self.value < other.value else False
> + return self.value < other.value
>
> def __cmp__(self, other):
> if not isinstance(other, Atom) or self.type != other.type:
>
> We could also reduce the number of calls to Datum.__cmp__ by recognizing
> that if we've already found one change we don't have to do any more
> comparisons, something like this:
>
> diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py
> index 60548bcf50b6..2e550adfdf1c 100644
> --- a/python/ovs/db/idl.py
> +++ b/python/ovs/db/idl.py
> @@ -447,6 +447,7 @@ class Idl(object):
> raise error.Error("<table-updates> is not an object",
> table_updates)
>
> + changed = False
> for table_name, table_update in six.iteritems(table_updates):
> table = self.tables.get(table_name)
> if not table:
> @@ -472,8 +473,8 @@ class Idl(object):
> % (table_name, uuid_string))
>
> if version == OVSDB_UPDATE2:
> - if self.__process_update2(table, uuid, row_update):
> - self.change_seqno += 1
> +                    changed = self.__process_update2(table, uuid,
> +                                                     row_update, changed)
> continue
>
> parser = ovs.db.parser.Parser(row_update, "row-update")
> @@ -485,12 +486,12 @@ class Idl(object):
> raise error.Error('<row-update> missing "old" and '
> '"new" members', row_update)
>
> - if self.__process_update(table, uuid, old, new):
> - self.change_seqno += 1
> +            changed = self.__process_update(table, uuid, old, new, changed)
> + if changed:
> + self.change_seqno += 1
>
> - def __process_update2(self, table, uuid, row_update):
> + def __process_update2(self, table, uuid, row_update, changed):
> row = table.rows.get(uuid)
> - changed = False
> if "delete" in row_update:
> if row:
> del table.rows[uuid]
> @@ -511,9 +512,8 @@ class Idl(object):
> else:
> row_update = row_update['initial']
> self.__add_default(table, row_update)
> - if self.__row_update(table, row, row_update):
> - changed = True
> - self.notify(ROW_CREATE, row)
> + changed = self.__row_update(table, row, row_update, changed)
> + self.notify(ROW_CREATE, row)
> elif "modify" in row_update:
> if not row:
> raise error.Error('Modify non-existing row')
> @@ -528,10 +528,9 @@ class Idl(object):
> row_update)
> return changed
>
> - def __process_update(self, table, uuid, old, new):
> + def __process_update(self, table, uuid, old, new, changed):
> """Returns True if a column changed, False otherwise."""
> row = table.rows.get(uuid)
> - changed = False
> if not new:
> # Delete row.
> if row:
> @@ -547,13 +546,13 @@ class Idl(object):
> if not row:
> row = self.__create_row(table, uuid)
> changed = True
> + self.__row_update(table, row, new, True)
> + self.notify(ROW_CREATE, row)
> else:
> # XXX rate-limit
> vlog.warn("cannot add existing row %s to table %s"
> % (uuid, table.name))
> - if self.__row_update(table, row, new):
> - changed = True
> - self.notify(ROW_CREATE, row)
> + changed = self.__row_update(table, row, new, changed)
> else:
> op = ROW_UPDATE
> if not row:
> @@ -563,9 +562,8 @@ class Idl(object):
> # XXX rate-limit
> vlog.warn("cannot modify missing row %s in table %s"
> % (uuid, table.name))
> - if self.__row_update(table, row, new):
> - changed = True
> -                self.notify(op, row, Row.from_json(self, table, uuid,
> -                                                   old))
> + changed = self.__row_update(table, row, new, changed)
> + self.notify(op, row, Row.from_json(self, table, uuid, old))
> return changed
>
> def __column_name(self, column):
> @@ -608,8 +606,7 @@ class Idl(object):
>
> return old_row_diff_json
>
> - def __row_update(self, table, row, row_json):
> - changed = False
> + def __row_update(self, table, row, row_json, changed):
> for column_name, datum_json in six.iteritems(row_json):
> column = table.columns.get(column_name)
> if not column:
> @@ -626,14 +623,13 @@ class Idl(object):
> % (column_name, table.name, e))
> continue
>
> - if datum != row._data[column_name]:
> +            # If the column changed, update it and record that we saw a
> +            # change.  We also update it if we've already recorded some
> +            # other change, because that's cheaper than doing an
> +            # unnecessary comparison.
> + if changed or datum != row._data[column_name]:
> row._data[column_name] = datum
> if column.alert:
> changed = True
> - else:
> -                # Didn't really change but the OVSDB monitor protocol always
> - # includes every value in a row.
> - pass
> return changed
>
> def __create_row(self, table, uuid):
>
FWIW, I have tried this patch on the same scenario as the rest of the
thread and there's no gain. However, I think the optimizations make sense
and once we have the Port sets, they're going to be more noticeable.
> > The aggregated number of calls to BaseOvnIdl.__process_update2 is
> > 1400 (and we're updating only 10 ports!!) while the total connections
> > opened to NB database are 10:
> >
> > # netstat -np | grep 6641 | grep python | wc -l
> > 10
> >
> > * Bear in mind that those results above were aggregated across all
> > processes.
> >
> > Looks like the main culprit for this explosion could be the way we
> > handle ACLs. Every time we create a port, it belongs to a Neutron
> > security group (OVN Address Set) and we add a new ACL for every
> > Neutron security group rule. If we patch the code to skip the ACL
> > part, the time for creating a port remains stable over time.
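A rough back-of-the-envelope of why this blows up (the rule count below is an assumption for illustration): with one ACL row per (port, security-group-rule) pair, the ACL rows grow multiplicatively with the ports, so every new port is more expensive to process than the last.

```python
def acl_rows(num_ports, rules_per_port):
    # One ACL row is added per security group rule for every port, so
    # the total grows as ports * rules rather than staying flat.
    return num_ports * rules_per_port

# e.g. the 500-port Logical Switch from the tests, assuming 4 security
# group rules per port (an illustrative number):
print(acl_rows(500, 4))  # 2000
```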
> >
> > From the comparison tests against ML2/OVS (the reference
> > implementation), OVN outperforms it in most operations except port
> > creation, where we can see it becoming a bottleneck.
> >
> > Before optimizing/redesigning the ACL part, we could make some other
> > changes to the way we handle notifications from OVSDB: e.g., instead
> > of having multiple processes receiving *all* notifications, we could
> > have one single process subscribed to those notifications that sends
> > a more optimized (already parsed) multicast notification to all
> > listening processes so they can keep their own in-memory copies of
> > the DB up to date. All processes would still connect to the NB
> > database in "write-only" mode to commit their transactions, however.
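A minimal sketch of that idea (the queue fan-out and message shape are illustrative assumptions, not an existing API): one listener parses each notification exactly once and multicasts the already-parsed result to every worker.

```python
import json
from queue import Queue

# Stand-ins for the per-worker channels; in practice these would be IPC
# channels to the separate neutron-server API worker processes.
worker_queues = [Queue() for _ in range(3)]

def on_ovsdb_notification(raw_msg):
    parsed = json.loads(raw_msg)  # parse once, centrally
    for q in worker_queues:
        q.put(parsed)             # multicast the already-parsed update

on_ovsdb_notification('{"Logical_Switch_Port": {"up": true}}')
print(all(q.get()["Logical_Switch_Port"]["up"] for q in worker_queues))
# True
```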
>
> I'm not sure I understand this. What information is it unnecessary to
> have the processes receive? If they are getting information they don't
> need, then we might be able to eliminate some of it.
>
> Can you sketch the rows that are being inserted or modified when a port
> is added? I would expect something like this as a minimum:
>
> * Insert one Logical_Switch_Port row.
>
> * Add pointer to Logical_Switch_Port to ports column in one row
> in Logical_Switch.
>
> In addition it sounds like currently we're seeing:
>
> * Add one ACL row per security group rule.
>
> * Add pointers to ACL rows to acls column in one row in
> Logical_Switch.
>
> and that's the bottleneck?
>