[ovs-discuss] Fw: ovsdb behavior under ovn management plane scaling

Mala Anand manand at us.ibm.com
Mon Feb 1 19:05:56 UTC 2016



Re-posting after becoming a member.

I want to post some more details on the issues Matt posted earlier about
the OVN scaling measurements.

The symptom of the problem seems to be the same in both Matt's router
scaling tests and the VM scaling tests that we ran on the same environment.
In this case we created VMs with two network interfaces, one for the
provider network and another for a private network. On the private network
side, each tenant gets two private networks, each with one subnet and one
VM. Once the VMs boot up, iperf traffic starts between the VMs on the
private networks.

At around 330 VMs we started hitting errors in the Rally benchmark. Looking
through the logs on the system where ovn-northd is deployed, I see that the
connection to neutron stopped working.

-  CONNECTION TO NEUTRON DROPPED
2016-01-28T04:06:31.558Z|00430|ovsdb_file|INFO|/var/lib/openvswitch/ovnnb.db: compacting database online (1453083247.677 seconds old, 4233 transactions, 10503193 bytes)
2016-01-28T04:19:57.769Z|00431|reconnect|ERR|tcp:10.138.7.225:36487: no response to inactivity probe after 5 seconds, disconnecting
2016-01-28T04:20:08.288Z|00432|memory|INFO|peak resident set size grew 145% in last 1341.4 seconds, from 27464 kB to 67328 kB
2016-01-28T04:20:08.288Z|00433|memory|INFO|cells:230931 monitors:190 sessions:190
2016-01-28T04:25:04.472Z|00434|ovsdb_file|INFO|/var/lib/openvswitch/ovnsb.db: compacting database online (1453084360.588 seconds old, 6892 transactions, 10488324 bytes)
2016-01-28T04:27:38.367Z|00435|reconnect|ERR|tcp:10.138.109.44:57841: no response to inactivity probe after 5 seconds, disconnecting
2016-01-28T04:28:36.182Z|00436|reconnect|ERR|tcp:10.138.7.223:55050: no response to inactivity probe after 5 seconds, disconnecting
2016-01-28T04:29:31.554Z|00437|reconnect|ERR|tcp:10.138.109.47:46113: no response to inactivity probe after 5 seconds, disconnecting
2016-01-28T05:15:04.831Z|00493|ovsdb_file|INFO|/var/lib/openvswitch/ovnsb.db: compacting database online (1453047567.212 seconds old, 385 transactions, 12170216 bytes)
2016-01-28T05:15:05.411Z|00494|timeval|WARN|Unreasonably long 1020ms poll interval (996ms user, 21ms system)
2016-01-28T05:15:05.411Z|00495|timeval|WARN|faults: 3845 minor, 0 major
2016-01-28T05:15:05.411Z|00496|timeval|WARN|disk: 0 reads, 23328 writes
2016-01-28T05:15:05.411Z|00497|timeval|WARN|context switches: 208 voluntary, 1 involuntary
2016-01-28T05:15:05.411Z|00498|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=a87630a6:
2016-01-28T05:15:05.411Z|00499|coverage|INFO|hmap_pathological     49.4/sec     22.450/sec      44.1908/sec   total: 198889
2016-01-28T05:15:05.411Z|00500|coverage|INFO|hmap_expand         4873.2/sec   2320.833/sec    2205.4286/sec   total: 21266361
2016-01-28T05:15:05.411Z|00501|coverage|INFO|lockfile_lock          0.0/sec      0.000/sec       0.0019/sec   total: 14
2016-01-28T05:15:05.411Z|00502|coverage|INFO|lockfile_unlock        0.0/sec      0.000/sec       0.0022/sec   total: 13
2016-01-28T05:15:05.411Z|00503|coverage|INFO|poll_create_node   10204.8/sec  11800.700/sec   12457.8392/sec   total: 484911073
2016-01-28T05:15:05.411Z|00504|coverage|INFO|poll_zero_timeout      2.0/sec      4.467/sec       4.0672/sec   total: 329223
2016-01-28T05:15:05.411Z|00505|coverage|INFO|seq_change             0.0/sec      0.000/sec       0.0000/sec   total: 3
2016-01-28T05:15:05.411Z|00506|coverage|INFO|pstream_open           0.0/sec      0.000/sec       0.0000/sec   total: 3
2016-01-28T05:15:05.411Z|00507|coverage|INFO|unixctl_received       0.0/sec      0.000/sec       0.0000/sec   total: 7
2016-01-28T05:15:05.411Z|00508|coverage|INFO|unixctl_replied        0.0/sec      0.000/sec       0.0000/sec   total: 7
2016-01-28T05:15:05.411Z|00509|coverage|INFO|util_xalloc       761596.4/sec 419504.017/sec  222669.7517/sec   total: 1841629427
2016-01-28T05:15:05.411Z|00510|coverage|INFO|5 events never hit
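The disconnects above all read "no response to inactivity probe after 5 seconds". As a minimal sketch of what that message means, here is a simplified model, assuming the probe logic behaves roughly like OVS's reconnect handling (this is not that code): once nothing has been received for one probe interval, the probing side sends an echo request, and if nothing at all arrives within a second interval it drops the session. `ProbeTracker` and `State` are hypothetical names used only for illustration.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"              # something was received recently
    IDLE = "idle"                  # probe sent, waiting for any reply
    DISCONNECTED = "disconnected"  # gave up on the peer


class ProbeTracker:
    """Hypothetical, simplified model of an inactivity probe.

    Assumption: after probe_interval seconds with nothing received, an
    echo probe is sent; if nothing arrives within another probe_interval,
    the session is dropped.
    """

    def __init__(self, probe_interval: float = 5.0, now: float = 0.0):
        self.probe_interval = probe_interval
        self.state = State.ACTIVE
        self.last_rx = now          # time anything was last received
        self.probe_sent_at = None   # time the pending probe was sent

    def on_receive(self, now: float) -> None:
        # Any message from the peer (including an echo reply) counts as activity.
        if self.state is not State.DISCONNECTED:
            self.state = State.ACTIVE
            self.last_rx = now
            self.probe_sent_at = None

    def tick(self, now: float) -> State:
        if self.state is State.ACTIVE and now - self.last_rx >= self.probe_interval:
            self.state = State.IDLE
            self.probe_sent_at = now  # an echo request would be sent here
        elif (self.state is State.IDLE
              and now - self.probe_sent_at >= self.probe_interval):
            # Corresponds to "no response to inactivity probe ..., disconnecting".
            self.state = State.DISCONNECTED
        return self.state
```

In this model, a peer that is merely too busy to answer within the interval is disconnected exactly as if the link had failed; the model cannot tell the two cases apart.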



Looking at the neutron-server log, I see an OVSDB "referential integrity
violation" error involving ACLs. I am still digging into this; let me know
if you need these log files.

2016-01-28 04:55:04.723 2757 WARNING requests.packages.urllib3.connectionpool [req-7964fc18-a95c-4464-a568-038a453d006e 7a0b7c6414734a93b5dffbc666534690 e53d6cd40c5140c58ec014eb56070917 - - -] Connection pool is full, discarding connection: identity.open.softlayer.com
2016-01-28 04:55:04.772 2757 WARNING requests.packages.urllib3.connectionpool [-] Connection pool is full, discarding connection: identity.open.softlayer.com
2016-01-28 04:55:05.776 2757 WARNING requests.packages.urllib3.connectionpool [req-6b80241f-0928-4d93-a80e-988a7b3e9690 7a0b7c6414734a93b5dffbc666534690 e53d6cd40c5140c58ec014eb56070917 - - -] Connection pool is full, discarding connection: identity.open.softlayer.com
2016-01-28 04:55:55.380 2757 ERROR neutron.agent.ovsdb.impl_idl [-] OVSDB Error: {"details":"Table Logical_Switch column acls row 36d940b2-26cc-426a-bda6-dd2491f18397 references nonexistent row 0644cffd-71ca-467f-8e1d-6652968870ef in table ACL.","error":"referential integrity violation"}
2016-01-28 04:55:55.443 2757 ERROR neutron.agent.ovsdb.impl_idl [req-0bd44851-ecd9-4957-978f-350b52ada25b cc27c50b17fc4954905db5f3f3eed730 e53d6cd40c5140c58ec014eb56070917 - - -] Traceback (most recent call last):
  File "/opt/neutron/lib/python2.7/site-packages/neutron/agent/ovsdb/native/connection.py", line 99, in run
    txn.results.put(txn.do_commit())
  File "/opt/neutron/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py", line 106, in do_commit
    raise RuntimeError(msg)
RuntimeError: OVSDB Error: {"details":"Table Logical_Switch column acls row 36d940b2-26cc-426a-bda6-dd2491f18397 references nonexistent row 0644cffd-71ca-467f-8e1d-6652968870ef in table ACL.","error":"referential integrity violation"}

2016-01-28 04:55:55.728 2757 ERROR neutron.api.v2.resource [req-0bd44851-ecd9-4957-978f-350b52ada25b cc27c50b17fc4954905db5f3f3eed730 e53d6cd40c5140c58ec014eb56070917 - - -] create failed
2016-01-28 04:55:55.728 2757 ERROR neutron.api.v2.resource Traceback (most recent call last):
2016-01-28 04:55:55.728 2757 ERROR neutron.api.v2.resource   File "/opt/neutron/lib/python2.7/site-packages/neutron/api/v2/resource.py", line 83, in resource
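The error says that, when the transaction was committed, the `acls` column of a `Logical_Switch` row pointed at an `ACL` row that did not exist. For reference, here is a minimal sketch of an OVSDB `transact` request (RFC 7047) that cannot leave such a dangling reference, because it inserts the ACL and references it through its `uuid-name` in the same transaction. This is not what neutron's IDL code actually sends, and it does not reproduce or explain the failure above. `build_add_acl_txn` is a hypothetical helper, and the ACL column names and values (`priority`, `direction`, `match`, `action`) are assumed from the OVN northbound schema.

```python
import json


def build_add_acl_txn(switch_uuid: str, match: str, action: str,
                      priority: int = 1001, direction: str = "to-lport",
                      request_id: int = 0) -> str:
    """Hypothetical helper: build a JSON-RPC 'transact' that adds one ACL
    to a Logical_Switch atomically (ACL column names are assumptions)."""
    acl_row = {
        "priority": priority,
        "direction": direction,
        "match": match,
        "action": action,
    }
    ops = [
        # Insert the new ACL row and give it a transaction-local name.
        {"op": "insert", "table": "ACL", "row": acl_row,
         "uuid-name": "new_acl"},
        # Add a reference to that row to the switch's 'acls' set. Using
        # 'mutate' with 'insert' avoids rewriting the whole set from a
        # possibly stale client-side copy.
        {"op": "mutate", "table": "Logical_Switch",
         "where": [["_uuid", "==", ["uuid", switch_uuid]]],
         "mutations": [["acls", "insert",
                        ["set", [["named-uuid", "new_acl"]]]]]},
    ]
    request = {"method": "transact",
               "params": ["OVN_Northbound", *ops],
               "id": request_id}
    return json.dumps(request)
```

Because both operations commit or fail together, the server never sees the `acls` reference without the `ACL` row it points to.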


 Thanks,
     Mala
 ---------------------------------------------------------------------------



