[ovs-discuss] [openvswitch 2.9.2] testsuite: 8 978 failed on Fedora Rawhide after fa9a62453ea4

Timothy Redaelli tredaelli at redhat.com
Tue Jul 3 10:32:26 UTC 2018


Hi,
I'm debugging failures in "8: vsctl-bashcomp - argument completion"
and "978: ofproto - ofport_request" on Fedora Rawhide that prevent me
from releasing OVS 2.9.2. The two tests fail for the same root cause,
and the failures are present on current ovs master too.

After a bisect I found that the problematic commit is fa9a62453ea4
("ovsdb: Introduce experimental support for clustered databases.").

I can only see the problem on Fedora Rawhide, since its kernel has some
debugging config options enabled that make the problem much more likely
to show up. If I rebuild the same kernel without the debugging options,
the problem is not easily reproducible.

After further analysis I found that some ovs-vsctl commands (for
example "ovs-vsctl get-manager", which is used by
"ovs-vsctl-bashcomp.bash") sometimes generate a "Connection reset by
peer" warning in ovsdb-server.log. With kernel-debug, these warnings
are frequent enough to be rate limited, which adds a line like:
"2018-07-03T09:53:36.401Z|00038|jsonrpc|WARN|Dropped 23 log messages in
last 11 seconds (most recently, 1 seconds ago) due to excessive rate"
That line makes the test fail, since `check_logs` (ofproto-macros.at)
doesn't ignore the "Dropped X log messages" message.
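
To make the failure mode concrete, here is a minimal sketch of the
filtering logic in Python. It is not the actual check_logs code: the
names (IGNORED, DROPPED, unexpected_warnings), the regular expressions
and the first sample log line are all assumptions made up for
illustration. Only the "Dropped 23 log messages" line is taken from my
ovsdb-server.log.

```python
import re

# Hypothetical ignore patterns, loosely modelled on what check_logs is
# described as ignoring (not copied from ofproto-macros.at).
IGNORED = [
    re.compile(r"Broken pipe"),
    re.compile(r"[Cc]onnection reset"),
]
# The rate-limit summary line that currently slips through the filter.
DROPPED = re.compile(r"\|WARN\|Dropped \d+ log messages in last \d+ seconds")


def unexpected_warnings(lines, ignore_dropped=False):
    """Return WARN/ERR/EMER lines that no ignore pattern matches."""
    patterns = IGNORED + ([DROPPED] if ignore_dropped else [])
    return [
        line for line in lines
        if re.search(r"\|(WARN|ERR|EMER)\|", line)
        and not any(p.search(line) for p in patterns)
    ]


log = [
    # Made-up example of an ignored warning.
    "2018-07-03T09:53:25.000Z|00037|jsonrpc|WARN|unix#5: receive error: "
    "Connection reset by peer",
    # Real line from the failing run.
    "2018-07-03T09:53:36.401Z|00038|jsonrpc|WARN|Dropped 23 log messages in "
    "last 11 seconds (most recently, 1 seconds ago) due to excessive rate",
]

print(len(unexpected_warnings(log)))                       # "Dropped" line remains
print(len(unexpected_warnings(log, ignore_dropped=True)))  # nothing remains
```

The "Connection reset" line itself is filtered out, but the summary line
that the rate limiter emits on its behalf contains neither ignored
string, so it is the only line left and the test fails.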

In check_logs I can read the following comments:
# We most notably ignore 'Broken pipe' warnings.  These often and
# intermittently appear in ovsdb-server.log, because *ctl commands
# (e.g. ovs-vsctl, ovn-nbctl) exit right after committing a change to
# the database.  However, in reaction, some daemon may immediately
# update the database, and this later update may cause database sending
# update back to *ctl command if *ctl has not exited yet.  If *ctl
# command exits before the database calls send, the send fails with
# 'Broken pipe'.  Also removes all "connection reset" warning logs for
# similar reasons (either EPIPE or ECONNRESET can be returned on a send
# depending on whether the peer had unconsumed data when it closed the
# socket).

so I don't know what a good approach would be, since after the
fa9a62453ea4 commit this happens not just "often" but almost "always"
(on kernel-debug).
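
For reference, the EPIPE versus ECONNRESET distinction from the comment
can be tried out with a small standalone Python sketch. It uses a plain
socketpair rather than anything from OVS, and the function name
send_after_peer_close is made up for illustration. Which errno is
returned in each case depends on the platform, so the sketch only
reports what it sees.

```python
import errno
import socket


def send_after_peer_close(peer_has_unread_data):
    """Return the errno name seen when sending to a peer that already closed."""
    # "server" stands in for ovsdb-server, "client" for a *ctl command.
    server, client = socket.socketpair()
    try:
        if peer_has_unread_data:
            server.send(b"update 1")  # the client never reads this
        client.close()                # the *ctl command exits
        try:
            server.send(b"update 2")  # the database sends a later update
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
        return None  # send unexpectedly succeeded
    finally:
        server.close()


print("peer closed with unread data:", send_after_peer_close(True))
print("peer closed with nothing unread:", send_after_peer_close(False))
```

On Linux I would expect ECONNRESET in the first case and EPIPE in the
second, which matches the comment, but I have not checked other
platforms.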

Do you have any ideas?

Thank you

-- 
Timothy Redaelli

