[ovs-dev] OVN: Broken pipe race
blp at nicira.com
Fri Aug 21 22:02:34 UTC 2015
On Mon, Aug 17, 2015 at 11:24:48AM -0700, Alex Wang wrote:
> Want to open a thread to discuss the following race I encountered while
> unit testing ovn.
> The simplest case is when I run ovn-nbctl to add an lport in a unit test:
> 1. ovn-nbctl first creates/commits the logical_port entry in the ovn-nb
> database. The new entry's "up" column is empty,
> 2. then assume ovn-nbctl execution got suspended after
> 3. next, ovn-northd updates the ovn-sb database and finds that the
> new logical port is not bound, so it goes ahead and updates the "up"
> column of the entry to "false"...
> 4. since ovn-nbctl is still running and is set to monitor everything, the
> ovsdb-server will try sending the "update" to ovn-nbctl...
> 5. now consider this race: if ovn-nbctl execution resumes and exits right
> before ovsdb-server sends the update,... the send will fail with a
> (Broken Pipe) error, resulting in a WARN log in ovsdb-server.log.
> Even if we set the "up" column to "false" at creation, we can still run into
> a similar race if ovn-controller quickly binds the lport to a chassis and
> ovn-northd then updates the "up" column to "true".
> I also found similar races for other command combinations... e.g.
> deleting a vtep switch physical port and deleting an ovs port while running
> the ovs-vtep simulator...
> I'm thinking that instead of trying to fix every case (which may not even
> be possible), we can try removing all monitor requests right after
> ovsdb_idl_txn_commit_block() and then wait until we receive the
> monitor request ack from ovsdb-server. After that, ovsdb-server will
> never try sending anything to "*-*ctl" commands.
> Would like to hear what you think.
I think the warning is harmless (since we know the cause) so I'd be
inclined to just ignore it in the testsuite.
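
For anyone following along, here is a minimal sketch of the race in step 5
and of what "ignore it in the testsuite" could look like. It is not OVS code:
a plain socket pair stands in for the ovsdb-server/ovn-nbctl connection
(POSIX behavior assumed), and the function names and the log pattern are
hypothetical, chosen only for illustration.

```python
import errno
import re
import socket


def send_after_peer_exit():
    """Return the errno seen when writing to a peer that already exited."""
    # Stand-ins for ovsdb-server (server) and ovn-nbctl (client).
    server, client = socket.socketpair()
    client.close()  # the client exits just before the update is sent
    try:
        server.send(b'{"method":"update"}')
    except OSError as e:
        return e.errno  # EPIPE ("Broken pipe") on POSIX systems
    finally:
        server.close()
    return None


# Hypothetical pattern: the exact ovsdb-server log format is an assumption.
HARMLESS = re.compile(r"\|WARN\|.*Broken pipe", re.IGNORECASE)


def unexpected_warnings(log_lines):
    """Keep WARN lines except the known-harmless broken-pipe ones."""
    return [line for line in log_lines
            if "|WARN|" in line and not HARMLESS.search(line)]


if __name__ == "__main__":
    print(send_after_peer_exit() == errno.EPIPE)
```

The point of the filter is that a test would still fail on any other warning;
only the broken-pipe case with a known cause is let through.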