[ovs-dev] OVN: Broken pipe race
alexw at nicira.com
Mon Aug 17 18:24:48 UTC 2015
I want to open a thread to discuss the following race, which I
encountered while unit testing OVN.
The simplest case is when I run ovn-nbctl to add an lport in a unit test:
1. ovn-nbctl first creates/commits the logical_port entry in the ovn-nb
database. The new entry's "up" column is empty,
2. then assume ovn-nbctl execution got suspended after
3. next, ovn-northd updates the ovn-sb database and finds that the
new logical port is not bound, so it goes ahead and updates the "up"
column of the entry to "false"...
4. since ovn-nbctl is still running and is set to monitor everything, the
ovsdb-server will try sending the "update" to ovn-nbctl...
5. now consider this race: if ovn-nbctl execution resumes and exits right
before ovsdb-server sends the update,... the send will fail with a
(Broken Pipe) error, resulting in a WARN log in ovsdb-server.log.
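
To make step 5 concrete, here is a minimal sketch of the failure mode
outside of OVS. It is not ovsdb-server code: it assumes a plain Unix
socket pair stands in for the ovsdb-server <-> ovn-nbctl connection, and
the function name send_after_peer_exit is made up for illustration.

```python
import errno
import socket


def send_after_peer_exit():
    """Return the errno seen when writing to a peer that already exited."""
    # Hypothetical stand-ins: server_end plays ovsdb-server,
    # client_end plays the short-lived ovn-nbctl process.
    server_end, client_end = socket.socketpair()

    # The client "exits" before the server gets around to sending.
    client_end.close()

    try:
        # The server now tries to push the pending "update" notification.
        server_end.sendall(b'{"method":"update","params":[],"id":null}')
        return None
    except OSError as e:
        # On a stream socket whose peer is gone this is normally EPIPE,
        # i.e. "Broken pipe".
        return e.errno
    finally:
        server_end.close()


if __name__ == "__main__":
    err = send_after_peer_exit()
    print(errno.errorcode.get(err, err))
```

Nothing is wrong with the database state in this situation. The writer
simply lost the race with the reader's exit, which is why it only shows
up as a warning in the log.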
Even if we set the "up" column to "false" at creation, we can still run into
a similar race if ovn-controller quickly binds the lport to a chassis and
ovn-northd then updates the "up" column to "true".
I also found similar races for other command combinations... e.g.
deleting vtep switch physical port and deleting ovs port while running
I'm thinking that instead of trying to fix every case (which may not even
be possible), we can try removing all monitor requests right after
ovsdb_idl_txn_commit_block() and then wait until we receive the
monitor request ack from ovsdb-server. After that, ovsdb-server will
never try sending anything to the "*-*ctl" commands.
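
A rough sketch of the ordering I have in mind, as a toy model rather than
actual OVS code. Assumptions: ToyServer and run_ctl_command are made-up
names, the server is simulated in-process over a socket pair, and the
cancel message follows the "monitor_cancel" request shape from the OVSDB
protocol (RFC 7047). How this would hook into the IDL is not shown.

```python
import json
import socket


class ToyServer:
    """Hypothetical stand-in for ovsdb-server with one monitoring client."""

    def __init__(self, sock):
        self.sock = sock
        self.monitoring = True   # client asked to monitor everything
        self.warnings = []       # stands in for WARN lines in the log

    def handle_request(self):
        msg = json.loads(self.sock.recv(4096).decode())
        if msg["method"] == "monitor_cancel":
            self.monitoring = False
            ack = {"id": msg["id"], "result": {}, "error": None}
            self.sock.sendall(json.dumps(ack).encode())

    def notify_update(self, row):
        """Send an "update" notification if the client still monitors."""
        if not self.monitoring:
            return False
        note = {"method": "update", "params": [None, row], "id": None}
        try:
            self.sock.sendall(json.dumps(note).encode())
            return True
        except OSError as e:
            self.warnings.append(str(e))
            return False


def run_ctl_command(cancel_first):
    """Simulate a *ctl command exiting; return the server's warnings."""
    server_sock, client = socket.socketpair()
    server = ToyServer(server_sock)

    # (The transaction commit would have happened here.)
    if cancel_first:
        req = {"method": "monitor_cancel", "params": [None], "id": 1}
        client.sendall(json.dumps(req).encode())
        server.handle_request()
        ack = json.loads(client.recv(4096).decode())
        assert ack["error"] is None   # wait for the ack before exiting

    client.close()                    # the *ctl command exits

    # ovn-northd's change to "up" arrives after the client is gone.
    server.notify_update({"up": False})
    server_sock.close()
    return server.warnings


if __name__ == "__main__":
    print("without cancel:", run_ctl_command(cancel_first=False))
    print("with cancel:   ", run_ctl_command(cancel_first=True))
```

The point of waiting for the ack is that once the client has seen it, the
server has already dropped the monitor, so there is no later moment at
which it could still decide to send an update to the exiting command.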
Would like to hear what you think.