[ovs-dev] OVN: Broken pipe race

Alex Wang alexw at nicira.com
Mon Aug 17 18:24:48 UTC 2015


I'd like to open a thread to discuss the following race, which I
encountered while unit testing OVN.

The simplest case is when I run ovn-nbctl to add an lport in a unit test:
1. ovn-nbctl first creates/commits the logical_port entry in the ovn-nb
    database.  The new entry's "up" column is empty.
2. then assume ovn-nbctl execution got suspended after
3. next, ovn-northd updates the ovn-sb database and finds that the
    new logical port is not bound, so it goes ahead and updates the "up"
    column of the entry to "false".
4. since ovn-nbctl is still running and is set to monitor everything,
    ovsdb-server will try to send the "update" to ovn-nbctl.
5. now consider this race:  if ovn-nbctl execution resumes and exits right
    before ovsdb-server sends the update, the send will fail with a
    (Broken Pipe) error, resulting in a WARN log in ovsdb-server.log.
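
To make step 5 concrete, here is a minimal Python sketch of the failing
send.  It is only an illustration, not OVS code: the socketpair() is a
stand-in I am assuming for the Unix socket between ovsdb-server and
ovn-nbctl, and the function name is made up for the example.

```python
import socket


def send_update_after_client_exit():
    # Assumption: a local stream socket pair models the server/client link.
    server_end, client_end = socket.socketpair()

    # The client (ovn-nbctl in the scenario) exits before the update is sent.
    client_end.close()

    try:
        # The server still believes the client is monitoring and sends "update".
        server_end.sendall(b'{"method":"update","params":[null,{}],"id":null}')
        return "sent"
    except BrokenPipeError:
        # EPIPE: this is the error that ends up as a WARN in the server log.
        return "broken pipe"
    finally:
        server_end.close()


print(send_update_after_client_exit())
```

The point is only that the sender cannot tell, before calling send, that
the peer is already gone.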

Even if we set the "up" column to "false" at creation, we can still run into
a similar race if ovn-controller quickly binds the lport to a chassis and
ovn-northd then updates the "up" column to "true".

I also found similar races for other command combinations, e.g.
deleting a vtep switch physical port and deleting an ovs port while running
the ovs-vtep simulator.

I'm thinking that instead of trying to fix every case (which may not even
be possible), we can try removing all monitor requests right after
ovsdb_idl_txn_commit_block() and waiting until we receive the
corresponding ack from ovsdb-server.  After that, ovsdb-server will
never try sending anything to the "*-*ctl" commands.
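
A rough Python sketch of the ordering I have in mind, simulated in one
process.  Everything here is an assumption for illustration: the
socketpair() stands in for the connection, "mon0" is a made-up monitor
id, and the "monitor_cancel" message follows the OVSDB protocol
(RFC 7047) rather than any existing IDL helper.

```python
import json
import socket


def run(cancel_first):
    # Hypothetical stand-ins: "server" plays ovsdb-server, "client" a *-ctl command.
    server, client = socket.socketpair()
    monitoring = True  # server-side state: the client monitors everything

    if cancel_first:
        # Client: drop the monitor once the commit is done.
        request = {"method": "monitor_cancel", "params": ["mon0"], "id": 1}
        client.sendall(json.dumps(request).encode())

        # Server: handle the request and ack it.
        # (Messages are tiny here; a real peer needs proper JSON framing.)
        msg = json.loads(server.recv(4096))
        if msg["method"] == "monitor_cancel":
            monitoring = False
        reply = {"result": {}, "error": None, "id": msg["id"]}
        server.sendall(json.dumps(reply).encode())

        # Client: block until the ack arrives; only then exit.
        ack = json.loads(client.recv(4096))
        assert ack["id"] == request["id"] and ack["error"] is None

    client.close()  # the *-ctl command exits

    # Server: another writer (e.g. ovn-northd) changes a row afterwards.
    try:
        if not monitoring:
            return "no update sent"
        update = {"method": "update", "params": ["mon0", {}], "id": None}
        server.sendall(json.dumps(update).encode())
        return "update sent"
    except BrokenPipeError:
        return "broken pipe"
    finally:
        server.close()


print(run(cancel_first=False))
print(run(cancel_first=True))
```

Without the cancel, the late update hits a closed socket; with the cancel
acked before exit, the server has no monitor left and sends nothing.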

I would like to hear what you think.

Alex Wang
