[ovs-dev] FreeBSD vconn refuse-connection unit test race condition

Ben Pfaff blp at nicira.com
Mon Jul 30 22:38:57 UTC 2012


On Mon, Jul 30, 2012 at 07:52:42PM +0000, Ed Maste wrote:
> On FreeBSD I'm seeing an intermittent failure in a number of the
> vconn unit tests - e.g. 186 tcp vconn - refuse connection.  The test
> code:
> 
> 139  static void
> 140  test_refuse_connection(int argc OVS_UNUSED, char *argv[])
> 141  {
> 142      const char *type = argv[1];
> 143      int expected_error;
> 144      struct fake_pvconn fpv;
> 145      struct vconn *vconn;
> 146  
> 147      expected_error = (!strcmp(type, "unix") ? EPIPE
> 148                        : !strcmp(type, "tcp") ? ECONNRESET
> 149                        : EPROTO);
> 150  
> 151      fpv_create(type, &fpv);
> 152      CHECK_ERRNO(vconn_open(fpv.vconn_name, OFP10_VERSION, &vconn,
> 153                             DSCP_DEFAULT), 0);
> 154      fpv_close(&fpv);
> 155      vconn_run(vconn);
> 156      CHECK_ERRNO(vconn_connect(vconn), expected_error);
> 157      vconn_close(vconn);
> 158      fpv_destroy(&fpv);
> 159  }
> 
> The return value of vconn_connect at line 156 is variously ECONNRESET,
> EPIPE, or EAGAIN, with maybe 3/4 of my runs being successful.  The EPIPE
> case comes from the write() in lib/strem-fd.c:fd_send(), and from
> write(2):
> 
> [EPIPE]	An attempt is made to write to a socket of type
> 	SOCK_STREAM that is not connected to a peer socket.
> 
> (It looks like Linux returns ECONNRESET for this case.)
> 
> So it looks like the tcp localhost close() takes some small time to
> complete, and depending on exactly when it does a different path is
> taken and the failure manifests itself in a different way.
> 
> I can try to make this deterministic by having the unit test try to wait
> until the close() is complete (somehow), but it may make sense to 
> investigate the different code paths to ensure that failure is handled
> correctly for each of them.  Any thoughts?

The test here is really just trying to make sure that the code doesn't
explode if the connection gets refused.  Sure, we can try to pin the
exact problem and therefore the exact error code more precisely, but
it might be just as useful to simply accept any reasonable error
code.  I'd take a patch that does that.

(EAGAIN isn't all that reasonable, so maybe we need a retry loop for
that case.)

Does that make sense?

Thanks,

Ben.



More information about the dev mailing list