[ovs-dev] FreeBSD vconn refuse-connection unit test race condition
Ben Pfaff
blp at nicira.com
Mon Jul 30 22:38:57 UTC 2012
On Mon, Jul 30, 2012 at 07:52:42PM +0000, Ed Maste wrote:
> On FreeBSD I'm seeing an intermittent failure in a number of the
> vconn unit tests - e.g. 186 tcp vconn - refuse connection. The test
> code:
>
> 139 static void
> 140 test_refuse_connection(int argc OVS_UNUSED, char *argv[])
> 141 {
> 142 const char *type = argv[1];
> 143 int expected_error;
> 144 struct fake_pvconn fpv;
> 145 struct vconn *vconn;
> 146
> 147 expected_error = (!strcmp(type, "unix") ? EPIPE
> 148 : !strcmp(type, "tcp") ? ECONNRESET
> 149 : EPROTO);
> 150
> 151 fpv_create(type, &fpv);
> 152 CHECK_ERRNO(vconn_open(fpv.vconn_name, OFP10_VERSION, &vconn,
> 153 DSCP_DEFAULT), 0);
> 154 fpv_close(&fpv);
> 155 vconn_run(vconn);
> 156 CHECK_ERRNO(vconn_connect(vconn), expected_error);
> 157 vconn_close(vconn);
> 158 fpv_destroy(&fpv);
> 159 }
>
> The return value of vconn_connect at line 156 is variously ECONNRESET,
> EPIPE, or EAGAIN, with maybe 3/4 of my runs being successful. The EPIPE
> case comes from the write() in lib/strem-fd.c:fd_send(), and from
> write(2):
>
> [EPIPE] An attempt is made to write to a socket of type
> SOCK_STREAM that is not connected to a peer socket.
>
> (It looks like Linux returns ECONNRESET for this case.)
>
> So it looks like the tcp localhost close() takes some small time to
> complete, and depending on exactly when it does a different path is
> taken and the failure manifests itself in a different way.
>
> I can try to make this deterministic by having the unit test try to wait
> until the close() is complete (somehow), but it may make sense to
> investigate the different code paths to ensure that failure is handled
> correctly for each of them. Any thoughts?
The test here is really just trying to make sure that the code doesn't
explode if the connection gets refused. Sure, we can try to pin the
exact problem and therefore the exact error code more precisely, but
it might be just as useful to simply accept any reasonable error
code. I'd take a patch that does that.
(EAGAIN isn't all that reasonable, so maybe we need a retry loop for
that case.)
Does that make sense?
Thanks,
Ben.
More information about the dev
mailing list