[ovs-dev] [PATCH] testsuite: ofproto-dpif: fix sockets dependencies

Ben Pfaff blp at nicira.com
Mon Apr 6 21:01:50 UTC 2015


Jarno, you're the expert on this stuff I think, do you have advice for
Flavio?

On Mon, Apr 06, 2015 at 04:03:23PM -0300, Flavio Leitner wrote:
> On Mon, 6 Apr 2015 11:09:58 -0700
> Ben Pfaff <blp at nicira.com> wrote:
> 
> > On Mon, Apr 06, 2015 at 01:04:43PM -0300, Flavio Leitner wrote:
> > > On Thu, 2 Apr 2015 20:17:19 -0700
> > > Ben Pfaff <blp at nicira.com> wrote:
> > > 
> > > > On Thu, Apr 02, 2015 at 06:44:08PM -0300, Flavio Leitner wrote:
> > > > > On Thu, 2 Apr 2015 13:47:56 -0700
> > > > > Ben Pfaff <blp at nicira.com> wrote:
> > > > > 
> > > > > > On Thu, Apr 02, 2015 at 05:45:34PM -0300, Flavio Leitner
> > > > > > wrote:
> > > > > > > On Thu, 2 Apr 2015 13:37:06 -0700
> > > > > > > Ben Pfaff <blp at nicira.com> wrote:
> > > > > > > 
> > > > > > > > On Thu, Apr 02, 2015 at 05:33:25PM -0300, Flavio Leitner
> > > > > > > > wrote:
> > > > > > > > > On Thu, 2 Apr 2015 12:58:37 -0700
> > > > > > > > > Ben Pfaff <blp at nicira.com> wrote:
> > > > > > > > > 
> > > > > > > > > > On Thu, Apr 02, 2015 at 03:57:20PM -0300, Flavio
> > > > > > > > > > Leitner wrote:
> > > > > > > > > > > The ofproto-dpif creates dummies backed by sockets
> > > > > > > > > > > so depending on the order of execution when bridge
> > > > > > > > > > > is reconfiguring, an active socket may run first
> > > > > > > > > > > and not find the file.  That is usually not a
> > > > > > > > > > > problem because it will try to reconnect one second
> > > > > > > > > > > later. However, it breaks the testsuite.
> > > > > > > > > > > 
> > > > > > > > > > > This patch fixes the issue splitting active and
> > > > > > > > > > > passive sockets in different vsctl-ctl commands
> > > > > > > > > > > that guarantees the proper ordering between them.
> > > > > > > > > > 
> > > > > > > > > > WAIT_FOR_DUMMY_PORTS is supposed to avoid this
> > > > > > > > > > problem, by waiting until the ports have connected.
> > > > > > > > > > Is it busted?
> > > > > > > > > 
> > > > > > > > > No, but it takes at least one extra second for the port
> > > > > > > > > to reconnect. If we consider the four tests fixed by the
> > > > > > > > > patch, it can be 4 extra seconds to complete the same
> > > > > > > > > tests, so I'd just go with the proposed patch.
> > > > > > > > 
> > > > > > > > OK, I buy that, but in that case the commit message goes
> > > > > > > > overboard when it says that the current form "breaks the
> > > > > > > > testsuite".  Can you rephrase this as an optimization
> > > > > > > > rather than a bug fix, then?
> > > > > > > 
> > > > > > > Not really because it is a bug fix.  Those tests break like
> > > > > > > 7 out of 10 times on a s390x.
> > > > > > 
> > > > > > OK, so WAIT_FOR_DUMMY_PORTS is buggy then, can we figure out
> > > > > > why and fix it?  Optimizations are fine too but I'd like to
> > > > > > get to the root of the problem.
> > > > > 
> > > > > Ok, I missed that WAIT_FOR_DUMMY_PORTS is not available on
> > > > > branch-2.3 where it breaks most of the times.  So, I think the
> > > > > correct would be to backport these patches to branch-2.3 first:
> > > > >  
> > > > > 93fa0de tests: Fix race in 'balance-tcp bonding' test.
> > > > > 60187ac test: Remove explicit sleeps from ofproto-dpif bond
> > > > > tests d611e23 test: add WAIT_FOR_DUMMY_PORTS helper macro for
> > > > > writing tests
> > > > > 
> > > > > and then hopefully my patch becomes just an optimization.
> > > > > Anyway, I can do that next week if you or someone else didn't
> > > > > do it before me.
> > > > 
> > > > Thanks for the list of commits.  I took care of the backport.  I'd
> > > > appreciate it if you'd verify it.
> > > 
> > > I just did, and unfortunately it still fails because the macro
> > > WAIT_FOR_DUMMY_PORTS() depends on the ovs-appctl
> > > netdev-dummy/conn-state command which is not available in
> > > branch-2.3 too.  So, could you please backport this commit as well?
> > > 
> > > commit 7d7fffe8a4dbe0aab2eb16eca5ce016518b652ad
> > > Author: Andy Zhou <azhou at nicira.com>
> > > Date:   Thu Jun 5 16:01:17 2014 -0700
> > > 
> > >     netdev-dummy: add appctl netdev-dummy/conn-state command
> > >     
> > >     Using without any parameter, this command list the connection
> > >     state of all netdev-dummy devices that are configured to make
> > >     active connections.
> > > [...]
> > > 
> > > With the above commit applied, I could run 100 times the test 715
> > > without failures and I could run all ofproto-dpif tests 30 times
> > > without failures too.
> > 
> > Thanks for checking that!  I applied this to branch-2.3 also.
> 
> Thanks a lot!  I found another issue with the following test:
> 
> 717. ofproto-dpif.at:203: testing ofproto-dpif, balance-tcp bonding,
> different recirc flow 
> [...]
> --- -   2015-04-06 14:47:38.815107575 -0400
> +++ /root/ovs-2.3.git/tests/testsuite.dir/at-groups/717/stdout
> 2015-04-06 14:47 :38.806520838 -0400
> @@ -1,2 +1,2 @@
> -table_id=254, n_packets=1, n_bytes=64,
> priority=20,recirc_id=0x12d,dp_hash=0xcf /0xff,actions=output
> +table_id=254, n_packets=1, n_bytes=64,
> priority=20,recirc_id=0x12c,dp_hash=0x72 /0xff,actions=output
> 
> Two things changed: dp_hash and recirc_id.  There is a patch on master
> that masks the dp_hash issue:
> 
> commit 8ae8176fd0d8ed919e3301cc961dcf02b65ff49d
> Author: Jarno Rajahalme <jrajahalme at nicira.com>
> Date:   Wed Jan 7 10:16:47 2015 -0800
> 
>     tests: Make test independent of the hash function.
>     
>     Otherwise compiling with -msse4.2 (or -march=native on a SSE4.2
>     capable CPU) will produce a test failure due to the CRC32-based hash
>     function being different from mhash.
>     
>     Signed-off-by: Jarno Rajahalme <jrajahalme at nicira.com>
>     Acked-by: Ben Pfaff <blp at nicira.com>
> 
> Still the recirc_id changes, sometimes it doesn't:
> --- -   2015-04-06 14:52:07.239712884 -0400
> +++ /root/ovs-2.3.git/tests/testsuite.dir/at-groups/717/stdout
> 2015-04-06 14:52:07.226520838 -0400 @@ -1,2 +1,2 @@
> -table_id=254, n_packets=1, n_bytes=64,
> priority=20,recirc_id=0x12d,dp_hash=0xcf/0xff,actions=output
> +table_id=254, n_packets=1, n_bytes=64,
> priority=20,recirc_id=0x12d,dp_hash=0x72/0xff,actions=output
> 
> dp_hash is constant regardless of how many times I run the test.
> 
> Any idea?
> 
> Also, the above test is not included in the command:
> # make check TESTSUITEFLAGS="-k ofproto-dpif"
> 
> I am not expert on that language but I think we can't use the first
> word followed by something like ','.  So, we should change to:
> 
>  # Makes sure recirculation does not change the way packet is handled.
> -AT_SETUP([ofproto-dpif, balance-tcp bonding, different recirc flow ])
> +AT_SETUP([ofproto-dpif - balance-tcp bonding, different recirc flow ])
>  OVS_VSWITCHD_START(
> 
> which then -k ofproto-dpif works for me. (there are other tests with
> the same bug).
> 
> Thanks,
> fbl



More information about the dev mailing list