[ovs-dev] ovn test failures

Ryan Moats rmoats at us.ibm.com
Fri Jul 22 21:58:59 UTC 2016


Ben Pfaff <blp at ovn.org> wrote on 07/22/2016 02:21:01 PM:

> From: Ben Pfaff <blp at ovn.org>
> To: Ryan Moats/Omaha/IBM at IBMUS
> Cc: Guru Shetty <guru at ovn.org>, ovs dev <dev at openvswitch.org>
> Date: 07/22/2016 02:21 PM
> Subject: Re: [ovs-dev] ovn test failures
>
> On Fri, Jul 22, 2016 at 01:52:18PM -0500, Ryan Moats wrote:
> >
> >
> > Guru Shetty <guru at ovn.org> wrote on 07/22/2016 12:31:43 PM:
> >
> > > From: Guru Shetty <guru at ovn.org>
> > > To: Ryan Moats/Omaha/IBM at IBMUS
> > > Cc: Lance Richardson <lrichard at redhat.com>, ovs dev <dev at openvswitch.org>
> > > Date: 07/22/2016 12:31 PM
> > > Subject: Re: [ovs-dev] ovn test failures
> > >
> > > On 21 July 2016 at 06:05, Ryan Moats <rmoats at us.ibm.com> wrote:
> > > "dev" <dev-bounces at openvswitch.org> wrote on 07/21/2016 06:32:02 AM:
> > >
> > > > From: Lance Richardson <lrichard at redhat.com>
> > > > To: ovs dev <dev at openvswitch.org>
> > > > Date: 07/21/2016 06:32 AM
> > > > Subject: [ovs-dev] ovn test failures
> > > > Sent by: "dev" <dev-bounces at openvswitch.org>
> > > >
> > > > It seems the failure rate for OVN end-to-end tests went up
> > > > significantly when commit 70c7cfef188b5ae9940abd5b7d9fe46b1fa88c8e
> > > > was merged earlier this week.
> > > >
> > > > After this commit, 100 iterations of
> > > > "make check TESTSUITEFLAGS='-j8 -k ovn'"
> > > > gave (number of failures in left-most column):
> > > >       2 2179: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS                   FAILED (ovn.at:1312)
> > > >      10 2183: ovn -- 2 HVs, 2 LS, 1 lport/LS, 2 peer LRs                  FAILED (ovn.at:2416)
> > > >      52 2184: ovn -- 1 HV, 1 LS, 2 lport/LS, 1 LR                         FAILED (ovn.at:2529)
> > > >      45 2185: ovn -- 1 HV, 2 LSs, 1 lport/LS, 1 LR                        FAILED (ovn.at:2668)
> > > >      23 2186: ovn -- 2 HVs, 3 LS, 1 lport/LS, 2 peer LRs, static routes   FAILED (ovn.at:2819)
> > > >      53 2188: ovn -- 2 HVs, 3 LRs connected via LS, static routes         FAILED (ovn.at:3053)
> > > >      32 2189: ovn -- 2 HVs, 2 LRs connected via LS, gateway router        FAILED (ovn.at:3237)
> > > >      50 2190: ovn -- icmp_reply: 1 HVs, 2 LSs, 1 lport/LS, 1 LR           FAILED (ovn.at:3389)
> > > >
> > > > Immediately prior to this (at commit
> > > > 48ff3e25abe31b761d2d3f3a2fd6ccaa783c79dc), the number of failures
> > > > per 100 iterations was much lower:
> > > >       1 2178: ovn -- 2 HVs, 4 lports/HV, localnet ports                   FAILED (ovn.at:1020)
> > > >       1 2179: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS                   FAILED (ovn.at:1307)
> > > >       1 2179: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS                   FAILED (ovn.at:1312)
> > > >       9 2184: ovn -- 1 HV, 1 LS, 2 lport/LS, 1 LR                         FAILED (ovn.at:2529)
> > > >       7 2186: ovn -- 2 HVs, 3 LS, 1 lport/LS, 2 peer LRs, static routes   FAILED (ovn.at:2819)
> > > >       1 2187: ovn -- send gratuitous arp on localnet                      FAILED (ovn.at:2874)
> > > >      16 2188: ovn -- 2 HVs, 3 LRs connected via LS, static routes         FAILED (ovn.at:3053)
> > > >
> > > > Any ideas?
> > > >
> > > > Thanks,
> > > >
> > > >     Lance
> >
> > > As author of that patch, I will admit that those numbers are a
> > > bit disturbing, because they aren't consistent with what I was
> > > seeing while developing and testing the patch series.
> > >
> > > What they make me suspect is that the patches don't correctly catch
> > > all state transitions (similar to what you uncovered with commit
> > > f94705d729459d808fd139c8f95d5f1f8d8becc6).
> > >
> > > Two things come to mind:
> > > 1) Make sure that all of the places where the code needs to request
> > >    full processing of the tables are handled correctly.
> > > 2) If a later step in the process finds that an earlier step needs
> > >    to process the database rows fully during the next cycle, use
> > >    poll_immediate_wake so that the processing happens sooner rather
> > >    than later.
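To make point 2 concrete, here is a minimal C sketch of that pattern.
The force_full_processing flag and the process_*() stages are hypothetical
stand-ins, not the actual ovn-controller code; only poll_immediate_wake()
and poll_block() are the real OVS poll-loop calls (lib/poll-loop.h).

    #include <stdbool.h>

    #include "poll-loop.h"   /* OVS poll loop: poll_immediate_wake(), poll_block(). */

    /* Hypothetical flag: set by any stage that discovers an earlier stage
     * must reprocess its database rows in full on the next iteration. */
    static bool force_full_processing;

    static void
    process_ports(bool full)
    {
        /* Incremental handling, or a full pass over the tables when
         * 'full' is true (point 1: honor every request for a full pass). */
    }

    static void
    process_flows(void)
    {
        bool stale_state_detected = false;   /* placeholder for a real check */

        if (stale_state_detected) {
            /* An earlier stage needs to redo its work from scratch. */
            force_full_processing = true;
        }
    }

    static void
    main_loop_iteration(void)
    {
        bool full = force_full_processing;
        force_full_processing = false;

        process_ports(full);
        process_flows();

        if (force_full_processing) {
            /* Point 2: don't wait for the next OVSDB event; make the
             * poll_block() below return immediately so the full pass
             * runs on the very next iteration. */
            poll_immediate_wake();
        }
        poll_block();
    }
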
> > > Ryan,
> > > Were you planning to look at the failures? Should we revert the
> > > patch?
> > >
> > >
> >
> > Guru-
> >
> > Yes, I have been looking at the failures since Wednesday, and I have a
> > patch set that addresses all of them.  However, I'm travelling today,
> > so I won't be able to mail it until either late tonight or tomorrow
> > morning (US Central Time).
>
> We'll look forward to it.  I think that these are probably affecting
> everyone who regularly runs the tests.  It'll be nice to get them fixed
> soon.

I had time to make it happen from the Austin airport:

http://openvswitch.org/pipermail/dev/2016-July/075942.html



