[ovs-dev] ovn test failures

Ryan Moats rmoats at us.ibm.com
Fri Jul 22 18:52:18 UTC 2016



Guru Shetty <guru at ovn.org> wrote on 07/22/2016 12:31:43 PM:

> From: Guru Shetty <guru at ovn.org>
> To: Ryan Moats/Omaha/IBM at IBMUS
> Cc: Lance Richardson <lrichard at redhat.com>, ovs dev <dev at openvswitch.org>
> Date: 07/22/2016 12:31 PM
> Subject: Re: [ovs-dev] ovn test failures
>
> On 21 July 2016 at 06:05, Ryan Moats <rmoats at us.ibm.com> wrote:
> "dev" <dev-bounces at openvswitch.org> wrote on 07/21/2016 06:32:02 AM:
>
> > From: Lance Richardson <lrichard at redhat.com>
> > To: ovs dev <dev at openvswitch.org>
> > Date: 07/21/2016 06:32 AM
> > Subject: [ovs-dev] ovn test failures
> > Sent by: "dev" <dev-bounces at openvswitch.org>
> >
> > It seems the failure rate for OVN end-to-end tests went up significantly
> > when commit 70c7cfef188b5ae9940abd5b7d9fe46b1fa88c8e was merged earlier
> > this week.
> >
> > After this commit, 100 iterations of
> > "make check TESTSUITEFLAGS='-j8 -k ovn'"
> > gave (number of failures in left-most column):
> >       2 2179: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS  FAILED (ovn.at:1312)
> >      10 2183: ovn -- 2 HVs, 2 LS, 1 lport/LS, 2 peer LRs  FAILED (ovn.at:2416)
> >      52 2184: ovn -- 1 HV, 1 LS, 2 lport/LS, 1 LR  FAILED (ovn.at:2529)
> >      45 2185: ovn -- 1 HV, 2 LSs, 1 lport/LS, 1 LR  FAILED (ovn.at:2668)
> >      23 2186: ovn -- 2 HVs, 3 LS, 1 lport/LS, 2 peer LRs, static routes  FAILED (ovn.at:2819)
> >      53 2188: ovn -- 2 HVs, 3 LRs connected via LS, static routes  FAILED (ovn.at:3053)
> >      32 2189: ovn -- 2 HVs, 2 LRs connected via LS, gateway router  FAILED (ovn.at:3237)
> >      50 2190: ovn -- icmp_reply: 1 HVs, 2 LSs, 1 lport/LS, 1 LR  FAILED (ovn.at:3389)
> >
> > Immediately prior to this (at commit 48ff3e25abe31b761d2d3f3a2fd6ccaa783c79dc),
> > the number of failures per 100 iterations was much lower:
> >       1 2178: ovn -- 2 HVs, 4 lports/HV, localnet ports  FAILED (ovn.at:1020)
> >       1 2179: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS  FAILED (ovn.at:1307)
> >       1 2179: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS  FAILED (ovn.at:1312)
> >       9 2184: ovn -- 1 HV, 1 LS, 2 lport/LS, 1 LR  FAILED (ovn.at:2529)
> >       7 2186: ovn -- 2 HVs, 3 LS, 1 lport/LS, 2 peer LRs, static routes  FAILED (ovn.at:2819)
> >       1 2187: ovn -- send gratuitous arp on localnet  FAILED (ovn.at:2874)
> >      16 2188: ovn -- 2 HVs, 3 LRs connected via LS, static routes  FAILED (ovn.at:3053)
> >
> > Any ideas?
> >
> > Thanks,
> >
> >     Lance
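
For anyone who wants to reproduce tallies like the ones above: they are the kind of count you get by collecting the "FAILED" summary lines from every iteration and counting duplicates. A minimal sketch in Python, assuming the summary lines from all runs have been concatenated into one string. The names tally_failures and format_tally and the sample input are illustrative only, not part of the OVS tree.

```python
import re
from collections import Counter

# Matches a testsuite summary line such as:
#   2184: ovn -- 1 HV, 1 LS, 2 lport/LS, 1 LR    FAILED (ovn.at:2529)
FAILED_RE = re.compile(r"^\s*(\d+):\s+(.*?)\s+FAILED\s+\((\S+?):(\d+)\)\s*$")


def tally_failures(text):
    """Count FAILED lines, keyed by (test number, test name, "file:line")."""
    counts = Counter()
    for line in text.splitlines():
        m = FAILED_RE.match(line)
        if m:
            num, name, src, lineno = m.groups()
            counts[(int(num), name, f"{src}:{lineno}")] += 1
    return counts


def format_tally(counts):
    """Render the tally with the failure count in the left-most column."""
    rows = []
    for (num, name, loc), n in sorted(counts.items()):
        rows.append(f"{n:7d} {num}: {name} FAILED ({loc})")
    return "\n".join(rows)


if __name__ == "__main__":
    # Illustrative input: two runs where test 2184 failed, one where 2179
    # failed, plus a made-up non-failing line that must be ignored.
    sample = """\
2184: ovn -- 1 HV, 1 LS, 2 lport/LS, 1 LR             FAILED (ovn.at:2529)
2179: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS       FAILED (ovn.at:1312)
2184: ovn -- 1 HV, 1 LS, 2 lport/LS, 1 LR             FAILED (ovn.at:2529)
2180: ovn -- hypothetical passing test                ok
"""
    print(format_tally(tally_failures(sample)))
```

Keying on the file:line location as well as the test number keeps failures of the same test at different check points apart, which is why test 2179 can appear twice in a tally.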

> As author of that patch, I will admit that those numbers are a
> bit disturbing, because they aren't consistent with what I was
> seeing while developing and testing the patch series.
>
> What they make me suspect is that the patch doesn't correctly catch
> all state transitions (similar to what you uncovered with commit
> f94705d729459d808fd139c8f95d5f1f8d8becc6).
>
> Two things come to mind:
> 1) Make sure that all of the places where the code needs to request
>    full processing of the tables are correctly handled.
> 2) If a later step in the process finds that an earlier step in
>    the process needs to process the database rows fully during the
>    next cycle, use poll_immediate_wake so that processing happens
>    sooner rather than later.
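
The two points above can be sketched as a toy model. This is not the ovn-controller code: the class, method, and field names below are hypothetical, and the wake_immediately flag is only a stand-in for a call to poll_immediate_wake. The sketch assumes a main loop that normally processes only changed rows and lets a later step demand a full pass on the next cycle.

```python
class IncrementalLoop:
    """Toy main loop: incremental by default, full pass on request."""

    def __init__(self):
        self.force_full = False        # next pass must reprocess every row
        self.wake_immediately = False  # stand-in for poll_immediate_wake()
        self.log = []                  # (mode, rows) for each pass, for inspection

    def request_full_process(self):
        # Point 1: a single place that records the need for a full pass.
        self.force_full = True
        # Point 2: do not wait for the next external event; run again at once.
        self.wake_immediately = True

    def run_once(self, changed_rows, all_rows, later_step_needs_full):
        """Run one cycle; return True if the loop should wake immediately."""
        full = self.force_full
        self.force_full = False
        self.wake_immediately = False

        # Earlier step: process either everything or only what changed.
        rows = list(all_rows) if full else list(changed_rows)
        self.log.append(("full" if full else "incremental", rows))

        # Later step: may discover that the earlier step missed something.
        if later_step_needs_full(rows):
            self.request_full_process()
        return self.wake_immediately


if __name__ == "__main__":
    loop = IncrementalLoop()
    table = ["row-a", "row-b", "row-c"]
    # First cycle is incremental, and the later step asks for a full pass.
    wake = loop.run_once(["row-a"], table, lambda rows: True)
    # Because wake is set, the second cycle runs right away and is full.
    if wake:
        loop.run_once([], table, lambda rows: False)
    for mode, rows in loop.log:
        print(mode, rows)
```

Without the immediate wake, the full pass in this model would be deferred until some unrelated event woke the loop, which is one plausible way a test could observe stale state.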
>
> Ryan,
> Were you planning to look at the failures? Should we revert the patch?
>
>

Guru-

Yes, I have been looking at the failures since Wed and I have a patch set
that addresses all of these failures.  However, I'm travelling today, so I
won't be able to mail it until either late tonight or tomorrow morning
(US Central Time).

Ryan


