[ovs-dev] ovn test failures

Lance Richardson lrichard at redhat.com
Thu Jul 21 17:32:14 UTC 2016


----- Original Message -----
> From: "Ryan Moats" <rmoats at us.ibm.com>
> To: "Lance Richardson" <lrichard at redhat.com>
> Cc: "ovs dev" <dev at openvswitch.org>
> Sent: Thursday, July 21, 2016 9:05:53 AM
> Subject: Re: [ovs-dev] ovn test failures
> 
> "dev" <dev-bounces at openvswitch.org> wrote on 07/21/2016 06:32:02 AM:
> 
> > From: Lance Richardson <lrichard at redhat.com>
> > To: ovs dev <dev at openvswitch.org>
> > Date: 07/21/2016 06:32 AM
> > Subject: [ovs-dev] ovn test failures
> > Sent by: "dev" <dev-bounces at openvswitch.org>
> >
> > It seems the failure rate for OVN end-to-end tests went up significantly
> > when commit 70c7cfef188b5ae9940abd5b7d9fe46b1fa88c8e was merged earlier
> > this week.
> >
> > After this commit, 100 iterations of
> > "make check TESTSUITEFLAGS='-j8 -k ovn'"
> > gave (number of failures in leftmost column):
> >       2 2179: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS       FAILED (ovn.at:1312)
> >      10 2183: ovn -- 2 HVs, 2 LS, 1 lport/LS, 2 peer LRs      FAILED (ovn.at:2416)
> >      52 2184: ovn -- 1 HV, 1 LS, 2 lport/LS, 1 LR             FAILED (ovn.at:2529)
> >      45 2185: ovn -- 1 HV, 2 LSs, 1 lport/LS, 1 LR            FAILED (ovn.at:2668)
> >      23 2186: ovn -- 2 HVs, 3 LS, 1 lport/LS, 2 peer LRs, static routes FAILED (ovn.at:2819)
> >      53 2188: ovn -- 2 HVs, 3 LRs connected via LS, static routes FAILED (ovn.at:3053)
> >      32 2189: ovn -- 2 HVs, 2 LRs connected via LS, gateway router FAILED (ovn.at:3237)
> >      50 2190: ovn -- icmp_reply: 1 HVs, 2 LSs, 1 lport/LS, 1 LR FAILED (ovn.at:3389)
> >
> > Immediately prior to this (at commit
> > 48ff3e25abe31b761d2d3f3a2fd6ccaa783c79dc),
> > the number of failures per 100 iterations was much lower:
> >       1 2178: ovn -- 2 HVs, 4 lports/HV, localnet ports       FAILED (ovn.at:1020)
> >       1 2179: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS       FAILED (ovn.at:1307)
> >       1 2179: ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS       FAILED (ovn.at:1312)
> >       9 2184: ovn -- 1 HV, 1 LS, 2 lport/LS, 1 LR             FAILED (ovn.at:2529)
> >       7 2186: ovn -- 2 HVs, 3 LS, 1 lport/LS, 2 peer LRs, static routes FAILED (ovn.at:2819)
> >       1 2187: ovn -- send gratuitous arp on localnet          FAILED (ovn.at:2874)
> >      16 2188: ovn -- 2 HVs, 3 LRs connected via LS, static routes FAILED (ovn.at:3053)
> >
> > Any ideas?
> >
> > Thanks,
> >
> >     Lance
> 
> As author of that patch, I will admit that those numbers are a
> bit disturbing, because they aren't consistent with what I was
> seeing while developing and testing the patch series.
> 
> What they make me suspect is that the patch doesn't catch all
> state transitions correctly (similar to what you uncovered with
> commit f94705d729459d808fd139c8f95d5f1f8d8becc6).
> 
> Two things come to mind:
> 1) Make sure that all of the places where the code needs to request
>    full processing of the tables are handled correctly.
> 2) If a later step in the process finds that an earlier step
>    needs to fully process the database rows during the next
>    cycle, use poll_immediate_wake so that processing happens
>    sooner rather than later.
> 
> Ryan
> 

Thanks Ryan. FWIW, here's (approximately) how I generated those numbers:

rm -f check.out
for i in {1..100} ; do
    make check TESTSUITEFLAGS="-j8 -k ovn" >> check.out 2>&1
done
grep 'FAILED (' check.out | sort | uniq -c
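
If anyone wants to repeat the comparison at a different commit, the tally step
can be wrapped in a small helper. This is only a sketch: summarize_failures is
a name made up for illustration, and it assumes each failure shows up on a
single line of the log in the "NNNN: ovn -- ... FAILED (ovn.at:NNNN)" form
seen above.

```shell
#!/bin/bash
# Sketch only: summarize_failures is a hypothetical helper, not part of OVS.
# It counts the FAILED lines in a saved testsuite log, most frequent first.
# Assumption: each failure is reported on one line containing "FAILED (".
summarize_failures() {
    local log=$1
    grep 'FAILED (' "$log" |
        tr -s ' ' |          # squeeze padding so identical failures group together
        sed 's/^ //' |       # drop a leading space left over after squeezing
        sort |
        uniq -c |            # prefix each distinct failure with its count
        sort -rn             # highest failure count first
}
```

Usage would be "summarize_failures check.out" after the loop above has
finished. Keeping one log file per commit makes it easy to compare the two
tallies side by side.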


