[ovs-dev] [ovs-dev,v2,2/4] ovn-controller: add quiet mode

Ryan Moats rmoats at us.ibm.com
Thu Oct 6 01:58:23 UTC 2016



"dev" <dev-bounces at openvswitch.org> wrote on 10/05/2016 01:19:34 PM:

> From: Ryan Moats/Omaha/IBM at IBMUS
> To: Ben Pfaff <blp at ovn.org>
> Cc: dev at openvswitch.org
> Date: 10/05/2016 01:20 PM
> Subject: Re: [ovs-dev] [ovs-dev,v2,2/4] ovn-controller: add quiet mode
> Sent by: "dev" <dev-bounces at openvswitch.org>
>
> Ben Pfaff <blp at ovn.org> wrote on 10/05/2016 01:04:36 PM:
>
> > From: Ben Pfaff <blp at ovn.org>
> > To: Ryan Moats/Omaha/IBM at IBMUS
> > Cc: dev at openvswitch.org
> > Date: 10/05/2016 01:04 PM
> > Subject: Re: [ovs-dev,v2,2/4] ovn-controller: add quiet mode
> >
> > On Wed, Oct 05, 2016 at 12:52:57PM -0500, Ryan Moats wrote:
> > > Ben Pfaff <blp at ovn.org> wrote on 10/05/2016 12:37:26 PM:
> > >
> > > > From: Ben Pfaff <blp at ovn.org>
> > > > To: Ryan Moats/Omaha/IBM at IBMUS
> > > > Cc: dev at openvswitch.org
> > > > Date: 10/05/2016 12:37 PM
> > > > Subject: Re: [ovs-dev,v2,2/4] ovn-controller: add quiet mode
> > > >
> > > > On Tue, Oct 04, 2016 at 05:11:37PM -0500, Ryan Moats wrote:
> > > > > Ben Pfaff <blp at ovn.org> wrote on 10/04/2016 12:14:32 PM:
> > > > >
> > > > > > From: Ben Pfaff <blp at ovn.org>
> > > > > > To: Ryan Moats/Omaha/IBM at IBMUS
> > > > > > Cc: dev at openvswitch.org
> > > > > > Date: 10/04/2016 12:14 PM
> > > > > > Subject: Re: [ovs-dev,v2,2/4] ovn-controller: add quiet mode
> > > > > >
> > > > > > On Wed, Aug 31, 2016 at 03:22:44PM +0000, Ryan Moats wrote:
> > > > > > > As discussed in [1], what the incremental processing code
> > > > > > > actually accomplished was that the ovn-controller would
> > > > > > > be "quiet" and not burn CPU when things weren't changing.
> > > > > > > This patch set recreates this state by calculating whether
> > > > > > > changes have occured that would require a full calculation
> > > > > > > to be performed.  It does this by persisting a copy of
> > > > > > > the localvif_to_ofport and tunnel information in the
> > > > > > > controller module, rather than in the physical.c module
> > > > > > > as was the case with previous commits.
> > > > > > >
> > > > > > > [1]
> http://openvswitch.org/pipermail/dev/2016-August/078272.html
> > > > > > >
> > > > > > > Signed-off-by: Ryan Moats <rmoats at us.ibm.com>
> > > > > >
> > > > > > Hi Ryan.
> > > > > >
> > > > > > I like the idea behind this patch.  However, it no longer
applies
> to
> > > > > > master, so it needs a rebase.
> > > > >
> > > > > So done, but before submitting a new patch....
> > > > >
> > > > > >
> > > > > > It also seems like this TODO should be addressed:
> > > > > > +        /* TODO (regXboi): this next line is needed for the 3
> HVs, 3
> > > LS,
> > > > > > +         * 3 lports/LS, 1 LR test case, but has the potential
> side
> > > > > effect
> > > > > > +         * of defeating quiet mode once a logical router leads
> to
> > > > > creating
> > > > > > +         * patch ports. Need to understand the failure mode
> better
> > > and
> > > > > > +         * what is needed to remove this. */
> > > > > > +        force_full_process();
> > > > >
> > > > > I've been looking at what happens here and I'm seeing some
> signatures
> > > > > that concern me.  The test case that fails is no longer the cited
> one
> > > above
> > > > > but is "2 HVs, 4 lports/HV, localnet ports" ...
> > > > >
> > > > > What I'm seeing when I peer in is that the information populating
> the
> > > > > local_datapath structures doesn't appear to be consistent on each
> pass
> > > > > through binding_run: Looking at the ovn-controller process under
> hv2
> > > > > for the above test (when it passes), I'll see signatures that
look
> > > > > like the following:
> > > > >
> > > > > 2016-10-04T21:57:58.257Z|00020|physical|INFO|looking at binding
> record
> > > > > 3736404b-c69d-4878-8d45-81ad9be06f5f
> > > > > 2016-10-04T21:57:58.257Z|00021|physical|INFO|dp_key is 1
> > > > > 2016-10-04T21:57:58.257Z|00022|physical|INFO|looking for ofport
of
> lp11
> > > > > (LP)
> > > > > 2016-10-04T21:57:58.257Z|00023|physical|INFO|looking for ofport
of
> ln1
> > > > > (localnet)
> > > > > ...
> > > > > 2016-10-04T21:57:58.259Z|00034|physical|INFO|looking at binding
> record
> > > > > 3736404b-c69d-4878-8d45-81ad9be06f5f
> > > > > 2016-10-04T21:57:58.259Z|00035|physical|INFO|dp_key is 1
> > > > > 2016-10-04T21:57:58.259Z|00036|physical|INFO|looking for ofport
of
> lp11
> > > > > (LP)
> > > > > 2016-10-04T21:57:58.259Z|00037|physical|INFO|looking for tunnel
to
> hv1
> > > > > ...
> > > >
> > > > This sort of oscillation seems super-weird to me.  I'd also like to
> > > > learn more.
> > > >
> > >
> > > I found the problem and fixed it in the v3 patch set - some of the
> methods
> > > (patch_run for example) now include a test of ovs_idl_txn before they
> run
> > > and some don't (physical_run for example), so you can see guess what
> was
> > > happening...
> > >
> > > (If we had a txn to work with everything is fine, but without one,
> > > physical_run was trying to calculate flows based on local datapaths
> > > with no localnet ports being defined, which leads to the above
> > > oscillation.)
> > >
> > > I added a txn check to the logic for whether to call things or not in
> > > ovn-controller.c and I added a note in the email about thinking that
a
> > > follow-on patch to clean up all of the ovs_idl_txn gates might make
> sense
> > > (so that nobody trips on this in the future)
> >
> > It sounds like you found a bug that should be fixed independent of
> > "quiet mode".  Should it be two patches, one of which we backport to
> > branch-2.6?
>
> I'd say *yes* to two patches, and *yes* to backport to branch-2.6, but
> I'd not split what I've done as it is just a quick hack - I'd say let's

The patch for fixing this bug is available at

http://patchwork.ozlabs.org/patch/678692/

I'll rebase the v3 of quiet mode patch onto the above once it merges...

Ryan



More information about the dev mailing list