[ovs-dev] [ovs-dev,v2,2/4] ovn-controller: add quiet mode

Ryan Moats rmoats at us.ibm.com
Tue Oct 4 22:11:37 UTC 2016


Ben Pfaff <blp at ovn.org> wrote on 10/04/2016 12:14:32 PM:

> From: Ben Pfaff <blp at ovn.org>
> To: Ryan Moats/Omaha/IBM at IBMUS
> Cc: dev at openvswitch.org
> Date: 10/04/2016 12:14 PM
> Subject: Re: [ovs-dev,v2,2/4] ovn-controller: add quiet mode
>
> On Wed, Aug 31, 2016 at 03:22:44PM +0000, Ryan Moats wrote:
> > As discussed in [1], what the incremental processing code
> > actually accomplished was that the ovn-controller would
> > be "quiet" and not burn CPU when things weren't changing.
> > This patch set recreates this state by calculating whether
> > changes have occurred that would require a full calculation
> > to be performed.  It does this by persisting a copy of
> > the localvif_to_ofport and tunnel information in the
> > controller module, rather than in the physical.c module
> > as was the case with previous commits.
> >
> > [1] http://openvswitch.org/pipermail/dev/2016-August/078272.html
> >
> > Signed-off-by: Ryan Moats <rmoats at us.ibm.com>
>
> Hi Ryan.
>
> I like the idea behind this patch.  However, it no longer applies to
> master, so it needs a rebase.

So done, but before submitting a new patch....

>
> It also seems like this TODO should be addressed:
> +        /* TODO (regXboi): this next line is needed for the 3 HVs, 3 LS,
> +         * 3 lports/LS, 1 LR test case, but has the potential side effect
> +         * of defeating quiet mode once a logical router leads to creating
> +         * patch ports. Need to understand the failure mode better and
> +         * what is needed to remove this. */
> +        force_full_process();

I've been looking at what happens here and I'm seeing some signatures
that concern me.  The test case that fails is no longer the one cited above
but is "2 HVs, 4 lports/HV, localnet ports" ...

What I'm seeing when I peer in is that the information populating the
local_datapath structures doesn't appear to be consistent from one pass
through binding_run to the next.  Looking at the ovn-controller process
under hv2 for the above test (when it passes), I see signatures that look
like the following:

2016-10-04T21:57:58.257Z|00020|physical|INFO|looking at binding record 3736404b-c69d-4878-8d45-81ad9be06f5f
2016-10-04T21:57:58.257Z|00021|physical|INFO|dp_key is 1
2016-10-04T21:57:58.257Z|00022|physical|INFO|looking for ofport of lp11 (LP)
2016-10-04T21:57:58.257Z|00023|physical|INFO|looking for ofport of ln1 (localnet)
...
2016-10-04T21:57:58.259Z|00034|physical|INFO|looking at binding record 3736404b-c69d-4878-8d45-81ad9be06f5f
2016-10-04T21:57:58.259Z|00035|physical|INFO|dp_key is 1
2016-10-04T21:57:58.259Z|00036|physical|INFO|looking for ofport of lp11 (LP)
2016-10-04T21:57:58.259Z|00037|physical|INFO|looking for tunnel to hv1
...
2016-10-04T21:57:58.263Z|00054|physical|INFO|looking at binding record 3736404b-c69d-4878-8d45-81ad9be06f5f
2016-10-04T21:57:58.263Z|00055|physical|INFO|dp_key is 1
2016-10-04T21:57:58.263Z|00056|physical|INFO|looking for ofport of lp11 (LP)
2016-10-04T21:57:58.263Z|00057|physical|INFO|looking for ofport of ln1 (localnet)
...
2016-10-04T21:57:58.275Z|00115|physical|INFO|looking at binding record 3736404b-c69d-4878-8d45-81ad9be06f5f
2016-10-04T21:57:58.275Z|00116|physical|INFO|dp_key is 1
2016-10-04T21:57:58.275Z|00117|physical|INFO|looking for ofport of lp11 (LP)
2016-10-04T21:57:58.275Z|00118|physical|INFO|looking for tunnel to hv1
...
2016-10-04T21:57:58.281Z|00133|physical|INFO|looking at binding record 3736404b-c69d-4878-8d45-81ad9be06f5f
2016-10-04T21:57:58.281Z|00134|physical|INFO|dp_key is 1
2016-10-04T21:57:58.281Z|00135|physical|INFO|looking for ofport of lp11 (LP)
2016-10-04T21:57:58.281Z|00136|physical|INFO|looking for ofport of ln1 (localnet)
...
2016-10-04T21:57:58.296Z|00203|physical|INFO|looking at binding record 3736404b-c69d-4878-8d45-81ad9be06f5f
2016-10-04T21:57:58.296Z|00204|physical|INFO|dp_key is 1
2016-10-04T21:57:58.296Z|00205|physical|INFO|looking for ofport of lp11 (LP)
2016-10-04T21:57:58.296Z|00206|physical|INFO|looking for tunnel to hv1
...
2016-10-04T21:57:58.302Z|00241|physical|INFO|looking at binding record 3736404b-c69d-4878-8d45-81ad9be06f5f
2016-10-04T21:57:58.302Z|00242|physical|INFO|dp_key is 1
2016-10-04T21:57:58.302Z|00243|physical|INFO|looking for ofport of lp11 (LP)
2016-10-04T21:57:58.302Z|00244|physical|INFO|looking for ofport of ln1 (localnet)
...
2016-10-04T21:57:58.327Z|00307|physical|INFO|looking at binding record 3736404b-c69d-4878-8d45-81ad9be06f5f
2016-10-04T21:57:58.327Z|00308|physical|INFO|dp_key is 1
2016-10-04T21:57:58.328Z|00309|physical|INFO|looking for ofport of lp11 (LP)
2016-10-04T21:57:58.328Z|00310|physical|INFO|looking for tunnel to hv1
...
2016-10-04T21:57:58.341Z|00341|physical|INFO|looking at binding record 3736404b-c69d-4878-8d45-81ad9be06f5f
2016-10-04T21:57:58.341Z|00342|physical|INFO|dp_key is 1
2016-10-04T21:57:58.341Z|00343|physical|INFO|looking for ofport of lp11 (LP)
2016-10-04T21:57:58.341Z|00344|physical|INFO|looking for ofport of ln1 (localnet)
...
2016-10-04T21:57:58.452Z|00532|physical|INFO|looking at binding record 3736404b-c69d-4878-8d45-81ad9be06f5f
2016-10-04T21:57:58.452Z|00533|physical|INFO|dp_key is 1
2016-10-04T21:57:58.452Z|00534|physical|INFO|looking for ofport of lp11 (LP)
2016-10-04T21:57:58.452Z|00535|physical|INFO|looking for tunnel to hv1
...
2016-10-04T21:57:58.465Z|00624|physical|INFO|looking at binding record 3736404b-c69d-4878-8d45-81ad9be06f5f
2016-10-04T21:57:58.465Z|00625|physical|INFO|dp_key is 1
2016-10-04T21:57:58.465Z|00626|physical|INFO|looking for ofport of lp11 (LP)
2016-10-04T21:57:58.465Z|00627|physical|INFO|looking for ofport of ln1 (localnet)
...
2016-10-04T21:57:58.884Z|00710|physical|INFO|looking at binding record 3736404b-c69d-4878-8d45-81ad9be06f5f
2016-10-04T21:57:58.884Z|00711|physical|INFO|dp_key is 1
2016-10-04T21:57:58.884Z|00712|physical|INFO|looking for ofport of lp11 (LP)

The failure mode I'm seeing is that ovn-controller on hv2 decides that
ports on hv1 are reachable via a tunnel rather than via the localnet port,
so it incorrectly programs tunnel rules into table 32...

My expectation (which could be incorrect) is that before the localnet port
is bound, this lookup would fall back to a tunnel, but once the localnet
port is found, it would be found consistently on every later pass...
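
To make that expectation concrete, here is a minimal sketch of it in C.
It is not the ovn-controller code: struct dp_state, choose_output() and
decisions_are_stable() are hypothetical names invented for illustration,
and the real local_datapath structure and physical.c lookups are more
involved.  The first function encodes "prefer the localnet port once it is
bound, otherwise use the tunnel"; the second checks that a series of
per-pass decisions never falls back to a tunnel after localnet was chosen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-datapath state (NOT the real
 * struct local_datapath from ovn-controller). */
struct dp_state {
    bool has_localnet;      /* A localnet port is bound on this datapath. */
    int localnet_ofport;    /* OpenFlow port used to reach the localnet. */
};

enum reach { REACH_NONE, REACH_LOCALNET, REACH_TUNNEL };

struct reach_result {
    enum reach how;
    int ofport;
};

/* Decide how a remote logical port should be reached on one pass.
 * 'tunnel_ofport' is 0 when there is no tunnel to the remote chassis. */
static struct reach_result
choose_output(const struct dp_state *dp, int tunnel_ofport)
{
    struct reach_result r = { REACH_NONE, 0 };

    assert(dp != NULL);
    if (dp->has_localnet) {
        /* Once a localnet port is known, it always wins. */
        r.how = REACH_LOCALNET;
        r.ofport = dp->localnet_ofport;
    } else if (tunnel_ofport > 0) {
        /* No localnet port yet: fall back to the tunnel. */
        r.how = REACH_TUNNEL;
        r.ofport = tunnel_ofport;
    }
    return r;
}

/* Return true if, across 'n' consecutive passes, the decision never goes
 * back to a tunnel after the localnet port has been chosen once. */
static bool
decisions_are_stable(const enum reach *passes, size_t n)
{
    bool seen_localnet = false;

    for (size_t i = 0; i < n; i++) {
        if (passes[i] == REACH_LOCALNET) {
            seen_localnet = true;
        } else if (seen_localnet && passes[i] == REACH_TUNNEL) {
            return false;
        }
    }
    return true;
}
```

Under this model, the alternating localnet/tunnel pattern in the log above
would make decisions_are_stable() return false, which is the inconsistency
I'm asking about.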

Any insight on this? (I'm thinking this is a bug in binding_run, but want
confirmation before I go down that rabbit hole.)

Ryan



