[ovs-dev] [PATCH] ovn-controller: process lport bindings only when transaction is possible

Ryan Moats rmoats at us.ibm.com
Sun Jul 3 19:24:10 UTC 2016




"dev" <dev-bounces at openvswitch.org> wrote on 07/03/2016 02:18:51 PM:

> From: Ben Pfaff <blp at ovn.org>
> To: Lance Richardson <lrichard at redhat.com>
> Cc: dev at openvswitch.org
> Date: 07/03/2016 02:19 PM
> Subject: Re: [ovs-dev] [PATCH] ovn-controller: process lport
> bindings only when transaction is possible
> Sent by: "dev" <dev-bounces at openvswitch.org>
>
> On Sat, Jul 02, 2016 at 04:11:14PM -0400, Lance Richardson wrote:
> >
> >
> > ----- Original Message -----
> > > From: "Ben Pfaff" <blp at ovn.org>
> > > To: "Lance Richardson" <lrichard at redhat.com>
> > > Cc: dev at openvswitch.org
> > > Sent: Saturday, July 2, 2016 3:23:35 PM
> > > Subject: Re: [ovs-dev] [PATCH] ovn-controller: process lport
> bindings only when transaction is possible
> > >
> > > On Sat, Jul 02, 2016 at 01:56:21PM -0400, Lance Richardson wrote:
> > > > As currently implemented, binding_run() normally updates the set of
> > > > locally owned logical ports on each call.  When changes to the
> > > > membership of this set are detected (i.e. when locally bound
> > > > logical ports are added or deleted), additional processing to
> > > > update the sb database with lport binding is performed.
> > > >
> > > > However, the sb database can only be updated when a transaction to
> > > > the sb database is possible (that is, when ctx->ovnsb_idl_txn is
> > > > non-NULL).  If a new logical port is detected  while ctx->
ovnsb_idl_txn
> > > > happens to be NULL, its binding information will not be updated in
> > > > the the sb database until another change to the set of
locally-owned
> > > > logical ports changes. If no such change ever occurs, the sb
database
> > > > is never updated with the appropriate binding information.
> > > >
> > > > Eliminate this issue by only updating the set of locally owned
logical
> > > > ports when an sb database transaction is possible. This addresses
> > > > a cause of occasional failures in the "3 HVs, 3 LS, 3 lports/LS, 1
LR"
> > > > test case.
> > > >
> > > > Signed-off-by: Lance Richardson <lrichard at redhat.com>
> > > > ---
> > > >  ovn/controller/binding.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
> > > > index 8b439a6..369f8f2 100644
> > > > --- a/ovn/controller/binding.c
> > > > +++ b/ovn/controller/binding.c
> > > > @@ -266,7 +266,7 @@ binding_run(struct controller_ctx *ctx, const
struct
> > > > ovsrec_bridge *br_int,
> > > >      }
> > > >
> > > >      if (br_int) {
> > > > -        if (get_local_iface_ids(br_int, &lports)) {
> > > > +        if (ctx->ovnsb_idl_txn && get_local_iface_ids(br_int,
> &lports)) {
> > > >              process_full_binding = true;
> > > >          }
> > > >      } else {
> > >
> > > At a glance, get_local_iface_ids() doesn't seem to directly modify
the
> > > southbound database, but it does modify data structures that later
code
> > > depends on.  Based on that, it's not obvious for me to see that this
is
> > > a correct change.  Can you point out anything relevant?
> > >
> > > Thanks,
> > >
> > > Ben.
> > >
> >
> > Hi Ben,
> >
> > The failing scenario goes like this:
> >    1) Test case logical network setup is complete.
> >    2) The last physical network port is added via
> >       as hv3 ovs-vsctl --add-port ... --set Interface vif333
> external-ids:iface-id=lp333
> >    3) hv3 ovn-controller receives update from hv3 ovsdb-server
> with above mapping,
> >       binding_run() is called, and ctx->ovnsb_idl_txn happens to be
NULL.
> >    4) binding_run() calls get_local_iface_ids(), which recognizes the
new
> >       local port as matching a logical port, so the lp333 is added to
the
> >       global ssets "lports" and "all_lports".  This means lp333
> will not be treated
> >       as a new logical port on subsequent calls. Because
> getLocal_iface_ids()
> >       has discovered a new lport, it returns changed = true.
> >    5) Because get_local_iface_ids() returned true, binding_run()
> sets process_full_binding
> >       to true.
> >    6) Because process_full_binding is true, binding_run() calls
> consider_local_datapath()
> >       for each logical port in shash_lports (which now includes lp333).
> >    7) consider_local_datapath() processing returns without calling
> >       sbrec_port_binding_set_chassis() because ctx->ovnsb_idl_txn is
NULL.
> >    8) There are subsequent calls to binding_run() with non-NULL
> ctx->ovnsb_idl,
> >       but because lp333 is already in the "lports" sset,
> get_local_iface_ids()
> >       returns changed=false, so process_full_binding is false, which
means
> >       consider_local_datapath() is not called for lp333.
> >    9) Because consider_local_datapath() is not called for lp333,
> the sb database
> >       is not updated with the lport/chassis binding.
> >
> > Hopefully the above is intelligible. Another way of looking at it would
be
> > to say the condition for calling consider_local_datapath() is an
> "edge trigger",
> > this change suppresses the trigger until the necessary actions can
> be performed.
>
> That is an amazingly detailed rationale.  I appended all of it to the
> commit message and applied this to master.
>
> > One thing I have a doubt about is whether there's any guarantee
> that, after a call
> > with ctx->ovnsb_idl_txn==NULL, there will be another call with
> non-NULL ctx->ovnsb_idl_txn,
> > but I suspect that other things would not work without this being the
case.
>
> This should be guaranteed.

I admit to being late to the party on this one, but I would have acked it
as being a definite error case had I gotten there in time.

Ryan



More information about the dev mailing list