[ovs-dev] [PATCH] ovn-controller: process lport bindings only when transaction is possible

Lance Richardson lrichard at redhat.com
Sat Jul 2 20:27:50 UTC 2016


----- Original Message -----
> From: "Lance Richardson" <lrichard at redhat.com>
> To: "Ben Pfaff" <blp at ovn.org>
> Cc: dev at openvswitch.org
> Sent: Saturday, July 2, 2016 4:11:14 PM
> Subject: Re: [ovs-dev] [PATCH] ovn-controller: process lport bindings only when transaction is possible
> 
> 
> 
> ----- Original Message -----
> > From: "Ben Pfaff" <blp at ovn.org>
> > To: "Lance Richardson" <lrichard at redhat.com>
> > Cc: dev at openvswitch.org
> > Sent: Saturday, July 2, 2016 3:23:35 PM
> > Subject: Re: [ovs-dev] [PATCH] ovn-controller: process lport bindings only
> > when transaction is possible
> > 
> > On Sat, Jul 02, 2016 at 01:56:21PM -0400, Lance Richardson wrote:
> > > As currently implemented, binding_run() normally updates the set of
> > > locally owned logical ports on each call.  When changes to the
> > > membership of this set are detected (i.e. when locally bound
> > > logical ports are added or deleted), additional processing to
> > > update the sb database with lport binding is performed.
> > > 
> > > However, the sb database can only be updated when a transaction to
> > > the sb database is possible (that is, when ctx->ovnsb_idl_txn is
> > > non-NULL).  If a new logical port is detected  while ctx->ovnsb_idl_txn
> > > happens to be NULL, its binding information will not be updated in
> > > the the sb database until another change to the set of locally-owned
> > > logical ports changes. If no such change ever occurs, the sb database
> > > is never updated with the appropriate binding information.
> > > 
> > > Eliminate this issue by only updating the set of locally owned logical
> > > ports when an sb database transaction is possible. This addresses
> > > a cause of occasional failures in the "3 HVs, 3 LS, 3 lports/LS, 1 LR"
> > > test case.
> > > 
> > > Signed-off-by: Lance Richardson <lrichard at redhat.com>
> > > ---
> > >  ovn/controller/binding.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
> > > index 8b439a6..369f8f2 100644
> > > --- a/ovn/controller/binding.c
> > > +++ b/ovn/controller/binding.c
> > > @@ -266,7 +266,7 @@ binding_run(struct controller_ctx *ctx, const struct
> > > ovsrec_bridge *br_int,
> > >      }
> > >  
> > >      if (br_int) {
> > > -        if (get_local_iface_ids(br_int, &lports)) {
> > > +        if (ctx->ovnsb_idl_txn && get_local_iface_ids(br_int, &lports))
> > > {
> > >              process_full_binding = true;
> > >          }
> > >      } else {
> > 
> > At a glance, get_local_iface_ids() doesn't seem to directly modify the
> > southbound database, but it does modify data structures that later code
> > depends on.  Based on that, it's not obvious for me to see that this is
> > a correct change.  Can you point out anything relevant?
> > 
> > Thanks,
> > 
> > Ben.
> > 
> 
> Hi Ben,
> 
> The failing scenario goes like this:
>    1) Test case logical network setup is complete.
>    2) The last physical network port is added via
>       as hv3 ovs-vsctl --add-port ... --set Interface vif333
>       external-ids:iface-id=lp333
>    3) hv3 ovn-controller receives update from hv3 ovsdb-server with above
>    mapping,
>       binding_run() is called, and ctx->ovnsb_idl_txn happens to be NULL.
>    4) binding_run() calls get_local_iface_ids(), which recognizes the new
>       local port as matching a logical port, so the lp333 is added to the
>       global ssets "lports" and "all_lports".  This means lp333 will not be
>       treated
>       as a new logical port on subsequent calls. Because getLocal_iface_ids()
>       has discovered a new lport, it returns changed = true.
>    5) Because get_local_iface_ids() returned true, binding_run() sets
>    process_full_binding
>       to true.
>    6) Because process_full_binding is true, binding_run() calls
>    consider_local_datapath()
>       for each logical port in shash_lports (which now includes lp333).
>    7) consider_local_datapath() processing returns without calling
>       sbrec_port_binding_set_chassis() because ctx->ovnsb_idl_txn is NULL.
>    8) There are subsequent calls to binding_run() with non-NULL
>    ctx->ovnsb_idl,
>       but because lp333 is already in the "lports" sset,
>       get_local_iface_ids()
>       returns changed=false, so process_full_binding is false, which means
>       consider_local_datapath() is not called for lp333.
>    9) Because consider_local_datapath() is not called for lp333, the sb
>    database
>       is not updated with the lport/chassis binding.
> 
> Hopefully the above is intelligible. Another way of looking at it would be
> to say the condition for calling consider_local_datapath() is an "edge
> trigger",
> this change suppresses the trigger until the necessary actions can be
> performed.
> 
> One thing I have a doubt about is whether there's any guarantee that, after a
> call
> with ctx->ovnsb_idl_txn==NULL, there will be another call with non-NULL
> ctx->ovnsb_idl_txn,
> but I suspect that other things would not work without this being the case.
> 
>    Lance

BTW, I've noticed in the change history that binding_run() used to return
immediately when called with ctx->ovnsb_idl_txn==NULL, but this was later
changed.  I don't have a good understanding of the issues related to that
change, but I could definitely be missing something. Also, before incremental
update handling was added this failure scenario could not have occurred,
so it would be good if the folks working on that notice this thread and weigh in.

   Lance



More information about the dev mailing list