[ovs-dev] [PATCH] ovn-controller: process lport bindings only when transaction is possible

Lance Richardson lrichard at redhat.com
Sat Jul 2 20:11:14 UTC 2016



----- Original Message -----
> From: "Ben Pfaff" <blp at ovn.org>
> To: "Lance Richardson" <lrichard at redhat.com>
> Cc: dev at openvswitch.org
> Sent: Saturday, July 2, 2016 3:23:35 PM
> Subject: Re: [ovs-dev] [PATCH] ovn-controller: process lport bindings only when transaction is possible
> 
> On Sat, Jul 02, 2016 at 01:56:21PM -0400, Lance Richardson wrote:
> > As currently implemented, binding_run() normally updates the set of
> > locally owned logical ports on each call.  When changes to the
> > membership of this set are detected (i.e. when locally bound
> > logical ports are added or deleted), additional processing to
> > update the sb database with lport binding is performed.
> > 
> > However, the sb database can only be updated when a transaction to
> > the sb database is possible (that is, when ctx->ovnsb_idl_txn is
> > non-NULL).  If a new logical port is detected  while ctx->ovnsb_idl_txn
> > happens to be NULL, its binding information will not be updated in
> > the the sb database until another change to the set of locally-owned
> > logical ports changes. If no such change ever occurs, the sb database
> > is never updated with the appropriate binding information.
> > 
> > Eliminate this issue by only updating the set of locally owned logical
> > ports when an sb database transaction is possible. This addresses
> > a cause of occasional failures in the "3 HVs, 3 LS, 3 lports/LS, 1 LR"
> > test case.
> > 
> > Signed-off-by: Lance Richardson <lrichard at redhat.com>
> > ---
> >  ovn/controller/binding.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
> > index 8b439a6..369f8f2 100644
> > --- a/ovn/controller/binding.c
> > +++ b/ovn/controller/binding.c
> > @@ -266,7 +266,7 @@ binding_run(struct controller_ctx *ctx, const struct
> > ovsrec_bridge *br_int,
> >      }
> >  
> >      if (br_int) {
> > -        if (get_local_iface_ids(br_int, &lports)) {
> > +        if (ctx->ovnsb_idl_txn && get_local_iface_ids(br_int, &lports)) {
> >              process_full_binding = true;
> >          }
> >      } else {
> 
> At a glance, get_local_iface_ids() doesn't seem to directly modify the
> southbound database, but it does modify data structures that later code
> depends on.  Based on that, it's not obvious for me to see that this is
> a correct change.  Can you point out anything relevant?
> 
> Thanks,
> 
> Ben.
> 

Hi Ben,

The failing scenario goes like this:
   1) Test case logical network setup is complete.
   2) The last physical network port is added via
      as hv3 ovs-vsctl --add-port ... --set Interface vif333 external-ids:iface-id=lp333
   3) hv3 ovn-controller receives update from hv3 ovsdb-server with above mapping,
      binding_run() is called, and ctx->ovnsb_idl_txn happens to be NULL.
   4) binding_run() calls get_local_iface_ids(), which recognizes the new
      local port as matching a logical port, so the lp333 is added to the
      global ssets "lports" and "all_lports".  This means lp333 will not be treated
      as a new logical port on subsequent calls. Because getLocal_iface_ids()
      has discovered a new lport, it returns changed = true.
   5) Because get_local_iface_ids() returned true, binding_run() sets process_full_binding
      to true.
   6) Because process_full_binding is true, binding_run() calls consider_local_datapath()
      for each logical port in shash_lports (which now includes lp333).
   7) consider_local_datapath() processing returns without calling
      sbrec_port_binding_set_chassis() because ctx->ovnsb_idl_txn is NULL.
   8) There are subsequent calls to binding_run() with non-NULL ctx->ovnsb_idl,
      but because lp333 is already in the "lports" sset, get_local_iface_ids()
      returns changed=false, so process_full_binding is false, which means
      consider_local_datapath() is not called for lp333.
   9) Because consider_local_datapath() is not called for lp333, the sb database
      is not updated with the lport/chassis binding.

Hopefully the above is intelligible. Another way of looking at it would be
to say the condition for calling consider_local_datapath() is an "edge trigger",
this change suppresses the trigger until the necessary actions can be performed.

One thing I have a doubt about is whether there's any guarantee that, after a call
with ctx->ovnsb_idl_txn==NULL, there will be another call with non-NULL ctx->ovnsb_idl_txn,
but I suspect that other things would not work without this being the case.

   Lance




More information about the dev mailing list