[ovs-dev] [PATCH v2] ovn: Make it possible for CMS to detect when the OVN system is up-to-date.

Ben Pfaff blp at ovn.org
Tue Jul 19 19:45:08 UTC 2016


On Tue, Jul 19, 2016 at 10:06:09AM -0400, Russell Bryant wrote:
> On Mon, Jul 18, 2016 at 2:30 PM, Ben Pfaff <blp at ovn.org> wrote:
> 
> > Until now, there has been no reliable way for the CMS (or ovn-nbctl, or
> > anything else) to detect when changes made to the northbound configuration
> > have been passed through to the southbound database or to the hypervisors.
> > This commit adds this feature to the system, by adding sequence numbers
> > to the northbound and southbound databases and adding code in ovn-nbctl,
> > ovn-northd, and ovn-controller to keep those sequence numbers up-to-date.
> >
> > The biggest user-visible change from this commit is a new option
> > --wait to ovn-nbctl.  With --wait=sb, ovn-nbctl now waits for ovn-northd
> > to update the southbound database; with --wait=hv, it waits for the
> > changes to make their way to Open vSwitch on every hypervisor.
> >
> > Signed-off-by: Ben Pfaff <blp at ovn.org>
> > ---
> > v1->v2: Rebase to fix up database version number.
> >
> 
> Cool feature.  :-)
> 
> Thanks a lot for the detailed documentation.  That made it easy to
> understand how this is to be used.
> 
> I was trying to decide if the OpenStack Neutron plugin would make use of
> this.  We currently use the 'up' column of Logical_Switch_Port.  As you
> point out in docs, "up" means something different, as it signals the
> successful binding of a port to a chassis.  I think that's really what we
> want and would continue using.
> 
> A downside I thought of to using this from something like Neutron is that
> it would block the whole cloud any time there's any sort of problem with
> any hypervisor, which I don't think is desirable.
> 
> The feature still sounds very useful for testing and debugging, at a
> minimum.
> 
> If this all makes sense, then I can proceed with a more detailed review of
> the implementation.
> 
> Thanks,

That makes sense to me.

I don't envision a CMS using this on a per-transaction basis.  As you
say, it's mostly for testing and debugging.  I can see it being useful
in a couple of ways:

        * First, with some elaboration, a technique like this could be
          used to determine that finer-grained components are up to
          date.  For example, a logical switch is physically distributed
          across a certain set of hypervisors; we could add an hv_cfg
          column to the Logical_Switch table to indicate how up-to-date
          the logical switch is (and have ovn-northd populate it).

        * Second, it may be useful for monitoring.  If the system is
          taking a long time to get up-to-date, then it's a signal to
          look closer.
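
To illustrate the first idea, here is a minimal sketch of how a
per-switch hv_cfg value might be derived.  It assumes that each
hypervisor reports the nb_cfg it has caught up to and that the logical
switch is only as up-to-date as its slowest hypervisor.  The function
name and its parameters are hypothetical and are not part of this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: returns the smallest nb_cfg
 * among the hypervisors that host a logical switch, that is, the newest
 * configuration that every one of them has reached.  Returns 'fallback'
 * when the switch is not present on any hypervisor. */
int64_t
logical_switch_hv_cfg(const int64_t *chassis_nb_cfg, size_t n,
                      int64_t fallback)
{
    if (!n) {
        return fallback;
    }

    int64_t min = chassis_nb_cfg[0];
    for (size_t i = 1; i < n; i++) {
        if (chassis_nb_cfg[i] < min) {
            min = chassis_nb_cfg[i];
        }
    }
    return min;
}
```

With hypervisors at 7, 5, and 9, the switch as a whole would be
considered up-to-date only through configuration 5.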

> > +        /* Track the flow update. */
> > +        struct ofctrl_flow_update *fup, *prev;
> > +        LIST_FOR_EACH_REVERSE_SAFE (fup, prev, list_node, &flow_updates) {
> > +            if (nb_cfg < fup->nb_cfg) {
> > +                /* This ofctrl_flow_update is for a configuration later than
> > +                 * 'nb_cfg'.  This should not normally happen, because it means
> > +                 * that 'nb_cfg' in the SB_Global table of the southbound
> > +                 * database decreased, and it should normally be monotonically
> > +                 * increasing. */
> >
> 
> Would this also get hit if nb_cfg overflows?
> 
> That would sure be a *lot* of transactions, though ...

By my calculations, if we increment nb_cfg a million times a second, it
would take almost 300,000 years to overflow back to -2**63.  By then,
it's likely that our users will have moved on to something new.
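
The arithmetic can be checked with a short sketch, assuming nb_cfg is a
signed 64-bit counter that starts near zero and using a 365.25-day
year.  The function name is made up for illustration.

```c
#include <stdint.h>

/* Seconds in a 365.25-day year. */
#define SECONDS_PER_YEAR (365.25 * 24 * 60 * 60)

/* Hypothetical helper: years until a signed 64-bit counter that starts at
 * zero reaches INT64_MAX when incremented at the given constant rate. */
double
years_until_nb_cfg_overflow(double increments_per_second)
{
    return (double) INT64_MAX / increments_per_second / SECONDS_PER_YEAR;
}
```

At one million increments per second this comes to roughly 292,000
years.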


