[ovs-dev] [RFC] Idea for fixing "raceful" E2E ovn tests

Ben Pfaff blp at ovn.org
Fri Apr 22 20:06:16 UTC 2016


On Thu, Apr 14, 2016 at 08:37:27AM -0500, Ryan Moats wrote:
> I've pretty much become fed up with the "raceful" nature of the E2E
> ovn test cases and so I've set my self the goal of fixing them.
> 
> After some thought last night, I *think* I might have found a way
> to do it.  Now, since I'm not 100% that my idea is the cleanest way
> to fix things, I thought I'd throw the idea out on the table first
> and get some feedback before I go off and write code.
> 
> I'm thinking of the following changes:
> 
> in ovn/controller/ofctrl.c, change ofctrl_put to return a boolean
> (true if a flow is queue, false otherwise).
> 
> in ovn/controller/ovn-controller.c:
> 1. add a command line argument (--unit-test).
> 2. if ofctrl_put return true and the above command line argument
>    is specified, dump a line to the log file saying that ovn-controller
>    hasn't made any OF changes in the last loop
> 
> in the ovn E2E tests cases:
> 1. start ovn-controller with the --unit-test argument
> 2. instead of sleep 1 for waiting for ovn-northd/ovn-controller to
>    quiesce, look at the tail of the ovn-controller logs for two
>    consecutive loops where ovn-controller hasn't made any OF changes
>    in the last loop

I've had a different idea for how to do this for a while now.  Let me
see...

You might not be familiar with how ovs-vsctl waits until the changes
that it makes to the Open vSwitch configuration take effect before
exiting.  It uses a sequence number protocol in the Open_vSwitch
database table.  This table (which has exactly one row) has two integer
columns, named cur_cfg and next_cfg.  Initially cur_cfg and next_cfg are
both zero.  When ovs-vsctl modifies the configuration, it also
atomically increments next_cfg.  Before ovs-vswitchd reconfigures
itself, it reads next_cfg, then when it finishes it sets cur_cfg to the
next_cfg value that it read.  ovs-vsctl waits until cur_cfg is greater
than or equal to the next_cfg that it set before it exits.

This has a number of nice properties.  It has low overhead in time and
space, it works efficiently with any number of writers, and later on it
is easy to observe what happened and when by looking at the database
log.

My thought was to extend this concept to the OVN distributed system.
For example:

        * Add a next_cfg value somewhere in the OVN_Northbound
          database.  When ovn-nbctl or something else updates the
          database and wants to allow for finding out where the updates
          are propagated, it increments next_cfg.

        * Add a similar next_cfg somewhere in OVN_Southbound, and make
          ovn-northd propagate the value from northbound to southbound.

        * Add a cur_cfg column to the Chassis table in OVN_Southbound.
          When an ovn-controller brings its chassis up to date with a
          given configuration, it stores the next_cfg value that it just
          got up-to-date with into its own cur_cfg.

        * In theory ovn-northd could propagate the current minimum value
          of cur_cfg back to the northbound database, if that's useful.
          In a real system with constant failures though it's hard for
          me to guess whether it's realistic.

This could even be used such that ovn-nbctl waits for the whole
distributed system to catch up before it exits.

What are your thoughts?



More information about the dev mailing list