[ovs-dev] [PATCH v2] ovn-controller: eliminate stall in ofctrl state machine

Lance Richardson lrichard at redhat.com
Tue Jul 12 02:10:18 UTC 2016



----- Original Message -----
> From: "Ryan Moats" <rmoats at us.ibm.com>
> To: "Lance Richardson" <lrichard at redhat.com>
> Cc: dev at openvswitch.org, "Justin Pettit" <jpettit at ovn.org>
> Sent: Monday, July 11, 2016 1:44:18 PM
> Subject: Re: [ovs-dev] [PATCH v2] ovn-controller: eliminate stall in ofctrl state machine
> 
> Lance Richardson <lrichard at redhat.com> wrote on 07/09/2016 11:29:08 AM:
> 
> > From: Lance Richardson <lrichard at redhat.com>
> > To: dev at openvswitch.org
> > Cc: Justin Pettit <jpettit at ovn.org>, Ryan Moats/Omaha/IBM at IBMUS
> > Date: 07/09/2016 11:29 AM
> > Subject: Re: [ovs-dev] [PATCH v2] ovn-controller: eliminate stall in
> > ofctrl state machine
> >
> > + Ryan Moats
> > ----- Original Message -----
> > > From: "Lance Richardson" <lrichard at redhat.com>
> > > To: dev at openvswitch.org
> > > Cc: "Justin Pettit" <jpettit at ovn.org>
> > > Sent: Thursday, July 7, 2016 8:33:57 PM
> > > Subject: Re: [ovs-dev] [PATCH v2] ovn-controller: eliminate stall in ofctrl state machine
> > >
> > > Oops, had intended to cc: Justin.
> > > ----- Original Message -----
> > > > From: "Lance Richardson" <lrichard at redhat.com>
> > > > To: dev at openvswitch.org
> > > > Sent: Thursday, July 7, 2016 8:31:08 PM
> > > > Subject: [ovs-dev] [PATCH v2] ovn-controller: eliminate stall in ofctrl state machine
> > > >
> > > > The "ovn -- 2 HVs, 3 LRs connected via LS, static routes"
> > > > test case currently exhibits frequent failures. These failures
> > > > occur because, at the time that the test packets are sent to
> > > > verify forwarding, no flows have been installed in the vswitch
> > > > for one of the hypervisors.
> > > >
> > > > Investigation shows that, in the failing case, the ofctrl state
> > > > machine has not yet transitioned to the S_UPDATE_FLOWS state.
> > > > > This occurs when ofctrl_run() is called and:
> > > >    1) The state is S_TLV_TABLE_MOD_SENT.
> > > >    2) An OFPTYPE_NXT_TLV_TABLE_REPLY message is queued for reception.
> > > >    3) No event (other than SB probe timer expiration) is expected
> > > >       that would unblock poll_block() in the main ovn-controller
> > > >       loop.
> > > >
> > > > > In this scenario, ofctrl_run() moves the state to S_CLEAR_FLOWS
> > > > > and returns without having executed run_S_CLEAR_FLOWS(), which
> > > > > would have immediately transitioned the state to S_UPDATE_FLOWS.
> > > > > That state is required for ovn-controller to configure flows
> > > > > in ovs-vswitchd. After a delay of about 5 seconds (the default
> > > > > SB probe timer interval), ofctrl_run() is called again
> > > > > and makes the transition to S_UPDATE_FLOWS, but by this time
> > > > > the test case has already failed.
> > > >
> > > > Fix by expanding the state machine's "while state != old_state"
> > > > loop to include processing of receive events, with a maximum
> > > > iteration limit to prevent excessive looping in pathological
> > > > > cases. Without this fix, around 40 failures are seen out of
> > > > > 100 attempts; with this fix, no failures have been observed in
> > > > > several hundred attempts.
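
For readers without the diff at hand, the stall and the fix described
above can be modelled roughly as follows. This is only a sketch under
stated assumptions, not the patch itself: struct sketch, run_state(),
recv_state(), run_old(), run_new() and the MAX_ITERATIONS value are all
hypothetical stand-ins, and the receive queue is reduced to a counter.
Only the state names come from the description above.

```c
#include <stdbool.h>

/* State names taken from the commit message; everything else in this
 * sketch is a hypothetical stand-in for the real ofctrl code. */
enum state { S_TLV_TABLE_MOD_SENT, S_CLEAR_FLOWS, S_UPDATE_FLOWS };

struct sketch {
    enum state state;
    int rx_queued;              /* Messages waiting to be received. */
};

/* Assumed iteration cap; the real limit is not stated in this thread. */
#define MAX_ITERATIONS 50

/* "run" step: actions a state can take without any input. */
static void run_state(struct sketch *s)
{
    if (s->state == S_CLEAR_FLOWS) {
        s->state = S_UPDATE_FLOWS;
    }
}

/* "recv" step: consume one queued message, if any.  Returns true if a
 * message was consumed. */
static bool recv_state(struct sketch *s)
{
    if (s->rx_queued == 0) {
        return false;
    }
    s->rx_queued--;
    if (s->state == S_TLV_TABLE_MOD_SENT) {
        s->state = S_CLEAR_FLOWS;   /* Models the TLV table reply. */
    }
    return true;
}

/* Old shape: settle the run steps first, then receive.  A state change
 * caused by a received message is not followed by another run step. */
static enum state run_old(struct sketch *s)
{
    enum state old;

    do {
        old = s->state;
        run_state(s);
    } while (s->state != old);

    for (int i = 0; s->state == old && i < MAX_ITERATIONS; i++) {
        if (!recv_state(s)) {
            break;
        }
    }
    return s->state;
}

/* New shape: run and receive inside one bounded loop, so a transition
 * triggered by a message is acted on before returning. */
static enum state run_new(struct sketch *s)
{
    for (int i = 0; i < MAX_ITERATIONS; i++) {
        enum state old = s->state;

        run_state(s);
        bool got_msg = recv_state(s);
        if (s->state == old && !got_msg) {
            break;              /* Nothing changed and nothing to read. */
        }
    }
    return s->state;
}
```

Starting both from S_TLV_TABLE_MOD_SENT with one message queued,
run_old() returns with the state left at S_CLEAR_FLOWS (the stall),
while run_new() carries on to S_UPDATE_FLOWS in the same call.
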
> > > >
> > > > Signed-off-by: Lance Richardson <lrichard at redhat.com>
> 
> I've run this through the same testing that I did with v1 (i.e.
> rally to create 10 ports on each of 15 networks on a four-node
> devstack cloud running tip of tree master everywhere):
> 
> - When I only have 3 projects (tenants), I get a statistically
>   significant 6% reduction in the port_create API time measured
>   by rally.
> 
> - When I have 15 projects, then the reduction in API time is
>   almost (but not quite) statistically significant.
> 
> I consider the first result to be a good enough reason for an
> ack and tested by, because I view it as helping under scale
> (for OpenStack, the port density per project/tenant is a key
> driver, and so patches that help when that is higher are good
> things).
> 
> Acked-by: Ryan Moats <rmoats at us.ibm.com>
> Tested-by: Ryan Moats <rmoats at us.ibm.com>
> 
> 
> Having said that
> 

Thanks for reviewing and testing (again), Ryan.

Justin, did you have any thoughts on this second approach?

Thanks,

   Lance



