[ovs-dev] [PATCH v2] ovn-controller: eliminate stall in ofctrl state machine

Ryan Moats rmoats at us.ibm.com
Mon Jul 11 17:44:18 UTC 2016


Lance Richardson <lrichard at redhat.com> wrote on 07/09/2016 11:29:08 AM:

> From: Lance Richardson <lrichard at redhat.com>
> To: dev at openvswitch.org
> Cc: Justin Pettit <jpettit at ovn.org>, Ryan Moats/Omaha/IBM at IBMUS
> Date: 07/09/2016 11:29 AM
> Subject: Re: [ovs-dev] [PATCH v2] ovn-controller: eliminate stall in
> ofctrl state machine
>
> + Ryan Moats
> ----- Original Message -----
> > From: "Lance Richardson" <lrichard at redhat.com>
> > To: dev at openvswitch.org
> > Cc: "Justin Pettit" <jpettit at ovn.org>
> > Sent: Thursday, July 7, 2016 8:33:57 PM
> > Subject: Re: [ovs-dev] [PATCH v2] ovn-controller: eliminate stall in
> >    ofctrl state machine
> >
> > Oops, had intended to cc: Justin.
> > ----- Original Message -----
> > > From: "Lance Richardson" <lrichard at redhat.com>
> > > To: dev at openvswitch.org
> > > Sent: Thursday, July 7, 2016 8:31:08 PM
> > > Subject: [ovs-dev] [PATCH v2] ovn-controller: eliminate stall in
> > >    ofctrl state machine
> > >
> > > The "ovn -- 2 HVs, 3 LRs connected via LS, static routes"
> > > test case currently exhibits frequent failures. These failures
> > > occur because, at the time that the test packets are sent to
> > > verify forwarding, no flows have been installed in the vswitch
> > > for one of the hypervisors.
> > >
> > > Investigation shows that, in the failing case, the ofctrl state
> > > machine has not yet transitioned to the S_UPDATE_FLOWS state.
> > > > This occurs when ofctrl_run() is called and:
> > >    1) The state is S_TLV_TABLE_MOD_SENT.
> > >    2) An OFPTYPE_NXT_TLV_TABLE_REPLY message is queued for reception.
> > >    3) No event (other than SB probe timer expiration) is expected
> > >       that would unblock poll_block() in the main ovn-controller
> > >       loop.
> > >
> > > > In this scenario, ofctrl_run() will move state to S_CLEAR_FLOWS
> > > > and return without having executed run_S_CLEAR_FLOWS(), which
> > > > would have immediately transitioned the state to S_UPDATE_FLOWS.
> > > > That state is needed in order for ovn-controller to configure
> > > > flows in ovs-vswitchd. After a delay of about 5 seconds (the
> > > > default SB probe timer interval), ofctrl_run() would be called
> > > > again to make the transition to S_UPDATE_FLOWS, but by this time
> > > > the test case has already failed.
> > >
> > > Fix by expanding the state machine's "while state != old_state"
> > > loop to include processing of receive events, with a maximum
> > > iteration limit to prevent excessive looping in pathological
> > > > cases. Without this fix, around 40 failures are seen out of
> > > > 100 attempts; with this fix, no failures have been observed in
> > > > several hundred attempts.
> > >
> > > Signed-off-by: Lance Richardson <lrichard at redhat.com>
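
For anyone skimming the thread, here is a toy model of the stall and
of the fix described in the commit message above. It is only a sketch
under stated assumptions, not the patch itself: the state names come
from the commit message, but every toy_* identifier, the queue
counter, and the limit of 50 iterations are hypothetical and chosen
purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical iteration cap; the commit message does not give a value. */
#define TOY_MAX_ITERATIONS 50

/* State names taken from the commit message; everything else is a toy. */
enum toy_state { S_TLV_TABLE_MOD_SENT, S_CLEAR_FLOWS, S_UPDATE_FLOWS };

struct toy_ofctrl {
    enum toy_state state;
    int queued_replies;     /* Stand-in for messages waiting to be received. */
};

/* "Run" action: in this toy, only S_CLEAR_FLOWS advances on its own. */
static void
toy_run_state(struct toy_ofctrl *c)
{
    if (c->state == S_CLEAR_FLOWS) {
        c->state = S_UPDATE_FLOWS;
    }
}

/* "Receive" action: consume one queued reply, if any.  A reply seen in
 * S_TLV_TABLE_MOD_SENT moves the machine to S_CLEAR_FLOWS. */
static bool
toy_recv(struct toy_ofctrl *c)
{
    assert(c->queued_replies >= 0);
    if (!c->queued_replies) {
        return false;
    }
    c->queued_replies--;
    if (c->state == S_TLV_TABLE_MOD_SENT) {
        c->state = S_CLEAR_FLOWS;
    }
    return true;
}

/* Pre-fix shape: run to a fixed point first, then handle receive events
 * until the state changes, and return without running the new state. */
static void
toy_run_old(struct toy_ofctrl *c)
{
    enum toy_state old_state;

    do {
        old_state = c->state;
        toy_run_state(c);
    } while (c->state != old_state);

    for (int i = 0; c->state == old_state && i < TOY_MAX_ITERATIONS; i++) {
        if (!toy_recv(c)) {
            break;
        }
    }
}

/* Post-fix shape: receive processing sits inside the loop, so a state
 * change caused by a received message is acted on in the same call. */
static void
toy_run_new(struct toy_ofctrl *c)
{
    bool progress = true;

    for (int i = 0; progress && i < TOY_MAX_ITERATIONS; i++) {
        enum toy_state old_state = c->state;

        toy_run_state(c);
        bool received = toy_recv(c);
        progress = received || c->state != old_state;
    }
}
```

Starting from S_TLV_TABLE_MOD_SENT with one queued reply, a single
call to toy_run_old() leaves the toy machine in S_CLEAR_FLOWS and it
takes a second call to reach S_UPDATE_FLOWS, while a single call to
toy_run_new() reaches S_UPDATE_FLOWS directly.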

I've run this through the same testing that I did with v1 (i.e.
rally to create 10 ports on each of 15 networks on a four-node
devstack cloud running tip of tree master everywhere):

- When I only have 3 projects (tenants), I get a statistically
  significant 6% reduction in the port_create API time measured
  by rally.

- When I have 15 projects, then the reduction in API time is
  almost (but not quite) statistically significant.

I consider the first result to be a good enough reason for an
Acked-by and a Tested-by, because I view it as helping at scale
(for OpenStack, the port density per project/tenant is a key
driver, so patches that help when that density is higher are
good things).

Acked-by: Ryan Moats <rmoats at us.ibm.com>
Tested-by: Ryan Moats <rmoats at us.ibm.com>


Having said that


