[ovs-dev] [PATCH] ovn pacemaker: Provide the option to configure inactivity probe value

Numan Siddique nusiddiq at redhat.com
Mon Oct 16 09:20:48 UTC 2017


On Sat, Oct 14, 2017 at 2:56 AM, Ben Pfaff <blp at ovn.org> wrote:

> On Fri, Oct 13, 2017 at 12:06:56PM -0400, Russell Bryant wrote:
> > On Fri, Oct 13, 2017 at 8:30 AM, Numan Siddique <nusiddiq at redhat.com> wrote:
> > > On Fri, Oct 13, 2017 at 6:05 AM, Andy Zhou <azhou at ovn.org> wrote:
> > >
> > >> Hi, Numan,
> > >>
> > >> I am curious why the default 5-second inactivity probe does not work.
> > >> Do you have more details?
> > >>
> > >> Does the glitch usually happen around the HA switchover? If this
> > >> happens during normal operation, then this is not an HA-specific
> > >> issue but an indication of some connectivity issues.
> > >>
> > >
> > > Hi Andy. This happens in an OpenStack deployment when the
> > > neutron-server is busy handling lots of API requests.
> > > A typical deployment has 3 controller nodes, with neutron-server
> > > running on each node. On each controller node, neutron-server starts
> > > around 10 - 12 neutron workers (which are separate processes). The
> > > number of API workers is a configuration option; if it is not
> > > configured, it normally equals the number of cores.
> > >
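A rough back-of-the-envelope sketch of where those connection counts come from. It assumes (hypothetically) that each neutron API worker holds one connection to the northbound ovsdb-server and that an unset worker count falls back to the CPU count, as described above. `effective_api_workers` and `estimated_nb_connections` are illustrative names, not neutron functions.

```python
import os
from typing import Optional


def effective_api_workers(configured: Optional[int]) -> int:
    """Workers per controller: the configured value, else the CPU count.

    Hypothetical helper; it mirrors the fallback described in the message,
    not neutron's actual option-handling code.
    """
    if configured is not None:
        return configured
    # os.cpu_count() can return None on unusual platforms; assume 1 then.
    return os.cpu_count() or 1


def estimated_nb_connections(controllers: int, workers_per_controller: int,
                             conns_per_worker: int = 1) -> int:
    """Total client connections seen by the northbound ovsdb-server.

    Assumes every worker process opens conns_per_worker connections and
    ignores any other clients of the database.
    """
    return controllers * workers_per_controller * conns_per_worker


if __name__ == "__main__":
    # Physical setup: 3 controllers x 12 workers each.
    print(estimated_nb_connections(3, effective_api_workers(12)))
    # VM setup, assuming (hypothetically) 5 workers per VM controller.
    print(estimated_nb_connections(3, effective_api_workers(5)))
```

With one connection per worker, 3 × 12 gives 36, in the same range as the ~40 connections observed on the physical deployment.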
> > > I have tested both a physical-node deployment and a virtual deployment
> > > (3 controllers running as VMs on a single node). The neutron workers
> > > open around 40 connections to the OVN northbound ovsdb-server in the
> > > physical deployment and around 15 in the virtual deployment.
> > > When neutron-server is loaded with many API requests, I have noticed
> > > that ovsdb-server drops connections when it doesn't get a reply to its
> > > echo request within 5 seconds. This leads to a lot of reconnections to
> > > the ovsdb-server, and neutron-server's responses become very slow. With
> > > this patch it seems to work fine.
> > >
> > > The issue is not caused by network problems. It is caused by the large
> > > number of connections from the neutron-server workers to the
> > > ovsdb-server, combined with the IDL clients failing to reply to the echo
> > > request within 5 seconds when neutron-server is loaded.
> >
> > We have had to disable the inactivity probe everywhere each time we
> > have done performance testing so far.
>
> Really, this seems like a bug (or inadequacy) in ovsdb-server.  It's
> pretty sad that ovsdb-server can't reply within 5 seconds


It's actually the OVSDB Python IDL client that is not able to reply within
5 seconds to the echo request from ovsdb-server.
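For context, the echo exchange itself is cheap. Per RFC 7047 (section 4.1.11), an echo request carries a `params` array and an `id`, and the reply returns the same array as `result`, with the same `id` and a null `error`. The sketch below builds that reply from a raw JSON message; `handle_echo` is a hypothetical helper, not part of the ovs Python library, and the request in the example is illustrative. The point is that a client can only send the reply once its code gets back to reading the socket, so a worker busy with API requests for longer than the probe window misses the deadline even though the reply costs almost nothing.

```python
import json
from typing import Optional


def handle_echo(raw: str) -> Optional[str]:
    """Return the JSON-RPC reply to an OVSDB "echo" request, else None.

    Hypothetical helper following RFC 7047 section 4.1.11: the reply
    echoes the request's params as its result and reuses the request id.
    """
    msg = json.loads(raw)
    if msg.get("method") != "echo" or "id" not in msg:
        return None  # not an echo request (or a notification without an id)
    reply = {"id": msg["id"], "result": msg.get("params", []), "error": None}
    return json.dumps(reply)


if __name__ == "__main__":
    request = '{"method": "echo", "params": [], "id": "echo"}'
    # prints {"id": "echo", "result": [], "error": null}
    print(handle_echo(request))
```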




> (maybe there's
> a 2x or 3x multiplier on the response time, I don't recall).  I hope
> that the clustered database does better here.
>
> That said, if in the real world we need 60 seconds for now, let's use it
> but remember that we should get our act together later.  (Maybe a
> comment would be helpful.)
>
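On the multiplier question: the `inactivity_probe` column of the OVN_Northbound `Connection` table is documented (in milliseconds) as the idle time after which the server sends a probe, with the connection treated as broken if no response arrives within the same additional amount of time; 0 disables probes. Under that two-phase rule, a client that stops servicing its socket is dropped only once its stall reaches roughly twice the probe interval. The sketch below is a simplified model of that rule, not ovsdb-server's code; `probe_deadlines` and `is_dropped` are hypothetical helpers, and times are in seconds for readability.

```python
from typing import Tuple


def probe_deadlines(last_activity_s: float,
                    probe_interval_s: float) -> Tuple[float, float]:
    """Return (echo_sent_at, disconnect_at) for an idle connection.

    Simplified two-phase model: after probe_interval_s of silence the server
    sends an echo request, then waits the same interval again for any reply.
    """
    echo_sent_at = last_activity_s + probe_interval_s
    disconnect_at = echo_sent_at + probe_interval_s
    return echo_sent_at, disconnect_at


def is_dropped(probe_interval_s: float, client_stall_s: float) -> bool:
    """True if a client that cannot read its socket for client_stall_s
    seconds (starting right after its last message) loses the connection.

    Hypothetical helper; a probe interval of 0 means probing is disabled.
    """
    if probe_interval_s == 0:
        return False
    echo_sent_at, disconnect_at = probe_deadlines(0.0, probe_interval_s)
    # The client answers once the echo has been sent *and* it is free again.
    reply_at = max(echo_sent_at, client_stall_s)
    return reply_at >= disconnect_at


if __name__ == "__main__":
    for interval in (5, 60):
        for stall in (4, 9, 12, 90):
            print(f"probe={interval}s stall={stall}s "
                  f"dropped={is_dropped(interval, stall)}")
```

In this model a 5-second probe tolerates stalls just under 10 seconds, while 60 seconds tolerates stalls of up to about 2 minutes. It ignores the client's own probing of the server and any other traffic that resets the idle timer.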

