[ovs-dev] [PATCH] TCP Stream: Use TCP keepalive by default

Tue Nov 23 17:44:38 UTC 2021

On Tue, Nov 16, 2021 at 09:54:54PM +0100, Ilya Maximets wrote:
> On 10/25/21 16:36, Michael Santana wrote:
> > If a client disables jsonrpc probes, it fails to detect that the
> > connection to the server has dropped. To work around such a case,
> > enable TCP keepalive by default.
> > 
> > Signed-off-by: Michael Santana <msantana at redhat.com>
> > ---
> 
> Hi, Michael.  Thanks for the patch.  But I'm not sure why we need this,
> at least in its current form.
> 
> The default keepalive configuration on modern systems is usually set to
> around 2 hours, so the user might have 2 hours of downtime and not even
> notice.
> 
> TCP keepalives might be useful when the user knows that the application
> may not reply for a long time and therefore has to set the inactivity
> probe to a higher value.  In this case, we could detect a connection
> failure with TCP keepalive using a shorter interval.
> 
> Having TCP keepalive configured to a very long interval (which is the
> system default), IMHO, doesn't make a lot of sense.
> 
> I would also argue that inactivity probes should never be disabled, but
> set to a higher value instead, because TCP keepalive will not be able
> to detect a hung application, e.g., a deadlock.
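To illustrate the point about hung applications: a TCP keepalive probe is answered by the peer's kernel, so it keeps succeeding even when the peer process is deadlocked, whereas an application-level echo has to be answered by the application itself. Below is a minimal, hypothetical sketch of such an inactivity probe in Python. The class and method names are invented for illustration and are not taken from the OVS reconnect or jsonrpc code. After `interval` seconds without receiving anything it asks for an echo to be sent, and if nothing arrives within another `interval` it reports that the connection should be dropped. An interval of 0 disables probing, mirroring the "set to 0" case discussed in this thread.

```python
class InactivityProbe:
    """Hypothetical application-level inactivity probe (illustrative only).

    Times are plain seconds supplied by the caller, e.g. from time.monotonic().
    """

    def __init__(self, interval: float, now: float = 0.0) -> None:
        self.interval = interval          # 0 disables probing
        self.last_rx = now                # time anything was last received
        self.probe_sent_at = None         # time an echo request was sent, if pending

    def on_receive(self, now: float) -> None:
        # Any message from the peer's application proves it is alive.
        self.last_rx = now
        self.probe_sent_at = None

    def poll(self, now: float) -> str:
        """Return "ok", "send_probe", or "disconnect"."""
        if self.interval <= 0:
            return "ok"                   # probing disabled: a hung peer goes unnoticed
        if self.probe_sent_at is None:
            if now - self.last_rx >= self.interval:
                self.probe_sent_at = now  # connection idle: ask for an echo
                return "send_probe"
            return "ok"
        if now - self.probe_sent_at >= self.interval:
            return "disconnect"           # echo unanswered: peer dead or hung
        return "ok"
```

Because a deadlocked peer never answers the echo, `poll()` eventually returns `"disconnect"`, while TCP keepalive probes on the same connection would still be acknowledged by the peer's kernel.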

That's a good point. My concern is that either we will need
to stop allowing it to be set to 0, which can break upgrades,
or we will need to silently set a large interval instead, which
is not good program behavior IMHO.

Also, it sounds like not allowing the probes to be disabled
actually prevents the user from shooting himself in the foot,
but I don't know for sure whether disabling them has a valid
use case.

Even if we decide to improve the built-in inactivity probes,
it still seems like a good idea to enable TCP keepalive as
a second layer of fault detection.
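As a rough sketch of what such a second layer could look like, assuming Linux: the snippet below enables SO_KEEPALIVE on a socket and, where the platform exposes them, overrides the system-wide defaults with per-socket TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT values. The function names and the 30/10/3 values are illustrative assumptions, not what the patch does. The worst-case time to notice a dead peer is roughly idle + interval * count; with the usual Linux defaults (net.ipv4.tcp_keepalive_time = 7200, tcp_keepalive_intvl = 75, tcp_keepalive_probes = 9) that is 7875 s, a little over two hours, which is the interval Ilya objects to above.

```python
import socket


def keepalive_detection_time(idle: int, interval: int, count: int) -> int:
    """Approximate worst-case seconds before keepalive drops a dead peer:
    `idle` seconds of silence, then `count` unanswered probes `interval` apart."""
    return idle + interval * count


def enable_tcp_keepalive(sock: socket.socket, idle: int = 30,
                         interval: int = 10, count: int = 3) -> int:
    """Enable keepalive on `sock` with per-socket tuning (illustrative values).

    TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT are Linux option names; if the
    platform's socket module lacks one, it is skipped and the system-wide
    default applies. Returns the approximate worst-case detection time for
    the requested values.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return keepalive_detection_time(idle, interval, count)
```

With the values above, a dead peer or broken network path would be noticed after roughly 60 s, independently of how long the application-level inactivity probe is set to. A hung peer application would still go unnoticed by this layer alone.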

BTW, the patch is missing the tag below:
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1988461

-- 
fbl