[ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in userspace conntrack

Darrell Ball dlu998 at gmail.com
Wed Jun 5 16:35:44 UTC 2019


On Tue, Jun 4, 2019 at 11:41 PM 姜立东 <jianglidong3 at jd.com> wrote:

> *comments* *inline*.
>
>
>
> *发件人:* Darrell Ball <dlu998 at gmail.com>
> *发送时间:* 2019年6月5日 13:31
> *收件人:* 姜立东 <jianglidong3 at jd.com>
> *抄送:* dev at openvswitch.org; 王志克 <wangzhike at jd.com>
> *主题:* Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in
> userspace conntrack
>
>
>
>
>
>
>
> On Tue, Jun 4, 2019 at 7:13 PM 姜立东 <jianglidong3 at jd.com> wrote:
>
> Please check *comments* *inline*.
>
>
>
> BR, Lidong
>
>
>
> *发件人:* Darrell Ball <dlu998 at gmail.com>
> *发送时间:* 2019年6月4日 23:33
> *收件人:* 姜立东 <jianglidong3 at jd.com>
> *抄送:* dev at openvswitch.org; 王志克 <wangzhike at jd.com>
> *主题:* Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in
> userspace conntrack
>
>
>
>
>
>
>
> On Mon, Jun 3, 2019 at 7:59 PM 姜立东 <jianglidong3 at jd.com> wrote:
>
> Hi Darrell,
>
>
>
> The “tighter checking” on TCP_RST in kernel, do you mean processes under “
> TCP_CONNTRACK_CLOSE” case?
>
> There is only one situation(when th->seq is less than peer td_maxack, as
> mentioned) may leads to explicitly –NF_ACCEPT, kernel implementation just
> lets most TCP_RST packets pass when TCP liberal mode is enabled.
>
>
>
> yep, this is the tighter checking the kernel does for RSTs in general
>
> Userspace code has a tighter check for RSTs
>
> *Why not follow kernel tight checking then? **J*
>
>
>
> Meanwhile, I think the purpose of TCP liberal mode is to skip TCP SEQ
> check in conntrack, when SEQ out of sync is normal situation in some
> application scenarios.
>
> For example, TCP_RST packets can have very different SEQ than what is
> captured in conntrack in hardware-offloading scenario, because most of
> packets are forwarded by hardware.
>
>
>
> TCP resets are used in selected valid scenarios, but are more often
> associated with DOS scenarios, so they are a special case
>
> Removing most or all protection from reset attacks seems irresponsible.
>
>
>
> *TCP reset happens in few scenarios in theory, but it can happen
> frequently in big scale applicant scenario with huge connections, some
> software clients use sensible shorter wait time, they may close socket
> before server response. *
>
> *So it depends on TCP RST function to free server resource, it is
> necessary requirement just as what we can observer in our network every
> day, that is why we need to keep TCP_RST function working either in kernel
> conntrack and in user space conntrack.*
>
>
>
> That is common; however, the question is why not pass through the RST in
> offload mode in this scenario and let conntrack do cleanup on reuse,
> if/when necessary ?
>
>
>
> *If pass through TCP_RST in hardware, it leaves garbage information in OVS
> conntrack, while this connection has been cleared in both endpoints.
> When/if same port is reused, this garbage information may cause other
> problems.*
>

'Reuse' is a pre-existing capability that handles this kind of case and the
summary behavior in this use case will be similar.


> *Also such solution makes OVS conntrack connection information less
> valuable, once flow is offloaded, it’s information is meaningless.*
>

>From this point in the lifecycle of the connection, there is just a little
more to do anyways, 'normally'.

That being said, there can be added value features that may come into play
here in the non-offload path.


>
>
>
>
> *Meanwhile, kernel implementation of this feature has been there years, we
> don**’t see serious security issue is involved.*
>
>
>
> :-)
>
>
>
> *Why don**’t we do less limitation, leave the decision to user of this
> mode, considering TCP liberal is mostly related to special deployment
> scenario.*
>
>
>
> 'no-tcp-seq-chk' mode is a related feature and can be easily incrementally
> added or in lieu of 'liberal' mode; this will do as you require.
>
>
>
>
>
>
>
>
>
> Hardware offload implementations can be done well or poorly; catering to
> poor implementations at the cost to every other use case
>
> may not be the best approach.
>
>
>
> *No matter good or bad design, TCP RST is needed to let conntrack know
> that this connection is closing, and recycle this connection in short time*
> .
>
>
>
> In this scenario, TCP_RST can work normally with kernel TCP liberal mode
> because it is passed to destination, but won’t work with user space
> conntrack with current change due to being dropped.
>
>
>
> I think tighter check is good for normal situation, but may be conflict
> with the purpose of TCP liberal mode by putting more limitations.
>
>
>
> BR, Lidong
>
>
>
> *发件人:* Darrell Ball <dlu998 at gmail.com>
> *发送时间:* 2019年6月3日 23:53
> *收件人:* 姜立东 <jianglidong3 at jd.com>
> *抄送:* dev at openvswitch.org; 王志克 <wangzhike at jd.com>
> *主题:* Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in
> userspace conntrack
>
>
>
>
>
>
>
> On Mon, Jun 3, 2019 at 12:21 AM 姜立东 <jianglidong3 at jd.com> wrote:
>
> Hi Darrell,
>
>
>
> By checking latest patch, we notice it has inconsistent behavior on
> TCP_RST packet with
>
> kernel tcp_liberal option.
>
> In kernel, TCP_RST packet is only checked for TCP_CONNTRACK_CLOSE state,
>
>
>
> TCP_CONNTRACK_CLOSE is the 'next state' on receiving RST
>
>
>
>
>
> TCP_RST
>
> packet is dropped if its seq is less than peer acked.
>
> In other cases, TCP_RST packets are processed in  tcp_in_window() as
> other packets, if
>
> tcp_liberal is enabled, they always are accepted.
>
>
>
> On receipt of a RST in any present state, tighter checking is done; this
> checking is done
>
> before *tcp_in_window*
> <https://elixir.bootlin.com/linux/v4.9.12/ident/tcp_in_window>() is
> executed.
>
>
>
> Do you mean “tighter checking” is processes under “TCP_CONNTRACK_CLOSE”
> case? There is
>
>
>
> Please refer to tcp_packet() and tcp_in_window() in kernel conntrack.
>
>
>
> Current implementation may cause TCP_RST packet is marked as INVALID by
> conntrack
>
> due to old seq info in conntrack in hardware-offloading scenario.
>
> So that we suggest to move conntrack_get_tcp_liberal(ct) up in this check
> to include
>
> TCP_RST packet as below,
>
>
>
> I think the tighter checking for RSTs  I am proposing is generally
> consistent, although BSD and
>
> Linux are somewhat different.
>
>
>
>
>
>
>
> -    if (SEQ_GEQ(src->seqhi, end)
>
> -        /* Last octet inside other's window space */
>
> -        && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))
>
> -        /* Retrans: not more than one window back */
>
> -        && (ackskew >= -MAXACKWINDOW)
>
> -        /* Acking not more than one reassembled fragment backwards */
>
> -        && (ackskew <= (MAXACKWINDOW << sws))
>
> -        /* Acking not more than one window forward */
>
> +    if ((SEQ_GEQ(src->seqhi, end)
>
> +          /* Last octet inside other's window space */
>
> +          && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))
>
> +          /* Retrans: not more than one window back */
>
> +          && ackskew >= -MAXACKWINDOW
>
> +          /* Acking not more than one reassembled fragment backwards */
>
> +          && ackskew <= (MAXACKWINDOW << sws))
>
> +          /* Acking not more than one window forward */
>
>          && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo
>
> -            || (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo))) {
>
> +             || (orig_seq == src->seqlo + 1)
>
> +             || (orig_seq + 1 == src->seqlo)))
>
> +        || conntrack_get_tcp_liberal(ct)) {
>
>
>
>
>
>
>
> Thanks,
>
> Lidong
>
>
>
> *发件人:* Darrell Ball <dlu998 at gmail.com>
> *发送时间:* 2019年6月1日 2:11
> *收件人:* 姜立东 <jianglidong3 at jd.com>
> *抄送:* dev at openvswitch.org; 王志克 <wangzhike at jd.com>
> *主题:* Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in
> userspace conntrack
>
>
>
>
>
>
>
> On Fri, May 31, 2019 at 9:40 AM Darrell Ball <dlu998 at gmail.com> wrote:
>
>
>
>
>
> On Fri, May 31, 2019 at 2:42 AM 姜立东 <jianglidong3 at jd.com> wrote:
>
> Hi Darrell,
>
>
>
> Thanks for prompt action.
>
>
>
> I think there are 2 main differences in your change,
>
> 1. Replace Open_vswitch.other_config by dpctl CLI command.
>
> 2. One more case to check tcp_liberal as below.
>
> @@ -333,10 +335,12 @@ tcp_conn_update(struct conntrack *ct, struct conn
> *conn_,
>
>      } else if ((dst->state < CT_DPIF_TCPS_SYN_SENT
>
>                  || dst->state >= CT_DPIF_TCPS_FIN_WAIT_2
>
>                  || src->state >= CT_DPIF_TCPS_FIN_WAIT_2)
>
> -               && SEQ_GEQ(src->seqhi + MAXACKWINDOW, end)
>
> -               /* Within a window forward of the originating packet */
>
> -               && SEQ_GEQ(seq, src->seqlo - MAXACKWINDOW)) {
>
> -               /* Within a window backward of the originating packet */
>
> +               && ((SEQ_GEQ(src->seqhi + MAXACKWINDOW, end)
>
> +                    /* Within a window forward of the originating packet
> */
>
> +                    && SEQ_GEQ(seq, src->seqlo - MAXACKWINDOW))
>
> +                    /* Within a window backward of the originating packet
> */
>
> +                   || (conntrack_get_tcp_liberal(ct)
>
> +                       && (tcp_flags & TCP_RST) == 0)))
>
> For item 1, it is fine for us.
>
> For item 2, we didn’t actually hit into this conditional branch in our
> application, but considering conntrack is in the middle of this path,
>
> it is reasonable to add this check and leave SEQ validation  to endpoints’
> local TCP state machine. So it looks good to us.
>
>
>
> Lidong
>
>
>
> I just noticed you were referring to the "else if" condition above
>
> This change had been dropped in v2 since it is not needed, here:
>
>
>
> https://patchwork.ozlabs.org/patch/1107761/
>
>
>
> The change in the 'if' condition check is the main algorithm change; you
> might want to
>
> compare it to the proposed change that you had sent out.
>
>
>
> Thanks Darrell
>
>
>
>
>
>
>
> *diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c*
>
> *index 397aca1..e95d2f3 100644*
>
> *--- a/lib/conntrack-tcp.c*
>
> *+++ b/lib/conntrack-tcp.c*
>
> *@@ -272,16 +272,18 @@*  tcp_conn_update(struct conntrack *ct, struct conn *conn_,
>
>       int ackskew = check_ackskew ? dst->seqlo - ack : 0;
>
>  #define MAXACKWINDOW (0xffff + 1500)    /* 1500 is an arbitrary fudge factor */
>
> -    if (SEQ_GEQ(src->seqhi, end)
>
> -        /* Last octet inside other's window space */
>
> -        && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))
>
> -        /* Retrans: not more than one window back */
>
> -        && (ackskew >= -MAXACKWINDOW)
>
> -        /* Acking not more than one reassembled fragment backwards */
>
> -        && (ackskew <= (MAXACKWINDOW << sws))
>
> -        /* Acking not more than one window forward */
>
> +    if (((SEQ_GEQ(src->seqhi, end)
>
> +          /* Last octet inside other's window space */
>
> +          && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))
>
> +          /* Retrans: not more than one window back */
>
> +          && ackskew >= -MAXACKWINDOW
>
> +          /* Acking not more than one reassembled fragment backwards */
>
> +          && ackskew <= (MAXACKWINDOW << sws))
>
> +         || conntrack_get_tcp_liberal(ct))
>
> +          /* Acking not more than one window forward */
>
>          && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo
>
> -            || (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo))) {
>
> +             || (orig_seq == src->seqlo + 1)
>
> +             || (orig_seq + 1 == src->seqlo))) {
>
>          /* Require an exact/+1 sequence match on resets when possible */
>
>           /* update max window */
>
>
>
>
>
>
>
>
>
>
>
> Thanks Lidong
>
>
>
> The algorithm I used adheres to the liberal mode definition.
>
>
>
> For future reference, another important aspect is test support.
>
>
>
> Darrell
>
>
>
>
>
> BR,
>
> Lidong
>
>
>
> *发件人:* Darrell Ball <dlu998 at gmail.com>
> *发送时间:* 2019年5月30日 14:51
> *收件人:* 姜立东 <jianglidong3 at jd.com>
> *抄送:* dev at openvswitch.org; 王志克 <wangzhike at jd.com>
> *主题:* Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in
> userspace conntrack
>
>
>
> Hi Lidong/Zhike
>
>
>
> I sent an alternative implementation:
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359364.html
>
>
>
> Pls take a look and check if it meets your requirements.
>
>
>
> Thanks Darrell
>
>
>
> On Fri, May 24, 2019 at 10:11 AM Darrell Ball <dlu998 at gmail.com> wrote:
>
> Thanks for the patch Lidong/Zhike
>
>
>
> I took an initial look and will send a response next week.
>
>
>
> Darrell
>
>
>
> On Tue, May 21, 2019 at 8:53 PM 姜立东 <jianglidong3 at jd.com> wrote:
>
> From 3ce112684921bca74839e109fda91848aa024a54 Mon Sep 17 00:00:00 2001
> From: Jiang Lidong <jianglidong3 at jd.com>
> Date: Wed, 22 May 2019 11:21:34 +0800
> Subject: [PATCH] conntrack: add tcp_in_liberal option in userspace
> conntrack
>
> Adding similar cp_in_liberal option in userspace conntrack as
> kernel conntrack does to skip seq check on tcp connection.
> It prevents packet is marked as INVALID by stable seq
> info in conntrack connection.
>
> This option can help to make traffic survive in hardware
> offloading cases, especially when traffic is being moved
> back to software path from hardware forwarding engine.
>
> Signed-off-by: Lidong Jiang <jianglidong3 at jd.com>
> Signed-off-by: Zhike Wang <wangzhike at jd.com>
>
> ---
> lib/conntrack-private.h | 2 ++
> lib/conntrack-tcp.c     | 5 +++--
> lib/conntrack.c         | 6 ++++++
> lib/conntrack.h         | 4 +++-
> lib/dpif-netdev.c       | 6 ++++++
> 5 files changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
> index 51b7d7f..9bc99cd 100644
> --- a/lib/conntrack-private.h
> +++ b/lib/conntrack-private.h
> @@ -172,6 +172,8 @@ struct conntrack {
>      /* Fragmentation handling context. */
>      struct ipf *ipf;
> +
> +    bool tcp_be_liberal;
> };
>  /* Lock acquisition order:
> diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c
> index 397aca1..61abafb 100644
> --- a/lib/conntrack-tcp.c
> +++ b/lib/conntrack-tcp.c
> @@ -272,7 +272,7 @@ tcp_conn_update(struct conntrack *ct, struct conn
> *conn_,
>      int ackskew = check_ackskew ? dst->seqlo - ack : 0;
> #define MAXACKWINDOW (0xffff + 1500)    /* 1500 is an arbitrary fudge
> factor */
> -    if (SEQ_GEQ(src->seqhi, end)
> +    if ((SEQ_GEQ(src->seqhi, end)
>          /* Last octet inside other's window space */
>          && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))
>          /* Retrans: not more than one window back */
> @@ -281,7 +281,8 @@ tcp_conn_update(struct conntrack *ct, struct conn
> *conn_,
>          && (ackskew <= (MAXACKWINDOW << sws))
>          /* Acking not more than one window forward */
>          && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo
> -            || (orig_seq == src->seqlo + 1) || (orig_seq + 1 ==
> src->seqlo))) {
> +            || (orig_seq == src->seqlo + 1) || (orig_seq + 1 ==
> src->seqlo)))
> +            || (ct->tcp_be_liberal)) {
>          /* Require an exact/+1 sequence match on resets when possible */
>          /* update max window */
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 6711f5e..bd92710 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -2282,6 +2282,12 @@ conntrack_ipf_ctx(struct conntrack *ct)
>      return ct->ipf;
> }
> +void
> +conntrack_set_tcp_be_liberal(struct conntrack *ct, bool enabled)
> +{
> +    ct->tcp_be_liberal = enabled;
> +}
> +
> int
> conntrack_dump_start(struct conntrack *ct, struct conntrack_dump *dump,
>                       const uint16_t *pzone, int *ptot_bkts)
> diff --git a/lib/conntrack.h b/lib/conntrack.h
> index 2012150..b8d799d 100644
> --- a/lib/conntrack.h
> +++ b/lib/conntrack.h
> @@ -119,5 +119,7 @@ int conntrack_set_maxconns(struct conntrack *ct,
> uint32_t maxconns);
> int conntrack_get_maxconns(struct conntrack *ct, uint32_t *maxconns);
> int conntrack_get_nconns(struct conntrack *ct, uint32_t *nconns);
> struct ipf *conntrack_ipf_ctx(struct conntrack *ct);
> -
> +
> +void conntrack_set_tcp_be_liberal(struct conntrack *ct, bool enabled);
> +
> #endif /* conntrack.h */
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 5a6f2ab..ae6a18e 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3823,6 +3823,12 @@ dpif_netdev_set_config(struct dpif *dpif, const
> struct smap *other_config)
>      uint32_t tx_flush_interval, cur_tx_flush_interval;
>      uint64_t rebalance_intvl;
> +    bool tcp_be_liberal = smap_get_bool(other_config,
> +                                        "conntrack_tcp_be_liberal",
> +                                        false);
> +
> +    conntrack_set_tcp_be_liberal(&dp->conntrack, tcp_be_liberal);
> +
>      tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
>                                       DEFAULT_TX_FLUSH_INTERVAL);
>      atomic_read_relaxed(&dp->tx_flush_interval, &cur_tx_flush_interval);
> --
> 1.8.3.1
>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>


More information about the dev mailing list