[ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in userspace conntrack

Darrell Ball dlu998 at gmail.com
Wed Jun 5 17:11:53 UTC 2019


On Wed, Jun 5, 2019 at 9:35 AM Darrell Ball <dlu998 at gmail.com> wrote:

>
>
> On Tue, Jun 4, 2019 at 11:41 PM 姜立东 <jianglidong3 at jd.com> wrote:
>
>> *comments* *inline*.
>>
>>
>>
>> *发件人:* Darrell Ball <dlu998 at gmail.com>
>> *发送时间:* 2019年6月5日 13:31
>> *收件人:* 姜立东 <jianglidong3 at jd.com>
>> *抄送:* dev at openvswitch.org; 王志克 <wangzhike at jd.com>
>> *主题:* Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in
>> userspace conntrack
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Jun 4, 2019 at 7:13 PM 姜立东 <jianglidong3 at jd.com> wrote:
>>
>> Please check *comments* *inline*.
>>
>>
>>
>> BR, Lidong
>>
>>
>>
>> *发件人:* Darrell Ball <dlu998 at gmail.com>
>> *发送时间:* 2019年6月4日 23:33
>> *收件人:* 姜立东 <jianglidong3 at jd.com>
>> *抄送:* dev at openvswitch.org; 王志克 <wangzhike at jd.com>
>> *主题:* Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in
>> userspace conntrack
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Jun 3, 2019 at 7:59 PM 姜立东 <jianglidong3 at jd.com> wrote:
>>
>> Hi Darrell,
>>
>>
>>
>> The “tighter checking” on TCP_RST in kernel, do you mean processes under
>> “TCP_CONNTRACK_CLOSE” case?
>>
>> There is only one situation(when th->seq is less than peer td_maxack, as
>> mentioned) may leads to explicitly –NF_ACCEPT, kernel implementation
>> just lets most TCP_RST packets pass when TCP liberal mode is enabled.
>>
>>
>>
>> yep, this is the tighter checking the kernel does for RSTs in general
>>
>> Userspace code has a tighter check for RSTs
>>
>> *Why not follow kernel tight checking then? **J*
>>
>>
>>
>> Meanwhile, I think the purpose of TCP liberal mode is to skip TCP SEQ
>> check in conntrack, when SEQ out of sync is normal situation in some
>> application scenarios.
>>
>> For example, TCP_RST packets can have very different SEQ than what is
>> captured in conntrack in hardware-offloading scenario, because most of
>> packets are forwarded by hardware.
>>
>>
>>
>> TCP resets are used in selected valid scenarios, but are more often
>> associated with DOS scenarios, so they are a special case
>>
>> Removing most or all protection from reset attacks seems irresponsible.
>>
>>
>>
>> *TCP reset happens in few scenarios in theory, but it can happen
>> frequently in big scale applicant scenario with huge connections, some
>> software clients use sensible shorter wait time, they may close socket
>> before server response. *
>>
>> *So it depends on TCP RST function to free server resource, it is
>> necessary requirement just as what we can observer in our network every
>> day, that is why we need to keep TCP_RST function working either in kernel
>> conntrack and in user space conntrack.*
>>
>>
>>
>> That is common; however, the question is why not pass through the RST in
>> offload mode in this scenario and let conntrack do cleanup on reuse,
>> if/when necessary ?
>>
>>
>>
>> *If pass through TCP_RST in hardware, it leaves garbage information in
>> OVS conntrack, while this connection has been cleared in both endpoints.
>> When/if same port is reused, this garbage information may cause other
>> problems.*
>>
>
> 'Reuse' is a pre-existing capability that handles this kind of case and
> the summary behavior in this use case will be similar.
>

In retrospect, the reuse capability may not meet your requirements in all
cases.
Anyways, your second point below is sufficient.

I will modify the patch for 'no-tcp-seq-chk'



>
>
>> *Also such solution makes OVS conntrack connection information less
>> valuable, once flow is offloaded, it’s information is meaningless.*
>>
>
> From this point in the lifecycle of the connection, there is just a little
> more to do anyways, 'normally'.
>
> That being said, there can be added value features that may come into play
> here in the non-offload path.
>
>
>>
>>
>>
>>
>> *Meanwhile, kernel implementation of this feature has been there years,
>> we don**’t see serious security issue is involved.*
>>
>>
>>
>> :-)
>>
>>
>>
>> *Why don**’t we do less limitation, leave the decision to user of this
>> mode, considering TCP liberal is mostly related to special deployment
>> scenario.*
>>
>>
>>
>> 'no-tcp-seq-chk' mode is a related feature and can be easily
>> incrementally added or in lieu of 'liberal' mode; this will do as you
>> require.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Hardware offload implementations can be done well or poorly; catering to
>> poor implementations at the cost to every other use case
>>
>> may not be the best approach.
>>
>>
>>
>> *No matter good or bad design, TCP RST is needed to let conntrack know
>> that this connection is closing, and recycle this connection in short time*
>> .
>>
>>
>>
>> In this scenario, TCP_RST can work normally with kernel TCP liberal mode
>> because it is passed to destination, but won’t work with user space
>> conntrack with current change due to being dropped.
>>
>>
>>
>> I think tighter check is good for normal situation, but may be conflict
>> with the purpose of TCP liberal mode by putting more limitations.
>>
>>
>>
>> BR, Lidong
>>
>>
>>
>> *发件人:* Darrell Ball <dlu998 at gmail.com>
>> *发送时间:* 2019年6月3日 23:53
>> *收件人:* 姜立东 <jianglidong3 at jd.com>
>> *抄送:* dev at openvswitch.org; 王志克 <wangzhike at jd.com>
>> *主题:* Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in
>> userspace conntrack
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Jun 3, 2019 at 12:21 AM 姜立东 <jianglidong3 at jd.com> wrote:
>>
>> Hi Darrell,
>>
>>
>>
>> By checking latest patch, we notice it has inconsistent behavior on
>> TCP_RST packet with
>>
>> kernel tcp_liberal option.
>>
>> In kernel, TCP_RST packet is only checked for TCP_CONNTRACK_CLOSE state,
>>
>>
>>
>> TCP_CONNTRACK_CLOSE is the 'next state' on receiving RST
>>
>>
>>
>>
>>
>> TCP_RST
>>
>> packet is dropped if its seq is less than peer acked.
>>
>> In other cases, TCP_RST packets are processed in  tcp_in_window() as
>> other packets, if
>>
>> tcp_liberal is enabled, they always are accepted.
>>
>>
>>
>> On receipt of a RST in any present state, tighter checking is done; this
>> checking is done
>>
>> before *tcp_in_window*
>> <https://elixir.bootlin.com/linux/v4.9.12/ident/tcp_in_window>() is
>> executed.
>>
>>
>>
>> Do you mean “tighter checking” is processes under “TCP_CONNTRACK_CLOSE”
>> case? There is
>>
>>
>>
>> Please refer to tcp_packet() and tcp_in_window() in kernel conntrack.
>>
>>
>>
>> Current implementation may cause TCP_RST packet is marked as INVALID by
>> conntrack
>>
>> due to old seq info in conntrack in hardware-offloading scenario.
>>
>> So that we suggest to move conntrack_get_tcp_liberal(ct) up in this
>> check to include
>>
>> TCP_RST packet as below,
>>
>>
>>
>> I think the tighter checking for RSTs  I am proposing is generally
>> consistent, although BSD and
>>
>> Linux are somewhat different.
>>
>>
>>
>>
>>
>>
>>
>> -    if (SEQ_GEQ(src->seqhi, end)
>>
>> -        /* Last octet inside other's window space */
>>
>> -        && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))
>>
>> -        /* Retrans: not more than one window back */
>>
>> -        && (ackskew >= -MAXACKWINDOW)
>>
>> -        /* Acking not more than one reassembled fragment backwards */
>>
>> -        && (ackskew <= (MAXACKWINDOW << sws))
>>
>> -        /* Acking not more than one window forward */
>>
>> +    if ((SEQ_GEQ(src->seqhi, end)
>>
>> +          /* Last octet inside other's window space */
>>
>> +          && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))
>>
>> +          /* Retrans: not more than one window back */
>>
>> +          && ackskew >= -MAXACKWINDOW
>>
>> +          /* Acking not more than one reassembled fragment backwards */
>>
>> +          && ackskew <= (MAXACKWINDOW << sws))
>>
>> +          /* Acking not more than one window forward */
>>
>>          && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo
>>
>> -            || (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo))) {
>>
>> +             || (orig_seq == src->seqlo + 1)
>>
>> +             || (orig_seq + 1 == src->seqlo)))
>>
>> +        || conntrack_get_tcp_liberal(ct)) {
>>
>>
>>
>>
>>
>>
>>
>> Thanks,
>>
>> Lidong
>>
>>
>>
>> *发件人:* Darrell Ball <dlu998 at gmail.com>
>> *发送时间:* 2019年6月1日 2:11
>> *收件人:* 姜立东 <jianglidong3 at jd.com>
>> *抄送:* dev at openvswitch.org; 王志克 <wangzhike at jd.com>
>> *主题:* Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in
>> userspace conntrack
>>
>>
>>
>>
>>
>>
>>
>> On Fri, May 31, 2019 at 9:40 AM Darrell Ball <dlu998 at gmail.com> wrote:
>>
>>
>>
>>
>>
>> On Fri, May 31, 2019 at 2:42 AM 姜立东 <jianglidong3 at jd.com> wrote:
>>
>> Hi Darrell,
>>
>>
>>
>> Thanks for prompt action.
>>
>>
>>
>> I think there are 2 main differences in your change,
>>
>> 1. Replace Open_vswitch.other_config by dpctl CLI command.
>>
>> 2. One more case to check tcp_liberal as below.
>>
>> @@ -333,10 +335,12 @@ tcp_conn_update(struct conntrack *ct, struct conn
>> *conn_,
>>
>>      } else if ((dst->state < CT_DPIF_TCPS_SYN_SENT
>>
>>                  || dst->state >= CT_DPIF_TCPS_FIN_WAIT_2
>>
>>                  || src->state >= CT_DPIF_TCPS_FIN_WAIT_2)
>>
>> -               && SEQ_GEQ(src->seqhi + MAXACKWINDOW, end)
>>
>> -               /* Within a window forward of the originating packet */
>>
>> -               && SEQ_GEQ(seq, src->seqlo - MAXACKWINDOW)) {
>>
>> -               /* Within a window backward of the originating packet */
>>
>> +               && ((SEQ_GEQ(src->seqhi + MAXACKWINDOW, end)
>>
>> +                    /* Within a window forward of the originating packet
>> */
>>
>> +                    && SEQ_GEQ(seq, src->seqlo - MAXACKWINDOW))
>>
>> +                    /* Within a window backward of the originating
>> packet */
>>
>> +                   || (conntrack_get_tcp_liberal(ct)
>>
>> +                       && (tcp_flags & TCP_RST) == 0)))
>>
>> For item 1, it is fine for us.
>>
>> For item 2, we didn’t actually hit into this conditional branch in our
>> application, but considering conntrack is in the middle of this path,
>>
>> it is reasonable to add this check and leave SEQ validation  to endpoints
>> ’ local TCP state machine. So it looks good to us.
>>
>>
>>
>> Lidong
>>
>>
>>
>> I just noticed you were referring to the "else if" condition above
>>
>> This change had been dropped in v2 since it is not needed, here:
>>
>>
>>
>> https://patchwork.ozlabs.org/patch/1107761/
>>
>>
>>
>> The change in the 'if' condition check is the main algorithm change; you
>> might want to
>>
>> compare it to the proposed change that you had sent out.
>>
>>
>>
>> Thanks Darrell
>>
>>
>>
>>
>>
>>
>>
>> *diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c*
>>
>> *index 397aca1..e95d2f3 100644*
>>
>> *--- a/lib/conntrack-tcp.c*
>>
>> *+++ b/lib/conntrack-tcp.c*
>>
>> *@@ -272,16 +272,18 @@*  tcp_conn_update(struct conntrack *ct, struct conn *conn_,
>>
>>       int ackskew = check_ackskew ? dst->seqlo - ack : 0;
>>
>>  #define MAXACKWINDOW (0xffff + 1500)    /* 1500 is an arbitrary fudge factor */
>>
>> -    if (SEQ_GEQ(src->seqhi, end)
>>
>> -        /* Last octet inside other's window space */
>>
>> -        && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))
>>
>> -        /* Retrans: not more than one window back */
>>
>> -        && (ackskew >= -MAXACKWINDOW)
>>
>> -        /* Acking not more than one reassembled fragment backwards */
>>
>> -        && (ackskew <= (MAXACKWINDOW << sws))
>>
>> -        /* Acking not more than one window forward */
>>
>> +    if (((SEQ_GEQ(src->seqhi, end)
>>
>> +          /* Last octet inside other's window space */
>>
>> +          && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))
>>
>> +          /* Retrans: not more than one window back */
>>
>> +          && ackskew >= -MAXACKWINDOW
>>
>> +          /* Acking not more than one reassembled fragment backwards */
>>
>> +          && ackskew <= (MAXACKWINDOW << sws))
>>
>> +         || conntrack_get_tcp_liberal(ct))
>>
>> +          /* Acking not more than one window forward */
>>
>>          && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo
>>
>> -            || (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo))) {
>>
>> +             || (orig_seq == src->seqlo + 1)
>>
>> +             || (orig_seq + 1 == src->seqlo))) {
>>
>>          /* Require an exact/+1 sequence match on resets when possible */
>>
>>           /* update max window */
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Thanks Lidong
>>
>>
>>
>> The algorithm I used adheres to the liberal mode definition.
>>
>>
>>
>> For future reference, another important aspect is test support.
>>
>>
>>
>> Darrell
>>
>>
>>
>>
>>
>> BR,
>>
>> Lidong
>>
>>
>>
>> *发件人:* Darrell Ball <dlu998 at gmail.com>
>> *发送时间:* 2019年5月30日 14:51
>> *收件人:* 姜立东 <jianglidong3 at jd.com>
>> *抄送:* dev at openvswitch.org; 王志克 <wangzhike at jd.com>
>> *主题:* Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in
>> userspace conntrack
>>
>>
>>
>> Hi Lidong/Zhike
>>
>>
>>
>> I sent an alternative implementation:
>> https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359364.html
>>
>>
>>
>> Pls take a look and check if it meets your requirements.
>>
>>
>>
>> Thanks Darrell
>>
>>
>>
>> On Fri, May 24, 2019 at 10:11 AM Darrell Ball <dlu998 at gmail.com> wrote:
>>
>> Thanks for the patch Lidong/Zhike
>>
>>
>>
>> I took an initial look and will send a response next week.
>>
>>
>>
>> Darrell
>>
>>
>>
>> On Tue, May 21, 2019 at 8:53 PM 姜立东 <jianglidong3 at jd.com> wrote:
>>
>> From 3ce112684921bca74839e109fda91848aa024a54 Mon Sep 17 00:00:00 2001
>> From: Jiang Lidong <jianglidong3 at jd.com>
>> Date: Wed, 22 May 2019 11:21:34 +0800
>> Subject: [PATCH] conntrack: add tcp_in_liberal option in userspace
>> conntrack
>>
>> Adding similar cp_in_liberal option in userspace conntrack as
>> kernel conntrack does to skip seq check on tcp connection.
>> It prevents packet is marked as INVALID by stable seq
>> info in conntrack connection.
>>
>> This option can help to make traffic survive in hardware
>> offloading cases, especially when traffic is being moved
>> back to software path from hardware forwarding engine.
>>
>> Signed-off-by: Lidong Jiang <jianglidong3 at jd.com>
>> Signed-off-by: Zhike Wang <wangzhike at jd.com>
>>
>> ---
>> lib/conntrack-private.h | 2 ++
>> lib/conntrack-tcp.c     | 5 +++--
>> lib/conntrack.c         | 6 ++++++
>> lib/conntrack.h         | 4 +++-
>> lib/dpif-netdev.c       | 6 ++++++
>> 5 files changed, 20 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
>> index 51b7d7f..9bc99cd 100644
>> --- a/lib/conntrack-private.h
>> +++ b/lib/conntrack-private.h
>> @@ -172,6 +172,8 @@ struct conntrack {
>>      /* Fragmentation handling context. */
>>      struct ipf *ipf;
>> +
>> +    bool tcp_be_liberal;
>> };
>>  /* Lock acquisition order:
>> diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c
>> index 397aca1..61abafb 100644
>> --- a/lib/conntrack-tcp.c
>> +++ b/lib/conntrack-tcp.c
>> @@ -272,7 +272,7 @@ tcp_conn_update(struct conntrack *ct, struct conn
>> *conn_,
>>      int ackskew = check_ackskew ? dst->seqlo - ack : 0;
>> #define MAXACKWINDOW (0xffff + 1500)    /* 1500 is an arbitrary fudge
>> factor */
>> -    if (SEQ_GEQ(src->seqhi, end)
>> +    if ((SEQ_GEQ(src->seqhi, end)
>>          /* Last octet inside other's window space */
>>          && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))
>>          /* Retrans: not more than one window back */
>> @@ -281,7 +281,8 @@ tcp_conn_update(struct conntrack *ct, struct conn
>> *conn_,
>>          && (ackskew <= (MAXACKWINDOW << sws))
>>          /* Acking not more than one window forward */
>>          && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo
>> -            || (orig_seq == src->seqlo + 1) || (orig_seq + 1 ==
>> src->seqlo))) {
>> +            || (orig_seq == src->seqlo + 1) || (orig_seq + 1 ==
>> src->seqlo)))
>> +            || (ct->tcp_be_liberal)) {
>>          /* Require an exact/+1 sequence match on resets when possible */
>>          /* update max window */
>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>> index 6711f5e..bd92710 100644
>> --- a/lib/conntrack.c
>> +++ b/lib/conntrack.c
>> @@ -2282,6 +2282,12 @@ conntrack_ipf_ctx(struct conntrack *ct)
>>      return ct->ipf;
>> }
>> +void
>> +conntrack_set_tcp_be_liberal(struct conntrack *ct, bool enabled)
>> +{
>> +    ct->tcp_be_liberal = enabled;
>> +}
>> +
>> int
>> conntrack_dump_start(struct conntrack *ct, struct conntrack_dump *dump,
>>                       const uint16_t *pzone, int *ptot_bkts)
>> diff --git a/lib/conntrack.h b/lib/conntrack.h
>> index 2012150..b8d799d 100644
>> --- a/lib/conntrack.h
>> +++ b/lib/conntrack.h
>> @@ -119,5 +119,7 @@ int conntrack_set_maxconns(struct conntrack *ct,
>> uint32_t maxconns);
>> int conntrack_get_maxconns(struct conntrack *ct, uint32_t *maxconns);
>> int conntrack_get_nconns(struct conntrack *ct, uint32_t *nconns);
>> struct ipf *conntrack_ipf_ctx(struct conntrack *ct);
>> -
>> +
>> +void conntrack_set_tcp_be_liberal(struct conntrack *ct, bool enabled);
>> +
>> #endif /* conntrack.h */
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 5a6f2ab..ae6a18e 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -3823,6 +3823,12 @@ dpif_netdev_set_config(struct dpif *dpif, const
>> struct smap *other_config)
>>      uint32_t tx_flush_interval, cur_tx_flush_interval;
>>      uint64_t rebalance_intvl;
>> +    bool tcp_be_liberal = smap_get_bool(other_config,
>> +                                        "conntrack_tcp_be_liberal",
>> +                                        false);
>> +
>> +    conntrack_set_tcp_be_liberal(&dp->conntrack, tcp_be_liberal);
>> +
>>      tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
>>                                       DEFAULT_TX_FLUSH_INTERVAL);
>>      atomic_read_relaxed(&dp->tx_flush_interval, &cur_tx_flush_interval);
>> --
>> 1.8.3.1
>>
>> _______________________________________________
>> dev mailing list
>> dev at openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>


More information about the dev mailing list