[ovs-dev] [PATCH v2 2/2] conntrack: limit port clash resolution attempts

wenxu wenxu at ucloud.cn
Tue Sep 7 03:47:39 UTC 2021









From: Paolo Valerio <pvalerio at redhat.com>
Date: 2021-09-06 19:08:44
To:  wenxu at ucloud.cn,i.maximets at ovn.org,dlu998 at gmail.com,aconole at redhat.com
Cc:  dev at openvswitch.org
Subject: Re: [PATCH v2 2/2] conntrack: limit port clash resolution attempts>Hello,
>
>Thanks for working on this.
>
>I like the idea of limiting the number of attempts, the logic has
>been kept to avoid yet another logical change in a single shot, but it
>is definitely needed especially for the case where both dst port and
>src port are manipulated.


I think both src port and dst port manipulated is only for a special case. It is
better put the specail case out of there like kernel datapath. The conntrack
in kernel only do src or dst port nat.  If dst nat failed then do src nat.


in the function ovs_ct_nat


err = ovs_ct_nat_execute(skb, ct, ctinfo, &info->range, maniptype);
 if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
               xxxxx
                } else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
                        err = ovs_ct_nat_execute(skb, ct, ctinfo, NULL,
                                                 NF_NAT_MANIP_SRC);
                }    
        }    



>
>wenxu at ucloud.cn writes:
>
>> From: wenxu <wenxu at ucloud.cn>
>>
>> In case almost or all available ports are taken, clash resolution can
>> take a very long time, resulting in soft lockup.
>>
>> This can happen when many to-be-natted hosts connect to same
>> destination:port (e.g. a proxy) and all connections pass the same SNAT.
>>
>> Pick a random offset in the acceptable range, then try ever smaller
>> number of adjacent port numbers, until either the limit is reached or a
>> useable port was found.  This results in at most 248 attempts
>> (128 + 64 + 32 + 16 + 8, i.e. 4 restarts with new search offset)
>> instead of 64000+.
>
>248 attempts for each IP in the range, right?
>A large range of IP addresses may still theoretically lead to many
>attempts, although random subranges and the hash should help to
>reduce the chances in that case.
>
>>
>> Signed-off-by: wenxu <wenxu at ucloud.cn>
>> ---
>>  lib/conntrack.c | 35 +++++++++++++++++++++++++++++++----
>>  1 file changed, 31 insertions(+), 4 deletions(-)
>>
>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>> index 2d14205..bc7de17 100644
>> --- a/lib/conntrack.c
>> +++ b/lib/conntrack.c
>> @@ -2414,6 +2414,10 @@ nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn,
>>                       conn->key.nw_proto == IPPROTO_UDP;
>>      uint16_t min_dport, max_dport, curr_dport, orig_dport;
>>      uint16_t min_sport, max_sport, curr_sport, orig_sport;
>> +    static const unsigned int max_attempts = 128;
>> +    uint16_t range_src, range_dst, range_max;
>> +    unsigned int attempts;
>> +    unsigned int i;
>>  
>>      min_addr = conn->nat_info->min_addr;
>>      max_addr = conn->nat_info->max_addr;
>> @@ -2430,6 +2434,10 @@ nat_get_unique_tuple(struct conntrack *ct, const struct conn *conn,
>>      set_dport_range(conn->nat_info, &conn->key, hash, &orig_dport,
>>                      &min_dport, &max_dport);
>>  
>> +    range_src = max_sport - min_sport + 1;
>> +    range_dst = max_dport - min_dport + 1;
>> +    range_max = range_src > range_dst ? range_src : range_dst;
>> +
>>  another_round:
>>      store_addr_to_key(&curr_addr, &nat_conn->rev_key,
>>                        conn->nat_info->nat_action);
>> @@ -2446,17 +2454,36 @@ another_round:
>>      curr_sport = orig_sport;
>>      curr_dport = orig_dport;
>>  
>> +    attempts = range_max;
>> +    if (attempts > max_attempts) {
>> +        attempts = max_attempts;
>> +    }
>> +
>> +another_port_round:
>> +    i = 0;
>>      FOR_EACH_PORT_IN_RANGE(curr_dport, min_dport, max_dport) {
>>          nat_conn->rev_key.src.port = htons(curr_dport);
>>          FOR_EACH_PORT_IN_RANGE(curr_sport, min_sport, max_sport) {
>> -            nat_conn->rev_key.dst.port = htons(curr_sport);
>> -            if (!conn_lookup(ct, &nat_conn->rev_key,
>> -                             time_msec(), NULL, NULL)) {
>> -                return true;
>> +            if (i++ < attempts) {
>> +                nat_conn->rev_key.dst.port = htons(curr_sport);
>> +                if (!conn_lookup(ct, &nat_conn->rev_key,
>> +                                 time_msec(), NULL, NULL)) {
>> +                    return true;
>> +                }
>>              }
>
>this way we reduce the number of total lookups, but we still iterate
>through the entire range. We should avoid that.
>
>>          }
>>      }
>>  
>> +    if (attempts >= range_max || attempts < 16) {
>> +        goto next_addr;
>> +    }
>> +
>> +    attempts /= 2;
>> +    curr_dport = random_uint32() % range_dst;
>> +    curr_sport = random_uint32() % range_src;
>> +
>
>did you mean:
>
>curr = min + (random_uint32() % range);
>
>?
>
>> +    goto another_port_round;
>> +
>>      /* Check if next IP is in range and respin. Otherwise, notify
>>       * exhaustion to the caller. */
>>  next_addr:
>> -- 
>> 1.8.3.1
>






More information about the dev mailing list