[ovs-dev] [PATCH v1] ovn-controller: Fix the CT zone assignment logic for logical routers

Dumitru Ceara dceara at redhat.com
Thu Aug 13 07:39:41 UTC 2020


On 8/13/20 3:08 AM, Ankur Sharma wrote:
> Hi Dumitru,
> 
> Thanks a lot for looking at the change.
> It my mistake, LB use case totally slipped through my mind.
> 
> Cross product based approach is not generic enough to address LB as well (and future use cases for CT ZONE).
> 
> Please ignore this patch, i will submit a new one with a more generic approach soon.
> 

Looking forward to it!

> There is one change in this patch that can be taken, i.e replacing CT_ZONE_DB_QUEUED with CT_ZONE_OF_QUEUED,
> but i will submit a separate patch for it as well.
> 

Sounds good, this is something we should definitely address and it can
be indeed a separate patch.

Thanks,
Dumitru

> 
> Thanks
> 
> Regards,
> Ankur
> 
> 
> 
> 
> From: Dumitru Ceara <dceara at redhat.com>
> Sent: Friday, August 7, 2020 1:58 AM
> To: svc.mail.git <svc.mail.git at nutanix.com>; ovs-dev at openvswitch.org <ovs-dev at openvswitch.org>; Ankur Sharma <ankur.sharma at nutanix.com>
> Subject: Re: [ovs-dev] [PATCH v1] ovn-controller: Fix the CT zone assignment logic for logical routers 
>  
> On 8/4/20 5:55 AM, Ankur Sharma wrote:
>> From: Ankur Sharma <ankur.sharma at nutanix.com>
>>
>> BACKGROUND:
>> a. ovn-controller assigns CT ZONES for local ports and datapaths.
>> b. If a local port/datapath is cleaned up from a chassis, then
>>     corresponding CT ZONE is "unassigned"/"freed" up.
>>
>> ISSUE:
>> Above logic and implementations leaves stale CT entries in the
>> datapath, which may get reused unexpectedly, thereby causing
>> issues like, packets going through ct_nat(SNAT_IP_NEW) and getting
>> a stale IP as SNAT IP etc.
>>
>> a. As a part of CT Zone unassign, implementation should FLUSH the
>>     corresponding CT entries, i.e it should do FLUSH by ZONE.
>>     As os now, implementation avoids the flushing, thereby leaving
>>     stale CT entries.
>>
>> b. Similarly, since the implementation relies on datapath existence
>>     for assign/unassign of CT ZONEs. Hence, simple operations like
>>     moving the logical router from one external logical switch to
>>     another, may not cause any CT ZONE reassignment and thereby
>>     stale CT entries might get consumed, when they should not have
>>     been.
>>
>> c. a. and b. combined causes following:
>>     i. Start a to be SNATed traffic from internal endpoint to an
>>        external endpoint. Let us say internal endpoint IP is
>>        50.0.0.10 and external endpoint ip is 8.8.8.8 and
>>        logical router port ip (and hence SNAT ip) is 100.0.0.10.
>>
>>    ii. Detach the logical router from old external logical switch
>>        and attach to new external logical switch. As a result
>>        of this operation, new router port ip becomes 200.0.0.10
>>        , which also becomes the new SNAT ip.
>>
>>   iii. The observation has been that traffic initiated in i. above
>>        still ends up using OLD SNAT IP, i.e 100.0.0.10, rather than
>>        200.0.0.10
>>
>>   iv. iii. above happened, because although from OVS DP, the IP for
>>       NAT action is 200.0.0.10, however, since its an ongoing traffic,
>>       hence the CT entries come in use and end up NATing to old SNAT
>>       ip 100.0.0.10. For example:
>>
>>       OVS DP STATE
>>       recirc_id(0),in_port(16),....ct(commit,zone=1,nat(src=200.0.0.10))
>>
>>       CT STATE
>>       icmp,orig=(src=50.0.0.10,dst=8.8.8.8,id=2288,type=8,code=0),
>>       reply=(src=8.8.8.8,dst=100.0.0.10,id=2288,type=0,code=0),zone=1
>>
>> FIX:
>> This patch improves the overall CT ZONE management by doing following:
>> a. Do a FLUSH by CT ZONE, once we identify that a zone has to be freed up.
>> b. From datapath perspective, restrict the CT ZONE assignment ONLY
>>     to logical routers that has NAT rules enabled.
> 
> Hi Ankur,
> 
> Maybe I'm missing something but doesn't this cause stateful ACLs, load
> balancers applied to logical_switches and load balancers applied to
> logical routers that don't have nat configured to use the default
> conntrack zone (0)? I don't think that's ok.
> 
> As a matter of fact, most of the system-ovn.at tests related to
> conntrack fail because conntrack entries are created in the default
> zone, e.g.:
> 
> $ make check-kernel
> 1: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT FAILED
> (system-ovn.at:116)
> 
> +++ /root/ovn/tests/system-kmod-testsuite.dir/at-groups/1/stdout
> 2020-08-07 04:50:11.167836393 -0400
> @@ -1,2 +1,2 @@
> -icmp,orig=(src=172.16.1.2,dst=30.0.0.2,id=<cleared>,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.1.2,id=<cleared>,type=0,code=0),zone=<cleared>
> +icmp,orig=(src=172.16.1.2,dst=30.0.0.2,id=<cleared>,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.1.2,id=<cleared>,type=0,code=0)
> 
>   2: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT - IPv6
> FAILED (system-ovn.at:296)
>   3: ovn -- 2 LRs connected via LS, gateway router, easy SNAT FAILED
> (system-ovn.at:448)
>   4: ovn -- 2 LRs connected via LS, gateway router, easy SNAT - IPv6
> FAILED (system-ovn.at:560)
>   5: ovn -- multiple gateway routers, SNAT and DNAT  FAILED
> (system-ovn.at:730)
>   6: ovn -- multiple gateway routers, SNAT and DNAT - IPv6 FAILED
> (system-ovn.at:958)
>   7: ovn -- load-balancing                           FAILED
> (system-ovn.at:1153)
>   8: ovn -- load-balancing - IPv6                    ok
>   9: ovn -- load-balancing - same subnet.            ok
>  10: ovn -- load-balancing - same subnet. - IPv6     ok
>  11: ovn -- load balancing in gateway router         FAILED
> (system-ovn.at:1829)
> 
> ./system-ovn.at:1829: ovs-appctl dpctl/dump-conntrack | grep
> "dst=30.0.0.1" | sed -e 's/port=[0-9]*/port=<cleared>/g' -e
> 's/id=[0-9]*/id=<cleared>/g' -e 's/state=[0-9_A-Z]*/state=<cleared>/g' |
> sort | uniq |
> sed -e 's/zone=[0-9]*/zone=<cleared>/'
> --- -   2020-08-07 04:46:41.414957151 -0400
> +++ /root/ovn/tests/system-kmod-testsuite.dir/at-groups/11/stdout
> 2020-08-07 04:46:41.412798625 -0400
> @@ -1,3 +1,3 @@
> -tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> -tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
> +tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),protoinfo=(state=<cleared>)
> +tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),protoinfo=(state=<cleared>)
> 
> 
>  12: ovn -- load balancing in gateway router - IPv6  FAILED
> (system-ovn.at:2033)
>  13: ovn -- multiple gateway routers, load-balancing FAILED
> (system-ovn.at:2209)
>  14: ovn -- multiple gateway routers, load-balancing - IPv6 FAILED
> (system-ovn.at:2376)
> 
> [...]
> 
>  30: ovn -- ECMP symmetric reply                     FAILED
> (system-ovn.at:4684)
> 
> ./system-ovn.at:4684: ovs-appctl dpctl/dump-conntrack | grep
> "dst=172.16.0.1" | sed -e 's/port=[0-9]*/port=<cleared>/g' -e
> 's/id=[0-9]*/id=<cleared>/g' -e 's/state=[0-9_A-Z]*/state=<cleared>/g' |
> sort | uniq | \
> sed -e 's/zone=[0-9]*/zone=<cleared>/' |
> sed -e
> 's/labels=0x[0-9a-f]*00000401020400000000/labels=0x00000401020400000000/'
> --- -   2020-08-07 04:56:22.403499981 -0400
> +++ /root/ovn/tests/system-kmod-testsuite.dir/at-groups/30/stdout
> 2020-08-07 04:56:22.401442917 -0400
> @@ -1,2 +1,2 @@
> -icmp,orig=(src=172.16.0.1,dst=10.0.0.2,id=<cleared>,type=8,code=0),reply=(src=10.0.0.2,dst=172.16.0.1,id=<cleared>,type=0,code=0),zone=<cleared>,labels=0x00000401020400000000
> +icmp,orig=(src=172.16.0.1,dst=10.0.0.2,id=<cleared>,type=8,code=0),reply=(src=10.0.0.2,dst=172.16.0.1,id=<cleared>,type=0,code=0),labels=0x00000401020400000000
> 
>> c. Instead of using logical router uuid as ct zone key, use crossproduct
>>     of logical router and logical router port that connects to external
>>     logical switch.
>>
>> Signed-off-by: Ankur Sharma <ankur.sharma at nutanix.com>
>> ---
>>   controller/ovn-controller.c | 37 +++++++++++++++++++++++++++----------
>>   controller/physical.c       | 18 ++++++++++++------
>>   lib/ovn-util.c              | 10 ++++++----
>>   lib/ovn-util.h              |  3 ++-
>>   4 files changed, 47 insertions(+), 21 deletions(-)
>>
>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>> index 5ca32ac..9a6746e 100644
>> --- a/controller/ovn-controller.c
>> +++ b/controller/ovn-controller.c
>> @@ -521,17 +521,34 @@ update_ct_zones(const struct sset *lports, const struct hmap *local_datapaths,
>>           sset_add(&all_users, user);
>>       }
>>   
>> -    /* Local patched datapath (gateway routers) need zones assigned. */
>> +    /* Local patched datapath (gateway routers) need zones assigned.
>> +     * Only local logical routers with atleast one NAT rule are considered for
>> +     * CT zone assignment.*/
>>       const struct local_datapath *ld;
>>       HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
>> -        /* XXX Add method to limit zone assignment to logical router
>> -         * datapaths with NAT */
>> -        char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
>> -        char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
>> -        sset_add(&all_users, dnat);
>> -        sset_add(&all_users, snat);
>> -        free(dnat);
>> -        free(snat);
>> +       const char *dp_nblr = smap_get(&ld->datapath->external_ids,
>> +                                      "logical-router");
>> +       if (dp_nblr) {
>> +          for (size_t iter = 0; iter < ld->n_peer_ports; iter++) {
>> +             const struct sbrec_port_binding *peer_binding =
>> +                                             ld->peer_ports[iter].remote;
>> +             const struct sbrec_port_binding *local_binding =
>> +                                             ld->peer_ports[iter].local;
>> +
>> +             if (peer_binding->nat_addresses) {
>> +                char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid,
>> +                                                &local_binding->header_.uuid,
>> +                                                "dnat");
>> +                char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid,
>> +                                                &local_binding->header_.uuid,
>> +                                                "snat");
>> +                sset_add(&all_users, dnat);
>> +                sset_add(&all_users, snat);
>> +                free(dnat);
>> +                free(snat);
>> +             }
>> +          }
>> +       }
>>       }
>>   
>>       /* Delete zones that do not exist in above sset. */
>> @@ -541,7 +558,7 @@ update_ct_zones(const struct sset *lports, const struct hmap *local_datapaths,
>>                        ct_zone->data, ct_zone->name);
>>   
>>               struct ct_zone_pending_entry *pending = xmalloc(sizeof *pending);
>> -            pending->state = CT_ZONE_DB_QUEUED; /* Skip flushing zone. */
>> +            pending->state = CT_ZONE_OF_QUEUED;
>>               pending->zone = ct_zone->data;
>>               pending->add = false;
>>               shash_add(pending_ct_zones, ct_zone->name, pending);
>> diff --git a/controller/physical.c b/controller/physical.c
>> index 535c777..cc497e0 100644
>> --- a/controller/physical.c
>> +++ b/controller/physical.c
>> @@ -218,18 +218,24 @@ static struct zone_ids
>>   get_zone_ids(const struct sbrec_port_binding *binding,
>>                const struct simap *ct_zones)
>>   {
>> -    struct zone_ids zone_ids;
>> +    struct zone_ids zone_ids = {0};
>>   
>>       zone_ids.ct = simap_get(ct_zones, binding->logical_port);
>>   
>> -    const struct uuid *key = &binding->datapath->header_.uuid;
>> +    const struct uuid *key1 = &binding->datapath->header_.uuid;
>> +    const struct uuid *key2 = &binding->header_.uuid;
>>   
>> -    char *dnat = alloc_nat_zone_key(key, "dnat");
>> -    zone_ids.dnat = simap_get(ct_zones, dnat);
>> +    char *dnat = alloc_nat_zone_key(key1, key2, "dnat");
>> +
>> +    if (simap_contains(ct_zones, dnat)) {
>> +       zone_ids.dnat = simap_get(ct_zones, dnat);
>> +    }
> 
> Both simap_get() and simap_contains() call simap_find() internally.  If
> simap_get() can't find the key it will return a default value of 0.
> There's no need for the "if (simap_contains(ct_zones, dnat)) {". We can
> just directly:
> 
> zone_ids.dnat = simap_get(ct_zones, dnat);
> 
>>       free(dnat);
>>   
>> -    char *snat = alloc_nat_zone_key(key, "snat");
>> -    zone_ids.snat = simap_get(ct_zones, snat);
>> +    char *snat = alloc_nat_zone_key(key1, key2, "snat");
>> +    if (simap_contains(ct_zones, snat)) {
>> +       zone_ids.snat = simap_get(ct_zones, snat);
>> +    }
> 
> Same here.
> 
> Thanks,
> Dumitru
> 
>>       free(snat);
>>   
>>       return zone_ids;
>> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
>> index cdb5e18..cba7355 100644
>> --- a/lib/ovn-util.c
>> +++ b/lib/ovn-util.c
>> @@ -327,14 +327,16 @@ destroy_lport_addresses(struct lport_addresses *laddrs)
>>       free(laddrs->ipv6_addrs);
>>   }
>>   
>> -/* Allocates a key for NAT conntrack zone allocation for a provided
>> - * 'key' record and a 'type'.
>> +/* Allocates a key for NAT conntrack zone allocation for provided
>> + * 'keys' and a 'type'.
>>    *
>>    * It is the caller's responsibility to free the allocated memory. */
>>   char *
>> -alloc_nat_zone_key(const struct uuid *key, const char *type)
>> +alloc_nat_zone_key(const struct uuid *key1, const struct uuid *key2,
>> +                   const char *type)
>>   {
>> -    return xasprintf(UUID_FMT"_%s", UUID_ARGS(key), type);
>> +    return xasprintf(UUID_FMT"_"UUID_FMT"_%s", UUID_ARGS(key1),
>> +                     UUID_ARGS(key2), type);
>>   }
>>   
>>   const char *
>> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
>> index 0f7b501..fe86bf8 100644
>> --- a/lib/ovn-util.h
>> +++ b/lib/ovn-util.h
>> @@ -77,7 +77,8 @@ bool extract_sbrec_binding_first_mac(const struct sbrec_port_binding *binding,
>>   
>>   void destroy_lport_addresses(struct lport_addresses *);
>>   
>> -char *alloc_nat_zone_key(const struct uuid *key, const char *type);
>> +char *alloc_nat_zone_key(const struct uuid *key1, const struct uuid *key2,
>> +                         const char *type);
>>   
>>   const char *default_nb_db(void);
>>   const char *default_sb_db(void);
>>
> 
> 



More information about the dev mailing list