[ovs-dev] [PATCH v1] ovn-controller: Fix the CT zone assignment logic for logical routers

Dumitru Ceara dceara at redhat.com
Fri Aug 7 08:58:19 UTC 2020


On 8/4/20 5:55 AM, Ankur Sharma wrote:
> From: Ankur Sharma <ankur.sharma at nutanix.com>
> 
> BACKGROUND:
> a. ovn-controller assigns CT ZONES for local ports and datapaths.
> b. If a local port/datapath is cleaned up from a chassis, then
>    corresponding CT ZONE is "unassigned"/"freed" up.
> 
> ISSUE:
> Above logic and implementations leaves stale CT entries in the
> datapath, which may get reused unexpectedly, thereby causing
> issues like, packets going through ct_nat(SNAT_IP_NEW) and getting
> a stale IP as SNAT IP etc.
> 
> a. As a part of CT Zone unassign, implementation should FLUSH the
>    corresponding CT entries, i.e it should do FLUSH by ZONE.
>    As os now, implementation avoids the flushing, thereby leaving
>    stale CT entries.
> 
> b. Similarly, since the implementation relies on datapath existence
>    for assign/unassign of CT ZONEs. Hence, simple operations like
>    moving the logical router from one external logical switch to
>    another, may not cause any CT ZONE reassignment and thereby
>    stale CT entries might get consumed, when they should not have
>    been.
> 
> c. a. and b. combined causes following:
>    i. Start a to be SNATed traffic from internal endpoint to an
>       external endpoint. Let us say internal endpoint IP is
>       50.0.0.10 and external endpoint ip is 8.8.8.8 and
>       logical router port ip (and hence SNAT ip) is 100.0.0.10.
> 
>   ii. Detach the logical router from old external logical switch
>       and attach to new external logical switch. As a result
>       of this operation, new router port ip becomes 200.0.0.10
>       , which also becomes the new SNAT ip.
> 
>  iii. The observation has been that traffic initiated in i. above
>       still ends up using OLD SNAT IP, i.e 100.0.0.10, rather than
>       200.0.0.10
> 
>  iv. iii. above happened, because although from OVS DP, the IP for
>      NAT action is 200.0.0.10, however, since its an ongoing traffic,
>      hence the CT entries come in use and end up NATing to old SNAT
>      ip 100.0.0.10. For example:
> 
>      OVS DP STATE
>      recirc_id(0),in_port(16),....ct(commit,zone=1,nat(src=200.0.0.10))
> 
>      CT STATE
>      icmp,orig=(src=50.0.0.10,dst=8.8.8.8,id=2288,type=8,code=0),
>      reply=(src=8.8.8.8,dst=100.0.0.10,id=2288,type=0,code=0),zone=1
> 
> FIX:
> This patch improves the overall CT ZONE management by doing following:
> a. Do a FLUSH by CT ZONE, once we identify that a zone has to be freed up.
> b. From datapath perspective, restrict the CT ZONE assignment ONLY
>    to logical routers that has NAT rules enabled.

Hi Ankur,

Maybe I'm missing something but doesn't this cause stateful ACLs, load
balancers applied to logical_switches and load balancers applied to
logical routers that don't have nat configured to use the default
conntrack zone (0)? I don't think that's ok.

As a matter of fact, most of the system-ovn.at tests related to
conntrack fail because conntrack entries are created in the default
zone, e.g.:

$ make check-kernel
1: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT FAILED
(system-ovn.at:116)

+++ /root/ovn/tests/system-kmod-testsuite.dir/at-groups/1/stdout
2020-08-07 04:50:11.167836393 -0400
@@ -1,2 +1,2 @@
-icmp,orig=(src=172.16.1.2,dst=30.0.0.2,id=<cleared>,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.1.2,id=<cleared>,type=0,code=0),zone=<cleared>
+icmp,orig=(src=172.16.1.2,dst=30.0.0.2,id=<cleared>,type=8,code=0),reply=(src=192.168.1.2,dst=172.16.1.2,id=<cleared>,type=0,code=0)

  2: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT - IPv6
FAILED (system-ovn.at:296)
  3: ovn -- 2 LRs connected via LS, gateway router, easy SNAT FAILED
(system-ovn.at:448)
  4: ovn -- 2 LRs connected via LS, gateway router, easy SNAT - IPv6
FAILED (system-ovn.at:560)
  5: ovn -- multiple gateway routers, SNAT and DNAT  FAILED
(system-ovn.at:730)
  6: ovn -- multiple gateway routers, SNAT and DNAT - IPv6 FAILED
(system-ovn.at:958)
  7: ovn -- load-balancing                           FAILED
(system-ovn.at:1153)
  8: ovn -- load-balancing - IPv6                    ok
  9: ovn -- load-balancing - same subnet.            ok
 10: ovn -- load-balancing - same subnet. - IPv6     ok
 11: ovn -- load balancing in gateway router         FAILED
(system-ovn.at:1829)

./system-ovn.at:1829: ovs-appctl dpctl/dump-conntrack | grep
"dst=30.0.0.1" | sed -e 's/port=[0-9]*/port=<cleared>/g' -e
's/id=[0-9]*/id=<cleared>/g' -e 's/state=[0-9_A-Z]*/state=<cleared>/g' |
sort | uniq |
sed -e 's/zone=[0-9]*/zone=<cleared>/'
--- -   2020-08-07 04:46:41.414957151 -0400
+++ /root/ovn/tests/system-kmod-testsuite.dir/at-groups/11/stdout
2020-08-07 04:46:41.412798625 -0400
@@ -1,3 +1,3 @@
-tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
-tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),zone=<cleared>,protoinfo=(state=<cleared>)
+tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.1.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),protoinfo=(state=<cleared>)
+tcp,orig=(src=172.16.1.2,dst=30.0.0.1,sport=<cleared>,dport=<cleared>),reply=(src=192.168.2.2,dst=172.16.1.2,sport=<cleared>,dport=<cleared>),protoinfo=(state=<cleared>)


 12: ovn -- load balancing in gateway router - IPv6  FAILED
(system-ovn.at:2033)
 13: ovn -- multiple gateway routers, load-balancing FAILED
(system-ovn.at:2209)
 14: ovn -- multiple gateway routers, load-balancing - IPv6 FAILED
(system-ovn.at:2376)

[...]

 30: ovn -- ECMP symmetric reply                     FAILED
(system-ovn.at:4684)

./system-ovn.at:4684: ovs-appctl dpctl/dump-conntrack | grep
"dst=172.16.0.1" | sed -e 's/port=[0-9]*/port=<cleared>/g' -e
's/id=[0-9]*/id=<cleared>/g' -e 's/state=[0-9_A-Z]*/state=<cleared>/g' |
sort | uniq | \
sed -e 's/zone=[0-9]*/zone=<cleared>/' |
sed -e
's/labels=0x[0-9a-f]*00000401020400000000/labels=0x00000401020400000000/'
--- -   2020-08-07 04:56:22.403499981 -0400
+++ /root/ovn/tests/system-kmod-testsuite.dir/at-groups/30/stdout
2020-08-07 04:56:22.401442917 -0400
@@ -1,2 +1,2 @@
-icmp,orig=(src=172.16.0.1,dst=10.0.0.2,id=<cleared>,type=8,code=0),reply=(src=10.0.0.2,dst=172.16.0.1,id=<cleared>,type=0,code=0),zone=<cleared>,labels=0x00000401020400000000
+icmp,orig=(src=172.16.0.1,dst=10.0.0.2,id=<cleared>,type=8,code=0),reply=(src=10.0.0.2,dst=172.16.0.1,id=<cleared>,type=0,code=0),labels=0x00000401020400000000

> c. Instead of using logical router uuid as ct zone key, use crossproduct
>    of logical router and logical router port that connects to external
>    logical switch.
> 
> Signed-off-by: Ankur Sharma <ankur.sharma at nutanix.com>
> ---
>  controller/ovn-controller.c | 37 +++++++++++++++++++++++++++----------
>  controller/physical.c       | 18 ++++++++++++------
>  lib/ovn-util.c              | 10 ++++++----
>  lib/ovn-util.h              |  3 ++-
>  4 files changed, 47 insertions(+), 21 deletions(-)
> 
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 5ca32ac..9a6746e 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -521,17 +521,34 @@ update_ct_zones(const struct sset *lports, const struct hmap *local_datapaths,
>          sset_add(&all_users, user);
>      }
>  
> -    /* Local patched datapath (gateway routers) need zones assigned. */
> +    /* Local patched datapath (gateway routers) need zones assigned.
> +     * Only local logical routers with atleast one NAT rule are considered for
> +     * CT zone assignment.*/
>      const struct local_datapath *ld;
>      HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
> -        /* XXX Add method to limit zone assignment to logical router
> -         * datapaths with NAT */
> -        char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "dnat");
> -        char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
> -        sset_add(&all_users, dnat);
> -        sset_add(&all_users, snat);
> -        free(dnat);
> -        free(snat);
> +       const char *dp_nblr = smap_get(&ld->datapath->external_ids,
> +                                      "logical-router");
> +       if (dp_nblr) {
> +          for (size_t iter = 0; iter < ld->n_peer_ports; iter++) {
> +             const struct sbrec_port_binding *peer_binding =
> +                                             ld->peer_ports[iter].remote;
> +             const struct sbrec_port_binding *local_binding =
> +                                             ld->peer_ports[iter].local;
> +
> +             if (peer_binding->nat_addresses) {
> +                char *dnat = alloc_nat_zone_key(&ld->datapath->header_.uuid,
> +                                                &local_binding->header_.uuid,
> +                                                "dnat");
> +                char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid,
> +                                                &local_binding->header_.uuid,
> +                                                "snat");
> +                sset_add(&all_users, dnat);
> +                sset_add(&all_users, snat);
> +                free(dnat);
> +                free(snat);
> +             }
> +          }
> +       }
>      }
>  
>      /* Delete zones that do not exist in above sset. */
> @@ -541,7 +558,7 @@ update_ct_zones(const struct sset *lports, const struct hmap *local_datapaths,
>                       ct_zone->data, ct_zone->name);
>  
>              struct ct_zone_pending_entry *pending = xmalloc(sizeof *pending);
> -            pending->state = CT_ZONE_DB_QUEUED; /* Skip flushing zone. */
> +            pending->state = CT_ZONE_OF_QUEUED;
>              pending->zone = ct_zone->data;
>              pending->add = false;
>              shash_add(pending_ct_zones, ct_zone->name, pending);
> diff --git a/controller/physical.c b/controller/physical.c
> index 535c777..cc497e0 100644
> --- a/controller/physical.c
> +++ b/controller/physical.c
> @@ -218,18 +218,24 @@ static struct zone_ids
>  get_zone_ids(const struct sbrec_port_binding *binding,
>               const struct simap *ct_zones)
>  {
> -    struct zone_ids zone_ids;
> +    struct zone_ids zone_ids = {0};
>  
>      zone_ids.ct = simap_get(ct_zones, binding->logical_port);
>  
> -    const struct uuid *key = &binding->datapath->header_.uuid;
> +    const struct uuid *key1 = &binding->datapath->header_.uuid;
> +    const struct uuid *key2 = &binding->header_.uuid;
>  
> -    char *dnat = alloc_nat_zone_key(key, "dnat");
> -    zone_ids.dnat = simap_get(ct_zones, dnat);
> +    char *dnat = alloc_nat_zone_key(key1, key2, "dnat");
> +
> +    if (simap_contains(ct_zones, dnat)) {
> +       zone_ids.dnat = simap_get(ct_zones, dnat);
> +    }

Both simap_get() and simap_contains() call simap_find() internally.  If
simap_get() can't find the key it will return a default value of 0.
There's no need for the "if (simap_contains(ct_zones, dnat)) {". We can
just directly:

zone_ids.dnat = simap_get(ct_zones, dnat);

>      free(dnat);
>  
> -    char *snat = alloc_nat_zone_key(key, "snat");
> -    zone_ids.snat = simap_get(ct_zones, snat);
> +    char *snat = alloc_nat_zone_key(key1, key2, "snat");
> +    if (simap_contains(ct_zones, snat)) {
> +       zone_ids.snat = simap_get(ct_zones, snat);
> +    }

Same here.

Thanks,
Dumitru

>      free(snat);
>  
>      return zone_ids;
> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
> index cdb5e18..cba7355 100644
> --- a/lib/ovn-util.c
> +++ b/lib/ovn-util.c
> @@ -327,14 +327,16 @@ destroy_lport_addresses(struct lport_addresses *laddrs)
>      free(laddrs->ipv6_addrs);
>  }
>  
> -/* Allocates a key for NAT conntrack zone allocation for a provided
> - * 'key' record and a 'type'.
> +/* Allocates a key for NAT conntrack zone allocation for provided
> + * 'keys' and a 'type'.
>   *
>   * It is the caller's responsibility to free the allocated memory. */
>  char *
> -alloc_nat_zone_key(const struct uuid *key, const char *type)
> +alloc_nat_zone_key(const struct uuid *key1, const struct uuid *key2,
> +                   const char *type)
>  {
> -    return xasprintf(UUID_FMT"_%s", UUID_ARGS(key), type);
> +    return xasprintf(UUID_FMT"_"UUID_FMT"_%s", UUID_ARGS(key1),
> +                     UUID_ARGS(key2), type);
>  }
>  
>  const char *
> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
> index 0f7b501..fe86bf8 100644
> --- a/lib/ovn-util.h
> +++ b/lib/ovn-util.h
> @@ -77,7 +77,8 @@ bool extract_sbrec_binding_first_mac(const struct sbrec_port_binding *binding,
>  
>  void destroy_lport_addresses(struct lport_addresses *);
>  
> -char *alloc_nat_zone_key(const struct uuid *key, const char *type);
> +char *alloc_nat_zone_key(const struct uuid *key1, const struct uuid *key2,
> +                         const char *type);
>  
>  const char *default_nb_db(void);
>  const char *default_sb_db(void);
> 



More information about the dev mailing list