[ovs-dev] [PATCH ovn v2] Allow explicit setting of the SNAT zone on a gateway router.

Numan Siddique numans at ovn.org
Mon Nov 16 18:54:05 UTC 2020


On Tue, Nov 17, 2020 at 12:14 AM Mark Michelson <mmichels at redhat.com> wrote:
>
> Thanks for the review Numan.
>
> On 11/16/20 4:19 AM, Numan Siddique wrote:
> > On Thu, Nov 12, 2020 at 8:26 PM Mark Michelson <mmichels at redhat.com> wrote:
> >>
> >> In certain situations, OVN may coexist with other applications on a
> >> host. Traffic from OVN and the other applications may then go out a
> >> shared gateway. If OVN traffic and the other application traffic use
> >> different conntrack zones for SNAT, then it is possible for the shared
> >> gateway to assign conflicting source IP:port combinations. By sharing
> >> the same conntrack zone, there will be no conflicting assignments.
> >>
> >> In this commit, we introduce options:snat-ct-zone for northbound logical
> >> routers. By setting this option, users can explicitly set the conntrack
> >> zone for the logical router so that it will match the zone used by
> >> non-OVN traffic on the host.
> >>
> >> The biggest side effects of this patch are:
> >> 1) southbound datapath changes now result in recalculating CT zones in
> >>     ovn-controller. This can result in recomputing physical flows in more
> >>     situations than previously.
> >> 2) The table 65 flow to transition between datapaths is no longer
> >>     associated with a port binding. This is because the flow refers to
> >>     the peer datapath's CT zones, which can now be updated due to changes
> >>     on that datapath. The flow therefore may need to be updated either
> >>     due to the port binding being changed or the peer datapath being
> >>     changed.
> >>
> >> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1892311
> >> Signed-off-by: Mark Michelson <mmichels at redhat.com>
> >
> > Hi Mark,
> >
> > Thanks for the patch.
> >
> > I did some testing with the below logical resources
> >
> > ******
> > switch 0ff55345-e283-4ed7-8c7f-8475a8a8f968 (sw1)
> >      port sw1-lr0
> >          type: router
> >          addresses: ["00:00:00:00:ff:02"]
> >          router-port: lr0-sw1
> >      port sw1-port2
> >          addresses: ["40:54:00:00:00:04 20.0.0.4"]
> > switch bddcdd29-9b32-482a-b9a6-0f9fa1a0e25a (sw0)
> >      port sw0-lr0
> >          type: router
> >          addresses: ["00:00:00:00:ff:01"]
> >          router-port: lr0-sw0
> >      port sw0-port1
> >          addresses: ["50:54:00:00:00:03 10.0.0.3"]
> >      port sw0-lr1
> >          type: router
> >          router-port: lr1-sw0
> > router 9462469c-d799-4d28-bccd-80e9a8c68e24 (lr1)
> >      port lr1-sw0
> >          mac: "00:00:00:00:fa:21"
> >          networks: ["10.0.0.20/24"]
> > router 96b14269-65ad-4ae9-9fa6-e4ef2a344a54 (lr0)
> >      port lr0-sw1
> >          mac: "00:00:00:00:ff:02"
> >          networks: ["20.0.0.1/24"]
> >      port lr0-public
> >          mac: "00:00:20:20:12:13"
> >          networks: ["172.168.0.100/24"]
> >          gateway chassis: [chassis-1]
> > *****
> >
> >
> > If I set - "ovn-nbctl set logical_router lr0 options:snat-ct-zone=x"
> > and If 'x' is already allocated for sw0 snat, then the
> > both sw0 snat and lr0 snat share the same zone id. Is that expected ?
>
> That is not expected. If you request a zone for lr0, then sw0 should
> have its zone re-assigned to something different. I can add a test case
> for this and debug it. Looking at the code, it's not obvious to me why
> it's failing.
>
> >
> > If I set the same zone id 'y' for both lr0 and lr1, then I observed
> > that both lr0 and lr1 have the same zone id 'y' and I also
> > see the warning in the ovn-controller log.
> >
> > Later if I change the zone of lr1 to 'z' and then lr0 to 'z', the lr0
> > is not updated with 'z'. So lr0 will have 'y' and lr1 will have 'z'.
> > Seems like it's not consistent.
>
> I wrote the code with the expectation that if duplicate zone assignments
> were detected, the oldest request would be honored and the newest would
> be ignored. If you first requested 'y' for lr0 and then later requested
> 'y' for lr1, the expectation is that when you requested 'y' for lr1,
> you'd see the warning, and lr1 would be auto-assigned a different zone
> id. I think the inconsistent behavior may be occurring due to the
> hashing of the datapaths. The order in which datapaths are visited does
> not correlate with the order that the datapaths were assigned their zone
> IDs. So if the newer assignment is visited in the datapath traversal
> before the older assignment, they'll both get assigned the same zone.
> But if the newer assignment is visited in the datapath traversal after
> the older assignment, it works as intended. This could be corrected, but
> the point may be moot based on discussion below.
>
> >
> > Can there be a  possibility that there are 2 shared gateways in one
> > node and both want to share the same zone id ?
> >
> > i.e lr0 is associated with one shared gateway and lr1 with another
> > shared gateway (on the same host) and both want to use the zone id 0 ?
>
> Yes, I guess that could very well be possible. I guess there's no harm
> in having multiple gateways share the same zone assignment if they both
> were explicitly configured to use that particular zone assigment. That
> actually would simplify the code a bit.

Thanks for the reply and explanation. To simplify the code and to
avoid checking for duplicate
zone id assignments, we could

1. Allow only zone id 0 for configuration from CMS. This is very restrictive.
2. OVN can reserve the zone ids from 0 to say 100 for CMS
configuration. And CMS is free to use
   the zone ids within this reserve. ovn-controllers will allocate
zone ids starting from 101 for the normal logical ports/routers. This
is not very
   restrictive and would also simplify the code.

What do you think of this ? My personal preference would be (2). I am
fine with the present proposal too if (1) or (2) will not solve the
requirement.

Thanks
Numan


>
> >
> > Thanks
> > Numan
> >
> >
> >> ---
> >>   controller/ovn-controller.c | 89 +++++++++++++++++++++++++++++++-----
> >>   controller/physical.c       |  2 +-
> >>   lib/ovn-util.c              |  7 +++
> >>   lib/ovn-util.h              |  1 +
> >>   northd/ovn-northd.c         | 10 ++++
> >>   ovn-nb.xml                  |  7 +++
> >>   tests/ovn.at                | 91 +++++++++++++++++++++++++++++++++++++
> >>   7 files changed, 194 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> >> index a06cae3cc..54c8fd2db 100644
> >> --- a/controller/ovn-controller.c
> >> +++ b/controller/ovn-controller.c
> >> @@ -531,6 +531,21 @@ update_sb_db(struct ovsdb_idl *ovs_idl, struct ovsdb_idl *ovnsb_idl,
> >>       }
> >>   }
> >>
> >> +static void
> >> +add_pending_ct_zone_entry(struct shash *pending_ct_zones,
> >> +                          enum ct_zone_pending_state state,
> >> +                          int zone, bool add, const char *name)
> >> +{
> >> +    VLOG_DBG("%s ct zone %"PRId32" for '%s'",
> >> +             add ? "assigning" : "removing", zone, name);
> >> +
> >> +    struct ct_zone_pending_entry *pending = xmalloc(sizeof *pending);
> >> +    pending->state = state; /* Skip flushing zone. */
> >> +    pending->zone = zone;
> >> +    pending->add = add;
> >> +    shash_add(pending_ct_zones, name, pending);
> >> +}
> >> +
> >>   static void
> >>   update_ct_zones(const struct sset *lports, const struct hmap *local_datapaths,
> >>                   struct simap *ct_zones, unsigned long *ct_zone_bitmap,
> >> @@ -540,6 +555,7 @@ update_ct_zones(const struct sset *lports, const struct hmap *local_datapaths,
> >>       int scan_start = 1;
> >>       const char *user;
> >>       struct sset all_users = SSET_INITIALIZER(&all_users);
> >> +    struct simap req_snat_zones = SIMAP_INITIALIZER(&req_snat_zones);
> >>
> >>       SSET_FOR_EACH(user, lports) {
> >>           sset_add(&all_users, user);
> >> @@ -554,6 +570,25 @@ update_ct_zones(const struct sset *lports, const struct hmap *local_datapaths,
> >>           char *snat = alloc_nat_zone_key(&ld->datapath->header_.uuid, "snat");
> >>           sset_add(&all_users, dnat);
> >>           sset_add(&all_users, snat);
> >> +
> >> +        int req_snat_zone = datapath_snat_ct_zone(ld->datapath);
> >> +        if (req_snat_zone >= 0) {
> >> +            struct simap_node *node;
> >> +            bool collision = false;
> >> +            SIMAP_FOR_EACH (node, &req_snat_zones) {
> >> +                if (node->data == req_snat_zone) {
> >> +                    VLOG_WARN("Datapaths %.*s and " UUID_FMT " request SNAT "
> >> +                              "CT zone %d\n", UUID_LEN, node->name,
> >> +                              UUID_ARGS(&ld->datapath->header_.uuid),
> >> +                              req_snat_zone);
> >> +                    collision = true;
> >> +                    break;
> >> +                }
> >> +            }
> >> +            if (!collision) {
> >> +                simap_put(&req_snat_zones, snat, req_snat_zone);
> >> +            }
> >> +        }
> >>           free(dnat);
> >>           free(snat);
> >>       }
> >> @@ -564,17 +599,51 @@ update_ct_zones(const struct sset *lports, const struct hmap *local_datapaths,
> >>               VLOG_DBG("removing ct zone %"PRId32" for '%s'",
> >>                        ct_zone->data, ct_zone->name);
> >>
> >> -            struct ct_zone_pending_entry *pending = xmalloc(sizeof *pending);
> >> -            pending->state = CT_ZONE_DB_QUEUED; /* Skip flushing zone. */
> >> -            pending->zone = ct_zone->data;
> >> -            pending->add = false;
> >> -            shash_add(pending_ct_zones, ct_zone->name, pending);
> >> +            add_pending_ct_zone_entry(pending_ct_zones, CT_ZONE_DB_QUEUED,
> >> +                                      ct_zone->data, false, ct_zone->name);
> >>
> >>               bitmap_set0(ct_zone_bitmap, ct_zone->data);
> >>               simap_delete(ct_zones, ct_zone);
> >>           }
> >>       }
> >>
> >> +    /* Prioritize requested CT zones */
> >> +    struct simap_node *snat_req_node;
> >> +    SIMAP_FOR_EACH (snat_req_node, &req_snat_zones) {
> >> +        struct simap_node *node = simap_find(ct_zones, snat_req_node->name);
> >> +        if (node) {
> >> +            if (node->data == snat_req_node->data) {
> >> +                /* Already have this zone reserved */
> >> +                continue;
> >> +            } else {
> >> +                /* Zone has changed for this node. delete old entry */
> >> +                bitmap_set0(ct_zone_bitmap, node->data);
> >> +                simap_delete(ct_zones, node);
> >> +            }
> >> +        } else if (snat_req_node->data > 0 &&
> >> +                   bitmap_is_set(ct_zone_bitmap, snat_req_node->data)) {
> >> +            /* Uh oh. Someone else already has this zone assigned.
> >> +             * We need to find who and remove them from ct_zones so
> >> +             * that they get re-assigned a new zone below
> >> +             */
> >> +            struct simap_node *next;
> >> +            SIMAP_FOR_EACH_SAFE (node, next, ct_zones) {
> >> +                if (node->data == snat_req_node->data) {
> >> +                    simap_delete(ct_zones, node);
> >> +                    break;
> >> +                }
> >> +            }
> >> +        }
> >> +
> >> +        add_pending_ct_zone_entry(pending_ct_zones, CT_ZONE_OF_QUEUED,
> >> +                                  snat_req_node->data, true,
> >> +                                  snat_req_node->name);
> >> +
> >> +        bitmap_set1(ct_zone_bitmap, snat_req_node->data);
> >> +        simap_put(ct_zones, snat_req_node->name, snat_req_node->data);
> >> +    }
> >> +    simap_destroy(&req_snat_zones);
> >> +
> >>       /* xxx This is wasteful to assign a zone to each port--even if no
> >>        * xxx security policy is applied. */
> >>
> >> @@ -596,13 +665,8 @@ update_ct_zones(const struct sset *lports, const struct hmap *local_datapaths,
> >>           }
> >>           scan_start = zone + 1;
> >>
> >> -        VLOG_DBG("assigning ct zone %"PRId32" to '%s'", zone, user);
> >> -
> >> -        struct ct_zone_pending_entry *pending = xmalloc(sizeof *pending);
> >> -        pending->state = CT_ZONE_OF_QUEUED;
> >> -        pending->zone = zone;
> >> -        pending->add = true;
> >> -        shash_add(pending_ct_zones, user, pending);
> >> +        add_pending_ct_zone_entry(pending_ct_zones, CT_ZONE_OF_QUEUED,
> >> +                                  zone, true, user);
> >>
> >>           bitmap_set1(ct_zone_bitmap, zone);
> >>           simap_put(ct_zones, user, zone);
> >> @@ -2330,6 +2394,7 @@ main(int argc, char *argv[])
> >>
> >>       engine_add_input(&en_ct_zones, &en_ovs_open_vswitch, NULL);
> >>       engine_add_input(&en_ct_zones, &en_ovs_bridge, NULL);
> >> +    engine_add_input(&en_ct_zones, &en_sb_datapath_binding, NULL);
> >>       engine_add_input(&en_ct_zones, &en_runtime_data, NULL);
> >>
> >>       engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL);
> >> diff --git a/controller/physical.c b/controller/physical.c
> >> index 1bc2c389b..00c4ca4fd 100644
> >> --- a/controller/physical.c
> >> +++ b/controller/physical.c
> >> @@ -926,7 +926,7 @@ consider_port_binding(struct ovsdb_idl_index *sbrec_port_binding_by_name,
> >>
> >>           ofctrl_add_flow(flow_table, OFTABLE_LOG_TO_PHY, 100,
> >>                           binding->header_.uuid.parts[0],
> >> -                        &match, ofpacts_p, &binding->header_.uuid);
> >> +                        &match, ofpacts_p, hc_uuid);
> >>           return;
> >>       }
> >>
> >> diff --git a/lib/ovn-util.c b/lib/ovn-util.c
> >> index 18aac8da3..f0fb796b4 100644
> >> --- a/lib/ovn-util.c
> >> +++ b/lib/ovn-util.c
> >> @@ -532,6 +532,13 @@ datapath_is_switch(const struct sbrec_datapath_binding *ldp)
> >>   {
> >>       return smap_get(&ldp->external_ids, "logical-switch") != NULL;
> >>   }
> >> +
> >> +int
> >> +datapath_snat_ct_zone(const struct sbrec_datapath_binding *dp)
> >> +{
> >> +    return smap_get_int(&dp->external_ids, "snat-ct-zone", -1);
> >> +}
> >> +
> >>
> >>   struct tnlid_node {
> >>       struct hmap_node hmap_node;
> >> diff --git a/lib/ovn-util.h b/lib/ovn-util.h
> >> index 3496673b2..ea8226571 100644
> >> --- a/lib/ovn-util.h
> >> +++ b/lib/ovn-util.h
> >> @@ -107,6 +107,7 @@ uint32_t ovn_logical_flow_hash(const struct uuid *logical_datapath,
> >>                                  uint16_t priority,
> >>                                  const char *match, const char *actions);
> >>   bool datapath_is_switch(const struct sbrec_datapath_binding *);
> >> +int datapath_snat_ct_zone(const struct sbrec_datapath_binding *ldp);
> >>   void ovn_conn_show(struct unixctl_conn *conn, int argc OVS_UNUSED,
> >>                      const char *argv[] OVS_UNUSED, void *idl_);
> >>
> >> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> >> index 4d4190cb9..8ce756c8a 100644
> >> --- a/northd/ovn-northd.c
> >> +++ b/northd/ovn-northd.c
> >> @@ -1179,6 +1179,16 @@ ovn_datapath_update_external_ids(struct ovn_datapath *od)
> >>               smap_add(&ids, "interconn-ts", ts);
> >>           }
> >>       }
> >> +
> >> +    /* Set snat-ct-zone */
> >> +    if (od->nbr) {
> >> +        int nat_default_ct = smap_get_int(&od->nbr->options,
> >> +                                           "snat-ct-zone", -1);
> >> +        if (nat_default_ct >= 0) {
> >> +            smap_add_format(&ids, "snat-ct-zone", "%d", nat_default_ct);
> >> +        }
> >> +    }
> >> +
> >>       sbrec_datapath_binding_set_external_ids(od->sb, &ids);
> >>       smap_destroy(&ids);
> >>   }
> >> diff --git a/ovn-nb.xml b/ovn-nb.xml
> >> index 5e6c6c7a3..619c0e299 100644
> >> --- a/ovn-nb.xml
> >> +++ b/ovn-nb.xml
> >> @@ -1961,6 +1961,13 @@
> >>           unique key for each datapath by itself.  However, if it is configured,
> >>           <code>ovn-northd</code> honors the configured value.
> >>         </column>
> >> +      <column name="options" key="snat-ct-zone"
> >> +          type='{"type": "integer", "minInteger": 0, "maxInteger": 65535}'>
> >> +        Use the requested conntrack zone for SNAT with this router. This can be
> >> +        useful if egress traffic from the host running OVN comes from both OVN
> >> +        and other sources. This way, OVN and the other sources can make use of
> >> +        the same conntrack zone.
> >> +      </column>
> >>       </group>
> >>
> >>       <group title="Common Columns">
> >> diff --git a/tests/ovn.at b/tests/ovn.at
> >> index 378d11921..4f8c1a3e1 100644
> >> --- a/tests/ovn.at
> >> +++ b/tests/ovn.at
> >> @@ -22686,3 +22686,94 @@ AT_CHECK([test "$encap_rec_mvtep" == "$encap_rec_mvtep1"], [0], [])
> >>
> >>   OVN_CLEANUP([hv1])
> >>   AT_CLEANUP
> >> +
> >> +AT_SETUP([ovn -- snat default ct zone])
> >> +ovn_start
> >> +
> >> +net_add n1
> >> +sim_add hv1
> >> +ovs-vsctl add-br br-phys
> >> +as hv1
> >> +ovn_attach n1 br-phys 192.168.0.10
> >> +
> >> +ovn-nbctl ls-add sw0
> >> +ovn-nbctl lsp-add sw0 sw0-p1
> >> +ovn-nbctl lsp-set-addresses sw0-p1 "00:00:00:00:00:02 10.0.0.2"
> >> +
> >> +ovn-nbctl lr-add gw_router
> >> +ovn-nbctl set Logical_Router gw_router options:chassis="hv1"
> >> +
> >> +ovn-nbctl lrp-add gw_router gw_router-sw0 00:00:00:00:00:01 10.0.0.1/24
> >> +ovn-nbctl lsp-add sw0 sw0-gw_router
> >> +ovn-nbctl lsp-set-addresses sw0-gw_router router
> >> +ovn-nbctl set Logical_Switch_Port sw0-gw_router type=router \
> >> +                             options:router-port=gw_router-sw0 \
> >> +
> >> +ovn-nbctl lr-nat-add gw_router snat 192.168.0.1 10.0.0.0/24
> >> +
> >> +as hv1 ovs-vsctl -- add-port br-int hv1-vif1 -- \
> >> +    set interface hv1-vif1 external-ids:iface-id=sw0-p1
> >> +
> >> +ovn-nbctl --wait=hv sync
> >> +
> >> +ro_nb_uuid=$(ovn-nbctl get Logical_Router gw_router _uuid)
> >> +sw_nb_uuid=$(ovn-nbctl get Logical_Switch sw0 _uuid)
> >> +ro_sb_uuid=$(ovn-sbctl --bare --columns=_uuid find Datapath_Binding external-ids:logical-router=${ro_nb_uuid})
> >> +sw_sb_uuid=$(ovn-sbctl --bare --columns=_uuid find Datapath_Binding external-ids:logical-switch=${sw_nb_uuid})
> >> +ro_meta=$(ovn-sbctl get Datapath_Binding ${ro_sb_uuid} tunnel_key)
> >> +ro_meta=$(printf %#x ${ro_meta})
> >> +sw_meta=$(ovn-sbctl get Datapath_Binding ${sw_sb_uuid} tunnel_key)
> >> +sw_meta=$(printf %#x ${sw_meta})
> >> +
> >> +echo "ro_nb_uuid: ${ro_nb_uuid}"
> >> +echo "sw_nb_uuid: ${sw_nb_uuid}"
> >> +echo "ro_sb_uuid: ${ro_sb_uuid}"
> >> +echo "sw_sb_uuid: ${sw_sb_uuid}"
> >> +echo "ro_meta: ${ro_meta}"
> >> +echo "sw_meta: ${sw_meta}"
> >> +
> >> +as hv1
> >> +ovs-vsctl list bridge br-int
> >> +snat_zone=$(printf %#x $(ovs-vsctl get bridge br-int external-ids:ct-zone-${ro_sb_uuid}_snat | tr -d \"))
> >> +
> >> +echo "snat_zone: ${snat_zone}"
> >> +
> >> +as hv1 ovs-ofctl dump-flows br-int > offlows_pre
> >> +AT_CAPTURE_FILE([offlows_pre])
> >> +# We should have a flow in table 33 that transitions from the ingress pipeline
> >> +# to the egress pipeline of gw_router.
> >> +AT_CHECK_UNQUOTED([grep -c "table=33.*metadata=${ro_meta}.*load:${snat_zone}->NXM_NX_REG12[]" offlows_pre], [0], [dnl
> >> +1
> >> +])
> >> +
> >> +# We should have a flow in table 65 that transitions from the egress pipeline
> >> +# of sw0 to the ingress pipeline of gw_router.
> >> +AT_CHECK_UNQUOTED([grep -c "table=65.*metadata=${sw_meta}.*load:${snat_zone}->NXM_NX_REG12[]" offlows_pre], [0], [dnl
> >> +1
> >> +])
> >> +
> >> +ovn-nbctl --wait=hv set Logical_Router gw_router options:snat-ct-zone=666
> >> +
> >> +as hv1
> >> +snat_zone=$(ovs-vsctl get bridge br-int external-ids:ct-zone-${ro_sb_uuid}_snat | tr -d \")
> >> +
> >> +echo "snat_zone: ${snat_zone}"
> >> +
> >> +AT_CHECK([test "${snat_zone}" = "666"], [0], [])
> >> +
> >> +as hv1 ovs-ofctl dump-flows br-int > offlows_post
> >> +AT_CAPTURE_FILE([offlows_post])
> >> +# We should have a flow in table 33 that transitions from the ingress pipeline
> >> +# to the egress pipeline of gw_router.
> >> +AT_CHECK_UNQUOTED([grep -c "table=33.*metadata=${ro_meta}.*load:0x29a->NXM_NX_REG12[]" offlows_post], [0], [dnl
> >> +1
> >> +])
> >> +
> >> +# We should have a flow in table 65 that transitions from the egress pipeline
> >> +# of sw0 to the ingress pipeline of gw_router.
> >> +AT_CHECK_UNQUOTED([grep -c "table=65.*metadata=${sw_meta}.*load:0x29a->NXM_NX_REG12[]" offlows_post], [0], [dnl
> >> +1
> >> +])
> >> +
> >> +OVN_CLEANUP([hv1])
> >> +AT_CLEANUP
> >> --
> >> 2.25.4
> >>
> >> _______________________________________________
> >> dev mailing list
> >> dev at openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>
> >
>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>


More information about the dev mailing list