[ovs-dev] [PATCH v4 ovn] Replace chassis mac with router port mac on destination chassis

Ankur Sharma ankur.sharma at nutanix.com
Tue Sep 10 21:24:18 UTC 2019


Hi Numan,

Thanks for calling out the failure.
Fixed the same in v5.

Did not realize that we have removed OVS tests from repo, hence based on test number, I ruled it out as an unrelated tests.

Regards,
Ankur

From: Numan Siddique <nusiddiq at redhat.com>
Sent: Tuesday, September 10, 2019 12:51 AM
To: Ankur Sharma <ankur.sharma at nutanix.com>
Cc: ovs-dev at openvswitch.org
Subject: Re: [ovs-dev] [PATCH v4 ovn] Replace chassis mac with router port mac on destination chassis



On Mon, Sep 9, 2019 at 1:58 AM Ankur Sharma <ankur.sharma at nutanix.com<mailto:ankur.sharma at nutanix.com>> wrote:
During E-W routing for vlan backed networks, we replace router port
mac with chassis mac, when packet leaves the source hypervisor.

As a result, the destination VM (on remote hypervisor) will see
chassis mac as source mac in received packet.

Although, functionality wise this does not cause any issue,
however chassis mac being see as source on VM, will
lead to following:
a. INCONSISTENT SOURCE MAC:
   If the destination VM moves to same hypervisor as sender,
   then it will see router port mac as source mac. Whereas, on
   a remote hypervisor, source mac will be the sender chassis mac.

   This will cause inconsistency in packet headers for the same
   flow and could be confusing for someone looking at packet
   captures inside the vm.

b. SYSTEM MAC BEING EXPOSED TO VM:
   Chassis mac is a CMS provided mac, i.e it is an infrastructure
   mac. It is not a good practice to expose such values to VM,
   which should not be seeing them in first place.

In order to replace chassis mac with router port mac, we will
do following.

a. Create conjunction for each chassis mac and router port vlan
   id combination. For example, for a 2 node chassis setup, where
   we have a logical router, connected to 4 logical switches with
   vlan ids: 2000, 1000, 0 and 24, the conjunction flow will look
   like following:

   cookie=0x0, duration=9094.608s, table=0, n_packets=0, n_bytes=0, idle_age=9094, priority=180,dl_src=aa:bb:cc:dd:ee:22 actions=conjunction(100,1/2)
   cookie=0x0, duration=9094.608s, table=0, n_packets=0, n_bytes=0, idle_age=9094, priority=180,dl_src=aa:bb:cc:dd:ff:ee actions=conjunction(100,1/2)

   cookie=0x0, duration=9094.552s, table=0, n_packets=0, n_bytes=0, idle_age=9094, priority=180,dl_vlan=2000 actions=conjunction(100,2/2)
   cookie=0x0, duration=9094.552s, table=0, n_packets=0, n_bytes=0, idle_age=9094, priority=180,dl_vlan=1000 actions=conjunction(100,2/2)
   cookie=0x0, duration=9094.552s, table=0, n_packets=0, n_bytes=0, idle_age=9094, priority=180,vlan_tci=0x0000/0x1fff actions=conjunction(100,2/2)
   cookie=0x0, duration=9094.552s, table=0, n_packets=0, n_bytes=0, idle_age=9094, priority=180,dl_vlan=24 actions=conjunction(100,2/2)

b. Using this conjunction as match, we can identify if packet entering destination
   hypervisor was routed at the source or not. This will be done in table=0 (Physical to logical)
   at priority=180.
   For example:
   cookie=0x0, duration=9795.957s, table=0, n_packets=1391, n_bytes=141882, idle_age=8396, priority=180,conj_id=100,in_port=146,dl_vlan=1000 actions=.........,mod_dl_src:00:00:01:01:02:03,...

c. We use conjunction, as it will ensure that we do not end up having lot of flows
   as we scale up. If we do not use conjunction, then we will have
   N (number of chassis macs) X M (number of router vlans) number of ovs flows.
   Conjunction converts it to N + M.
   Consider a setup, with 500 Chassis and 500 routed vlans.
   Without conjunction we will need 25000 (500 * 500) flows,
   whereas with conjunction that number comes down to 1000 (500 + 500).

Signed-off-by: Ankur Sharma <ankur.sharma at nutanix.com<mailto:ankur.sharma at nutanix.com>>

Hi Ankur,

Thanks for v4.

The below test case  is failing

## ----------------------- ##
116: ovn -- 2 HVs, 2 lports/HV, localnet ports, DVR N-S Ping FAILED (ovn.at:15773 [ovn.at]<https://urldefense.proofpoint.com/v2/url?u=http-3A__ovn.at-3A15773&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=mSPIZIN9J6N8ibOG1VZbwAPCcyGl5w2dX3HIc407lc8&s=MtiEiCBeIkE5PnqLVt87SPAVbZwQl_KEuWC2xagdlYo&e=>)

## ------------- ##
## Test results. ##
## ------------- ##

Numan

---
 controller/chassis.c        |   2 +-
 controller/chassis.h        |   3 +
 controller/ovn-controller.c |   5 +
 controller/physical.c       | 222 ++++++++++++++++++++++++++++++++++++++++++--
 controller/physical.h       |   1 +
 ovn-architecture.7.xml      |  10 +-
 tests/ovn.at [ovn.at]<https://urldefense.proofpoint.com/v2/url?u=http-3A__ovn.at&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=mSPIZIN9J6N8ibOG1VZbwAPCcyGl5w2dX3HIc407lc8&s=s8f0ZsvzCYQCkNxo-BC_hmZqrygAXjeM-CJR1tRspx8&e=>                |  12 ++-
 7 files changed, 243 insertions(+), 12 deletions(-)

diff --git a/controller/chassis.c b/controller/chassis.c
index 937c557..699b662 100644
--- a/controller/chassis.c
+++ b/controller/chassis.c
@@ -144,7 +144,7 @@ get_bridge_mappings(const struct smap *ext_ids)
     return smap_get_def(ext_ids, "ovn-bridge-mappings", "");
 }

-static const char *
+const char *
 get_chassis_mac_mappings(const struct smap *ext_ids)
 {
     return smap_get_def(ext_ids, "ovn-chassis-mac-mappings", "");
diff --git a/controller/chassis.h b/controller/chassis.h
index eb46ca3..178d295 100644
--- a/controller/chassis.h
+++ b/controller/chassis.h
@@ -27,6 +27,7 @@ struct sbrec_chassis;
 struct sbrec_chassis_table;
 struct sset;
 struct eth_addr;
+struct smap;

 void chassis_register_ovs_idl(struct ovsdb_idl *);
 const struct sbrec_chassis *chassis_run(
@@ -42,5 +43,7 @@ bool chassis_get_mac(const struct sbrec_chassis *chassis,
                      const char *bridge_mapping,
                      struct eth_addr *chassis_mac);
 const char *chassis_get_id(void);
+const char * get_chassis_mac_mappings(const struct smap *ext_ids);
+

 #endif /* controller/chassis.h */
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 86f29ac..2a4001b 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -1268,9 +1268,14 @@ en_flow_output_run(struct engine_node *node)
         (struct sbrec_port_binding_table *)EN_OVSDB_GET(
             engine_get_input("SB_port_binding", node));

+    struct sbrec_chassis_table *chassis_table =
+        (struct sbrec_chassis_table *)EN_OVSDB_GET(
+            engine_get_input("SB_chassis", node));
+
     physical_run(sbrec_port_binding_by_name,
                  multicast_group_table,
                  port_binding_table,
+                 chassis_table,
                  mff_ovn_geneve,
                  br_int, chassis, ct_zones,
                  local_datapaths, local_lports,
diff --git a/controller/physical.c b/controller/physical.c
index c818646..398bb19 100644
--- a/controller/physical.c
+++ b/controller/physical.c
@@ -47,9 +47,23 @@

 VLOG_DEFINE_THIS_MODULE(physical);

+/* Datapath zone IDs for connection tracking and NAT */
+struct zone_ids {
+    int ct;                     /* MFF_LOG_CT_ZONE. */
+    int dnat;                   /* MFF_LOG_DNAT_ZONE. */
+    int snat;                   /* MFF_LOG_SNAT_ZONE. */
+};
+
+static void
+load_logical_ingress_metadata(const struct sbrec_port_binding *binding,
+                              const struct zone_ids *zone_ids,
+                              struct ofpbuf *ofpacts_p);
+
 /* UUID to identify OF flows not associated with ovsdb rows. */
 static struct uuid *hc_uuid = NULL;

+#define CHASSIS_MAC_TO_ROUTER_MAC_CONJID        100
+
 void
 physical_register_ovs_idl(struct ovsdb_idl *ovs_idl)
 {
@@ -199,12 +213,6 @@ get_localnet_port(const struct hmap *local_datapaths, int64_t tunnel_key)
     return ld ? ld->localnet_port : NULL;
 }

-/* Datapath zone IDs for connection tracking and NAT */
-struct zone_ids {
-    int ct;                     /* MFF_LOG_CT_ZONE. */
-    int dnat;                   /* MFF_LOG_DNAT_ZONE. */
-    int snat;                   /* MFF_LOG_SNAT_ZONE. */
-};

 static struct zone_ids
 get_zone_ids(const struct sbrec_port_binding *binding,
@@ -385,6 +393,200 @@ put_remote_port_redirect_overlay(const struct
                     match, ofpacts_p, &binding->header_.uuid);
 }

+
+static struct hmap remote_chassis_macs =
+              HMAP_INITIALIZER(&remote_chassis_macs);
+
+/* Maps from a physical network name to the chassis macs of remote chassis. */
+struct remote_chassis_mac {
+    struct hmap_node hmap_node;
+    char *chassis_mac;
+    char *chassis_id;
+};
+
+static void
+populate_remote_chassis_macs(const struct sbrec_chassis *my_chassis,
+                             const struct sbrec_chassis_table *chassis_table)
+{
+    const struct sbrec_chassis *chassis;
+    SBREC_CHASSIS_TABLE_FOR_EACH (chassis, chassis_table) {
+
+        /* We want only remote chassis macs. */
+        if (!strcmp(my_chassis->name, chassis->name)) {
+            continue;
+        }
+
+        const char *tokens
+            = get_chassis_mac_mappings(&chassis->external_ids);
+
+        if (!strlen(tokens)) {
+            continue;
+        }
+
+        char *save_ptr = NULL;
+        char *token;
+        char *tokstr = xstrdup(tokens);
+
+        /* Format for a chassis mac configuration is:
+         * ovn-chassis-mac-mappings="bridge-name1:MAC1,bridge-name2:MAC2"
+         */
+        for (token = strtok_r(tokstr, ",", &save_ptr);
+             token != NULL;
+             token = strtok_r(NULL, ",", &save_ptr)) {
+            char *save_ptr2 = NULL;
+            char *chassis_mac_bridge = strtok_r(token, ":", &save_ptr2);
+            char *chassis_mac_str = strtok_r(NULL, "", &save_ptr2);
+            struct remote_chassis_mac *remote_chassis_mac = NULL;
+            remote_chassis_mac = xmalloc(sizeof *remote_chassis_mac);
+            hmap_insert(&remote_chassis_macs, &remote_chassis_mac->hmap_node,
+                        hash_string(chassis_mac_bridge, 0));
+            remote_chassis_mac->chassis_mac = xstrdup(chassis_mac_str);
+            remote_chassis_mac->chassis_id = xstrdup(chassis->name);
+        }
+        free(tokstr);
+    }
+}
+
+static void
+free_remote_chassis_macs(void)
+{
+    struct remote_chassis_mac *mac, *next_mac;
+
+    HMAP_FOR_EACH_SAFE (mac, next_mac, hmap_node, &remote_chassis_macs) {
+        hmap_remove(&remote_chassis_macs, &mac->hmap_node);
+        free(mac->chassis_mac);
+        free(mac->chassis_id);
+        free(mac);
+    }
+}
+
+static void
+put_chassis_mac_conj_id_flow(const struct sbrec_chassis_table *chassis_table,
+                             const struct sbrec_chassis *chassis,
+                             struct ofpbuf *ofpacts_p,
+                             struct ovn_desired_flow_table *flow_table)
+{
+    struct match match;
+    struct remote_chassis_mac *mac;
+
+    populate_remote_chassis_macs(chassis, chassis_table);
+
+    HMAP_FOR_EACH (mac, hmap_node, &remote_chassis_macs) {
+        struct eth_addr chassis_mac;
+        char *err_str = NULL;
+        struct ofpact_conjunction *conj;
+
+        if ((err_str = str_to_mac(mac->chassis_mac, &chassis_mac))) {
+            free(err_str);
+            free_remote_chassis_macs();
+            return;
+        }
+
+        ofpbuf_clear(ofpacts_p);
+        match_init_catchall(&match);
+
+
+        match_set_dl_src(&match, chassis_mac);
+
+        conj = ofpact_put_CONJUNCTION(ofpacts_p);
+        conj->id = CHASSIS_MAC_TO_ROUTER_MAC_CONJID;
+        conj->n_clauses = 2;
+        conj->clause = 0;
+        ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 180, 0,
+                        &match, ofpacts_p, hc_uuid);
+    }
+
+    free_remote_chassis_macs();
+}
+
+static void
+put_replace_chassis_mac_flows(const struct simap *ct_zones,
+                              const struct
+                              sbrec_port_binding *localnet_port,
+                              const struct hmap *local_datapaths,
+                              struct ofpbuf *ofpacts_p,
+                              ofp_port_t ofport,
+                              struct ovn_desired_flow_table *flow_table)
+{
+    /* Packets arriving on localnet port, could have been routed on
+     * source chassis and hence will have a chassis mac.
+     * conj_match will match source mac with chassis macs conjunction
+     * and replace it with corresponding router port mac.
+     */
+    struct local_datapath *ld = get_local_datapath(local_datapaths,
+                                                   localnet_port->datapath->
+                                                   tunnel_key);
+    ovs_assert(ld);
+
+    int tag = localnet_port->tag ? *localnet_port->tag : 0;
+    struct zone_ids zone_ids = get_zone_ids(localnet_port, ct_zones);
+
+    for (int i = 0; i < ld->n_peer_ports; i++) {
+        const struct sbrec_port_binding *rport_binding = ld->peer_ports[i];
+        struct eth_addr router_port_mac;
+        char *err_str = NULL;
+        struct match match;
+        struct ofpact_mac *replace_mac;
+
+        ovs_assert(rport_binding->n_mac == 1);
+        if ((err_str = str_to_mac(rport_binding->mac[0], &router_port_mac))) {
+            /* Parsing of mac failed. */
+            VLOG_WARN("Parsing or router port mac failed for router port: %s, "
+                    "with error: %s", rport_binding->logical_port, err_str);
+            free(err_str);
+            return;
+        }
+        ofpbuf_clear(ofpacts_p);
+        match_init_catchall(&match);
+
+        /* Add flow, which will match on conjunction id and will
+         * replace source with router port mac */
+
+        /* Match on ingress port, vlan_id and conjunction id */
+        match_set_in_port(&match, ofport);
+        match_set_conj_id(&match, CHASSIS_MAC_TO_ROUTER_MAC_CONJID);
+
+        if (tag) {
+            match_set_dl_vlan(&match, htons(tag), 0);
+        } else {
+            match_set_dl_tci_masked(&match, 0, htons(VLAN_CFI));
+        }
+
+        /* Actions */
+
+        if (tag) {
+            ofpact_put_STRIP_VLAN(ofpacts_p);
+        }
+        load_logical_ingress_metadata(localnet_port, &zone_ids, ofpacts_p);
+        replace_mac = ofpact_put_SET_ETH_SRC(ofpacts_p);
+        replace_mac->mac = router_port_mac;
+
+        /* Resubmit to first logical ingress pipeline table. */
+        put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
+        ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG,
+                        180, 0, &match, ofpacts_p, hc_uuid);
+
+        /* Provide second search criteria, i.e localnet port's
+         * vlan ID for conjunction flow */
+        struct ofpact_conjunction *conj;
+        ofpbuf_clear(ofpacts_p);
+        match_init_catchall(&match);
+
+        if (tag) {
+            match_set_dl_vlan(&match, htons(tag), 0);
+        } else {
+            match_set_dl_tci_masked(&match, 0, htons(VLAN_CFI));
+        }
+
+        conj = ofpact_put_CONJUNCTION(ofpacts_p);
+        conj->id = CHASSIS_MAC_TO_ROUTER_MAC_CONJID;
+        conj->n_clauses = 2;
+        conj->clause = 1;
+        ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 180, 0, &match,
+                        ofpacts_p, hc_uuid);
+    }
+}
+
 static void
 put_replace_router_port_mac_flows(struct ovsdb_idl_index
                                   *sbrec_port_binding_by_name,
@@ -931,6 +1133,11 @@ consider_port_binding(struct ovsdb_idl_index *sbrec_port_binding_by_name,
                             &binding->header_.uuid);
         }

+        if (!strcmp(binding->type, "localnet")) {
+            put_replace_chassis_mac_flows(ct_zones, binding, local_datapaths,
+                                          ofpacts_p, ofport, flow_table);
+        }
+
         /* Table 65, Priority 100.
          * =======================
          *
@@ -1224,6 +1431,7 @@ void
 physical_run(struct ovsdb_idl_index *sbrec_port_binding_by_name,
              const struct sbrec_multicast_group_table *multicast_group_table,
              const struct sbrec_port_binding_table *port_binding_table,
+             const struct sbrec_chassis_table *chassis_table,
              enum mf_field_id mff_ovn_geneve,
              const struct ovsrec_bridge *br_int,
              const struct sbrec_chassis *chassis,
@@ -1369,6 +1577,8 @@ physical_run(struct ovsdb_idl_index *sbrec_port_binding_by_name,
     struct ofpbuf ofpacts;
     ofpbuf_init(&ofpacts, 0);

+    put_chassis_mac_conj_id_flow(chassis_table, chassis, &ofpacts, flow_table);
+
     /* Set up flows in table 0 for physical-to-logical translation and in table
      * 64 for logical-to-physical translation. */
     const struct sbrec_port_binding *binding;
diff --git a/controller/physical.h b/controller/physical.h
index a85e297..c93f6b1 100644
--- a/controller/physical.h
+++ b/controller/physical.h
@@ -46,6 +46,7 @@ void physical_register_ovs_idl(struct ovsdb_idl *);
 void physical_run(struct ovsdb_idl_index *sbrec_port_binding_by_name,
                   const struct sbrec_multicast_group_table *,
                   const struct sbrec_port_binding_table *,
+                  const struct sbrec_chassis_table *chassis_table,
                   enum mf_field_id mff_ovn_geneve,
                   const struct ovsrec_bridge *br_int,
                   const struct sbrec_chassis *chassis,
diff --git a/ovn-architecture.7.xml b/ovn-architecture.7.xml
index 415da59..6115e84 100644
--- a/ovn-architecture.7.xml
+++ b/ovn-architecture.7.xml
@@ -1426,7 +1426,7 @@
         (MAC,VLAN) tuple will seen by physical network from other chassis as
         well, which could cause these issues:
       </p>
-
+
       <ul>
         <li>
           Continuous MAC moves in top-of-rack switch (ToR).
@@ -1442,9 +1442,11 @@

     <li>
       The destination chassis receives the packet via the localnet port and
-      sends it to the integration bridge. The packet enters the
-      ingress pipeline and then egress pipeline of the destination localnet
-      logical switch and finally gets delivered to the destination VM port.
+      sends it to the integration bridge. Before entering the integration
+      bridge the source mac of the packet will be replaced with
+      router port mac again. The packet enters the ingress pipeline and
+      then egress pipeline of the destination localnet logical switch and
+      finally gets delivered to the destination VM port.
     </li>
   </ol>

diff --git a/tests/ovn.at [ovn.at]<https://urldefense.proofpoint.com/v2/url?u=http-3A__ovn.at&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=mSPIZIN9J6N8ibOG1VZbwAPCcyGl5w2dX3HIc407lc8&s=s8f0ZsvzCYQCkNxo-BC_hmZqrygAXjeM-CJR1tRspx8&e=> b/tests/ovn.at [ovn.at]<https://urldefense.proofpoint.com/v2/url?u=http-3A__ovn.at&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=mSPIZIN9J6N8ibOG1VZbwAPCcyGl5w2dX3HIc407lc8&s=s8f0ZsvzCYQCkNxo-BC_hmZqrygAXjeM-CJR1tRspx8&e=>
index 7bc25b1..5469d36 100644
--- a/tests/ovn.at [ovn.at]<https://urldefense.proofpoint.com/v2/url?u=http-3A__ovn.at&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=mSPIZIN9J6N8ibOG1VZbwAPCcyGl5w2dX3HIc407lc8&s=s8f0ZsvzCYQCkNxo-BC_hmZqrygAXjeM-CJR1tRspx8&e=>
+++ b/tests/ovn.at [ovn.at]<https://urldefense.proofpoint.com/v2/url?u=http-3A__ovn.at&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=mSPIZIN9J6N8ibOG1VZbwAPCcyGl5w2dX3HIc407lc8&s=s8f0ZsvzCYQCkNxo-BC_hmZqrygAXjeM-CJR1tRspx8&e=>
@@ -14274,6 +14274,14 @@ hv_to_chassis_mac () {
     esac
 }

+lrp_to_lrp_mac () {
+     case $1 in dnl (
+        router-to-ls[[1]]) echo 000001010203 ;; dnl (
+        router-to-ls[[2]]) echo 000001010205 ;; dnl (
+        *) AT_FAIL_IF([:]) ;;
+    esac
+}
+
 ip_to_hex() {
        printf "%02x%02x%02x%02x" "$@"
 }
@@ -14341,7 +14349,9 @@ test_ip() {
             # (and checksum).
             outport_num=`vif_to_num $outport`
             out_lrp=`vif_to_lrp $outport`
-            echo f000000000${outport_num}aabbccddee${hv_num}${hv_num}08004500001c00000000"3f1101"00${src_ip}${dst_ip}0035111100080000
+            lrp_mac=`lrp_to_lrp_mac $out_lrp`
+            echo f000000000${outport_num}${lrp_mac}08004500001c00000000"3f1101"00${src_ip}${dst_ip}0035111100080000
+
         fi >> $outport.expected
     done
 }
--
1.8.3.1

_______________________________________________
dev mailing list
dev at openvswitch.org<mailto:dev at openvswitch.org>
https://mail.openvswitch.org/mailman/listinfo/ovs-dev [mail.openvswitch.org]<https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=mSPIZIN9J6N8ibOG1VZbwAPCcyGl5w2dX3HIc407lc8&s=sw1Tku03Wim76ztcrh3Nea3PUpRPZaZ7_NSqZZ-tJys&e=>


More information about the dev mailing list