[ovs-dev] [patch_v2] ovn: Fix receive from vxlan in ovn-controller.

Darrell Ball dlu998 at gmail.com
Wed Jun 8 21:30:33 UTC 2016


OVN only supports source node replication.  Previously, vtep interaction,
which used service node replication by default for multicast, broadcast and
unknown unicast traffic, worked only by happenstance.  Because vxlan
encapsulation carries limited metadata, received packets are resubmitted to
find the egress port(s).  This is not correct for multicast, broadcast and
unknown unicast traffic, as that traffic gets resent on the tunnel mesh.
ovn-controller is changed so that it does not send traffic received from vxlan
tunnels out the tunnel mesh again.  Traffic received from vxlan tunnels is now
only delivered locally, as intended.

To record that a packet was received from a vxlan tunnel, an MFF logical
register is allocated for general scratchpad purposes, and one bit of it is
used to flag receipt from vxlan.  The new register usage is documented in a
new OVN-DESIGN.md document, and a table is added to track MFF logical
metadata and register usage.
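
As a rough illustration of the flag handling described above, the following
standalone C sketch models the scratchpad register as a plain 32-bit integer.
Only MFF_LOG_FLAGS_RCV_FROM_VXLAN and its bit position come from this patch;
the enum next_table type and both helper functions are hypothetical stand-ins
for the OpenFlow flows that ovn-controller installs, not OVS APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 0 of the flags register, as assigned by this patch. */
#define MFF_LOG_FLAGS_RCV_FROM_VXLAN (UINT32_C(1) << 0)

static_assert(MFF_LOG_FLAGS_RCV_FROM_VXLAN == 1, "flag must occupy bit 0");

/* Hypothetical model of where the remote output table sends a packet. */
enum next_table {
    NEXT_TUNNEL_OUTPUT,   /* Send over the tunnel mesh to remote chassis. */
    NEXT_LOCAL_OUTPUT     /* Resubmit to table 33 for local delivery only. */
};

/* Hypothetical helper modelling the VXLAN receive path: the packet is
 * marked before it is resubmitted to the logical ingress pipeline. */
uint32_t
mark_received_from_vxlan(uint32_t flags)
{
    return flags | MFF_LOG_FLAGS_RCV_FROM_VXLAN;
}

/* Hypothetical helper modelling the higher-priority flow in the remote
 * output table: flagged packets never go back out a tunnel. */
enum next_table
remote_output_decision(uint32_t flags)
{
    return (flags & MFF_LOG_FLAGS_RCV_FROM_VXLAN)
           ? NEXT_LOCAL_OUTPUT
           : NEXT_TUNNEL_OUTPUT;
}
```

The sketch sets and tests a single bit, whereas the patch itself loads the
whole 32-bit register and matches on its exact value.  The two behave the same
as long as this is the only flag defined.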

Some micro-details (e.g. register assignments) that may change over time
were moved from the ovn-architecture.7.xml document to the OVN-DESIGN.md
document.  The OVN-DESIGN.md file was tested using the following markdown
parsers:

https://jbt.github.io/markdown-editor/
http://dillinger.io/

As part of this change, ovn-controller-vtep is hard-coded to set the
replication mode of each logical switch to source node, as OVN will only
support source node replication.

Signed-off-by: Darrell Ball <dlu998 at gmail.com>
---

 v1->v2:  Rebased after recent conflicting commit.  Converted some xml
 comments ported from the ovn-architecture document. Removed redundant
 register initialization and unnecessary bit declaration.
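
 For reviewers checking the encapsulation layouts carried over into
 OVN-DESIGN.md, here is a minimal C sketch of the two bit layouts (Geneve
 option value and STT tunnel ID) as that document describes them.  All
 function names are hypothetical and exist only for this illustration; they
 are not part of ovn-controller.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit Geneve option value, MSB to LSB:
 * 1 reserved bit, 15-bit ingress port, 16-bit egress port. */
uint32_t
pack_geneve_ports(uint16_t ingress, uint16_t egress)
{
    assert(ingress <= 0x7fff);          /* Ingress port is only 15 bits. */
    return ((uint32_t) ingress << 16) | egress;
}

/* 64-bit STT tunnel ID, MSB to LSB:
 * 9 reserved bits, 15-bit ingress port, 16-bit egress port,
 * 24-bit logical datapath. */
uint64_t
pack_stt_tunnel_id(uint16_t ingress, uint16_t egress, uint32_t datapath)
{
    assert(ingress <= 0x7fff);          /* 15 bits. */
    assert(datapath <= 0xffffff);       /* 24 bits. */
    return ((uint64_t) ingress << 40) | ((uint64_t) egress << 24) | datapath;
}

/* Field extraction for the STT layout above. */
uint32_t
stt_datapath(uint64_t id)
{
    return (uint32_t) (id & 0xffffff);
}

uint16_t
stt_egress(uint64_t id)
{
    return (uint16_t) ((id >> 24) & 0xffff);
}

uint16_t
stt_ingress(uint64_t id)
{
    return (uint16_t) ((id >> 40) & 0x7fff);
}
```

 VXLAN has no counterpart for the two port fields, since its 24-bit VNI only
 has room for the logical datapath.  That is why this patch needs the receive
 flag.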

 ovn/OVN-DESIGN.md          | 180 +++++++++++++++++++++++++++++++++++++++++++
 ovn/automake.mk            |   1 +
 ovn/controller-vtep/vtep.c |   4 +
 ovn/controller/physical.c  |  25 ++++--
 ovn/lib/logical-fields.h   |  13 +++-
 ovn/ovn-architecture.7.xml | 188 ---------------------------------------------
 tests/ovn.at               |   3 +
 7 files changed, 219 insertions(+), 195 deletions(-)
 create mode 100644 ovn/OVN-DESIGN.md

diff --git a/ovn/OVN-DESIGN.md b/ovn/OVN-DESIGN.md
new file mode 100644
index 0000000..6676de5
--- /dev/null
+++ b/ovn/OVN-DESIGN.md
@@ -0,0 +1,180 @@
+OVN register and metadata usage:
+-------------------------------
+
+logical datapath field:
+
+A field that denotes the logical datapath through which a packet is being
+processed.
+_Keep the following in sync with MFF_LOG_DATAPATH in_
+_ovn/lib/logical-fields.h._
+OVN uses the field that OpenFlow 1.1+ simply (and confusingly) calls
+'metadata' to store the logical datapath.  (This field is passed across
+tunnels as part of the tunnel key.)
+
+
+logical input port field:
+
+A field that denotes the logical port from which the packet entered the
+logical datapath.
+_Keep the following in sync with MFF_LOG_INPORT in_
+_ovn/lib/logical-fields.h._
+OVN stores this in Nicira extension register number 6.
+
+Geneve and STT tunnels pass this field as part of the tunnel key.
+Although VXLAN tunnels do not explicitly carry a logical input port,
+OVN only uses VXLAN to communicate with gateways that from OVN's
+perspective consist of only a single logical port, so that OVN can set
+the logical input port field to this one on ingress to the OVN logical
+pipeline.
+
+
+logical output port field:
+
+A field that denotes the logical port from which the packet will leave
+the logical datapath.  This is initialized to 0 at the beginning of the
+logical ingress pipeline.
+_Keep the following in sync with MFF_LOG_OUTPORT in_
+_ovn/lib/logical-fields.h._
+OVN stores this in Nicira extension register number 7.
+
+Geneve and STT tunnels pass this field as part of the tunnel key.  VXLAN
+tunnels do not transmit the logical output port field.
+
+
+conntrack zone field for logical ports:
+
+A field that denotes the connection tracking zone for logical ports.
+The value only has local significance and is not meaningful between
+chassis.  This is initialized to 0 at the beginning of the logical
+ingress pipeline.  OVN stores this in Nicira extension register number 5.
+
+
+conntrack zone fields for Gateway router:
+
+Fields that denote the connection tracking zones for Gateway routers.
+These values only have local significance (only on chassis that have
+Gateway routers instantiated) and are not meaningful between
+chassis.  OVN stores the zone information for DNATing in Nicira
+extension register number 3 and zone information for SNATing in Nicira
+extension register number 4.
+
+
+flags field:
+
+Scratchpad flags that denote the pipeline state between tables.  The
+values only have local significance and are not meaningful between
+chassis.  This is initialized to 0 at the beginning of the logical
+ingress pipeline.
+_Keep the following in sync with MFF_LOG_FLAGS in_
+_ovn/lib/logical-fields.h._
+OVN stores this in Nicira extension register number 2.
+
+
+VLAN ID:
+
+The VLAN ID is used as an interface between OVN and containers nested
+inside a VM (see Life Cycle of a container interface inside a VM, in
+ovn-architecture.7.xml, for more information).
+
+
+The following table summarizes the register and metadata usage for OVN:
+
+```
+  Register/Metadata    Usage                            Bits (Used/Total)
+  -----------------    -----                            -----------------
+  MFF_METADATA         logical datapath                      24/64
+  MFF_REG0             ipv4 address                          32/32
+  MFF_REG1             ipv4 address                          32/32
+  MFF_REG2             flags                                  1/32
+  MFF_REG3             conntrack dnat zone gateway           16/32
+  MFF_REG4             conntrack snat zone gateway           16/32
+  MFF_REG5             conntrack zone logical ports          16/32
+  MFF_REG6             logical input port                    15/32
+  MFF_REG7             logical output port                   16/32
+```
+
+OVN Tunnel Encapsulations:
+-------------------------
+
+tunnel key:
+
+When OVN encapsulates a packet in Geneve or another tunnel, it attaches
+extra data to it to allow the receiving OVN instance to process it
+correctly.  This takes different forms depending on the particular
+encapsulation, but in each case we refer to it here as the 'tunnel key'.
+
+OVN supports three types of IP tunnel encapsulations used in the IP mesh
+connecting HVs, physical switches, appliances and/or servers.  OVN passes
+packet context information across tunnels.
+
+OVN annotates logical network packets that it sends from one hypervisor to
+another with the following three pieces of metadata, which are encoded in an
+encapsulation-specific fashion.
+
+
+24-bit logical datapath identifier:
+This is derived from the tunnel_key column in the OVN Southbound
+Datapath_Binding table.
+
+15-bit logical ingress port identifier:
+ID 0 is reserved for internal use within OVN.  IDs 1 through 32767, inclusive,
+may be assigned to logical ports (see the tunnel_key column in the OVN
+Southbound Port_Binding table).
+
+16-bit logical egress port identifier:
+IDs 0 through 32767 have the same meaning as for logical ingress ports. IDs
+32768 through 65535, inclusive, may be assigned to logical multicast groups
+(see the tunnel_key column in the OVN Southbound Multicast_Group table).
+
+For hypervisor-to-hypervisor traffic, OVN supports only Geneve and STT
+encapsulations, for the following reasons:
+
+Only STT and Geneve support the large amounts of metadata (over 32 bits per
+packet) that OVN uses (as described above).
+
+STT and Geneve use randomized UDP or TCP source ports that allow efficient
+distribution among multiple paths in environments that use ECMP in their
+underlay.
+
+NICs are available to offload STT and Geneve encapsulation and decapsulation.
+
+Due to its flexibility, the preferred encapsulation between hypervisors is
+Geneve.  For Geneve encapsulation, OVN transmits the logical datapath
+identifier in the Geneve VNI.
+
+_Keep the following in sync with ovn/controller/physical.h._
+OVN transmits the logical ingress and logical egress ports in a TLV with
+class 0x0102, type 0, and a 32-bit value encoded as follows, from MSB to
+LSB:
+
+```
+      1          15             16
+  -----------------------------------------
+  |       |                |              |
+  |  resv |  ingress port  |  egress port |
+  |       |                |              |
+  -----------------------------------------
+```
+
+Environments whose NICs lack Geneve offload may prefer STT encapsulation
+for performance reasons.  For STT encapsulation, OVN encodes all three
+pieces of logical metadata in the STT 64-bit tunnel ID as follows, from MSB
+to LSB:
+
+```
+
+      9          15              16             24
+  ------------------------------------------------------
+  |       |                |              |            |
+  |  resv |  ingress port  |  egress port |  datapath  |
+  |       |                |              |            |
+  ------------------------------------------------------
+```
+
+For connecting to gateways, in addition to Geneve and STT, OVN supports
+VXLAN, because only VXLAN support is common on top-of-rack (ToR) switches.
+Currently, gateways have a feature set that matches the capabilities as
+defined by the VTEP schema, so fewer bits of metadata are necessary.  In
+the future, gateways that do not support encapsulations with large amounts
+of metadata may continue to have a reduced feature set.
+
diff --git a/ovn/automake.mk b/ovn/automake.mk
index f3f40e5..39277de 100644
--- a/ovn/automake.mk
+++ b/ovn/automake.mk
@@ -73,6 +73,7 @@ DISTCLEANFILES += ovn/ovn-architecture.7
 EXTRA_DIST += \
 	ovn/TODO \
 	ovn/CONTAINERS.OpenStack.md \
+	ovn/OVN-DESIGN.md \
 	ovn/OVN-GW-HA.md
 
 # Version checking for ovn-nb.ovsschema.
diff --git a/ovn/controller-vtep/vtep.c b/ovn/controller-vtep/vtep.c
index e412b6b..f9151c2 100644
--- a/ovn/controller-vtep/vtep.c
+++ b/ovn/controller-vtep/vtep.c
@@ -233,6 +233,10 @@ vtep_lswitch_run(struct shash *vtep_pbs, struct sset *vtep_pswitches,
                          vtep_ls->tunnel_key[0], tnl_key);
             }
             vteprec_logical_switch_set_tunnel_key(vtep_ls, &tnl_key, 1);
+            /* OVN is expected to always use source node replication mode,
+             * hence the replication mode is hard-coded for each logical
+             * switch in the context of ovn-controller-vtep. */
+            vteprec_logical_switch_set_replication_mode(vtep_ls, "source_node");
             sset_add(&used_ls, lswitch_name);
         }
     }
diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
index 85528e0..0dbdd21 100644
--- a/ovn/controller/physical.c
+++ b/ovn/controller/physical.c
@@ -524,6 +524,21 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
             ofpact_put_OUTPUT(&ofpacts)->port = ofport;
             ofctrl_add_flow(flow_table, OFTABLE_REMOTE_OUTPUT, 100,
                             &match, &ofpacts);
+
+            /* Packets received from a VXLAN tunnel are resubmitted to
+             * OFTABLE_LOG_INGRESS_PIPELINE because VXLAN lacks the needed
+             * metadata.  Explicitly skip sending them back out any tunnel
+             * and resubmit to table 33 for local delivery only. */
+            match_init_catchall(&match);
+            ofpbuf_clear(&ofpacts);
+
+            match_set_reg(&match, MFF_LOG_FLAGS - MFF_REG0,
+                          MFF_LOG_FLAGS_RCV_FROM_VXLAN);
+            /* Resubmit to table 33. */
+            put_resubmit(OFTABLE_LOCAL_OUTPUT, &ofpacts);
+            ofctrl_add_flow(flow_table, OFTABLE_REMOTE_OUTPUT, 101, &match,
+                            &ofpacts);
+
         }
     }
 
@@ -687,11 +702,7 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
      * metadata, we only support VXLAN for connections to gateways.  The
      * VNI is used to populate MFF_LOG_DATAPATH.  The gateway's logical
      * port is set to MFF_LOG_INPORT.  Then the packet is resubmitted to
-     * table 16 to determine the logical egress port.
-     *
-     * xxx Due to resubmitting to table 16, broadcasts will be re-sent to
-     * xxx all logical ports, including non-local ones which could cause
-     * xxx duplicate packets to be received by multiply-connected gateways. */
+     * table 16 to determine the logical egress port. */
     HMAP_FOR_EACH (tun, hmap_node, &tunnels) {
         if (tun->type != VXLAN) {
             continue;
@@ -711,6 +722,10 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
             ofpbuf_clear(&ofpacts);
             put_move(MFF_TUN_ID, 0,  MFF_LOG_DATAPATH, 0, 24, &ofpacts);
             put_load(binding->tunnel_key, MFF_LOG_INPORT, 0, 15, &ofpacts);
+            /* For packets received from a VXLAN tunnel, set a flag to that
+             * effect. */
+            put_load(MFF_LOG_FLAGS_RCV_FROM_VXLAN, MFF_LOG_FLAGS,
+                     0, 32, &ofpacts);
             put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, &ofpacts);
 
             ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, 100, &match,
diff --git a/ovn/lib/logical-fields.h b/ovn/lib/logical-fields.h
index f0f97a9..b340653 100644
--- a/ovn/lib/logical-fields.h
+++ b/ovn/lib/logical-fields.h
@@ -23,6 +23,7 @@
  * These values are documented in ovn-architecture(7), please update the
  * documentation if you change any of them. */
 #define MFF_LOG_DATAPATH MFF_METADATA /* Logical datapath (64 bits). */
+#define MFF_LOG_FLAGS      MFF_REG2   /* Logical flags (32 bits). */
 #define MFF_LOG_DNAT_ZONE  MFF_REG3   /* conntrack dnat zone for gateway router
                                        * (32 bits). */
 #define MFF_LOG_SNAT_ZONE  MFF_REG4   /* conntrack snat zone for gateway router
@@ -37,7 +38,15 @@
  * Make sure these don't overlap with the logical fields! */
 #define MFF_LOG_REGS \
     MFF_LOG_REG(MFF_REG0) \
-    MFF_LOG_REG(MFF_REG1) \
-    MFF_LOG_REG(MFF_REG2)
+    MFF_LOG_REG(MFF_REG1)
+
+/* MFF_LOG_FLAGS flag assignments. */
+enum mff_log_flags {
+    /* This flag is used to indicate that a packet was received from a VXLAN
+     * tunnel to compensate for the lack of egress port information available
+     * in VXLAN encapsulation.  Egress port information is available for Geneve
+     * and STT tunnel types. */
+    MFF_LOG_FLAGS_RCV_FROM_VXLAN = (1 << 0),
+};
 
 #endif /* ovn/lib/logical-fields.h */
diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
index 553c2e5..bc32882 100644
--- a/ovn/ovn-architecture.7.xml
+++ b/ovn/ovn-architecture.7.xml
@@ -607,95 +607,6 @@
   </p>
 
   <p>
-    This section mentions several data and metadata fields, for clarity
-    summarized here:
-  </p>
-
-  <dl>
-    <dt>tunnel key</dt>
-    <dd>
-      When OVN encapsulates a packet in Geneve or another tunnel, it attaches
-      extra data to it to allow the receiving OVN instance to process it
-      correctly.  This takes different forms depending on the particular
-      encapsulation, but in each case we refer to it here as the ``tunnel
-      key.''  See <code>Tunnel Encapsulations</code>, below, for details.
-    </dd>
-
-    <dt>logical datapath field</dt>
-    <dd>
-      A field that denotes the logical datapath through which a packet is being
-      processed.
-      <!-- Keep the following in sync with MFF_LOG_DATAPATH in
-           ovn/lib/logical-fields.h. -->
-      OVN uses the field that OpenFlow 1.1+ simply (and confusingly) calls
-      ``metadata'' to store the logical datapath.  (This field is passed across
-      tunnels as part of the tunnel key.)
-    </dd>
-
-    <dt>logical input port field</dt>
-    <dd>
-      <p>
-        A field that denotes the logical port from which the packet
-        entered the logical datapath.
-        <!-- Keep the following in sync with MFF_LOG_INPORT in
-             ovn/lib/logical-fields.h. -->
-        OVN stores this in Nicira extension register number 6.
-      </p>
-
-      <p>
-        Geneve and STT tunnels pass this field as part of the tunnel key.
-        Although VXLAN tunnels do not explicitly carry a logical input port,
-        OVN only uses VXLAN to communicate with gateways that from OVN's
-        perspective consist of only a single logical port, so that OVN can set
-        the logical input port field to this one on ingress to the OVN logical
-        pipeline.
-      </p>
-    </dd>
-
-    <dt>logical output port field</dt>
-    <dd>
-      <p>
-        A field that denotes the logical port from which the packet will
-        leave the logical datapath.  This is initialized to 0 at the
-        beginning of the logical ingress pipeline.
-        <!-- Keep the following in sync with MFF_LOG_OUTPORT in
-             ovn/lib/logical-fields.h. -->
-        OVN stores this in Nicira extension register number 7.
-      </p>
-
-      <p>
-        Geneve and STT tunnels pass this field as part of the tunnel key.
-        VXLAN tunnels do not transmit the logical output port field.
-      </p>
-    </dd>
-
-    <dt>conntrack zone field for logical ports</dt>
-    <dd>
-      A field that denotes the connection tracking zone for logical ports.
-      The value only has local significance and is not meaningful between
-      chassis.  This is initialized to 0 at the beginning of the logical
-      ingress pipeline.  OVN stores this in Nicira extension register number 5.
-    </dd>
-
-    <dt>conntrack zone fields for Gateway router</dt>
-    <dd>
-      Fields that denote the connection tracking zones for Gateway routers.
-      These values only have local significance (only on chassis that have
-      Gateway routers instantiated) and is not meaningful between
-      chassis.  OVN stores the zone information for DNATting in Nicira
-      extension register number 3 and zone information for SNATing in Nicira
-      extension register number 4.
-    </dd>
-
-    <dt>VLAN ID</dt>
-    <dd>
-      The VLAN ID is used as an interface between OVN and containers nested
-      inside a VM (see <code>Life Cycle of a container interface inside a
-      VM</code>, above, for more information).
-    </dd>
-  </dl>
-
-  <p>
     Initially, a VM or container on the ingress hypervisor sends a packet on a
     port attached to the OVN integration bridge.  Then:
   </p>
@@ -996,103 +907,4 @@
       entries and the <code>Logical_Switch</code> tunnel keys.
     </li>
   </ol>
-
-  <h1>Design Decisions</h1>
-
-  <h2>Tunnel Encapsulations</h2>
-
-  <p>
-    OVN annotates logical network packets that it sends from one hypervisor to
-    another with the following three pieces of metadata, which are encoded in
-    an encapsulation-specific fashion:
-  </p>
-
-  <ul>
-    <li>
-      24-bit logical datapath identifier, from the <code>tunnel_key</code>
-      column in the OVN Southbound <code>Datapath_Binding</code> table.
-    </li>
-
-    <li>
-      15-bit logical ingress port identifier.  ID 0 is reserved for internal
-      use within OVN.  IDs 1 through 32767, inclusive, may be assigned to
-      logical ports (see the <code>tunnel_key</code> column in the OVN
-      Southbound <code>Port_Binding</code> table).
-    </li>
-
-    <li>
-      16-bit logical egress port identifier.  IDs 0 through 32767 have the same
-      meaning as for logical ingress ports.  IDs 32768 through 65535,
-      inclusive, may be assigned to logical multicast groups (see the
-      <code>tunnel_key</code> column in the OVN Southbound
-      <code>Multicast_Group</code> table).
-    </li>
-  </ul>
-
-  <p>
-    For hypervisor-to-hypervisor traffic, OVN supports only Geneve and STT
-    encapsulations, for the following reasons:
-  </p>
-
-  <ul>
-    <li>
-      Only STT and Geneve support the large amounts of metadata (over 32 bits
-      per packet) that OVN uses (as described above).
-    </li>
-
-    <li>
-      STT and Geneve use randomized UDP or TCP source ports that allows
-      efficient distribution among multiple paths in environments that use ECMP
-      in their underlay.
-    </li>
-
-    <li>
-      NICs are available to offload STT and Geneve encapsulation and
-      decapsulation.
-    </li>
-  </ul>
-
-  <p>
-    Due to its flexibility, the preferred encapsulation between hypervisors is
-    Geneve.  For Geneve encapsulation, OVN transmits the logical datapath
-    identifier in the Geneve VNI.
-
-    <!-- Keep the following in sync with ovn/controller/physical.h. -->
-    OVN transmits the logical ingress and logical egress ports in a TLV with
-    class 0x0102, type 0, and a 32-bit value encoded as follows, from MSB to
-    LSB:
-  </p>
-
-  <diagram>
-    <header name="">
-      <bits name="rsv" above="1" below="0" width=".25"/>
-      <bits name="ingress port" above="15" width=".75"/>
-      <bits name="egress port" above="16" width=".75"/>
-    </header>
-  </diagram>
-
-  <p>
-    Environments whose NICs lack Geneve offload may prefer STT encapsulation
-    for performance reasons.  For STT encapsulation, OVN encodes all three
-    pieces of logical metadata in the STT 64-bit tunnel ID as follows, from MSB
-    to LSB:
-  </p>
-
-  <diagram>
-    <header name="">
-      <bits name="reserved" above="9" below="0" width=".5"/>
-      <bits name="ingress port" above="15" width=".75"/>
-      <bits name="egress port" above="16" width=".75"/>
-      <bits name="datapath" above="24" width="1.25"/>
-    </header>
-  </diagram>
-
-  <p>
-    For connecting to gateways, in addition to Geneve and STT, OVN supports
-    VXLAN, because only VXLAN support is common on top-of-rack (ToR) switches.
-    Currently, gateways have a feature set that matches the capabilities as
-    defined by the VTEP schema, so fewer bits of metadata are necessary.  In
-    the future, gateways that do not support encapsulations with large amounts
-    of metadata may continue to have a reduced feature set.
-  </p>
 </manpage>
diff --git a/tests/ovn.at b/tests/ovn.at
index 633cf35..1228a34 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -1092,6 +1092,9 @@ sleep 1
 
 vtep-ctl bind-ls br-vtep br-vtep_n2 0 lsw0
 
+OVS_WAIT_UNTIL([test -n "`as vtep vtep-ctl get-replication-mode lsw0 |
+               grep -- source`"])
+# It takes more time for the update to be processed by ovs-vtep.
 sleep 1
 
 # Add hv3 on the other side of the vtep
-- 
1.9.1