[ovs-dev] [PATCH v2 21/21] ovn: Change strategy for tunnel keys.

Ben Pfaff blp at nicira.com
Tue Jul 28 15:44:39 UTC 2015


Until now, OVN has used "flat" tunnel keys, in which the STT tunnel key or
Geneve VNI contains a logical port number.  Logical port numbers are unique
within an OVN deployment.

Flat tunnel keys have the advantage of simplicity.  However, for packets
that are destined to logical ports on multiple hypervisors, they require
sending one packet per destination logical port rather than one packet per
hypervisor.  They also make it hard to integrate with VXLAN-based hardware
switches, which use VNIs to designate logical networks instead of logical
ports.

This commit switches OVN to a different scheme.  In the new scheme, the
Geneve VNI designates a logical network and a Geneve option specifies the
logical input and output ports, which are now scoped within the logical
network rather than being globally unique.  In STT, all three identifiers
are encoded in the 64-bit tunnel key.
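
Concretely, the encodings work out as follows.  (This is a sketch derived
from put_encapsulation() in this patch; the stt_tunnel_key() helper below
is illustrative, not part of the patch.)

    /* Geneve: the VNI carries the 24-bit logical datapath identifier, and
     * a 32-bit Geneve option carries the logical ports:
     *     option bits  0-15: logical output port
     *     option bits 16-30: logical input port
     *
     * STT: all three identifiers are packed into the 64-bit tunnel key: */
    static uint64_t
    stt_tunnel_key(uint32_t datapath, uint16_t outport, uint16_t inport)
    {
        return ((uint64_t) inport << 40     /* Bits 40-54: input port. */
                | (uint64_t) outport << 24  /* Bits 24-39: output port. */
                | datapath);                /* Bits  0-23: datapath. */
    }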

To take advantage of the reduced traffic for packets destined to logical
ports on multiple hypervisors, this commit also introduces the concept
of a logical multicast group.  The membership of these groups can be set
using a new Multicast_Group table in the southbound database (which
ovn-northd populates starting with this commit).

With multicast groups alone, it would be difficult to implement ACLs,
because an ACL might disallow only some of the packets being sent to a
remote hypervisor.  Thus, this commit also splits the OVN logical pipeline
in two.  The "ingress" pipeline decides the logical destination of a
packet, as a set of logical ports or multicast groups.  The "egress"
pipeline runs on the destination hypervisor, with multicast group
destinations exploded into individual ports, and makes the final decision
on whether to deliver the packet, so it can apply ACLs efficiently.
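
The physical OpenFlow tables that implement this split, as set up by
physical.c and rule.c in this patch, are laid out as follows:

    Table 0        physical-to-logical translation (VIFs and tunnels)
    Tables 16-31   logical ingress pipeline
    Table 32       output to remote hypervisors (tunnel encapsulation)
    Table 33       output to the local hypervisor (multicast fanout)
    Table 34       drop packets whose logical inport equals their outport
    Tables 48-63   logical egress pipeline
    Table 64       logical-to-physical translation (deliver to local VIF)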

Until now, the OVN logical and physical pipeline implementation was not
adequately documented.  This commit adds extensive documentation to
the OVN manpages to cover these issues.

Signed-off-by: Ben Pfaff <blp at nicira.com>
---
 ovn/TODO                        |   19 -
 ovn/controller/ovn-controller.c |    3 +-
 ovn/controller/physical.c       |  355 +++++++++---
 ovn/controller/physical.h       |   11 +-
 ovn/controller/rule.c           |   95 ++--
 ovn/controller/rule.h           |   10 +-
 ovn/northd/ovn-northd.c         | 1177 ++++++++++++++++++++++++---------------
 ovn/ovn-architecture.7.xml      |  363 ++++++++++--
 ovn/ovn-nb.xml                  |   17 +-
 ovn/ovn-sb.ovsschema            |   42 +-
 ovn/ovn-sb.xml                  |  332 +++++++++--
 11 files changed, 1696 insertions(+), 728 deletions(-)

diff --git a/ovn/TODO b/ovn/TODO
index 19c95ca..f0ff586 100644
--- a/ovn/TODO
+++ b/ovn/TODO
@@ -1,24 +1,5 @@
 * ovn-controller
 
-*** Determine how to split logical pipeline across physical nodes.
-
-    From the original OVN architecture document:
-
-    The pipeline processing is split between the ingress and egress
-    transport nodes.  In particular, the logical egress processing may
-    occur at either hypervisor.  Processing the logical egress on the
-    ingress hypervisor requires more state about the egress vif's
-    policies, but reduces traffic on the wire that would eventually be
-    dropped.  Whereas, processing on the egress hypervisor can reduce
-    broadcast traffic on the wire by doing local replication.  We
-    initially plan to process logical egress on the egress hypervisor
-    so that less state needs to be replicated.  However, we may change
-    this behavior once we gain some experience writing the logical
-    flows.
-
-    The split pipeline processing split will influence how tunnel keys
-    are encoded.
-
 ** ovn-controller parameters and configuration.
 
 *** SSL configuration.
diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
index c6dfb41..dad2819 100644
--- a/ovn/controller/ovn-controller.c
+++ b/ovn/controller/ovn-controller.c
@@ -269,8 +269,9 @@ main(int argc, char *argv[])
 
             struct hmap flow_table = HMAP_INITIALIZER(&flow_table);
             rule_run(&ctx, &flow_table);
-            physical_run(&ctx, br_int, chassis_id, &flow_table);
             if (chassis_id && mff_ovn_geneve) {
+                physical_run(&ctx, mff_ovn_geneve,
+                             br_int, chassis_id, &flow_table);
             }
             ofctrl_put(&flow_table);
             hmap_destroy(&flow_table);
diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
index e284a6a..09b7a99 100644
--- a/ovn/controller/physical.c
+++ b/ovn/controller/physical.c
@@ -21,10 +21,14 @@
 #include "ofpbuf.h"
 #include "ovn-controller.h"
 #include "ovn/lib/ovn-sb-idl.h"
+#include "openvswitch/vlog.h"
 #include "rule.h"
 #include "simap.h"
+#include "sset.h"
 #include "vswitch-idl.h"
 
+VLOG_DEFINE_THIS_MODULE(physical);
+
 void
 physical_register_ovs_idl(struct ovsdb_idl *ovs_idl)
 {
@@ -42,12 +46,90 @@ physical_register_ovs_idl(struct ovsdb_idl *ovs_idl)
     ovsdb_idl_add_column(ovs_idl, &ovsrec_interface_col_external_ids);
 }
 
+/* Maps from a chassis to the OpenFlow port number of the tunnel that can be
+ * used to reach that chassis. */
+struct chassis_tunnel {
+    struct hmap_node hmap_node;
+    const char *chassis_id;
+    ofp_port_t ofport;
+    enum chassis_tunnel_type { GENEVE, STT } type;
+};
+
+static struct chassis_tunnel *
+chassis_tunnel_find(struct hmap *tunnels, const char *chassis_id)
+{
+    struct chassis_tunnel *tun;
+    HMAP_FOR_EACH_WITH_HASH (tun, hmap_node, hash_string(chassis_id, 0),
+                             tunnels) {
+        if (!strcmp(tun->chassis_id, chassis_id)) {
+            return tun;
+        }
+    }
+    return NULL;
+}
+
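+/* Appends to 'ofpacts' a "set_field" action that loads the 'n_bits'
+ * least-significant bits of 'value' into bits 'ofs' through
+ * 'ofs' + 'n_bits' - 1 of field 'dst'. */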
+static void
+put_load(uint64_t value, enum mf_field_id dst, int ofs, int n_bits,
+         struct ofpbuf *ofpacts)
+{
+    struct ofpact_set_field *sf = ofpact_put_SET_FIELD(ofpacts);
+    sf->field = mf_from_id(dst);
+    sf->flow_has_vlan = false;
+
+    ovs_be64 n_value = htonll(value);
+    bitwise_copy(&n_value, 8, 0, &sf->value, sf->field->n_bytes, ofs, n_bits);
+    bitwise_one(&sf->mask, sf->field->n_bytes, ofs, n_bits);
+}
+
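+/* Appends to 'ofpacts' an action that copies the 'n_bits' bits starting at
+ * bit 'src_ofs' of field 'src' into the bits starting at bit 'dst_ofs' of
+ * field 'dst'. */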
+static void
+put_move(enum mf_field_id src, int src_ofs,
+         enum mf_field_id dst, int dst_ofs,
+         int n_bits,
+         struct ofpbuf *ofpacts)
+{
+    struct ofpact_reg_move *move = ofpact_put_REG_MOVE(ofpacts);
+    move->src.field = mf_from_id(src);
+    move->src.ofs = src_ofs;
+    move->src.n_bits = n_bits;
+    move->dst.field = mf_from_id(dst);
+    move->dst.ofs = dst_ofs;
+    move->dst.n_bits = n_bits;
+}
+
+static void
+put_resubmit(uint8_t table_id, struct ofpbuf *ofpacts)
+{
+    struct ofpact_resubmit *resubmit = ofpact_put_RESUBMIT(ofpacts);
+    resubmit->in_port = OFPP_IN_PORT;
+    resubmit->table_id = table_id;
+}
+
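+/* Appends to 'ofpacts' the actions that encode the logical datapath,
+ * input port, and output port in the tunnel metadata for 'tun':
+ *
+ *   - Geneve: the VNI carries the datapath's tunnel key, and the Geneve
+ *     option 'mff_ovn_geneve' carries the output port (bits 0-15) and the
+ *     input port (bits 16-30).
+ *
+ *   - STT: the 64-bit tunnel key carries the datapath (bits 0-23), the
+ *     output port (bits 24-39), and the input port (bits 40-54). */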
+static void
+put_encapsulation(enum mf_field_id mff_ovn_geneve,
+                  const struct chassis_tunnel *tun,
+                  const struct sbrec_datapath_binding *datapath,
+                  uint16_t outport, struct ofpbuf *ofpacts)
+{
+    if (tun->type == GENEVE) {
+        put_load(datapath->tunnel_key, MFF_TUN_ID, 0, 24, ofpacts);
+        put_load(outport, mff_ovn_geneve, 0, 32, ofpacts);
+        put_move(MFF_LOG_INPORT, 0, mff_ovn_geneve, 16, 15, ofpacts);
+    } else if (tun->type == STT) {
+        put_load(datapath->tunnel_key | ((uint64_t) outport << 24),
+                 MFF_TUN_ID, 0, 64, ofpacts);
+        put_move(MFF_LOG_INPORT, 0, MFF_TUN_ID, 40, 15, ofpacts);
+    } else {
+        OVS_NOT_REACHED();
+    }
+}
+
 void
-physical_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
-             const char *this_chassis_id, struct hmap *flow_table)
+physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
+             const struct ovsrec_bridge *br_int, const char *this_chassis_id,
+             struct hmap *flow_table)
 {
     struct simap lport_to_ofport = SIMAP_INITIALIZER(&lport_to_ofport);
-    struct simap chassis_to_ofport = SIMAP_INITIALIZER(&chassis_to_ofport);
+    struct hmap tunnels = HMAP_INITIALIZER(&tunnels);
     for (int i = 0; i < br_int->n_ports; i++) {
         const struct ovsrec_port *port_rec = br_int->ports[i];
         if (!strcmp(port_rec->name, br_int->name)) {
@@ -74,7 +156,21 @@ physical_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
 
             /* Record as chassis or local logical port. */
             if (chassis_id) {
-                simap_put(&chassis_to_ofport, chassis_id, ofport);
+                enum chassis_tunnel_type tunnel_type;
+                if (!strcmp(iface_rec->type, "geneve")) {
+                    tunnel_type = GENEVE;
+                } else if (!strcmp(iface_rec->type, "stt")) {
+                    tunnel_type = STT;
+                } else {
+                    continue;
+                }
+
+                struct chassis_tunnel *tun = xmalloc(sizeof *tun);
+                hmap_insert(&tunnels, &tun->hmap_node,
+                            hash_string(chassis_id, 0));
+                tun->chassis_id = chassis_id;
+                tun->ofport = u16_to_ofp(ofport);
+                tun->type = tunnel_type;
                 break;
             } else {
                 const char *iface_id = smap_get(&iface_rec->external_ids,
@@ -114,27 +210,20 @@ physical_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
                                           binding->logical_port));
         }
 
-        bool local = ofport != 0;
-        if (!local) {
+        const struct chassis_tunnel *tun = NULL;
+        if (!ofport) {
             if (!binding->chassis) {
                 continue;
             }
-            ofport = u16_to_ofp(simap_get(&chassis_to_ofport,
-                                          binding->chassis->name));
-            if (!ofport) {
+            tun = chassis_tunnel_find(&tunnels, binding->chassis->name);
+            if (!tun) {
                 continue;
             }
-        }
-
-        /* Translate the logical datapath into the form we use in
-         * MFF_LOG_DATAPATH. */
-        uint32_t ldp = ldp_to_integer(&binding->logical_datapath);
-        if (!ldp) {
-            continue;
+            ofport = tun->ofport;
         }
 
         struct match match;
-        if (local) {
+        if (!tun) {
             /* Packets that arrive from a vif can belong to a VM or
              * to a container located inside that VM. Packets that
              * arrive from containers have a tag (vlan) associated with them.
@@ -149,7 +238,8 @@ physical_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
              *
              * For both types of traffic: set MFF_LOG_INPORT to the logical
              * input port, MFF_LOG_DATAPATH to the logical datapath, and
-             * resubmit into the logical pipeline starting at table 16. */
+             * resubmit into the logical ingress pipeline starting at table
+             * 16. */
             match_init_catchall(&match);
             ofpbuf_clear(&ofpacts);
             match_set_in_port(&match, ofport);
@@ -157,95 +247,214 @@ physical_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
                 match_set_dl_vlan(&match, htons(tag));
             }
 
-            /* Set MFF_LOG_DATAPATH. */
-            struct ofpact_set_field *sf = ofpact_put_SET_FIELD(&ofpacts);
-            sf->field = mf_from_id(MFF_LOG_DATAPATH);
-            sf->value.be64 = htonll(ldp);
-            sf->mask.be64 = OVS_BE64_MAX;
-
-            /* Set MFF_LOG_INPORT. */
-            sf = ofpact_put_SET_FIELD(&ofpacts);
-            sf->field = mf_from_id(MFF_LOG_INPORT);
-            sf->value.be32 = htonl(binding->tunnel_key);
-            sf->mask.be32 = OVS_BE32_MAX;
+            /* Set MFF_LOG_DATAPATH and MFF_LOG_INPORT. */
+            put_load(binding->datapath->tunnel_key, MFF_LOG_DATAPATH, 0, 64,
+                     &ofpacts);
+            put_load(binding->tunnel_key, MFF_LOG_INPORT, 0, 32, &ofpacts);
 
             /* Strip vlans. */
             if (tag) {
                 ofpact_put_STRIP_VLAN(&ofpacts);
             }
 
-            /* Resubmit to first logical pipeline table. */
-            struct ofpact_resubmit *resubmit = ofpact_put_RESUBMIT(&ofpacts);
-            resubmit->in_port = OFPP_IN_PORT;
-            resubmit->table_id = 16;
+            /* Resubmit to first logical ingress pipeline table. */
+            put_resubmit(16, &ofpacts);
             ofctrl_add_flow(flow_table, 0, tag ? 150 : 100, &match, &ofpacts);
 
-            /* Table 0, Priority 50.
-             * =====================
+            /* Table 33, priority 100.
+             * =======================
+             *
+             * Implements output to local hypervisor.  Each flow matches a
+             * logical output port on the local hypervisor, and resubmits to
+             * table 34.
+             */
+
+            match_init_catchall(&match);
+            ofpbuf_clear(&ofpacts);
+
+            /* Match MFF_LOG_DATAPATH, MFF_LOG_OUTPORT. */
+            match_set_metadata(&match, htonll(binding->datapath->tunnel_key));
+            match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0,
+                          binding->tunnel_key);
+
+            /* Resubmit to table 34. */
+            put_resubmit(34, &ofpacts);
+            ofctrl_add_flow(flow_table, 33, 100, &match, &ofpacts);
+
+            /* Table 64, Priority 100.
+             * =======================
              *
-             * For packets that arrive from a remote node destined to this
-             * local vif: deliver directly to the vif. If the destination
-             * is a container sitting behind a vif, tag the packets. */
+             * Deliver the packet to the local vif. */
             match_init_catchall(&match);
             ofpbuf_clear(&ofpacts);
-            match_set_tun_id(&match, htonll(binding->tunnel_key));
+            match_set_metadata(&match, htonll(binding->datapath->tunnel_key));
+            match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0,
+                          binding->tunnel_key);
             if (tag) {
+                /* For containers sitting behind a local vif, tag the packets
+                 * before delivering them. */
                 struct ofpact_vlan_vid *vlan_vid;
                 vlan_vid = ofpact_put_SET_VLAN_VID(&ofpacts);
                 vlan_vid->vlan_vid = tag;
                 vlan_vid->push_vlan_if_needed = true;
+
+                /* A packet might need to hair-pin back into its ingress
+                 * OpenFlow port (to a different logical port, which we already
+                 * checked back in table 34), so set the in_port to zero. */
+                put_load(0, MFF_IN_PORT, 0, 16, &ofpacts);
             }
             ofpact_put_OUTPUT(&ofpacts)->port = ofport;
-            ofctrl_add_flow(flow_table, 0, 50, &match, &ofpacts);
+            ofctrl_add_flow(flow_table, 64, 100, &match, &ofpacts);
+        } else {
+            /* Table 32, priority 100.
+             * =======================
+             *
+             * Implements output to remote hypervisors.  Each flow matches an
+             * output port that includes a logical port on a remote hypervisor,
+             * and tunnels the packet to that hypervisor.
+             */
+
+            match_init_catchall(&match);
+            ofpbuf_clear(&ofpacts);
+
+            /* Match MFF_LOG_DATAPATH, MFF_LOG_OUTPORT. */
+            match_set_metadata(&match, htonll(binding->datapath->tunnel_key));
+            match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0,
+                          binding->tunnel_key);
+
+            put_encapsulation(mff_ovn_geneve, tun, binding->datapath,
+                              binding->tunnel_key, &ofpacts);
+
+            /* Output to tunnel. */
+            ofpact_put_OUTPUT(&ofpacts)->port = ofport;
+            ofctrl_add_flow(flow_table, 32, 100, &match, &ofpacts);
         }
 
-        /* Table 64, Priority 100.
+        /* Table 34, Priority 100.
          * =======================
          *
          * Drop packets whose logical inport and outport are the same. */
         match_init_catchall(&match);
         ofpbuf_clear(&ofpacts);
+        match_set_metadata(&match, htonll(binding->datapath->tunnel_key));
         match_set_reg(&match, MFF_LOG_INPORT - MFF_REG0, binding->tunnel_key);
         match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, binding->tunnel_key);
-        ofctrl_add_flow(flow_table, 64, 100, &match, &ofpacts);
+        ofctrl_add_flow(flow_table, 34, 100, &match, &ofpacts);
+    }
+
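+    /* Handle output to multicast groups, in tables 32 and 33.  For each
+     * group, table 33 fans out to the group's local ports (by resubmitting
+     * to table 34 once per port), and table 32 sends one copy to each
+     * remote chassis that has members, chaining to table 33 if the group
+     * also has local members. */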
+    const struct sbrec_multicast_group *mc;
+    SBREC_MULTICAST_GROUP_FOR_EACH (mc, ctx->ovnsb_idl) {
+        struct sset remote_chassis = SSET_INITIALIZER(&remote_chassis);
+        struct match match;
 
-        /* Table 64, Priority 50.
-         * ======================
-         *
-         * For packets to remote machines, send them over a tunnel to the
-         * remote chassis.
-         *
-         * For packets to local vifs, deliver them directly. */
         match_init_catchall(&match);
+        match_set_metadata(&match, htonll(mc->datapath->tunnel_key));
+        match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, mc->tunnel_key);
+
         ofpbuf_clear(&ofpacts);
-        match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, binding->tunnel_key);
-        if (!local) {
-            /* Set MFF_TUN_ID. */
-            struct ofpact_set_field *sf = ofpact_put_SET_FIELD(&ofpacts);
-            sf->field = mf_from_id(MFF_TUN_ID);
-            sf->value.be64 = htonll(binding->tunnel_key);
-            sf->mask.be64 = OVS_BE64_MAX;
+        for (size_t i = 0; i < mc->n_ports; i++) {
+            struct sbrec_port_binding *port = mc->ports[i];
+
+            if (port->datapath != mc->datapath) {
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+                VLOG_WARN_RL(&rl, UUID_FMT": multicast group contains ports "
+                             "in wrong datapath",
+                             UUID_ARGS(&mc->header_.uuid));
+                continue;
+            }
+
+            if (simap_contains(&lport_to_ofport, port->logical_port)) {
+                put_load(port->tunnel_key, MFF_LOG_OUTPORT, 0, 32, &ofpacts);
+                put_resubmit(34, &ofpacts);
+            } else if (port->chassis) {
+                sset_add(&remote_chassis, port->chassis->name);
+            }
+        }
+
+        bool local_ports = ofpacts.size > 0;
+        if (local_ports) {
+            ofctrl_add_flow(flow_table, 33, 100, &match, &ofpacts);
         }
-        if (tag) {
-            /* For containers sitting behind a local vif, tag the packets
-             * before delivering them. Since there is a possibility of
-             * packets needing to hair-pin back into the same vif from
-             * which it came, make the in_port as zero. */
-            struct ofpact_vlan_vid *vlan_vid;
-            vlan_vid = ofpact_put_SET_VLAN_VID(&ofpacts);
-            vlan_vid->vlan_vid = tag;
-            vlan_vid->push_vlan_if_needed = true;
-
-            struct ofpact_set_field *sf = ofpact_put_SET_FIELD(&ofpacts);
-            sf->field = mf_from_id(MFF_IN_PORT);
-            sf->value.be16 = 0;
-            sf->mask.be16 = OVS_BE16_MAX;
+
+        if (!sset_is_empty(&remote_chassis)) {
+            ofpbuf_clear(&ofpacts);
+
+            const char *chassis;
+            const struct chassis_tunnel *prev = NULL;
+            SSET_FOR_EACH (chassis, &remote_chassis) {
+                const struct chassis_tunnel *tun
+                    = chassis_tunnel_find(&tunnels, chassis);
+                if (!tun) {
+                    continue;
+                }
+
+                if (!prev || tun->type != prev->type) {
+                    put_encapsulation(mff_ovn_geneve, tun,
+                                      mc->datapath, mc->tunnel_key, &ofpacts);
+                    prev = tun;
+                }
+                ofpact_put_OUTPUT(&ofpacts)->port = tun->ofport;
+            }
+
+            if (ofpacts.size) {
+                if (local_ports) {
+                    put_resubmit(33, &ofpacts);
+                }
+                ofctrl_add_flow(flow_table, 32, 100, &match, &ofpacts);
+            }
         }
-        ofpact_put_OUTPUT(&ofpacts)->port = ofport;
-        ofctrl_add_flow(flow_table, 64, 50, &match, &ofpacts);
+        sset_destroy(&remote_chassis);
     }
 
+    /* Table 0, priority 100.
+     * ======================
+     *
+     * For packets that arrive from a remote hypervisor (by matching a tunnel
+     * in_port), set MFF_LOG_DATAPATH, MFF_LOG_INPORT, and MFF_LOG_OUTPORT from
+     * the tunnel key data, then resubmit to table 33 to handle packets to the
+     * local hypervisor. */
+
+    struct chassis_tunnel *tun;
+    HMAP_FOR_EACH (tun, hmap_node, &tunnels) {
+        struct match match = MATCH_CATCHALL_INITIALIZER;
+        match_set_in_port(&match, tun->ofport);
+
+        ofpbuf_clear(&ofpacts);
+        if (tun->type == GENEVE) {
+            put_move(MFF_TUN_ID, 0,  MFF_LOG_DATAPATH, 0, 24, &ofpacts);
+            put_move(mff_ovn_geneve, 16, MFF_LOG_INPORT, 0, 15,
+                     &ofpacts);
+            put_move(mff_ovn_geneve, 0, MFF_LOG_OUTPORT, 0, 16,
+                     &ofpacts);
+        } else if (tun->type == STT) {
+            put_move(MFF_TUN_ID, 40, MFF_LOG_INPORT,   0, 15, &ofpacts);
+            put_move(MFF_TUN_ID, 24, MFF_LOG_OUTPORT,  0, 16, &ofpacts);
+            put_move(MFF_TUN_ID,  0, MFF_LOG_DATAPATH, 0, 24, &ofpacts);
+        } else {
+            OVS_NOT_REACHED();
+        }
+        put_resubmit(33, &ofpacts);
+
+        ofctrl_add_flow(flow_table, 0, 100, &match, &ofpacts);
+    }
+
+    /* Table 34, Priority 0.
+     * =====================
+     *
+     * Resubmit packets that are not being looped back to their logical
+     * input port (the priority-100 flows above drop those) to the logical
+     * egress pipeline. */
+    struct match match;
+    match_init_catchall(&match);
+    ofpbuf_clear(&ofpacts);
+    put_resubmit(48, &ofpacts);
+    ofctrl_add_flow(flow_table, 34, 0, &match, &ofpacts);
+
     ofpbuf_uninit(&ofpacts);
     simap_destroy(&lport_to_ofport);
-    simap_destroy(&chassis_to_ofport);
+    struct chassis_tunnel *tun_next;
+    HMAP_FOR_EACH_SAFE (tun, tun_next, hmap_node, &tunnels) {
+        hmap_remove(&tunnels, &tun->hmap_node);
+        free(tun);
+    }
+    hmap_destroy(&tunnels);
 }
diff --git a/ovn/controller/physical.h b/ovn/controller/physical.h
index 82baa2f..edb644b 100644
--- a/ovn/controller/physical.h
+++ b/ovn/controller/physical.h
@@ -20,10 +20,13 @@
  * ============================
  *
  * This module implements physical-to-logical and logical-to-physical
- * translation as separate OpenFlow tables that run before and after,
- * respectively, the logical pipeline OpenFlow tables.
+ * translation as separate OpenFlow tables that run before the ingress pipeline
+ * and after the egress pipeline, respectively, as well as to connect the
+ * two pipelines.
  */
 
+#include "meta-flow.h"
+
 struct controller_ctx;
 struct hmap;
 struct ovsdb_idl;
@@ -37,8 +40,8 @@ struct ovsrec_bridge;
 #define OVN_GENEVE_LEN 4
 
 void physical_register_ovs_idl(struct ovsdb_idl *);
-void physical_run(struct controller_ctx *, const struct ovsrec_bridge *br_int,
-                  const char *chassis_id, 
+void physical_run(struct controller_ctx *, enum mf_field_id mff_ovn_geneve,
+                  const struct ovsrec_bridge *br_int, const char *chassis_id,
                   struct hmap *flow_table);
 
 #endif /* ovn/physical.h */
diff --git a/ovn/controller/rule.c b/ovn/controller/rule.c
index c7281a0..de8b509 100644
--- a/ovn/controller/rule.c
+++ b/ovn/controller/rule.c
@@ -135,19 +135,12 @@ symtab_init(void)
 
 /* A logical datapath.
  *
- * 'uuid' is the UUID that represents the logical datapath in the OVN_SB
- * database.
- *
- * 'integer' represents the logical datapath as an integer value that is unique
- * only within the local hypervisor.  Because of its size, this value is more
- * practical for use in an OpenFlow flow table than a UUID.
- *
  * 'ports' maps 'logical_port' names to 'tunnel_key' values in the OVN_SB
  * Binding table within the logical datapath. */
 struct logical_datapath {
     struct hmap_node hmap_node; /* Indexed on 'uuid'. */
-    struct uuid uuid;           /* The logical_datapath's UUID. */
-    uint32_t integer;           /* Locally unique among logical datapaths. */
+    struct uuid uuid;           /* UUID from Datapath_Binding row. */
+    uint32_t tunnel_key;        /* 'tunnel_key' from Datapath_Binding row. */
     struct simap ports;         /* Logical port name to port number. */
 };
 
@@ -157,45 +150,40 @@ static struct hmap logical_datapaths = HMAP_INITIALIZER(&logical_datapaths);
 /* Finds and returns the logical_datapath with the given 'uuid', or NULL if
  * no such logical_datapath exists. */
 static struct logical_datapath *
-ldp_lookup(const struct uuid *uuid)
+ldp_lookup(const struct sbrec_datapath_binding *binding)
 {
     struct logical_datapath *ldp;
-    HMAP_FOR_EACH_IN_BUCKET (ldp, hmap_node, uuid_hash(uuid),
+    HMAP_FOR_EACH_IN_BUCKET (ldp, hmap_node, uuid_hash(&binding->header_.uuid),
                              &logical_datapaths) {
-        if (uuid_equals(&ldp->uuid, uuid)) {
+        if (uuid_equals(&ldp->uuid, &binding->header_.uuid)) {
             return ldp;
         }
     }
     return NULL;
 }
 
-/* Finds and returns the integer value corresponding to the given 'uuid', or 0
- * if no such logical datapath exists. */
-uint32_t
-ldp_to_integer(const struct uuid *logical_datapath)
-{
-    const struct logical_datapath *ldp = ldp_lookup(logical_datapath);
-    return ldp ? ldp->integer : 0;
-}
-
 /* Creates a new logical_datapath with the given 'uuid'. */
 static struct logical_datapath *
-ldp_create(const struct uuid *uuid)
+ldp_create(const struct sbrec_datapath_binding *binding)
 {
-    static uint32_t next_integer = 1;
     struct logical_datapath *ldp;
 
-    /* We don't handle the case where the logical datapaths wrap around. */
-    ovs_assert(next_integer);
-
     ldp = xmalloc(sizeof *ldp);
-    hmap_insert(&logical_datapaths, &ldp->hmap_node, uuid_hash(uuid));
-    ldp->uuid = *uuid;
-    ldp->integer = next_integer++;
+    hmap_insert(&logical_datapaths, &ldp->hmap_node,
+                uuid_hash(&binding->header_.uuid));
+    ldp->uuid = binding->header_.uuid;
+    ldp->tunnel_key = binding->tunnel_key;
     simap_init(&ldp->ports);
     return ldp;
 }
 
+static struct logical_datapath *
+ldp_lookup_or_create(const struct sbrec_datapath_binding *binding)
+{
+    struct logical_datapath *ldp = ldp_lookup(binding);
+    return ldp ? ldp : ldp_create(binding);
+}
+
 static void
 ldp_free(struct logical_datapath *ldp)
 {
@@ -204,8 +192,9 @@ ldp_free(struct logical_datapath *ldp)
     free(ldp);
 }
 
-/* Iterates through all of the records in the Binding table, updating the
- * table of logical_datapaths to match the values found in active Bindings. */
+/* Iterates through all of the records in the Port_Binding table, updating the
+ * table of logical_datapaths to match the values found in active
+ * Port_Bindings. */
 static void
 ldp_run(struct controller_ctx *ctx)
 {
@@ -216,16 +205,17 @@ ldp_run(struct controller_ctx *ctx)
 
     const struct sbrec_port_binding *binding;
     SBREC_PORT_BINDING_FOR_EACH (binding, ctx->ovnsb_idl) {
-        struct logical_datapath *ldp;
-
-        ldp = ldp_lookup(&binding->logical_datapath);
-        if (!ldp) {
-            ldp = ldp_create(&binding->logical_datapath);
-        }
+        struct logical_datapath *ldp = ldp_lookup_or_create(binding->datapath);
 
         simap_put(&ldp->ports, binding->logical_port, binding->tunnel_key);
     }
 
+    const struct sbrec_multicast_group *mc;
+    SBREC_MULTICAST_GROUP_FOR_EACH (mc, ctx->ovnsb_idl) {
+        struct logical_datapath *ldp = ldp_lookup_or_create(mc->datapath);
+        simap_put(&ldp->ports, mc->name, mc->tunnel_key);
+    }
+
     struct logical_datapath *next_ldp;
     HMAP_FOR_EACH_SAFE (ldp, next_ldp, hmap_node, &logical_datapaths) {
         if (simap_is_empty(&ldp->ports)) {
@@ -250,9 +240,7 @@ rule_init(void)
 }
 
 /* Translates logical flows in the Rule table in the OVN_SB database into
- * OpenFlow flows, adding the OpenFlow flows to 'flow_table'.
- *
- * We put the Rule flows into OpenFlow tables 16 through 47 (inclusive). */
+ * OpenFlow flows.  See ovn-architecture(7) for more information. */
 void
 rule_run(struct controller_ctx *ctx, struct hmap *flow_table)
 {
@@ -268,22 +256,29 @@ rule_run(struct controller_ctx *ctx, struct hmap *flow_table)
          * bound to that logical datapath, so there's no point in maintaining
          * any flows for it anyway, so skip it. */
         const struct logical_datapath *ldp;
-        ldp = ldp_lookup(&rule->logical_datapath);
+        ldp = ldp_lookup(rule->logical_datapath);
         if (!ldp) {
             continue;
         }
 
-        /* Translate OVN actions into OpenFlow actions. */
+        /* Translate logical table ID to physical table ID. */
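+        /* Ingress logical tables 0 through 15 map to OpenFlow tables 16
+         * through 31, and egress logical tables map to OpenFlow tables 48
+         * through 63.  The logical "output" action resubmits to OpenFlow
+         * table 32 in the ingress pipeline, 64 in the egress pipeline. */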
+        bool ingress = !strcmp(rule->pipeline, "ingress");
+        uint8_t phys_table = rule->table_id + (ingress ? 16 : 48);
+        uint8_t next_phys_table = rule->table_id < 15 ? phys_table + 1 : 0;
+        uint8_t output_phys_table = ingress ? 32 : 64;
+
+        /* Translate OVN actions into OpenFlow actions.
+         *
+         * XXX Deny changes to 'outport' in egress pipeline. */
         uint64_t ofpacts_stub[64 / 8];
         struct ofpbuf ofpacts;
         struct expr *prereqs;
-        uint8_t next_table_id;
         char *error;
 
         ofpbuf_use_stub(&ofpacts, ofpacts_stub, sizeof ofpacts_stub);
-        next_table_id = rule->table_id < 31 ? rule->table_id + 17 : 0;
         error = actions_parse_string(rule->actions, &symtab, &ldp->ports,
-                                     next_table_id, 64, &ofpacts, &prereqs);
+                                     next_phys_table, output_phys_table,
+                                     &ofpacts, &prereqs);
         if (error) {
             static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
             VLOG_WARN_RL(&rl, "error parsing actions \"%s\": %s",
@@ -322,13 +317,13 @@ rule_run(struct controller_ctx *ctx, struct hmap *flow_table)
         /* Prepare the OpenFlow matches for adding to the flow table. */
         struct expr_match *m;
         HMAP_FOR_EACH (m, hmap_node, &matches) {
-            match_set_metadata(&m->match, htonll(ldp->integer));
+            match_set_metadata(&m->match, htonll(ldp->tunnel_key));
             if (m->match.wc.masks.conj_id) {
                 m->match.flow.conj_id += conj_id_ofs;
             }
             if (!m->n) {
-                ofctrl_add_flow(flow_table, rule->table_id + 16,
-                                rule->priority, &m->match, &ofpacts);
+                ofctrl_add_flow(flow_table, phys_table, rule->priority,
+                                &m->match, &ofpacts);
             } else {
                 uint64_t conj_stubs[64 / 8];
                 struct ofpbuf conj;
@@ -343,8 +338,8 @@ rule_run(struct controller_ctx *ctx, struct hmap *flow_table)
                     dst->clause = src->clause;
                     dst->n_clauses = src->n_clauses;
                 }
-                ofctrl_add_flow(flow_table, rule->table_id + 16,
-                                rule->priority, &m->match, &conj);
+                ofctrl_add_flow(flow_table, phys_table, rule->priority,
+                                &m->match, &conj);
                 ofpbuf_uninit(&conj);
             }
         }
diff --git a/ovn/controller/rule.h b/ovn/controller/rule.h
index a7bd71f..a39fba8 100644
--- a/ovn/controller/rule.h
+++ b/ovn/controller/rule.h
@@ -20,10 +20,10 @@
 /* Rule table translation to OpenFlow
  * ==================================
  *
- * The Rule table obtained from the OVN_Southbound database works in terms
- * of logical entities, that is, logical flows among logical datapaths and
- * logical ports.  This code translates these logical flows into OpenFlow flows
- * that, again, work in terms of logical entities implemented through OpenFlow
+ * The Rule table obtained from the OVN_Southbound database works in terms of
+ * logical entities, that is, logical flows among logical datapaths and logical
+ * ports.  This code translates these logical flows into OpenFlow flows that,
+ * again, work in terms of logical entities implemented through OpenFlow
  * extensions (e.g. registers represent the logical input and output ports).
  *
  * Physical-to-logical and logical-to-physical translation are implemented in
@@ -46,6 +46,4 @@ void rule_init(void);
 void rule_run(struct controller_ctx *, struct hmap *flow_table);
 void rule_destroy(void);
 
-uint32_t ldp_to_integer(const struct uuid *logical_datapath);
-
 #endif /* ovn/rule.h */
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index eac5546..5ecd13e 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -30,6 +30,7 @@
 #include "ovn/lib/ovn-nb-idl.h"
 #include "ovn/lib/ovn-sb-idl.h"
 #include "poll-loop.h"
+#include "smap.h"
 #include "stream.h"
 #include "stream-ssl.h"
 #include "unixctl.h"
@@ -74,135 +75,559 @@ Options:\n\
     stream_usage("database", true, true, false);
 }
 
-static int
-compare_strings(const void *a_, const void *b_)
+struct key_node {
+    struct hmap_node hmap_node;
+    uint32_t key;
+};
+
+static void
+keys_destroy(struct hmap *keys)
 {
-    char *const *a = a_;
-    char *const *b = b_;
-    return strcmp(*a, *b);
+    struct key_node *node, *next;
+    HMAP_FOR_EACH_SAFE (node, next, hmap_node, keys) {
+        hmap_remove(keys, &node->hmap_node);
+        free(node);
+    }
+    hmap_destroy(keys);
+}
+
+static void
+add_key(struct hmap *set, uint32_t key)
+{
+    struct key_node *node = xmalloc(sizeof *node);
+    hmap_insert(set, &node->hmap_node, hash_int(key, 0));
+    node->key = key;
 }
 
-/*
- * Determine whether 2 arrays of MAC addresses are the same.  It's possible that
- * the lists could be *very* long and this check is being done a lot (every
- * time the OVN_Northbound database changes).
- */
 static bool
-macs_equal(char **binding_macs_, size_t b_n_macs,
-           char **lport_macs_, size_t l_n_macs)
+key_in_use(const struct hmap *set, uint32_t key)
 {
-    char **binding_macs, **lport_macs;
-    size_t bytes, i;
+    const struct key_node *node;
+    HMAP_FOR_EACH_IN_BUCKET (node, hmap_node, hash_int(key, 0), set) {
+        if (node->key == key) {
+            return true;
+        }
+    }
+    return false;
+}
 
-    if (b_n_macs != l_n_macs) {
-        return false;
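+/* Allocates and returns a key that is not already in 'set', searching
+ * upward from '*prev' + 1 and wrapping around from 'max' to 1.  On
+ * success, adds the key to 'set', stores it in '*prev', and returns it;
+ * if all keys in 1 through 'max' are in use, logs a warning (using 'name'
+ * to identify the key space) and returns 0. */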
+static uint32_t
+allocate_key(struct hmap *set, const char *name, uint32_t max, uint32_t *prev)
+{
+    for (uint32_t key = *prev + 1; key != *prev;
+         key = key + 1 <= max ? key + 1 : 1) {
+        if (!key_in_use(set, key)) {
+            add_key(set, key);
+            *prev = key;
+            return key;
+        }
     }
 
-    bytes = b_n_macs * sizeof binding_macs_[0];
-    binding_macs = xmalloc(bytes);
-    lport_macs = xmalloc(bytes);
+    static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+    VLOG_WARN_RL(&rl, "all %s tunnel keys exhausted", name);
+    return 0;
+}
+
+/* The 'key' comes from nb->header_.uuid or sb->external_ids's
+ * "logical-switch". */
+struct ovn_datapath {
+    struct hmap_node key_node;  /* Index on 'key'. */
+    struct uuid key;            /* nb->header_.uuid. */
+
+    const struct nbrec_logical_switch *nb;   /* May be NULL. */
+    const struct sbrec_datapath_binding *sb; /* May be NULL. */
 
-    memcpy(binding_macs, binding_macs_, bytes);
-    memcpy(lport_macs, lport_macs_, bytes);
+    struct ovs_list list;       /* In list of similar records. */
 
-    qsort(binding_macs, b_n_macs, sizeof binding_macs[0], compare_strings);
-    qsort(lport_macs, l_n_macs, sizeof lport_macs[0], compare_strings);
+    struct hmap port_keys;      /* Port tunnel keys in use ("key_node"s). */
+    uint32_t max_port_key;      /* Hint for the next port key to allocate. */
 
-    for (i = 0; i < b_n_macs; i++) {
-        if (strcmp(binding_macs[i], lport_macs[i])) {
-            break;
+    bool has_unknown;
+};
+
+static struct ovn_datapath *
+ovn_datapath_create(struct hmap *dp_map, const struct uuid *key,
+                    const struct nbrec_logical_switch *nb,
+                    const struct sbrec_datapath_binding *sb)
+{
+    struct ovn_datapath *od = xzalloc(sizeof *od);
+    od->key = *key;
+    od->sb = sb;
+    od->nb = nb;
+    hmap_init(&od->port_keys);
+    od->max_port_key = 0;
+    hmap_insert(dp_map, &od->key_node, uuid_hash(&od->key));
+    return od;
+}
+
+static void
+ovn_datapath_destroy(struct hmap *dp_map, struct ovn_datapath *od)
+{
+    if (od) {
+        /* Don't remove od->list, it's only safe and only used within
+         * build_datapaths(). */
+        hmap_remove(dp_map, &od->key_node);
+        keys_destroy(&od->port_keys);
+        free(od);
+    }
+}
+
+static struct ovn_datapath *
+ovn_datapath_find(struct hmap *dp_map, const struct uuid *uuid)
+{
+    struct ovn_datapath *od;
+
+    HMAP_FOR_EACH_WITH_HASH (od, key_node, uuid_hash(uuid), dp_map) {
+        if (uuid_equals(uuid, &od->key)) {
+            return od;
+        }
+    }
+    return NULL;
+}
+
+static struct ovn_datapath *
+ovn_datapath_from_sbrec(struct hmap *dp_map,
+                        const struct sbrec_datapath_binding *sb)
+{
+    struct uuid key;
+
+    if (!smap_get_uuid(&sb->external_ids, "logical-switch", &key)) {
+        return NULL;
+    }
+    return ovn_datapath_find(dp_map, &key);
+}
+
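+/* Populates 'dp_map' with a "struct ovn_datapath" for each southbound
+ * Datapath_Binding row and each northbound Logical_Switch row, pairing
+ * them up by the southbound external-ids:logical-switch key.  Sorts the
+ * results into 'sb_only', 'nb_only', and 'both' lists, and deletes
+ * southbound rows whose external-ids:logical-switch is missing or a
+ * duplicate. */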
+static void
+join_datapaths(struct northd_context *ctx, struct hmap *dp_map,
+               struct ovs_list *sb_only, struct ovs_list *nb_only,
+               struct ovs_list *both)
+{
+    hmap_init(dp_map);
+    list_init(sb_only);
+    list_init(nb_only);
+    list_init(both);
+
+    const struct sbrec_datapath_binding *sb, *sb_next;
+    SBREC_DATAPATH_BINDING_FOR_EACH_SAFE (sb, sb_next, ctx->ovnsb_idl) {
+        struct uuid key;
+        if (!smap_get_uuid(&sb->external_ids, "logical-switch", &key)) {
+            ovsdb_idl_txn_add_comment(ctx->ovnsb_txn,
+                                      "deleting Datapath_Binding "UUID_FMT" that "
+                                      "lacks external-ids:logical-switch",
+                                      UUID_ARGS(&sb->header_.uuid));
+            sbrec_datapath_binding_delete(sb);
+            continue;
+        }
+
+        if (ovn_datapath_find(dp_map, &key)) {
+            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+            VLOG_INFO_RL(&rl, "deleting Datapath_Binding "UUID_FMT" with "
+                         "duplicate external-ids:logical-switch "UUID_FMT,
+                         UUID_ARGS(&sb->header_.uuid), UUID_ARGS(&key));
+            sbrec_datapath_binding_delete(sb);
+            continue;
+        }
+
+        struct ovn_datapath *od = ovn_datapath_create(dp_map, &key, NULL, sb);
+        list_push_back(sb_only, &od->list);
+    }
+
+    const struct nbrec_logical_switch *nb;
+    NBREC_LOGICAL_SWITCH_FOR_EACH (nb, ctx->ovnnb_idl) {
+        struct ovn_datapath *od = ovn_datapath_find(dp_map, &nb->header_.uuid);
+        if (od) {
+            od->nb = nb;
+            list_remove(&od->list);
+            list_push_back(both, &od->list);
+        } else {
+            od = ovn_datapath_create(dp_map, &nb->header_.uuid, nb, NULL);
+            list_push_back(nb_only, &od->list);
+        }
+    }
+}
+
+static uint32_t
+ovn_datapath_allocate_key(struct hmap *dp_keys)
+{
+    static uint32_t prev;
+    return allocate_key(dp_keys, "datapath", (1u << 24) - 1, &prev);
+}
+
+static void
+build_datapaths(struct northd_context *ctx, struct hmap *dp_map)
+{
+    struct ovs_list sb_dps, nb_dps, both_dps;
+
+    join_datapaths(ctx, dp_map, &sb_dps, &nb_dps, &both_dps);
+
+    if (!list_is_empty(&nb_dps)) {
+        /* First index the in-use datapath tunnel keys. */
+        struct hmap dp_keys = HMAP_INITIALIZER(&dp_keys);
+        struct ovn_datapath *od;
+        LIST_FOR_EACH (od, list, &both_dps) {
+            add_key(&dp_keys, od->sb->tunnel_key);
+        }
+
+        /* Add southbound record for each unmatched northbound record. */
+        LIST_FOR_EACH (od, list, &nb_dps) {
+            uint32_t tunnel_key = ovn_datapath_allocate_key(&dp_keys);
+            if (!tunnel_key) {
+                break;
+            }
+
+            od->sb = sbrec_datapath_binding_insert(ctx->ovnsb_txn);
+
+            struct smap external_ids = SMAP_INITIALIZER(&external_ids);
+            char uuid_s[UUID_LEN + 1];
+            sprintf(uuid_s, UUID_FMT, UUID_ARGS(&od->nb->header_.uuid));
+            smap_add(&external_ids, "logical-switch", uuid_s);
+            sbrec_datapath_binding_set_external_ids(od->sb, &external_ids);
+            smap_destroy(&external_ids);
+
+            sbrec_datapath_binding_set_tunnel_key(od->sb, tunnel_key);
+        }
+    }
+
+    /* Delete southbound records without northbound matches. */
+    struct ovn_datapath *od, *next;
+    LIST_FOR_EACH_SAFE (od, next, list, &sb_dps) {
+        list_remove(&od->list);
+        sbrec_datapath_binding_delete(od->sb);
+        ovn_datapath_destroy(dp_map, od);
+    }
+}
+
+struct ovn_port {
+    struct hmap_node key_node;  /* Index on 'key'. */
+    const char *key;            /* nb->name and sb->logical_port */
+
+    const struct nbrec_logical_port *nb; /* May be NULL. */
+    const struct sbrec_port_binding *sb; /* May be NULL. */
+
+    struct ovn_datapath *od;
+
+    struct ovs_list list;       /* In list of similar records. */
+};
+
+static struct ovn_port *
+ovn_port_create(struct hmap *port_map, const char *key,
+                const struct nbrec_logical_port *nb,
+                const struct sbrec_port_binding *sb)
+{
+    struct ovn_port *op = xzalloc(sizeof *op);
+    op->key = key;
+    op->sb = sb;
+    op->nb = nb;
+    hmap_insert(port_map, &op->key_node, hash_string(op->key, 0));
+    return op;
+}
+
+static void
+ovn_port_destroy(struct hmap *port_map, struct ovn_port *port)
+{
+    if (port) {
+        /* Don't remove port->list, it's only safe and only used within
+         * build_ports(). */
+        hmap_remove(port_map, &port->key_node);
+        free(port);
+    }
+}
+
+static struct ovn_port *
+ovn_port_find(struct hmap *port_map, const char *name)
+{
+    struct ovn_port *op;
+
+    HMAP_FOR_EACH_WITH_HASH (op, key_node, hash_string(name, 0), port_map) {
+        if (!strcmp(op->key, name)) {
+            return op;
+        }
+    }
+    return NULL;
+}
+
+static uint32_t
+ovn_port_allocate_key(struct ovn_datapath *od)
+{
+    return allocate_key(&od->port_keys, "port",
+                        (1u << 15) - 1, &od->max_port_key);
+}
+
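+/* Populates 'port_map' with a "struct ovn_port" for each southbound
+ * Port_Binding row and each northbound Logical_Port, pairing them up by
+ * name into the 'sb_only', 'nb_only', and 'both' lists, and pointing
+ * each port at its datapath. */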
+static void
+join_logical_ports(struct northd_context *ctx,
+                   struct hmap *dp_map, struct hmap *port_map,
+                   struct ovs_list *sb_only, struct ovs_list *nb_only,
+                   struct ovs_list *both)
+{
+    hmap_init(port_map);
+    list_init(sb_only);
+    list_init(nb_only);
+    list_init(both);
+
+    const struct sbrec_port_binding *sb;
+    SBREC_PORT_BINDING_FOR_EACH (sb, ctx->ovnsb_idl) {
+        struct ovn_port *op = ovn_port_create(port_map, sb->logical_port,
+                                              NULL, sb);
+        list_push_back(sb_only, &op->list);
+    }
+
+    struct ovn_datapath *od;
+    HMAP_FOR_EACH (od, key_node, dp_map) {
+        for (size_t i = 0; i < od->nb->n_ports; i++) {
+            const struct nbrec_logical_port *nb = od->nb->ports[i];
+            struct ovn_port *op = ovn_port_find(port_map, nb->name);
+            if (op) {
+                op->nb = nb;
+                list_remove(&op->list);
+                list_push_back(both, &op->list);
+            } else {
+                op = ovn_port_create(port_map, nb->name, nb, NULL);
+                list_push_back(nb_only, &op->list);
+            }
+            op->od = od;
+        }
+    }
+}
+
+static void
+ovn_port_update_sbrec(const struct ovn_port *op)
+{
+    sbrec_port_binding_set_datapath(op->sb, op->od->sb);
+    sbrec_port_binding_set_parent_port(op->sb, op->nb->parent_name);
+    sbrec_port_binding_set_tag(op->sb, op->nb->tag, op->nb->n_tag);
+    sbrec_port_binding_set_mac(op->sb, (const char **) op->nb->macs,
+                               op->nb->n_macs);
+}
+
+static void
+build_ports(struct northd_context *ctx, struct hmap *dp_map,
+            struct hmap *port_map)
+{
+    struct ovs_list sb_ports, nb_ports, both_ports;
+
+    join_logical_ports(ctx, dp_map, port_map,
+                       &sb_ports, &nb_ports, &both_ports);
+
+    /* For logical ports that are in both databases, update the southbound
+     * record based on northbound data.  Also index the in-use tunnel_keys. */
+    struct ovn_port *op, *next;
+    LIST_FOR_EACH_SAFE (op, next, list, &both_ports) {
+        ovn_port_update_sbrec(op);
+
+        add_key(&op->od->port_keys, op->sb->tunnel_key);
+        if (op->sb->tunnel_key > op->od->max_port_key) {
+            op->od->max_port_key = op->sb->tunnel_key;
+        }
+    }
+
+    /* Add southbound record for each unmatched northbound record. */
+    LIST_FOR_EACH_SAFE (op, next, list, &nb_ports) {
+        uint16_t tunnel_key = ovn_port_allocate_key(op->od);
+        if (!tunnel_key) {
+            continue;
+        }
+
+        op->sb = sbrec_port_binding_insert(ctx->ovnsb_txn);
+        ovn_port_update_sbrec(op);
+
+        sbrec_port_binding_set_logical_port(op->sb, op->key);
+        sbrec_port_binding_set_tunnel_key(op->sb, tunnel_key);
+    }
+
+    /* Delete southbound records without northbound matches. */
+    LIST_FOR_EACH_SAFE(op, next, list, &sb_ports) {
+        list_remove(&op->list);
+        sbrec_port_binding_delete(op->sb);
+        ovn_port_destroy(port_map, op);
+    }
+}
+
+#define OVN_MIN_MULTICAST 32768
+#define OVN_MAX_MULTICAST 65535
+
+struct multicast_group {
+    const char *name;
+    uint16_t key;               /* OVN_MIN_MULTICAST...OVN_MAX_MULTICAST. */
+};
+
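+/* Built-in multicast groups.  Their keys are taken from the top of the
+ * multicast group range. */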
+#define MC_FLOOD "_MC_flood"
+static const struct multicast_group mc_flood = { MC_FLOOD, 65535 };
+
+#define MC_UNKNOWN "_MC_unknown"
+static const struct multicast_group mc_unknown = { MC_UNKNOWN, 65534 };
+
+static bool
+multicast_group_equal(const struct multicast_group *a,
+                      const struct multicast_group *b)
+{
+    return !strcmp(a->name, b->name) && a->key == b->key;
+}
+
+/* Multicast group entry. */
+struct ovn_multicast {
+    struct hmap_node hmap_node; /* Index on 'datapath' and 'group'. */
+    struct ovn_datapath *datapath;
+    const struct multicast_group *group;
+
+    struct ovn_port **ports;
+    size_t n_ports, allocated_ports;
+};
+
+static uint32_t
+ovn_multicast_hash(const struct ovn_datapath *datapath,
+                   const struct multicast_group *group)
+{
+    return hash_pointer(datapath, group->key);
+}
+
+static struct ovn_multicast *
+ovn_multicast_find(struct hmap *mcgroups, struct ovn_datapath *datapath,
+                   const struct multicast_group *group)
+{
+    struct ovn_multicast *mc;
+
+    HMAP_FOR_EACH_WITH_HASH (mc, hmap_node,
+                             ovn_multicast_hash(datapath, group), mcgroups) {
+        if (mc->datapath == datapath
+            && multicast_group_equal(mc->group, group)) {
+            return mc;
         }
     }
+    return NULL;
+}
 
-    free(binding_macs);
-    free(lport_macs);
+static void
+ovn_multicast_add(struct hmap *mcgroups, const struct multicast_group *group,
+                  struct ovn_port *port)
+{
+    struct ovn_datapath *od = port->od;
+    struct ovn_multicast *mc = ovn_multicast_find(mcgroups, od, group);
+    if (!mc) {
+        mc = xmalloc(sizeof *mc);
+        hmap_insert(mcgroups, &mc->hmap_node, ovn_multicast_hash(od, group));
+        mc->datapath = od;
+        mc->group = group;
+        mc->n_ports = 0;
+        mc->allocated_ports = 4;
+        mc->ports = xmalloc(mc->allocated_ports * sizeof *mc->ports);
+    }
+    if (mc->n_ports >= mc->allocated_ports) {
+        mc->ports = x2nrealloc(mc->ports, &mc->allocated_ports,
+                               sizeof *mc->ports);
+    }
+    mc->ports[mc->n_ports++] = port;
+}
 
-    return (i == b_n_macs) ? true : false;
+static void
+ovn_multicast_destroy(struct hmap *mcgroups, struct ovn_multicast *mc)
+{
+    if (mc) {
+        hmap_remove(mcgroups, &mc->hmap_node);
+        free(mc->ports);
+        free(mc);
+    }
+}
+
+static void
+ovn_multicast_update_sbrec(const struct ovn_multicast *mc,
+                           const struct sbrec_multicast_group *sb)
+{
+    struct sbrec_port_binding **ports = xmalloc(mc->n_ports * sizeof *ports);
+    for (size_t i = 0; i < mc->n_ports; i++) {
+        ports[i] = CONST_CAST(struct sbrec_port_binding *, mc->ports[i]->sb);
+    }
+    sbrec_multicast_group_set_ports(sb, ports, mc->n_ports);
+    free(ports);
 }
 
 /* Rule generation.
  *
- * This code generates the Rule table in the southbound database, as a
- * function of most of the northbound database.
+ * This code generates the Rule table in the southbound database, as a function
+ * of most of the northbound database.
  */
 
-/* Enough context to add a Rule row, using rule_add(). */
-struct rule_ctx {
-    /* From northd_context. */
-    struct ovsdb_idl *ovnsb_idl;
-    struct ovsdb_idl_txn *ovnsb_txn;
-
-    /* Contains "struct rule_hash_node"s.  Used to figure out what existing
-     * Rule rows should be deleted: we index all of the Rule rows into this
-     * data structure, then as existing rows are generated we remove them.
-     * After generating all the rows, any remaining in 'rule_hmap' must be
-     * deleted from the database. */
-    struct hmap rule_hmap;
-};
+struct ovn_rule {
+    struct hmap_node hmap_node;
 
-/* A row in the Rule table, indexed by its full contents, */
-struct rule_hash_node {
-    struct hmap_node node;
-    const struct sbrec_rule *rule;
+    struct ovn_datapath *od;
+    enum ovn_pipeline { P_IN, P_OUT } pipeline;
+    uint8_t table_id;
+    uint16_t priority;
+    char *match;
+    char *actions;
 };
 
 static size_t
-rule_hash(const struct uuid *logical_datapath, uint8_t table_id,
-          uint16_t priority, const char *match, const char *actions)
+ovn_rule_hash(const struct ovn_rule *rule)
 {
-    size_t hash = uuid_hash(logical_datapath);
-    hash = hash_2words((table_id << 16) | priority, hash);
-    hash = hash_string(match, hash);
-    return hash_string(actions, hash);
+    size_t hash = uuid_hash(&rule->od->key);
+    hash = hash_2words((rule->table_id << 16) | rule->priority, hash);
+    hash = hash_string(rule->match, hash);
+    return hash_string(rule->actions, hash);
 }
 
-static size_t
-rule_hash_rec(const struct sbrec_rule *rule)
+static bool
+ovn_rule_equal(const struct ovn_rule *a, const struct ovn_rule *b)
+{
+    return (a->od == b->od
+            && a->pipeline == b->pipeline
+            && a->table_id == b->table_id
+            && a->priority == b->priority
+            && !strcmp(a->match, b->match)
+            && !strcmp(a->actions, b->actions));
+}
+
+static void
+ovn_rule_init(struct ovn_rule *rule, struct ovn_datapath *od,
+              enum ovn_pipeline pipeline, uint8_t table_id, uint16_t priority,
+              char *match, char *actions)
 {
-    return rule_hash(&rule->logical_datapath, rule->table_id,
-                         rule->priority, rule->match,
-                         rule->actions);
+    rule->od = od;
+    rule->pipeline = pipeline;
+    rule->table_id = table_id;
+    rule->priority = priority;
+    rule->match = match;
+    rule->actions = actions;
 }
 
 /* Adds a row with the specified contents to the Rule table. */
 static void
-rule_add(struct rule_ctx *ctx,
-         const struct nbrec_logical_switch *logical_datapath,
-         uint8_t table_id,
-         uint16_t priority,
-         const char *match,
-         const char *actions)
-{
-    struct rule_hash_node *hash_node;
-
-    /* Check whether such a row already exists in the Rule table.  If so,
-     * remove it from 'ctx->rule_hmap' and we're done. */
-    HMAP_FOR_EACH_WITH_HASH (hash_node, node,
-                             rule_hash(&logical_datapath->header_.uuid,
-                                       table_id, priority, match, actions),
-                             &ctx->rule_hmap) {
-        const struct sbrec_rule *rule = hash_node->rule;
-        if (uuid_equals(&rule->logical_datapath,
-                        &logical_datapath->header_.uuid)
-            && rule->table_id == table_id
-            && rule->priority == priority
-            && !strcmp(rule->match, match)
-            && !strcmp(rule->actions, actions)) {
-            hmap_remove(&ctx->rule_hmap, &hash_node->node);
-            free(hash_node);
-            return;
-        }
-    }
-
-    /* No such Rule row.  Add one. */
-    const struct sbrec_rule *rule;
-    rule = sbrec_rule_insert(ctx->ovnsb_txn);
-    sbrec_rule_set_logical_datapath(rule,
-                                        logical_datapath->header_.uuid);
-    sbrec_rule_set_table_id(rule, table_id);
-    sbrec_rule_set_priority(rule, priority);
-    sbrec_rule_set_match(rule, match);
-    sbrec_rule_set_actions(rule, actions);
+rule_add(struct hmap *rule_map, struct ovn_datapath *od,
+         enum ovn_pipeline pipeline, uint8_t table_id, uint16_t priority,
+         const char *match, const char *actions)
+{
+    struct ovn_rule *rule = xmalloc(sizeof *rule);
+    ovn_rule_init(rule, od, pipeline, table_id, priority,
+                  xstrdup(match), xstrdup(actions));
+    hmap_insert(rule_map, &rule->hmap_node, ovn_rule_hash(rule));
+}
+
+static struct ovn_rule *
+ovn_rule_find(struct hmap *rules, struct ovn_datapath *od,
+              enum ovn_pipeline pipeline, uint8_t table_id, uint16_t priority,
+              const char *match, const char *actions)
+{
+    struct ovn_rule target;
+    ovn_rule_init(&target, od, pipeline, table_id, priority,
+                  CONST_CAST(char *, match), CONST_CAST(char *, actions));
+
+    struct ovn_rule *rule;
+    HMAP_FOR_EACH_WITH_HASH (rule, hmap_node, ovn_rule_hash(&target), rules) {
+        if (ovn_rule_equal(rule, &target)) {
+            return rule;
+        }
+    }
+    return NULL;
+}
+
+static void
+ovn_rule_destroy(struct hmap *rules, struct ovn_rule *rule)
+{
+    if (rule) {
+        hmap_remove(rules, &rule->hmap_node);
+        free(rule->match);
+        free(rule->actions);
+        free(rule);
+    }
 }
 
 /* Appends port security constraints on L2 address field 'eth_addr_field'
@@ -241,376 +666,207 @@ lport_is_enabled(const struct nbrec_logical_port *lport)
     return !lport->enabled || *lport->enabled;
 }
 
-/* Updates the Rule table in the OVN_SB database, constructing its contents
- * based on the OVN_NB database. */
+/* Updates the Rule and Multicast_Group tables in the OVN_SB database,
+ * constructing their contents based on the OVN_NB database. */
 static void
-build_rule(struct northd_context *ctx)
+build_rule(struct northd_context *ctx, struct hmap *datapaths,
+           struct hmap *ports)
 {
-    struct rule_ctx pc = {
-        .ovnsb_idl = ctx->ovnsb_idl,
-        .ovnsb_txn = ctx->ovnsb_txn,
-        .rule_hmap = HMAP_INITIALIZER(&pc.rule_hmap)
-    };
+    struct hmap rules = HMAP_INITIALIZER(&rules);
+    struct hmap mcgroups = HMAP_INITIALIZER(&mcgroups);
 
-    /* Add all the Rule entries currently in the southbound database to
-     * 'pc.rule_hmap'.  We remove entries that we generate from the hmap,
-     * thus by the time we're done only entries that need to be removed
-     * remain. */
-    const struct sbrec_rule *rule;
-    SBREC_RULE_FOR_EACH (rule, ctx->ovnsb_idl) {
-        struct rule_hash_node *hash_node = xzalloc(sizeof *hash_node);
-        hash_node->rule = rule;
-        hmap_insert(&pc.rule_hmap, &hash_node->node,
-                    rule_hash_rec(rule));
-    }
-
-    /* Table 0: Admission control framework. */
-    const struct nbrec_logical_switch *lswitch;
-    NBREC_LOGICAL_SWITCH_FOR_EACH (lswitch, ctx->ovnnb_idl) {
+    /* Ingress table 0: Admission control framework. */
+    struct ovn_datapath *od;
+    HMAP_FOR_EACH (od, key_node, datapaths) {
         /* Logical VLANs not supported. */
-        rule_add(&pc, lswitch, 0, 100, "vlan.present", "drop;");
+        rule_add(&rules, od, P_IN, 0, 100, "vlan.present", "drop;");
 
         /* Broadcast/multicast source address is invalid. */
-        rule_add(&pc, lswitch, 0, 100, "eth.src[40]", "drop;");
+        rule_add(&rules, od, P_IN, 0, 100, "eth.src[40]", "drop;");
 
         /* Port security flows have priority 50 (see below) and will continue
          * to the next table if packet source is acceptable. */
 
         /* Otherwise drop the packet. */
-        rule_add(&pc, lswitch, 0, 0, "1", "drop;");
+        rule_add(&rules, od, P_IN, 0, 0, "1", "drop;");
     }
 
-    /* Table 0: Ingress port security. */
-    NBREC_LOGICAL_SWITCH_FOR_EACH (lswitch, ctx->ovnnb_idl) {
-        for (size_t i = 0; i < lswitch->n_ports; i++) {
-            const struct nbrec_logical_port *lport = lswitch->ports[i];
-            struct ds match = DS_EMPTY_INITIALIZER;
-            ds_put_cstr(&match, "inport == ");
-            json_string_escape(lport->name, &match);
-            build_port_security("eth.src",
-                                lport->port_security, lport->n_port_security,
-                                &match);
-            rule_add(&pc, lswitch, 0, 50, ds_cstr(&match),
-                     lport_is_enabled(lport) ? "next;" : "drop;");
-            ds_destroy(&match);
-        }
+    /* Ingress table 0: Ingress port security. */
+    struct ovn_port *op;
+    HMAP_FOR_EACH (op, key_node, ports) {
+        struct ds match = DS_EMPTY_INITIALIZER;
+        ds_put_cstr(&match, "inport == ");
+        json_string_escape(op->key, &match);
+        build_port_security("eth.src",
+                            op->nb->port_security, op->nb->n_port_security,
+                            &match);
+        rule_add(&rules, op->od, P_IN, 0, 50, ds_cstr(&match),
+                 lport_is_enabled(op->nb) ? "next;" : "drop;");
+        ds_destroy(&match);
     }
 
-    /* Table 1: Destination lookup:
-     *
-     *   - Broadcast and multicast handling (priority 100).
-     *   - Unicast handling (priority 50).
-     *   - Unknown unicast address handling (priority 0).
-     *   */
-    NBREC_LOGICAL_SWITCH_FOR_EACH (lswitch, ctx->ovnnb_idl) {
-        struct ds bcast;        /* Actions for broadcast on 'lswitch'. */
-        struct ds unknown;      /* Actions for unknown MACs on 'lswitch'. */
-
-        ds_init(&bcast);
-        ds_init(&unknown);
-        for (size_t i = 0; i < lswitch->n_ports; i++) {
-            const struct nbrec_logical_port *lport = lswitch->ports[i];
-
-            ds_put_cstr(&bcast, "outport = ");
-            json_string_escape(lport->name, &bcast);
-            ds_put_cstr(&bcast, "; next; ");
-
-            for (size_t j = 0; j < lport->n_macs; j++) {
-                const char *s = lport->macs[j];
-                uint8_t mac[ETH_ADDR_LEN];
-
-                if (eth_addr_from_string(s, mac)) {
-                    struct ds match, unicast;
-
-                    ds_init(&match);
-                    ds_put_format(&match, "eth.dst == %s", s);
-
-                    ds_init(&unicast);
-                    ds_put_cstr(&unicast, "outport = ");
-                    json_string_escape(lport->name, &unicast);
-                    ds_put_cstr(&unicast, "; next;");
-                    rule_add(&pc, lswitch, 1, 50,
-                             ds_cstr(&match), ds_cstr(&unicast));
-                    ds_destroy(&unicast);
-                    ds_destroy(&match);
-                } else if (!strcmp(s, "unknown")) {
-                    ds_put_cstr(&unknown, "outport = ");
-                    json_string_escape(lport->name, &unknown);
-                    ds_put_cstr(&unknown, "; next; ");
-                } else {
-                    static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
-
-                    VLOG_INFO_RL(&rl, "%s: invalid syntax '%s' in macs column",
-                                 lport->name, s);
-                }
-            }
-        }
-
-        ds_chomp(&bcast, ' ');
-        rule_add(&pc, lswitch, 1, 100, "eth.dst[40]", ds_cstr(&bcast));
-        ds_destroy(&bcast);
-
-        if (unknown.length) {
-            ds_chomp(&unknown, ' ');
-            rule_add(&pc, lswitch, 1, 0, "1", ds_cstr(&unknown));
+    /* Ingress table 1: Destination lookup, broadcast and multicast handling
+     * (priority 100). */
+    HMAP_FOR_EACH (op, key_node, ports) {
+        if (lport_is_enabled(op->nb)) {
+            ovn_multicast_add(&mcgroups, &mc_flood, op);
         }
-        ds_destroy(&unknown);
+    }
+    HMAP_FOR_EACH (od, key_node, datapaths) {
+        rule_add(&rules, od, P_IN, 1, 100, "eth.dst[40]",
+                 "outport = \""MC_FLOOD"\"; output;");
     }
 
-    /* Table 2: ACLs. */
-    NBREC_LOGICAL_SWITCH_FOR_EACH (lswitch, ctx->ovnnb_idl) {
-        for (size_t i = 0; i < lswitch->n_acls; i++) {
-            const struct nbrec_acl *acl = lswitch->acls[i];
+    /* Ingress table 1: Destination lookup, unicast handling (priority 50). */
+    HMAP_FOR_EACH (op, key_node, ports) {
+        for (size_t i = 0; i < op->nb->n_macs; i++) {
+            uint8_t mac[ETH_ADDR_LEN];
+
+            if (eth_addr_from_string(op->nb->macs[i], mac)) {
+                struct ds match, actions;
+
+                ds_init(&match);
+                ds_put_format(&match, "eth.dst == %s", op->nb->macs[i]);
+
+                ds_init(&actions);
+                ds_put_cstr(&actions, "outport = ");
+                json_string_escape(op->nb->name, &actions);
+                ds_put_cstr(&actions, "; output;");
+                rule_add(&rules, op->od, P_IN, 1, 50,
+                         ds_cstr(&match), ds_cstr(&actions));
+                ds_destroy(&actions);
+                ds_destroy(&match);
+            } else if (!strcmp(op->nb->macs[i], "unknown")) {
+                ovn_multicast_add(&mcgroups, &mc_unknown, op);
+                op->od->has_unknown = true;
+            } else {
+                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
 
-            NBREC_ACL_FOR_EACH (acl, ctx->ovnnb_idl) {
-                rule_add(&pc, lswitch, 2, acl->priority, acl->match,
-                         (!strcmp(acl->action, "allow") ||
-                          !strcmp(acl->action, "allow-related")
-                          ? "next;" : "drop;"));
+                VLOG_INFO_RL(&rl, "%s: invalid syntax '%s' in macs column",
+                             op->nb->name, op->nb->macs[i]);
             }
         }
-
-        rule_add(&pc, lswitch, 2, 0, "1", "next;");
     }
 
-    /* Table 3: Egress port security. */
-    NBREC_LOGICAL_SWITCH_FOR_EACH (lswitch, ctx->ovnnb_idl) {
-        rule_add(&pc, lswitch, 3, 100, "eth.dst[40]", "output;");
-
-        for (size_t i = 0; i < lswitch->n_ports; i++) {
-            const struct nbrec_logical_port *lport = lswitch->ports[i];
-            struct ds match;
-
-            ds_init(&match);
-            ds_put_cstr(&match, "outport == ");
-            json_string_escape(lport->name, &match);
-            build_port_security("eth.dst",
-                                lport->port_security, lport->n_port_security,
-                                &match);
-
-            rule_add(&pc, lswitch, 3, 50, ds_cstr(&match),
-                         lport_is_enabled(lport) ? "output;" : "drop;");
-
-            ds_destroy(&match);
+    /* Ingress table 1: Destination lookup for unknown MACs (priority 0). */
+    HMAP_FOR_EACH (od, key_node, datapaths) {
+        if (od->has_unknown) {
+            rule_add(&rules, od, P_IN, 1, 0, "1",
+                     "outport = \""MC_UNKNOWN"\"; output;");
         }
     }
 
-    /* Delete any existing Rule rows that were not re-generated.  */
-    struct rule_hash_node *hash_node, *next_hash_node;
-    HMAP_FOR_EACH_SAFE (hash_node, next_hash_node, node, &pc.rule_hmap) {
-        hmap_remove(&pc.rule_hmap, &hash_node->node);
-        sbrec_rule_delete(hash_node->rule);
-        free(hash_node);
-    }
-    hmap_destroy(&pc.rule_hmap);
-}
-
-static bool
-parents_equal(const struct sbrec_port_binding *binding,
-              const struct nbrec_logical_port *lport)
-{
-    if (!!binding->parent_port != !!lport->parent_name) {
-        /* One is set and the other is not. */
-        return false;
-    }
+    /* Egress table 0: ACLs. */
+    HMAP_FOR_EACH (od, key_node, datapaths) {
+        for (size_t i = 0; i < od->nb->n_acls; i++) {
+            const struct nbrec_acl *acl = od->nb->acls[i];
+            const char *action;
 
-    if (binding->parent_port) {
-        /* Both are set. */
-        return strcmp(binding->parent_port, lport->parent_name) ? false : true;
+            action = (!strcmp(acl->action, "allow") ||
+                      !strcmp(acl->action, "allow-related"))
+                ? "next;" : "drop;";
+            rule_add(&rules, od, P_OUT, 0, acl->priority, acl->match, action);
+        }
     }
-
-    /* Both are NULL. */
-    return true;
-}
-
-static bool
-tags_equal(const struct sbrec_port_binding *binding,
-           const struct nbrec_logical_port *lport)
-{
-    if (binding->n_tag != lport->n_tag) {
-        return false;
+    HMAP_FOR_EACH (od, key_node, datapaths) {
+        rule_add(&rules, od, P_OUT, 0, 0, "1", "next;");
     }
 
-    return binding->n_tag ? (binding->tag[0] == lport->tag[0]) : true;
-}
+    /* Egress table 1: Egress port security. */
+    HMAP_FOR_EACH (od, key_node, datapaths) {
+        rule_add(&rules, od, P_OUT, 1, 100, "eth.dst[40]", "output;");
+    }
+    HMAP_FOR_EACH (op, key_node, ports) {
+        struct ds match;
 
-struct port_binding_hash_node {
-    struct hmap_node lp_node; /* In 'lp_map', by binding->logical_port. */
-    struct hmap_node tk_node; /* In 'tk_map', by binding->tunnel_key. */
-    const struct sbrec_port_binding *binding;
-};
+        ds_init(&match);
+        ds_put_cstr(&match, "outport == ");
+        json_string_escape(op->key, &match);
+        build_port_security("eth.dst",
+                            op->nb->port_security, op->nb->n_port_security,
+                            &match);
 
-static bool
-tunnel_key_in_use(const struct hmap *tk_hmap, uint16_t tunnel_key)
-{
-    const struct port_binding_hash_node *hash_node;
+        rule_add(&rules, op->od, P_OUT, 1, 50, ds_cstr(&match),
+                 lport_is_enabled(op->nb) ? "output;" : "drop;");
 
-    HMAP_FOR_EACH_IN_BUCKET (hash_node, tk_node, hash_int(tunnel_key, 0),
-                             tk_hmap) {
-        if (hash_node->binding->tunnel_key == tunnel_key) {
-            return true;
-        }
+        ds_destroy(&match);
     }
-    return false;
-}
-
-/* Chooses and returns a positive tunnel key that is not already in use in
- * 'tk_hmap'.  Returns 0 if all tunnel keys are in use. */
-static uint16_t
-choose_tunnel_key(const struct hmap *tk_hmap)
-{
-    static uint16_t prev;
 
-    for (uint16_t key = prev + 1; key != prev; key++) {
-        if (!tunnel_key_in_use(tk_hmap, key)) {
-            prev = key;
-            return key;
+    /* Push changes to the Rule table to the database. */
+    const struct sbrec_rule *sbrule, *next_sbrule;
+    SBREC_RULE_FOR_EACH_SAFE (sbrule, next_sbrule, ctx->ovnsb_idl) {
+        struct ovn_datapath *od
+            = ovn_datapath_from_sbrec(datapaths, sbrule->logical_datapath);
+        if (!od) {
+            sbrec_rule_delete(sbrule);
+            continue;
         }
-    }
-
-    static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
-    VLOG_WARN_RL(&rl, "all tunnel keys exhausted");
-    return 0;
-}
-
-/*
- * When a change has occurred in the OVN_Northbound database, we go through and
- * make sure that the contents of the Port_Binding table in the OVN_Southbound
- * database are up to date with the logical ports defined in the
- * OVN_Northbound database.
- */
-static void
-set_port_bindings(struct northd_context *ctx)
-{
-    const struct sbrec_port_binding *binding;
 
-    /*
-     * We will need to look up a port binding for every logical port.  We don't
-     * want to have to do an O(n) search for every binding, so start out by
-     * hashing them on the logical port.
-     *
-     * As we go through every logical port, we will update the binding if it
-     * exists or create one otherwise.  When the update is done, we'll remove
-     * it from the hashmap.  At the end, any bindings left in the hashmap are
-     * for logical ports that have been deleted.
-     *
-     * We index the logical_port column because that's the shared key between
-     * the OVN_NB and OVN_SB databases.  We index the tunnel_key column to
-     * allow us to choose a unique tunnel key for any Port_Binding rows we have
-     * to add.
-     */
-    struct hmap lp_hmap = HMAP_INITIALIZER(&lp_hmap);
-    struct hmap tk_hmap = HMAP_INITIALIZER(&tk_hmap);
-
-    SBREC_PORT_BINDING_FOR_EACH(binding, ctx->ovnsb_idl) {
-        struct port_binding_hash_node *hash_node = xzalloc(sizeof *hash_node);
-        hash_node->binding = binding;
-        hmap_insert(&lp_hmap, &hash_node->lp_node,
-                    hash_string(binding->logical_port, 0));
-        hmap_insert(&tk_hmap, &hash_node->tk_node,
-                    hash_int(binding->tunnel_key, 0));
-    }
-
-    const struct nbrec_logical_switch *lswitch;
-    NBREC_LOGICAL_SWITCH_FOR_EACH (lswitch, ctx->ovnnb_idl) {
-        const struct uuid *logical_datapath = &lswitch->header_.uuid;
-
-        for (size_t i = 0; i < lswitch->n_ports; i++) {
-            const struct nbrec_logical_port *lport = lswitch->ports[i];
-            struct port_binding_hash_node *hash_node;
-            binding = NULL;
-            HMAP_FOR_EACH_WITH_HASH(hash_node, lp_node,
-                                    hash_string(lport->name, 0), &lp_hmap) {
-                if (!strcmp(lport->name, hash_node->binding->logical_port)) {
-                    binding = hash_node->binding;
-                    break;
-                }
-            }
-
-            if (binding) {
-                /* We found an existing binding for this logical port.  Update
-                 * its contents. */
-
-                hmap_remove(&lp_hmap, &hash_node->lp_node);
-
-                if (!macs_equal(binding->mac, binding->n_mac,
-                                lport->macs, lport->n_macs)) {
-                    sbrec_port_binding_set_mac(binding,
-                                               (const char **) lport->macs,
-                                               lport->n_macs);
-                }
-                if (!parents_equal(binding, lport)) {
-                    sbrec_port_binding_set_parent_port(binding,
-                                                       lport->parent_name);
-                }
-                if (!tags_equal(binding, lport)) {
-                    sbrec_port_binding_set_tag(binding,
-                                               lport->tag, lport->n_tag);
-                }
-                if (!uuid_equals(&binding->logical_datapath,
-                                 logical_datapath)) {
-                    sbrec_port_binding_set_logical_datapath(binding,
-                                                            *logical_datapath);
-                }
-            } else {
-                /* There is no binding for this logical port, so create one. */
-
-                uint16_t tunnel_key = choose_tunnel_key(&tk_hmap);
-                if (!tunnel_key) {
-                    continue;
-                }
-
-                binding = sbrec_port_binding_insert(ctx->ovnsb_txn);
-                sbrec_port_binding_set_logical_port(binding, lport->name);
-                sbrec_port_binding_set_mac(binding,
-                                           (const char **) lport->macs,
-                                           lport->n_macs);
-                if (lport->parent_name && lport->n_tag > 0) {
-                    sbrec_port_binding_set_parent_port(binding,
-                                                       lport->parent_name);
-                    sbrec_port_binding_set_tag(binding,
-                                               lport->tag, lport->n_tag);
-                }
-
-                sbrec_port_binding_set_tunnel_key(binding, tunnel_key);
-                sbrec_port_binding_set_logical_datapath(binding,
-                                                        *logical_datapath);
-
-                /* Add the tunnel key to the tk_hmap so that we don't try to
-                 * use it for another port.  (We don't want it in the lp_hmap
-                 * because that would just get the Binding record deleted
-                 * later.) */
-                struct port_binding_hash_node *hash_node
-                    = xzalloc(sizeof *hash_node);
-                hash_node->binding = binding;
-                hmap_insert(&tk_hmap, &hash_node->tk_node,
-                            hash_int(binding->tunnel_key, 0));
-            }
+        struct ovn_rule *rule = ovn_rule_find(
+            &rules, od, (!strcmp(sbrule->pipeline, "ingress") ? P_IN : P_OUT),
+            sbrule->table_id, sbrule->priority,
+            sbrule->match, sbrule->actions);
+        if (rule) {
+            ovn_rule_destroy(&rules, rule);
+        } else {
+            sbrec_rule_delete(sbrule);
         }
     }
-
-    struct port_binding_hash_node *hash_node;
-    HMAP_FOR_EACH (hash_node, lp_node, &lp_hmap) {
-        hmap_remove(&lp_hmap, &hash_node->lp_node);
-        sbrec_port_binding_delete(hash_node->binding);
+    struct ovn_rule *rule, *next_rule;
+    HMAP_FOR_EACH_SAFE (rule, next_rule, hmap_node, &rules) {
+        sbrule = sbrec_rule_insert(ctx->ovnsb_txn);
+        sbrec_rule_set_logical_datapath(sbrule, rule->od->sb);
+        sbrec_rule_set_pipeline(sbrule,
+                                rule->pipeline == P_IN ? "ingress" : "egress");
+        sbrec_rule_set_table_id(sbrule, rule->table_id);
+        sbrec_rule_set_priority(sbrule, rule->priority);
+        sbrec_rule_set_match(sbrule, rule->match);
+        sbrec_rule_set_actions(sbrule, rule->actions);
+        ovn_rule_destroy(&rules, rule);
     }
-    hmap_destroy(&lp_hmap);
+    hmap_destroy(&rules);
+
+    /* Push changes to the Multicast_Group table to the database. */
+    const struct sbrec_multicast_group *sbmc, *next_sbmc;
+    SBREC_MULTICAST_GROUP_FOR_EACH_SAFE (sbmc, next_sbmc, ctx->ovnsb_idl) {
+        struct ovn_datapath *od = ovn_datapath_from_sbrec(datapaths,
+                                                          sbmc->datapath);
+        if (!od) {
+            sbrec_multicast_group_delete(sbmc);
+            continue;
+        }
 
-    struct port_binding_hash_node *hash_node_next;
-    HMAP_FOR_EACH_SAFE (hash_node, hash_node_next, tk_node, &tk_hmap) {
-        hmap_remove(&tk_hmap, &hash_node->tk_node);
-        free(hash_node);
+        struct multicast_group group = { .name = sbmc->name,
+                                         .key = sbmc->tunnel_key };
+        struct ovn_multicast *mc = ovn_multicast_find(&mcgroups, od, &group);
+        if (mc) {
+            ovn_multicast_update_sbrec(mc, sbmc);
+            ovn_multicast_destroy(&mcgroups, mc);
+        } else {
+            sbrec_multicast_group_delete(sbmc);
+        }
+    }
+    struct ovn_multicast *mc, *next_mc;
+    HMAP_FOR_EACH_SAFE (mc, next_mc, hmap_node, &mcgroups) {
+        sbmc = sbrec_multicast_group_insert(ctx->ovnsb_txn);
+        sbrec_multicast_group_set_datapath(sbmc, mc->datapath->sb);
+        sbrec_multicast_group_set_name(sbmc, mc->group->name);
+        sbrec_multicast_group_set_tunnel_key(sbmc, mc->group->key);
+        ovn_multicast_update_sbrec(mc, sbmc);
+        ovn_multicast_destroy(&mcgroups, mc);
     }
-    hmap_destroy(&tk_hmap);
+    hmap_destroy(&mcgroups);
 }
-
+
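
(Aside: build_rule() above follows a reconcile pattern worth calling out:
compute the desired rows into an hmap, delete database rows that are no
longer wanted, then insert whatever remains in the hmap.  A minimal
sketch of the pattern, with db_*() and find_and_remove() as illustrative
stand-ins for the generated IDL accessors:)

    static void
    reconcile(struct hmap *desired)
    {
        /* Pass 1: walk existing DB rows; keep matches, delete the rest. */
        struct db_row *row, *next_row;
        for (row = db_first_row(); row; row = next_row) {
            next_row = db_next_row(row);
            struct want *w = find_and_remove(desired, row);
            if (w) {
                free(w);            /* Wanted and already present. */
            } else {
                db_delete(row);     /* Present but no longer wanted. */
            }
        }

        /* Pass 2: whatever is still in 'desired' must be created. */
        struct want *w, *next_w;
        HMAP_FOR_EACH_SAFE (w, next_w, hmap_node, desired) {
            db_insert(w);
            hmap_remove(desired, &w->hmap_node);
            free(w);
        }
    }
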
 static void
 ovnnb_db_changed(struct northd_context *ctx)
 {
     VLOG_DBG("ovn-nb db contents have changed.");
 
-    set_port_bindings(ctx);
-    build_rule(ctx);
+    struct hmap datapaths, ports;
+    build_datapaths(ctx, &datapaths);
+    build_ports(ctx, &datapaths, &ports);
+    build_rule(ctx, &datapaths, &ports);
 }
 
 /*
@@ -622,48 +878,48 @@ static void
 ovnsb_db_changed(struct northd_context *ctx)
 {
     struct hmap lports_hmap;
-    const struct sbrec_port_binding *binding;
-    const struct nbrec_logical_port *lport;
+    const struct sbrec_port_binding *sb;
+    const struct nbrec_logical_port *nb;
 
     struct lport_hash_node {
         struct hmap_node node;
-        const struct nbrec_logical_port *lport;
+        const struct nbrec_logical_port *nb;
     } *hash_node, *hash_node_next;
 
     VLOG_DBG("Recalculating port up states for ovn-nb db.");
 
     hmap_init(&lports_hmap);
 
-    NBREC_LOGICAL_PORT_FOR_EACH(lport, ctx->ovnnb_idl) {
+    NBREC_LOGICAL_PORT_FOR_EACH(nb, ctx->ovnnb_idl) {
         hash_node = xzalloc(sizeof *hash_node);
-        hash_node->lport = lport;
-        hmap_insert(&lports_hmap, &hash_node->node,
-                hash_string(lport->name, 0));
+        hash_node->nb = nb;
+        hmap_insert(&lports_hmap, &hash_node->node, hash_string(nb->name, 0));
     }
 
-    SBREC_PORT_BINDING_FOR_EACH(binding, ctx->ovnsb_idl) {
-        lport = NULL;
+    SBREC_PORT_BINDING_FOR_EACH(sb, ctx->ovnsb_idl) {
+        nb = NULL;
         HMAP_FOR_EACH_WITH_HASH(hash_node, node,
-                hash_string(binding->logical_port, 0), &lports_hmap) {
-            if (!strcmp(binding->logical_port, hash_node->lport->name)) {
-                lport = hash_node->lport;
+                                hash_string(sb->logical_port, 0),
+                                &lports_hmap) {
+            if (!strcmp(sb->logical_port, hash_node->nb->name)) {
+                nb = hash_node->nb;
                 break;
             }
         }
 
-        if (!lport) {
+        if (!nb) {
             /* The logical port doesn't exist for this port binding.  This can
              * happen under normal circumstances when ovn-northd hasn't gotten
              * around to pruning the Port_Binding yet. */
             continue;
         }
 
-        if (binding->chassis && (!lport->up || !*lport->up)) {
+        if (sb->chassis && (!nb->up || !*nb->up)) {
             bool up = true;
-            nbrec_logical_port_set_up(lport, &up, 1);
-        } else if (!binding->chassis && (!lport->up || *lport->up)) {
+            nbrec_logical_port_set_up(nb, &up, 1);
+        } else if (!sb->chassis && (!nb->up || *nb->up)) {
             bool up = false;
-            nbrec_logical_port_set_up(lport, &up, 1);
+            nbrec_logical_port_set_up(nb, &up, 1);
         }
     }
 
@@ -753,6 +1009,14 @@ parse_options(int argc OVS_UNUSED, char *argv[] OVS_UNUSED)
     free(short_options);
 }
 
+static void
+add_column_noalert(struct ovsdb_idl *idl,
+                   const struct ovsdb_idl_column *column)
+{
+    ovsdb_idl_add_column(idl, column);
+    ovsdb_idl_omit_alert(idl, column);
+}
+
 int
 main(int argc, char *argv[])
 {
@@ -792,28 +1056,35 @@ main(int argc, char *argv[])
     ctx.ovnnb_idl = ovnnb_idl = ovsdb_idl_create(ovnnb_db,
             &nbrec_idl_class, true, true);
 
-    /* There is only a small subset of changes to the ovn-sb db that ovn-northd
-     * has to care about, so we'll enable monitoring those directly. */
     ctx.ovnsb_idl = ovnsb_idl = ovsdb_idl_create(ovnsb_db,
             &sbrec_idl_class, false, true);
+
+    ovsdb_idl_add_table(ovnsb_idl, &sbrec_table_rule);
+    add_column_noalert(ovnsb_idl, &sbrec_rule_col_logical_datapath);
+    add_column_noalert(ovnsb_idl, &sbrec_rule_col_pipeline);
+    add_column_noalert(ovnsb_idl, &sbrec_rule_col_table_id);
+    add_column_noalert(ovnsb_idl, &sbrec_rule_col_priority);
+    add_column_noalert(ovnsb_idl, &sbrec_rule_col_match);
+    add_column_noalert(ovnsb_idl, &sbrec_rule_col_actions);
+
+    ovsdb_idl_add_table(ovnsb_idl, &sbrec_table_multicast_group);
+    add_column_noalert(ovnsb_idl, &sbrec_multicast_group_col_datapath);
+    add_column_noalert(ovnsb_idl, &sbrec_multicast_group_col_tunnel_key);
+    add_column_noalert(ovnsb_idl, &sbrec_multicast_group_col_name);
+    add_column_noalert(ovnsb_idl, &sbrec_multicast_group_col_ports);
+
+    ovsdb_idl_add_table(ovnsb_idl, &sbrec_table_datapath_binding);
+    add_column_noalert(ovnsb_idl, &sbrec_datapath_binding_col_tunnel_key);
+    add_column_noalert(ovnsb_idl, &sbrec_datapath_binding_col_external_ids);
+
     ovsdb_idl_add_table(ovnsb_idl, &sbrec_table_port_binding);
-    ovsdb_idl_add_column(ovnsb_idl, &sbrec_port_binding_col_logical_port);
+    add_column_noalert(ovnsb_idl, &sbrec_port_binding_col_datapath);
+    add_column_noalert(ovnsb_idl, &sbrec_port_binding_col_logical_port);
+    add_column_noalert(ovnsb_idl, &sbrec_port_binding_col_tunnel_key);
+    add_column_noalert(ovnsb_idl, &sbrec_port_binding_col_parent_port);
+    add_column_noalert(ovnsb_idl, &sbrec_port_binding_col_tag);
     ovsdb_idl_add_column(ovnsb_idl, &sbrec_port_binding_col_chassis);
-    ovsdb_idl_add_column(ovnsb_idl, &sbrec_port_binding_col_mac);
-    ovsdb_idl_add_column(ovnsb_idl, &sbrec_port_binding_col_tag);
-    ovsdb_idl_add_column(ovnsb_idl, &sbrec_port_binding_col_parent_port);
-    ovsdb_idl_add_column(ovnsb_idl, &sbrec_port_binding_col_logical_datapath);
-    ovsdb_idl_add_column(ovnsb_idl, &sbrec_port_binding_col_tunnel_key);
-    ovsdb_idl_add_column(ovnsb_idl, &sbrec_rule_col_logical_datapath);
-    ovsdb_idl_omit_alert(ovnsb_idl, &sbrec_rule_col_logical_datapath);
-    ovsdb_idl_add_column(ovnsb_idl, &sbrec_rule_col_table_id);
-    ovsdb_idl_omit_alert(ovnsb_idl, &sbrec_rule_col_table_id);
-    ovsdb_idl_add_column(ovnsb_idl, &sbrec_rule_col_priority);
-    ovsdb_idl_omit_alert(ovnsb_idl, &sbrec_rule_col_priority);
-    ovsdb_idl_add_column(ovnsb_idl, &sbrec_rule_col_match);
-    ovsdb_idl_omit_alert(ovnsb_idl, &sbrec_rule_col_match);
-    ovsdb_idl_add_column(ovnsb_idl, &sbrec_rule_col_actions);
-    ovsdb_idl_omit_alert(ovnsb_idl, &sbrec_rule_col_actions);
+    add_column_noalert(ovnsb_idl, &sbrec_port_binding_col_mac);
 
     /*
      * The loop here just runs the IDL in a loop waiting for the seqno to
diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
index 0334d82..0af96a0 100644
--- a/ovn/ovn-architecture.7.xml
+++ b/ovn/ovn-architecture.7.xml
@@ -98,7 +98,7 @@
         OVN/CMS Plugin.  The database schema is meant to be ``impedance
         matched'' with the concepts used in a CMS, so that it directly supports
         notions of logical switches, routers, ACLs, and so on.  See
-        <code>ovs-nb</code>(5) for details.
+        <code>ovn-nb</code>(5) for details.
       </p>
 
       <p>
@@ -343,22 +343,21 @@
     </li>
 
     <li>
-      <code>ovn-northd</code> receives the OVN Northbound database update.
-      In turn, it makes the corresponding updates to the OVN Southbound
-      database, by adding rows to the OVN Southbound database
-      <code>Rule</code> table to reflect the new port, e.g. add a
-      flow to recognize that packets destined to the new port's MAC
-      address should be delivered to it, and update the flow that
-      delivers broadcast and multicast packets to include the new port.
-      It also creates a record in the <code>Binding</code> table and
+      <code>ovn-northd</code> receives the OVN Northbound database update.  In
+      turn, it makes the corresponding updates to the OVN Southbound database,
+      by adding rows to the OVN Southbound database <code>Rule</code> table to
+      reflect the new port, e.g. add a flow to recognize that packets destined
+      to the new port's MAC address should be delivered to it, and update the
+      flow that delivers broadcast and multicast packets to include the new
+      port.  It also creates a record in the <code>Binding</code> table and
       populates all its columns except the column that identifies the
       <code>chassis</code>.
     </li>
 
     <li>
       On every hypervisor, <code>ovn-controller</code> receives the
-      <code>Rule</code> table updates that <code>ovn-northd</code> made
-      in the previous step.  As long as the VM that owns the VIF is powered off,
+      <code>Rule</code> table updates that <code>ovn-northd</code> made in the
+      previous step.  As long as the VM that owns the VIF is powered off,
       <code>ovn-controller</code> cannot do much; it cannot, for example,
       arrange to send packets to or receive packets from the VIF, because the
       VIF does not actually exist anywhere.
@@ -404,8 +403,8 @@
       <code>Binding</code> table.  This provides <code>ovn-controller</code>
       the physical location of the logical port, so each instance updates the
       OpenFlow tables of its switch (based on logical datapath flows in the OVN
-      DB <code>Rule</code> table) so that packets to and from the VIF can
-      be properly handled via tunnels.
+      DB <code>Rule</code> table) so that packets to and from the VIF can be
+      properly handled via tunnels.
     </li>
 
     <li>
@@ -442,17 +441,16 @@
 
     <li>
       <code>ovn-northd</code> receives the OVN Northbound update and in turn
-      updates the OVN Southbound database accordingly, by removing or
-      updating the rows from the OVN Southbound database
-      <code>Rule</code> table and <code>Binding</code> table that
-      were related to the now-destroyed VIF.
+      updates the OVN Southbound database accordingly, by removing or updating
+      the rows from the OVN Southbound database <code>Rule</code> table and
+      <code>Binding</code> table that were related to the now-destroyed VIF.
     </li>
 
     <li>
       On every hypervisor, <code>ovn-controller</code> receives the
-      <code>Rule</code> table updates that <code>ovn-northd</code> made
-      in the previous step.  <code>ovn-controller</code> updates OpenFlow tables
-      to reflect the update, although there may not be much to do, since the VIF
+      <code>Rule</code> table updates that <code>ovn-northd</code> made in the
+      previous step.  <code>ovn-controller</code> updates OpenFlow tables to
+      reflect the update, although there may not be much to do, since the VIF
       had already become unreachable when it was removed from the
       <code>Binding</code> table in a previous step.
     </li>
@@ -538,13 +536,12 @@
     </li>
 
     <li>
-      <code>ovn-northd</code> receives the OVN Northbound database update.
-      In turn, it makes the corresponding updates to the OVN Southbound
-      database, by adding rows to the OVN Southbound database's
-      <code>Rule</code> table to reflect the new port and also by
-      creating a new row in the <code>Binding</code> table and
-      populating all its columns except the column that identifies the
-      <code>chassis</code>.
+      <code>ovn-northd</code> receives the OVN Northbound database update.  In
+      turn, it makes the corresponding updates to the OVN Southbound database,
+      by adding rows to the OVN Southbound database's <code>Rule</code> table
+      to reflect the new port and also by creating a new row in the
+      <code>Binding</code> table and populating all its columns except the
+      column that identifies the <code>chassis</code>.
     </li>
 
     <li>
@@ -580,11 +577,10 @@
 
     <li>
       <code>ovn-northd</code> receives the OVN Northbound update and in turn
-      updates the OVN Southbound database accordingly, by removing or
-      updating the rows from the OVN Southbound database
-      <code>Rule</code> table that were related to the now-destroyed
-      CIF.  It also deletes the row in the <code>Binding</code> table
-      for that CIF.
+      updates the OVN Southbound database accordingly, by removing or updating
+      the rows from the OVN Southbound database <code>Rule</code> table that
+      were related to the now-destroyed CIF.  It also deletes the row in the
+      <code>Binding</code> table for that CIF.
     </li>
 
     <li>
@@ -595,57 +591,304 @@
     </li>
   </ol>
 
-  <h1>Design Decisions</h1>
+  <h2>Life Cycle of a Packet</h2>
 
-  <h2>Supported Tunnel Encapsulations</h2>
   <p>
-    For connecting hypervisors to each other, the only supported tunnel
-    encapsulations are Geneve and STT. Hypervisors may use VXLAN to
-    connect to gateways. We have limited support to these encapsulations
-    for the following reasons:
+    This section describes how a packet travels through OVN from one virtual
+    machine or container to another.  This description focuses on the
+    physical treatment of a packet; for a description of the logical life cycle
+    of a packet, please refer to the <code>Rule</code> table in
+    <code>ovn-sb</code>(5).
   </p>
 
-  <ul>
+  <p>
+    This section mentions several data and metadata fields, for clarity
+    summarized here:
+  </p>
+
+  <dl>
+    <dt>tunnel key</dt>
+    <dd>
+      When OVN encapsulates a packet in Geneve or another tunnel, it attaches
+      extra data to it to allow the receiving OVN instance to process it
+      correctly.  This takes different forms depending on the particular
+      encapsulation, but in each case we refer to it here as the ``tunnel
+      key.''  See <code>Tunnel Encapsulations</code>, below, for details.
+    </dd>
+
+    <dt>logical datapath field</dt>
+    <dd>
+      A field that denotes the logical datapath through which a packet is being
+      processed.  OVN uses the field that OpenFlow 1.1+ simply (and
+      confusingly) calls ``metadata'' to store the logical datapath.  (This
+      field is passed across tunnels as part of the tunnel key.)
+    </dd>
+
+    <dt>logical input port field</dt>
+    <dd>
+      A field that denotes the logical port from which the packet entered the
+      logical datapath.  OVN stores this in a Nicira extension register.  (This
+      field is passed across tunnels as part of the tunnel key.)
+    </dd>
+
+    <dt>logical output port field</dt>
+    <dd>
+      A field that denotes the logical port from which the packet will leave
+      the logical datapath.  This is initialized to 0 at the beginning of the
+      logical ingress pipeline.  OVN stores this in a Nicira extension
+      register.  (This field is passed across tunnels as part of the tunnel
+      key.)
+    </dd>
+
+    <dt>VLAN ID</dt>
+    <dd>
+      The VLAN ID is used as an interface between OVN and containers nested
+      inside a VM (see <code>Life Cycle of a container interface inside a
+      VM</code>, above, for more information).
+    </dd>
+  </dl>
+
+  <p>
+    Initially, a VM or container on the ingress hypervisor sends a packet on a
+    port attached to the OVN integration bridge.  Then:
+  </p>
+
+  <ol>
     <li>
       <p>
-        They support large amounts of metadata.  In addition to
-        specifying the logical switch, we will likely want to indicate
-        the logical source port and where we are in the logical
-        pipeline.  Geneve supports a 24-bit VNI field and TLV-based
-        extensions.  The header of STT includes a 64-bit context id.
+        OpenFlow table 0 performs physical-to-logical translation.  It matches
+        the packet's ingress port.  Its actions annotate the packet with
+        logical metadata, by setting the logical datapath field to identify the
+        logical datapath that the packet is traversing and the logical input
+        port field to identify the ingress port.  Then it resubmits to table 16
+        to enter the logical ingress pipeline.
+      </p>
+
+      <p>
+        Packets that originate from a container nested within a VM are treated
+        in a slightly different way.  The originating container can be
+        distinguished based on the VLAN ID, so the physical-to-logical
+        translation flows additionally match on VLAN ID and the actions strip
+        the VLAN header.  Following this step, OVN treats packets from
+        containers just like any other packets.
+      </p>
+
+      <p>
+        Table 0 also processes packets that arrive from other hypervisors.  It
+        distinguishes them from other packets by ingress port, which is a
+        tunnel.  As with packets just entering the OVN pipeline, the actions
+        annotate these packets with logical datapath and logical ingress port
+        metadata.  In addition, the actions set the logical output port field,
+        which is available because in OVN tunneling occurs after the logical
+        output port is known.  These three pieces of information are obtained
+        from the tunnel encapsulation metadata (see <code>Tunnel
+        Encapsulations</code> for encoding details).  Then the actions resubmit
+        to table 33 to enter the logical egress pipeline.
       </p>
     </li>
 
     <li>
       <p>
-        They use randomized UDP or TCP source ports that allows
-        efficient distribution among multiple paths in environments that
-        use ECMP in their underlay.
+        OpenFlow tables 16 through 31 execute the logical ingress pipeline from
+        the <code>Rule</code> table in the OVN Southbound database.  These
+        tables are expressed entirely in terms of logical concepts like logical
+        ports and logical datapaths.  A big part of
+        <code>ovn-controller</code>'s job is to translate them into equivalent
+        OpenFlow (in particular it translates the table numbers:
+        <code>Rule</code> tables 0 through 15 become OpenFlow tables 16
+        through 31; see the sketch following this list).  For a given packet,
+        the logical ingress pipeline eventually executes zero or more
+        <code>output</code> actions:
       </p>
+
+      <ul>
+        <li>
+          If the pipeline executes no <code>output</code> actions at all, the
+          packet is effectively dropped.
+        </li>
+
+        <li>
+          Most commonly, the pipeline executes one <code>output</code> action,
+          which <code>ovn-controller</code> implements by resubmitting the
+          packet to table 32.
+        </li>
+
+        <li>
+          If the pipeline can execute more than one <code>output</code> action,
+          then each one is separately resubmitted to table 32.  This can be
+          used to send multiple copies of the packet to multiple ports.  (If
+          the packet was not modified between the <code>output</code> actions,
+          and some of the copies are destined to the same hypervisor, then
+          using a logical multicast output port would save bandwidth between
+          hypervisors.)
+        </li>
+      </ul>
     </li>
 
     <li>
       <p>
-        NICs are available that accelerate encapsulation and decapsulation.
+        OpenFlow tables 32 through 47 implement the <code>output</code> action
+        in the logical ingress pipeline.  Specifically, table 32 handles
+        packets to remote hypervisors, table 33 handles packets to the local
+        hypervisor, and table 34 discards packets whose logical ingress and
+        egress port are the same.
+      </p>
+
+      <p>
+        Each flow in table 32 matches on a logical output port for unicast or
+        multicast logical ports that include a logical port on a remote
+        hypervisor.  Each flow's actions implement sending a packet to the port
+        it matches.  For unicast logical output ports on remote hypervisors,
+        the actions set the tunnel key to the correct value, then send the
+        packet on the tunnel port to the correct hypervisor.  (When the remote
+        hypervisor receives the packet, table 0 there will recognize it as a
+        tunneled packet and pass it along to table 33.)  For multicast logical
+        output ports, the actions send one copy of the packet to each remote
+        hypervisor, in the same way as for unicast destinations.  If a
+        multicast group includes a logical port or ports on the local
+        hypervisor, then its actions also resubmit to table 33.  Table 32 also
+        includes a fallback flow that resubmits to table 33 if there is no
+        other match.
+      </p>
+
+      <p>
+        Flows in table 33 resemble those in table 32 but for logical ports that
+        reside locally rather than remotely.  For unicast logical output ports
+        on the local hypervisor, the actions just resubmit to table 34.  For
+        multicast output ports that include one or more logical ports on the
+        local hypervisor, for each such logical port <var>P</var>, the actions
+        change the logical output port to <var>P</var>, then resubmit to table
+        34.
+      </p>
+
+      <p>
+        Table 34 matches and drops packets for which the logical input and
+        output ports are the same.  It resubmits other packets to table 48.
       </p>
     </li>
+
+    <li>
+      <p>
+        OpenFlow tables 48 through 63 execute the logical egress pipeline from
+        the <code>Rule</code> table in the OVN Southbound database.  The
+        egress pipeline can perform a final stage of validation before packet
+        delivery.  Eventually, it may execute an <code>output</code> action,
+        which <code>ovn-controller</code> implements by resubmitting to table
+        64.  A packet for which the pipeline never executes <code>output</code>
+        is effectively dropped (although it may have been transmitted through a
+        tunnel across a physical network).
+      </p>
+
+      <p>
+        The egress pipeline cannot change the logical output port or cause
+        further tunneling.
+      </p>
+    </li>
+
+    <li>
+      <p>
+        OpenFlow table 64 performs logical-to-physical translation, the
+        opposite of table 0.  It matches the packet's logical egress port.  Its
+        actions output the packet to the port attached to the OVN integration
+        bridge that represents that logical port.  If the logical egress port
+        is a container nested within a VM, then before sending the packet the
+        actions push on a VLAN header with an appropriate VLAN ID.
+      </p>
+    </li>
+  </ol>
+
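
(Editor's illustration of the table-number translation referenced in step
2 above; the helper below is ours, not part of this patch, but the
constants follow directly from the walkthrough:)

    /* A logical "Rule" table N runs in OpenFlow table BASE + N, where
     * BASE is 16 for the ingress pipeline (OpenFlow tables 16-31) and
     * 48 for the egress pipeline (OpenFlow tables 48-63). */
    static uint8_t
    rule_table_to_oftable(bool ingress, uint8_t rule_table_id /* 0..15 */)
    {
        return (ingress ? 16 : 48) + rule_table_id;
    }
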
+  <h1>Design Decisions</h1>
+
+  <h2>Tunnel Encapsulations</h2>
+
+  <p>
+    OVN annotates logical network packets that it sends from one hypervisor to
+    another with the following three pieces of metadata, which are encoded in
+    an encapsulation-specific fashion:
+  </p>
+
+  <ul>
+    <li>
+      24-bit logical datapath identifier, from the <code>tunnel_key</code>
+      column in the OVN Southbound <code>Datapath_Binding</code> table.
+    </li>
+
+    <li>
+      15-bit logical ingress port identifier, from the <code>tunnel_key</code>
+      column in the OVN Southbound <code>Port_Binding</code> table.
+    </li>
+
+    <li>
+      16-bit logical egress port identifier, from the <code>tunnel_key</code>
+      column in the OVN Southbound <code>Port_Binding</code> table (as for the
+      logical ingress port) or <code>Multicast_Group</code> table.
+    </li>
+  </ul>
+
+  <p>
+    For hypervisor-to-hypervisor traffic, OVN supports only Geneve and STT
+    encapsulations, for the following reasons:
+  </p>
+
+  <ul>
+    <li>
+      Only STT and Geneve support the large amounts of metadata (over 32 bits
+      per packet) that OVN uses (as described above).
+    </li>
+
+    <li>
+      STT and Geneve use randomized UDP or TCP source ports, allowing
+      efficient distribution among multiple paths in environments that use
+      ECMP in their underlay.
+    </li>
+
+    <li>
+      NICs are available to offload STT and Geneve encapsulation and
+      decapsulation.
+    </li>
   </ul>
 
   <p>
-    Due to its flexibility, the preferred encapsulation between
-    hypervisors is Geneve.  Some environments may want to use STT for
-    performance reasons until the NICs they use support hardware offload
-    of Geneve.
+    Due to its flexibility, the preferred encapsulation between hypervisors is
+    Geneve.  For Geneve encapsulation, OVN transmits the logical datapath
+    identifier in the Geneve VNI.
+
+    <!-- Keep the following in sync with ovn/controller/physical.h. -->
+    OVN transmits the logical ingress and logical egress ports in a TLV with
+    class 0xffff, type 0, and a 32-bit value encoded as follows, from MSB to
+    LSB:
+  </p>
+
+  <diagram>
+    <header name="">
+      <bits name="rsv" above="1" below="0" width=".25"/>
+      <bits name="ingress port" above="15" width=".75"/>
+      <bits name="egress port" above="16" width=".75"/>
+    </header>
+  </diagram>
+
+  <p>
+    Environments whose NICs lack Geneve offload may prefer STT encapsulation
+    for performance reasons.  For STT encapsulation, OVN encodes all three
+    pieces of logical metadata in the STT 64-bit tunnel ID as follows, from MSB
+    to LSB:
   </p>
 
+  <diagram>
+    <header name="">
+      <bits name="reserved" above="9" below="0" width=".5"/>
+      <bits name="ingress port" above="15" width=".75"/>
+      <bits name="egress port" above="16" width=".75"/>
+      <bits name="datapath" above="24" width="1.25"/>
+    </header>
+  </diagram>
+
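
(Editor's illustration: the two diagrams above amount to straightforward
bit packing.  This sketch is written from the diagrams, not taken from
the patch; the authoritative definitions are in ovn/controller/physical.h.
Note also how the tunnel_key ranges in ovn-sb.ovsschema, 1-32767 for
Port_Binding and 32768-65535 for Multicast_Group, partition the 16-bit
egress port space:)

    /* Geneve option value: 1 reserved bit, 15-bit ingress port, 16-bit
     * egress port, MSB to LSB. */
    static uint32_t
    geneve_option_value(uint16_t ingress, uint16_t egress)
    {
        return ((uint32_t) (ingress & 0x7fff) << 16) | egress;
    }

    /* STT tunnel ID: 9 reserved bits, 15-bit ingress port, 16-bit egress
     * port, 24-bit datapath, MSB to LSB. */
    static uint64_t
    stt_tunnel_id(uint32_t datapath, uint16_t ingress, uint16_t egress)
    {
        return (((uint64_t) (ingress & 0x7fff) << 40)
                | ((uint64_t) egress << 24)
                | (datapath & 0xffffff));
    }

    /* Egress port keys 1-32767 name logical ports; 32768-65535 name
     * logical multicast groups. */
    static bool
    egress_key_is_multicast(uint16_t egress_key)
    {
        return egress_key >= 32768;
    }
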
   <p>
-    For connecting to gateways, the only supported tunnel encapsulations
-    are VXLAN, Geneve, and STT.  While support for Geneve is becoming
-    available for TOR (top-of-rack) switches, VXLAN is far more common.
-    Currently, gateways have a feature set that matches the capabilities
-    as defined by the VTEP schema, so fewer bits of metadata are
-    necessary.  In the future, gateways that do not support
-    encapsulations with large amounts of metadata may continue to have a
-    reduced feature set.
+    For connecting to gateways, in addition to Geneve and STT, OVN supports
+    VXLAN, because only VXLAN support is common on top-of-rack (ToR) switches.
+    Currently, gateways have a feature set that matches the capabilities as
+    defined by the VTEP schema, so fewer bits of metadata are necessary.  In
+    the future, gateways that do not support encapsulations with large amounts
+    of metadata may continue to have a reduced feature set.
   </p>
 </manpage>
diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
index d953fa5..5b368a4 100644
--- a/ovn/ovn-nb.xml
+++ b/ovn/ovn-nb.xml
@@ -201,15 +201,14 @@
     </column>
 
     <column name="match">
-      The packets that the ACL should match, in the same expression
-      language used for the <ref column="match" table="Rule"
-      db="OVN_Southbound"/> column in the OVN Southbound database's <ref
-      table="Rule" db="OVN_Southbound"/> table.  Match
-      <code>inport</code> and <code>outport</code> against names of
-      logical ports within <ref column="lswitch"/> to implement ingress
-      and egress ACLs, respectively.  In logical switches connected to
-      logical routers, the special port name <code>ROUTER</code> refers
-      to the logical router port.
+      The packets that the ACL should match, in the same expression language
+      used for the <ref column="match" table="Rule" db="OVN_Southbound"/>
+      column in the OVN Southbound database's <ref table="Rule"
+      db="OVN_Southbound"/> table.  Match <code>inport</code> and
+      <code>outport</code> against names of logical ports within <ref
+      column="lswitch"/> to implement ingress and egress ACLs, respectively.
+      In logical switches connected to logical routers, the special port name
+      <code>ROUTER</code> refers to the logical router port.
     </column>
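
(Editor's illustration, assuming the match-language fields documented in
ovn-sb(5): an egress ACL that blocks Telnet traffic to a logical port
named "vm1" might use a match such as

    outport == "vm1" && tcp.dst == 23

with an action of "drop".)
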
 
     <column name="action">
diff --git a/ovn/ovn-sb.ovsschema b/ovn/ovn-sb.ovsschema
index add908b..9c2e553 100644
--- a/ovn/ovn-sb.ovsschema
+++ b/ovn/ovn-sb.ovsschema
@@ -34,24 +34,56 @@
                                               "max": "unlimited"}}}},
         "Rule": {
             "columns": {
-                "logical_datapath": {"type": "uuid"},
+                "logical_datapath": {"type": {"key": {"type": "uuid",
+                                                      "refTable": "Datapath_Binding"}}},
+                "pipeline": {"type": {"key": {"type": "string",
+                                      "enum": ["set", ["ingress",
+                                                       "egress"]]}}},
                 "table_id": {"type": {"key": {"type": "integer",
                                               "minInteger": 0,
-                                              "maxInteger": 31}}},
+                                              "maxInteger": 15}}},
                 "priority": {"type": {"key": {"type": "integer",
                                               "minInteger": 0,
                                               "maxInteger": 65535}}},
                 "match": {"type": "string"},
                 "actions": {"type": "string"}},
             "isRoot": true},
+        "Multicast_Group": {
+            "columns": {
+                "datapath": {"type": {"key": {"type": "uuid",
+                                              "refTable": "Datapath_Binding"}}},
+                "name": {"type": "string"},
+                "tunnel_key": {
+                    "type": {"key": {"type": "integer",
+                                     "minInteger": 32768,
+                                     "maxInteger": 65535}}},
+                "ports": {"type": {"key": {"type": "uuid",
+                                           "refTable": "Port_Binding",
+                                           "refType": "weak"},
+                                   "min": 1, "max": "unlimited"}}},
+            "indexes": [["datapath", "tunnel_key"],
+                        ["datapath", "name"]],
+            "isRoot": true},
+        "Datapath_Binding": {
+            "columns": {
+                "tunnel_key": {
+                     "type": {"key": {"type": "integer",
+                                      "minInteger": 1,
+                                      "maxInteger": 16777215}}},
+                "external_ids": {
+                    "type": {"key": "string", "value": "string",
+                             "min": 0, "max": "unlimited"}}},
+            "indexes": [["tunnel_key"]],
+            "isRoot": true},
         "Port_Binding": {
             "columns": {
-                "logical_datapath": {"type": "uuid"},
                 "logical_port": {"type": "string"},
+                "datapath": {"type": {"key": {"type": "uuid",
+                                              "refTable": "Datapath_Binding"}}},
                 "tunnel_key": {
                      "type": {"key": {"type": "integer",
                                       "minInteger": 1,
-                                      "maxInteger": 65535}}},
+                                      "maxInteger": 32767}}},
                 "parent_port": {"type": {"key": "string", "min": 0, "max": 1}},
                 "tag": {
                      "type": {"key": {"type": "integer",
@@ -65,6 +97,6 @@
                 "mac": {"type": {"key": "string",
                                  "min": 0,
                                  "max": "unlimited"}}},
-            "indexes": [["logical_port"], ["tunnel_key"]],
+            "indexes": [["datapath", "tunnel_key"], ["logical_port"]],
             "isRoot": true}},
     "version": "1.0.0"}
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index 2f2a55e..982eba7 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -74,15 +74,16 @@
   </p>
 
   <p>
-    The <ref table="Rule"/> table is currently the only LN table.
+    <ref table="Rule"/> and <ref table="Multicast_Group"/> contain LN data.
   </p>
 
   <h3>Bindings data</h3>
 
   <p>
-    The Binding tables contain the current placement of logical components
-    (such as VMs and VIFs) onto chassis and the bindings between logical ports
-    and MACs.
+    Bindings data link logical and physical components.  They show the current
+    placement of logical components (such as VMs and VIFs) onto chassis, and
+    map logical entities to the values that represent them in tunnel
+    encapsulations.
   </p>
 
   <p>
@@ -98,9 +99,32 @@
   </p>
 
   <p>
-    The <ref table="Port_Binding"/> table is currently the only binding data.
+    The <ref table="Port_Binding"/> and <ref table="Datapath_Binding"/> tables
+    contain binding data.
   </p>
 
+  <h2>Common Columns</h2>
+
+  <p>
+    Some tables contain a special column named <code>external_ids</code>.  This
+    column has the same form and purpose each place that it appears, so we
+    describe it here to save space later.
+  </p>
+
+  <dl>
+    <dt><code>external_ids</code>: map of string-string pairs</dt>
+    <dd>
+      Key-value pairs for use by the software that manages the OVN Southbound
+      database rather than by <code>ovn-controller</code>.  In particular,
+      <code>ovn-northd</code> can use key-value pairs in this column to relate
+      entities in the southbound database to higher-level entities (such as
+      entities in the OVN Northbound database).  Individual key-value pairs in
+      this column may be documented in some cases to aid in understanding and
+      troubleshooting, but the reader should not mistake such documentation as
+      comprehensive.
+    </dd>
+  </dl>
+
   <table name="Chassis" title="Physical Network Hypervisor and Gateway Information">
     <p>
       Each row in this table represents a hypervisor or gateway (a chassis) in
@@ -198,7 +222,7 @@
     </column>
   </table>
 
-  <table name="Rule" title="Logical Network Rule">
+  <table name="Rule" title="Logical Network Flows">
     <p>
       Each row in this table represents one logical flow.  The cloud management
       system, via its OVN integration, populates this table with logical flows
@@ -223,14 +247,111 @@
       The default action when no flow matches is to drop packets.
     </p>
 
+    <p><em>Logical Life Cycle of a Packet</em></p>
+
+    <p>
+      The following description focuses on the life cycle of a packet through
+      a logical datapath, ignoring physical details of the implementation.
+      Please refer to <em>Life Cycle of a Packet</em> in
+      <code>ovn-architecture</code>(7) for the physical information.
+    </p>
+
+    <p>
+      The description here is written as if OVN itself executes these steps,
+      but in fact OVN (that is, <code>ovn-controller</code>) programs Open
+      vSwitch, via OpenFlow and OVSDB, to execute them on its behalf.
+    </p>
+
+    <p>
+      At a high level, OVN passes each packet through the logical datapath's
+      logical ingress pipeline, which may output the packet to one or more
+      logical port or logical multicast groups.  For each such logical output
+      port, OVN passes the packet through the datapath's logical egress
+      pipeline, which may either drop the packet or deliver it to the
+      destination.  Between the two pipelines, outputs to logical multicast
+      groups are expanded into logical ports, so that the egress pipeline only
+      processes a single logical output port at a time.  Between the two
+      pipelines is also where, when necessary, OVN encapsulates a packet in a
+      tunnel (or tunnels) to transmit to remote hypervisors.
+    </p>
+
+    <p>
+      In more detail, to start, OVN searches the <ref table="Rule"/> table for
+      a row with correct <ref column="logical_datapath"/>, a <ref
+      column="pipeline"/> of <code>ingress</code>, a <ref column="table_id"/>
+      of 0, and a <ref column="match"/> that is true for the packet.  If none
+      is found, OVN drops the packet.  If OVN finds more than one, it chooses
+      the match with the highest <ref column="priority"/>.  Then OVN executes
+      each of the actions specified in the row's <ref column="actions"/>
+      column, in the order specified.  Some actions, such as those to modify
+      in the order specified.  Some actions, such as those to modify packet
+      headers, require no further details.  The <code>next</code> and
+      <code>output</code> actions are special.
+    </p>
+
+    <p>
+      The <code>next</code> action causes the above process to be repeated
+      recursively, except that OVN searches for <ref column="table_id"/> of 1
+      instead of 0.  Similarly, any <code>next</code> action in a row found in
+      that table would cause a further search for a <ref column="table_id"/> of
+      2, and so on.  When recursive processing completes, flow control returns
+      to the action following <code>next</code>.
+    </p>
+
+    <p>
+      The <code>output</code> action also introduces recursion.  Its effect
+      depends on the current value of the <code>outport</code> field.  Suppose
+      <code>outport</code> designates a logical port.  First, OVN compares
+      <code>inport</code> to <code>outport</code>; if they are equal, it treats
+      the <code>output</code> as a no-op.  In the common case, where they are
+      different, the packet enters the egress pipeline.  This transition to the
+      egress pipeline discards register data (<code>reg0</code>
+      ... <code>reg5</code>).
+    </p>
+
+    <p>
+      To execute the egress pipeline, OVN again searches the <ref
+      table="Rule"/> table for a row with correct <ref
+      column="logical_datapath"/>, a <ref column="table_id"/> of 0, a <ref
+      column="match"/> that is true for the packet, but now looking for a <ref
+      column="pipeline"/> of <code>egress</code>.  If no matching row is found,
+      the output becomes a no-op.  Otherwise, OVN executes the actions for the
+      matching flow (which is chosen from multiple, if necessary, as already
+      described).
+    </p>
+
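+    <p>
+      As a minimal sketch, an egress pipeline might consist of a single
+      table 0 flow with <ref column="priority"/> 0, a <ref column="match"/> of
+      <code>1</code> (which matches every packet), and <ref column="actions"/>
+      of <code>output;</code>, delivering each packet to its logical output
+      port.  A higher-priority egress flow whose actions complete without
+      reaching <code>output</code> drops the packet instead, which is one way
+      to implement ACLs.
+    </p>
+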
+    <p>
+      In the <code>egress</code> pipeline, the <code>next</code> action acts as
+      already described, except that it searches for <code>egress</code>
+      flows.  The <code>output</code> action, however, now
+      directly outputs the packet to the output port (which is now fixed,
+      because <code>outport</code> is read-only within the egress pipeline).
+    </p>
+
+    <p>
+      The description earlier assumed that <code>outport</code> referred to a
+      logical port.  If it instead designates a logical multicast group, then
+      the description above still applies, with the addition of fan-out from
+      the logical multicast group to each logical port in the group.  For each
+      member of the group, OVN executes the egress pipeline as described, with
+      the logical output port replaced by the group member.
+    </p>
+
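+    <p>
+      As a sketch (the group name <code>mcast1</code> is hypothetical), an
+      ingress flow with a <ref column="match"/> of <code>eth.dst ==
+      ff:ff:ff:ff:ff:ff</code> and <ref column="actions"/> of <code>outport =
+      "mcast1"; output;</code> would run the egress pipeline once for each
+      logical port in <code>mcast1</code>, with <code>outport</code> set to
+      that member each time.
+    </p>
+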
     <column name="logical_datapath">
       The logical datapath to which the logical flow belongs.  A logical
       datapath implements a logical pipeline among the ports in the <ref
-      table="Port_Binding"/> table associated with it.  (No table represents a
-      logical datapath.)  In practice, the pipeline in a given logical datapath
-      implements either a logical switch or a logical router, and
-      <code>ovn-northd</code> reuses the UUIDs for those logical entities from
-      the <code>OVN_Northbound</code> for logical datapaths.
+      table="Port_Binding"/> table associated with it.  In practice, the
+      pipeline in a given logical datapath implements either a logical switch
+      or a logical router, and <code>ovn-northd</code> reuses the UUIDs for
+      those logical entities from the <code>OVN_Northbound</code> for logical
+      datapaths.
+    </column>
+
+    <column name="pipeline">
+      <p>
+        The primary flows used for deciding on a packet's destination are the
+        <code>ingress</code> flows.  The <code>egress</code> flows implement
+        ACLs.  See <em>Logical Life Cycle of a Packet</em>, above, for details.
+      </p>
     </column>
 
     <column name="table_id">
@@ -449,11 +570,7 @@
 
       <p>
         String constants have the same syntax as quoted strings in JSON (thus,
-        they are Unicode strings).  String constants are used for naming
-        logical ports.  Thus, the useful values are <ref
-        column="logical_port"/> names from the <ref column="Port_Binding"/> and
-        <ref column="Gateway"/> tables in a logical flow's <ref
-        column="logical_datapath"/>.
+        they are Unicode strings).
       </p>
 
       <p>
@@ -524,10 +641,21 @@
 
       <p><em>Symbols</em></p>
 
+      <p>
+        Most of the symbols below have integer type.  Only <code>inport</code>
+        and <code>outport</code> have string type.  <code>inport</code> names a
+        logical port.  Thus, its value is a <ref column="logical_port"/> name
+        from the <ref table="Port_Binding"/> and <ref table="Gateway"/> tables
+        in a logical flow's <ref column="logical_datapath"/>.
+        <code>outport</code> may name a logical port, as <code>inport</code>
+        does, or it may name a logical multicast group defined in the <ref
+        table="Multicast_Group"/> table.
+      </p>
+
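+      <p>
+        For example, <code>inport == "vif1"</code> matches packets that
+        entered the logical datapath through a hypothetical logical port named
+        <code>vif1</code>.
+      </p>
+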
       <ul>
         <li>
-          <code>metadata</code> <code>reg0</code> ... <code>reg7</code>
-          <code>xreg0</code> ... <code>xreg3</code>
+          <code>reg0</code>...<code>reg5</code>
+          <code>xreg0</code>...<code>xreg2</code>
         </li>
         <li><code>inport</code> <code>outport</code> <code>queue</code></li>
         <li><code>eth.src</code> <code>eth.dst</code> <code>eth.type</code></li>
@@ -562,17 +690,32 @@
       </p>
 
       <p>
-        The following actions will be initially supported:
+        The following actions are defined:
       </p>
 
       <dl>
         <dt><code>output;</code></dt>
         <dd>
-          Outputs the packet to the logical port current designated by
-          <code>outport</code>.  Output to the ingress port is implicitly
-          dropped, that is, <code>output</code> becomes a no-op if
-          <code>outport</code> == <code>inport</code>.
-        </dd>
+          <p>
+            In an <code>ingress</code> flow, this action executes the
+            <code>egress</code> pipeline as a subroutine.  If
+            <code>outport</code> names a logical port, the egress pipeline
+            executes once; if it is a multicast group, the egress pipeline
+            runs once for each logical port in the group.
+          </p>
+
+          <p>
+            In an <code>egress</code> flow, this action performs the actual
+            output to the <code>outport</code> logical port.  (In the egress
+            pipeline, <code>outport</code> never names a multicast group.)
+          </p>
+
+          <p>
+            Output to the input port is implicitly dropped, that is,
+            <code>output</code> becomes a no-op if <code>outport</code> ==
+            <code>inport</code>.
+          </p>
+        </dd>
 
         <dt><code>next;</code></dt>
         <dd>
@@ -581,21 +724,32 @@
 
         <dt><code><var>field</var> = <var>constant</var>;</code></dt>
         <dd>
-          Sets data or metadata field <var>field</var> to constant value
-          <var>constant</var>, e.g. <code>outport = "vif0";</code> to set the
-          logical output port.  Assigning to a field with prerequisites
-          implicitly adds those prerequisites to <ref column="match"/>; thus,
-          for example, a flow that sets <code>tcp.dst</code> applies only to
-          TCP flows, regardless of whether its <ref column="match"/> mentions
-          any TCP field.  To set only a subset of bits in a field,
-          <var>field</var> may be a subfield or <var>constant</var> may be
-          masked, e.g. <code>vlan.pcp[2] = 1;</code> and <code>vlan.pcp =
-          4/4;</code> both set the most sigificant bit of the VLAN PCP.  Not
-          all fields are modifiable (e.g. <code>eth.type</code> and
-          <code>ip.proto</code> are read-only), and not all modifiable fields
-          may be partially modified (e.g. <code>ip.ttl</code> must assigned as
-          a whole).
-        </dd>
+          <p>
+            Sets data or metadata field <var>field</var> to constant value
+            <var>constant</var>, e.g. <code>outport = "vif0";</code> to set the
+            logical output port.  To set only a subset of bits in a field,
+            specify a subfield for <var>field</var> or a masked
+            <var>constant</var>, e.g. one may use <code>vlan.pcp[2] = 1;</code>
+            or <code>vlan.pcp = 4/4;</code> to set the most significant bit of
+            the VLAN PCP.
+          </p>
+
+          <p>
+            Assigning to a field with prerequisites implicitly adds those
+            prerequisites to <ref column="match"/>; thus, for example, a flow
+            that sets <code>tcp.dst</code> applies only to TCP flows,
+            regardless of whether its <ref column="match"/> mentions any TCP
+            field.
+          </p>
+
+          <p>
+            Not all fields are modifiable (e.g. <code>eth.type</code> and
+            <code>ip.proto</code> are read-only), and not all modifiable fields
+            may be partially modified (e.g. <code>ip.ttl</code> must be
+            assigned as a whole).  The <code>outport</code> field is modifiable
+            for an <code>ingress</code> flow but not an <code>egress</code>
+            flow.
+          </p>
+        </dd>
       </dl>
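+
+      <p>
+        For example (a sketch, not generated flows), a logical flow whose <ref
+        column="match"/> is <code>ip4.dst == 10.0.0.1</code> and whose <ref
+        column="actions"/> are <code>tcp.dst = 80; next;</code> applies only to
+        TCP packets, because the assignment to <code>tcp.dst</code> implicitly
+        adds the TCP prerequisite to the match.
+      </p>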
 
       <p>
@@ -628,6 +782,77 @@
     </column>
   </table>
 
+  <table name="Multicast_Group" title="Logical Port Multicast Groups">
+    <p>
+      The rows in this table define multicast groups of logical ports.
+      Multicast groups allow a single packet transmitted over a tunnel to a
+      hypervisor to be delivered to multiple VMs on that hypervisor, which
+      uses tunnel bandwidth more efficiently than sending a separate copy of
+      the packet to each VM.
+    </p>
+
+    <p>
+      Each row in this table defines a logical multicast group numbered <ref
+      column="tunnel_key"/> within <ref column="datapath"/>, whose logical
+      ports are listed in the <ref column="ports"/> column.  All of the ports
+      must be in the <ref column="datapath"/> logical datapath (but the
+      database schema cannot enforce this).
+    </p>
+
+    <p>
+      Multicast group numbers and names are scoped within a logical datapath.
+    </p>
+
+    <p>
+      In the <ref table="Rule"/> table, multicast groups may be used for output
+      just as for individual logical ports, by assigning the group's name to
+      <code>outport</code>,
+    </p>
+
+    <p>
+      Multicast group names and logical port names share a single namespace and
+      thus should not overlap (but the database schema cannot enforce this).
+    </p>
+
+    <p>
+      An index prevents this table from containing any two rows with the same
+      <ref column="datapath"/> and <ref column="tunnel_key"/> values or the
+      same <ref column="datapath"/> and <ref column="name"/> values.
+    </p>
+
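+    <p>
+      For example (hypothetical values), a row with a <ref
+      column="tunnel_key"/> of 32768, a <ref column="name"/> of
+      <code>mcast1</code>, and <ref column="ports"/> listing the <ref
+      table="Port_Binding"/> rows for <code>vif1</code> and <code>vif2</code>
+      would make <code>outport = "mcast1"; output;</code> in an ingress flow
+      deliver a packet to both of those ports.
+    </p>
+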
+    <column name="datapath"/>
+    <column name="tunnel_key"/>
+    <column name="name"/>
+    <column name="ports"/>
+  </table>
+
+  <table name="Datapath_Binding" title="Physical-Logical Datapath Bindings">
+    <p>
+      Each row in this table identifies physical bindings of a logical
+      datapath.
+    </p>
+
+    <column name="tunnel_key">
+      The tunnel key value to which the logical datapath is bound.
+      The <code>Tunnel Encapsulation</code> section in
+      <code>ovn-architecture</code>(7) describes how tunnel keys are
+      constructed for each supported encapsulation.
+    </column>
+
+    <column name="external_ids" key="logical-switch" type='{"type": "uuid"}'>
+      Each row in <ref table="Datapath_Binding"/> is associated with some
+      logical datapath.  <code>ovn-northd</code> uses this key to store the
+      UUID of the logical datapath <ref table="Logical_Switch"
+      db="OVN_Northbound"/> row in the <ref db="OVN_Northbound"/> database.
+    </column>
+
+    <group title="Common Columns">
+      The overall purpose of these columns is described under <code>Common
+      Columns</code> at the beginning of this document.
+
+      <column name="external_ids"/>
+    </group>
+  </table>
+
   <table name="Port_Binding" title="Physical-Logical Port Bindings">
     <p>
       Each row in this table identifies the physical location of a logical
@@ -651,7 +876,7 @@
     </p>
 
     <p>
-      When a chassis shuts down gracefully, it should cleanup the
+      When a chassis shuts down gracefully, it should clean up the
       <code>chassis</code> column that it previously had populated.
       (This is not critical because resources hosted on the chassis are equally
       unreachable regardless of whether their rows are present.)  To handle the
@@ -660,10 +885,8 @@
       <code>chassis</code> column with new information.
     </p>
 
-    <column name="logical_datapath">
-      The logical datapath to which the logical port belongs.  A logical
-      datapath implements a logical pipeline via logical flows in the <ref
-      table="Rule"/> table.  (No table represents a logical datapath.)
+    <column name="datapath">
+      The logical datapath to which the logical port belongs.
     </column>
 
     <column name="logical_port">
@@ -675,16 +898,29 @@
 
     <column name="tunnel_key">
       <p>
-        A number that represents the logical port in the key (e.g. VXLAN VNI or
-        STT key) field carried within tunnel protocol packets.  (This avoids
+        A number that represents the logical port in the key (e.g. STT key or
+        Geneve TLV) field carried within tunnel protocol packets.  This avoids
         wasting space for a whole UUID in tunneled packets.  It also allows OVN
         to support encapsulations that cannot fit an entire UUID in their
-        tunnel keys.)
+        tunnel keys (i.e. every encapsulation other than Geneve).
       </p>
 
       <p>
-        Tunnel ID 0 is reserved for internal use within OVN.
+        The tunnel ID must be unique within the scope of a logical datapath.
       </p>
+
+      <p>
+        Logical port tunnel IDs form a 16-bit space:
+      </p>
+
+      <ul>
+        <li>Tunnel ID 0 is reserved for internal use within OVN.</li>
+        <li>Tunnel IDs 1 through 32767, inclusive, may be assigned to logical
+        ports.</li>
+        <li>Tunnel IDs 32768 through 65535, inclusive, may be assigned to
+        logical multicast groups (see the <ref table="Multicast_Group"/>
+        table).</li>
+      </ul>
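+
+      <p>
+        For example (hypothetical values), two logical ports in a datapath
+        might be assigned tunnel IDs 5 and 6, while a multicast group
+        containing both of them might be assigned tunnel ID 32768.
+      </p>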
     </column>
 
     <column name="parent_port">
-- 
2.1.3