[ovs-dev] [PATCH 22/22] ovn: Change strategy for tunnel keys.

Alex Wang alexw at nicira.com
Mon Aug 3 05:03:01 UTC 2015


Wow, this is really a big patch~  deserves multiple reviews~  I think I'll
re-
review it later~

Two comments inline:

Thanks,
Alex Wang,

On Sun, Jul 19, 2015 at 3:45 PM, Ben Pfaff <blp at nicira.com> wrote:

> Until now, OVN has used "flat" tunnel keys, in which the STT tunnel key or
> Geneve VNI contains a logical port number.  Logical port numbers are unique
> within an OVN deployment.
>
> Flat tunnel keys have the advantage of simplicity.  However, for packets
> that are destined to logical ports on multiple hypervisors, they require
> sending one packet per destination logical port rather than one packet per
> hypervisor.  They also make it hard to integrate with VXLAN-based hardware
> switches, which use VNIs to designate logical networks instead of logical
> ports.
>
> This commit switches OVN to a different scheme.  In this scheme, in Geneve
> the VNI designates a logical network and a Geneve option specifies the
> logical input and output ports, which are now scoped within the logical
> network rather than globally unique.  In STT, all three identifiers are
> encoded in the tunnel key.
>
> To allow for the reduced amount of traffic for packets destined to logical
> ports on multiple hypervisors, this commit also introduces the concept
> of a logical multicast group.  The membership of these groups can be set
> using a new Multicast_Group table in the southbound database (and
> ovn-northd does use it starting in this commit).
>
> With multicast groups alone, it would be difficult to implement ACLs,
> because an ACL might disallow only some of the packets being sent to
> a remote hypervisor.  Thus, this commit also splits the OVN logical
> pipeline into two pipelines: the "ingress" pipeline, which makes the
> decision about the logical destination of a packet as a set of logical
> ports or multicast groups, and the "egress" pipeline, which runs on the
> destination hypervisor with the multicast group destination exploded into
> individual ports and makes a final decision on whether to deliver the
> packet.  The "egress" pipeline can efficiently apply ACLs.
>
> Until now, the OVN logical and physical pipeline implementation was not
> adequately documented.  This commit adds extensive documentation to
> the OVN manpages to cover these issues.
>
> Signed-off-by: Ben Pfaff <blp at nicira.com>
> ---
>  ovn/TODO                        |   19 -
>  ovn/controller/ovn-controller.c |    3 +-
>  ovn/controller/physical.c       |  355 +++++++++---
>  ovn/controller/physical.h       |   11 +-
>  ovn/controller/rule.c           |   95 ++--
>  ovn/controller/rule.h           |   10 +-
>  ovn/northd/ovn-northd.c         | 1177
> ++++++++++++++++++++++++---------------
>  ovn/ovn-architecture.7.xml      |  363 ++++++++++--
>  ovn/ovn-nb.xml                  |   17 +-
>  ovn/ovn-sb.ovsschema            |   42 +-
>  ovn/ovn-sb.xml                  |  332 +++++++++--
>  11 files changed, 1696 insertions(+), 728 deletions(-)
>
> diff --git a/ovn/TODO b/ovn/TODO
> index 19c95ca..f0ff586 100644
> --- a/ovn/TODO
> +++ b/ovn/TODO
> @@ -1,24 +1,5 @@
>  * ovn-controller
>
> -*** Determine how to split logical pipeline across physical nodes.
> -
> -    From the original OVN architecture document:
> -
> -    The pipeline processing is split between the ingress and egress
> -    transport nodes.  In particular, the logical egress processing may
> -    occur at either hypervisor.  Processing the logical egress on the
> -    ingress hypervisor requires more state about the egress vif's
> -    policies, but reduces traffic on the wire that would eventually be
> -    dropped.  Whereas, processing on the egress hypervisor can reduce
> -    broadcast traffic on the wire by doing local replication.  We
> -    initially plan to process logical egress on the egress hypervisor
> -    so that less state needs to be replicated.  However, we may change
> -    this behavior once we gain some experience writing the logical
> -    flows.
> -
> -    The split pipeline processing split will influence how tunnel keys
> -    are encoded.
> -
>  ** ovn-controller parameters and configuration.
>
>  *** SSL configuration.
> diff --git a/ovn/controller/ovn-controller.c
> b/ovn/controller/ovn-controller.c
> index 488dce7..5ee6847 100644
> --- a/ovn/controller/ovn-controller.c
> +++ b/ovn/controller/ovn-controller.c
> @@ -269,8 +269,9 @@ main(int argc, char *argv[])
>
>              struct hmap flow_table = HMAP_INITIALIZER(&flow_table);
>              rule_run(&ctx, &flow_table);
> -                physical_run(&ctx, br_int, chassis_id, &flow_table);
>              if (chassis_id && mff_ovn_geneve) {
> +                physical_run(&ctx, mff_ovn_geneve,
> +                             br_int, chassis_id, &flow_table);
>              }
>              ofctrl_put(&flow_table);
>              hmap_destroy(&flow_table);
> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
> index e284a6a..09b7a99 100644
> --- a/ovn/controller/physical.c
> +++ b/ovn/controller/physical.c
> @@ -21,10 +21,14 @@
>  #include "ofpbuf.h"
>  #include "ovn-controller.h"
>  #include "ovn/lib/ovn-sb-idl.h"
> +#include "openvswitch/vlog.h"
>  #include "rule.h"
>  #include "simap.h"
> +#include "sset.h"
>  #include "vswitch-idl.h"
>
> +VLOG_DEFINE_THIS_MODULE(physical);
> +
>  void
>  physical_register_ovs_idl(struct ovsdb_idl *ovs_idl)
>  {
> @@ -42,12 +46,90 @@ physical_register_ovs_idl(struct ovsdb_idl *ovs_idl)
>      ovsdb_idl_add_column(ovs_idl, &ovsrec_interface_col_external_ids);
>  }
>
> +/* Maps from a chassis to the OpenFlow port number of the tunnel that can
> be
> + * used to reach that chassis. */
> +struct chassis_tunnel {
> +    struct hmap_node hmap_node;
> +    const char *chassis_id;
> +    ofp_port_t ofport;
> +    enum chassis_tunnel_type { GENEVE, STT } type;
> +};
> +
> +static struct chassis_tunnel *
> +chassis_tunnel_find(struct hmap *tunnels, const char *chassis_id)
> +{
> +    struct chassis_tunnel *tun;
> +    HMAP_FOR_EACH_WITH_HASH (tun, hmap_node, hash_string(chassis_id, 0),
> +                             tunnels) {
> +        if (!strcmp(tun->chassis_id, chassis_id)) {
> +            return tun;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +static void
> +put_load(uint64_t value, enum mf_field_id dst, int ofs, int n_bits,
> +         struct ofpbuf *ofpacts)
> +{
> +    struct ofpact_set_field *sf = ofpact_put_SET_FIELD(ofpacts);
> +    sf->field = mf_from_id(dst);
> +    sf->flow_has_vlan = false;
> +
> +    ovs_be64 n_value = htonll(value);
> +    bitwise_copy(&n_value, 8, 0, &sf->value, sf->field->n_bytes, ofs,
> n_bits);
> +    bitwise_one(&sf->mask, sf->field->n_bytes, ofs, n_bits);
> +}
> +
> +static void
> +put_move(enum mf_field_id src, int src_ofs,
> +         enum mf_field_id dst, int dst_ofs,
> +         int n_bits,
> +         struct ofpbuf *ofpacts)
> +{
> +    struct ofpact_reg_move *move = ofpact_put_REG_MOVE(ofpacts);
> +    move->src.field = mf_from_id(src);
> +    move->src.ofs = src_ofs;
> +    move->src.n_bits = n_bits;
> +    move->dst.field = mf_from_id(dst);
> +    move->dst.ofs = dst_ofs;
> +    move->dst.n_bits = n_bits;
> +}
> +
> +static void
> +put_resubmit(uint8_t table_id, struct ofpbuf *ofpacts)
> +{
> +    struct ofpact_resubmit *resubmit = ofpact_put_RESUBMIT(ofpacts);
> +    resubmit->in_port = OFPP_IN_PORT;
> +    resubmit->table_id = table_id;
> +}
> +
> +static void
> +put_encapsulation(enum mf_field_id mff_ovn_geneve,
> +                  const struct chassis_tunnel *tun,
> +                  const struct sbrec_datapath_binding *datapath,
> +                  uint16_t outport, struct ofpbuf *ofpacts)
> +{
> +    if (tun->type == GENEVE) {
> +        put_load(datapath->tunnel_key, MFF_TUN_ID, 0, 24, ofpacts);
> +        put_load(outport, mff_ovn_geneve, 0, 32, ofpacts);
> +        put_move(MFF_LOG_INPORT, 0, mff_ovn_geneve, 16, 15, ofpacts);
> +    } else if (tun->type == STT) {
> +        put_load(datapath->tunnel_key | (outport << 24), MFF_TUN_ID, 0,
> 64,
> +                 ofpacts);
> +        put_move(MFF_LOG_INPORT, 0, MFF_TUN_ID, 40, 15, ofpacts);
> +    } else {
> +        OVS_NOT_REACHED();
> +    }
> +}
> +
>  void
> -physical_run(struct controller_ctx *ctx, const struct ovsrec_bridge
> *br_int,
> -             const char *this_chassis_id, struct hmap *flow_table)
> +physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve,
> +             const struct ovsrec_bridge *br_int, const char
> *this_chassis_id,
> +             struct hmap *flow_table)
>  {
>      struct simap lport_to_ofport = SIMAP_INITIALIZER(&lport_to_ofport);
> -    struct simap chassis_to_ofport =
> SIMAP_INITIALIZER(&chassis_to_ofport);
> +    struct hmap tunnels = HMAP_INITIALIZER(&tunnels);
>      for (int i = 0; i < br_int->n_ports; i++) {
>          const struct ovsrec_port *port_rec = br_int->ports[i];
>          if (!strcmp(port_rec->name, br_int->name)) {
> @@ -74,7 +156,21 @@ physical_run(struct controller_ctx *ctx, const struct
> ovsrec_bridge *br_int,
>
>              /* Record as chassis or local logical port. */
>              if (chassis_id) {
> -                simap_put(&chassis_to_ofport, chassis_id, ofport);
> +                enum chassis_tunnel_type tunnel_type;
> +                if (!strcmp(iface_rec->type, "geneve")) {
> +                    tunnel_type = GENEVE;
> +                } else if (!strcmp(iface_rec->type, "stt")) {
> +                    tunnel_type = STT;
> +                } else {
> +                    continue;
> +                }
> +
> +                struct chassis_tunnel *tun = xmalloc(sizeof *tun);
> +                hmap_insert(&tunnels, &tun->hmap_node,
> +                            hash_string(chassis_id, 0));
> +                tun->chassis_id = chassis_id;
> +                tun->ofport = u16_to_ofp(ofport);
> +                tun->type = tunnel_type;
>                  break;
>              } else {
>                  const char *iface_id = smap_get(&iface_rec->external_ids,
> @@ -114,27 +210,20 @@ physical_run(struct controller_ctx *ctx, const
> struct ovsrec_bridge *br_int,
>                                            binding->logical_port));
>          }
>
> -        bool local = ofport != 0;
> -        if (!local) {
> +        const struct chassis_tunnel *tun = NULL;
> +        if (!ofport) {
>              if (!binding->chassis) {
>                  continue;
>              }
> -            ofport = u16_to_ofp(simap_get(&chassis_to_ofport,
> -                                          binding->chassis->name));
> -            if (!ofport) {
> +            tun = chassis_tunnel_find(&tunnels, binding->chassis->name);
> +            if (!tun) {
>                  continue;
>              }
> -        }
> -
> -        /* Translate the logical datapath into the form we use in
> -         * MFF_LOG_DATAPATH. */
> -        uint32_t ldp = ldp_to_integer(&binding->logical_datapath);
> -        if (!ldp) {
> -            continue;
> +            ofport = tun->ofport;
>          }
>
>          struct match match;
> -        if (local) {
> +        if (!tun) {
>              /* Packets that arrive from a vif can belong to a VM or
>               * to a container located inside that VM. Packets that
>               * arrive from containers have a tag (vlan) associated with
> them.
> @@ -149,7 +238,8 @@ physical_run(struct controller_ctx *ctx, const struct
> ovsrec_bridge *br_int,
>               *
>               * For both types of traffic: set MFF_LOG_INPORT to the
> logical
>               * input port, MFF_LOG_DATAPATH to the logical datapath, and
> -             * resubmit into the logical pipeline starting at table 16. */
> +             * resubmit into the logical ingress pipeline starting at
> table
> +             * 16. */
>              match_init_catchall(&match);
>              ofpbuf_clear(&ofpacts);
>              match_set_in_port(&match, ofport);
> @@ -157,95 +247,214 @@ physical_run(struct controller_ctx *ctx, const
> struct ovsrec_bridge *br_int,
>                  match_set_dl_vlan(&match, htons(tag));
>              }
>
> -            /* Set MFF_LOG_DATAPATH. */
> -            struct ofpact_set_field *sf = ofpact_put_SET_FIELD(&ofpacts);
> -            sf->field = mf_from_id(MFF_LOG_DATAPATH);
> -            sf->value.be64 = htonll(ldp);
> -            sf->mask.be64 = OVS_BE64_MAX;
> -
> -            /* Set MFF_LOG_INPORT. */
> -            sf = ofpact_put_SET_FIELD(&ofpacts);
> -            sf->field = mf_from_id(MFF_LOG_INPORT);
> -            sf->value.be32 = htonl(binding->tunnel_key);
> -            sf->mask.be32 = OVS_BE32_MAX;
> +            /* Set MFF_LOG_DATAPATH and MFF_LOG_INPORT. */
> +            put_load(binding->datapath->tunnel_key, MFF_LOG_DATAPATH, 0,
> 64,
> +                     &ofpacts);
> +            put_load(binding->tunnel_key, MFF_LOG_INPORT, 0, 32,
> &ofpacts);
>
>              /* Strip vlans. */
>              if (tag) {
>                  ofpact_put_STRIP_VLAN(&ofpacts);
>              }
>
> -            /* Resubmit to first logical pipeline table. */
> -            struct ofpact_resubmit *resubmit =
> ofpact_put_RESUBMIT(&ofpacts);
> -            resubmit->in_port = OFPP_IN_PORT;
> -            resubmit->table_id = 16;
> +            /* Resubmit to first logical ingress pipeline table. */
> +            put_resubmit(16, &ofpacts);
>              ofctrl_add_flow(flow_table, 0, tag ? 150 : 100, &match,
> &ofpacts);
>
> -            /* Table 0, Priority 50.
> -             * =====================
> +            /* Table 33, priority 100.
> +             * =======================
> +             *
> +             * Implements output to local hypervisor.  Each flow matches a
> +             * logical output port on the local hypervisor, and resubmits
> to
> +             * table 34.
> +             */
> +
> +            match_init_catchall(&match);
> +            ofpbuf_clear(&ofpacts);
> +
> +            /* Match MFF_LOG_DATAPATH, MFF_LOG_OUTPORT. */
> +            match_set_metadata(&match,
> htonll(binding->datapath->tunnel_key));
> +            match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0,
> +                          binding->tunnel_key);
> +
>

Want to know why do we need to match the tunnel_key here?

I think this may relate to Sugesh's comment about the "the traffic between
VMs in same node dropped because there is no rule created to process it"



> +            /* Resubmit to table 34. */
> +            put_resubmit(34, &ofpacts);
> +            ofctrl_add_flow(flow_table, 33, 100, &match, &ofpacts);
> +
> +            /* Table 64, Priority 50.
> +             * =======================
>               *
> -             * For packets that arrive from a remote node destined to this
> -             * local vif: deliver directly to the vif. If the destination
> -             * is a container sitting behind a vif, tag the packets. */
> +             * Deliver the packet to the local vif. */
>              match_init_catchall(&match);
>              ofpbuf_clear(&ofpacts);
> -            match_set_tun_id(&match, htonll(binding->tunnel_key));
> +            match_set_metadata(&match,
> htonll(binding->datapath->tunnel_key));
> +            match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0,
> +                          binding->tunnel_key);
>              if (tag) {
> +                /* For containers sitting behind a local vif, tag the
> packets
> +                 * before delivering them. */
>                  struct ofpact_vlan_vid *vlan_vid;
>                  vlan_vid = ofpact_put_SET_VLAN_VID(&ofpacts);
>                  vlan_vid->vlan_vid = tag;
>                  vlan_vid->push_vlan_if_needed = true;
> +
> +                /* A packet might need to hair-pin back into its ingress
> +                 * OpenFlow port (to a different logical port, which we
> already
> +                 * checked back in table 34), so set the in_port to zero.
> */
> +                put_load(0, MFF_IN_PORT, 0, 16, &ofpacts);
>              }
>              ofpact_put_OUTPUT(&ofpacts)->port = ofport;
> -            ofctrl_add_flow(flow_table, 0, 50, &match, &ofpacts);
> +            ofctrl_add_flow(flow_table, 64, 100, &match, &ofpacts);
> +        } else {
> +            /* Table 32, priority 100.
> +             * =======================
> +             *
> +             * Implements output to remote hypervisors.  Each flow
> matches an
> +             * output port that includes a logical port on a remote
> hypervisor,
> +             * and tunnels the packet to that hypervisor.
> +             */
> +
> +            match_init_catchall(&match);
> +            ofpbuf_clear(&ofpacts);
> +
> +            /* Match MFF_LOG_DATAPATH, MFF_LOG_OUTPORT. */
> +            match_set_metadata(&match,
> htonll(binding->datapath->tunnel_key));
> +            match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0,
> +                          binding->tunnel_key);
> +
> +            put_encapsulation(mff_ovn_geneve, tun, binding->datapath,
> +                              binding->tunnel_key, &ofpacts);
> +
> +            /* Output to tunnel. */
> +            ofpact_put_OUTPUT(&ofpacts)->port = ofport;
> +            ofctrl_add_flow(flow_table, 32, 100, &match, &ofpacts);
>          }
>
> -        /* Table 64, Priority 100.
> +        /* Table 34, Priority 100.
>           * =======================
>           *
>           * Drop packets whose logical inport and outport are the same. */
>          match_init_catchall(&match);
>          ofpbuf_clear(&ofpacts);
> +        match_set_metadata(&match, htonll(binding->datapath->tunnel_key));
>          match_set_reg(&match, MFF_LOG_INPORT - MFF_REG0,
> binding->tunnel_key);
>          match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0,
> binding->tunnel_key);
> -        ofctrl_add_flow(flow_table, 64, 100, &match, &ofpacts);
> +        ofctrl_add_flow(flow_table, 34, 100, &match, &ofpacts);
> +    }
> +
> +    const struct sbrec_multicast_group *mc;
> +    SBREC_MULTICAST_GROUP_FOR_EACH (mc, ctx->ovnsb_idl) {
> +        struct sset remote_chassis = SSET_INITIALIZER(&remote_chassis);
> +        struct match match;
>
> -        /* Table 64, Priority 50.
> -         * ======================
> -         *
> -         * For packets to remote machines, send them over a tunnel to the
> -         * remote chassis.
> -         *
> -         * For packets to local vifs, deliver them directly. */
>          match_init_catchall(&match);
> +        match_set_metadata(&match, htonll(mc->datapath->tunnel_key));
> +        match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, mc->tunnel_key);
> +
>          ofpbuf_clear(&ofpacts);
> -        match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0,
> binding->tunnel_key);
> -        if (!local) {
> -            /* Set MFF_TUN_ID. */
> -            struct ofpact_set_field *sf = ofpact_put_SET_FIELD(&ofpacts);
> -            sf->field = mf_from_id(MFF_TUN_ID);
> -            sf->value.be64 = htonll(binding->tunnel_key);
> -            sf->mask.be64 = OVS_BE64_MAX;
> +        for (size_t i = 0; i < mc->n_ports; i++) {
> +            struct sbrec_port_binding *port = mc->ports[i];
> +
> +            if (port->datapath != mc->datapath) {
> +                static struct vlog_rate_limit rl =
> VLOG_RATE_LIMIT_INIT(5, 1);
> +                VLOG_WARN_RL(&rl, UUID_FMT": multicast group contains
> ports "
> +                             "in wrong datapath",
> +                             UUID_ARGS(&mc->header_.uuid));
> +                continue;
> +            }
> +
> +            if (simap_contains(&lport_to_ofport, port->logical_port)) {
> +                put_load(port->tunnel_key, MFF_LOG_OUTPORT, 0, 32,
> &ofpacts);
> +                put_resubmit(34, &ofpacts);
> +            } else if (port->chassis) {
> +                sset_add(&remote_chassis, port->chassis->name);
> +            }
> +        }
> +
> +        bool local_ports = ofpacts.size > 0;
> +        if (local_ports) {
> +            ofctrl_add_flow(flow_table, 33, 100, &match, &ofpacts);
>          }
> -        if (tag) {
> -            /* For containers sitting behind a local vif, tag the packets
> -             * before delivering them. Since there is a possibility of
> -             * packets needing to hair-pin back into the same vif from
> -             * which it came, make the in_port as zero. */
> -            struct ofpact_vlan_vid *vlan_vid;
> -            vlan_vid = ofpact_put_SET_VLAN_VID(&ofpacts);
> -            vlan_vid->vlan_vid = tag;
> -            vlan_vid->push_vlan_if_needed = true;
> -
> -            struct ofpact_set_field *sf = ofpact_put_SET_FIELD(&ofpacts);
> -            sf->field = mf_from_id(MFF_IN_PORT);
> -            sf->value.be16 = 0;
> -            sf->mask.be16 = OVS_BE16_MAX;
> +
> +        if (!sset_is_empty(&remote_chassis)) {
> +            ofpbuf_clear(&ofpacts);
> +
> +            const char *chassis;
> +            const struct chassis_tunnel *prev = NULL;
> +            SSET_FOR_EACH (chassis, &remote_chassis) {
> +                const struct chassis_tunnel *tun
> +                    = chassis_tunnel_find(&tunnels, chassis);
> +                if (!tun) {
> +                    continue;
> +                }
> +
> +                if (!prev || tun->type != prev->type) {
> +                    put_encapsulation(mff_ovn_geneve, tun,
> +                                      mc->datapath, mc->tunnel_key,
> &ofpacts);
> +                    prev = tun;
> +                }
> +                ofpact_put_OUTPUT(&ofpacts)->port = tun->ofport;
> +            }
> +
> +            if (ofpacts.size) {
> +                if (local_ports) {
> +                    put_resubmit(33, &ofpacts);
> +                }
> +                ofctrl_add_flow(flow_table, 32, 100, &match, &ofpacts);
> +            }
>          }
> -        ofpact_put_OUTPUT(&ofpacts)->port = ofport;
> -        ofctrl_add_flow(flow_table, 64, 50, &match, &ofpacts);
> +        sset_destroy(&remote_chassis);
>      }
>
> +    /* Table 0, priority 100.
> +     * ======================
> +     *
> +     * For packets that arrive from a remote hypervisor (by matching a
> tunnel
> +     * in_port), set MFF_LOG_DATAPATH, MFF_LOG_INPORT, and
> MFF_LOG_OUTPORT from
> +     * the tunnel key data, then resubmit to table 33 to handle packets
> to the
> +     * local hypervisor. */
> +
> +    struct chassis_tunnel *tun;
> +    HMAP_FOR_EACH (tun, hmap_node, &tunnels) {
> +        struct match match = MATCH_CATCHALL_INITIALIZER;
> +        match_set_in_port(&match, tun->ofport);
> +
> +        ofpbuf_clear(&ofpacts);
> +        if (tun->type == GENEVE) {
> +            put_move(MFF_TUN_ID, 0,  MFF_LOG_DATAPATH, 0, 24, &ofpacts);
> +            put_move(mff_ovn_geneve, 16, MFF_LOG_INPORT, 0, 15,
> +                     &ofpacts);
> +            put_move(mff_ovn_geneve, 0, MFF_LOG_OUTPORT, 0, 16,
> +                     &ofpacts);
> +        } else if (tun->type == STT) {
> +            put_move(MFF_TUN_ID, 40, MFF_LOG_INPORT,   0, 15, &ofpacts);
> +            put_move(MFF_TUN_ID, 24, MFF_LOG_OUTPORT,  0, 16, &ofpacts);
> +            put_move(MFF_TUN_ID,  0, MFF_LOG_DATAPATH, 0, 24, &ofpacts);
> +        } else {
> +            OVS_NOT_REACHED();
> +        }
> +        put_resubmit(33, &ofpacts);
> +
> +        ofctrl_add_flow(flow_table, 0, 100, &match, &ofpacts);
> +    }
> +
> +    /* Table 34, Priority 0.
> +     * =======================
> +     *
> +     * Resubmit packets that don't output to the ingress port to the
> logical
> +     * egress pipeline. */
> +    struct match match;
> +    match_init_catchall(&match);
> +    ofpbuf_clear(&ofpacts);
> +    put_resubmit(48, &ofpacts);
> +    ofctrl_add_flow(flow_table, 34, 0, &match, &ofpacts);
> +
>      ofpbuf_uninit(&ofpacts);
>      simap_destroy(&lport_to_ofport);
> -    simap_destroy(&chassis_to_ofport);
> +    struct chassis_tunnel *tun_next;
> +    HMAP_FOR_EACH_SAFE (tun, tun_next, hmap_node, &tunnels) {
> +        hmap_remove(&tunnels, &tun->hmap_node);
> +        free(tun);
> +    }
> +    hmap_destroy(&tunnels);
>  }
> diff --git a/ovn/controller/physical.h b/ovn/controller/physical.h
> index 82baa2f..edb644b 100644
> --- a/ovn/controller/physical.h
> +++ b/ovn/controller/physical.h
> @@ -20,10 +20,13 @@
>   * ============================
>   *
>   * This module implements physical-to-logical and logical-to-physical
> - * translation as separate OpenFlow tables that run before and after,
> - * respectively, the logical pipeline OpenFlow tables.
> + * translation as separate OpenFlow tables that run before the ingress
> pipeline
> + * and after the egress pipeline, respectively, as well as to connect the
> + * two pipelines.
>   */
>
> +#include "meta-flow.h"
> +
>  struct controller_ctx;
>  struct hmap;
>  struct ovsdb_idl;
> @@ -37,8 +40,8 @@ struct ovsrec_bridge;
>  #define OVN_GENEVE_LEN 4
>
>  void physical_register_ovs_idl(struct ovsdb_idl *);
> -void physical_run(struct controller_ctx *, const struct ovsrec_bridge
> *br_int,
> -                  const char *chassis_id,
> +void physical_run(struct controller_ctx *, enum mf_field_id
> mff_ovn_geneve,
> +                  const struct ovsrec_bridge *br_int, const char
> *chassis_id,
>                    struct hmap *flow_table);
>
>  #endif /* ovn/physical.h */
> diff --git a/ovn/controller/rule.c b/ovn/controller/rule.c
> index c7281a0..de8b509 100644
> --- a/ovn/controller/rule.c
> +++ b/ovn/controller/rule.c
> @@ -135,19 +135,12 @@ symtab_init(void)
>
>  /* A logical datapath.
>   *
> - * 'uuid' is the UUID that represents the logical datapath in the OVN_SB
> - * database.
> - *
> - * 'integer' represents the logical datapath as an integer value that is
> unique
> - * only within the local hypervisor.  Because of its size, this value is
> more
> - * practical for use in an OpenFlow flow table than a UUID.
> - *
>   * 'ports' maps 'logical_port' names to 'tunnel_key' values in the OVN_SB
>   * Binding table within the logical datapath. */
>  struct logical_datapath {
>      struct hmap_node hmap_node; /* Indexed on 'uuid'. */
> -    struct uuid uuid;           /* The logical_datapath's UUID. */
> -    uint32_t integer;           /* Locally unique among logical
> datapaths. */
> +    struct uuid uuid;           /* UUID from Datapath_Binding row. */
> +    uint32_t tunnel_key;        /* 'tunnel_key' from Datapath_Binding
> row. */
>      struct simap ports;         /* Logical port name to port number. */
>  };
>
> @@ -157,45 +150,40 @@ static struct hmap logical_datapaths =
> HMAP_INITIALIZER(&logical_datapaths);
>  /* Finds and returns the logical_datapath with the given 'uuid', or NULL
> if
>   * no such logical_datapath exists. */
>  static struct logical_datapath *
> -ldp_lookup(const struct uuid *uuid)
> +ldp_lookup(const struct sbrec_datapath_binding *binding)
>  {
>      struct logical_datapath *ldp;
> -    HMAP_FOR_EACH_IN_BUCKET (ldp, hmap_node, uuid_hash(uuid),
> +    HMAP_FOR_EACH_IN_BUCKET (ldp, hmap_node,
> uuid_hash(&binding->header_.uuid),
>                               &logical_datapaths) {
> -        if (uuid_equals(&ldp->uuid, uuid)) {
> +        if (uuid_equals(&ldp->uuid, &binding->header_.uuid)) {
>              return ldp;
>          }
>      }
>      return NULL;
>  }
>
> -/* Finds and returns the integer value corresponding to the given 'uuid',
> or 0
> - * if no such logical datapath exists. */
> -uint32_t
> -ldp_to_integer(const struct uuid *logical_datapath)
> -{
> -    const struct logical_datapath *ldp = ldp_lookup(logical_datapath);
> -    return ldp ? ldp->integer : 0;
> -}
> -
>  /* Creates a new logical_datapath with the given 'uuid'. */
>  static struct logical_datapath *
> -ldp_create(const struct uuid *uuid)
> +ldp_create(const struct sbrec_datapath_binding *binding)
>  {
> -    static uint32_t next_integer = 1;
>      struct logical_datapath *ldp;
>
> -    /* We don't handle the case where the logical datapaths wrap around.
> */
> -    ovs_assert(next_integer);
> -
>      ldp = xmalloc(sizeof *ldp);
> -    hmap_insert(&logical_datapaths, &ldp->hmap_node, uuid_hash(uuid));
> -    ldp->uuid = *uuid;
> -    ldp->integer = next_integer++;
> +    hmap_insert(&logical_datapaths, &ldp->hmap_node,
> +                uuid_hash(&binding->header_.uuid));
> +    ldp->uuid = binding->header_.uuid;
> +    ldp->tunnel_key = binding->tunnel_key;
>      simap_init(&ldp->ports);
>      return ldp;
>  }
>
> +static struct logical_datapath *
> +ldp_lookup_or_create(const struct sbrec_datapath_binding *binding)
> +{
> +    struct logical_datapath *ldp = ldp_lookup(binding);
> +    return ldp ? ldp : ldp_create(binding);
> +}
> +
>  static void
>  ldp_free(struct logical_datapath *ldp)
>  {
> @@ -204,8 +192,9 @@ ldp_free(struct logical_datapath *ldp)
>      free(ldp);
>  }
>
> -/* Iterates through all of the records in the Binding table, updating the
> - * table of logical_datapaths to match the values found in active
> Bindings. */
> +/* Iterates through all of the records in the Port_Binding table,
> updating the
> + * table of logical_datapaths to match the values found in active
> + * Port_Bindings. */
>  static void
>  ldp_run(struct controller_ctx *ctx)
>  {
> @@ -216,16 +205,17 @@ ldp_run(struct controller_ctx *ctx)
>
>      const struct sbrec_port_binding *binding;
>      SBREC_PORT_BINDING_FOR_EACH (binding, ctx->ovnsb_idl) {
> -        struct logical_datapath *ldp;
> -
> -        ldp = ldp_lookup(&binding->logical_datapath);
> -        if (!ldp) {
> -            ldp = ldp_create(&binding->logical_datapath);
> -        }
> +        struct logical_datapath *ldp =
> ldp_lookup_or_create(binding->datapath);
>
>          simap_put(&ldp->ports, binding->logical_port,
> binding->tunnel_key);
>      }
>
> +    const struct sbrec_multicast_group *mc;
> +    SBREC_MULTICAST_GROUP_FOR_EACH (mc, ctx->ovnsb_idl) {
> +        struct logical_datapath *ldp = ldp_lookup_or_create(mc->datapath);
> +        simap_put(&ldp->ports, mc->name, mc->tunnel_key);
> +    }
> +
>      struct logical_datapath *next_ldp;
>      HMAP_FOR_EACH_SAFE (ldp, next_ldp, hmap_node, &logical_datapaths) {
>          if (simap_is_empty(&ldp->ports)) {
> @@ -250,9 +240,7 @@ rule_init(void)
>  }
>
>  /* Translates logical flows in the Rule table in the OVN_SB database into
> - * OpenFlow flows, adding the OpenFlow flows to 'flow_table'.
> - *
> - * We put the Rule flows into OpenFlow tables 16 through 47 (inclusive).
> */
> + * OpenFlow flows.  See ovn-architecture(7) for more information. */
>  void
>  rule_run(struct controller_ctx *ctx, struct hmap *flow_table)
>  {
> @@ -268,22 +256,29 @@ rule_run(struct controller_ctx *ctx, struct hmap
> *flow_table)
>           * bound to that logical datapath, so there's no point in
> maintaining
>           * any flows for it anyway, so skip it. */
>          const struct logical_datapath *ldp;
> -        ldp = ldp_lookup(&rule->logical_datapath);
> +        ldp = ldp_lookup(rule->logical_datapath);
>          if (!ldp) {
>              continue;
>          }
>
> -        /* Translate OVN actions into OpenFlow actions. */
> +        /* Translate logical table ID to physical table ID. */
> +        bool ingress = !strcmp(rule->pipeline, "ingress");
> +        uint8_t phys_table = rule->table_id + (ingress ? 16 : 48);
> +        uint8_t next_phys_table = rule->table_id < 15 ? phys_table + 1 :
> 0;
> +        uint8_t output_phys_table = ingress ? 32 : 64;
> +
> +        /* Translate OVN actions into OpenFlow actions.
> +         *
> +         * XXX Deny changes to 'outport' in egress pipeline. */
>          uint64_t ofpacts_stub[64 / 8];
>          struct ofpbuf ofpacts;
>          struct expr *prereqs;
> -        uint8_t next_table_id;
>          char *error;
>
>          ofpbuf_use_stub(&ofpacts, ofpacts_stub, sizeof ofpacts_stub);
> -        next_table_id = rule->table_id < 31 ? rule->table_id + 17 : 0;
>          error = actions_parse_string(rule->actions, &symtab, &ldp->ports,
> -                                     next_table_id, 64, &ofpacts,
> &prereqs);
> +                                     next_phys_table, output_phys_table,
> +                                     &ofpacts, &prereqs);
>          if (error) {
>              static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
>              VLOG_WARN_RL(&rl, "error parsing actions \"%s\": %s",
> @@ -322,13 +317,13 @@ rule_run(struct controller_ctx *ctx, struct hmap
> *flow_table)
>          /* Prepare the OpenFlow matches for adding to the flow table. */
>          struct expr_match *m;
>          HMAP_FOR_EACH (m, hmap_node, &matches) {
> -            match_set_metadata(&m->match, htonll(ldp->integer));
> +            match_set_metadata(&m->match, htonll(ldp->tunnel_key));
>              if (m->match.wc.masks.conj_id) {
>                  m->match.flow.conj_id += conj_id_ofs;
>              }
>              if (!m->n) {
> -                ofctrl_add_flow(flow_table, rule->table_id + 16,
> -                                rule->priority, &m->match, &ofpacts);
> +            ofctrl_add_flow(flow_table, phys_table, rule->priority,
> +                                &m->match, &ofpacts);
>              } else {
>                  uint64_t conj_stubs[64 / 8];
>                  struct ofpbuf conj;
> @@ -343,8 +338,8 @@ rule_run(struct controller_ctx *ctx, struct hmap
> *flow_table)
>                      dst->clause = src->clause;
>                      dst->n_clauses = src->n_clauses;
>                  }
> -                ofctrl_add_flow(flow_table, rule->table_id + 16,
> -                                rule->priority, &m->match, &conj);
> +                ofctrl_add_flow(flow_table, phys_table, rule->priority,
> +                                &m->match, &conj);
>                  ofpbuf_uninit(&conj);
>              }
>          }
> diff --git a/ovn/controller/rule.h b/ovn/controller/rule.h
> index a7bd71f..a39fba8 100644
> --- a/ovn/controller/rule.h
> +++ b/ovn/controller/rule.h
> @@ -20,10 +20,10 @@
>  /* Rule table translation to OpenFlow
>   * ==================================
>   *
> - * The Rule table obtained from the OVN_Southbound database works in terms
> - * of logical entities, that is, logical flows among logical datapaths and
> - * logical ports.  This code translates these logical flows into OpenFlow
> flows
> - * that, again, work in terms of logical entities implemented through
> OpenFlow
> + * The Rule table obtained from the OVN_Southbound database works in
> terms of
> + * logical entities, that is, logical flows among logical datapaths and
> logical
> + * ports.  This code translates these logical flows into OpenFlow flows
> that,
> + * again, work in terms of logical entities implemented through OpenFlow
>   * extensions (e.g. registers represent the logical input and output
> ports).
>   *
>   * Physical-to-logical and logical-to-physical translation are
> implemented in
> @@ -46,6 +46,4 @@ void rule_init(void);
>  void rule_run(struct controller_ctx *, struct hmap *flow_table);
>  void rule_destroy(void);
>
> -uint32_t ldp_to_integer(const struct uuid *logical_datapath);
> -
>  #endif /* ovn/rule.h */
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index eac5546..5ecd13e 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -30,6 +30,7 @@
>  #include "ovn/lib/ovn-nb-idl.h"
>  #include "ovn/lib/ovn-sb-idl.h"
>  #include "poll-loop.h"
> +#include "smap.h"
>  #include "stream.h"
>  #include "stream-ssl.h"
>  #include "unixctl.h"
> @@ -74,135 +75,559 @@ Options:\n\
>      stream_usage("database", true, true, false);
>  }
>
> -static int
> -compare_strings(const void *a_, const void *b_)
> +struct key_node {
> +    struct hmap_node hmap_node;
> +    uint32_t key;
> +};
> +
> +static void
> +keys_destroy(struct hmap *keys)
>  {
> -    char *const *a = a_;
> -    char *const *b = b_;
> -    return strcmp(*a, *b);
> +    struct key_node *node, *next;
> +    HMAP_FOR_EACH_SAFE (node, next, hmap_node, keys) {
> +        hmap_remove(keys, &node->hmap_node);
> +        free(node);
> +    }
> +    hmap_destroy(keys);
> +}
> +
> +static void
> +add_key(struct hmap *set, uint32_t key)
> +{
> +    struct key_node *node = xmalloc(sizeof *node);
> +    hmap_insert(set, &node->hmap_node, hash_int(key, 0));
> +    node->key = key;
>  }
>
> -/*
> - * Determine whether 2 arrays of MAC addresses are the same.  It's
> possible that
> - * the lists could be *very* long and this check is being done a lot
> (every
> - * time the OVN_Northbound database changes).
> - */
>  static bool
> -macs_equal(char **binding_macs_, size_t b_n_macs,
> -           char **lport_macs_, size_t l_n_macs)
> +key_in_use(const struct hmap *set, uint32_t key)
>  {
> -    char **binding_macs, **lport_macs;
> -    size_t bytes, i;
> +    const struct key_node *node;
> +    HMAP_FOR_EACH_IN_BUCKET (node, hmap_node, hash_int(key, 0), set) {
> +        if (node->key == key) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
>
> -    if (b_n_macs != l_n_macs) {
> -        return false;
> +static uint32_t
> +allocate_key(struct hmap *set, const char *name, uint32_t max, uint32_t
> *prev)
> +{
> +    for (uint32_t key = *prev + 1; key != *prev;
> +         key = key + 1 <= max ? key + 1 : 1) {
> +        if (!key_in_use(set, key)) {
> +            add_key(set, key);
> +            *prev = key;
> +            return key;
> +        }
>      }
>
> -    bytes = b_n_macs * sizeof binding_macs_[0];
> -    binding_macs = xmalloc(bytes);
> -    lport_macs = xmalloc(bytes);
> +    static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> +    VLOG_WARN_RL(&rl, "all %s tunnel keys exhausted", name);
> +    return 0;
> +}
> +
> +/* The 'key' comes from nb->header_.uuid or sb->external_ids's ' */
> +struct ovn_datapath {
> +    struct hmap_node key_node;  /* Index on 'key'. */
> +    struct uuid key;            /* nb->header_.uuid. */
> +
> +    const struct nbrec_logical_switch *nb;   /* May be NULL. */
> +    const struct sbrec_datapath_binding *sb; /* May be NULL. */
>
> -    memcpy(binding_macs, binding_macs_, bytes);
> -    memcpy(lport_macs, lport_macs_, bytes);
> +    struct ovs_list list;       /* In list of similar records. */
>
> -    qsort(binding_macs, b_n_macs, sizeof binding_macs[0],
> compare_strings);
> -    qsort(lport_macs, l_n_macs, sizeof lport_macs[0], compare_strings);
> +    struct hmap port_keys;
> +    uint32_t max_port_key;
>
> -    for (i = 0; i < b_n_macs; i++) {
> -        if (strcmp(binding_macs[i], lport_macs[i])) {
> -            break;
> +    bool has_unknown;
> +};
> +
> +static struct ovn_datapath *
> +ovn_datapath_create(struct hmap *dp_map, const struct uuid *key,
> +                    const struct nbrec_logical_switch *nb,
> +                    const struct sbrec_datapath_binding *sb)
> +{
> +    struct ovn_datapath *od = xzalloc(sizeof *od);
> +    od->key = *key;
> +    od->sb = sb;
> +    od->nb = nb;
> +    hmap_init(&od->port_keys);
> +    od->max_port_key = 0;
> +    hmap_insert(dp_map, &od->key_node, uuid_hash(&od->key));
> +    return od;
> +}
> +
> +static void
> +ovn_datapath_destroy(struct hmap *dp_map, struct ovn_datapath *od)
> +{
> +    if (od) {
> +        /* Don't remove od->list, it's only safe and only used within
> +         * build_datapaths(). */
> +        hmap_remove(dp_map, &od->key_node);
> +        keys_destroy(&od->port_keys);
> +        free(od);
> +    }
> +}
> +
> +static struct ovn_datapath *
> +ovn_datapath_find(struct hmap *dp_map, const struct uuid *uuid)
> +{
> +    struct ovn_datapath *od;
> +
> +    HMAP_FOR_EACH_WITH_HASH (od, key_node, uuid_hash(uuid), dp_map) {
> +        if (uuid_equals(uuid, &od->key)) {
> +            return od;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +static struct ovn_datapath *
> +ovn_datapath_from_sbrec(struct hmap *dp_map,
> +                        const struct sbrec_datapath_binding *sb)
> +{
> +    struct uuid key;
> +
> +    if (!smap_get_uuid(&sb->external_ids, "logical-switch", &key)) {
> +        return NULL;
> +    }
> +    return ovn_datapath_find(dp_map, &key);
> +}
> +
> +static void
> +join_datapaths(struct northd_context *ctx, struct hmap *dp_map,
> +               struct ovs_list *sb_only, struct ovs_list *nb_only,
> +               struct ovs_list *both)
> +{
> +    hmap_init(dp_map);
> +    list_init(sb_only);
> +    list_init(nb_only);
> +    list_init(both);
> +
> +    const struct sbrec_datapath_binding *sb, *sb_next;
> +    SBREC_DATAPATH_BINDING_FOR_EACH_SAFE (sb, sb_next, ctx->ovnsb_idl) {
> +        struct uuid key;
> +        if (!smap_get_uuid(&sb->external_ids, "logical-switch", &key)) {
> +            ovsdb_idl_txn_add_comment(ctx->ovnsb_txn,
> +                                      "deleting Datapath_Binding
> "UUID_FMT" that "
> +                                      "lacks external-ids:logical-switch",
> +                         UUID_ARGS(&sb->header_.uuid));
> +            sbrec_datapath_binding_delete(sb);
> +            continue;
> +        }
> +
> +        if (ovn_datapath_find(dp_map, &key)) {
> +            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
> +            VLOG_INFO_RL(&rl, "deleting Datapath_Binding "UUID_FMT" with "
> +                         "duplicate external-ids:logical-switch "UUID_FMT,
> +                         UUID_ARGS(&sb->header_.uuid), UUID_ARGS(&key));
> +            sbrec_datapath_binding_delete(sb);
> +            continue;
> +        }
> +
> +        struct ovn_datapath *od = ovn_datapath_create(dp_map, &key, NULL,
> sb);
> +        list_push_back(sb_only, &od->list);
> +    }
> +
> +    const struct nbrec_logical_switch *nb;
> +    NBREC_LOGICAL_SWITCH_FOR_EACH (nb, ctx->ovnnb_idl) {
> +        struct ovn_datapath *od = ovn_datapath_find(dp_map,
> &nb->header_.uuid);
> +        if (od) {
> +            od->nb = nb;
> +            list_remove(&od->list);
> +            list_push_back(both, &od->list);
> +        } else {
> +            od = ovn_datapath_create(dp_map, &nb->header_.uuid, nb, NULL);
> +            list_push_back(nb_only, &od->list);
> +        }
> +    }
> +}
> +
> +static uint32_t
> +ovn_datapath_allocate_key(struct hmap *dp_keys)
> +{
> +    static uint32_t prev;
> +    return allocate_key(dp_keys, "datapath", (1u << 24) - 1, &prev);
> +}
> +
> +static void
> +build_datapaths(struct northd_context *ctx, struct hmap *dp_map)
> +{
> +    struct ovs_list sb_dps, nb_dps, both_dps;
> +
> +    join_datapaths(ctx, dp_map, &sb_dps, &nb_dps, &both_dps);
> +
> +    if (!list_is_empty(&nb_dps)) {
> +        /* First index the in-use datapath tunnel keys. */
> +        struct hmap dp_keys = HMAP_INITIALIZER(&dp_keys);
> +        struct ovn_datapath *od;
> +        LIST_FOR_EACH (od, list, &both_dps) {
> +            add_key(&dp_keys, od->sb->tunnel_key);
> +        }
> +
> +        /* Add southbound record for each unmatched northbound record. */
> +        LIST_FOR_EACH (od, list, &nb_dps) {
> +            uint16_t tunnel_key = ovn_datapath_allocate_key(&dp_keys);
> +            if (!tunnel_key) {
> +                break;
> +            }
> +
> +            od->sb = sbrec_datapath_binding_insert(ctx->ovnsb_txn);
> +
> +            struct smap external_ids = SMAP_INITIALIZER(&external_ids);
> +            char uuid_s[UUID_LEN + 1];
> +            sprintf(uuid_s, UUID_FMT, UUID_ARGS(&od->nb->header_.uuid));
> +            smap_add(&external_ids, "logical-switch", uuid_s);
> +            sbrec_datapath_binding_set_external_ids(od->sb,
> &external_ids);
> +            smap_destroy(&external_ids);
> +
> +            sbrec_datapath_binding_set_tunnel_key(od->sb, tunnel_key);
> +        }
> +    }
> +
>


We need to destroy the "dp_keys"?





> +    /* Delete southbound records without northbound matches. */
> +    struct ovn_datapath *od, *next;
> +    LIST_FOR_EACH_SAFE (od, next, list, &sb_dps) {
> +        list_remove(&od->list);
> +        sbrec_datapath_binding_delete(od->sb);
> +        ovn_datapath_destroy(dp_map, od);
> +    }
> +}
> +
> +struct ovn_port {
> +    struct hmap_node key_node;  /* Index on 'key'. */
> +    const char *key;            /* nb->name and sb->logical_port */
> +
> +    const struct nbrec_logical_port *nb; /* May be NULL. */
> +    const struct sbrec_port_binding *sb; /* May be NULL. */
> +
> +    struct ovn_datapath *od;
> +
> +    struct ovs_list list;       /* In list of similar records. */
> +};
> +
> +static struct ovn_port *
> +ovn_port_create(struct hmap *port_map, const char *key,
> +                const struct nbrec_logical_port *nb,
> +                const struct sbrec_port_binding *sb)
> +{
> +    struct ovn_port *op = xzalloc(sizeof *op);
> +    op->key = key;
> +    op->sb = sb;
> +    op->nb = nb;
> +    hmap_insert(port_map, &op->key_node, hash_string(op->key, 0));
> +    return op;
> +}
> +
> +static void
> +ovn_port_destroy(struct hmap *port_map, struct ovn_port *port)
> +{
> +    if (port) {
> +        /* Don't remove port->list, it's only safe and only used within
> +         * build_ports(). */
> +        hmap_remove(port_map, &port->key_node);
> +        free(port);
> +    }
> +}
> +
> +static struct ovn_port *
> +ovn_port_find(struct hmap *port_map, const char *name)
> +{
> +    struct ovn_port *op;
> +
> +    HMAP_FOR_EACH_WITH_HASH (op, key_node, hash_string(name, 0),
> port_map) {
> +        if (!strcmp(op->key, name)) {
> +            return op;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +static uint32_t
> +ovn_port_allocate_key(struct ovn_datapath *od)
> +{
> +    return allocate_key(&od->port_keys, "port",
> +                        (1u << 16) - 1, &od->max_port_key);
> +}
> +
> +static void
> +join_logical_ports(struct northd_context *ctx,
> +                   struct hmap *dp_map, struct hmap *port_map,
> +                   struct ovs_list *sb_only, struct ovs_list *nb_only,
> +                   struct ovs_list *both)
> +{
> +    hmap_init(port_map);
> +    list_init(sb_only);
> +    list_init(nb_only);
> +    list_init(both);
> +
> +    const struct sbrec_port_binding *sb;
> +    SBREC_PORT_BINDING_FOR_EACH (sb, ctx->ovnsb_idl) {
> +        struct ovn_port *op = ovn_port_create(port_map, sb->logical_port,
> +                                              NULL, sb);
> +        list_push_back(sb_only, &op->list);
> +    }
> +
> +    struct ovn_datapath *od;
> +    HMAP_FOR_EACH (od, key_node, dp_map) {
> +        for (size_t i = 0; i < od->nb->n_ports; i++) {
> +            const struct nbrec_logical_port *nb = od->nb->ports[i];
> +            struct ovn_port *op = ovn_port_find(port_map, nb->name);
> +            if (op) {
> +                op->nb = nb;
> +                list_remove(&op->list);
> +                list_push_back(both, &op->list);
> +            } else {
> +                op = ovn_port_create(port_map, nb->name, nb, NULL);
> +                list_push_back(nb_only, &op->list);
> +            }
> +            op->od = od;
> +        }
> +    }
> +}
> +
> +static void
> +ovn_port_update_sbrec(const struct ovn_port *op)
> +{
> +    sbrec_port_binding_set_datapath(op->sb, op->od->sb);
> +    sbrec_port_binding_set_parent_port(op->sb, op->nb->parent_name);
> +    sbrec_port_binding_set_tag(op->sb, op->nb->tag, op->nb->n_tag);
> +    sbrec_port_binding_set_mac(op->sb, (const char **) op->nb->macs,
> +                               op->nb->n_macs);
> +}
> +
> +static void
> +build_ports(struct northd_context *ctx, struct hmap *dp_map,
> +            struct hmap *port_map)
> +{
> +    struct ovs_list sb_ports, nb_ports, both_ports;
> +
> +    join_logical_ports(ctx, dp_map, port_map,
> +                       &sb_ports, &nb_ports, &both_ports);
> +
> +    /* For logical ports that are in both databases, update the southbound
> +     * record based on northbound data.  Also index the in-use
> tunnel_keys. */
> +    struct ovn_port *op, *next;
> +    LIST_FOR_EACH_SAFE (op, next, list, &both_ports) {
> +        ovn_port_update_sbrec(op);
> +
> +        add_key(&op->od->port_keys, op->sb->tunnel_key);
> +        if (op->sb->tunnel_key > op->od->max_port_key) {
> +            op->od->max_port_key = op->sb->tunnel_key;
> +        }
> +    }
> +
> +    /* Add southbound record for each unmatched northbound record. */
> +    LIST_FOR_EACH_SAFE (op, next, list, &nb_ports) {
> +        uint16_t tunnel_key = ovn_port_allocate_key(op->od);
> +        if (!tunnel_key) {
> +            continue;
> +        }
> +
> +        op->sb = sbrec_port_binding_insert(ctx->ovnsb_txn);
> +        ovn_port_update_sbrec(op);
> +
> +        sbrec_port_binding_set_logical_port(op->sb, op->key);
> +        sbrec_port_binding_set_tunnel_key(op->sb, tunnel_key);
> +    }
> +
> +    /* Delete southbound records without northbound matches. */
> +    LIST_FOR_EACH_SAFE(op, next, list, &sb_ports) {
> +        list_remove(&op->list);
> +        sbrec_port_binding_delete(op->sb);
> +        ovn_port_destroy(port_map, op);
> +    }
> +}
> +
> +#define OVN_MIN_MULTICAST 32768
> +#define OVN_MAX_MULTICAST 65535
> +
> +struct multicast_group {
> +    const char *name;
> +    uint16_t key;               /* OVN_MIN_MULTICAST...OVN_MAX_MULTICAST.
> */
> +};
> +
> +#define MC_FLOOD "_MC_flood"
> +static const struct multicast_group mc_flood = { MC_FLOOD, 65535 };
> +
> +#define MC_UNKNOWN "_MC_unknown"
> +static const struct multicast_group mc_unknown = { MC_UNKNOWN, 65534 };
> +
> +static bool
> +multicast_group_equal(const struct multicast_group *a,
> +                      const struct multicast_group *b)
> +{
> +    return !strcmp(a->name, b->name) && a->key == b->key;
> +}
> +
> +/* Multicast group entry. */
> +struct ovn_multicast {
> +    struct hmap_node hmap_node; /* Index on 'datapath', 'key', */
> +    struct ovn_datapath *datapath;
> +    const struct multicast_group *group;
> +
> +    struct ovn_port **ports;
> +    size_t n_ports, allocated_ports;
> +};
> +
> +static uint32_t
> +ovn_multicast_hash(const struct ovn_datapath *datapath,
> +                   const struct multicast_group *group)
> +{
> +    return hash_pointer(datapath, group->key);
> +}
> +
> +static struct ovn_multicast *
> +ovn_multicast_find(struct hmap *mcgroups, struct ovn_datapath *datapath,
> +                   const struct multicast_group *group)
> +{
> +    struct ovn_multicast *mc;
> +
> +    HMAP_FOR_EACH_WITH_HASH (mc, hmap_node,
> +                             ovn_multicast_hash(datapath, group),
> mcgroups) {
> +        if (mc->datapath == datapath
> +            && multicast_group_equal(mc->group, group)) {
> +            return mc;
>          }
>      }
> +    return NULL;
> +}
>
> -    free(binding_macs);
> -    free(lport_macs);
> +static void
> +ovn_multicast_add(struct hmap *mcgroups, const struct multicast_group
> *group,
> +                  struct ovn_port *port)
> +{
> +    struct ovn_datapath *od = port->od;
> +    struct ovn_multicast *mc = ovn_multicast_find(mcgroups, od, group);
> +    if (!mc) {
> +        mc = xmalloc(sizeof *mc);
> +        hmap_insert(mcgroups, &mc->hmap_node, ovn_multicast_hash(od,
> group));
> +        mc->datapath = od;
> +        mc->group = group;
> +        mc->n_ports = 0;
> +        mc->allocated_ports = 4;
> +        mc->ports = xmalloc(mc->allocated_ports * sizeof *mc->ports);
> +    }
> +    if (mc->n_ports >= mc->allocated_ports) {
> +        mc->ports = x2nrealloc(mc->ports, &mc->allocated_ports,
> +                               sizeof *mc->ports);
> +    }
> +    mc->ports[mc->n_ports++] = port;
> +}
>
> -    return (i == b_n_macs) ? true : false;
> +static void
> +ovn_multicast_destroy(struct hmap *mcgroups, struct ovn_multicast *mc)
> +{
> +    if (mc) {
> +        hmap_remove(mcgroups, &mc->hmap_node);
> +        free(mc->ports);
> +        free(mc);
> +    }
> +}
> +
> +static void
> +ovn_multicast_update_sbrec(const struct ovn_multicast *mc,
> +                           const struct sbrec_multicast_group *sb)
> +{
> +    struct sbrec_port_binding **ports = xmalloc(mc->n_ports * sizeof
> *ports);
> +    for (size_t i = 0; i < mc->n_ports; i++) {
> +        ports[i] = CONST_CAST(struct sbrec_port_binding *,
> mc->ports[i]->sb);
> +    }
> +    sbrec_multicast_group_set_ports(sb, ports, mc->n_ports);
> +    free(ports);
>  }
>
>  /* Rule generation.
>   *
> - * This code generates the Rule table in the southbound database, as a
> - * function of most of the northbound database.
> + * This code generates the Rule table in the southbound database, as a
> function
> + * of most of the northbound database.
>   */
>
> -/* Enough context to add a Rule row, using rule_add(). */
> -struct rule_ctx {
> -    /* From northd_context. */
> -    struct ovsdb_idl *ovnsb_idl;
> -    struct ovsdb_idl_txn *ovnsb_txn;
> -
> -    /* Contains "struct rule_hash_node"s.  Used to figure out what
> existing
> -     * Rule rows should be deleted: we index all of the Rule rows into
> this
> -     * data structure, then as existing rows are generated we remove them.
> -     * After generating all the rows, any remaining in 'rule_hmap' must be
> -     * deleted from the database. */
> -    struct hmap rule_hmap;
> -};
> +struct ovn_rule {
> +    struct hmap_node hmap_node;
>
> -/* A row in the Rule table, indexed by its full contents, */
> -struct rule_hash_node {
> -    struct hmap_node node;
> -    const struct sbrec_rule *rule;
> +    struct ovn_datapath *od;
> +    enum ovn_pipeline { P_IN, P_OUT } pipeline;
> +    uint8_t table_id;
> +    uint16_t priority;
> +    char *match;
> +    char *actions;
>  };
>
>  static size_t
> -rule_hash(const struct uuid *logical_datapath, uint8_t table_id,
> -          uint16_t priority, const char *match, const char *actions)
> +ovn_rule_hash(const struct ovn_rule *rule)
>  {
> -    size_t hash = uuid_hash(logical_datapath);
> -    hash = hash_2words((table_id << 16) | priority, hash);
> -    hash = hash_string(match, hash);
> -    return hash_string(actions, hash);
> +    size_t hash = uuid_hash(&rule->od->key);
> +    hash = hash_2words((rule->table_id << 16) | rule->priority, hash);
> +    hash = hash_string(rule->match, hash);
> +    return hash_string(rule->actions, hash);
>  }
>
> -static size_t
> -rule_hash_rec(const struct sbrec_rule *rule)
> +static bool
> +ovn_rule_equal(const struct ovn_rule *a, const struct ovn_rule *b)
> +{
> +    return (a->od == b->od
> +            && a->pipeline == b->pipeline
> +            && a->table_id == b->table_id
> +            && a->priority == b->priority
> +            && !strcmp(a->match, b->match)
> +            && !strcmp(a->actions, b->actions));
> +}
> +
> +static void
> +ovn_rule_init(struct ovn_rule *rule, struct ovn_datapath *od,
> +              enum ovn_pipeline pipeline, uint8_t table_id, uint16_t
> priority,
> +              char *match, char *actions)
>  {
> -    return rule_hash(&rule->logical_datapath, rule->table_id,
> -                         rule->priority, rule->match,
> -                         rule->actions);
> +    rule->od = od;
> +    rule->pipeline = pipeline;
> +    rule->table_id = table_id;
> +    rule->priority = priority;
> +    rule->match = match;
> +    rule->actions = actions;
>  }
>
>  /* Adds a row with the specified contents to the Rule table. */
>  static void
> -rule_add(struct rule_ctx *ctx,
> -         const struct nbrec_logical_switch *logical_datapath,
> -         uint8_t table_id,
> -         uint16_t priority,
> -         const char *match,
> -         const char *actions)
> -{
> -    struct rule_hash_node *hash_node;
> -
> -    /* Check whether such a row already exists in the Rule table.  If so,
> -     * remove it from 'ctx->rule_hmap' and we're done. */
> -    HMAP_FOR_EACH_WITH_HASH (hash_node, node,
> -                             rule_hash(&logical_datapath->header_.uuid,
> -                                       table_id, priority, match,
> actions),
> -                             &ctx->rule_hmap) {
> -        const struct sbrec_rule *rule = hash_node->rule;
> -        if (uuid_equals(&rule->logical_datapath,
> -                        &logical_datapath->header_.uuid)
> -            && rule->table_id == table_id
> -            && rule->priority == priority
> -            && !strcmp(rule->match, match)
> -            && !strcmp(rule->actions, actions)) {
> -            hmap_remove(&ctx->rule_hmap, &hash_node->node);
> -            free(hash_node);
> -            return;
> -        }
> -    }
> -
> -    /* No such Rule row.  Add one. */
> -    const struct sbrec_rule *rule;
> -    rule = sbrec_rule_insert(ctx->ovnsb_txn);
> -    sbrec_rule_set_logical_datapath(rule,
> -                                        logical_datapath->header_.uuid);
> -    sbrec_rule_set_table_id(rule, table_id);
> -    sbrec_rule_set_priority(rule, priority);
> -    sbrec_rule_set_match(rule, match);
> -    sbrec_rule_set_actions(rule, actions);
> +rule_add(struct hmap *rule_map, struct ovn_datapath *od,
> +         enum ovn_pipeline pipeline, uint8_t table_id, uint16_t priority,
> +         const char *match, const char *actions)
> +{
> +    struct ovn_rule *rule = xmalloc(sizeof *rule);
> +    ovn_rule_init(rule, od, pipeline, table_id, priority,
> +                  xstrdup(match), xstrdup(actions));
> +    hmap_insert(rule_map, &rule->hmap_node, ovn_rule_hash(rule));
> +}
> +
> +static struct ovn_rule *
> +ovn_rule_find(struct hmap *rules, struct ovn_datapath *od,
> +              enum ovn_pipeline pipeline, uint8_t table_id, uint16_t
> priority,
> +              const char *match, const char *actions)
> +{
> +    struct ovn_rule target;
> +    ovn_rule_init(&target, od, pipeline, table_id, priority,
> +                  CONST_CAST(char *, match), CONST_CAST(char *, actions));
> +
> +    struct ovn_rule *rule;
> +    HMAP_FOR_EACH_WITH_HASH (rule, hmap_node, ovn_rule_hash(&target),
> rules) {
> +        if (ovn_rule_equal(rule, &target)) {
> +            return rule;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +static void
> +ovn_rule_destroy(struct hmap *rules, struct ovn_rule *rule)
> +{
> +    if (rule) {
> +        hmap_remove(rules, &rule->hmap_node);
> +        free(rule->match);
> +        free(rule->actions);
> +        free(rule);
> +    }
>  }
>
>  /* Appends port security constraints on L2 address field 'eth_addr_field'
> @@ -241,376 +666,207 @@ lport_is_enabled(const struct nbrec_logical_port
> *lport)
>      return !lport->enabled || *lport->enabled;
>  }
>
> -/* Updates the Rule table in the OVN_SB database, constructing its
> contents
> - * based on the OVN_NB database. */
> +/* Updates the Rule and Multicast_Group tables in the OVN_SB database,
> + * constructing their contents based on the OVN_NB database. */
>  static void
> -build_rule(struct northd_context *ctx)
> +build_rule(struct northd_context *ctx, struct hmap *datapaths,
> +           struct hmap *ports)
>  {
> -    struct rule_ctx pc = {
> -        .ovnsb_idl = ctx->ovnsb_idl,
> -        .ovnsb_txn = ctx->ovnsb_txn,
> -        .rule_hmap = HMAP_INITIALIZER(&pc.rule_hmap)
> -    };
> +    struct hmap rules = HMAP_INITIALIZER(&rules);
> +    struct hmap mcgroups = HMAP_INITIALIZER(&mcgroups);
>
> -    /* Add all the Rule entries currently in the southbound database to
> -     * 'pc.rule_hmap'.  We remove entries that we generate from the hmap,
> -     * thus by the time we're done only entries that need to be removed
> -     * remain. */
> -    const struct sbrec_rule *rule;
> -    SBREC_RULE_FOR_EACH (rule, ctx->ovnsb_idl) {
> -        struct rule_hash_node *hash_node = xzalloc(sizeof *hash_node);
> -        hash_node->rule = rule;
> -        hmap_insert(&pc.rule_hmap, &hash_node->node,
> -                    rule_hash_rec(rule));
> -    }
> -
> -    /* Table 0: Admission control framework. */
> -    const struct nbrec_logical_switch *lswitch;
> -    NBREC_LOGICAL_SWITCH_FOR_EACH (lswitch, ctx->ovnnb_idl) {
> +    /* Ingress table 0: Admission control framework. */
> +    struct ovn_datapath *od;
> +    HMAP_FOR_EACH (od, key_node, datapaths) {
>          /* Logical VLANs not supported. */
> -        rule_add(&pc, lswitch, 0, 100, "vlan.present", "drop;");
> +        rule_add(&rules, od, P_IN, 0, 100, "vlan.present", "drop;");
>
>          /* Broadcast/multicast source address is invalid. */
> -        rule_add(&pc, lswitch, 0, 100, "eth.src[40]", "drop;");
> +        rule_add(&rules, od, P_IN, 0, 100, "eth.src[40]", "drop;");
>
>          /* Port security flows have priority 50 (see below) and will
> continue
>           * to the next table if packet source is acceptable. */
>
>          /* Otherwise drop the packet. */
> -        rule_add(&pc, lswitch, 0, 0, "1", "drop;");
> +        rule_add(&rules, od, P_IN, 0, 0, "1", "drop;");
>      }
>
> -    /* Table 0: Ingress port security. */
> -    NBREC_LOGICAL_SWITCH_FOR_EACH (lswitch, ctx->ovnnb_idl) {
> -        for (size_t i = 0; i < lswitch->n_ports; i++) {
> -            const struct nbrec_logical_port *lport = lswitch->ports[i];
> -            struct ds match = DS_EMPTY_INITIALIZER;
> -            ds_put_cstr(&match, "inport == ");
> -            json_string_escape(lport->name, &match);
> -            build_port_security("eth.src",
> -                                lport->port_security,
> lport->n_port_security,
> -                                &match);
> -            rule_add(&pc, lswitch, 0, 50, ds_cstr(&match),
> -                     lport_is_enabled(lport) ? "next;" : "drop;");
> -            ds_destroy(&match);
> -        }
> +    /* Ingress table 0: Ingress port security. */
> +    struct ovn_port *op;
> +    HMAP_FOR_EACH (op, key_node, ports) {
> +        struct ds match = DS_EMPTY_INITIALIZER;
> +        ds_put_cstr(&match, "inport == ");
> +        json_string_escape(op->key, &match);
> +        build_port_security("eth.src",
> +                            op->nb->port_security,
> op->nb->n_port_security,
> +                            &match);
> +        rule_add(&rules, op->od, P_IN, 0, 50, ds_cstr(&match),
> +                 lport_is_enabled(op->nb) ? "next;" : "drop;");
> +        ds_destroy(&match);
>      }
>
> -    /* Table 1: Destination lookup:
> -     *
> -     *   - Broadcast and multicast handling (priority 100).
> -     *   - Unicast handling (priority 50).
> -     *   - Unknown unicast address handling (priority 0).
> -     *   */
> -    NBREC_LOGICAL_SWITCH_FOR_EACH (lswitch, ctx->ovnnb_idl) {
> -        struct ds bcast;        /* Actions for broadcast on 'lswitch'. */
> -        struct ds unknown;      /* Actions for unknown MACs on 'lswitch'.
> */
> -
> -        ds_init(&bcast);
> -        ds_init(&unknown);
> -        for (size_t i = 0; i < lswitch->n_ports; i++) {
> -            const struct nbrec_logical_port *lport = lswitch->ports[i];
> -
> -            ds_put_cstr(&bcast, "outport = ");
> -            json_string_escape(lport->name, &bcast);
> -            ds_put_cstr(&bcast, "; next; ");
> -
> -            for (size_t j = 0; j < lport->n_macs; j++) {
> -                const char *s = lport->macs[j];
> -                uint8_t mac[ETH_ADDR_LEN];
> -
> -                if (eth_addr_from_string(s, mac)) {
> -                    struct ds match, unicast;
> -
> -                    ds_init(&match);
> -                    ds_put_format(&match, "eth.dst == %s", s);
> -
> -                    ds_init(&unicast);
> -                    ds_put_cstr(&unicast, "outport = ");
> -                    json_string_escape(lport->name, &unicast);
> -                    ds_put_cstr(&unicast, "; next;");
> -                    rule_add(&pc, lswitch, 1, 50,
> -                             ds_cstr(&match), ds_cstr(&unicast));
> -                    ds_destroy(&unicast);
> -                    ds_destroy(&match);
> -                } else if (!strcmp(s, "unknown")) {
> -                    ds_put_cstr(&unknown, "outport = ");
> -                    json_string_escape(lport->name, &unknown);
> -                    ds_put_cstr(&unknown, "; next; ");
> -                } else {
> -                    static struct vlog_rate_limit rl =
> VLOG_RATE_LIMIT_INIT(1, 1);
> -
> -                    VLOG_INFO_RL(&rl, "%s: invalid syntax '%s' in macs
> column",
> -                                 lport->name, s);
> -                }
> -            }
> -        }
> -
> -        ds_chomp(&bcast, ' ');
> -        rule_add(&pc, lswitch, 1, 100, "eth.dst[40]", ds_cstr(&bcast));
> -        ds_destroy(&bcast);
> -
> -        if (unknown.length) {
> -            ds_chomp(&unknown, ' ');
> -            rule_add(&pc, lswitch, 1, 0, "1", ds_cstr(&unknown));
> +    /* Ingress table 1: Destination lookup, broadcast and multicast
> handling
> +     * (priority 100). */
> +    HMAP_FOR_EACH (op, key_node, ports) {
> +        if (lport_is_enabled(op->nb)) {
> +            ovn_multicast_add(&mcgroups, &mc_flood, op);
>          }
> -        ds_destroy(&unknown);
> +    }
> +    HMAP_FOR_EACH (od, key_node, datapaths) {
> +        rule_add(&rules, od, P_IN, 1, 100, "eth.dst[40]",
> +                 "outport = \""MC_FLOOD"\"; output;");
>      }
>
> -    /* Table 2: ACLs. */
> -    NBREC_LOGICAL_SWITCH_FOR_EACH (lswitch, ctx->ovnnb_idl) {
> -        for (size_t i = 0; i < lswitch->n_acls; i++) {
> -            const struct nbrec_acl *acl = lswitch->acls[i];
> +    /* Ingress table 1: Destination lookup, unicast handling (priority
> 50), */
> +    HMAP_FOR_EACH (op, key_node, ports) {
> +        for (size_t i = 0; i < op->nb->n_macs; i++) {
> +            uint8_t mac[ETH_ADDR_LEN];
> +
> +            if (eth_addr_from_string(op->nb->macs[i], mac)) {
> +                struct ds match, actions;
> +
> +                ds_init(&match);
> +                ds_put_format(&match, "eth.dst == %s", op->nb->macs[i]);
> +
> +                ds_init(&actions);
> +                ds_put_cstr(&actions, "outport = ");
> +                json_string_escape(op->nb->name, &actions);
> +                ds_put_cstr(&actions, "; output;");
> +                rule_add(&rules, op->od, P_IN, 1, 50,
> +                         ds_cstr(&match), ds_cstr(&actions));
> +                ds_destroy(&actions);
> +                ds_destroy(&match);
> +            } else if (!strcmp(op->nb->macs[i], "unknown")) {
> +                ovn_multicast_add(&mcgroups, &mc_unknown, op);
> +                op->od->has_unknown = true;
> +            } else {
> +                static struct vlog_rate_limit rl =
> VLOG_RATE_LIMIT_INIT(1, 1);
>
> -            NBREC_ACL_FOR_EACH (acl, ctx->ovnnb_idl) {
> -                rule_add(&pc, lswitch, 2, acl->priority, acl->match,
> -                         (!strcmp(acl->action, "allow") ||
> -                          !strcmp(acl->action, "allow-related")
> -                          ? "next;" : "drop;"));
> +                VLOG_INFO_RL(&rl, "%s: invalid syntax '%s' in macs
> column",
> +                             op->nb->name, op->nb->macs[i]);
>              }
>          }
> -
> -        rule_add(&pc, lswitch, 2, 0, "1", "next;");
>      }
>
> -    /* Table 3: Egress port security. */
> -    NBREC_LOGICAL_SWITCH_FOR_EACH (lswitch, ctx->ovnnb_idl) {
> -        rule_add(&pc, lswitch, 3, 100, "eth.dst[40]", "output;");
> -
> -        for (size_t i = 0; i < lswitch->n_ports; i++) {
> -            const struct nbrec_logical_port *lport = lswitch->ports[i];
> -            struct ds match;
> -
> -            ds_init(&match);
> -            ds_put_cstr(&match, "outport == ");
> -            json_string_escape(lport->name, &match);
> -            build_port_security("eth.dst",
> -                                lport->port_security,
> lport->n_port_security,
> -                                &match);
> -
> -            rule_add(&pc, lswitch, 3, 50, ds_cstr(&match),
> -                         lport_is_enabled(lport) ? "output;" : "drop;");
> -
> -            ds_destroy(&match);
> +    /* Ingress table 1: Destination lookup for unknown MACs (priority 0).
> */
> +    HMAP_FOR_EACH (od, key_node, datapaths) {
> +        if (od->has_unknown) {
> +            rule_add(&rules, od, P_IN, 1, 0, "1",
> +                     "outport = \""MC_UNKNOWN"\"; output;");
>          }
>      }
>
> -    /* Delete any existing Rule rows that were not re-generated.  */
> -    struct rule_hash_node *hash_node, *next_hash_node;
> -    HMAP_FOR_EACH_SAFE (hash_node, next_hash_node, node, &pc.rule_hmap) {
> -        hmap_remove(&pc.rule_hmap, &hash_node->node);
> -        sbrec_rule_delete(hash_node->rule);
> -        free(hash_node);
> -    }
> -    hmap_destroy(&pc.rule_hmap);
> -}
> -
> -static bool
> -parents_equal(const struct sbrec_port_binding *binding,
> -              const struct nbrec_logical_port *lport)
> -{
> -    if (!!binding->parent_port != !!lport->parent_name) {
> -        /* One is set and the other is not. */
> -        return false;
> -    }
> +    /* Egress table 0: ACLs. */
> +    HMAP_FOR_EACH (od, key_node, datapaths) {
> +        for (size_t i = 0; i < od->nb->n_acls; i++) {
> +            const struct nbrec_acl *acl = od->nb->acls[i];
> +            const char *action;
>
> -    if (binding->parent_port) {
> -        /* Both are set. */
> -        return strcmp(binding->parent_port, lport->parent_name) ? false :
> true;
> +            action = (!strcmp(acl->action, "allow") ||
> +                      !strcmp(acl->action, "allow-related"))
> +                ? "next;" : "drop;";
> +            rule_add(&rules, od, P_OUT, 0, acl->priority, acl->match,
> action);
> +        }
>      }
> -
> -    /* Both are NULL. */
> -    return true;
> -}
> -
> -static bool
> -tags_equal(const struct sbrec_port_binding *binding,
> -           const struct nbrec_logical_port *lport)
> -{
> -    if (binding->n_tag != lport->n_tag) {
> -        return false;
> +    HMAP_FOR_EACH (od, key_node, datapaths) {
> +        rule_add(&rules, od, P_OUT, 0, 0, "1", "next;");
>      }
>
> -    return binding->n_tag ? (binding->tag[0] == lport->tag[0]) : true;
> -}
> +    /* Egress table 1: Egress port security. */
> +    HMAP_FOR_EACH (od, key_node, datapaths) {
> +        rule_add(&rules, od, P_OUT, 1, 100, "eth.dst[40]", "output;");
> +    }
> +    HMAP_FOR_EACH (op, key_node, ports) {
> +        struct ds match;
>
> -struct port_binding_hash_node {
> -    struct hmap_node lp_node; /* In 'lp_map', by binding->logical_port. */
> -    struct hmap_node tk_node; /* In 'tk_map', by binding->tunnel_key. */
> -    const struct sbrec_port_binding *binding;
> -};
> +        ds_init(&match);
> +        ds_put_cstr(&match, "outport == ");
> +        json_string_escape(op->key, &match);
> +        build_port_security("eth.dst",
> +                            op->nb->port_security,
> op->nb->n_port_security,
> +                            &match);
>
> -static bool
> -tunnel_key_in_use(const struct hmap *tk_hmap, uint16_t tunnel_key)
> -{
> -    const struct port_binding_hash_node *hash_node;
> +        rule_add(&rules, op->od, P_OUT, 1, 50, ds_cstr(&match),
> +                 lport_is_enabled(op->nb) ? "output;" : "drop;");
>
> -    HMAP_FOR_EACH_IN_BUCKET (hash_node, tk_node, hash_int(tunnel_key, 0),
> -                             tk_hmap) {
> -        if (hash_node->binding->tunnel_key == tunnel_key) {
> -            return true;
> -        }
> +        ds_destroy(&match);
>      }
> -    return false;
> -}
> -
> -/* Chooses and returns a positive tunnel key that is not already in use in
> - * 'tk_hmap'.  Returns 0 if all tunnel keys are in use. */
> -static uint16_t
> -choose_tunnel_key(const struct hmap *tk_hmap)
> -{
> -    static uint16_t prev;
>
> -    for (uint16_t key = prev + 1; key != prev; key++) {
> -        if (!tunnel_key_in_use(tk_hmap, key)) {
> -            prev = key;
> -            return key;
> +    /* Push changes to the Rule table to database. */
> +    const struct sbrec_rule *sbrule, *next_sbrule;
> +    SBREC_RULE_FOR_EACH_SAFE (sbrule, next_sbrule, ctx->ovnsb_idl) {
> +        struct ovn_datapath *od
> +            = ovn_datapath_from_sbrec(datapaths,
> sbrule->logical_datapath);
> +        if (!od) {
> +            sbrec_rule_delete(sbrule);
> +            continue;
>          }
> -    }
> -
> -    static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> -    VLOG_WARN_RL(&rl, "all tunnel keys exhausted");
> -    return 0;
> -}
> -
> -/*
> - * When a change has occurred in the OVN_Northbound database, we go
> through and
> - * make sure that the contents of the Port_Binding table in the
> OVN_Southbound
> - * database are up to date with the logical ports defined in the
> - * OVN_Northbound database.
> - */
> -static void
> -set_port_bindings(struct northd_context *ctx)
> -{
> -    const struct sbrec_port_binding *binding;
>
> -    /*
> -     * We will need to look up a port binding for every logical port.  We
> don't
> -     * want to have to do an O(n) search for every binding, so start out
> by
> -     * hashing them on the logical port.
> -     *
> -     * As we go through every logical port, we will update the binding if
> it
> -     * exists or create one otherwise.  When the update is done, we'll
> remove
> -     * it from the hashmap.  At the end, any bindings left in the hashmap
> are
> -     * for logical ports that have been deleted.
> -     *
> -     * We index the logical_port column because that's the shared key
> between
> -     * the OVN_NB and OVN_SB databases.  We index the tunnel_key column to
> -     * allow us to choose a unique tunnel key for any Port_Binding rows
> we have
> -     * to add.
> -     */
> -    struct hmap lp_hmap = HMAP_INITIALIZER(&lp_hmap);
> -    struct hmap tk_hmap = HMAP_INITIALIZER(&tk_hmap);
> -
> -    SBREC_PORT_BINDING_FOR_EACH(binding, ctx->ovnsb_idl) {
> -        struct port_binding_hash_node *hash_node = xzalloc(sizeof
> *hash_node);
> -        hash_node->binding = binding;
> -        hmap_insert(&lp_hmap, &hash_node->lp_node,
> -                    hash_string(binding->logical_port, 0));
> -        hmap_insert(&tk_hmap, &hash_node->tk_node,
> -                    hash_int(binding->tunnel_key, 0));
> -    }
> -
> -    const struct nbrec_logical_switch *lswitch;
> -    NBREC_LOGICAL_SWITCH_FOR_EACH (lswitch, ctx->ovnnb_idl) {
> -        const struct uuid *logical_datapath = &lswitch->header_.uuid;
> -
> -        for (size_t i = 0; i < lswitch->n_ports; i++) {
> -            const struct nbrec_logical_port *lport = lswitch->ports[i];
> -            struct port_binding_hash_node *hash_node;
> -            binding = NULL;
> -            HMAP_FOR_EACH_WITH_HASH(hash_node, lp_node,
> -                                    hash_string(lport->name, 0),
> &lp_hmap) {
> -                if (!strcmp(lport->name,
> hash_node->binding->logical_port)) {
> -                    binding = hash_node->binding;
> -                    break;
> -                }
> -            }
> -
> -            if (binding) {
> -                /* We found an existing binding for this logical port.
> Update
> -                 * its contents. */
> -
> -                hmap_remove(&lp_hmap, &hash_node->lp_node);
> -
> -                if (!macs_equal(binding->mac, binding->n_mac,
> -                                lport->macs, lport->n_macs)) {
> -                    sbrec_port_binding_set_mac(binding,
> -                                               (const char **)
> lport->macs,
> -                                               lport->n_macs);
> -                }
> -                if (!parents_equal(binding, lport)) {
> -                    sbrec_port_binding_set_parent_port(binding,
> -
>  lport->parent_name);
> -                }
> -                if (!tags_equal(binding, lport)) {
> -                    sbrec_port_binding_set_tag(binding,
> -                                               lport->tag, lport->n_tag);
> -                }
> -                if (!uuid_equals(&binding->logical_datapath,
> -                                 logical_datapath)) {
> -                    sbrec_port_binding_set_logical_datapath(binding,
> -
> *logical_datapath);
> -                }
> -            } else {
> -                /* There is no binding for this logical port, so create
> one. */
> -
> -                uint16_t tunnel_key = choose_tunnel_key(&tk_hmap);
> -                if (!tunnel_key) {
> -                    continue;
> -                }
> -
> -                binding = sbrec_port_binding_insert(ctx->ovnsb_txn);
> -                sbrec_port_binding_set_logical_port(binding, lport->name);
> -                sbrec_port_binding_set_mac(binding,
> -                                           (const char **) lport->macs,
> -                                           lport->n_macs);
> -                if (lport->parent_name && lport->n_tag > 0) {
> -                    sbrec_port_binding_set_parent_port(binding,
> -
>  lport->parent_name);
> -                    sbrec_port_binding_set_tag(binding,
> -                                               lport->tag, lport->n_tag);
> -                }
> -
> -                sbrec_port_binding_set_tunnel_key(binding, tunnel_key);
> -                sbrec_port_binding_set_logical_datapath(binding,
> -
> *logical_datapath);
> -
> -                /* Add the tunnel key to the tk_hmap so that we don't try
> to
> -                 * use it for another port.  (We don't want it in the
> lp_hmap
> -                 * because that would just get the Binding record deleted
> -                 * later.) */
> -                struct port_binding_hash_node *hash_node
> -                    = xzalloc(sizeof *hash_node);
> -                hash_node->binding = binding;
> -                hmap_insert(&tk_hmap, &hash_node->tk_node,
> -                            hash_int(binding->tunnel_key, 0));
> -            }
> +        struct ovn_rule *rule = ovn_rule_find(
> +            &rules, od, (!strcmp(sbrule->pipeline, "ingress") ? P_IN :
> P_OUT),
> +            sbrule->table_id, sbrule->priority,
> +            sbrule->match, sbrule->actions);
> +        if (rule) {
> +            ovn_rule_destroy(&rules, rule);
> +        } else {
> +            sbrec_rule_delete(sbrule);
>          }
>      }
> -
> -    struct port_binding_hash_node *hash_node;
> -    HMAP_FOR_EACH (hash_node, lp_node, &lp_hmap) {
> -        hmap_remove(&lp_hmap, &hash_node->lp_node);
> -        sbrec_port_binding_delete(hash_node->binding);
> +    struct ovn_rule *rule, *next_rule;
> +    HMAP_FOR_EACH_SAFE (rule, next_rule, hmap_node, &rules) {
> +        sbrule = sbrec_rule_insert(ctx->ovnsb_txn);
> +        sbrec_rule_set_logical_datapath(sbrule, rule->od->sb);
> +        sbrec_rule_set_pipeline(sbrule,
> +                                rule->pipeline == P_IN ? "ingress" :
> "egress");
> +        sbrec_rule_set_table_id(sbrule, rule->table_id);
> +        sbrec_rule_set_priority(sbrule, rule->priority);
> +        sbrec_rule_set_match(sbrule, rule->match);
> +        sbrec_rule_set_actions(sbrule, rule->actions);
> +        ovn_rule_destroy(&rules, rule);
>      }
> -    hmap_destroy(&lp_hmap);
> +    hmap_destroy(&rules);
> +
> +    /* Push changes to the Multicast_Group table to database. */
> +    const struct sbrec_multicast_group *sbmc, *next_sbmc;
> +    SBREC_MULTICAST_GROUP_FOR_EACH_SAFE (sbmc, next_sbmc, ctx->ovnsb_idl)
> {
> +        struct ovn_datapath *od = ovn_datapath_from_sbrec(datapaths,
> +                                                          sbmc->datapath);
> +        if (!od) {
> +            sbrec_multicast_group_delete(sbmc);
> +            continue;
> +        }
>
> -    struct port_binding_hash_node *hash_node_next;
> -    HMAP_FOR_EACH_SAFE (hash_node, hash_node_next, tk_node, &tk_hmap) {
> -        hmap_remove(&tk_hmap, &hash_node->tk_node);
> -        free(hash_node);
> +        struct multicast_group group = { .name = sbmc->name,
> +                                         .key = sbmc->tunnel_key };
> +        struct ovn_multicast *mc = ovn_multicast_find(&mcgroups, od,
> &group);
> +        if (mc) {
> +            ovn_multicast_update_sbrec(mc, sbmc);
> +            ovn_multicast_destroy(&mcgroups, mc);
> +        } else {
> +            sbrec_multicast_group_delete(sbmc);
> +        }
> +    }
> +    struct ovn_multicast *mc, *next_mc;
> +    HMAP_FOR_EACH_SAFE (mc, next_mc, hmap_node, &mcgroups) {
> +        sbmc = sbrec_multicast_group_insert(ctx->ovnsb_txn);
> +        sbrec_multicast_group_set_datapath(sbmc, mc->datapath->sb);
> +        sbrec_multicast_group_set_name(sbmc, mc->group->name);
> +        sbrec_multicast_group_set_tunnel_key(sbmc, mc->group->key);
> +        ovn_multicast_update_sbrec(mc, sbmc);
> +        ovn_multicast_destroy(&mcgroups, mc);
>      }
> -    hmap_destroy(&tk_hmap);
> +    hmap_destroy(&mcgroups);
>  }
> -
> +
>  static void
>  ovnnb_db_changed(struct northd_context *ctx)
>  {
>      VLOG_DBG("ovn-nb db contents have changed.");
>
> -    set_port_bindings(ctx);
> -    build_rule(ctx);
> +    struct hmap datapaths, ports;
> +    build_datapaths(ctx, &datapaths);
> +    build_ports(ctx, &datapaths, &ports);
> +    build_rule(ctx, &datapaths, &ports);
>  }
>
>  /*
> @@ -622,48 +878,48 @@ static void
>  ovnsb_db_changed(struct northd_context *ctx)
>  {
>      struct hmap lports_hmap;
> -    const struct sbrec_port_binding *binding;
> -    const struct nbrec_logical_port *lport;
> +    const struct sbrec_port_binding *sb;
> +    const struct nbrec_logical_port *nb;
>
>      struct lport_hash_node {
>          struct hmap_node node;
> -        const struct nbrec_logical_port *lport;
> +        const struct nbrec_logical_port *nb;
>      } *hash_node, *hash_node_next;
>
>      VLOG_DBG("Recalculating port up states for ovn-nb db.");
>
>      hmap_init(&lports_hmap);
>
> -    NBREC_LOGICAL_PORT_FOR_EACH(lport, ctx->ovnnb_idl) {
> +    NBREC_LOGICAL_PORT_FOR_EACH(nb, ctx->ovnnb_idl) {
>          hash_node = xzalloc(sizeof *hash_node);
> -        hash_node->lport = lport;
> -        hmap_insert(&lports_hmap, &hash_node->node,
> -                hash_string(lport->name, 0));
> +        hash_node->nb = nb;
> +        hmap_insert(&lports_hmap, &hash_node->node, hash_string(nb->name,
> 0));
>      }
>
> -    SBREC_PORT_BINDING_FOR_EACH(binding, ctx->ovnsb_idl) {
> -        lport = NULL;
> +    SBREC_PORT_BINDING_FOR_EACH(sb, ctx->ovnsb_idl) {
> +        nb = NULL;
>          HMAP_FOR_EACH_WITH_HASH(hash_node, node,
> -                hash_string(binding->logical_port, 0), &lports_hmap) {
> -            if (!strcmp(binding->logical_port, hash_node->lport->name)) {
> -                lport = hash_node->lport;
> +                                hash_string(sb->logical_port, 0),
> +                                &lports_hmap) {
> +            if (!strcmp(sb->logical_port, hash_node->nb->name)) {
> +                nb = hash_node->nb;
>                  break;
>              }
>          }
>
> -        if (!lport) {
> +        if (!nb) {
>              /* The logical port doesn't exist for this port binding.
> This can
>               * happen under normal circumstances when ovn-northd hasn't
> gotten
>               * around to pruning the Port_Binding yet. */
>              continue;
>          }
>
> -        if (binding->chassis && (!lport->up || !*lport->up)) {
> +        if (sb->chassis && (!nb->up || !*nb->up)) {
>              bool up = true;
> -            nbrec_logical_port_set_up(lport, &up, 1);
> -        } else if (!binding->chassis && (!lport->up || *lport->up)) {
> +            nbrec_logical_port_set_up(nb, &up, 1);
> +        } else if (!sb->chassis && (!nb->up || *nb->up)) {
>              bool up = false;
> -            nbrec_logical_port_set_up(lport, &up, 1);
> +            nbrec_logical_port_set_up(nb, &up, 1);
>          }
>      }
>
> @@ -753,6 +1009,14 @@ parse_options(int argc OVS_UNUSED, char *argv[]
> OVS_UNUSED)
>      free(short_options);
>  }
>
> +static void
> +add_column_noalert(struct ovsdb_idl *idl,
> +                   const struct ovsdb_idl_column *column)
> +{
> +    ovsdb_idl_add_column(idl, column);
> +    ovsdb_idl_omit_alert(idl, column);
> +}
> +
>  int
>  main(int argc, char *argv[])
>  {
> @@ -792,28 +1056,35 @@ main(int argc, char *argv[])
>      ctx.ovnnb_idl = ovnnb_idl = ovsdb_idl_create(ovnnb_db,
>              &nbrec_idl_class, true, true);
>
> -    /* There is only a small subset of changes to the ovn-sb db that
> ovn-northd
> -     * has to care about, so we'll enable monitoring those directly. */
>      ctx.ovnsb_idl = ovnsb_idl = ovsdb_idl_create(ovnsb_db,
>              &sbrec_idl_class, false, true);
> +
> +    ovsdb_idl_add_table(ovnsb_idl, &sbrec_table_rule);
> +    add_column_noalert(ovnsb_idl, &sbrec_rule_col_logical_datapath);
> +    add_column_noalert(ovnsb_idl, &sbrec_rule_col_pipeline);
> +    add_column_noalert(ovnsb_idl, &sbrec_rule_col_table_id);
> +    add_column_noalert(ovnsb_idl, &sbrec_rule_col_priority);
> +    add_column_noalert(ovnsb_idl, &sbrec_rule_col_match);
> +    add_column_noalert(ovnsb_idl, &sbrec_rule_col_actions);
> +
> +    ovsdb_idl_add_table(ovnsb_idl, &sbrec_table_multicast_group);
> +    add_column_noalert(ovnsb_idl, &sbrec_multicast_group_col_datapath);
> +    add_column_noalert(ovnsb_idl, &sbrec_multicast_group_col_tunnel_key);
> +    add_column_noalert(ovnsb_idl, &sbrec_multicast_group_col_name);
> +    add_column_noalert(ovnsb_idl, &sbrec_multicast_group_col_ports);
> +
> +    ovsdb_idl_add_table(ovnsb_idl, &sbrec_table_datapath_binding);
> +    add_column_noalert(ovnsb_idl, &sbrec_datapath_binding_col_tunnel_key);
> +    add_column_noalert(ovnsb_idl,
> &sbrec_datapath_binding_col_external_ids);
> +
>      ovsdb_idl_add_table(ovnsb_idl, &sbrec_table_port_binding);
> -    ovsdb_idl_add_column(ovnsb_idl, &sbrec_port_binding_col_logical_port);
> +    add_column_noalert(ovnsb_idl, &sbrec_port_binding_col_datapath);
> +    add_column_noalert(ovnsb_idl, &sbrec_port_binding_col_logical_port);
> +    add_column_noalert(ovnsb_idl, &sbrec_port_binding_col_tunnel_key);
> +    add_column_noalert(ovnsb_idl, &sbrec_port_binding_col_parent_port);
> +    add_column_noalert(ovnsb_idl, &sbrec_port_binding_col_tag);
>      ovsdb_idl_add_column(ovnsb_idl, &sbrec_port_binding_col_chassis);
> -    ovsdb_idl_add_column(ovnsb_idl, &sbrec_port_binding_col_mac);
> -    ovsdb_idl_add_column(ovnsb_idl, &sbrec_port_binding_col_tag);
> -    ovsdb_idl_add_column(ovnsb_idl, &sbrec_port_binding_col_parent_port);
> -    ovsdb_idl_add_column(ovnsb_idl,
> &sbrec_port_binding_col_logical_datapath);
> -    ovsdb_idl_add_column(ovnsb_idl, &sbrec_port_binding_col_tunnel_key);
> -    ovsdb_idl_add_column(ovnsb_idl, &sbrec_rule_col_logical_datapath);
> -    ovsdb_idl_omit_alert(ovnsb_idl, &sbrec_rule_col_logical_datapath);
> -    ovsdb_idl_add_column(ovnsb_idl, &sbrec_rule_col_table_id);
> -    ovsdb_idl_omit_alert(ovnsb_idl, &sbrec_rule_col_table_id);
> -    ovsdb_idl_add_column(ovnsb_idl, &sbrec_rule_col_priority);
> -    ovsdb_idl_omit_alert(ovnsb_idl, &sbrec_rule_col_priority);
> -    ovsdb_idl_add_column(ovnsb_idl, &sbrec_rule_col_match);
> -    ovsdb_idl_omit_alert(ovnsb_idl, &sbrec_rule_col_match);
> -    ovsdb_idl_add_column(ovnsb_idl, &sbrec_rule_col_actions);
> -    ovsdb_idl_omit_alert(ovnsb_idl, &sbrec_rule_col_actions);
> +    add_column_noalert(ovnsb_idl, &sbrec_port_binding_col_mac);
>
>      /*
>       * The loop here just runs the IDL in a loop waiting for the seqno to
> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
> index 0334d82..0af96a0 100644
> --- a/ovn/ovn-architecture.7.xml
> +++ b/ovn/ovn-architecture.7.xml
> @@ -98,7 +98,7 @@
>          OVN/CMS Plugin.  The database schema is meant to be ``impedance
>          matched'' with the concepts used in a CMS, so that it directly
> supports
>          notions of logical switches, routers, ACLs, and so on.  See
> -        <code>ovs-nb</code>(5) for details.
> +        <code>ovn-nb</code>(5) for details.
>        </p>
>
>        <p>
> @@ -343,22 +343,21 @@
>      </li>
>
>      <li>
> -      <code>ovn-northd</code> receives the OVN Northbound database update.
> -      In turn, it makes the corresponding updates to the OVN Southbound
> -      database, by adding rows to the OVN Southbound database
> -      <code>Rule</code> table to reflect the new port, e.g. add a
> -      flow to recognize that packets destined to the new port's MAC
> -      address should be delivered to it, and update the flow that
> -      delivers broadcast and multicast packets to include the new port.
> -      It also creates a record in the <code>Binding</code> table and
> +      <code>ovn-northd</code> receives the OVN Northbound database
> update.  In
> +      turn, it makes the corresponding updates to the OVN Southbound
> database,
> +      by adding rows to the OVN Southbound database <code>Rule</code>
> table to
> +      reflect the new port, e.g. add a flow to recognize that packets
> destined
> +      to the new port's MAC address should be delivered to it, and update
> the
> +      flow that delivers broadcast and multicast packets to include the
> new
> +      port.  It also creates a record in the <code>Binding</code> table
> and
>        populates all its columns except the column that identifies the
>        <code>chassis</code>.
>      </li>
>
>      <li>
>        On every hypervisor, <code>ovn-controller</code> receives the
> -      <code>Rule</code> table updates that <code>ovn-northd</code> made
> -      in the previous step.  As long as the VM that owns the VIF is
> powered off,
> +      <code>Rule</code> table updates that <code>ovn-northd</code> made
> in the
> +      previous step.  As long as the VM that owns the VIF is powered off,
>        <code>ovn-controller</code> cannot do much; it cannot, for example,
>        arrange to send packets to or receive packets from the VIF, because
> the
>        VIF does not actually exist anywhere.
> @@ -404,8 +403,8 @@
>        <code>Binding</code> table.  This provides
> <code>ovn-controller</code>
>        the physical location of the logical port, so each instance updates
> the
>        OpenFlow tables of its switch (based on logical datapath flows in
> the OVN
> -      DB <code>Rule</code> table) so that packets to and from the VIF can
> -      be properly handled via tunnels.
> +      DB <code>Rule</code> table) so that packets to and from the VIF can
> be
> +      properly handled via tunnels.
>      </li>
>
>      <li>
> @@ -442,17 +441,16 @@
>
>      <li>
>        <code>ovn-northd</code> receives the OVN Northbound update and in
> turn
> -      updates the OVN Southbound database accordingly, by removing or
> -      updating the rows from the OVN Southbound database
> -      <code>Rule</code> table and <code>Binding</code> table that
> -      were related to the now-destroyed VIF.
> +      updates the OVN Southbound database accordingly, by removing or
> updating
> +      the rows from the OVN Southbound database <code>Rule</code> table
> and
> +      <code>Binding</code> table that were related to the now-destroyed
> VIF.
>      </li>
>
>      <li>
>        On every hypervisor, <code>ovn-controller</code> receives the
> -      <code>Rule</code> table updates that <code>ovn-northd</code> made
> -      in the previous step.  <code>ovn-controller</code> updates OpenFlow
> tables
> -      to reflect the update, although there may not be much to do, since
> the VIF
> +      <code>Rule</code> table updates that <code>ovn-northd</code> made
> in the
> +      previous step.  <code>ovn-controller</code> updates OpenFlow tables
> to
> +      reflect the update, although there may not be much to do, since the
> VIF
>        had already become unreachable when it was removed from the
>        <code>Binding</code> table in a previous step.
>      </li>
> @@ -538,13 +536,12 @@
>      </li>
>
>      <li>
> -      <code>ovn-northd</code> receives the OVN Northbound database update.
> -      In turn, it makes the corresponding updates to the OVN Southbound
> -      database, by adding rows to the OVN Southbound database's
> -      <code>Rule</code> table to reflect the new port and also by
> -      creating a new row in the <code>Binding</code> table and
> -      populating all its columns except the column that identifies the
> -      <code>chassis</code>.
> +      <code>ovn-northd</code> receives the OVN Northbound database
> update.  In
> +      turn, it makes the corresponding updates to the OVN Southbound
> database,
> +      by adding rows to the OVN Southbound database's <code>Rule</code>
> table
> +      to reflect the new port and also by creating a new row in the
> +      <code>Binding</code> table and populating all its columns except the
> +      column that identifies the <code>chassis</code>.
>      </li>
>
>      <li>
> @@ -580,11 +577,10 @@
>
>      <li>
>        <code>ovn-northd</code> receives the OVN Northbound update and in
> turn
> -      updates the OVN Southbound database accordingly, by removing or
> -      updating the rows from the OVN Southbound database
> -      <code>Rule</code> table that were related to the now-destroyed
> -      CIF.  It also deletes the row in the <code>Binding</code> table
> -      for that CIF.
> +      updates the OVN Southbound database accordingly, by removing or
> updating
> +      the rows from the OVN Southbound database <code>Rule</code> table
> that
> +      were related to the now-destroyed CIF.  It also deletes the row in
> the
> +      <code>Binding</code> table for that CIF.
>      </li>
>
>      <li>
> @@ -595,57 +591,304 @@
>      </li>
>    </ol>
>
> -  <h1>Design Decisions</h1>
> +  <h2>Life Cycle of a Packet</h2>
>
> -  <h2>Supported Tunnel Encapsulations</h2>
>    <p>
> -    For connecting hypervisors to each other, the only supported tunnel
> -    encapsulations are Geneve and STT. Hypervisors may use VXLAN to
> -    connect to gateways. We have limited support to these encapsulations
> -    for the following reasons:
> +    This section describes how a packet travels from ingress into OVN
> from one
> +    virtual machine or container to another.  This description focuses on
> the
> +    physical treatment of a packet; for a description of the logical life
> cycle
> +    of a packet, please refer to the <code>Rule</code> table in
> +    <code>ovn-sb</code>(5).
>    </p>
>
> -  <ul>
> +  <p>
> +    This section mentions several data and metadata fields, for clarity
> +    summarized here:
> +  </p>
> +
> +  <dl>
> +    <dt>tunnel key</dt>
> +    <dd>
> +      When OVN encapsulates a packet in Geneve or another tunnel, it
> attaches
> +      extra data to it to allow the receiving OVN instance to process it
> +      correctly.  This takes different forms depending on the particular
> +      encapsulation, but in each case we refer to it here as the ``tunnel
> +      key.''  See <code>Tunnel Encapsulations</code>, below, for details.
> +    </dd>
> +
> +    <dt>logical datapath field</dt>
> +    <dd>
> +      A field that denotes the logical datapath through which a packet is
> being
> +      processed.  OVN uses the field that OpenFlow 1.1+ simply (and
> +      confusingly) calls ``metadata'' to store the logical datapath.
> (This
> +      field is passed across tunnels as part of the tunnel key.)
> +    </dd>
> +
> +    <dt>logical input port field</dt>
> +    <dd>
> +      A field that denotes the logical port from which the packet entered
> the
> +      logical datapath.  OVN stores this in a Nicira extension register.
> (This
> +      field is passed across tunnels as part of the tunnel key.)
> +    </dd>
> +
> +    <dt>logical output port field</dt>
> +    <dd>
> +      A field that denotes the logical port from which the packet will
> leave
> +      the logical datapath.  This is initialized to 0 at the beginning of
> the
> +      logical ingress pipeline.  OVN stores this in a Nicira extension
> +      register.  (This field is passed across tunnels as part of the
> tunnel
> +      key.)
> +    </dd>
> +
> +    <dt>VLAN ID</dt>
> +    <dd>
> +      The VLAN ID is used as an interface between OVN and containers
> nested
> +      inside a VM (see <code>Life Cycle of a container interface inside a
> +      VM</code>, above, for more information).
> +    </dd>
> +  </dl>
> +
> +  <p>
> +    Initially, a VM or container on the ingress hypervisor sends a packet
> on a
> +    port attached to the OVN integration bridge.  Then:
> +  </p>
> +
> +  <ol>
>      <li>
>        <p>
> -        They support large amounts of metadata.  In addition to
> -        specifying the logical switch, we will likely want to indicate
> -        the logical source port and where we are in the logical
> -        pipeline.  Geneve supports a 24-bit VNI field and TLV-based
> -        extensions.  The header of STT includes a 64-bit context id.
> +        OpenFlow table 0 performs physical-to-logical translation.  It
> matches
> +        the packet's ingress port.  Its actions annotate the packet with
> +        logical metadata, by setting the logical datapath field to
> identify the
> +        logical datapath that the packet is traversing and the logical
> input
> +        port field to identify the ingress port.  Then it resubmits to
> table 16
> +        to enter the logical ingress pipeline.
> +      </p>
> +
> +      <p>
> +        Packets that originate from a container nested within a VM are
> treated
> +        in a slightly different way.  The originating container can be
> +        distinguished based on the VLAN ID, so the physical-to-logical
> +        translation flows additionally match on VLAN ID and the actions
> strip
> +        the VLAN header.  Following this step, OVN treats packets from
> +        containers just like any other packets.
> +      </p>
> +
> +      <p>
> +        Table 0 also processes packets that arrive from other
> hypervisors.  It
> +        distinguishes them from other packets by ingress port, which is a
> +        tunnel.  As with packets just entering the OVN pipeline, the
> actions
> +        annotate these packets with logical datapath and logical ingress
> port
> +        metadata.  In addition, the actions set the logical output port
> field,
> +        which is available because in OVN tunneling occurs after the
> logical
> +        output port is known.  These three pieces of information are
> obtained
> +        from the tunnel encapsulation metadata (see <code>Tunnel
> +        Encapsulations</code> for encoding details).  Then the actions
> resubmit
> +        to table 33 to enter the logical egress pipeline.
>        </p>
>      </li>
>
>      <li>
>        <p>
> -        They use randomized UDP or TCP source ports that allows
> -        efficient distribution among multiple paths in environments that
> -        use ECMP in their underlay.
> +        OpenFlow tables 16 through 31 execute the logical ingress
> pipeline from
> +        the <code>Rule</code> table in the OVN Southbound database.  These
> +        tables are expressed entirely in terms of logical concepts like
> logical
> +        ports and logical datapaths.  A big part of
> +        <code>ovn-controller</code>'s job is to translate them into
> equivalent
> +        OpenFlow (in particular it translates the table numbers:
> +        <code>Rule</code> tables 0 through 15 become OpenFlow tables 16
> through
> +        31).  For a given packet, the logical ingress pipeline eventually
> +        executes zero or more <code>output</code> actions:
>        </p>
> +
> +      <ul>
> +        <li>
> +          If the pipeline executes no <code>output</code> actions at all,
> the
> +          packet is effectively dropped.
> +        </li>
> +
> +        <li>
> +          Most commonly, the pipeline executes one <code>output</code>
> action,
> +          which <code>ovn-controller</code> implements by resubmitting the
> +          packet to table 32.
> +        </li>
> +
> +        <li>
> +          If the pipeline can execute more than one <code>output</code>
> action,
> +          then each one is separately resubmitted to table 32.  This can
> be
> +          used to send multiple copies to the packet to multiple ports.
> (If
> +          the packet was not modified between the <code>output</code>
> actions,
> +          and some of the copies are destined to the same hypervisor, then
> +          using a logical multicast output port would save bandwidth
> between
> +          hypervisors.)
> +        </li>
> +      </ul>
>      </li>
>
>      <li>
>        <p>
> -        NICs are available that accelerate encapsulation and
> decapsulation.
> +        OpenFlow tables 32 through 47 implement the <code>output</code>
> action
> +        in the the logical ingress pipeline.  Specifically, table 32
> handles
> +        packets to remote hypervisors, table 33 handles packets to the
> local
> +        hypervisor, and table 34 discards packets whose logical ingress
> and
> +        egress port are the same.
> +      </p>
> +
> +      <p>
> +        Each flow in table 32 matches on a logical output port for
> unicast or
> +        multicast logical ports that include a logical port on a remote
> +        hypervisor.  Each flow's actions implement sending a packet to
> the port
> +        it matches.  For unicast logical output ports on remote
> hypervisors,
> +        the actions set the tunnel key to the correct value, then send the
> +        packet on the tunnel port to the correct hypervisor.  (When the
> remote
> +        hypervisor receives the packet, table 0 there will recognize it
> as a
> +        tunneled packet and pass it along to table 33.)  For multicast
> logical
> +        output ports, the actions send one copy of the packet to each
> remote
> +        hypervisor, in the same way as for unicast destinations.  If a
> +        multicast group includes a logical port or ports on the local
> +        hypervisor, then its actions also resubmit to table 33.  Table 32
> also
> +        includes a fallback flow that resubmits to table 33 if there is no
> +        other match.
> +      </p>
> +
> +      <p>
> +        Flows in table 33 resemble those in table 32 but for logical
> ports that
> +        reside locally rather than remotely.  For unicast logical output
> ports
> +        on the local hypervisor, the actions just resubmit to table 34.
> For
> +        multicast output ports that include one or more logical ports on
> the
> +        local hypervisor, for each such logical port <var>P</var>, the
> actions
> +        change the logical output port to <var>P</var>, then resubmit to
> table
> +        34.
> +      </p>
> +
> +      <p>
> +        Table 34 matches and drops packets for which the logical input and
> +        output ports are the same.  It resubmits other packets to table
> 48.
>        </p>
>      </li>
> +
> +    <li>
> +      <p>
> +        OpenFlow tables 48 through 63 execute the logical egress pipeline
> from
> +        the <code>Rule</code> table in the OVN Southbound database.  The
> +        egress pipeline can perform a final stage of validation before
> packet
> +        delivery.  Eventually, it may execute an <code>output</code>
> action,
> +        which <code>ovn-controller</code> implements by resubmitting to
> table
> +        64.  A packet for which the pipeline never executes
> <code>output</code>
> +        is effectively dropped (although it may have been transmitted
> through a
> +        tunnel across a physical network).
> +      </p>
> +
> +      <p>
> +        The egress pipeline cannot change the logical output port or cause
> +        further tunneling.
> +      </p>
> +    </li>
> +
> +    <li>
> +      <p>
> +        OpenFlow table 64 performs logical-to-physical translation, the
> +        opposite of table 0.  It matches the packet's logical egress
> port.  Its
> +        actions output the packet to the port attached to the OVN
> integration
> +        bridge that represents that logical port.  If the logical egress
> port
> +        is a container nested with a VM, then before sending the packet
> the
> +        actions push on a VLAN header with an appropriate VLAN ID.
> +      </p>
> +    </li>
> +  </ol>
> +
> +  <h1>Design Decisions</h1>
> +
> +  <h2>Tunnel Encapsulations</h2>
> +
> +  <p>
> +    OVN annotates logical network packets that it sends from one
> hypervisor to
> +    another with the following three pieces of metadata, which are
> encoded in
> +    an encapsulation-specific fashion:
> +  </p>
> +
> +  <ul>
> +    <li>
> +      24-bit logical datapath identifier, from the <code>tunnel_key</code>
> +      column in the OVN Southbound <code>Datapath_Binding</code> table.
> +    </li>
> +
> +    <li>
> +      15-bit logical ingress port identifier, from the
> <code>tunnel_key</code>
> +      column in the OVN Southbound <code>Port_Binding</code> table.
> +    </li>
> +
> +    <li>
> +      16-bit logical egress port identifier, from the
> <code>Port_Binding</code>
> +      <code>tunnel_key</code> column in the OVN Southbound
> +      <code>Port_Binding</code> (as for the logical ingress port) or
> +      <code>Multicast_Group</code> table.
> +    </li>
> +  </ul>
> +
> +  <p>
> +    For hypervisor-to-hypervisor traffic, OVN supports only Geneve and STT
> +    encapsulations, for the following reasons:
> +  </p>
> +
> +  <ul>
> +    <li>
> +      Only STT and Geneve support the large amounts of metadata (over 32
> bits
> +      per packet) that OVN uses (as described above).
> +    </li>
> +
> +    <li>
> +      STT and Geneve use randomized UDP or TCP source ports that allows
> +      efficient distribution among multiple paths in environments that
> use ECMP
> +      in their underlay.
> +    </li>
> +
> +    <li>
> +      NICs are available to offload STT and Geneve encapsulation and
> +      decapsulation.
> +    </li>
>    </ul>
>
>    <p>
> -    Due to its flexibility, the preferred encapsulation between
> -    hypervisors is Geneve.  Some environments may want to use STT for
> -    performance reasons until the NICs they use support hardware offload
> -    of Geneve.
> +    Due to its flexibility, the preferred encapsulation between
> hypervisors is
> +    Geneve.  For Geneve encapsulation, OVN transmits the logical datapath
> +    identifier in the Geneve VNI.
> +
> +    <!-- Keep the following in sync with ovn/controller/physical.h. -->
> +    OVN transmits the logical ingress and logical egress ports in a TLV
> with
> +    class 0xffff, type 0, and a 32-bit value encoded as follows, from MSB
> to
> +    LSB:
> +  </p>
> +
> +  <diagram>
> +    <header name="">
> +      <bits name="rsv" above="1" below="0" width=".25"/>
> +      <bits name="ingress port" above="15" width=".75"/>
> +      <bits name="egress port" above="16" width=".75"/>
> +    </header>
> +  </diagram>
> +
> +  <p>
> +    Environments whose NICs lack Geneve offload may prefer STT
> encapsulation
> +    for performance reasons.  For STT encapsulation, OVN encodes all three
> +    pieces of logical metadata in the STT 64-bit tunnel ID as follows,
> from MSB
> +    to LSB:
>    </p>
>
> +  <diagram>
> +    <header name="">
> +      <bits name="reserved" above="9" below="0" width=".5"/>
> +      <bits name="ingress port" above="15" width=".75"/>
> +      <bits name="egress port" above="16" width=".75"/>
> +      <bits name="datapath" above="24" width="1.25"/>
> +    </header>
> +  </diagram>
> +
>    <p>
> -    For connecting to gateways, the only supported tunnel encapsulations
> -    are VXLAN, Geneve, and STT.  While support for Geneve is becoming
> -    available for TOR (top-of-rack) switches, VXLAN is far more common.
> -    Currently, gateways have a feature set that matches the capabilities
> -    as defined by the VTEP schema, so fewer bits of metadata are
> -    necessary.  In the future, gateways that do not support
> -    encapsulations with large amounts of metadata may continue to have a
> -    reduced feature set.
> +    For connecting to gateways, in addition to Geneve and STT, OVN
> supports
> +    VXLAN, because only VXLAN support is common on top-of-rack (ToR)
> switch.
> +    Currently, gateways have a feature set that matches the capabilities
> as
> +    defined by the VTEP schema, so fewer bits of metadata are necessary.
> In
> +    the future, gateways that do not support encapsulations with large
> amounts
> +    of metadata may continue to have a reduced feature set.
>    </p>
>  </manpage>
> diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
> index f2993a2..059e48a 100644
> --- a/ovn/ovn-nb.xml
> +++ b/ovn/ovn-nb.xml
> @@ -201,15 +201,14 @@
>      </column>
>
>      <column name="match">
> -      The packets that the ACL should match, in the same expression
> -      language used for the <ref column="match" table="Rule"
> -      db="OVN_Southbound"/> column in the OVN Southbound database's <ref
> -      table="Rule" db="OVN_Southbound"/> table.  Match
> -      <code>inport</code> and <code>outport</code> against names of
> -      logical ports within <ref column="lswitch"/> to implement ingress
> -      and egress ACLs, respectively.  In logical switches connected to
> -      logical routers, the special port name <code>ROUTER</code> refers
> -      to the logical router port.
> +      The packets that the ACL should match, in the same expression
> language
> +      used for the <ref column="match" table="Rule" db="OVN_Southbound"/>
> +      column in the OVN Southbound database's <ref table="Rule"
> +      db="OVN_Southbound"/> table.  Match <code>inport</code> and
> +      <code>outport</code> against names of logical ports within <ref
> +      column="lswitch"/> to implement ingress and egress ACLs,
> respectively.
> +      In logical switches connected to logical routers, the special port
> name
> +      <code>ROUTER</code> refers to the logical router port.
>      </column>
>
>      <column name="action">
> diff --git a/ovn/ovn-sb.ovsschema b/ovn/ovn-sb.ovsschema
> index add908b..9c2e553 100644
> --- a/ovn/ovn-sb.ovsschema
> +++ b/ovn/ovn-sb.ovsschema
> @@ -34,24 +34,56 @@
>                                                "max": "unlimited"}}}},
>          "Rule": {
>              "columns": {
> -                "logical_datapath": {"type": "uuid"},
> +                "logical_datapath": {"type": {"key": {"type": "uuid",
> +                                                      "refTable":
> "Datapath_Binding"}}},
> +                "pipeline": {"type": {"key": {"type": "string",
> +                                      "enum": ["set", ["ingress",
> +                                                       "egress"]]}}},
>                  "table_id": {"type": {"key": {"type": "integer",
>                                                "minInteger": 0,
> -                                              "maxInteger": 31}}},
> +                                              "maxInteger": 15}}},
>                  "priority": {"type": {"key": {"type": "integer",
>                                                "minInteger": 0,
>                                                "maxInteger": 65535}}},
>                  "match": {"type": "string"},
>                  "actions": {"type": "string"}},
>              "isRoot": true},
> +        "Multicast_Group": {
> +            "columns": {
> +                "datapath": {"type": {"key": {"type": "uuid",
> +                                              "refTable":
> "Datapath_Binding"}}},
> +                "name": {"type": "string"},
> +                "tunnel_key": {
> +                    "type": {"key": {"type": "integer",
> +                                     "minInteger": 32768,
> +                                     "maxInteger": 65535}}},
> +                "ports": {"type": {"key": {"type": "uuid",
> +                                           "refTable": "Port_Binding",
> +                                           "refType": "weak"},
> +                                   "min": 1, "max": "unlimited"}}},
> +            "indexes": [["datapath", "tunnel_key"],
> +                        ["datapath", "name"]],
> +            "isRoot": true},
> +        "Datapath_Binding": {
> +            "columns": {
> +                "tunnel_key": {
> +                     "type": {"key": {"type": "integer",
> +                                      "minInteger": 1,
> +                                      "maxInteger": 16777215}}},
> +                "external_ids": {
> +                    "type": {"key": "string", "value": "string",
> +                             "min": 0, "max": "unlimited"}}},
> +            "indexes": [["tunnel_key"]],
> +            "isRoot": true},
>          "Port_Binding": {
>              "columns": {
> -                "logical_datapath": {"type": "uuid"},
>                  "logical_port": {"type": "string"},
> +                "datapath": {"type": {"key": {"type": "uuid",
> +                                              "refTable":
> "Datapath_Binding"}}},
>                  "tunnel_key": {
>                       "type": {"key": {"type": "integer",
>                                        "minInteger": 1,
> -                                      "maxInteger": 65535}}},
> +                                      "maxInteger": 32767}}},
>                  "parent_port": {"type": {"key": "string", "min": 0,
> "max": 1}},
>                  "tag": {
>                       "type": {"key": {"type": "integer",
> @@ -65,6 +97,6 @@
>                  "mac": {"type": {"key": "string",
>                                   "min": 0,
>                                   "max": "unlimited"}}},
> -            "indexes": [["logical_port"], ["tunnel_key"]],
> +            "indexes": [["datapath", "tunnel_key"], ["logical_port"]],
>              "isRoot": true}},
>      "version": "1.0.0"}
> diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
> index 2f2a55e..982eba7 100644
> --- a/ovn/ovn-sb.xml
> +++ b/ovn/ovn-sb.xml
> @@ -74,15 +74,16 @@
>    </p>
>
>    <p>
> -    The <ref table="Rule"/> table is currently the only LN table.
> +    <ref table="Rule"/> and <ref table="Multicast_Group"/> contain LN
> data.
>    </p>
>
>    <h3>Bindings data</h3>
>
>    <p>
> -    The Binding tables contain the current placement of logical components
> -    (such as VMs and VIFs) onto chassis and the bindings between logical
> ports
> -    and MACs.
> +    Bindings data link logical and physical components.  They show the
> current
> +    placement of logical components (such as VMs and VIFs) onto chassis,
> and
> +    map logical entities to the values that represent them in tunnel
> +    encapsulations.
>    </p>
>
>    <p>
> @@ -98,9 +99,32 @@
>    </p>
>
>    <p>
> -    The <ref table="Port_Binding"/> table is currently the only binding
> data.
> +    The <ref table="Port_Binding"/> and <ref table="Datapath_Binding"/>
> tables
> +    contain binding data.
>    </p>
>
> +  <h2>Common Columns</h2>
> +
> +  <p>
> +    Some tables contain a special column named
> <code>external_ids</code>.  This
> +    column has the same form and purpose each place that it appears, so we
> +    describe it here to save space later.
> +  </p>
> +
> +  <dl>
> +    <dt><code>external_ids</code>: map of string-string pairs</dt>
> +    <dd>
> +      Key-value pairs for use by the software that manages the OVN
> Southbound
> +      database rather than by <code>ovn-controller</code>.  In particular,
> +      <code>ovn-northd</code> can use key-value pairs in this column to
> relate
> +      entities in the southbound database to higher-level entities (such
> as
> +      entities in the OVN Northbound database).  Individual key-value
> pairs in
> +      this column may be documented in some cases to aid in understanding
> and
> +      troubleshooting, but the reader should not mistake such
> documentation as
> +      comprehensive.
> +    </dd>
> +  </dl>
> +
>    <table name="Chassis" title="Physical Network Hypervisor and Gateway
> Information">
>      <p>
>        Each row in this table represents a hypervisor or gateway (a
> chassis) in
> @@ -198,7 +222,7 @@
>      </column>
>    </table>
>
> -  <table name="Rule" title="Logical Network Rule">
> +  <table name="Rule" title="Logical Network Flows">
>      <p>
>        Each row in this table represents one logical flow.  The cloud
> management
>        system, via its OVN integration, populates this table with logical
> flows
> @@ -223,14 +247,111 @@
>        The default action when no flow matches is to drop packets.
>      </p>
>
> +    <p><em>Logical Life Cycle of a Packet</em></p>
> +
> +    <p>
> +      This following description focuses on the life cycle of a packet
> through
> +      a logical datapath, ignoring physical details of the implementation.
> +      Please refer to <em>Life Cycle of a Packet</em> in
> +      <code>ovn-architecture</code>(7) for the physical information.
> +    </p>
> +
> +    <p>
> +      The description here is written as if OVN itself executes these
> steps,
> +      but in fact OVN (that is, <code>ovn-controller</code>) programs Open
> +      vSwitch, via OpenFlow and OVSDB, to execute them on its behalf.
> +    </p>
> +
> +    <p>
> +      At a high level, OVN passes each packet through the logical
> datapath's
> +      logical ingress pipeline, which may output the packet to one or more
> +      logical port or logical multicast groups.  For each such logical
> output
> +      port, OVN passes the packet through the datapath's logical egress
> +      pipeline, which may either drop the packet or deliver it to the
> +      destination.  Between the two pipelines, outputs to logical
> multicast
> +      groups are expanded into logical ports, so that the egress pipeline
> only
> +      processes a single logical output port at a time.  Between the two
> +      pipelines is also where, when necessary, OVN encapsulates a packet
> in a
> +      tunnel (or tunnels) to transmit to remote hypervisors.
> +    </p>
> +
> +    <p>
> +      In more detail, to start, OVN searches the <ref table="Rule"/>
> table for
> +      a row with correct <ref column="logical_datapath"/>, a <ref
> +      column="pipeline"/> of <code>ingress</code>, a <ref
> column="table_id"/>
> +      of 0, and a <ref column="match"/> that is true for the packet.  If
> none
> +      is found, OVN drops the packet.  If OVN finds more than one, it
> chooses
> +      the match with the highest <ref column="priority"/>.  Then OVN
> executes
> +      each of the actions specified in the row's <ref table="actions"/>
> column,
> +      in the order specified.  Some actions, such as those to modify
> packet
> +      headers, require no further details.  The <code>next</code> and
> +      <code>output</code> actions are special.
> +    </p>
> +
> +    <p>
> +      The <code>next</code> action causes the above process to be repeated
> +      recursively, except that OVN searches for <ref column="table_id"/>
> of 1
> +      instead of 0.  Similarly, any <code>next</code> action in a row
> found in
> +      that table would cause a further search for a <ref
> column="table_id"/> of
> +      2, and so on.  When recursive processing completes, flow control
> returns
> +      to the action following <code>next</code>.
> +    </p>
> +
> +    <p>
> +      The <code>output</code> action also introduces recursion.  Its
> effect
> +      depends on the current value of the <code>outport</code> field.
> Suppose
> +      <code>outport</code> designates a logical port.  First, OVN compares
> +      <code>inport</code> to <code>outport</code>; if they are equal, it
> treats
> +      the <code>output</code> as a no-op.  In the common case, where they
> are
> +      different, the packet enters the egress pipeline.  This transition
> to the
> +      egress pipeline discards register data (<code>reg0</code>
> +      ... <code>reg5</code>).
> +    </p>
> +
> +    <p>
> +      To execute the egress pipeline, OVN again searches the <ref
> +      table="Rule"/> table for a row with correct <ref
> +      column="logical_datapath"/>, a <ref column="table_id"/> of 0, a <ref
> +      column="match"/> that is true for the packet, but now looking for a
> <ref
> +      column="pipeline"/> of <code>egress</code>.  If no matching row is
> found,
> +      the output becomes a no-op.  Otherwise, OVN executes the actions
> for the
> +      matching flow (which is chosen from multiple, if necessary, as
> already
> +      described).
> +    </p>
> +
> +    <p>
> +      In the <code>egress</code> pipeline, the <code>next</code> action
> acts as
> +      already described, except that it, of course, searches for
> +      <code>egress</code> flows.  The <code>output</code> action,
> however, now
> +      directly outputs the packet to the output port (which is now fixed,
> +      because <code>outport</code> is read-only within the egress
> pipeline).
> +    </p>
> +
> +    <p>
> +      The description earlier assumed that <code>outport</code> referred
> to a
> +      logical port.  If it instead designates a logical multicast group,
> then
> +      the description above still applies, with the addition of fan-out
> from
> +      the logical multicast group to each logical port in the group.  For
> each
> +      member of the group, OVN executes the logical pipeline as
> described, with
> +      the logical output port replaced by the group member.
> +    </p>
> +
>      <column name="logical_datapath">
>        The logical datapath to which the logical flow belongs.  A logical
>        datapath implements a logical pipeline among the ports in the <ref
> -      table="Port_Binding"/> table associated with it.  (No table
> represents a
> -      logical datapath.)  In practice, the pipeline in a given logical
> datapath
> -      implements either a logical switch or a logical router, and
> -      <code>ovn-northd</code> reuses the UUIDs for those logical entities
> from
> -      the <code>OVN_Northbound</code> for logical datapaths.
> +      table="Port_Binding"/> table associated with it.  In practice, the
> +      pipeline in a given logical datapath implements either a logical
> switch
> +      or a logical router, and <code>ovn-northd</code> reuses the UUIDs
> for
> +      those logical entities from the <code>OVN_Northbound</code> for
> logical
> +      datapaths.
> +    </column>
> +
> +    <column name="pipeline">
> +      <p>
> +        The primary flows used for deciding on a packet's destination are
> the
> +        <code>ingress</code> flows.  The <code>egress</code> flows
> implement
> +        ACLs.  See <em>Logical Life Cycle of a Packet</em>, above, for
> details.
> +      </p>
>      </column>
>
>      <column name="table_id">
> @@ -449,11 +570,7 @@
>
>        <p>
>          String constants have the same syntax as quoted strings in JSON
> (thus,
> -        they are Unicode strings).  String constants are used for naming
> -        logical ports.  Thus, the useful values are <ref
> -        column="logical_port"/> names from the <ref
> column="Port_Binding"/> and
> -        <ref column="Gateway"/> tables in a logical flow's <ref
> -        column="logical_datapath"/>.
> +        they are Unicode strings).
>        </p>
>
>        <p>
> @@ -524,10 +641,21 @@
>
>        <p><em>Symbols</em></p>
>
> +      <p>
> +        Most of the symbols below have integer type.  Only
> <code>inport</code>
> +        and <code>outport</code> have string type.  <code>inport</code>
> names a
> +        logical port.  Thus, its value is a <ref column="logical_port"/>
> names
> +        from the <ref table="Port_Binding"/> and <ref table="Gateway"/>
> tables
> +        in a logical flow's <ref column="logical_datapath"/>.
> +        <code>outport</code> may name a logical port, as
> <code>inport</code>.
> +        It may also name a logical multicast group defined in the <ref
> +        table="Multicast_Group"/> table.
> +      </p>
> +
>        <ul>
>          <li>
> -          <code>metadata</code> <code>reg0</code> ... <code>reg7</code>
> -          <code>xreg0</code> ... <code>xreg3</code>
> +          <code>reg0</code>...<code>reg5</code>
> +          <code>xreg0</code>...<code>xreg2</code>
>          </li>
>          <li><code>inport</code> <code>outport</code>
> <code>queue</code></li>
>          <li><code>eth.src</code> <code>eth.dst</code>
> <code>eth.type</code></li>
> @@ -562,17 +690,32 @@
>        </p>
>
>        <p>
> -        The following actions will be initially supported:
> +       The following actions are defined:
>        </p>
>
>        <dl>
>          <dt><code>output;</code></dt>
>          <dd>
> -          Outputs the packet to the logical port current designated by
> -          <code>outport</code>.  Output to the ingress port is implicitly
> -          dropped, that is, <code>output</code> becomes a no-op if
> -          <code>outport</code> == <code>inport</code>.
> -        </dd>
> +          <p>
> +           In an <code>ingress</code> flow, this action executes the
> +           <code>egress</code> pipeline as a subroutine.  If
> +           <code>outport</code> names a logical port, the egress pipeline
> +           executes once; if it is a multicast group, the egress pipeline
> runs
> +           once for each logical port in the group.
> +          </p>
> +
> +          <p>
> +            In an <code>egress</code> flow, this action performs the
> actual
> +            output to the <code>outport</code> logical port.  (In the
> egress
> +            pipeline, <code>outport</code> never names a multicast group.)
> +          </p>
> +
> +          <p>
> +            Output to the input port is implicitly dropped, that is,
> +            <code>output</code> becomes a no-op if <code>outport</code> ==
> +            <code>inport</code>.
> +          </p>
> +       </dd>
>
>          <dt><code>next;</code></dt>
>          <dd>
> @@ -581,21 +724,32 @@
>
>          <dt><code><var>field</var> = <var>constant</var>;</code></dt>
>          <dd>
> -          Sets data or metadata field <var>field</var> to constant value
> -          <var>constant</var>, e.g. <code>outport = "vif0";</code> to set
> the
> -          logical output port.  Assigning to a field with prerequisites
> -          implicitly adds those prerequisites to <ref column="match"/>;
> thus,
> -          for example, a flow that sets <code>tcp.dst</code> applies only
> to
> -          TCP flows, regardless of whether its <ref column="match"/>
> mentions
> -          any TCP field.  To set only a subset of bits in a field,
> -          <var>field</var> may be a subfield or <var>constant</var> may be
> -          masked, e.g. <code>vlan.pcp[2] = 1;</code> and <code>vlan.pcp =
> -          4/4;</code> both set the most sigificant bit of the VLAN PCP.
> Not
> -          all fields are modifiable (e.g. <code>eth.type</code> and
> -          <code>ip.proto</code> are read-only), and not all modifiable
> fields
> -          may be partially modified (e.g. <code>ip.ttl</code> must
> assigned as
> -          a whole).
> -        </dd>
> +          <p>
> +           Sets data or metadata field <var>field</var> to constant value
> +           <var>constant</var>, e.g. <code>outport = "vif0";</code> to
> set the
> +           logical output port.  To set only a subset of bits in a field,
> +           specify a subfield for <var>field</var> or a masked
> +           <var>constant</var>, e.g. one may use <code>vlan.pcp[2] =
> 1;</code>
> +           or <code>vlan.pcp = 4/4;</code> to set the most sigificant bit
> of
> +           the VLAN PCP.
> +          </p>
> +
> +          <p>
> +            Assigning to a field with prerequisites implicitly adds those
> +            prerequisites to <ref column="match"/>; thus, for example, a
> flow
> +            that sets <code>tcp.dst</code> applies only to TCP flows,
> +            regardless of whether its <ref column="match"/> mentions any
> TCP
> +            field.
> +          </p>
> +
> +          <p>
> +            Not all fields are modifiable (e.g. <code>eth.type</code> and
> +            <code>ip.proto</code> are read-only), and not all modifiable
> fields
> +            may be partially modified (e.g. <code>ip.ttl</code> must
> assigned
> +            as a whole).  The <code>outport</code> field is modifiable
> for an
> +            <code>ingress</code> flow but not an <code>egress</code> flow.
> +          </p>
> +       </dd>
>        </dl>
>
>        <p>
> @@ -628,6 +782,77 @@
>      </column>
>    </table>
>
> +  <table name="Multicast_Group" title="Logical Port Multicast Groups">
> +    <p>
> +      The rows in this table define multicast groups of logical ports.
> +      Multicast groups allow a single packet transmitted over a tunnel to
> a
> +      hypervisor to be delivered to multiple VMs on that hypervisor, which
> +      uses bandwidth more efficiently.
> +    </p>
> +
> +    <p>
> +      Each row in this table defines a logical multicast group numbered
> <ref
> +      column="tunnel_key"/> within <ref column="datapath"/>, whose logical
> +      ports are listed in the <ref column="ports"/> column.  All of the
> ports
> +      must be in the <ref column="datapath"/> logical datapath (but the
> +      database schema cannot enforce this).
> +    </p>
> +
> +    <p>
> +      Multicast group numbers and names are scoped within a logical
> datapath.
> +    </p>
> +
> +    <p>
> +      In the <ref table="Rule"/> table, multicast groups may be used for
> output
> +      just as for individual logical ports, by assigning the group's name
> to
> +      <code>outport</code>,
> +    </p>
> +
> +    <p>
> +      Multicast group names and logical port names share a single
> namespace and
> +      thus should not overlap (but the database schema cannot enforce
> this).
> +    </p>
> +
> +    <p>
> +      An index prevents this table from containing any two rows with the
> same
> +      <ref column="datapath"/> and <ref column="tunnel_key"/> values or
> the
> +      same <ref column="datapath"/> and <ref column="name"/> values.
> +    </p>
> +
> +    <column name="datapath"/>
> +    <column name="tunnel_key"/>
> +    <column name="name"/>
> +    <column name="ports"/>
> +  </table>
> +
> +  <table name="Datapath_Binding" title="Physical-Logical Datapath
> Bindings">
> +    <p>
> +      Each row in this table identifies physical bindings of a logical
> +      datapath.
> +    </p>
> +
> +    <column name="tunnel_key">
> +      The tunnel key value to which the logical datapath is bound.
> +      The <code>Tunnel Encapsulation</code> section in
> +      <code>ovn-architecture</code>(7) describes how tunnel keys are
> +      constructed for each supported encapsulation.
> +    </column>
> +
> +    <column name="external_ids" key="logical-switch" type='{"type":
> "uuid"}'>
> +      Each row in <ref table="Datapath_Binding"/> is associated with some
> +      logical datapath.  <code>ovn-northd</code> uses this key to store
> the
> +      UUID of the logical datapath <ref table="Logical_Switch"
> +      db="OVN_Northbound"/> row in the <ref db="OVN_Northbound"/>
> database.
> +    </column>
> +
> +    <group title="Common Columns">
> +      The overall purpose of these columns is described under <code>Common
> +      Columns</code> at the beginning of this document.
> +
> +      <column name="external_ids"/>
> +    </group>
> +  </table>
> +
>    <table name="Port_Binding" title="Physical-Logical Port Bindings">
>      <p>
>        Each row in this table identifies the physical location of a logical
> @@ -651,7 +876,7 @@
>      </p>
>
>      <p>
> -      When a chassis shuts down gracefully, it should cleanup the
> +      When a chassis shuts down gracefully, it should clean up the
>        <code>chassis</code> column that it previously had populated.
>        (This is not critical because resources hosted on the chassis are
> equally
>        unreachable regardless of whether their rows are present.)  To
> handle the
> @@ -660,10 +885,8 @@
>        <code>chassis</code> column with new information.
>      </p>
>
> -    <column name="logical_datapath">
> -      The logical datapath to which the logical port belongs.  A logical
> -      datapath implements a logical pipeline via logical flows in the <ref
> -      table="Rule"/> table.  (No table represents a logical datapath.)
> +    <column name="datapath">
> +      The logical datapath to which the logical port belongs.
>      </column>
>
>      <column name="logical_port">
> @@ -675,16 +898,29 @@
>
>      <column name="tunnel_key">
>        <p>
> -        A number that represents the logical port in the key (e.g. VXLAN
> VNI or
> -        STT key) field carried within tunnel protocol packets.  (This
> avoids
> +        A number that represents the logical port in the key (e.g. STT
> key or
> +        Geneve TLV) field carried within tunnel protocol packets.  This
> avoids
>          wasting space for a whole UUID in tunneled packets.  It also
> allows OVN
>          to support encapsulations that cannot fit an entire UUID in their
> -        tunnel keys.)
> +        tunnel keys (i.e. every encapsulation other than Geneve).
>        </p>
>
>        <p>
> -        Tunnel ID 0 is reserved for internal use within OVN.
> +        The tunnel ID must be unique within the scope of a logical
> datapath.
>        </p>
> +
> +      <p>
> +        Logical port tunnel IDs form a 16-bit space:
> +      </p>
> +
> +      <ul>
> +        <li>Tunnel ID 0 is reserved for internal use within OVN.</li>
> +        <li>Tunnel IDs 1 through 32767, inclusive, may be assigned to
> logical
> +        ports.</li>
> +        <li>Tunnel IDs 32768 through 65535, inclusive, may be assigned to
> +        logical multicast groups (see the <ref table="Multicast_Group"/>
> +        table).</li>
> +      </ul>
>      </column>
>
>      <column name="parent_port">
> --
> 2.1.3
>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>



More information about the dev mailing list