[ovs-dev] [PATCH v6 1/3] Avoid tunneling for VLAN packets redirected to a gateway chassis

Mark Michelson mmichels at redhat.com
Thu Jul 19 13:24:50 UTC 2018


I've had a look through and have some notes in-line. I know some of them 
will be a bit nit-picky, but...sometimes that's just how I am :)

On 06/25/2018 03:53 PM, vkommadi at redhat.com wrote:
> From: venkata anil <vkommadi at redhat.com>
> 
> When a VM on a VLAN tenant network sends traffic to an external network,
> it is tunneled from the host chassis to the gateway chassis. In the earlier
> discussion [1], Russell (also in his doc [2]) suggested figuring out a way
> for OVN to do this redirect to the gateway host over a VLAN network. This
> patch implements his suggestion, i.e. it redirects to the gateway chassis
> using the incoming tenant VLAN network. Gateway chassis are expected to be
> configured with the tenant VLAN networks. In this approach, new logical and
> physical flows are introduced for packet processing on both the host and
> gateway chassis.
> 
> Packet processing in the host chassis:
> 1) A new OVS flow is added in physical table 65, which sets the
>     MLF_RCV_FROM_VLAN flag for packets from a VLAN network entering the
>     router pipeline.
> 2) A new flow is added in lr_in_ip_routing which, for packets output
>     through the distributed gateway port and matching the MLF_RCV_FROM_VLAN
>     flag, sets REGBIT_NAT_REDIRECT, i.e.
>     table=7 (lr_in_ip_routing   ), priority=2    , match=(
>     ip4.dst == 0.0.0.0/0 && flags.rcv_from_vlan == 1 &&
>     !is_chassis_resident("cr-alice")), action=(reg9[0] = 1; next;)
>     This flow is installed only on chassis not hosting the chassisredirect
>     port, i.e. compute nodes.
>     When REGBIT_NAT_REDIRECT is set,
>     a) lr_in_arp_resolve sets the packet's eth.dst to the distributed
>        gateway port MAC
>     b) lr_in_gw_redirect sets the chassisredirect port as the outport
> 3) A new OVS flow is added in physical table 32 which uses the source VLAN
>     tenant network's tag as the VLAN ID for sending the packet to the
>     gateway chassis. As this VLAN packet's destination MAC is the
>     distributed gateway port MAC, the packet will only reach the gateway
>     chassis.
>     table=32,priority=150,reg10=0x20/0x20,reg14=0x3,reg15=0x6,metadata=0x4
>     actions=mod_vlan_vid:2010,output:25,strip_vlan
>     This flow is installed only on chassis not hosting the chassisredirect
>     port, i.e. compute nodes.
> 
> Packet processing in the gateway chassis:
> 1) A new OVS flow is added in physical table 0 which, for VLAN traffic
>     coming from the localnet port with the router distributed gateway port
>     MAC as the destination MAC address, resubmits to the connected router
>     ingress pipeline (i.e. the router attached to the VLAN tenant network).
>     table=0,priority=150,in_port=67,dl_vlan=2010,dl_dst=00:00:02:01:02:03
>     actions=strip_vlan,load:0x4->OXM_OF_METADATA[],load:0x3->NXM_NX_REG14[],
>     load:0x1->NXM_NX_REG10[5],resubmit(,8)
>     This flow is installed only on the chassis hosting the chassisredirect
>     port, i.e. the gateway node.
> 2) A new flow is added in lr_in_admission which checks the
>     MLF_RCV_FROM_VLAN flag and allows the packet. This flow is installed
>     only on the chassis hosting the chassisredirect port, i.e. the
>     gateway node.
>     table=0 (lr_in_admission    ), priority=100  , match=(
>     flags.rcv_from_vlan == 1 && inport == "lrp-44383893-613a-4bfe-b483-
>     e7d0dc3055cd" && is_chassis_resident("cr-lrp-a6e3d2ab-313a-4ea3-
>     8ec4-c3c774a11f49")), action=(next;)
>     The packet then passes through the router ingress and egress pipelines
>     and on to the external switch pipeline.
> 
> [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046557.html
> [2] Point 3 in section 3.3.1 - Future Enhancements
>      https://docs.google.com/document/d/1JecGIXPH0RAqfGvD0nmtBdEU1zflHACp8WSRnKCFSgg/edit#
> 
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
> 
> Signed-off-by: Venkata Anil <vkommadi at redhat.com>
> ---
> 
> v5->v6:
> * Rebased
> 
> v4->v5:
> * No changes in this patch
> 
> v3->v4:
> * Previous v3 patch became this patch of v4
> * Updated the newly added flow in physical table 0 on gateway chassis
>    to check for distributed gateway port MAC and then resubmit to
>    router ingress pipeline
> * Improved the test
> * Added more comments
> 
>   ovn/controller/bfd.c            |   3 +-
>   ovn/controller/binding.c        |  10 +-
>   ovn/controller/ovn-controller.c |   3 +
>   ovn/controller/ovn-controller.h |  17 ++-
>   ovn/controller/physical.c       | 121 ++++++++++++++++-
>   ovn/lib/logical-fields.c        |   4 +
>   ovn/lib/logical-fields.h        |   2 +
>   ovn/northd/ovn-northd.c         |  35 +++++
>   tests/ovn.at                    | 278 ++++++++++++++++++++++++++++++++++++++++
>   9 files changed, 465 insertions(+), 8 deletions(-)
> 
> diff --git a/ovn/controller/bfd.c b/ovn/controller/bfd.c
> index 051781f..c696741 100644
> --- a/ovn/controller/bfd.c
> +++ b/ovn/controller/bfd.c
> @@ -139,8 +139,9 @@ bfd_travel_gw_related_chassis(
>       LIST_FOR_EACH_POP (dp_binding, node, &dp_list) {
>           dp = dp_binding->dp;
>           free(dp_binding);
> +        const struct sbrec_datapath_binding *pdp;
>           for (size_t i = 0; i < dp->n_peer_dps; i++) {
> -            const struct sbrec_datapath_binding *pdp = dp->peer_dps[i];
> +            pdp = dp->peer_dps[i]->peer_dp;
>               if (!pdp) {
>                   continue;
>               }
> diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
> index 021ecdd..de4385f 100644
> --- a/ovn/controller/binding.c
> +++ b/ovn/controller/binding.c
> @@ -145,10 +145,14 @@ add_local_datapath__(struct ovsdb_idl_index *sbrec_datapath_binding_by_key,
>       const struct sbrec_port_binding *pb;
>       SBREC_PORT_BINDING_FOR_EACH_EQUAL (pb, target,
>                                          sbrec_port_binding_by_datapath) {
> +        if (!strcmp(pb->type, "chassisredirect")) {
> +            ld->chassisredirect_port = pb;
> +        }
>           if (!strcmp(pb->type, "patch")) {
>               const char *peer_name = smap_get(&pb->options, "peer");
>               if (peer_name) {
>                   const struct sbrec_port_binding *peer;
> +                struct peer_datapath *pdp;
>   
>                   peer = lport_lookup_by_name(sbrec_port_binding_by_name,
>                                               peer_name);
> @@ -163,9 +167,13 @@ add_local_datapath__(struct ovsdb_idl_index *sbrec_datapath_binding_by_key,
>                       ld->peer_dps = xrealloc(
>                               ld->peer_dps,
>                               ld->n_peer_dps * sizeof *ld->peer_dps);
> -                    ld->peer_dps[ld->n_peer_dps - 1] = datapath_lookup_by_key(
> +                    pdp = xcalloc(1, sizeof(struct peer_datapath));

Use xmalloc here instead of xcalloc.
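
Something like this should do (untested sketch; every member gets
assigned right below anyway):

pdp = xmalloc(sizeof *pdp);
pdp->peer_dp = datapath_lookup_by_key(
        sbrec_datapath_binding_by_key,
        peer->datapath->tunnel_key);
pdp->patch = pb;
pdp->peer = peer;
ld->peer_dps[ld->n_peer_dps - 1] = pdp;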

> +                    pdp->peer_dp = datapath_lookup_by_key(
>                           sbrec_datapath_binding_by_key,
>                           peer->datapath->tunnel_key);
> +                    pdp->patch = pb;
> +                    pdp->peer = peer;
> +                    ld->peer_dps[ld->n_peer_dps - 1] = pdp;
>                   }
>               }
>           }
> diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
> index 6ee72a9..8e04780 100644
> --- a/ovn/controller/ovn-controller.c
> +++ b/ovn/controller/ovn-controller.c
> @@ -812,6 +812,9 @@ main(int argc, char *argv[])
>   
>           struct local_datapath *cur_node, *next_node;
>           HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node, &local_datapaths) {
> +            for (int i = 0; i < cur_node->n_peer_dps; i++) {
> +                free(cur_node->peer_dps[i]);
> +            }
>               free(cur_node->peer_dps);
>               hmap_remove(&local_datapaths, &cur_node->hmap_node);
>               free(cur_node);
> diff --git a/ovn/controller/ovn-controller.h b/ovn/controller/ovn-controller.h
> index 3b15620..6c1afd9 100644
> --- a/ovn/controller/ovn-controller.h
> +++ b/ovn/controller/ovn-controller.h
> @@ -40,6 +40,17 @@ struct ct_zone_pending_entry {
>       enum ct_zone_pending_state state;
>   };
>   
> +/* Represents a peer datapath connected to a given datapath */
> +struct peer_datapath {
> +    const struct sbrec_datapath_binding *peer_dp;
> +
> +    /* Patch port connected to local datapath */
> +    const struct sbrec_port_binding *patch;
> +
> +    /* Peer patch port connected to peer datapath */
> +    const struct sbrec_port_binding *peer;
> +};
> +
>   /* A logical datapath that has some relevance to this hypervisor.  A logical
>    * datapath D is relevant to hypervisor H if:
>    *
> @@ -57,10 +68,14 @@ struct local_datapath {
>       /* The localnet port in this datapath, if any (at most one is allowed). */
>       const struct sbrec_port_binding *localnet_port;
>   
> +    /* The chassisredirect port in this datapath, if any
> +     * (at most one is allowed). */
> +    const struct sbrec_port_binding *chassisredirect_port;
> +
>       /* True if this datapath contains an l3gateway port located on this
>        * hypervisor. */
>       bool has_local_l3gateway;
> -    const struct sbrec_datapath_binding **peer_dps;
> +    struct peer_datapath **peer_dps;
>       size_t n_peer_dps;
>   };
>   
> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
> index dcf2183..c4cf419 100644
> --- a/ovn/controller/physical.c
> +++ b/ovn/controller/physical.c
> @@ -304,7 +304,8 @@ consider_port_binding(struct ovsdb_idl_index *sbrec_chassis_by_name,
>   {
>       uint32_t dp_key = binding->datapath->tunnel_key;
>       uint32_t port_key = binding->tunnel_key;
> -    if (!get_local_datapath(local_datapaths, dp_key)) {
> +    struct local_datapath *ld = get_local_datapath(local_datapaths, dp_key);
> +    if (!ld) {
>           return;
>       }
>   
> @@ -350,6 +351,16 @@ consider_port_binding(struct ovsdb_idl_index *sbrec_chassis_by_name,
>               put_load(0, MFF_LOG_REG0 + i, 0, 32, ofpacts_p);
>           }
>           put_load(0, MFF_IN_PORT, 0, 16, ofpacts_p);
> +
> +        /* Set MLF_RCV_FROM_VLAN flag for vlan network */
> +        if (ld->localnet_port) {
> +            int vlan_tag = (ld->localnet_port->n_tag ?
> +                            *ld->localnet_port->tag : 0);
> +            if (vlan_tag) {
> +                put_load(1, MFF_LOG_FLAGS, MLF_RCV_FROM_VLAN_BIT, 1,
> +                         ofpacts_p);
> +            }

Since this block doesn't actually use the vlan_tag for anything, it 
could be simplified to:

if (ld->localnet_port->n_tag) {
     put_load(...);
}

I think I'm right here since the "tag" field of the database is 
constrained to being between 1 and 4095.

> +        }
>           put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
>           clone = ofpbuf_at_assert(ofpacts_p, clone_ofs, sizeof *clone);
>           ofpacts_p->header = clone;
> @@ -526,9 +537,15 @@ consider_port_binding(struct ovsdb_idl_index *sbrec_chassis_by_name,
>           put_local_common_flows(dp_key, port_key, nested_container, &zone_ids,
>                                  ofpacts_p, flow_table);
>   
> -        /* Table 0, Priority 150 and 100.
> +        /* Table 0, Priority 200, 150 and 100.
>            * ==============================
>            *
> +         * Priority 200 is for vlan traffic with distributed gateway port MAC
> +         * as destination MAC address. For such traffic, set MLF_RCV_FROM_VLAN
> +         * flag, MFF_LOG_DATAPATH to the router metadata and MFF_LOG_INPORT to
> +         * the patch port connecting router and vlan network and resubmit into
> +         * the logical router ingress pipeline.
> +         *
>            * Priority 150 is for tagged traffic. This may be containers in a
>            * VM or a VLAN on a local network. For such traffic, match on the
>            * tags and then strip the tag.
> @@ -540,6 +557,55 @@ consider_port_binding(struct ovsdb_idl_index *sbrec_chassis_by_name,
>            * input port, MFF_LOG_DATAPATH to the logical datapath, and
>            * resubmit into the logical ingress pipeline starting at table
>            * 16. */
> +
> +        /* For packet from vlan network with distributed gateway port MAC as
> +         * destination MAC address, submit it to router ingress pipeline */
> +        int vlan_tag = binding->n_tag ? *binding->tag : 0;
> +        if (!strcmp(binding->type, "localnet") && vlan_tag) {
> +            struct local_datapath *ldp = get_local_datapath(
> +                local_datapaths, binding->datapath->tunnel_key);

If I'm reading this correctly, ldp is the same as ld. So I think you can 
skip looking up ldp and just use ld throughout the rest of this block.

> +            for (int i = 0; i < ldp->n_peer_dps; i++) {
> +                struct local_datapath *peer_ldp = get_local_datapath(
> +                    local_datapaths, ldp->peer_dps[i]->peer_dp->tunnel_key);

Is it possible for peer_ldp to be NULL? If so, you should check for it. 
If not, then you should probably add an assertion.
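
E.g. a rough sketch, assuming a missing peer datapath just means there is
nothing to do for it:

struct local_datapath *peer_ldp = get_local_datapath(
        local_datapaths, ldp->peer_dps[i]->peer_dp->tunnel_key);
if (!peer_ldp) {
    continue;
}

Or, if it really can never be NULL, just ovs_assert(peer_ldp);.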

> +                const struct sbrec_port_binding *crp;
> +                crp = peer_ldp->chassisredirect_port;
> +                if (crp && crp->chassis &&
> +                   !strcmp(crp->chassis->name, chassis->name)) {
> +                    const char *gwp = smap_get(&crp->options,
> +                                               "distributed-port");

Check for gwp to be NULL here. If there is no options:distributed-port 
set, then we're gonna have a bad time.
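
Something along these lines (sketch), since calling strcmp() on a NULL
pointer will crash:

const char *gwp = smap_get(&crp->options, "distributed-port");
if (!gwp) {
    continue;
}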

> +                    if (strcmp(gwp, ldp->peer_dps[i]->peer->logical_port)) {
> +                        ofpbuf_clear(ofpacts_p);
> +                        match_init_catchall(&match);
> +
> +                        match_set_in_port(&match, ofport);
> +                        match_set_dl_vlan(&match, htons(vlan_tag));
> +                        for (int j = 0; j < crp->n_mac; j++) {
> +                            struct lport_addresses laddrs;
> +                            if (!extract_lsp_addresses(crp->mac[j], &laddrs)
> +                                || !laddrs.n_ipv4_addrs) {
> +                                continue;

A couple of notes here:

* Is there a reason why IPv4 addresses are required? Is this not 
intended to work in an IPv6-only scenario?
* In the case where extract_lsp_addresses() succeeds, but there are no 
IPv4 addresses, this will leak laddrs.
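
For the second point, something like this would avoid the leak
(untested sketch):

if (!extract_lsp_addresses(crp->mac[j], &laddrs)) {
    continue;
}
if (!laddrs.n_ipv4_addrs) {
    destroy_lport_addresses(&laddrs);
    continue;
}
match_set_dl_dst(&match, laddrs.ea);
destroy_lport_addresses(&laddrs);
break;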


> +                            }
> +                            match_set_dl_dst(&match, laddrs.ea);
> +                            destroy_lport_addresses(&laddrs);
> +                            break;
> +                        }

If execution reaches here and we never added a destination ethernet 
address to the match, should we continue creating the logical flow? 
Should we bail?
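
E.g. you could track whether a destination MAC was found and skip
installing the flow otherwise; rough sketch ("dst_set" is just a
placeholder name):

bool dst_set = false;
for (int j = 0; j < crp->n_mac; j++) {
    ...
    match_set_dl_dst(&match, laddrs.ea);
    destroy_lport_addresses(&laddrs);
    dst_set = true;
    break;
}
if (!dst_set) {
    continue;
}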

> +
> +                        ofpact_put_STRIP_VLAN(ofpacts_p);
> +                        put_load(peer_ldp->datapath->tunnel_key,
> +                                 MFF_LOG_DATAPATH, 0, 64, ofpacts_p);
> +                        put_load(ldp->peer_dps[i]->peer->tunnel_key,
> +                                 MFF_LOG_INPORT, 0, 32, ofpacts_p);
> +                        put_load(1, MFF_LOG_FLAGS,
> +                                 MLF_RCV_FROM_VLAN_BIT, 1, ofpacts_p);
> +                        put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
> +
> +                        ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG,
> +                                        200, 0, &match, ofpacts_p);
> +                    }
> +                }
> +            }
> +        }
> +
>           ofpbuf_clear(ofpacts_p);
>           match_init_catchall(&match);
>           match_set_in_port(&match, ofport);
> @@ -633,13 +699,58 @@ consider_port_binding(struct ovsdb_idl_index *sbrec_chassis_by_name,
>       } else {
>           /* Remote port connected by tunnel */
>   
> -        /* Table 32, priority 100.
> -         * =======================
> +        /* Table 32, priority 150 and 100.
> +         * ==============================
>            *
>            * Handles traffic that needs to be sent to a remote hypervisor.  Each
>            * flow matches an output port that includes a logical port on a remote
> -         * hypervisor, and tunnels the packet to that hypervisor.
> +         * hypervisor, and tunnels the packet or send through vlan network to
> +         * that hypervisor.
>            */
> +
> +        /* For each vlan network connected to the router, add that network's
> +         * vlan tag to the packet and output it through localnet port */
> +        struct local_datapath *ldp = get_local_datapath(local_datapaths,
> +                                                        dp_key);
> +        for (int i = 0; i < ldp->n_peer_dps; i++) {
> +            struct ofpact_vlan_vid *vlan_vid;
> +            ofp_port_t port_ofport = 0;
> +            struct peer_datapath *pdp = ldp->peer_dps[i];
> +            struct local_datapath *peer_ldp = get_local_datapath(
> +                local_datapaths, pdp->peer_dp->tunnel_key);

I have the same question about peer_ldp here that I did about the 
peer_ldp lookup previously.

> +            if (peer_ldp->localnet_port && pdp->patch->tunnel_key) {
> +                int64_t vlan_tag = (peer_ldp->localnet_port->n_tag ?
> +                                    *peer_ldp->localnet_port->tag : 0);
> +                if (!vlan_tag) {
> +                    continue;
> +                }
> +                port_ofport = u16_to_ofp(simap_get(&localvif_to_ofport,
> +                    peer_ldp->localnet_port->logical_port));
> +                if (!port_ofport) {
> +                    continue;
> +                }
> +
> +                match_init_catchall(&match);
> +                ofpbuf_clear(ofpacts_p);
> +
> +                match_set_metadata(&match, htonll(dp_key));
> +                match_set_reg_masked(&match, MFF_LOG_FLAGS - MFF_REG0,
> +                                     MLF_RCV_FROM_VLAN, MLF_RCV_FROM_VLAN);
> +                match_set_reg(&match, MFF_LOG_INPORT - MFF_REG0,
> +                              pdp->patch->tunnel_key);
> +                match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key);
> +
> +                vlan_vid = ofpact_put_SET_VLAN_VID(ofpacts_p);
> +                vlan_vid->vlan_vid = vlan_tag;
> +                vlan_vid->push_vlan_if_needed = true;
> +                ofpact_put_OUTPUT(ofpacts_p)->port = port_ofport;
> +                ofpact_put_STRIP_VLAN(ofpacts_p);
> +
> +                ofctrl_add_flow(flow_table, OFTABLE_REMOTE_OUTPUT, 150, 0,
> +                                &match, ofpacts_p);
> +            }
> +        }
> +
>           match_init_catchall(&match);
>           ofpbuf_clear(ofpacts_p);
>   
> diff --git a/ovn/lib/logical-fields.c b/ovn/lib/logical-fields.c
> index a8b5e3c..b9efa02 100644
> --- a/ovn/lib/logical-fields.c
> +++ b/ovn/lib/logical-fields.c
> @@ -105,6 +105,10 @@ ovn_init_symtab(struct shash *symtab)
>                MLF_FORCE_SNAT_FOR_LB_BIT);
>       expr_symtab_add_subfield(symtab, "flags.force_snat_for_lb", NULL,
>                                flags_str);
> +    snprintf(flags_str, sizeof flags_str, "flags[%d]",
> +             MLF_RCV_FROM_VLAN_BIT);
> +    expr_symtab_add_subfield(symtab, "flags.rcv_from_vlan", NULL,
> +                             flags_str);
>   
>       /* Connection tracking state. */
>       expr_symtab_add_field(symtab, "ct_mark", MFF_CT_MARK, NULL, false);
> diff --git a/ovn/lib/logical-fields.h b/ovn/lib/logical-fields.h
> index b1dbb03..96250fd 100644
> --- a/ovn/lib/logical-fields.h
> +++ b/ovn/lib/logical-fields.h
> @@ -50,6 +50,7 @@ enum mff_log_flags_bits {
>       MLF_FORCE_SNAT_FOR_DNAT_BIT = 2,
>       MLF_FORCE_SNAT_FOR_LB_BIT = 3,
>       MLF_LOCAL_ONLY_BIT = 4,
> +    MLF_RCV_FROM_VLAN_BIT = 5,
>   };
>   
>   /* MFF_LOG_FLAGS_REG flag assignments */
> @@ -75,6 +76,7 @@ enum mff_log_flags {
>        * hypervisors should instead only be output to local targets
>        */
>       MLF_LOCAL_ONLY = (1 << MLF_LOCAL_ONLY_BIT),
> +    MLF_RCV_FROM_VLAN = (1 << MLF_RCV_FROM_VLAN_BIT),
>   };
>   
>   #endif /* ovn/lib/logical-fields.h */
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index 72fe4e7..350f015 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -4411,6 +4411,28 @@ add_route(struct hmap *lflows, const struct ovn_port *op,
>        * routing. */
>       ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING, priority,
>                     ds_cstr(&match), ds_cstr(&actions));
> +
> +    /* When output port is distributed gateway port, check if the router
> +     * input port is a patch port connected to vlan network.
> +     * Traffic from VLAN network to external network should be redirected
> +     * to "redirect-chassis" by setting REGBIT_NAT_REDIRECT flag.
> +     * Later physical table 32 will output this traffic to gateway
> +     * chassis using input network vlan tag */
> +    if (op == op->od->l3dgw_port) {
> +        ds_clear(&match);
> +        ds_clear(&actions);
> +
> +        ds_put_format(&match, "ip%s.%s == %s/%d", is_ipv4 ? "4" : "6",
> +                      dir, network_s, plen);
> +        ds_put_format(&match, " && flags.rcv_from_vlan == 1");
> +        ds_put_format(&match, " && !is_chassis_resident(%s)",
> +                      op->od->l3redirect_port->json_key);
> +
> +        ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING,
> +                      priority + 1, ds_cstr(&match),
> +                      REGBIT_NAT_REDIRECT" = 1; next;");
> +    }
> +
>       ds_destroy(&match);
>       ds_destroy(&actions);
>   }
> @@ -4822,6 +4844,19 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>           }
>           ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 50,
>                         ds_cstr(&match), "next;");
> +
> +        /* VLAN traffic from localnet port should be allowed for
> +         * router processing on the "redirect-chassis". */
> +        if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer &&
> +            op->peer->od->localnet_port && (op != op->od->l3dgw_port)) {
> +            ds_clear(&match);
> +            ds_put_format(&match, "flags.rcv_from_vlan == 1");
> +            ds_put_format(&match, " && inport == %s", op->json_key);
> +            ds_put_format(&match, " && is_chassis_resident(%s)",
> +                          op->od->l3redirect_port->json_key);
> +            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 100,
> +                          ds_cstr(&match), "next;");
> +        }
>       }
>   
>       /* Logical router ingress table 1: IP Input. */
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 6553d17..5ae767d 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -7713,6 +7713,284 @@ test_ip_packet gw2 gw1
>   OVN_CLEANUP([hv1],[gw1],[gw2],[ext1])
>   AT_CLEANUP
>   
> +# VLAN traffic for external network redirected through distributed router gateway port
> +# should use vlans(i.e input network vlan tag) across hypervisors instead of tunneling.
> +AT_SETUP([ovn -- vlan traffic for external network with distributed router gateway port])
> +AT_SKIP_IF([test $HAVE_PYTHON = no])
> +ovn_start
> +
> +# Logical network:
> +# # One LR R1 that has switches foo (192.168.1.0/24) and
> +# # alice (172.16.1.0/24) connected to it.  The logical port
> +# # between R1 and alice has a "redirect-chassis" specified,
> +# # i.e. it is the distributed router gateway port(172.16.1.6).
> +# # Switch alice also has a localnet port defined.
> +# # An additional switch outside has the same subnet as alice
> +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1)
> +# # which will receive the packet destined for external network
> +# # (i.e 8.8.8.8 as destination ip).
> +
> +# Physical network:
> +# # Three hypervisors hv[123].
> +# # hv1 hosts vif foo1.
> +# # hv2 is the "redirect-chassis" that hosts the distributed router gateway port.
> +# # hv3 hosts nexthop port vif outside1.
> +# # All other tests connect hypervisors to network n1 through br-phys for tunneling.
> +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and
> +# # in order to show vlans(instead of tunneling) used between hv1 and hv2,
> +# # a new network n2 created and hv1 and hv2 connected to this network through br-ex.
> +# # hv2 and hv3 are still connected to n1 network through br-phys.
> +net_add n1
> +
> +# We are not calling ovn_attach for hv1, to avoid adding br-phys.
> +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge in hv1
> +sim_add hv1
> +as hv1
> +ovs-vsctl \
> +    -- set Open_vSwitch . external-ids:system-id=hv1 \
> +    -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> +    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \
> +    -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \
> +    -- add-br br-int \
> +    -- set bridge br-int fail-mode=secure other-config:disable-in-band=true \
> +    -- set Open_vSwitch . external-ids:ovn-bridge-mappings=public:br-ex
> +
> +start_daemon ovn-controller
> +ovs-vsctl -- add-port br-int hv1-vif1 -- \
> +    set interface hv1-vif1 external-ids:iface-id=foo1 \
> +    ofport-request=1
> +
> +sim_add hv2
> +as hv2
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.2
> +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys"
> +
> +sim_add hv3
> +as hv3
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.3
> +ovs-vsctl -- add-port br-int hv3-vif1 -- \
> +    set interface hv3-vif1 external-ids:iface-id=outside1 \
> +    options:tx_pcap=hv3/vif1-tx.pcap \
> +    options:rxq_pcap=hv3/vif1-rx.pcap \
> +    ofport-request=1
> +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="phys:br-phys"
> +
> +# Create network n2 for vlan connectivity between hv1 and hv2
> +net_add n2
> +
> +as hv1
> +ovs-vsctl add-br br-ex
> +net_attach n2 br-ex
> +
> +as hv2
> +ovs-vsctl add-br br-ex
> +net_attach n2 br-ex
> +
> +OVN_POPULATE_ARP
> +
> +ovn-nbctl create Logical_Router name=R1
> +
> +ovn-nbctl ls-add foo
> +ovn-nbctl ls-add alice
> +ovn-nbctl ls-add outside
> +
> +# Connect foo to R1
> +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
> +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
> +    type=router options:router-port=foo \
> +    -- lsp-set-addresses rp-foo router
> +
> +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on hv2
> +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \
> +    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
> +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
> +    type=router options:router-port=alice \
> +    -- lsp-set-addresses rp-alice router \
> +
> +
> +# Create logical port foo1 in foo
> +ovn-nbctl lsp-add foo foo1 \
> +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
> +
> +# Create logical port outside1 in outside, which is a nexthop address
> +# for 172.16.1.0/24
> +ovn-nbctl lsp-add outside outside1 \
> +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1"
> +
> +# Set default gateway (nexthop) to 172.16.1.1
> +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice
> +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24])
> +ovn-nbctl set Logical_Switch_Port rp-alice options:nat-addresses=router
> +
> +ovn-nbctl lsp-add foo ln-foo
> +ovn-nbctl lsp-set-addresses ln-foo unknown
> +ovn-nbctl lsp-set-options ln-foo network_name=public
> +ovn-nbctl lsp-set-type ln-foo localnet
> +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2])
> +
> +# Create localnet port in alice
> +ovn-nbctl lsp-add alice ln-alice
> +ovn-nbctl lsp-set-addresses ln-alice unknown
> +ovn-nbctl lsp-set-type ln-alice localnet
> +ovn-nbctl lsp-set-options ln-alice network_name=phys
> +
> +# Create localnet port in outside
> +ovn-nbctl lsp-add outside ln-outside
> +ovn-nbctl lsp-set-addresses ln-outside unknown
> +ovn-nbctl lsp-set-type ln-outside localnet
> +ovn-nbctl lsp-set-options ln-outside network_name=phys
> +
> +# Allow some time for ovn-northd and ovn-controller to catch up.
> +# XXX This should be more systematic.
> +ovn-nbctl --wait=hv --timeout=3 sync
> +
> +echo "---------NB dump-----"
> +ovn-nbctl show
> +echo "---------------------"
> +ovn-nbctl list logical_router
> +echo "---------------------"
> +ovn-nbctl list nat
> +echo "---------------------"
> +ovn-nbctl list logical_router_port
> +echo "---------------------"
> +
> +echo "---------SB dump-----"
> +ovn-sbctl list datapath_binding
> +echo "---------------------"
> +ovn-sbctl list port_binding
> +echo "---------------------"
> +ovn-sbctl dump-flows
> +echo "---------------------"
> +ovn-sbctl list chassis
> +echo "---------------------"
> +
> +for chassis in hv1 hv2 hv3; do
> +    as $chassis
> +    echo "------ $chassis dump ----------"
> +    ovs-vsctl show br-int
> +    ovs-ofctl show br-int
> +    ovs-ofctl dump-flows br-int
> +    echo "--------------------------"
> +done
> +
> +ip_to_hex() {
> +    printf "%02x%02x%02x%02x" "$@"
> +}
> +
> +foo1_ip=$(ip_to_hex 192 168 1 2)
> +gw_ip=$(ip_to_hex 172 16 1 6)
> +dst_ip=$(ip_to_hex 8 8 8 8)
> +nexthop_ip=$(ip_to_hex 172 16 1 1)
> +
> +foo1_mac="f00000010203"
> +foo_mac="000001010203"
> +gw_mac="000002010203"
> +nexthop_mac="f00000010204"
> +
> +# Send ip packet from foo1 to 8.8.8.8
> +src_mac="f00000010203"
> +dst_mac="000001010203"
> +packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
> +
> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> +sleep 2
> +
> +# ARP request packet for nexthop_ip to expect at outside1
> +arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_mac}${gw_ip}000000000000${nexthop_ip}
> +echo $arp_request >> hv3-vif1.expected
> +cat hv3-vif1.expected > expout
> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${nexthop_ip} | uniq > hv3-vif1
> +AT_CHECK([sort hv3-vif1], [0], [expout])
> +
> +# Send ARP reply from outside1 back to the router
> +reply_mac="f00000010204"
> +arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${nexthop_mac}${nexthop_ip}${gw_mac}${gw_ip}
> +
> +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
> +OVS_WAIT_UNTIL([
> +    test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \
> +grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1
> +    ])
> +
> +echo "ovn-sbctl list MAC_Binding"
> +ovn-sbctl list MAC_Binding
> +echo "============================"
> +
> +# VLAN tagged packet with distributed gateway port(172.16.1.6) MAC as destination MAC
> +# is expected on bridge connecting hv1 and hv2
> +expected=${gw_mac}${foo1_mac}8100000208004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
> +echo $expected > hv1-br-ex_n2.expected
> +
> +# Packet to Expect at outside1 i.e nexthop(172.16.1.1) port.
> +# As connection tracking not enabled for this test, snat can't be done on the packet.
> +# We still see foo1 as the source ip address. But source mac(gateway MAC) and
> +# dest mac(nexthop mac) are properly configured.
> +expected=${nexthop_mac}${gw_mac}08004500001c000000003f110100${foo1_ip}${dst_ip}0035111100080000
> +echo $expected > hv3-vif1.expected
> +
> +reset_pcap_file() {
> +    local iface=$1
> +    local pcap_file=$2
> +    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
> +options:rxq_pcap=dummy-rx.pcap
> +    rm -f ${pcap_file}*.pcap
> +    ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \
> +options:rxq_pcap=${pcap_file}-rx.pcap
> +}
> +
> +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2
> +as hv3 reset_pcap_file hv3-vif1 hv3/vif1
> +sleep 2
> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> +sleep 2
> +
> +# On hv1, table 65 for packets going from vlan switch pipleline to router pipleine
> +# set MLF_RCV_FROM_VLAN flag
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" \
> +| grep actions=clone | grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1
> +]])
> +# On hv1, because of snat rule in table 15, a higher priority(i.e 2) flow
> +# added for packets with MLF_RCV_FROM_VLAN flag with output as distributed
> +# gateway port, which sets REGBIT_NAT_REDIRECT flag
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=15 | grep "priority=2,ip,reg10=0x20/0x20,metadata=0x1" \
> +| grep "actions=load:0x1->OXM_OF_PKT_REG4" | wc -l], [0], [[1
> +]])
> +
> +# On hv1, table 32 flow which tags packet with source network vlan tag and sends it to hv2
> +# through br-ex
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 | grep "priority=150,reg10=0x20/0x20,reg14=0x1,reg15=0x3,metadata=0x1" \
> +| grep "actions=mod_vlan_vid:2" | grep "n_packets=2," | wc -l], [0], [[1
> +]])
> +
> +# On hv2 table 0, vlan tagged packet is sent through router pipeline
> +# by setting MLF_RCV_FROM_VLAN flag (REG10)
> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep "table=0," | grep "priority=200" | grep "dl_vlan=2" | \
> +grep "dl_dst=00:00:02:01:02:03" | grep "actions=strip_vlan,load:0x1->OXM_OF_METADATA" | grep "load:0x1->NXM_NX_REG14" | \
> +grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1
> +]])
> +# on hv2 table 8, allow packets with router metadata and with MLF_RCV_FROM_VLAN flag
> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=8 | grep "priority=100,reg10=0x20/0x20,reg14=0x1,metadata=0x1" | wc -l], [0], [[1
> +]])
> +
> +ip_packet() {
> +    grep "2010203f00000010203"
> +}
> +
> +# Check vlan tagged packet on the bridge connecting hv1 and hv2
> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap | ip_packet | uniq > hv1-br-ex_n2
> +cat hv1-br-ex_n2.expected > expout
> +AT_CHECK([sort hv1-br-ex_n2], [0], [expout])
> +
> +# Check expected packet on nexthop interface
> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${foo1_ip}${dst_ip} | uniq > hv3-vif1
> +cat hv3-vif1.expected > expout
> +AT_CHECK([sort hv3-vif1], [0], [expout])
> +
> +OVN_CLEANUP([hv1],[hv2],[hv3])
> +AT_CLEANUP
> +
>   AT_SETUP([ovn -- 1 LR with distributed router gateway port])
>   AT_SKIP_IF([test $HAVE_PYTHON = no])
>   ovn_start
> 


