[ovs-dev] [PATCH v6 1/3] Avoid tunneling for VLAN packets redirected to a gateway chassis

Anil Venkata anilvenkata at redhat.com
Thu Jul 26 13:52:33 UTC 2018


Thanks Mark for the review. I will address the review comments.

On Thu, Jul 19, 2018 at 6:54 PM, Mark Michelson <mmichels at redhat.com> wrote:

> I've had a look through and have some notes in-line. I know some of them
> will be a bit nit-picky, but...sometimes that's just how I am :)
>
>
> On 06/25/2018 03:53 PM, vkommadi at redhat.com wrote:
>
>> From: venkata anil <vkommadi at redhat.com>
>>
>> When a vm on a vlan tenant network sends traffic to an external network,
>> it is tunneled from host chassis to gateway chassis. In the earlier
>> discussion [1], Russel (also in his doc [2]) suggested if we can figure
>> out a way for OVN to do this redirect to the gateway host over a VLAN
>> network. This patch implements his suggestion i.e will redirect to
>> gateway chassis using incoming tenant vlan network. Gateway chassis are
>> expected to be configured with tenant vlan networks. In this approach,
>> new logical and physical flows introduced for packet processing in both
>> host and gateway chassis.
>>
>> Packet processing in the host chassis:
>> 1) A new ovs flow added in physical table 65, which sets MLF_RCV_FROM_VLAN
>>     flag for packets from vlan network entering into router pipeline
>> 2) A new flow added in lr_in_ip_routing, for packets output through
>>     distributed gateway port and matching MLF_RCV_FROM_VLAN flag,
>>     set REGBIT_NAT_REDIRECT i.e
>>     table=7 (lr_in_ip_routing   ), priority=2    , match=(
>>     ip4.dst == 0.0.0.0/0 && flags.rcv_from_vlan == 1 &&
>>     !is_chassis_resident("cr-alice")), action=(reg9[0] = 1; next;)
>>     This flow will be set only on chassis not hosting chassisredirect
>>     port i.e compute node.
>>     When REGBIT_NAT_REDIRECT set,
>>     a) lr_in_arp_resolve, will set packet eth.dst to distibuted gateway
>>        port MAC
>>     b) lr_in_gw_redirect, will set chassisredirect port as outport
>> 3) A new ovs flow added in physical table 32 will use source vlan tenant
>>     network tag as vlan ID for sending the packet to gateway chassis.
>>     As this vlan packet destination MAC is distibuted gateway port MAC,
>>     packet will only reach the gateway chassis.
>>     table=32,priority=150,reg10=0x20/0x20,reg14=0x3,reg15=0x6,me
>> tadata=0x4
>>     actions=mod_vlan_vid:2010,output:25,strip_vlan
>>     This flow will be set only on chassis not hosting chassisredirect
>>     port i.e compute node.
>>
>> Packet processing in the gateway chassis:
>> 1) A new ovs flow added in physical table 0 for vlan traffic coming
>>     from localnet port with router distributed gateway port MAC as
>>     destination MAC address, resubmit to connected router ingress pipeline
>>     (i.e router attached to vlan tenant network).
>>     table=0,priority=150,in_port=67,dl_vlan=2010,dl_dst=00:00:02:01:02:03
>>     actions=strip_vlan,load:0x4->OXM_OF_METADATA[],load:0x3->NXM
>> _NX_REG14[],
>>     load:0x1->NXM_NX_REG10[5],resubmit(,8)
>>     This flow will be set only on chassis hosting chassisredirect
>>     port i.e gateway node.
>> 2) A new flow added in lr_in_admission which checks MLF_RCV_FROM_VLAN
>>     and allows the packet. This flow will be set only on chassis hosting
>>     chassisredirect port i.e gateway node.
>>     table=0 (lr_in_admission    ), priority=100  , match=(
>>     flags.rcv_from_vlan == 1 && inport == "lrp-44383893-613a-4bfe-b483-
>>     e7d0dc3055cd" && is_chassis_resident("cr-lrp-a6e3d2ab-313a-4ea3-
>>     8ec4-c3c774a11f49")), action=(next;)
>>     Then packet will pass through router ingress and egress pipelines and
>>     then to external switch pipeline.
>>
>> [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2018-Apri
>> l/046557.html
>> [2] Point 3 in section 3.3.1 - Future Enhancements
>>      https://docs.google.com/document/d/1JecGIXPH0RAqfGvD0nmtBdE
>> U1zflHACp8WSRnKCFSgg/edit#
>>
>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-Apri
>> l/046543.html
>>
>> Signed-off-by: Venkata Anil <vkommadi at redhat.com>
>> ---
>>
>> v5->v6:
>> * Rebased
>>
>> v4->v5:
>> * No changes in this patch
>>
>> v3->v4:
>> * Previous v3 patch became this patch of v4
>> * Updated the newly added flow in physical table 0 on gateway chassis
>>    to check for distributed gateway port MAC and then resubmit to
>>    router ingress pipeline
>> * Improved the test
>> * Added more comments
>>
>>   ovn/controller/bfd.c            |   3 +-
>>   ovn/controller/binding.c        |  10 +-
>>   ovn/controller/ovn-controller.c |   3 +
>>   ovn/controller/ovn-controller.h |  17 ++-
>>   ovn/controller/physical.c       | 121 ++++++++++++++++-
>>   ovn/lib/logical-fields.c        |   4 +
>>   ovn/lib/logical-fields.h        |   2 +
>>   ovn/northd/ovn-northd.c         |  35 +++++
>>   tests/ovn.at                    | 278 ++++++++++++++++++++++++++++++
>> ++++++++++
>>   9 files changed, 465 insertions(+), 8 deletions(-)
>>
>> diff --git a/ovn/controller/bfd.c b/ovn/controller/bfd.c
>> index 051781f..c696741 100644
>> --- a/ovn/controller/bfd.c
>> +++ b/ovn/controller/bfd.c
>> @@ -139,8 +139,9 @@ bfd_travel_gw_related_chassis(
>>       LIST_FOR_EACH_POP (dp_binding, node, &dp_list) {
>>           dp = dp_binding->dp;
>>           free(dp_binding);
>> +        const struct sbrec_datapath_binding *pdp;
>>           for (size_t i = 0; i < dp->n_peer_dps; i++) {
>> -            const struct sbrec_datapath_binding *pdp = dp->peer_dps[i];
>> +            pdp = dp->peer_dps[i]->peer_dp;
>>               if (!pdp) {
>>                   continue;
>>               }
>> diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
>> index 021ecdd..de4385f 100644
>> --- a/ovn/controller/binding.c
>> +++ b/ovn/controller/binding.c
>> @@ -145,10 +145,14 @@ add_local_datapath__(struct ovsdb_idl_index
>> *sbrec_datapath_binding_by_key,
>>       const struct sbrec_port_binding *pb;
>>       SBREC_PORT_BINDING_FOR_EACH_EQUAL (pb, target,
>>                                          sbrec_port_binding_by_datapath)
>> {
>> +        if (!strcmp(pb->type, "chassisredirect")) {
>> +            ld->chassisredirect_port = pb;
>> +        }
>>           if (!strcmp(pb->type, "patch")) {
>>               const char *peer_name = smap_get(&pb->options, "peer");
>>               if (peer_name) {
>>                   const struct sbrec_port_binding *peer;
>> +                struct peer_datapath *pdp;
>>                     peer = lport_lookup_by_name(sbrec_por
>> t_binding_by_name,
>>                                               peer_name);
>> @@ -163,9 +167,13 @@ add_local_datapath__(struct ovsdb_idl_index
>> *sbrec_datapath_binding_by_key,
>>                       ld->peer_dps = xrealloc(
>>                               ld->peer_dps,
>>                               ld->n_peer_dps * sizeof *ld->peer_dps);
>> -                    ld->peer_dps[ld->n_peer_dps - 1] =
>> datapath_lookup_by_key(
>> +                    pdp = xcalloc(1, sizeof(struct peer_datapath));
>>
>
> Use xmalloc here instead of xcalloc.
>
>
sure

>
> +                    pdp->peer_dp = datapath_lookup_by_key(
>>                           sbrec_datapath_binding_by_key,
>>                           peer->datapath->tunnel_key);
>> +                    pdp->patch = pb;
>> +                    pdp->peer = peer;
>> +                    ld->peer_dps[ld->n_peer_dps - 1] = pdp;
>>                   }
>>               }
>>           }
>> diff --git a/ovn/controller/ovn-controller.c
>> b/ovn/controller/ovn-controller.c
>> index 6ee72a9..8e04780 100644
>> --- a/ovn/controller/ovn-controller.c
>> +++ b/ovn/controller/ovn-controller.c
>> @@ -812,6 +812,9 @@ main(int argc, char *argv[])
>>             struct local_datapath *cur_node, *next_node;
>>           HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node,
>> &local_datapaths) {
>> +            for (int i = 0; i < cur_node->n_peer_dps; i++) {
>> +                free(cur_node->peer_dps[i]);
>> +            }
>>               free(cur_node->peer_dps);
>>               hmap_remove(&local_datapaths, &cur_node->hmap_node);
>>               free(cur_node);
>> diff --git a/ovn/controller/ovn-controller.h
>> b/ovn/controller/ovn-controller.h
>> index 3b15620..6c1afd9 100644
>> --- a/ovn/controller/ovn-controller.h
>> +++ b/ovn/controller/ovn-controller.h
>> @@ -40,6 +40,17 @@ struct ct_zone_pending_entry {
>>       enum ct_zone_pending_state state;
>>   };
>>   +/* Represents a peer datapath connected to a given datapath */
>> +struct peer_datapath {
>> +    const struct sbrec_datapath_binding *peer_dp;
>> +
>> +    /* Patch port connected to local datapath */
>> +    const struct sbrec_port_binding *patch;
>> +
>> +    /* Peer patch port connected to peer datapath */
>> +    const struct sbrec_port_binding *peer;
>> +};
>> +
>>   /* A logical datapath that has some relevance to this hypervisor.  A
>> logical
>>    * datapath D is relevant to hypervisor H if:
>>    *
>> @@ -57,10 +68,14 @@ struct local_datapath {
>>       /* The localnet port in this datapath, if any (at most one is
>> allowed). */
>>       const struct sbrec_port_binding *localnet_port;
>>   +    /* The chassisredirect port in this datapath, if any
>> +     * (at most one is allowed). */
>> +    const struct sbrec_port_binding *chassisredirect_port;
>> +
>>       /* True if this datapath contains an l3gateway port located on this
>>        * hypervisor. */
>>       bool has_local_l3gateway;
>> -    const struct sbrec_datapath_binding **peer_dps;
>> +    struct peer_datapath **peer_dps;
>>       size_t n_peer_dps;
>>   };
>>   diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
>> index dcf2183..c4cf419 100644
>> --- a/ovn/controller/physical.c
>> +++ b/ovn/controller/physical.c
>> @@ -304,7 +304,8 @@ consider_port_binding(struct ovsdb_idl_index
>> *sbrec_chassis_by_name,
>>   {
>>       uint32_t dp_key = binding->datapath->tunnel_key;
>>       uint32_t port_key = binding->tunnel_key;
>> -    if (!get_local_datapath(local_datapaths, dp_key)) {
>> +    struct local_datapath *ld = get_local_datapath(local_datapaths,
>> dp_key);
>> +    if (!ld) {
>>           return;
>>       }
>>   @@ -350,6 +351,16 @@ consider_port_binding(struct ovsdb_idl_index
>> *sbrec_chassis_by_name,
>>               put_load(0, MFF_LOG_REG0 + i, 0, 32, ofpacts_p);
>>           }
>>           put_load(0, MFF_IN_PORT, 0, 16, ofpacts_p);
>> +
>> +        /* Set MLF_RCV_FROM_VLAN flag for vlan network */
>> +        if (ld->localnet_port) {
>> +            int vlan_tag = (ld->localnet_port->n_tag ?
>> +                            *ld->localnet_port->tag : 0);
>> +            if (vlan_tag) {
>> +                put_load(1, MFF_LOG_FLAGS, MLF_RCV_FROM_VLAN_BIT, 1,
>> +                         ofpacts_p);
>> +            }
>>
>
> Since this block doesn't actually use the vlan_tag for anything, it could
> be simplified to:
>
> if (ld->localnet_port->n_tag) {
>     put_load(...);
> }
>
> I think I'm right here since the "tag" field of the database is
> constrained to being between 1 and 4095.
>
>
Sure

>
> +        }
>>           put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
>>           clone = ofpbuf_at_assert(ofpacts_p, clone_ofs, sizeof *clone);
>>           ofpacts_p->header = clone;
>> @@ -526,9 +537,15 @@ consider_port_binding(struct ovsdb_idl_index
>> *sbrec_chassis_by_name,
>>           put_local_common_flows(dp_key, port_key, nested_container,
>> &zone_ids,
>>                                  ofpacts_p, flow_table);
>>   -        /* Table 0, Priority 150 and 100.
>> +        /* Table 0, Priority 200, 150 and 100.
>>            * ==============================
>>            *
>> +         * Priority 200 is for vlan traffic with distributed gateway
>> port MAC
>> +         * as destination MAC address. For such traffic, set
>> MLF_RCV_FROM_VLAN
>> +         * flag, MFF_LOG_DATAPATH to the router metadata and
>> MFF_LOG_INPORT to
>> +         * the patch port connecting router and vlan network and
>> resubmit into
>> +         * the logical router ingress pipeline.
>> +         *
>>            * Priority 150 is for tagged traffic. This may be containers
>> in a
>>            * VM or a VLAN on a local network. For such traffic, match on
>> the
>>            * tags and then strip the tag.
>> @@ -540,6 +557,55 @@ consider_port_binding(struct ovsdb_idl_index
>> *sbrec_chassis_by_name,
>>            * input port, MFF_LOG_DATAPATH to the logical datapath, and
>>            * resubmit into the logical ingress pipeline starting at table
>>            * 16. */
>> +
>> +        /* For packet from vlan network with distributed gateway port
>> MAC as
>> +         * destination MAC address, submit it to router ingress pipeline
>> */
>> +        int vlan_tag = binding->n_tag ? *binding->tag : 0;
>> +        if (!strcmp(binding->type, "localnet") && vlan_tag) {
>> +            struct local_datapath *ldp = get_local_datapath(
>> +                local_datapaths, binding->datapath->tunnel_key);
>>
>
> If I'm reading this correctly, ldp is the same as ld. So I think you can
> skip looking up ldp and just ld throughout the rest of this block.


Sure

>
>
> +            for (int i = 0; i < ldp->n_peer_dps; i++) {
>> +                struct local_datapath *peer_ldp = get_local_datapath(
>> +                    local_datapaths, ldp->peer_dps[i]->peer_dp->tun
>> nel_key);
>>
>
> Is it possible for peer_ldp to be NULL? If so, you should check for it. If
> not, then you should probably add an assertion.


peer_ldp shoudn't be NULL. How to add assertion for checking not NULL?

>
>
> +                const struct sbrec_port_binding *crp;
>> +                crp = peer_ldp->chassisredirect_port;
>> +                if (crp && crp->chassis &&
>> +                   !strcmp(crp->chassis->name, chassis->name)) {
>> +                    const char *gwp = smap_get(&crp->options,
>> +                                               "distributed-port");
>>
>
> Check for gwp to be NULL here. If there is no options:distributed-port
> set, then we're gonna have a bad time.


Sure.  But gwp can never be NULL. Chassisredirect port will be created from
 distributed gateway port.

>
>
> +                    if (strcmp(gwp, ldp->peer_dps[i]->peer->logical_port))
>> {
>> +                        ofpbuf_clear(ofpacts_p);
>> +                        match_init_catchall(&match);
>> +
>> +                        match_set_in_port(&match, ofport);
>> +                        match_set_dl_vlan(&match, htons(vlan_tag));
>> +                        for (int j = 0; j < crp->n_mac; j++) {
>> +                            struct lport_addresses laddrs;
>> +                            if (!extract_lsp_addresses(crp->mac[j],
>> &laddrs)
>> +                                || !laddrs.n_ipv4_addrs) {
>> +                                continue;
>>
>
> A couple of notes here:
>
> * Is there a reason why IPv4 addresses are required? Is this not intended
> to work in an IPv6-only scenario?
>

Thanks Mark. Its my mistake, shouldn't have checked for IPv4 addresses ( I
have blindly copied from pinctrl.c).

* In the case where extract_lsp_addresses() succeeds, but there are no IPv4
> addresses, this will leak laddrs.
>
>
> +                            }
>> +                            match_set_dl_dst(&match, laddrs.ea);
>> +                            destroy_lport_addresses(&laddrs);
>> +                            break;
>> +                        }
>>
>
> If execution reaches here and we never added destination ethernet address
> to the match, should we continue creating the logical flow? Should we bail?


Chassisredirect port will have MAC. But I will add a check and bail if no
MAC present.

>
>
> +
>> +                        ofpact_put_STRIP_VLAN(ofpacts_p);
>> +                        put_load(peer_ldp->datapath->tunnel_key,
>> +                                 MFF_LOG_DATAPATH, 0, 64, ofpacts_p);
>> +                        put_load(ldp->peer_dps[i]->peer->tunnel_key,
>> +                                 MFF_LOG_INPORT, 0, 32, ofpacts_p);
>> +                        put_load(1, MFF_LOG_FLAGS,
>> +                                 MLF_RCV_FROM_VLAN_BIT, 1, ofpacts_p);
>> +                        put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE,
>> ofpacts_p);
>> +
>> +                        ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG,
>> +                                        200, 0, &match, ofpacts_p);
>> +                    }
>> +                }
>> +            }
>> +        }
>> +
>>           ofpbuf_clear(ofpacts_p);
>>           match_init_catchall(&match);
>>           match_set_in_port(&match, ofport);
>> @@ -633,13 +699,58 @@ consider_port_binding(struct ovsdb_idl_index
>> *sbrec_chassis_by_name,
>>       } else {
>>           /* Remote port connected by tunnel */
>>   -        /* Table 32, priority 100.
>> -         * =======================
>> +        /* Table 32, priority 150 and 100.
>> +         * ==============================
>>            *
>>            * Handles traffic that needs to be sent to a remote
>> hypervisor.  Each
>>            * flow matches an output port that includes a logical port on
>> a remote
>> -         * hypervisor, and tunnels the packet to that hypervisor.
>> +         * hypervisor, and tunnels the packet or send through vlan
>> network to
>> +         * that hypervisor.
>>            */
>> +
>> +        /* For each vlan network connected to the router, add that
>> network's
>> +         * vlan tag to the packet and output it through localnet port */
>> +        struct local_datapath *ldp = get_local_datapath(local_datapaths,
>> +                                                        dp_key);
>> +        for (int i = 0; i < ldp->n_peer_dps; i++) {
>> +            struct ofpact_vlan_vid *vlan_vid;
>> +            ofp_port_t port_ofport = 0;
>> +            struct peer_datapath *pdp = ldp->peer_dps[i];
>> +            struct local_datapath *peer_ldp = get_local_datapath(
>> +                local_datapaths, pdp->peer_dp->tunnel_key);
>>
>
> I have the same question about peer_ldp here that I did about the peer_ldp
> lookup previously.
>
>
> +            if (peer_ldp->localnet_port && pdp->patch->tunnel_key) {
>> +                int64_t vlan_tag = (peer_ldp->localnet_port->n_tag ?
>> +                                    *peer_ldp->localnet_port->tag : 0);
>> +                if (!vlan_tag) {
>> +                    continue;
>> +                }
>> +                port_ofport = u16_to_ofp(simap_get(&localvif_to_ofport,
>> +                    peer_ldp->localnet_port->logical_port));
>> +                if (!port_ofport) {
>> +                    continue;
>> +                }
>> +
>> +                match_init_catchall(&match);
>> +                ofpbuf_clear(ofpacts_p);
>> +
>> +                match_set_metadata(&match, htonll(dp_key));
>> +                match_set_reg_masked(&match, MFF_LOG_FLAGS - MFF_REG0,
>> +                                     MLF_RCV_FROM_VLAN,
>> MLF_RCV_FROM_VLAN);
>> +                match_set_reg(&match, MFF_LOG_INPORT - MFF_REG0,
>> +                              pdp->patch->tunnel_key);
>> +                match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0,
>> port_key);
>> +
>> +                vlan_vid = ofpact_put_SET_VLAN_VID(ofpacts_p);
>> +                vlan_vid->vlan_vid = vlan_tag;
>> +                vlan_vid->push_vlan_if_needed = true;
>> +                ofpact_put_OUTPUT(ofpacts_p)->port = port_ofport;
>> +                ofpact_put_STRIP_VLAN(ofpacts_p);
>> +
>> +                ofctrl_add_flow(flow_table, OFTABLE_REMOTE_OUTPUT, 150,
>> 0,
>> +                                &match, ofpacts_p);
>> +            }
>> +        }
>> +
>>           match_init_catchall(&match);
>>           ofpbuf_clear(ofpacts_p);
>>   diff --git a/ovn/lib/logical-fields.c b/ovn/lib/logical-fields.c
>> index a8b5e3c..b9efa02 100644
>> --- a/ovn/lib/logical-fields.c
>> +++ b/ovn/lib/logical-fields.c
>> @@ -105,6 +105,10 @@ ovn_init_symtab(struct shash *symtab)
>>                MLF_FORCE_SNAT_FOR_LB_BIT);
>>       expr_symtab_add_subfield(symtab, "flags.force_snat_for_lb", NULL,
>>                                flags_str);
>> +    snprintf(flags_str, sizeof flags_str, "flags[%d]",
>> +             MLF_RCV_FROM_VLAN_BIT);
>> +    expr_symtab_add_subfield(symtab, "flags.rcv_from_vlan", NULL,
>> +                             flags_str);
>>         /* Connection tracking state. */
>>       expr_symtab_add_field(symtab, "ct_mark", MFF_CT_MARK, NULL, false);
>> diff --git a/ovn/lib/logical-fields.h b/ovn/lib/logical-fields.h
>> index b1dbb03..96250fd 100644
>> --- a/ovn/lib/logical-fields.h
>> +++ b/ovn/lib/logical-fields.h
>> @@ -50,6 +50,7 @@ enum mff_log_flags_bits {
>>       MLF_FORCE_SNAT_FOR_DNAT_BIT = 2,
>>       MLF_FORCE_SNAT_FOR_LB_BIT = 3,
>>       MLF_LOCAL_ONLY_BIT = 4,
>> +    MLF_RCV_FROM_VLAN_BIT = 5,
>>   };
>>     /* MFF_LOG_FLAGS_REG flag assignments */
>> @@ -75,6 +76,7 @@ enum mff_log_flags {
>>        * hypervisors should instead only be output to local targets
>>        */
>>       MLF_LOCAL_ONLY = (1 << MLF_LOCAL_ONLY_BIT),
>> +    MLF_RCV_FROM_VLAN = (1 << MLF_RCV_FROM_VLAN_BIT),
>>   };
>>     #endif /* ovn/lib/logical-fields.h */
>> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>> index 72fe4e7..350f015 100644
>> --- a/ovn/northd/ovn-northd.c
>> +++ b/ovn/northd/ovn-northd.c
>> @@ -4411,6 +4411,28 @@ add_route(struct hmap *lflows, const struct
>> ovn_port *op,
>>        * routing. */
>>       ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING, priority,
>>                     ds_cstr(&match), ds_cstr(&actions));
>> +
>> +    /* When output port is distributed gateway port, check if the router
>> +     * input port is a patch port connected to vlan network.
>> +     * Traffic from VLAN network to external network should be redirected
>> +     * to "redirect-chassis" by setting REGBIT_NAT_REDIRECT flag.
>> +     * Later physical table 32 will output this traffic to gateway
>> +     * chassis using input network vlan tag */
>> +    if (op == op->od->l3dgw_port) {
>> +        ds_clear(&match);
>> +        ds_clear(&actions);
>> +
>> +        ds_put_format(&match, "ip%s.%s == %s/%d", is_ipv4 ? "4" : "6",
>> +                      dir, network_s, plen);
>> +        ds_put_format(&match, " && flags.rcv_from_vlan == 1");
>> +        ds_put_format(&match, " && !is_chassis_resident(%s)",
>> +                      op->od->l3redirect_port->json_key);
>> +
>> +        ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING,
>> +                      priority + 1, ds_cstr(&match),
>> +                      REGBIT_NAT_REDIRECT" = 1; next;");
>> +    }
>> +
>>       ds_destroy(&match);
>>       ds_destroy(&actions);
>>   }
>> @@ -4822,6 +4844,19 @@ build_lrouter_flows(struct hmap *datapaths, struct
>> hmap *ports,
>>           }
>>           ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 50,
>>                         ds_cstr(&match), "next;");
>> +
>> +        /* VLAN traffic from localnet port should be allowed for
>> +         * router processing on the "redirect-chassis". */
>> +        if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer &&
>> +            op->peer->od->localnet_port && (op != op->od->l3dgw_port)) {
>> +            ds_clear(&match);
>> +            ds_put_format(&match, "flags.rcv_from_vlan == 1");
>> +            ds_put_format(&match, " && inport == %s", op->json_key);
>> +            ds_put_format(&match, " && is_chassis_resident(%s)",
>> +                          op->od->l3redirect_port->json_key);
>> +            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 100,
>> +                          ds_cstr(&match), "next;");
>> +        }
>>       }
>>         /* Logical router ingress table 1: IP Input. */
>> diff --git a/tests/ovn.at b/tests/ovn.at
>> index 6553d17..5ae767d 100644
>> --- a/tests/ovn.at
>> +++ b/tests/ovn.at
>> @@ -7713,6 +7713,284 @@ test_ip_packet gw2 gw1
>>   OVN_CLEANUP([hv1],[gw1],[gw2],[ext1])
>>   AT_CLEANUP
>>   +# VLAN traffic for external network redirected through distributed
>> router gateway port
>> +# should use vlans(i.e input network vlan tag) across hypervisors
>> instead of tunneling.
>> +AT_SETUP([ovn -- vlan traffic for external network with distributed
>> router gateway port])
>> +AT_SKIP_IF([test $HAVE_PYTHON = no])
>> +ovn_start
>> +
>> +# Logical network:
>> +# # One LR R1 that has switches foo (192.168.1.0/24) and
>> +# # alice (172.16.1.0/24) connected to it.  The logical port
>> +# # between R1 and alice has a "redirect-chassis" specified,
>> +# # i.e. it is the distributed router gateway port(172.16.1.6).
>> +# # Switch alice also has a localnet port defined.
>> +# # An additional switch outside has the same subnet as alice
>> +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1)
>> +# # which will receive the packet destined for external network
>> +# # (i.e 8.8.8.8 as destination ip).
>> +
>> +# Physical network:
>> +# # Three hypervisors hv[123].
>> +# # hv1 hosts vif foo1.
>> +# # hv2 is the "redirect-chassis" that hosts the distributed router
>> gateway port.
>> +# # hv3 hosts nexthop port vif outside1.
>> +# # All other tests connect hypervisors to network n1 through br-phys
>> for tunneling.
>> +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and
>> +# # in order to show vlans(instead of tunneling) used between hv1 and
>> hv2,
>> +# # a new network n2 created and hv1 and hv2 connected to this network
>> through br-ex.
>> +# # hv2 and hv3 are still connected to n1 network through br-phys.
>> +net_add n1
>> +
>> +# We are not calling ovn_attach for hv1, to avoid adding br-phys.
>> +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge
>> in hv1
>> +sim_add hv1
>> +as hv1
>> +ovs-vsctl \
>> +    -- set Open_vSwitch . external-ids:system-id=hv1 \
>> +    -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock
>> \
>> +    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \
>> +    -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \
>> +    -- add-br br-int \
>> +    -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
>> \
>> +    -- set Open_vSwitch . external-ids:ovn-bridge-mappings=public:br-ex
>> +
>> +start_daemon ovn-controller
>> +ovs-vsctl -- add-port br-int hv1-vif1 -- \
>> +    set interface hv1-vif1 external-ids:iface-id=foo1 \
>> +    ofport-request=1
>> +
>> +sim_add hv2
>> +as hv2
>> +ovs-vsctl add-br br-phys
>> +ovn_attach n1 br-phys 192.168.0.2
>> +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappin
>> gs="public:br-ex,phys:br-phys"
>> +
>> +sim_add hv3
>> +as hv3
>> +ovs-vsctl add-br br-phys
>> +ovn_attach n1 br-phys 192.168.0.3
>> +ovs-vsctl -- add-port br-int hv3-vif1 -- \
>> +    set interface hv3-vif1 external-ids:iface-id=outside1 \
>> +    options:tx_pcap=hv3/vif1-tx.pcap \
>> +    options:rxq_pcap=hv3/vif1-rx.pcap \
>> +    ofport-request=1
>> +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappin
>> gs="phys:br-phys"
>> +
>> +# Create network n2 for vlan connectivity between hv1 and hv2
>> +net_add n2
>> +
>> +as hv1
>> +ovs-vsctl add-br br-ex
>> +net_attach n2 br-ex
>> +
>> +as hv2
>> +ovs-vsctl add-br br-ex
>> +net_attach n2 br-ex
>> +
>> +OVN_POPULATE_ARP
>> +
>> +ovn-nbctl create Logical_Router name=R1
>> +
>> +ovn-nbctl ls-add foo
>> +ovn-nbctl ls-add alice
>> +ovn-nbctl ls-add outside
>> +
>> +# Connect foo to R1
>> +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
>> +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
>> +    type=router options:router-port=foo \
>> +    -- lsp-set-addresses rp-foo router
>> +
>> +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on
>> hv2
>> +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \
>> +    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
>> +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
>> +    type=router options:router-port=alice \
>> +    -- lsp-set-addresses rp-alice router \
>> +
>> +
>> +# Create logical port foo1 in foo
>> +ovn-nbctl lsp-add foo foo1 \
>> +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
>> +
>> +# Create logical port outside1 in outside, which is a nexthop address
>> +# for 172.16.1.0/24
>> +ovn-nbctl lsp-add outside outside1 \
>> +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1"
>> +
>> +# Set default gateway (nexthop) to 172.16.1.1
>> +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice
>> +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24])
>> +ovn-nbctl set Logical_Switch_Port rp-alice options:nat-addresses=router
>> +
>> +ovn-nbctl lsp-add foo ln-foo
>> +ovn-nbctl lsp-set-addresses ln-foo unknown
>> +ovn-nbctl lsp-set-options ln-foo network_name=public
>> +ovn-nbctl lsp-set-type ln-foo localnet
>> +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2])
>> +
>> +# Create localnet port in alice
>> +ovn-nbctl lsp-add alice ln-alice
>> +ovn-nbctl lsp-set-addresses ln-alice unknown
>> +ovn-nbctl lsp-set-type ln-alice localnet
>> +ovn-nbctl lsp-set-options ln-alice network_name=phys
>> +
>> +# Create localnet port in outside
>> +ovn-nbctl lsp-add outside ln-outside
>> +ovn-nbctl lsp-set-addresses ln-outside unknown
>> +ovn-nbctl lsp-set-type ln-outside localnet
>> +ovn-nbctl lsp-set-options ln-outside network_name=phys
>> +
>> +# Allow some time for ovn-northd and ovn-controller to catch up.
>> +# XXX This should be more systematic.
>> +ovn-nbctl --wait=hv --timeout=3 sync
>> +
>> +echo "---------NB dump-----"
>> +ovn-nbctl show
>> +echo "---------------------"
>> +ovn-nbctl list logical_router
>> +echo "---------------------"
>> +ovn-nbctl list nat
>> +echo "---------------------"
>> +ovn-nbctl list logical_router_port
>> +echo "---------------------"
>> +
>> +echo "---------SB dump-----"
>> +ovn-sbctl list datapath_binding
>> +echo "---------------------"
>> +ovn-sbctl list port_binding
>> +echo "---------------------"
>> +ovn-sbctl dump-flows
>> +echo "---------------------"
>> +ovn-sbctl list chassis
>> +echo "---------------------"
>> +
>> +for chassis in hv1 hv2 hv3; do
>> +    as $chassis
>> +    echo "------ $chassis dump ----------"
>> +    ovs-vsctl show br-int
>> +    ovs-ofctl show br-int
>> +    ovs-ofctl dump-flows br-int
>> +    echo "--------------------------"
>> +done
>> +
>> +ip_to_hex() {
>> +    printf "%02x%02x%02x%02x" "$@"
>> +}
>> +
>> +foo1_ip=$(ip_to_hex 192 168 1 2)
>> +gw_ip=$(ip_to_hex 172 16 1 6)
>> +dst_ip=$(ip_to_hex 8 8 8 8)
>> +nexthop_ip=$(ip_to_hex 172 16 1 1)
>> +
>> +foo1_mac="f00000010203"
>> +foo_mac="000001010203"
>> +gw_mac="000002010203"
>> +nexthop_mac="f00000010204"
>> +
>> +# Send ip packet from foo1 to 8.8.8.8
>> +src_mac="f00000010203"
>> +dst_mac="000001010203"
>> +packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${f
>> oo1_ip}${dst_ip}0035111100080000
>> +
>> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>> +sleep 2
>> +
>> +# ARP request packet for nexthop_ip to expect at outside1
>> +arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_m
>> ac}${gw_ip}000000000000${nexthop_ip}
>> +echo $arp_request >> hv3-vif1.expected
>> +cat hv3-vif1.expected > expout
>> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep
>> ${nexthop_ip} | uniq > hv3-vif1
>> +AT_CHECK([sort hv3-vif1], [0], [expout])
>> +
>> +# Send ARP reply from outside1 back to the router
>> +reply_mac="f00000010204"
>> +arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${next
>> hop_mac}${nexthop_ip}${gw_mac}${gw_ip}
>> +
>> +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
>> +OVS_WAIT_UNTIL([
>> +    test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \
>> +grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1
>> +    ])
>> +
>> +echo "ovn-sbctl list MAC_Binding"
>> +ovn-sbctl list MAC_Binding
>> +echo "============================"
>> +
>> +# VLAN tagged packet with distributed gateway port(172.16.1.6) MAC as
>> destination MAC
>> +# is expected on bridge connecting hv1 and hv2
>> +expected=${gw_mac}${foo1_mac}8100000208004500001c0000000040
>> 110000${foo1_ip}${dst_ip}0035111100080000
>> +echo $expected > hv1-br-ex_n2.expected
>> +
>> +# Packet to Expect at outside1 i.e nexthop(172.16.1.1) port.
>> +# As connection tracking not enabled for this test, snat can't be done
>> on the packet.
>> +# We still see foo1 as the source ip address. But source mac(gateway
>> MAC) and
>> +# dest mac(nexthop mac) are properly configured.
>> +expected=${nexthop_mac}${gw_mac}08004500001c000000003f11010
>> 0${foo1_ip}${dst_ip}0035111100080000
>> +echo $expected > hv3-vif1.expected
>> +
>> +reset_pcap_file() {
>> +    local iface=$1
>> +    local pcap_file=$2
>> +    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
>> +options:rxq_pcap=dummy-rx.pcap
>> +    rm -f ${pcap_file}*.pcap
>> +    ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap
>> \
>> +options:rxq_pcap=${pcap_file}-rx.pcap
>> +}
>> +
>> +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2
>> +as hv3 reset_pcap_file hv3-vif1 hv3/vif1
>> +sleep 2
>> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>> +sleep 2
>> +
>> +# On hv1, table 65 for packets going from vlan switch pipleline to
>> router pipleine
>> +# set MLF_RCV_FROM_VLAN flag
>> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep
>> "priority=100,reg15=0x1,metadata=0x2" \
>> +| grep actions=clone | grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1
>> +]])
>> +# On hv1, because of snat rule in table 15, a higher priority(i.e 2) flow
>> +# added for packets with MLF_RCV_FROM_VLAN flag with output as
>> distributed
>> +# gateway port, which sets REGBIT_NAT_REDIRECT flag
>> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=15 | grep
>> "priority=2,ip,reg10=0x20/0x20,metadata=0x1" \
>> +| grep "actions=load:0x1->OXM_OF_PKT_REG4" | wc -l], [0], [[1
>> +]])
>> +
>> +# On hv1, table 32 flow which tags packet with source network vlan tag
>> and sends it to hv2
>> +# through br-ex
>> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 | grep
>> "priority=150,reg10=0x20/0x20,reg14=0x1,reg15=0x3,metadata=0x1" \
>> +| grep "actions=mod_vlan_vid:2" | grep "n_packets=2," | wc -l], [0], [[1
>> +]])
>> +
>> +# On hv2 table 0, vlan tagged packet is sent through router pipeline
>> +# by setting MLF_RCV_FROM_VLAN flag (REG10)
>> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep "table=0," | grep
>> "priority=200" | grep "dl_vlan=2" | \
>> +grep "dl_dst=00:00:02:01:02:03" | grep "actions=strip_vlan,load:0x1->OXM_OF_METADATA"
>> | grep "load:0x1->NXM_NX_REG14" | \
>> +grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1
>> +]])
>> +# on hv2 table 8, allow packets with router metadata and with
>> MLF_RCV_FROM_VLAN flag
>> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=8 | grep
>> "priority=100,reg10=0x20/0x20,reg14=0x1,metadata=0x1" | wc -l], [0], [[1
>> +]])
>> +
>> +ip_packet() {
>> +    grep "2010203f00000010203"
>> +}
>> +
>> +# Check vlan tagged packet on the bridge connecting hv1 and hv2
>> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap |
>> ip_packet | uniq > hv1-br-ex_n2
>> +cat hv1-br-ex_n2.expected > expout
>> +AT_CHECK([sort hv1-br-ex_n2], [0], [expout])
>> +
>> +# Check expected packet on nexthop interface
>> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep
>> ${foo1_ip}${dst_ip} | uniq > hv3-vif1
>> +cat hv3-vif1.expected > expout
>> +AT_CHECK([sort hv3-vif1], [0], [expout])
>> +
>> +OVN_CLEANUP([hv1],[hv2],[hv3])
>> +AT_CLEANUP
>> +
>>   AT_SETUP([ovn -- 1 LR with distributed router gateway port])
>>   AT_SKIP_IF([test $HAVE_PYTHON = no])
>>   ovn_start
>>
>>
>


More information about the dev mailing list