[ovs-dev] [PATCH v7 4/4] Replace router internal MAC with gateway MAC for reply packets

Numan Siddique nusiddiq at redhat.com
Tue Oct 9 09:14:54 UTC 2018


On Fri, Sep 28, 2018 at 12:36 AM Guru Shetty <guru at ovn.org> wrote:

>
>
> On Mon, 24 Sep 2018 at 03:10, Miguel Angel Ajo Pelayo <majopela at redhat.com>
> wrote:
>
>>
>>
>> No worries Guru, I understand your feeling, I worked with Anil on
>> developing this feature, and it's indeed rather complex (we are actually
>> replacing keepalived + VRRP with openflow and BFD).
>>
>> I'm happy to work on a more detailed documentation, I guess that Numan,
>> Anil and I could team up to help that put documentation in shape where [1]
>> should be a good starting point for the admin side, and may be after that
>> Numan can give it a thought for making the code simpler where there's room
>> for that. It's easier to have others help simplify once we have good
>> documentation.
>>
>> I agree with Numans that, may be we should take also this opportunity
>> (this patch Anil has been working on to make VLAN private networks possible
>> with l3ha) to also support SR-IOV, since it's actually not far from there
>> at all.
>>
>> Guru, what's your biggest concern, the design documentation? the user
>> side ? both?. My worry with design documentaiton is that tends to rot when
>> new patches comes, do we have any example of internals documentation that
>> talk about code in tree?
>>
>
> My biggest concern is that reading the code does not give out a story.
> Everything in ovn-northd can be understood by just reading the code. But
> when it comes to gateway redirect, it is hard.
>
> For e.g., I am pasting a snippet of code from one of the patches in this
> series.
>
> ++    if (op->peer && op->peer->od->localnet_port &&+        op->od->l3dgw_port && op->od->l3redirect_port &&+        (op != op->od->l3redirect_port) &&+        (op != op->od->l3dgw_port)) {+        ds_put_format(&match, " && is_chassis_resident(%s)",+                      op->od->l3redirect_port->json_key);+        ds_put_format(&actions, "; flags.rcv_from_vlan = 1");+    }
>
> You need to go back and read the first patches of the feature (gateway
> redirect) and its commit messages to understand the implications of the
> above if condition. The primary reason why it is hard is because, we try to
> move back and forth with packet flow and force it to move in non-natural
> directions.
>
> I feel that writing a documentation (A hybrid of man pages of
> ovn-architecture and ovn-northd) would make it easy to follow. For e.g., we
> follow a packet when it leaves the VM and heads to the gateway and gets
> SNATted.  Similarly, we follow the packet for DNAT. We can then enhance it
> with other use cases (for e.g this patch series).
>
> On a larger note, OVN is a network virtualization project. We implement
> the functionality of a router, switch, firewall etc and then when we
> interconnect them, things just work well. Code around Switches, routers,
> load-balancers etc are easy to understand because they are all in logical
> space. But gateways transcend the logical boundary to physical boundary.
> The reason why the "gateway redirect" code is hard to understand is because
> we are trying to implement it in the same router that is also used in
> logical space. A clean separation of boundaries does not exist. I tried to
> implement the clean separation with gateway routers, wherein the there is a
> separate router that connects to the physical world and is chassis bound.
> But OpenStack-OVN as a client did not want to handle the complication of
> maintaining 2 routers.
>
>
I was involved in moving away from the Gateway router to a distributed
gateway router port once Mickey Spiegel added the redirect chassis. It's
true that we didn't want to maintain 2 logical routers in networking-ovn.
But we moved mainly because we were co relating with the default neutron
reference implementation and wanted to bridge gaps between it (with various
agents ) and neutron + OVN. With the new redirect chassis support, we could
add the floating ip support for the VMs with distributed routing (instead
of centralized routing) and also later- HA support for centralized routing.

I have submitted a patch series as  an alternate solution to solve this
issue. Can you please take a look at it -
https://patchwork.ozlabs.org/project/openvswitch/list/?series=69280 and
provide your comments.
I will also work on having a documentation as you have suggested above.

Thanks
Numan


>
>
>
>
>>
>> [1]
>> https://github.com/openvswitch/ovs/blob/master/Documentation/topics/high-availability.rst
>>
>>
>>
>> On Fri, Sep 21, 2018 at 9:28 PM Numan Siddique <nusiddiq at redhat.com>
>> wrote:
>>
>>> On Fri, Sep 21, 2018 at 11:06 PM Guru Shetty <guru at ovn.org> wrote:
>>>
>>> > >
>>> > >
>>> > > I have tried to make sense of this patch series a few times. I think
>>> > > adding increasing complications like this will make gateway code
>>> > > unmaintainable. The whole gateway redirect chassis already makes it
>>> > > un-understandable and now this will mean that no one will be able to
>>> > > understand it as we go forward. [/rant off]
>>> > >
>>> >
>>> > Sorry about the "rant" if it came out as rude - was not my intention. I
>>> > will try to offer something more constructive. When you initially
>>> looked at
>>> > the gateway redirect code, how long did it take for you to understand
>>> it?
>>> > My main concern is not with your code, but the underlying code. I
>>> initially
>>> > reviewed the underlying redirect code. And it took me a good 2 days to
>>> > understand it. Back then, we wanted something working and it looked
>>> like
>>> > something that can eventually be simplified. But, we have been adding
>>> more
>>> > things to it. And right now, if this patch series is added, it will
>>> likely
>>> > get very very hard to understand it as we go forward. Do you reckon
>>> that a
>>> > year from now, you will be able to debug a corner case in the code?
>>> >
>>> > If there are others who have understood the "redirect" gateway code
>>> easily,
>>> > please do chime in.
>>> >
>>> > My suggestion is to re-look at the redirect gateway design and see
>>> whether
>>> > the code can be simplified. Or whether we can write detailed
>>> documentation
>>> > about it to make it easy to understand. Only once we are satisfied
>>> with it,
>>> > we can attempt to add more stuff to it.
>>> >
>>> >
>>> I always forget the gateway code and the design. (May be because I
>>> haven't
>>> worked
>>> on this part of the code much) and I have to refresh my memory everytime.
>>> I agree that we need to have a detailed documentation. May be "a journey
>>> of
>>> a packet "
>>> kind of documentation in ovn-architecture would also help.
>>>
>>> The proposed solution in this patch series, If I understand it correctly
>>> runs the router pipeline
>>> in the source chassis and then resumes it on the gateway chassis.
>>> Recently
>>> I was looking into
>>> SR-IOV kind of scenarios, where the VM traffic by passes the host kernel,
>>> and I think the
>>> proposed solution will not work. In order to support N/S traffic for
>>> SR-IOV
>>> instances
>>> we have to find probably another way and that would also complicate the
>>> code further.
>>>
>>> However I think it is possible to have one solution to support gateway
>>> traffic for
>>> VLAN tenant networks which covers SR-IOV and the normal case. I think the
>>> router pipeline
>>> should be executed only on the gateway chassis. I will have a discussion
>>> with
>>> Anil on this, this monday and see if it's possible.
>>>
>>> But unfortunately, supporting these use cases will add some amount of
>>> complexity :(.
>>>
>>> Thanks
>>> Numan
>>>
>>>
>>> >
>>> > >
>>> > >> ---
>>> > >>
>>> > >> v6->v7:
>>> > >> * Added this patch
>>> > >>
>>> > >>
>>> > >>  ovn/controller/physical.c   | 60
>>> > >> ++++++++++++++++++++++++++++++++++++++++++---
>>> > >>  ovn/northd/ovn-northd.8.xml | 10 ++++++++
>>> > >>  ovn/northd/ovn-northd.c     | 29 ++++++++++++++++++++++
>>> > >>  ovn/ovn-architecture.7.xml  |  4 ++-
>>> > >>  4 files changed, 99 insertions(+), 4 deletions(-)
>>> > >>
>>> > >> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
>>> > >> index f269a1d..1f41f59 100644
>>> > >> --- a/ovn/controller/physical.c
>>> > >> +++ b/ovn/controller/physical.c
>>> > >> @@ -190,7 +190,9 @@ get_zone_ids(const struct sbrec_port_binding
>>> > *binding,
>>> > >>  static void
>>> > >>  put_local_common_flows(uint32_t dp_key, uint32_t port_key,
>>> > >>                         bool nested_container, const struct zone_ids
>>> > >> *zone_ids,
>>> > >> -                       struct ofpbuf *ofpacts_p, struct hmap
>>> > *flow_table)
>>> > >> +                       struct ofpbuf *ofpacts_p, struct hmap
>>> > *flow_table,
>>> > >> +                       struct local_datapath *ld,
>>> > >> +                       const struct hmap *local_datapaths)
>>> > >>  {
>>> > >>      struct match match;
>>> > >>
>>> > >> @@ -221,11 +223,63 @@ put_local_common_flows(uint32_t dp_key,
>>> uint32_t
>>> > >> port_key,
>>> > >>          }
>>> > >>      }
>>> > >>
>>> > >> +    struct ofpbuf *clone = NULL;
>>> > >> +    clone = ofpbuf_clone(ofpacts_p);
>>> > >> +
>>> > >>      /* Resubmit to table 34. */
>>> > >>      put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts_p);
>>> > >>      ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT, 100, 0,
>>> > >>                      &match, ofpacts_p);
>>> > >>
>>> > >> +    /* For a reply packet from gateway with VLAN switch port as
>>> > >> destination
>>> > >> +     * (excluding localnet_port and external VLAN networks),
>>> gateway
>>> > >> router
>>> > >> +     * will use gateway MAC address as source MAC instead of router
>>> > >> internal
>>> > >> +     * port MAC, so that external switches can learn gateway MAC
>>> > address.
>>> > >> +     * Here (before packet is given to the port) we replace router
>>> > >> gateway
>>> > >> +     * MAC address with router internal port MAC. */
>>> > >> +    if (ld->localnet_port && (port_key !=
>>> > >> ld->localnet_port->tunnel_key)) {
>>> > >> +        for (int i = 0; i < ld->n_peer_dps; i++) {
>>> > >> +            struct local_datapath *peer_ldp = get_local_datapath(
>>> > >> +                local_datapaths,
>>> ld->peer_dps[i]->peer_dp->tunnel_key);
>>> > >> +            const struct sbrec_port_binding *crp;
>>> > >> +            crp = peer_ldp->chassisredirect_port;
>>> > >> +            if (!crp) {
>>> > >> +                continue;
>>> > >> +            }
>>> > >> +
>>> > >> +            if (strcmp(smap_get(&crp->options, "distributed-port"),
>>> > >> +                       ld->peer_dps[i]->peer->logical_port) &&
>>> > >> +                (port_key != ld->peer_dps[i]->patch->tunnel_key)) {
>>> > >> +                for (int j = 0; j < crp->n_mac; j++) {
>>> > >> +                    struct lport_addresses laddrs;
>>> > >> +                    if (!extract_lsp_addresses(crp->mac[j],
>>> &laddrs)) {
>>> > >> +                        continue;
>>> > >> +                    }
>>> > >> +                    match_set_dl_src(&match, laddrs.ea);
>>> > >> +                    destroy_lport_addresses(&laddrs);
>>> > >> +                    break;
>>> > >> +                }
>>> > >> +                for (int j = 0; j < ld->peer_dps[i]->peer->n_mac;
>>> j++)
>>> > {
>>> > >> +                    struct lport_addresses laddrs;
>>> > >> +                    uint64_t mac64;
>>> > >> +                    if (!extract_lsp_addresses(
>>> > >> +                        ld->peer_dps[i]->peer->mac[j], &laddrs)) {
>>> > >> +                        continue;
>>> > >> +                    }
>>> > >> +                    mac64 = eth_addr_to_uint64(laddrs.ea);
>>> > >> +                    put_load(mac64,
>>> > >> +                             MFF_ETH_SRC, 0, 48, clone);
>>> > >> +                    destroy_lport_addresses(&laddrs);
>>> > >> +                    break;
>>> > >> +                }
>>> > >> +                put_resubmit(OFTABLE_CHECK_LOOPBACK, clone);
>>> > >> +                ofctrl_add_flow(flow_table, OFTABLE_LOCAL_OUTPUT,
>>> 150,
>>> > 0,
>>> > >> +                    &match, clone);
>>> > >> +            }
>>> > >> +        }
>>> > >> +    }
>>> > >> +    ofpbuf_delete(clone);
>>> > >> +
>>> > >>      /* Table 34, Priority 100.
>>> > >>       * =======================
>>> > >>       *
>>> > >> @@ -330,7 +384,7 @@ consider_port_binding(struct ovsdb_idl_index
>>> > >> *sbrec_chassis_by_name,
>>> > >>
>>> > >>          struct zone_ids binding_zones = get_zone_ids(binding,
>>> > ct_zones);
>>> > >>          put_local_common_flows(dp_key, port_key, false,
>>> &binding_zones,
>>> > >> -                               ofpacts_p, flow_table);
>>> > >> +                               ofpacts_p, flow_table, ld,
>>> > >> local_datapaths);
>>> > >>
>>> > >>          match_init_catchall(&match);
>>> > >>          ofpbuf_clear(ofpacts_p);
>>> > >> @@ -531,7 +585,7 @@ consider_port_binding(struct ovsdb_idl_index
>>> > >> *sbrec_chassis_by_name,
>>> > >>
>>> > >>          struct zone_ids zone_ids = get_zone_ids(binding, ct_zones);
>>> > >>          put_local_common_flows(dp_key, port_key, nested_container,
>>> > >> &zone_ids,
>>> > >> -                               ofpacts_p, flow_table);
>>> > >> +                               ofpacts_p, flow_table, ld,
>>> > >> local_datapaths);
>>> > >>
>>> > >>          /* Table 0, Priority 200, 150 and 100.
>>> > >>           * ==============================
>>> > >> diff --git a/ovn/northd/ovn-northd.8.xml
>>> b/ovn/northd/ovn-northd.8.xml
>>> > >> index 8fa5272..876c121 100644
>>> > >> --- a/ovn/northd/ovn-northd.8.xml
>>> > >> +++ b/ovn/northd/ovn-northd.8.xml
>>> > >> @@ -2013,6 +2013,16 @@ next;
>>> > >>        </li>
>>> > >>
>>> > >>        <li>
>>> > >> +        A priority-100 logical flow with match
>>> > >> +        <code>inport == <var>GW</var> &amp;&amp;
>>> > >> +        flags.rcv_from_vlan == 1</code> has actions
>>> > >> +        <code>eth.dst = <var>E</var>; next;</code>, where
>>> > >> +        <var>GW</var> is the logical router distributed gateway
>>> > >> +        port and <var>E</var> is the MAC address of router
>>> > >> +        distributed gateway port.
>>> > >> +      </li>
>>> > >> +
>>> > >> +      <li>
>>> > >>          For each NAT rule in the OVN Northbound database that can
>>> > >>          be handled in a distributed manner, a priority-100 logical
>>> > >>          flow with match <code>ip4.src == <var>B</var> &amp;&amp;
>>> > >> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>>> > >> index bcf0b66..d012bb8 100644
>>> > >> --- a/ovn/northd/ovn-northd.c
>>> > >> +++ b/ovn/northd/ovn-northd.c
>>> > >> @@ -4419,6 +4419,15 @@ add_route(struct hmap *lflows, const struct
>>> > >> ovn_port *op,
>>> > >>      } else {
>>> > >>          ds_put_format(&actions, "ip%s.dst", is_ipv4 ? "4" : "6");
>>> > >>      }
>>> > >> +
>>> > >> +    if (op->peer && op->peer->od->localnet_port &&
>>> > >> +        op->od->l3dgw_port && op->od->l3redirect_port &&
>>> > >> +        (op != op->od->l3redirect_port) &&
>>> > >> +        (op != op->od->l3dgw_port)) {
>>> > >> +        ds_put_format(&match, " && is_chassis_resident(%s)",
>>> > >> +                      op->od->l3redirect_port->json_key);
>>> > >> +        ds_put_format(&actions, "; flags.rcv_from_vlan = 1");
>>> > >> +    }
>>> > >>      ds_put_format(&actions, "; "
>>> > >>                    "%sreg1 = %s; "
>>> > >>                    "eth.src = %s; "
>>> > >> @@ -6131,6 +6140,26 @@ build_lrouter_flows(struct hmap *datapaths,
>>> > struct
>>> > >> hmap *ports,
>>> > >>                        op->lrp_networks.ipv6_addrs[i].network_s,
>>> > >>                        op->lrp_networks.ipv6_addrs[i].plen, NULL,
>>> NULL);
>>> > >>          }
>>> > >> +
>>> > >> +        /* For a reply packet from gateway with VLAN switch port as
>>> > >> +         * destination, replace router internal port MAC with
>>> router
>>> > >> gateway
>>> > >> +         * MAC address, so that external switches can learn
>>> gateway MAC
>>> > >> +         * address. Later before delivering the packet to the port,
>>> > >> +         * controller will replace the gateway MAC with router
>>> internal
>>> > >> port
>>> > >> +         * MAC in table 33. */
>>> > >> +        if (op->od->l3dgw_port && (op == op->od->l3dgw_port) &&
>>> > >> +            op->od->l3redirect_port) {
>>> > >> +            ds_clear(&actions);
>>> > >> +            ds_clear(&match);
>>> > >> +            ds_put_format(&match, "inport == %s", op->json_key);
>>> > >> +            ds_put_format(&match, " && flags.rcv_from_vlan == 1");
>>> > >> +            ds_put_format(&match, " && is_chassis_resident(%s)",
>>> > >> +                          op->od->l3redirect_port->json_key);
>>> > >> +            ds_put_format(&actions,
>>> > >> +                          "eth.src = %s; next;",
>>> > op->lrp_networks.ea_s);
>>> > >> +            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_GW_REDIRECT,
>>> 100,
>>> > >> +                          ds_cstr(&match), ds_cstr(&actions));
>>> > >> +        }
>>> > >>      }
>>> > >>
>>> > >>      /* Convert the static routes to flows. */
>>> > >> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
>>> > >> index ad2101c..0de41d2 100644
>>> > >> --- a/ovn/ovn-architecture.7.xml
>>> > >> +++ b/ovn/ovn-architecture.7.xml
>>> > >> @@ -1067,7 +1067,9 @@
>>> > >>
>>> > >>        <p>
>>> > >>          Flows in table 33 resemble those in table 32 but for
>>> logical
>>> > >> ports that
>>> > >> -        reside locally rather than remotely.  For unicast logical
>>> > output
>>> > >> ports
>>> > >> +        reside locally rather than remotely.  If these are VLAN
>>> ports
>>> > and
>>> > >> +        packet has router gateway port MAC address as source,
>>> replace
>>> > it
>>> > >> with
>>> > >> +        router internal port MAC address. For unicast logical
>>> output
>>> > >> ports
>>> > >>          on the local hypervisor, the actions just resubmit to
>>> table 34.
>>> > >> For
>>> > >>          multicast output ports that include one or more logical
>>> ports
>>> > on
>>> > >> the
>>> > >>          local hypervisor, for each such logical port <var>P</var>,
>>> the
>>> > >> actions
>>> > >> --
>>> > >> 1.8.3.1
>>> > >>
>>> > >> _______________________________________________
>>> > >> dev mailing list
>>> > >> dev at openvswitch.org
>>> > >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>> > >>
>>> > >
>>> > _______________________________________________
>>> > dev mailing list
>>> > dev at openvswitch.org
>>> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>> >
>>> _______________________________________________
>>> dev mailing list
>>> dev at openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>
>>
>>
>> --
>> Miguel Ángel Ajo
>> OSP / Networking DFG, OVN Squad Engineering
>>
>


More information about the dev mailing list