[ovs-dev] OVN based distributed virtual routing for VLAN backed networks

Mark Michelson mmichels at redhat.com
Wed Oct 17 21:50:22 UTC 2018

Hi Ankur,

Thanks for the detailed document! I always appreciate it when things are 
planned out in great detail so we know exactly what to expect.

A general comment: there are places below where things like "figure 1" 
and "figure OVN bridge deployment" are referenced, but we can't see 
them. Is there a link to another document you can share that has these 
figures present?

Other comments of mine are inline below.

On 10/16/2018 06:43 PM, Ankur Sharma wrote:
> Hi,
> We have put some effort into evaluating the use of OVN for
> Distributed Virtual Routing (DVR) for VLAN backed networks.
> We would like to take it forward with the community.
> We understand that some of the work could overlap with existing
> patches in review.
> We would appreciate feedback and would be happy to update our patches
> to avoid known overlaps.
> This email explains the proposal. We will be following it up with patches.
> Each "CODE CHANGES" section summarizes the change that the corresponding
> patch would contain.
> ======================================================
> 1. OVN Bridge Deployment
> ------------------------------------
> Our design follows the following ovn-bridge deployment model
> (please refer to figure OVN Bridge Deployment).
>      i. br-int ==> OVN-managed bridge.
>         br-pif ==> Learning bridge, where physical NICs will be connected.
>     ii. Any packet that should be on the physical network will travel from
>         br-int to br-pif via patch ports (localnet ports).
> 2. Layer 2
> -------------
>     DESIGN:
>     ~~~~~~~
>     a. Leverage the localnet logical port type as the patch port between
>         br-int and br-pif.
>     b. Each VLAN backed logical switch will have a localnet port connected
>         to it.
>     c. Tagging and untagging of VLAN headers happens at the localnet port
>         boundary.
>     DESIGN IMPLICATIONS:
>     ~~~~~~~~~~~~~~~~~~~
>     a. Unlike the geneve-encap-based solution, where we execute the ingress
>         pipeline on the source chassis and the egress pipeline on the
>         destination chassis, for VLAN backed logical switches the packet
>         will go through the ingress pipeline on the destination chassis
>         as well.
>     PACKET FLOW (Figure 1. shows topology and Figure 2. shows the packet flow):
>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     a. VM1 sends unicast traffic (destined to VM2_MAC) to br-int.
>     b. For br-int, the destination MAC is not local, hence it forwards the
>         packet to the localnet port (by design), which is attached to
>         br-pif. This is the stage at which the VLAN tag is added. br-pif
>         forwards the packet to the physical interface.
>     c. br-pif on the destination chassis sends the received traffic to the
>         patch ports on br-int (as unicast or unknown unicast).
>     d. br-int checks the VLAN tag, strips the VLAN header and sends
>         the packet to the ingress pipeline of the corresponding datapath.
>     ADVANTAGES AND DISADVANTAGES:
>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     a. No encapsulation.
>     b. Both ingress and egress pipelines of logical switch are executed on
>         both source and destination hypervisor (unlike overlay where ingress
>         pipeline is executed on source hypervisor and egress on destination).
>     CODE CHANGES:
>     ~~~~~~~~~~~~~
>     a. ovn-nb.ovsschema:
>          1. Add a new column to table Logical_Switch.
>          2. Column name would be "type".
>          3. Values would be either "vlan" or "overlay", with "overlay"
>              being default.
>     b. ovn-nbctl:
>          1. Add a new CLI command which sets the "type" of a logical switch:
>              ovn-nbctl ls-set-network-type SWITCH TYPE
>     c. ovn-northd:
>          1. Add a new enum to ovn_datapath struct, which will indicate
>              if logical_switch datapath type is overlay or vlan.
>          2. Populate a new key value pair in southbound database for Datapath
>              Bindings of Logical_Switch.
>          3. Key value pair: <logical-switch-type, "vlan" or "overlay">, default
>              will be overlay.

I believe everything described in this section is doable in OVN already 
without any code changes.

Essentially, you can do the following:
1) On a logical switch, create a logical switch port of type "localnet" 
and set its addresses to "unknown".
2) On the localnet port, set options:network_name to a network name.
3) On the localnet port, set tag_request to the VLAN identifier you want 
to use.
4) On the hypervisor where ovn-controller runs, create the br-pif bridge.
5) On the hypervisor where ovn-controller runs, in the Open_vSwitch 
table's record, set external-ids:ovn-bridge-mappings = 
<network_name>:br-pif. "network_name" in this case is the network_name 
you set on the localnet port in step 2.
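
The steps above can be sketched as shell commands. Names like sw0, ln-sw0,
physnet1 and the VLAN tag 100 are placeholders, not anything from the
proposal:

```shell
# 1) Create a localnet port on the logical switch; "unknown" addresses
#    make the switch forward unmatched destination MACs to this port.
ovn-nbctl lsp-add sw0 ln-sw0
ovn-nbctl lsp-set-type ln-sw0 localnet
ovn-nbctl lsp-set-addresses ln-sw0 unknown

# 2) Associate the localnet port with a physical network name.
ovn-nbctl lsp-set-options ln-sw0 network_name=physnet1

# 3) Request the VLAN tag to use on this localnet port.
ovn-nbctl set Logical_Switch_Port ln-sw0 tag_request=100

# 4) On each hypervisor running ovn-controller, create br-pif.
ovs-vsctl --may-exist add-br br-pif

# 5) Map the network name to br-pif; ovn-controller then creates the
#    patch ports between br-int and br-pif automatically.
ovs-vsctl set Open_vSwitch . \
    external-ids:ovn-bridge-mappings=physnet1:br-pif
```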

With this setup, ovn-controller will automatically create the patch 
ports between br-int and br-pif, and will use the VLAN tag from the 
localnet port for two purposes:
1) On traffic sent out of br-int over the patch port, the tag will be 
added to the packet.
2) On traffic received from the patch port into br-int, the VLAN tag 
must match the configured VLAN tag on the localnet port. If it matches, 
the tag is stripped.

The only aspect of the above I'm not 100% sure about is the logical 
switch ingress and egress pipelines being run on both source and 
destination hypervisor. But I *think* that's how it works in this case.

> 3. Layer 3 East West
> --------------------
>     DESIGN:
>     ~~~~~~~
>     a. Since the router port is distributed and there is no encapsulation,
>         packets with the router port MAC as the source MAC cannot go on
>         the wire.
>     b. We propose replacing the router port MAC with a chassis-specific MAC
>         whenever a packet goes on the wire.
>     c. The number of chassis_macs per chassis could depend on the number of
>         physical NICs and the corresponding bond policy on br-pif.
>        As of now, we propose only one chassis_mac per chassis
>        (shared by all resident logical routers). However, we are analyzing
>        whether br-pif's bond policy would require more MACs per chassis.
>     DESIGN IMPLICATIONS:
>     ~~~~~~~~~~~~~~~~~~~
>     a. For a DVR E-W flow, both ingress and egress pipelines for logical_router
>         will execute on source chassis only.
>     PACKET FLOW (Figure 3. shows topology and Figure 4. shows the packet flow):
>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     a. VM1 sends packet (destined to IP2), to br-int.
>     b. On Source hypervisor, packet goes through following pipelines:
>        1. Ingress: logical-switch 1
>        2. Egress:  logical-switch 1
>        3. Ingress: logical-router
>        4. Egress:  logical-router
>        5. Ingress: logical-switch2
>        6. Egress:  logical-switch2
>        On the wire, the packet goes out with the destination logical
>        switch's VLAN. As mentioned in the design, the source MAC (RP2_MAC)
>        is replaced with CHASSIS_MAC and the destination MAC is that of VM2.
>     c. Packet reaches destination chassis and enters logical-switch2
>         pipeline in br-int.
>     d. Packet goes through logical-switch2 pipeline (both ingress and egress)
>         and gets forwarded to VM2.
>     CODE CHANGES:
>     ~~~~~~~~~~~~~
>     a. ovn-sb.ovsschema:
>          1. Add a new column to the table Chassis.
>          2. Column name would be "chassis_macs", type being string with no
>              limit on the range of values.
>          3. This column will hold a list of chassis-unique MACs.
>          4. This column will be populated by ovn-controller.
>     b. ovn-sbctl:
>          1. CLI to add/delete chassis_macs to/from the south bound database.
>     c. ovn-controller:
>          1. Read chassis macs from OVS Open_Vswitch table and populate
>              south bound database.
>          2. In table=65, add a new flow at priority 150, which does the
>              following:
>             a. Match: source_mac == router_port_mac, metadata ==
>                 destination_logical_switch, logical_outport == localnet_port
>             b. Action: replace the source MAC with chassis_mac, add the
>                 VLAN tag.

It sounds like this shares some similarities with this proposed patch: 

In the linked patch, the idea is to use a consistent source MAC in order 
to play well with physical switches. However, the approach used in the 
linked patch is quite different from your proposal here.

I like your proposal because I like the explicit configuration. The one 
question I have is, how do you determine which chassis MAC to use if 
multiple are specified? One idea might be to use something similar to 
the ovn-bridge-mappings. In other words, you map a network_name to a 
specific chassis MAC.
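
By analogy with ovn-bridge-mappings, such a mapping might look like the
following. To be clear, the option name ovn-chassis-mac-mappings and the
MAC values are hypothetical illustrations of the idea, not an existing
OVN knob:

```shell
# Hypothetical: map each physical network name to the chassis MAC that
# should replace the distributed router port MAC on the wire for that
# network.
ovs-vsctl set Open_vSwitch . \
    external-ids:ovn-chassis-mac-mappings="physnet1:aa:bb:cc:dd:ee:01,physnet2:aa:bb:cc:dd:ee:02"
```

ovn-controller could then pick the chassis MAC based on which network's
localnet port the packet is leaving through.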

> 4. LAYER 3 North South (NO NAT)
> -------------------------------
>     DESIGN:
>     ~~~~~~~
>     a. For talking to an external network endpoint, we will need a gateway
>        on the OVN DVR.
>     b. We propose to use the gateway_chassis construct to achieve this.
>     c. The LRP will be attached to gateway chassis(es), and only on the
>         active chassis will we respond to ARP requests for the LRP IP from
>         the underlay network.
>     d. If NATing (keeping state) is not involved, then traffic need not
>         always go via the gateway chassis, i.e. traffic from an OVN chassis
>         to the external network need not go via the gateway chassis.
>     DESIGN IMPLICATIONS:
>     ~~~~~~~~~~~~~~~~~~~
>     a. From endpoint on OVN chassis to endpoint on underlay.
>        i. Like DVR E-W, logical_router ingress and egress pipelines are
>           executed on source chassis.
>     b. From endpoint on underlay TO endpoint on OVN chassis.
>        i. logical_router ingress and egress pipelines are executed on
>           gateway chassis.
>     PACKET FLOW LS ENDPOINT to UNDERLAY ENDPOINT (Figure 5. shows topology):
>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     a. The packet flow in this case is exactly the same as Layer 3 E-W.
>     PACKET FLOW UNDERLAY ENDPOINT to LS ENDPOINT (Figure 5. shows topology and
>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     Figure 6. shows the packet flow):
>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     a. Gateway for endpoints behind DVR will be resident on only
>         gateway-chassis.
>     b. Unicast packets will come to gateway-chassis, with destination MAC
>         being RP2_MAC.
>     c. From now on, it is like L3 E-W flow.
>     CODE CHANGES:
>     ~~~~~~~~~~~~~
>     a. ovn-northd:
>          1. Changes to respond to ARPs for a VLAN backed router port from
>             the uplink only if we are on a gateway chassis.
>          2. Changes to make sure that, in the absence of NAT configuration,
>             OVN-chassis-to-external-network traffic does not go via the
>             gateway chassis.
>     b. ovn-controller:
>          1. Send out GARPs from the active gateway chassis, advertising the
>             VLAN backed router port (the one which has a gateway chassis
>             attached to it).

It may be because it's getting late, but I'm having trouble following 
this :)

Maybe the figures would help to visualize it better?

> 5. LAYER 3 North South (NAT)
> ----------------------------
>     SNAT, DNAT, SNAT_AND_DNAT (without external mac):
>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     a. Our proposal aligns with the following patch series, which is out
>         for review:
>         link <http://patchwork.ozlabs.org/patch/952119/>
>     b. However, our implementation deviates from that proposal in the
>         following areas:
>        i. Usage of lr_in_ip_routing:
>           Our implementation sets the redirect flag after the routing
>           decision is made. This is to ensure that a user-entered static
>           route will not affect the redirect decision (unless it is meant
>           to).
>       ii. Using the tenant VLAN ID for "redirection":
>           Our implementation uses the external network router port's VLAN
>           ID (the router port that has a gateway chassis attached to it)
>           for redirection. This is because the chassisredirect port is NOT
>           on the tenant network, and logically the packet is being
>           forwarded to the chassisredirect port.
>     SNAT_AND_DNAT (with external mac):
>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>     a. OVN's current behavior of not going via the gateway chassis aligns
>         with our design, and it worked fine in our testing.
> This is just an initial proposal. We have identified more areas that
> should be worked on; we will submit patches (and put forth topics/designs
> for discussion) as we make progress.
> Thanks
> Regards,
> Ankur
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
