[ovs-dev] OVN based distributed virtual routing for VLAN backed networks
mmichels at redhat.com
Wed Oct 17 21:50:22 UTC 2018
Thanks for the detailed document! I always appreciate it when things are
planned out in great detail so we know exactly what to expect.
A general comment: there are places below where things like "figure 1"
and "figure OVN bridge deployment" are referenced, but we can't see
them. Is there a link to another document you can share that has these
figures?
Other comments of mine are inline below.
On 10/16/2018 06:43 PM, Ankur Sharma wrote:
> We have done some effort in evaluating usage of OVN for
> Distributed Virtual Routing (DVR) for vlan backed networks.
> We would like to take it forward with the community.
> We understand that some of the work could be overlapping with existing
> patches in review.
> We would appreciate the feedback and would be happy to update our patches
> to avoid known overlaps.
> This email explains the proposal. We will be following it up with patches.
> Each "CODE CHANGES" section summarizes the change that corresponding patch
> would have.
> DISTRIBUTED VIRTUAL ROUTING FOR VLAN BACKED NETWORKS
> 1. OVN Bridge Deployment
> Our design follows the following ovn-bridge deployment model
> (please refer to figure OVN Bridge deployment).
> i. br-int ==> OVN managed bridge.
> br-pif ==> Learning Bridge, where physical NICs will be connected.
> ii. Any packet that should be on physical network, will travel from BR-INT
> to BR-PIF, via patch ports (localnet ports).
> 2. Layer 2
> a. Leverage the localnet logical port type as the patch port between br-int and
> br-pif.
> b. Each VLAN backed logical switch will have a localnet port connected
> to it.
> c. Tagging and untagging of vlan headers happens at localnet port boundary.
> PIPELINE EXECUTION:
> a. Unlike geneve encap based solution, where we execute ingress pipeline on
> source chassis and egress pipeline on destination chassis, for vlan
> backed logical switches, packet will go through ingress pipeline
> on destination chassis as well.
> PACKET FLOW (Figure 1. shows topology and Figure 2. shows the packet flow):
> a. VM sends unicast traffic (destined to VM2_MAC) to br-int.
> b. For br-int, destination mac is not local, hence it will forward it to
> localnet port (by design), which is attached to br-pif. This is
> the stage at which vlan tag is added. Br-pif forwards the packet
> to physical interface.
> c. br-pif on destination chassis sends the received traffic to patch-ports
> on br-int (as unicast or unknown unicast).
> d. br-int does vlan tag check, strips the vlan header and sends
> the packet to ingress pipeline of the corresponding datapath.
> KEY DIFFERENCES AS COMPARED TO OVERLAY:
> a. No encapsulation.
> b. Both ingress and egress pipelines of logical switch are executed on
> both source and destination hypervisor (unlike overlay where ingress
> pipeline is executed on source hypervisor and egress on destination).
> CODE CHANGES:
> a. ovn-nb.ovsschema:
> 1. Add a new column to table Logical_Switch.
> 2. Column name would be "type".
> 3. Values would be either "vlan" or "overlay", with "overlay"
> being default.
> b. ovn-nbctl:
> 1. Add a new cli which sets the "type" of logical-switch.
> ovn-nbctl ls-set-network-type SWITCH TYPE
> c. ovn-northd:
> 1. Add a new enum to ovn_datapath struct, which will indicate
> if logical_switch datapath type is overlay or vlan.
> 2. Populate a new key value pair in southbound database for Datapath
> Bindings of Logical_Switch.
> 3. Key value pair: <logical-switch-type, "vlan" or "overlay">, default
> will be overlay.
I believe everything described in this section is doable in OVN already
without any code changes.
Essentially, you can do the following:
1) On a logical switch, create a logical switch port of type "localnet"
and set its addresses to "unknown".
2) On the localnet port, set options:network_name to a network name.
3) On the localnet port, set tag_request to the VLAN identifier you want
to use.
4) On the hypervisor where ovn-controller runs, create the br-pif bridge.
5) On the hypervisor where ovn-controller runs, in the Open_vSwitch
table's record, set external-ids:ovn-bridge-mappings =
<network_name>:br-pif. "network_name" in this case is the network_name
you set on the localnet port in step 2.
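
As a rough illustration of the five steps above, the Python sketch below only
builds the command lines and prints them; it does not run anything. The switch
name "ls1", port name "ln-ls1", network name "physnet1" and VLAN 100 are
placeholders chosen for the example, and the exact ovn-nbctl/ovs-vsctl
invocations should be checked against the man pages for the version in use.

```python
import shlex


def localnet_setup_commands(switch, port, network_name, vlan_id, bridge="br-pif"):
    """Return the CLI commands for steps 1-5 as argv lists (nothing is executed)."""
    # 1-4094 is the usable 802.1Q VLAN ID range.
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN id must be in the range 1-4094")
    return [
        # Step 1: create a localnet port with addresses set to "unknown".
        ["ovn-nbctl", "lsp-add", switch, port],
        ["ovn-nbctl", "lsp-set-type", port, "localnet"],
        ["ovn-nbctl", "lsp-set-addresses", port, "unknown"],
        # Step 2: name the physical network the port connects to.
        ["ovn-nbctl", "lsp-set-options", port, f"network_name={network_name}"],
        # Step 3: request the VLAN tag for the localnet port.
        ["ovn-nbctl", "set", "Logical_Switch_Port", port, f"tag_request={vlan_id}"],
        # Step 4: create the bridge that holds the physical NICs.
        ["ovs-vsctl", "--may-exist", "add-br", bridge],
        # Step 5: map the network name to that bridge for ovn-controller.
        ["ovs-vsctl", "set", "Open_vSwitch", ".",
         f"external-ids:ovn-bridge-mappings={network_name}:{bridge}"],
    ]


if __name__ == "__main__":
    for argv in localnet_setup_commands("ls1", "ln-ls1", "physnet1", 100):
        print(shlex.join(argv))
```

Steps 1 to 3 are northbound configuration and are done once; steps 4 and 5
have to be repeated on every hypervisor that should reach the physical network.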
With this setup, ovn-controller will automatically create the patch
ports between br-int and br-pif, and will use the VLAN tag from the
localnet port for two purposes:
1) On traffic sent out of br-int over the patch port, the tag will be
added to the packet.
2) On traffic received from the patch port into br-int, the VLAN tag
must match the configured VLAN tag on the localnet port. If it matches,
the tag is stripped.
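
The two tag rules can be written down as a small Python model. This is only a
sketch of the behaviour described above, not ovn-controller code; the Frame
class and both function names are made up for the example, and a non-matching
frame is simply modelled as "not delivered" (None).

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    vlan: Optional[int] = None  # None means the frame is untagged


def to_physical(frame: Frame, localnet_tag: int) -> Frame:
    """br-int -> br-pif over the patch port: add the localnet port's tag."""
    return replace(frame, vlan=localnet_tag)


def from_physical(frame: Frame, localnet_tag: int) -> Optional[Frame]:
    """br-pif -> br-int: accept only a matching tag, then strip it."""
    if frame.vlan != localnet_tag:
        return None  # tag does not match this localnet port
    return replace(frame, vlan=None)


if __name__ == "__main__":
    original = Frame("00:00:00:00:00:01", "00:00:00:00:00:02")
    on_wire = to_physical(original, 100)
    print(on_wire)
    print(from_physical(on_wire, 100) == original)
    print(from_physical(on_wire, 200))
```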
The only aspect of the above I'm not 100% sure about is the logical
switch ingress and egress pipelines being run on both source and
destination hypervisor. But I *think* that's how it works in this case.
> 3. Layer 3 East West
> a. Since the router port is distributed and there is no encapsulation,
> packets with router port mac as source mac cannot go on wire.
> b. We propose replacing router port mac with a chassis specific mac,
> whenever packet goes on wire.
> c. Number of chassis_mac per chassis could be dependent on number of
> physical nics and corresponding bond policy on br-pif.
> As of now, we propose only one chassis_mac per chassis
> (shared by all resident logical routers). However, we are analyzing
> if br-pif's bond policy would require more macs per chassis.
> PIPELINE EXECUTION:
> a. For a DVR E-W flow, both ingress and egress pipelines for logical_router
> will execute on source chassis only.
> PACKET FLOW (Figure 3. shows topology and Figure 4. shows the packet flow):
> a. VM1 sends packet (destined to IP2), to br-int.
> b. On Source hypervisor, packet goes through following pipelines:
> 1. Ingress: logical-switch 1
> 2. Egress: logical-switch 1
> 3. Ingress: logical-router
> 4. Egress: logical-router
> 5. Ingress: logical-switch2
> 6. Egress: logical-switch2
> On wire, packet goes out with destination logical switch's vlan.
> As mentioned in design, source mac (RP2_MAC) would be replaced with
> CHASSIS_MAC and destination mac would be that of VM2.
> c. Packet reaches destination chassis and enters logical-switch2
> pipeline in br-int.
> d. Packet goes through logical-switch2 pipeline (both ingress and egress)
> and gets forwarded to VM2.
> CODE CHANGES:
> a. ovn-sb.ovsschema:
> 1. Add a new column to the table Chassis.
> 2. Column name would be "chassis_macs", type being string and no
> limit on range of values.
> 3. This column will hold a list of chassis unique macs.
> 4. This table will be populated from ovn-controller.
> b. ovn-sbctl:
> 1. CLI to add/delete chassis_macs to/from the south bound database.
> c. ovn-controller:
> 1. Read chassis macs from OVS Open_Vswitch table and populate
> south bound database.
> 2. In table=65, add a new flow at priority 150, which will do following:
> a. Match: source_mac == router_port_mac, metadata ==
> destination_logical_switch, logical_outport = localnet_port
> b. Action: Replace source mac with chassis_mac, add vlan tag.
It sounds like this shares some similarities with this proposed patch:
In the linked patch, the idea is to use a consistent source MAC in order
to play well with physical switches. However, the approach used in the
linked patch is quite different from your proposal here.
I like your proposal because I like the explicit configuration. The one
question I have is, how do you determine which chassis MAC to use if
multiple are specified? One idea might be to use something similar to
the ovn-bridge-mappings. In other words, you map a network_name to a
specific chassis MAC.
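
To make the mapping idea concrete, here is a minimal Python sketch. It assumes
a hypothetical setting whose value uses the same comma-separated
"name:value" style as ovn-bridge-mappings; the format, the function names and
the sample MACs are assumptions for illustration, not something either
proposal defines.

```python
from typing import Dict


def parse_chassis_mac_mappings(value: str) -> Dict[str, str]:
    """Parse "net1:aa:bb:cc:dd:ee:01,net2:..." into {network_name: mac}."""
    mappings: Dict[str, str] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the first colon only: the MAC itself contains colons.
        network_name, sep, mac = entry.partition(":")
        if not sep or not network_name or len(mac.split(":")) != 6:
            raise ValueError(f"malformed mapping: {entry!r}")
        mappings[network_name] = mac.lower()
    return mappings


def rewrite_source_mac(src_mac: str, router_port_mac: str,
                       network_name: str, mappings: Dict[str, str]) -> str:
    """Pick the source MAC to put on the wire for a given localnet network.

    Only frames sourced from the router port MAC are rewritten; anything
    else, or a network without a mapping, is left unchanged.
    """
    if src_mac.lower() == router_port_mac.lower() and network_name in mappings:
        return mappings[network_name]
    return src_mac


if __name__ == "__main__":
    m = parse_chassis_mac_mappings(
        "physnet1:aa:bb:cc:dd:ee:01,physnet2:aa:bb:cc:dd:ee:02")
    print(rewrite_source_mac("00:00:00:00:ff:02", "00:00:00:00:ff:02",
                             "physnet2", m))
```

With one entry per network_name, the "which MAC do I use" question is answered
by the localnet port the packet is leaving through.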
> 4. LAYER 3 North South (NO NAT)
> a. For talking to external network endpoint, we will need a gateway
> on OVN DVR.
> b. We propose to use the gateway_chassis construct to achieve the same.
> c. LRP will be attached to Gateway Chassis(s) and only on the active
> chassis we will respond to ARP request for the LRP IP from the underlay.
> d. If NATing (keeping state) is not involved then traffic need not go
> via the gateway chassis always, i.e traffic from OVN chassis to
> external network need not go via the gateway chassis.
> PIPELINE EXECUTION:
> a. From endpoint on OVN chassis to endpoint on underlay.
> i. Like DVR E-W, logical_router ingress and egress pipelines are
> executed on source chassis.
> b. From endpoint on underlay TO endpoint on OVN chassis.
> i. logical_router ingress and egress pipelines are executed on
> gateway chassis.
> PACKET FLOW LS ENDPOINT to UNDERLAY ENDPOINT (Figure 5. shows topology):
> a. Packet flow in this case is exactly same as Layer 3 E-W.
> PACKET FLOW UNDERLAY ENDPOINT to LS ENDPOINT (Figure 5. shows topology and
> Figure 6. shows the packet flow):
> a. Gateway for endpoints behind DVR will be resident on only
> the gateway-chassis.
> b. Unicast packets will come to gateway-chassis, with destination MAC
> being RP2_MAC.
> c. From now on, it is like L3 E-W flow.
> CODE CHANGES:
> a. ovn-northd:
> 1. Changes to respond to vlan backed router port ARP from uplink,
> only if it is on a gateway chassis.
> 2. Changes to make sure that in the absence of NAT configuration,
> OVN_CHASSIS to external network traffic does not go via the gateway
> chassis.
> b. ovn-controller:
> 1. Send out garps, advertising the vlan backed router port's
> (which has gateway chassis attached to it) from the
> active gateway chassis.
It may be because it's getting late, but I'm having trouble following
this section.
Maybe the figures would help to visualize it better?
> 5. LAYER 3 North South (NAT)
> SNAT, DNAT, SNAT_AND_DNAT (without external mac):
> a. Our proposal aligns with following patch series which is out for review:
> link <http://patchwork.ozlabs.org/patch/952119/>
> b. However, our implementation deviates from proposal in following areas:
> i. Usage of lr_in_ip_routing:
> Our implementation sets the redirect flag after routing decision is taken.
> This is to ensure that a user entered static route will not affect the
> redirect decision (unless it is meant to).
> ii. Using Tenant VLAN ID for "redirection":
> Our implementation uses external network router port's
> (router port that has gateway chassis attached to it) vlan id
> for redirection. This is because chassisredirect port is NOT on
> tenant network and logically packet is being forwarded to
> chassisredirect port.
> SNAT_AND_DNAT (with external mac):
> a. Current OVN implementation of not going via gateway chassis aligns with
> our design and it worked fine.
> This is just an initial proposal. We have identified more areas that
> should be worked upon, we will submit patches (and put forth topics/design for discussion),
> as we make progress.