[ovs-dev] VXLAN Patches

Kyle Mestery kmestery at cisco.com
Mon Feb 6 22:55:49 UTC 2012


On Feb 6, 2012, at 3:19 PM, Jesse Gross wrote:
> On Sat, Feb 4, 2012 at 6:52 PM, Kyle Mestery <kmestery at cisco.com> wrote:
>> On Feb 4, 2012, at 1:14 PM, Joseph Glanville wrote:
>>> Hi
>>> 
>>> I ported the aforementioned patches to 1.4 but haven't gotten much
>>> further than that yet.
>>> Would be interested in lending a hand soonish. (Bit snowed under atm)
>>> 
>>> Joseph.
>>> 
>> Yes, I also did the same (see the github reference below). At this point, I was waiting for Jesse to send out his initial thoughts on the initial design before getting started. Will be happy to work with you on this Joseph, lets wait and see Jesse's thoughts before diving in deeper.
> 
> Great, thanks guys.  Obviously this is something that we've wanted to
> do for a while now but haven't gotten to it yet (and as an aside this is
> the major reason why we haven't pushed for tunneling support in
> upstream Linux yet as we wanted to get the model right first).
> 
> I spent some time thinking about how to do this and while the plan
> isn't fully fleshed out yet here's the rough idea:
> 
> When we first implemented tunneling support, all configuration went on
> the tunnel port itself including IP addresses and the key (or VNI or
> whatever you want to call it).  For a given packet, the tunnel
> properties were chosen by outputting to a particular port, and on
> receive the port was determined by a hash lookup before the flow table.  We
> later added an extension so that the key could be set via an action
> and included in the main flow table.  The goal here would be to do the
> same thing for the remaining tunnel fields (src/dst IP, ToS, TTL).
> The immediate impact is that it would allow us to drop the separate
> tunnel hash table and ToS/TTL inheritance mechanisms from the kernel
> and implement them in userspace.  It would also enable learning from
> multicast like vxlan does since the source information is not lost in
> the tunnel lookup like it currently is.
> 
Being able to set all this information from userspace via extensions is good. It moves configuration of tunnel ports back into the protocol (via extensions) instead of punting the problem outside of it. Also, does this mean we would still end up with a single tunnel per remote host in the case of VXLAN?

> I think when you go down this path, you essentially end up with a
> tunnel port (or perhaps some other new construct) that indicates only
> the encapsulation format.  It registers with the IP stack to be the
> packet handler and implements protocol-specific
> encapsulation/fragmentation/etc.  On receive it supplies the outer
> header values along with the packet, so you get something like this as
> the flow to look up:
> in_port(tunnel port), ip(struct ovs_key_ipv4), vxlan(vni),
> encap(struct ovs_key_ethernet), ...
> 
This looks great!

> In the kernel we can arrange the new fields at the end of the struct
> so there is no performance cost for non-tunneled packets and when
> talking to userspace the new fields won't be included at all if not
> used.
> 
> On transmit, we would have an action for setting those fields,
> possibly plus a few of the current configuration options, e.g.:
> set_tunnel(struct ovs_key_ipv4, vni, csum...)
> These get used when the packet is encapsulated after being output to a
> tunnel port.
> 
> Once we do this, there is a less direct mapping between kernel vports
> and those in userspace.  I'd like to maintain the current port-based
> mechanism in addition to vxlan learning, so that basically means we need
> to look at received packets and assign them to an input port in
> userspace (and handle stats, etc.).  On output we would need to add
> the appropriate set_tunnel action instead of a direct output.  The
> final component, which I don't have a good plan for at the moment, is
> how to deal with ports on different datapaths since the tunnel packets
> can only arrive on a single datapath but they might want to be bridged
> to a physical port on another datapath.
> 
So, this somewhat ties into my earlier question about the number of tunnel ports. This design assumes a single tunnel port per host, but to fit in with the existing design, we'd need a single tunnel port per datapath. Essentially, it would be good to have a single tunnel port that can service multiple datapaths on the host, right? I need to think about that a bit.

> Obviously, this is all very rough and I'll keep working on it but
> hopefully it's useful to start thinking about.
> 
> Thoughts?
