[ovs-dev] [RFC v4] Add TCP encap_rcv hook (repost)

Wed Apr 25 17:17:25 UTC 2012

On Wed, Apr 25, 2012 at 1:39 AM, Simon Horman <horms at verge.net.au> wrote:
> On Tue, Apr 24, 2012 at 04:02:41PM +0000, Kyle Mestery (kmestery) wrote:
>> On Apr 23, 2012, at 9:25 PM, Simon Horman wrote:
>> > On Mon, Apr 23, 2012 at 03:59:24PM -0700, Jesse Gross wrote:
>> >> On Mon, Apr 23, 2012 at 3:32 PM, Simon Horman <horms at verge.net.au> wrote:
>> >>> On Mon, Apr 23, 2012 at 02:38:07PM -0700, Jesse Gross wrote:
>> >>>> On Mon, Apr 23, 2012 at 2:08 PM, David Miller <davem at davemloft.net> wrote:
>> >>>>> From: Jesse Gross <jesse at nicira.com>
>> >>>>> Date: Mon, 23 Apr 2012 13:53:42 -0700
>> >>>>>
>> >>>>>> On Mon, Apr 23, 2012 at 1:13 PM, David Miller <davem at davemloft.net> wrote:
>> >>>>>>> From: Jesse Gross <jesse at nicira.com>
>> >>>>>>> Date: Mon, 23 Apr 2012 13:08:49 -0700
>> >>>>>>>
>> >>>>>>>> Assuming that the TCP stack generates large TSO frames on transmit
>> >>>>>>>> (which could be the local stack; something sent by a VM; or packets
>> >>>>>>>> received, coalesced by GRO and then encapsulated by STT) then you can
>> >>>>>>>> just prepend the STT header (possibly slightly adjusting things like
>> >>>>>>>> requested MSS, number of segments, etc. slightly).  After that it's
>> >>>>>>>> possible to just output the resulting frame through the IP stack like
>> >>>>>>>> all tunnels do today.
>> >>>>>>>
>> >>>>>>> Which seems to potentially suggest a stronger intergration of the STT
>> >>>>>>> tunnel transmit path into our IP stack rather than the approach Simon
>> >>>>>>> is taking
>> >>>>>>
>> >>>>>> Did you have something in mind?
>> >>>>>
>> >>>>> A normal bonafide tunnel netdevice driver like GRE instead of the
>> >>>>> openvswitch approach Simon is using.
>> >>>>
>> >>>> Ahh, yes, that I agree with.  Independent of this, there's work being
>> >>>> done to make it so that OVS can use the normal in-tree tunneling code
>> >>>> and not need its own.  Once that's done I expect that STT will follow
>> >>>> the same model.
>> >>>
>> >>> Hi Jesse,
>> >>>
>> >>> I am wondering how firm the plans to on allowing OVS to use in-tree tunnel
>> >>> code are. I'm happy to move my efforts over to an in-tree STT implementation
>> >>> but ultimately I would like to get STT running in conjunction with OVS.
>> >>
>> >> I would say that it's a firm goal but the implementation probably
>> >> still has a ways to go.  Kyle Mestery (CC'ed) has volunteered to work
>> >> on this in support of adding VXLAN, which needs some additional
>> >> flexibility that this approach would also provide.  You might want to
>> >> talk to him to see if there are ways that you guys can work together
>> >> on it if you are interested.  Having better integration with upstream
>> >> tunneling is definitely a step that OVS needs to make and sooner would
>> >> be better than later.
>> >
>> > Hi Jesse, Hi Kyle,
>> >
>> > that sounds like an excellent plan.
>> >
>> > Kyle, do you have any thoughts on how we might best work together on this?
>> > Perhaps there are some patches floating around that I could take a look at?
>> >
>>
>> Hi Simon:
>>
>> The VXLAN work has been slow going for me at this point. What I have works, but is far from complete. It's available here:
>>
>> https://github.com/mestery/ovs-vxlan/tree/vxlan
>>
>> This is based on a fairly recent version of OVS. I'm currently working to allow tunnels to be flow-based rather than port-based, as they currently exist.
>> As Jesse may have mentioned, doing this allows us to move most tunnel state into user space. The outer header can now be part of the flow lookup and can
>> be passed to user space, so things like multicast learning for VXLAN become possible.
>>
>> With regards to working together, ping me off-list and we can work something out, I'm very much in favor of this!
>
> Hi Kyle,
>
> the component that is of most interest to me is enabling OVS to use in-tree
> tunnelling code - as it seems that makes most sense for an implementation
> of STT. I have taken a brief look over your vxlan work and it isn't clear
> to me if it is moving towards being an in-tree implementation.  Moreover,
> I'm a rather unclear on what changes need to be made to OVS in order for
> in-tree tunneling to be used.
>
> My recollection is that OVS did make use of in-tree tunnelling code
> but this was removed in favour of the current implementation for various
> reasons (performance being one IIRC). I gather that revisiting in-tree
> tunnelling won't revisit the previous set of problems. But I'm unclear how.
>
> Jesse, is it possible for you to describe that in a little detail
> or point me to some information?

This was what I had originally written a while back, although it's
more about OVS internally and less about how to connect to the in-tree
code:
http://openvswitch.org/pipermail/dev/2012-February/014779.html

In order to flexibly implement support for current and future tunnel
protocols OVS needs to be able to get/set information about the outer
tunnel header when processing the inner packet.  At the very least
this is src/dst IP addresses and the key/ID/VNI/etc.  In the upstream
tunnel implementations those are implicitly encoded in the device that
sends or receives the packet.  However, this has a two problems:
number of devices and ability to handle unknown values.  We addressed
part of this problem by allowing the tunnel ID to be set and matched
through the OVS flow table and an action.  In order to do this with
the in-tree tunneling code, we obviously need a way of passing this
information around since it would currently get lost as we pass
through the Linux device layer.

The plan to deal with that is to add a function to the in-tree
tunneling code that allows a skb to be encapsulated with specific
parameters and conversely a hook to receive decapsulated packets along
with header info.  This would make all of the kernel tunneling code
common, while still giving OVS userspace the ability to implement
essentially any type of tunneling policy.  In many ways, this is very
similar to how vlans look in OVS today.

While it would be possible to implement the hook to use the in-tree
tunnel code today without a lot of changes, we already know that we
want to move away from port-based model in the OVS kernel module
towards the flow model.  As we push this upstream the userspace/kernel
API should be the correct one, so that's why these two things are tied
together.