[ovs-dev] [PATCH 2/2] [RFC] tunnelling: stt: Prototype Implementation

Simon Horman horms at verge.net.au
Fri Apr 20 02:39:17 UTC 2012


On Tue, Apr 17, 2012 at 07:58:33PM -0700, Jesse Gross wrote:
> On Thu, Apr 12, 2012 at 12:36 AM, Simon Horman <horms at verge.net.au> wrote:
> > This is a not yet well exercised implementation of STT intended for review,
> > I am sure there are numerous areas that need improvement.
> >
> > In particular:
> > - The transmit path's generation of partial checksums needs to be tested
> > - The VLAN stripping code needs to be excercised
> > - The code needs to be exercised in the presence of HW checksumming
> > - In general, the code has been exercised by running Open vSwtich in
> >  KVM guests on the same host. Testing between physucal hosts is needed.
> >
> > This implementation is based on the CAPWAP implementation and in particular
> > includes defragmentation code almost identical to CAPWAP. It seems to me
> > that while fragmentation can be handled by GSO/TSO, defragmentation code is
> > needed in STT in the case where LRO/GRO doesn't reassemble an entire STT
> > frame for some reason.
> >
> > If the defragmentation code, which is of non-trivial length, remains more
> > or less in its present state then there is some scope for consolidation
> > with CAPWAP. Other code that may possibly be consolidated with CAPWAP has
> > been marked accordingly.
> >
> > This code depends on a encap_rcv hook being added to the Linux Kernel's TCP
> > stack. A patch to add such a hook will be posted separately. Ultimately
> > this change or some alternative will need to be applied to the mainline
> > Linux kernel's TCP stack if STT is to be widely deployed. Motivating this
> > change to the TCP stack is part of the purpose of this prototype STT
> > implementation.
> >
> > The configuration of STT is analogous to that of other tunneling
> > protocols such as GRE which are supported by Open vSwtich.
> >
> > e.g.
> >
> > ovs-vsctl add bridge project0 ports @newport \
> >        -- --id=@newport create port name=stt0 interfaces=[@newinterface] \
> >        -- --id=@newinterface create interface name=stt0 type=stt options="remote_ip=10.0.99.192,key=64"
> >
> > Signed-off-by: Simon Horman <horms at verge.net.au>
> 
> Sorry about being slow on this, I've been busy lately.  I have a few
> high level comments:
> 
>  * The hardware offloading parts don't look quite right to me.  I
> think there's confusion about what is happening between the inner and
> outer packets.  The offloads that are set when we are given a packet
> for encapsulation should essentially just be stored in the STT header
> (since this is what we will pass to the NIC if we decapsulate and
> forward).  Once we do that and add STT, we can set up the resulting
> packet for checksum offload/TSO based only on that packet and the
> device that it will be forwarded out.
>  * Conversely on receive we need to use the values stored in the STT
> header to recreate the skb offloading metadata.

Thanks, I will work on this.

>  * I think we could use the OVS flow hash to compute the src port and
> save a bunch of code.

Sure, I will look into that. The hash calculation code is indeed
rather verbose.

>  * Generally vlan will be stored out of band in skb->vlan_tci, so if
> we make it so the generic tunneling code doesn't insert that into the
> packet then we won't have to strip it out here.

Ok, I will look into that. As it stands It seems that it is
being inserted.

>  * It should always be possible to encapsulate any Ethernet frame.
> Things that don't have explicit support like QinQ can still be sent
> they just won't get the benefit of offloading (similar to a NIC that
> doesn't support a particular header format).

My reading of the spec was that it required the inner frame to
be untagged ethernet. I'll correct my code as per your description above.

>  * There are a lot of atomic ops and other expensive things like division...

Is there anywhere in particular you are noticing theses?

> I would recommend testing with netperf or some other tool that
> generates traffic that uses offloading since offloads are such an
> important motivator for the protocol.  I think that it would help sort
> out some of the issues in that area, which in would in turn probably
> illuminate some other parts of the protocol.

Thanks, good idea.



More information about the dev mailing list