[ovs-dev] [PATCH 2/2] [RFC] tunnelling: stt: Prototype Implementation

Wed Apr 18 02:58:33 UTC 2012

On Thu, Apr 12, 2012 at 12:36 AM, Simon Horman <horms at verge.net.au> wrote:
> This is a not yet well exercised implementation of STT intended for review,
> I am sure there are numerous areas that need improvement.
>
> In particular:
> - The transmit path's generation of partial checksums needs to be tested
> - The VLAN stripping code needs to be excercised
> - The code needs to be exercised in the presence of HW checksumming
> - In general, the code has been exercised by running Open vSwtich in
>  KVM guests on the same host. Testing between physucal hosts is needed.
>
> This implementation is based on the CAPWAP implementation and in particular
> includes defragmentation code almost identical to CAPWAP. It seems to me
> that while fragmentation can be handled by GSO/TSO, defragmentation code is
> needed in STT in the case where LRO/GRO doesn't reassemble an entire STT
> frame for some reason.
>
> If the defragmentation code, which is of non-trivial length, remains more
> or less in its present state then there is some scope for consolidation
> with CAPWAP. Other code that may possibly be consolidated with CAPWAP has
> been marked accordingly.
>
> This code depends on a encap_rcv hook being added to the Linux Kernel's TCP
> stack. A patch to add such a hook will be posted separately. Ultimately
> this change or some alternative will need to be applied to the mainline
> Linux kernel's TCP stack if STT is to be widely deployed. Motivating this
> change to the TCP stack is part of the purpose of this prototype STT
> implementation.
>
> The configuration of STT is analogous to that of other tunneling
> protocols such as GRE which are supported by Open vSwtich.
>
> e.g.
>
> ovs-vsctl add bridge project0 ports @newport \
>        -- --id=@newport create port name=stt0 interfaces=[@newinterface] \
>        -- --id=@newinterface create interface name=stt0 type=stt options="remote_ip=10.0.99.192,key=64"
>
> Signed-off-by: Simon Horman <horms at verge.net.au>

Sorry about being slow on this, I've been busy lately.  I have a few
high level comments:

 * The hardware offloading parts don't look quite right to me.  I
think there's confusion about what is happening between the inner and
outer packets.  The offloads that are set when we are given a packet
for encapsulation should essentially just be stored in the STT header
(since this is what we will pass to the NIC if we decapsulate and
forward).  Once we do that and add STT, we can set up the resulting
packet for checksum offload/TSO based only on that packet and the
device that it will be forwarded out.
 * Conversely on receive we need to use the values stored in the STT
header to recreate the skb offloading metadata.
 * I think we could use the OVS flow hash to compute the src port and
save a bunch of code.
 * Generally vlan will be stored out of band in skb->vlan_tci, so if
we make it so the generic tunneling code doesn't insert that into the
packet then we won't have to strip it out here.
 * It should always be possible to encapsulate any Ethernet frame.
Things that don't have explicit support like QinQ can still be sent
they just won't get the benefit of offloading (similar to a NIC that
doesn't support a particular header format).
 * There are a lot of atomic ops and other expensive things like division...

I would recommend testing with netperf or some other tool that
generates traffic that uses offloading since offloads are such an
important motivator for the protocol.  I think that it would help sort
out some of the issues in that area, which in would in turn probably
illuminate some other parts of the protocol.