[ovs-dev] [PATCH] datapath: Enable offloading on internal devices.

Jesse Gross jesse at nicira.com
Fri May 7 21:10:30 UTC 2010


On Fri, May 7, 2010 at 9:59 AM, Ben Pfaff <blp at nicira.com> wrote:

> On Thu, May 06, 2010 at 06:34:26PM -0700, Jesse Gross wrote:
> > Enables checksum offloading, scatter/gather, and TSO on internal
> > devices.  While these optimizations were not previously enabled on
> > internal ports, we could already receive these types of packets from
> > Xen guests.  This has obvious performance benefits when these
> > packets can be passed directly to hardware.
> >
> > There is also a more subtle benefit for GRE on Xen.  GRE packets
> > pass through OVS twice - once before encapsulation and once after
> > encapsulation, moving through an internal device in the process.
> > If it is an SG packet (as is common on Xen), a copy was necessary
> > to linearize it for the internal device.  However, Xen uses the
> > memory allocator to track packets, so when the original packet is
> > freed after the copy, netback notifies the guest that the packet
> > has been sent, despite the fact that it is actually sitting in the
> > transmit queue.  The guest then sends packets as fast as the CPU
> > can handle, overflowing the transmit queue.  By enabling SG on
> > the internal device, we avoid the copy and keep the accounting
> > correct.
>
> This is excellent detective work.  How did you figure all of this out?
>

Through much pain, suffering, and reading of source code...
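
For reference, the change described in the quoted commit message essentially
comes down to advertising the right net_device feature flags on the internal
device.  A rough sketch of that idea follows (not the actual patch; the
function name is illustrative):

#include <linux/netdevice.h>
#include <linux/etherdevice.h>

/* Sketch only: advertise checksum offload, scatter/gather, and TSO so
 * the stack can hand the internal device non-linear, partially
 * checksummed skbs instead of linearizing and checksumming first. */
static void internal_dev_setup_sketch(struct net_device *netdev)
{
        ether_setup(netdev);
        netdev->features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_TSO;
}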


>
> > In certain circumstances this patch can decrease performance for
> > TCP.  TCP has its own mechanism for tracking in-flight packets
> > and therefore does not benefit from the corrected socket accounting.
> > However, certain NICs do not handle SG well when it is not being
> > used for TSO (these packets can no longer be handled by TSO after
> > GRE encapsulation).  These NICs presumably advertise SG, even
> > though they handle it poorly on its own, because TSO requires SG.
>
> This performance problem seems bizarre to me.  If these NICs don't
> handle scatter-gather well, to the extent that linearizing the packet in
> software yields better performance, why don't their NIC drivers
> linearize the packets?  Is it possible that we should try tweaking the
> NIC drivers a bit to see if we can get the performance back, and then
> pass those tweaks to the upstream maintainers of the drivers?
>

I agree that this should be handled in software in the drivers; my guess is
that this is just not a very common use case for most people.  The problem
is that there are many drivers, each of which supports many cards with
potentially different capabilities, making it seem like a losing battle.  We
could deal with it as it comes up or just tell people to buy Intel NICs.
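
If a driver did want to cover for hardware that copes badly with non-TSO SG,
one hypothetical approach (not taken from any particular driver) is to
linearize in the transmit path only when GSO is not in play:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical workaround sketch: fall back to a software copy only for
 * non-GSO scatter/gather skbs, rather than clearing NETIF_F_SG. */
static netdev_tx_t foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
        if (skb_is_nonlinear(skb) && !skb_is_gso(skb) && skb_linearize(skb)) {
                /* Out of memory while linearizing; drop the packet. */
                dev_kfree_skb_any(skb);
                return NETDEV_TX_OK;
        }

        /* ... hand the now-linear skb to the hardware as usual ... */
        return NETDEV_TX_OK;
}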


>
> The only thing I noticed in the patch is that it seems a little odd to
> make vport_receive() require its caller to call compute_ip_summed().
> Each of its callers calls compute_ip_summed() just before
> vport_receive(), so it might be sensible to have vport_receive() call it
> itself (although then it would need to take the xmit argument itself,
> which might be odder than the choice you made).
>

I would normally be inclined to do it that way too.  However, I was thinking
ahead to the "patch" vport, for which I want to maintain the current state
without calling compute_ip_summed().
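
For context, the calling convention being discussed looks roughly like the
sketch below (signatures approximated from the thread rather than copied from
the tree); keeping compute_ip_summed() in the callers is what lets a future
"patch" vport skip it:

/* Sketch of a vport caller: each caller records the skb's checksum
 * state itself before handing the packet to the datapath.  The second
 * argument is assumed to distinguish receive (false) from transmit. */
static void netdev_port_receive_sketch(struct vport *vport,
                                       struct sk_buff *skb)
{
        compute_ip_summed(skb, false);
        vport_receive(vport, skb);
}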

I pushed this out.

