[ovs-dev] [PATCH] datapath: Enable offloading on internal devices.

Ben Pfaff blp at nicira.com
Fri May 7 16:59:50 UTC 2010


On Thu, May 06, 2010 at 06:34:26PM -0700, Jesse Gross wrote:
> Enables checksum offloading, scatter/gather, and TSO on internal
> devices.  While these optimizations were not previously enabled on
> internal ports, we already could receive these types of packets from
> Xen guests.  This has the obvious performance benefits when these
> packets can be passed directly to hardware.
> 
> There is also a more subtle benefit for GRE on Xen.  GRE packets
> pass through OVS twice - once before encapsulation and once after
> encapsulation, moving through an internal device in the process.
> If it is a SG packet (as is common on Xen), a copy was necessary
> to linearize for the internal device.  However, Xen uses the
> memory allocator to track packets, so when the original packet is
> freed after the copy, netback notifies the guest that the packet
> has been sent, despite the fact that it is actually sitting in the
> transmit queue.  The guest then sends packets as fast as the CPU
> can handle, overflowing the transmit queue.  By enabling SG on
> the internal device, we avoid the copy and keep the accounting
> correct.

This is excellent detective work.  How did you figure all of this out?

> In certain circumstances this patch can decrease performance for
> TCP.  TCP has its own mechanism for tracking in-flight packets
> and therefore does not benefit from the corrected socket accounting.
> However, certain NICs do not like SG when it is not being used for
> TSO (these packets can no longer be handled by TSO after GRE
> encapsulation).  These NICs presumably enable SG even though they
> can't handle it well because TSO requires SG.

This performance problem seems bizarre to me.  If these NICs don't
handle scatter-gather well, to the extent that linearizing the packet in
software yields better performance, why don't their NIC drivers
linearize the packets?  Is it possible that we should try tweaking the
NIC drivers a bit to see if we can get the performance back, and then
pass those tweaks to the upstream maintainers of the drivers?
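
To make the kind of driver tweak I have in mind concrete, here is a
minimal user-space sketch, not real driver code.  The mock_* names and
the struct are invented for illustration.  An actual driver would
presumably use the kernel's skb_linearize() and skb_is_gso() on a real
struct sk_buff, and whether linearizing actually wins on a given NIC is
something that would have to be measured.

```c
#include <stdbool.h>

/* Mock packet: a stand-in for struct sk_buff, not the kernel type. */
struct mock_skb {
    unsigned int nr_frags;  /* number of scatter-gather fragments */
    bool is_gso;            /* true if the NIC will segment this packet (TSO) */
};

/* Mock of linearizing: pretend all fragments were copied into one buffer.
 * Returns 0 on success. */
static int mock_linearize(struct mock_skb *skb)
{
    skb->nr_frags = 0;
    return 0;
}

/* Hypothetical driver transmit hook.
 * Returns 0 if the packet was queued, -1 if it had to be dropped. */
static int mock_driver_xmit(struct mock_skb *skb)
{
    /* Only pay for the copy when SG is in use without TSO, which is
     * the case the patch description says some NICs handle poorly. */
    if (skb->nr_frags > 0 && !skb->is_gso) {
        if (mock_linearize(skb) != 0)
            return -1;
    }

    /* ...the packet would be handed to the hardware here... */
    return 0;
}
```

TSO packets keep their fragments, so only the SG-without-TSO path pays
for the copy.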

The only thing I noticed in the patch is that it seems a little odd to
make vport_receive() require its caller to call compute_ip_summed().
Each of its callers calls compute_ip_summed() just before
vport_receive(), so it might be sensible to have vport_receive() call it
itself (although then it would need to take the xmit argument itself,
which might be odder than the choice you made).
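
For comparison, the alternative would look roughly like the sketch
below.  It uses user-space stand-ins: the mock_* types and function
bodies are invented for illustration, and the real signatures of
vport_receive() and compute_ip_summed() in the patch may differ.  The
only point is that the xmit flag has to be threaded through the extra
parameter.

```c
#include <stdbool.h>

/* Stand-in types for illustration; not the kernel or OVS definitions. */
struct mock_skb {
    bool csum_computed;  /* set once checksum state has been normalized */
    bool from_xmit;      /* records the xmit hint that was passed down */
};

struct mock_vport {
    unsigned long rx_packets;
};

/* Mock of compute_ip_summed(); the real body is not reproduced here. */
static void mock_compute_ip_summed(struct mock_skb *skb, bool xmit)
{
    skb->csum_computed = true;
    skb->from_xmit = xmit;
}

/* Alternative: the receive function takes xmit and makes the call itself,
 * so callers no longer need to remember to do it first. */
static void mock_vport_receive(struct mock_vport *vport,
                               struct mock_skb *skb, bool xmit)
{
    mock_compute_ip_summed(skb, xmit);
    vport->rx_packets++;
    /* ...the packet would be handed to the datapath here... */
}
```

Each caller then shrinks to a single call, at the cost of a receive
function that knows about a transmit-side flag.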



