[ovs-discuss] Intra-Bridge Performance issue

Jesse Gross jesse at nicira.com
Thu Aug 30 01:27:15 UTC 2012


On Wed, Aug 29, 2012 at 6:19 PM, Michael A. Collins
<mike.a.collins at ark-net.org> wrote:
> I have several xensource servers running lots of PV-On-HVM Windows DomUs and
> I have a pretty weird problem.  Here are my details:
> Kernel: 3.5.0-rc2
> OpenvSwitch module: Built-in from upstream (aka did not install kernel
> module when building OpenvSwitch)
> OpenvSwitch userland tools: version 1.4.0
>
> I have a single Bridge with two fake-bridges.
> I have configured a LACP bond with 4 physical nics that connects to a
> PortChannel on a 6509.
> I set up the native vlan on the 6509 to be 102.
> I have configured the bond with vlan_mode=native-untagged and tag=102.
> All my vms are added to the fake-bridge associated with vlan 102.
> I have four servers configured this way all connected to the same 6509,
> ServerA, ServerB, ServerC, and ServerD.
>
> I have no problem sending and receiving traffic to any VM on any of the four
> servers, in other words all my VMs get IPs from a DHCP Server and can icmp
> each other.
> I have decent performance moving files, SMB2, from VMs that are on different
> servers, aka VM1 on ServerA copies a file to VM2 on ServerB.
>
> I have horrible performance when moving files, SMB2, from VMs that are on
> the same server, aka VM1 on ServerA copies a file to VM2 on ServerA.  I am
> not an expert on how OpenvSwitch works, and I can't discount that my own
> stupidity may be behind this, but I am at a loss for what to do to
> troubleshoot this.
>
> I have captured packets of a reproducible type or session of network
> traffic, aka Logging into a VM with the same account which has a roaming
> profile configured.  This pulls down about 50MB of data and when logging
> into a VM that is on a different server than the file server that hosts the
> profile it takes about 11 seconds.  When logging into a VM that is on the
> same server as the file server it takes well over 20 minutes and never
> really succeeds.
>
> The difference I can see between the two packet captures is the number of
> retransmits, Duplicate ACKs and Out-of-Order packets, which is insane when
> going from VM to VM on the same server.
>
> It seems to me after looking at the traffic in the capture that everything
> is trucking along until we get to a large file, say 5MB, then it just falls
> apart.  On the VM that is on a different server, I can get the file moved
> across in only 458 packets, with only 36 TCP ACKed lost segment packets
> flagged.
> On the VM that is on the same server, I can't get the file moved across even
> after 6800+ packets, with 5500 Dup ACKs, Out-of-Order or retransmission
> packets flagged.
>
> There has to be something going on that could explain this, but I am at a
> loss!  Any help would be greatly appreciated!!

The fact that it only happens when you start to see large packets
likely means that it is related to TCP segmentation offload.  I know
that some versions of the Windows PV drivers on Xen had bugs in this
area so I would look to see if there is a newer version that you can
upgrade to.  I don't know which versions are affected though.
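
One way to test the offload theory, as a rough sketch rather than a
confirmed fix: turn off segmentation offload on the backend interface of
one affected VM in dom0 and repeat the same-server copy.  The Python below
only builds the `ethtool -K` command line.  The interface name `vif1.0`
and the helper names `offload_command` and `apply` are hypothetical, and
whether offload has to be disabled on the vif, inside the guest, or both
is an assumption to check against your setup.

```python
import subprocess


def offload_command(iface, features=("tso", "gso"), state="off"):
    """Build the argv for `ethtool -K <iface> <feature> <state> ...`."""
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    argv = ["ethtool", "-K", iface]
    for feature in features:
        argv += [feature, state]
    return argv


def apply(argv):
    """Run the command; needs root and ethtool installed in dom0."""
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    # "vif1.0" is a placeholder; substitute the VM's real backend interface.
    cmd = offload_command("vif1.0")
    print(" ".join(cmd))
    # Call apply(cmd) once the printed command looks right for your host.
```

If the same-server copy recovers with offload disabled, that points at
TSO handling in the PV driver.  `ethtool -k vif1.0` (lowercase k) shows
the current offload settings.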


