[ovs-discuss] Intra-Bridge Performance issue
Mike Collins
mike.a.collins at ark-net.org
Thu Aug 30 02:43:35 UTC 2012
On Aug 29, 2012, at 9:27 PM, Jesse Gross <jesse at nicira.com> wrote:
> On Wed, Aug 29, 2012 at 6:19 PM, Michael A. Collins
> <mike.a.collins at ark-net.org> wrote:
>> I have several xensource servers running lots of PV-On-HVM Windows DomUs
>> and I have a pretty weird problem. Here are my details:
>> Kernel: 3.5.0-rc2
>> OpenvSwitch module: Built-in from upstream (i.e. did not install the
>> kernel module when building OpenvSwitch)
>> OpenvSwitch userland tools: version 1.4.0
>>
>> I have a single Bridge with two fake-bridges.
>> I have configured a LACP bond with 4 physical NICs that connects to a
>> PortChannel on a 6509.
>> I set up the native VLAN on the 6509 to be 102.
>> I have configured the bond with vlan_mode=native-untagged and tag=102.
>> All my VMs are added to the fake-bridge associated with VLAN 102.
>> I have four servers configured this way, all connected to the same 6509:
>> ServerA, ServerB, ServerC, and ServerD.
>>
>> I have no problem sending and receiving traffic to any VM on any of the
>> four servers; in other words, all my VMs get IPs from a DHCP server and
>> can ping each other.
>> I have decent performance moving files over SMB2 between VMs that are on
>> different servers, e.g. VM1 on ServerA copies a file to VM2 on ServerB.
>>
>> I have horrible performance when moving files over SMB2 between VMs that
>> are on the same server, e.g. VM1 on ServerA copies a file to VM2 on
>> ServerA. I am not an expert on how OpenvSwitch works, and I can't
>> discount that my own stupidity may be behind this, but I am at a loss
>> for what to do to troubleshoot this.
>>
>> I have captured packets for a reproducible network session: logging
>> into a VM with the same account, which has a roaming profile configured.
>> This pulls down about 50MB of data. When logging into a VM that is on a
>> different server than the file server that hosts the profile, it takes
>> about 11 seconds. When logging into a VM that is on the same server as
>> the file server, it takes well over 20 minutes and never really
>> succeeds.
>>
>> What differs between the two packet captures is the number of
>> retransmits, Duplicate ACKs, and Out-of-Order packets, which is insane
>> when going from VM to VM on the same server.
>>
>> Looking at the traffic in the capture, it seems that everything is
>> trucking along until we get to a large file, say 5MB, and then it just
>> falls apart. On the VM that is on a different server, I can get the file
>> moved across in only 458 packets, with only 36 TCP ACKed lost segment
>> packets flagged.
>> On the VM that is on the same server, I can't get the file moved across
>> even after 6800+ packets, with 5500 Dup ACK, Out-of-Order, or
>> retransmission packets flagged.
>>
>> There has to be something going on that could explain this, but I am at
>> a loss! Any help would be greatly appreciated!!
>
> The fact that it only happens when you start to see large packets
> likely means that it is related to TCP segmentation offload. I know
> that some versions of the Windows PV drivers on Xen had bugs in this
> area so I would look to see if there is a newer version that you can
> upgrade to. I don't know which versions are affected though.
>
Wouldn't TCP segmentation offload affect all the traffic, not just the
traffic that stays on the bridge? I will go grab the newest version of the
PV drivers and let you know.
Mike
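
To check whether segmentation offload is in play before swapping drivers,
one option is to look at the offload settings that `ethtool -k` reports for
the VM's backend interface in dom0. The sketch below is only an
illustration, not something taken from this thread: the interface name
`vif1.0` and the sample output are hypothetical, and the helper names
`parse_offloads` and `read_offloads` are made up for this example. It
assumes `ethtool` is installed in dom0 and prints its usual
`feature: on|off` lines.

```python
import subprocess


def parse_offloads(ethtool_output):
    """Turn the text printed by `ethtool -k <dev>` into {feature: bool}."""
    features = {}
    for line in ethtool_output.splitlines():
        if ":" not in line:
            continue
        name, _, value = line.partition(":")
        words = value.split()
        # Skip the "Features for <dev>:" header and anything unexpected.
        if not words or words[0] not in ("on", "off"):
            continue
        features[name.strip()] = (words[0] == "on")
    return features


def read_offloads(dev):
    """Run `ethtool -k` on a device (needs ethtool and a real interface)."""
    result = subprocess.run(
        ["ethtool", "-k", dev], capture_output=True, text=True, check=True
    )
    return parse_offloads(result.stdout)


# Hypothetical sample output for an assumed interface name, for illustration.
SAMPLE = """Features for vif1.0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
generic-segmentation-offload: off [fixed]
"""

if __name__ == "__main__":
    for feature, enabled in sorted(parse_offloads(SAMPLE).items()):
        print(f"{feature}: {'on' if enabled else 'off'}")
```

If `tcp-segmentation-offload` shows as on, turning it off temporarily with
`ethtool -K <dev> tso off` and repeating the same-server copy would be one
way to test the offload theory; whether it helps here is untested.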