[ovs-discuss] Replacing IPsec-GRE tunnel ports

Bolesław Tokarski boleslaw.tokarski at gmail.com
Thu Nov 24 17:21:55 UTC 2016


Hi, Ansis,

Thanks for the time taken to analyze the dumps.

> 1. For the bad case did you see ESP packets getting fragmented? The
> PCAP file you attached only has iperf packets so I can't tell that.
>

No fragmentation there.


> 2. Also, you did not explicitly mention if packet capture was gathered
> on sender (10.100.0.3) or receiver (10.100.0.4). However, I would be
> inclined to guess that you ran tcpdump on receiver (10.100.0.4),
> because of latency pattern in TCP three-way handshake.
>

Correct.

> First of all before troubleshooting any iperf TCP performance issues I
> would recommend you to do several iperf UDP tests with -b flag,
> because TCP flow control introduces a lot of variables that I have to
> speculate about. Run this UDP test couple times and try to guess
> "optimal" target bandwidth when drops are still close to 0% and also
> keep attention to packet reordering.
>

This is a curious test. It seems the 'iperf3 -c 10.100.0.4 -u -b $size'
client will gladly accept any value and print it as the speed, while the
server prints... zeros. I suppose I should expect that, though:

phost3:~ # iperf3 -c 10.100.0.4 -u -b 100000
Connecting to host 10.100.0.4, port 5201
[  4] local 10.100.0.3 port 53959 connected to 10.100.0.4 port 5201
[ ID] Interval           Transfer     Bandwidth       Total Datagrams
[  4]   0.00-1.00   sec  16.0 KBytes   131 Kbits/sec  2
[  4]   1.00-2.00   sec  8.00 KBytes  65.5 Kbits/sec  1
[  4]   2.00-3.00   sec  16.0 KBytes   131 Kbits/sec  2
[  4]   3.00-4.00   sec  8.00 KBytes  65.5 Kbits/sec  1
[  4]   4.00-5.00   sec  16.0 KBytes   131 Kbits/sec  2
[  4]   5.00-6.00   sec  16.0 KBytes   131 Kbits/sec  2
[  4]   6.00-7.00   sec  8.00 KBytes  65.5 Kbits/sec  1
[  4]   7.00-8.00   sec  16.0 KBytes   131 Kbits/sec  2
[  4]   8.00-9.00   sec  8.00 KBytes  65.5 Kbits/sec  1
[  4]   9.00-10.00  sec  16.0 KBytes   131 Kbits/sec  2
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  4]   0.00-10.00  sec   128 KBytes   105 Kbits/sec  0.016 ms  0/1 (0%)
[  4] Sent 1 datagrams

iperf Done.

phost4# iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 10.100.0.3, port 35272
[  5] local 10.100.0.4 port 5201 connected to 10.100.0.3 port 53959
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec  8.00 KBytes  65.5 Kbits/sec  0.016 ms  0/1 (0%)
[  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec  0.016 ms  0/0 (-nan%)
[  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec  0.016 ms  0/0 (-nan%)
[  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec  0.016 ms  0/0 (-nan%)
[  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec  0.016 ms  0/0 (-nan%)
[  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec  0.016 ms  0/0 (-nan%)
[  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec  0.016 ms  0/0 (-nan%)
[  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec  0.016 ms  0/0 (-nan%)
[  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec  0.016 ms  0/0 (-nan%)
[  5]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec  0.016 ms  0/0 (-nan%)
[  5]  10.00-10.04  sec  0.00 Bytes  0.00 bits/sec  0.016 ms  0/0 (-nan%)
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  5]   0.00-10.04  sec  0.00 Bytes  0.00 bits/sec  0.016 ms  0/1 (0%)
-----------------------------------------------------------

The highest UDP target bandwidth at which I still see the packets on the
server side is slightly above 25kbps.

I did a dump of a 50kbps transmission on the internal and external
interfaces on both machines. From the dump I can see that the first 8k UDP
datagram produces a neat six ESP packets sent out to the target. Subsequent
8k UDP datagrams sent on the internal interface don't end up on the external
interface at all.
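
(For anyone reproducing this, the dumps can be taken roughly as below;
"br-int" and "eth1" are just placeholders for the actual internal and
external interfaces on each host - plaintext iperf traffic on the internal
side, ESP on the external side.)

phost4# tcpdump -ni br-int -w internal.pcap udp port 5201
phost4# tcpdump -ni eth1 -w external.pcap esp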

UPDATE: After investigating the TCP traffic, I found that the TCP flow tries
to send 6.7kB packets, which don't show up on the external interface; after
that, it reduces the packets sent on the internal interface to fit the
interface's MTU. By adding a "-l 1350" option to the iperf3 UDP transfer, I
managed to get 12Mbps with a 0.35% loss rate. Anything above that caused a
much higher loss rate.
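
(Roughly, the command shape that gave the 12Mbps figure; the exact -b value
was found by trial and error:)

phost3:~ # iperf3 -c 10.100.0.4 -u -b 12M -l 1350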

Now, looking at this 12Mbps traffic closely... nothing bad happening here.

> Now getting back to the TCP packet capture that you sent over to me...
> what I see in Wireshark's "TCP stream graph analysis tool" is:
> 1. that TCP data segments are received in bursts that are consistently
> separated by ~0.25s dormant intervals. Since the packet capture was
> gathered on receiver and not on the sender it could mean two things:
> 1.1. Either the TCP ACK from receiver to sender was delayed for one
> reason or another. Hence, TCP flow control kicked in and slowed down
> the data send rate on sender; OR
> 1.2. Either TCP data segments in 0.25s burst-rate fashion were delayed
> from sender to receiver. Since receiver did not receive any data it
> could not acknowledge it and tell sender to send packets at higher
> rate. This is more likely scenario (see point #2).
> 2. There is almost always one TCP segment from the next burst of TCP
> data segments that appears prematurely in previous burst. This makes
> me think that sender actually did send out more data except it was
> queued somewhere (see point #1.2).
> 3. There are bunch of out-of-order TCP segments within the "burst" as
> well. I would be interested to find out if UDP test would confirm the
> same packet reordering.
> 4. can you monitor "ovs-dpctl show" stats in tight loop and see if
> upcalls to ovs-vswitchd increase in 0.25 second pattern as well? This
> would prove or disprove if OVS is queuing packets and introduces this
> 0.25s delay.
>

Really nice analysis. I found that every now and then the sender attempts to
send a packet larger than the interface MTU, and that packet does not appear
on the external interface. This causes a retransmission request.
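
(Regarding point #4, something like the loop below is what I'd use to watch
the datapath stats in a tight loop - a sketch only; the "missed" counter in
the "lookups" line is the one that should reflect upcalls to ovs-vswitchd.)

phost4# while :; do date '+%s.%N'; ovs-dpctl show | grep lookups; sleep 0.1; done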

I am heavily puzzled. I know that path MTU discovery tends to fail on these
tunnels; that has already caught my attention in a number of places. However,
the interface has a specific, low MTU of 1394, and lowering it even further
doesn't help. Looking at the traffic in tcpdump, I guess the kernel fails to
honor the MTU enforced on the interface, or at least fails to honor it from
time to time. And only on interfaces bound to OVS with a VLAN tag.
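
(As a quick sanity check, one could send don't-fragment pings across the
tunnel to see whether the 1394 MTU actually holds end to end; 1366 below is
1394 minus the 28 bytes of IP and ICMP headers.)

phost3:~ # ping -M do -s 1366 -c 4 10.100.0.4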

To me it looks like some kernel insight would be useful.

Best regards,
Bolesław Tokarski