[ovs-dev] [PATCH] userspace: fix bad UDP performance issue of veth
yang_y_yi at 163.com
Thu Aug 20 09:12:14 UTC 2020
From: Yi Yang <yangyi01 at inspur.com>
iperf3 UDP performance in the veth-to-veth case is
very poor because of heavy packet loss. The root
cause is that rmem_default and wmem_default are only
212992 bytes, while the iperf3 UDP test uses an 8K
datagram size, which is fragmented when the MTU is
1500: a single 8K UDP send enqueues 6 IP fragments
on the socket receive queue, and the small default
socket buffer cannot hold that many packets, so
many of them are dropped.
This commit fixes the packet loss by setting the
socket receive and send buffers to the maximum
possible value, so packets are no longer dropped.
It also improves TCP performance, because there
are no retransmits.
Note that a big socket buffer does not mean a big
buffer is allocated when the socket is created; no
extra memory is allocated compared to the default
socket buffer size. It only means more skbuffs can
be enqueued on the socket receive and send queues,
so packets are not dropped.
The results below are for reference.
The result before applying this commit
======================================
$ ip netns exec ns02 iperf3 -t 5 -i 1 -u -b 100M -c 10.15.2.6 --get-server-output -A 5
Connecting to host 10.15.2.6, port 5201
[ 4] local 10.15.2.2 port 59053 connected to 10.15.2.6 port 5201
[ ID] Interval Transfer Bandwidth Total Datagrams
[ 4] 0.00-1.00 sec 10.8 MBytes 90.3 Mbits/sec 1378
[ 4] 1.00-2.00 sec 11.9 MBytes 100 Mbits/sec 1526
[ 4] 2.00-3.00 sec 11.9 MBytes 100 Mbits/sec 1526
[ 4] 3.00-4.00 sec 11.9 MBytes 100 Mbits/sec 1526
[ 4] 4.00-5.00 sec 11.9 MBytes 100 Mbits/sec 1526
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 4] 0.00-5.00 sec 58.5 MBytes 98.1 Mbits/sec 0.047 ms 357/531 (67%)
[ 4] Sent 531 datagrams
Server output:
-----------------------------------------------------------
Accepted connection from 10.15.2.2, port 60314
[ 5] local 10.15.2.6 port 5201 connected to 10.15.2.2 port 59053
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 5] 0.00-1.00 sec 1.36 MBytes 11.4 Mbits/sec 0.047 ms 357/531 (67%)
[ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 0.047 ms 0/0 (-nan%)
[ 5] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 0.047 ms 0/0 (-nan%)
[ 5] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec 0.047 ms 0/0 (-nan%)
[ 5] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec 0.047 ms 0/0 (-nan%)
iperf Done.
The result after applying this commit
=====================================
$ sudo ip netns exec ns02 iperf3 -t 5 -i 1 -u -b 4G -c 10.15.2.6 --get-server-output -A 5
Connecting to host 10.15.2.6, port 5201
[ 4] local 10.15.2.2 port 48547 connected to 10.15.2.6 port 5201
[ ID] Interval Transfer Bandwidth Total Datagrams
[ 4] 0.00-1.00 sec 440 MBytes 3.69 Gbits/sec 56276
[ 4] 1.00-2.00 sec 481 MBytes 4.04 Gbits/sec 61579
[ 4] 2.00-3.00 sec 474 MBytes 3.98 Gbits/sec 60678
[ 4] 3.00-4.00 sec 480 MBytes 4.03 Gbits/sec 61452
[ 4] 4.00-5.00 sec 480 MBytes 4.03 Gbits/sec 61441
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 4] 0.00-5.00 sec 2.30 GBytes 3.95 Gbits/sec 0.024 ms 0/301426 (0%)
[ 4] Sent 301426 datagrams
Server output:
-----------------------------------------------------------
Accepted connection from 10.15.2.2, port 60320
[ 5] local 10.15.2.6 port 5201 connected to 10.15.2.2 port 48547
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 5] 0.00-1.00 sec 209 MBytes 1.75 Gbits/sec 0.021 ms 0/26704 (0%)
[ 5] 1.00-2.00 sec 258 MBytes 2.16 Gbits/sec 0.025 ms 0/32967 (0%)
[ 5] 2.00-3.00 sec 258 MBytes 2.16 Gbits/sec 0.022 ms 0/32987 (0%)
[ 5] 3.00-4.00 sec 257 MBytes 2.16 Gbits/sec 0.023 ms 0/32954 (0%)
[ 5] 4.00-5.00 sec 257 MBytes 2.16 Gbits/sec 0.021 ms 0/32937 (0%)
[ 5] 5.00-6.00 sec 255 MBytes 2.14 Gbits/sec 0.026 ms 0/32685 (0%)
[ 5] 6.00-7.00 sec 254 MBytes 2.13 Gbits/sec 0.025 ms 0/32453 (0%)
[ 5] 7.00-8.00 sec 255 MBytes 2.14 Gbits/sec 0.026 ms 0/32679 (0%)
[ 5] 8.00-9.00 sec 255 MBytes 2.14 Gbits/sec 0.022 ms 0/32669 (0%)
iperf Done.
Signed-off-by: Yi Yang <yangyi01 at inspur.com>
---
lib/netdev-linux.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index fe7fb9b..3c45191 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -1103,6 +1103,18 @@ netdev_linux_rxq_construct(struct netdev_rxq *rxq_)
ARRAY_SIZE(filt), (struct sock_filter *) filt
};
+ /* sock_buf_size must be less than 1G, so the maximum value is
+ * (1 << 30) - 1, i.e. 1073741823.  This does not mean the
+ * socket will allocate that much memory; it only means the
+ * packets a client sends will not be dropped because of the
+ * small default socket buffer.  As a result we get the best
+ * possible throughput with no packet loss, which improves
+ * UDP and TCP performance significantly, especially for
+ * fragmented UDP.
+ */
+ unsigned int sock_buf_size = (1 << 30) - 1;
+ socklen_t sock_opt_len = sizeof sock_buf_size;
+
/* Create file descriptor. */
rx->fd = socket(PF_PACKET, SOCK_RAW, 0);
if (rx->fd < 0) {
@@ -1161,6 +1173,48 @@ netdev_linux_rxq_construct(struct netdev_rxq *rxq_)
netdev_get_name(netdev_), ovs_strerror(error));
goto error;
}
+
+ /* Set send socket buffer size */
+ error = setsockopt(rx->fd, SOL_SOCKET, SO_SNDBUF, &sock_buf_size, sizeof sock_buf_size);
+ if (error) {
+ error = errno;
+ VLOG_ERR("%s: failed to set send socket buffer size (%s)",
+ netdev_get_name(netdev_), ovs_strerror(error));
+ goto error;
+ }
+
+ /* Set recv socket buffer size */
+ error = setsockopt(rx->fd, SOL_SOCKET, SO_RCVBUF, &sock_buf_size, sizeof sock_buf_size);
+ if (error) {
+ error = errno;
+ VLOG_ERR("%s: failed to set recv socket buffer size (%s)",
+ netdev_get_name(netdev_), ovs_strerror(error));
+ goto error;
+ }
+
+ /* Get the final recv socket buffer size; it should be
+ * 2 * ((1 << 30) - 1) (i.e. 2147483646) on success.
+ * This is not a bug: the Linux kernel doubles the value,
+ * i.e. final sk_rcvbuf = val * 2.
+ */
+ error = getsockopt(rx->fd, SOL_SOCKET, SO_RCVBUF, &sock_buf_size,
+ &sock_opt_len);
+ if (!error) {
+ VLOG_INFO("netdev %s socket recv buffer size: %d",
+ netdev_get_name(netdev_), sock_buf_size);
+ }
+
+ /* Get the final send socket buffer size; it should be
+ * 2 * ((1 << 30) - 1) (i.e. 2147483646) on success.
+ * This is not a bug: the Linux kernel doubles the value,
+ * i.e. final sk_sndbuf = val * 2.
+ */
+ error = getsockopt(rx->fd, SOL_SOCKET, SO_SNDBUF, &sock_buf_size,
+ &sock_opt_len);
+ if (!error) {
+ VLOG_INFO("netdev %s socket send buffer size: %d",
+ netdev_get_name(netdev_), sock_buf_size);
+ }
}
ovs_mutex_unlock(&netdev->mutex);
--
2.7.4