[ovs-dev] [PATCH v5 3/7] netdev-dpdk: Enable TSO when using multi-seg mbufs
Michal Obrembski
michalx.obrembski at intel.com
Wed Sep 11 14:10:01 UTC 2019
From: Tiago Lam <tiago.lam at intel.com>
TCP Segmentation Offload (TSO) is a feature which enables the TCP/IP
network stack to delegate segmentation of a TCP segment to the hardware
NIC, thus saving compute resources. This may improve performance
significantly for TCP workloads in virtualized environments.
While a previous commit already added the necessary logic to netdev-dpdk
to deal with packets marked for TSO, this set of changes enables TSO by
default when using multi-segment mbufs.
Thus, to enable TSO on the physical DPDK interfaces, only the following
command needs to be issued before starting OvS:
ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
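For context, a minimal bring-up sequence might look as follows (a sketch; the
bridge name `br0`, port name `dpdk0` and the PCI address are illustrative
placeholders, not taken from this patch):

```shell
# Enable multi-segment mbufs before ovs-vswitchd starts; with this patch,
# TSO is then enabled automatically on NICs whose PMD supports it.
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true

# Illustrative port setup (bridge/port names and PCI address are placeholders).
ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \
    options:dpdk-devargs=0000:01:00.0
```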
Co-authored-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
Signed-off-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
Signed-off-by: Tiago Lam <tiago.lam at intel.com>
Signed-off-by: Michal Obrembski <michalx.obrembski at intel.com>
---
Documentation/automake.mk | 1 +
Documentation/topics/dpdk/index.rst | 1 +
Documentation/topics/dpdk/tso.rst | 99 +++++++++++++++++++++++++++++++++++++
NEWS | 1 +
lib/netdev-dpdk.c | 70 ++++++++++++++++++++++++--
5 files changed, 167 insertions(+), 5 deletions(-)
create mode 100644 Documentation/topics/dpdk/tso.rst
diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index 2a3214a..5955dd7 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -40,6 +40,7 @@ DOC_SOURCE = \
Documentation/topics/dpdk/index.rst \
Documentation/topics/dpdk/bridge.rst \
Documentation/topics/dpdk/jumbo-frames.rst \
+ Documentation/topics/dpdk/tso.rst \
Documentation/topics/dpdk/memory.rst \
Documentation/topics/dpdk/pdump.rst \
Documentation/topics/dpdk/phy.rst \
diff --git a/Documentation/topics/dpdk/index.rst b/Documentation/topics/dpdk/index.rst
index cf24a7b..eb2a04d 100644
--- a/Documentation/topics/dpdk/index.rst
+++ b/Documentation/topics/dpdk/index.rst
@@ -40,4 +40,5 @@ The DPDK Datapath
/topics/dpdk/qos
/topics/dpdk/pdump
/topics/dpdk/jumbo-frames
+ /topics/dpdk/tso
/topics/dpdk/memory
diff --git a/Documentation/topics/dpdk/tso.rst b/Documentation/topics/dpdk/tso.rst
new file mode 100644
index 0000000..14f8c39
--- /dev/null
+++ b/Documentation/topics/dpdk/tso.rst
@@ -0,0 +1,99 @@
+..
+ Copyright 2018, Red Hat, Inc.
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you may
+ not use this file except in compliance with the License. You may obtain
+ a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+ License for the specific language governing permissions and limitations
+ under the License.
+
+ Convention for heading levels in Open vSwitch documentation:
+
+ ======= Heading 0 (reserved for the title in a document)
+ ------- Heading 1
+ ~~~~~~~ Heading 2
+ +++++++ Heading 3
+ ''''''' Heading 4
+
+ Avoid deeper levels because they do not render well.
+
+===
+TSO
+===
+
+**Note:** This feature is considered experimental.
+
+TCP Segmentation Offload (TSO) is a mechanism which allows a TCP/IP network
+stack to delegate segmentation of an oversized TCP segment to the underlying
+physical NIC, thus saving the CPU cycles that would otherwise be spent
+performing the same segmentation in software and freeing them up for more
+useful work.
+
+A common use case for TSO is virtualization, where traffic coming in from a
+VM can have its TCP segmentation offloaded, thus avoiding segmentation in
+software. Additionally, if the traffic is headed to a VM within the same
+host, further optimization can be expected. As the traffic never leaves the
+machine, no MTU needs to be accounted for, and thus no segmentation and
+checksum calculations are required, which saves yet more cycles. Only when
+the traffic actually leaves the host does the segmentation need to happen, in
+which case it will be performed by the egress NIC.
+
+When using TSO with DPDK, the implementation relies on the multi-segment mbufs
+feature, described in :doc:`/topics/dpdk/jumbo-frames`, where each mbuf
+contains ~2KiB of the entire packet's data and is linked to the next mbuf that
+contains the next portion of data.
+
+Enabling TSO
+~~~~~~~~~~~~
+.. Important::
+
+    Once the multi-segment mbufs feature is enabled, TSO will be enabled by
+    default, provided there is support for it in the underlying physical
+    NICs attached to OvS-DPDK.
+
+When using :doc:`vHost User ports <vhost-user>`, `TSO` is enabled in OvS by
+the DPDK vHost User backend; when a new guest connection is established,
+`TSO` is thus advertised to the guest as an available feature. It may then be
+enabled in one of two ways:
+
+1. QEMU Command Line Parameter::
+
+ $ sudo $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 \
+ ...
+ -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,\
+ csum=on,guest_csum=on,guest_tso4=on,guest_tso6=on\
+ ...
+
+2. Ethtool. Assuming that the guest's OS also supports `TSO`, ethtool can be
+   used to enable it::
+
+ $ ethtool -K eth0 sg on # scatter-gather is a prerequisite for TSO
+ $ ethtool -K eth0 tso on
+ $ ethtool -k eth0
+
+To enable TSO in a guest, the underlying NIC must first support `TSO` - consult
+your controller's datasheet for compatibility. Secondly, the NIC must have an
+associated DPDK Poll Mode Driver (PMD) which supports `TSO`.
+
+Limitations
+~~~~~~~~~~~
+The current OvS `TSO` implementation supports flat and VLAN networks only
+(i.e. no support for `TSO` over tunneled connections [VxLAN, GRE, IPinIP,
+etc.]).
+
+Also, as TSO is built on top of multi-segment mbufs, the constraints pointed
+out in :doc:`/topics/dpdk/jumbo-frames` also apply to TSO. Thus, some
+performance hits might be noticed when running specific functionality, like
+the Userspace Connection tracker. And as mentioned in the same section, it is
+paramount that a packet's headers are contained within the first mbuf (~2KiB
+in size).
diff --git a/NEWS b/NEWS
index 1278ada..e219822 100644
--- a/NEWS
+++ b/NEWS
@@ -44,6 +44,7 @@ v2.12.0 - xx xxx xxxx
specific subtables based on the miniflow attributes, enhancing the
performance of the subtable search.
* Add Linux AF_XDP support through a new experimental netdev type "afxdp".
+ * Add support for TSO (experimental, between DPDK interfaces only).
- OVSDB:
* OVSDB clients can now resynchronize with clustered servers much more
quickly after a brief disconnection, saving bandwidth and CPU time.
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 2304f28..7552caa 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -345,7 +345,8 @@ struct ingress_policer {
enum dpdk_hw_ol_features {
NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0,
NETDEV_RX_HW_CRC_STRIP = 1 << 1,
- NETDEV_RX_HW_SCATTER = 1 << 2
+ NETDEV_RX_HW_SCATTER = 1 << 2,
+ NETDEV_TX_TSO_OFFLOAD = 1 << 3,
};
/*
@@ -996,8 +997,18 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
return -ENOTSUP;
}
+ if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
+ conf.txmode.offloads |= DEV_TX_OFFLOAD_TCP_TSO;
+ conf.txmode.offloads |= DEV_TX_OFFLOAD_TCP_CKSUM;
+ conf.txmode.offloads |= DEV_TX_OFFLOAD_IPV4_CKSUM;
+ }
+
txconf = info.default_txconf;
txconf.offloads = conf.txmode.offloads;
+ } else if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
+ dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD;
+ VLOG_WARN("Failed to set Tx TSO offload in %s. Requires option "
+ "`dpdk-multi-seg-mbufs` to be enabled.", dev->up.name);
}
conf.intr_conf.lsc = dev->lsc_interrupt_mode;
@@ -1114,6 +1125,9 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
uint32_t rx_chksm_offload_capa = DEV_RX_OFFLOAD_UDP_CKSUM |
DEV_RX_OFFLOAD_TCP_CKSUM |
DEV_RX_OFFLOAD_IPV4_CKSUM;
+ uint32_t tx_tso_offload_capa = DEV_TX_OFFLOAD_TCP_TSO |
+ DEV_TX_OFFLOAD_TCP_CKSUM |
+ DEV_TX_OFFLOAD_IPV4_CKSUM;
rte_eth_dev_info_get(dev->port_id, &info);
@@ -1140,6 +1154,18 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
dev->hw_ol_features &= ~NETDEV_RX_HW_SCATTER;
}
+ if (dpdk_multi_segment_mbufs) {
+ if (info.tx_offload_capa & tx_tso_offload_capa) {
+ dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD;
+ } else {
+ dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD;
+ VLOG_WARN("Tx TSO offload is not supported on port "
+ DPDK_PORT_ID_FMT, dev->port_id);
+ }
+ } else {
+ dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD;
+ }
+
n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
@@ -1727,6 +1753,11 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
} else {
smap_add(args, "rx_csum_offload", "false");
}
+ if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) {
+ smap_add(args, "tx_tso_offload", "true");
+ } else {
+ smap_add(args, "tx_tso_offload", "false");
+ }
smap_add(args, "lsc_interrupt_mode",
dev->lsc_interrupt_mode ? "true" : "false");
}
@@ -2445,9 +2476,21 @@ netdev_dpdk_qos_run(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
return cnt;
}
+/* Filters a DPDK packet by the following criteria:
+ * - A packet is marked for TSO but the egress dev doesn't
+ * support TSO;
+ * - A packet's pkt_len is bigger than the pre-defined
+ * max_packet_len, and the packet isn't marked for TSO.
+ *
+ * If any of the above cases applies, the packet is then freed
+ * from 'pkts'. Otherwise the packet is kept in 'pkts'
+ * untouched.
+ *
+ * Returns the number of unfiltered packets left in 'pkts'.
+ */
static int
-netdev_dpdk_filter_packet_len(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
- int pkt_cnt)
+netdev_dpdk_filter_packet(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
+ int pkt_cnt)
{
int i = 0;
int cnt = 0;
@@ -2457,6 +2500,15 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
for (i = 0; i < pkt_cnt; i++) {
pkt = pkts[i];
+ /* Drop TSO packet if there's no TSO support on egress port. */
+ if ((pkt->ol_flags & PKT_TX_TCP_SEG) &&
+ !(dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD)) {
+            VLOG_WARN_RL(&rl, "%s: TSO is disabled on port, TSO packet of "
+                         "size %" PRIu32 " dropped", dev->up.name,
+                         pkt->pkt_len);
+ rte_pktmbuf_free(pkt);
+ continue;
+ }
+
if (OVS_UNLIKELY(pkt->pkt_len > dev->max_packet_len)) {
if (!(pkt->ol_flags & PKT_TX_TCP_SEG)) {
VLOG_WARN_RL(&rl, "%s: Too big size %" PRIu32 " "
@@ -2528,7 +2580,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
- cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt);
+ cnt = netdev_dpdk_filter_packet(dev, cur_pkts, cnt);
/* Check has QoS has been configured for the netdev */
cnt = netdev_dpdk_qos_run(dev, cur_pkts, cnt, true);
dropped = total_pkts - cnt;
@@ -2747,7 +2799,7 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid,
int batch_cnt = dp_packet_batch_size(batch);
struct rte_mbuf **pkts = (struct rte_mbuf **) batch->packets;
- tx_cnt = netdev_dpdk_filter_packet_len(dev, pkts, batch_cnt);
+ tx_cnt = netdev_dpdk_filter_packet(dev, pkts, batch_cnt);
tx_cnt = netdev_dpdk_qos_run(dev, pkts, tx_cnt, true);
dropped = batch_cnt - tx_cnt;
@@ -4445,6 +4497,14 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
dev->tx_q[0].map = 0;
}
+ if (dpdk_multi_segment_mbufs) {
+ dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD;
+
+ VLOG_DBG("%s: TSO enabled on vhost port", dev->up.name);
+ } else {
+ dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD;
+ }
+
netdev_dpdk_remap_txqs(dev);
err = netdev_dpdk_mempool_configure(dev);
--
2.7.4