[ovs-dev] [PATCH v8 12/13] netdev-dpdk: support multi-segment jumbo frames

Tiago Lam tiago.lam at intel.com
Mon Jun 11 16:21:29 UTC 2018


From: Mark Kavanagh <mark.b.kavanagh at intel.com>

Currently, jumbo frame support for OvS-DPDK is implemented by
increasing the size of mbufs within a mempool, such that each mbuf
within the pool is large enough to contain an entire jumbo frame of
a user-defined size. Typically, for each user-defined MTU,
'requested_mtu', a new mempool is created, containing mbufs of size
~requested_mtu.

With the multi-segment approach, a port uses a single mempool
(containing standard/default-sized mbufs of ~2K bytes), irrespective
of the user-requested MTU value. To accommodate jumbo frames, mbufs
are chained together, where each mbuf in the chain stores a portion of
the jumbo frame. Each mbuf in the chain is termed a segment, hence the
name.
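
As an illustration only (not part of this patch), the total payload of
such a chain can be computed by walking the mbufs' 'next' pointers,
using standard rte_mbuf fields:

    /* Minimal sketch, assuming a well-formed rte_mbuf chain: sum the
     * bytes stored in each segment. For single-segment mbufs the loop
     * runs exactly once, and the result equals the head's pkt_len. */
    #include <stdint.h>
    #include <rte_mbuf.h>

    static uint32_t
    mbuf_chain_data_len(const struct rte_mbuf *head)
    {
        uint32_t len = 0;
        const struct rte_mbuf *seg;

        for (seg = head; seg != NULL; seg = seg->next) {
            len += seg->data_len;   /* Bytes held by this segment only. */
        }

        return len;
    }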

== Enabling multi-segment mbufs ==
Multi-segment and single-segment mbufs are mutually exclusive, and the
user must decide which approach to adopt at initialisation time. The
introduction of a new OVSDB field, 'dpdk-multi-seg-mbufs', facilitates
this. This is a global boolean value, which determines how jumbo
frames are represented across all DPDK ports. In the absence of a
user-supplied value, 'dpdk-multi-seg-mbufs' defaults to false, i.e.
multi-segment mbufs must be explicitly enabled, while single-segment
mbufs remain the default.

Setting the field is identical to setting existing DPDK-specific OVSDB
fields:

    ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
    ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x10
    ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=4096,0
==> ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true

Co-authored-by: Tiago Lam <tiago.lam at intel.com>

Signed-off-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
Signed-off-by: Tiago Lam <tiago.lam at intel.com>
---
 Documentation/topics/dpdk/jumbo-frames.rst | 41 +++++++++++++++++++++++++
 NEWS                                       |  1 +
 lib/dpdk.c                                 |  8 +++++
 lib/netdev-dpdk.c                          | 49 ++++++++++++++++++++++++++++--
 lib/netdev-dpdk.h                          |  2 ++
 vswitchd/vswitch.xml                       | 20 ++++++++++++
 6 files changed, 118 insertions(+), 3 deletions(-)

diff --git a/Documentation/topics/dpdk/jumbo-frames.rst b/Documentation/topics/dpdk/jumbo-frames.rst
index 00360b4..4f98e83 100644
--- a/Documentation/topics/dpdk/jumbo-frames.rst
+++ b/Documentation/topics/dpdk/jumbo-frames.rst
@@ -71,3 +71,44 @@ Jumbo frame support has been validated against 9728B frames, which is the
 largest frame size supported by Fortville NIC using the DPDK i40e driver, but
 larger frames and other DPDK NIC drivers may be supported. These cases are
 common for use cases involving East-West traffic only.
+
+-------------------
+Multi-segment mbufs
+-------------------
+
+Instead of increasing the size of mbufs within a mempool, such that each mbuf
+within the pool is large enough to contain an entire jumbo frame of a
+user-defined size, mbufs can be chained together. In this approach, each mbuf
+in the chain stores a portion of the jumbo frame, by default ~2K bytes,
+irrespective of the user-requested MTU value. Since each mbuf in the chain is
+termed a segment, this approach is named "multi-segment mbufs".
+
+This approach brings more flexibility to use cases where the maximum packet
+length is hard to predict. For example, when packets originate from sources
+marked for offload (such as TSO), each packet may be larger than the MTU, in
+which case a single mbuf may not be enough to hold all of the packet's data
+when forwarding it to a DPDK port.
+
+Multi-segment and single-segment mbufs are mutually exclusive, and the user
+must decide which approach to adopt at initialisation. If multi-segment mbufs
+are to be enabled, this can be done with the following command::
+
+    $ ovs-vsctl set Open_vSwitch . other_config:dpdk-multi-seg-mbufs=true
+
+Single-segment mbufs remain the default when using OvS-DPDK, and the above
+option ``dpdk-multi-seg-mbufs`` must be explicitly set to ``true`` if
+multi-segment mbufs are to be used.
+
+~~~~~~~~~~~~~~~~~
+Performance notes
+~~~~~~~~~~~~~~~~~
+
+When using multi-segment mbufs some PMDs may not support vectorized Tx
+functions, due to the non-contiguous nature of the mbuf chains. As a result,
+performance for smaller packet sizes can suffer. For example, on a setup
+sending 64B packets at line rate, a decrease of ~20% has been observed. The
+performance impact stops being noticeable for larger packet sizes, although
+the exact size will vary.
+
+Because of this, the use of multi-segment mbufs is not advised for workloads
+dominated by smaller packet sizes, such as 64B.
diff --git a/NEWS b/NEWS
index 484c6dc..7588cf0 100644
--- a/NEWS
+++ b/NEWS
@@ -110,6 +110,7 @@ v2.9.0 - 19 Feb 2018
        pmd assignments.
      * Add rxq utilization of pmd to appctl 'dpif-netdev/pmd-rxq-show'.
      * Add support for vHost dequeue zero copy (experimental)
+     * Add support for multi-segment mbufs
    - Userspace datapath:
      * Output packet batching support.
    - vswitchd:
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 09afd8c..7fbaf9f 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -465,6 +465,14 @@ dpdk_init__(const struct smap *ovs_other_config)
 
     /* Finally, register the dpdk classes */
     netdev_dpdk_register();
+
+    bool multi_seg_mbufs_enable = smap_get_bool(ovs_other_config,
+                                                "dpdk-multi-seg-mbufs", false);
+    if (multi_seg_mbufs_enable) {
+        VLOG_INFO("DPDK multi-segment mbufs enabled");
+        netdev_dpdk_multi_segment_mbufs_enable();
+    }
+
     return true;
 }
 
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0079e28..fde95e3 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -66,6 +66,7 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
 
 VLOG_DEFINE_THIS_MODULE(netdev_dpdk);
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+static bool dpdk_multi_segment_mbufs = false;
 
 #define DPDK_PORT_WATCHDOG_INTERVAL 5
 
@@ -635,6 +636,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len)
               + dev->requested_n_txq * dev->requested_txq_size
               + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
               + MIN_NB_MBUF;
+    /* XXX: should n_mbufs be increased if multi-seg mbufs are used? */
 
     ovs_mutex_lock(&dpdk_mp_mutex);
     do {
@@ -734,7 +736,13 @@ dpdk_mp_release(struct rte_mempool *mp)
 
 /* Tries to allocate a new mempool - or re-use an existing one where
  * appropriate - on requested_socket_id with a size determined by
- * requested_mtu and requested Rx/Tx queues.
+ * requested_mtu and requested Rx/Tx queues. Some properties of the mempool's
+ * elements are dependent on the value of 'dpdk_multi_segment_mbufs':
+ * - if 'true', then the mempool contains standard-sized mbufs that are chained
+ *   together to accommodate packets of size 'requested_mtu'.
+ * - if 'false', then the members of the allocated mempool are
+ *   non-standard-sized mbufs. Each mbuf in the mempool is large enough to
+ *   fully accommodate packets of size 'requested_mtu'.
  * On success - or when re-using an existing mempool - the new configuration
  * will be applied.
  * On error, device will be left unchanged. */
@@ -742,10 +750,18 @@ static int
 netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
     OVS_REQUIRES(dev->mutex)
 {
-    uint16_t buf_size = dpdk_buf_size(dev->requested_mtu);
+    uint16_t buf_size = 0;
     struct rte_mempool *mp;
     int ret = 0;
 
+    /* Contiguous mbufs in use - permit oversized mbufs */
+    if (!dpdk_multi_segment_mbufs) {
+        buf_size = dpdk_buf_size(dev->requested_mtu);
+    } else {
+        /* multi-segment mbufs - use standard mbuf size */
+        buf_size = dpdk_buf_size(ETHER_MTU);
+    }
+
     dpdk_mp_sweep();
 
     mp = dpdk_mp_create(dev, buf_size);
@@ -827,6 +843,7 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
     int diag = 0;
     int i;
     struct rte_eth_conf conf = port_conf;
+    struct rte_eth_txconf txconf;
     struct rte_eth_dev_info info;
 
     /* As of DPDK 17.11.1 a few PMDs require to explicitly enable
@@ -842,6 +859,18 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
         }
     }
 
+    /* Multi-segment-mbuf-specific setup. */
+    if (dpdk_multi_segment_mbufs) {
+        /* DPDK PMDs typically attempt to use simple or vectorized
+         * transmit functions, neither of which are compatible with
+         * multi-segment mbufs. Ensure that these are disabled when
+         * multi-segment mbufs are enabled. */
+        rte_eth_dev_info_get(dev->port_id, &info);
+        txconf = info.default_txconf;
+        txconf.txq_flags &= ~ETH_TXQ_FLAGS_NOMULTSEGS;
+    }
+
     conf.intr_conf.lsc = dev->lsc_interrupt_mode;
     conf.rxmode.hw_ip_checksum = (dev->hw_ol_features &
                                   NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
@@ -871,7 +900,9 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
 
         for (i = 0; i < n_txq; i++) {
             diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size,
-                                          dev->socket_id, NULL);
+                                          dev->socket_id,
+                                          dpdk_multi_segment_mbufs ? &txconf
+                                                                   : NULL);
             if (diag) {
                 VLOG_INFO("Interface %s unable to setup txq(%d): %s",
                           dev->up.name, i, rte_strerror(-diag));
@@ -3968,6 +3999,18 @@ unlock:
     return err;
 }
 
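+/* Returns true if multi-segment mbufs are enabled, false otherwise. */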
+bool
+netdev_dpdk_is_multi_segment_mbufs_enabled(void)
+{
+    return dpdk_multi_segment_mbufs;
+}
+
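+/* Enables multi-segment mbufs globally; called from dpdk_init__() when the
+ * 'dpdk-multi-seg-mbufs' option is set to 'true'. */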
+void
+netdev_dpdk_multi_segment_mbufs_enable(void)
+{
+    dpdk_multi_segment_mbufs = true;
+}
+
 #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT,    \
                           SET_CONFIG, SET_TX_MULTIQ, SEND,    \
                           GET_CARRIER, GET_STATS,			  \
diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
index b7d02a7..19aa5c6 100644
--- a/lib/netdev-dpdk.h
+++ b/lib/netdev-dpdk.h
@@ -25,6 +25,8 @@ struct dp_packet;
 
 #ifdef DPDK_NETDEV
 
+bool netdev_dpdk_is_multi_segment_mbufs_enabled(void);
+void netdev_dpdk_multi_segment_mbufs_enable(void);
 void netdev_dpdk_register(void);
 void free_dpdk_buf(struct dp_packet *);
 
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 1e27a02..cbe4650 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -331,6 +331,26 @@
         </p>
       </column>
 
+      <column name="other_config" key="dpdk-multi-seg-mbufs"
+              type='{"type": "boolean"}'>
+        <p>
+          Specifies if DPDK uses multi-segment mbufs for handling jumbo frames.
+        </p>
+        <p>
+          If true, DPDK allocates a single mempool per port, irrespective
+          of the port's requested MTU size. The elements of this mempool are
+          'standard'-sized mbufs (typically ~2KB), which may be chained
+          together to accommodate jumbo frames. In this approach, each mbuf
+          typically stores a fragment of the overall jumbo frame.
+        </p>
+        <p>
+          If not specified, defaults to <code>false</code>, in which case
+          the size of each mbuf within a DPDK port's mempool is increased to
+          accommodate jumbo frames within a single mbuf.
+        </p>
+      </column>
+
       <column name="other_config" key="vhost-sock-dir"
               type='{"type": "string"}'>
         <p>
-- 
2.7.4


