[ovs-discuss] OVS jumbo frame support for DPDK port types

Kavanagh, Mark B mark.b.kavanagh at intel.com
Thu May 12 09:40:07 UTC 2016


>Hi,
>
>I am working on a lab environment with OVS and DPDK attempting to get Jumbo frames and mbuf
>initialisation working, for DPDK port types only, with the current OVS master (testing with
>commit hash of 5d24608388bcf5610018cb51369adc2e6f3816e1) and the DPDK 16.04 release.
>
>I have come across the development of supporting Jumbo frames with OVS with these patches:
>
>[ovs-dev,V5,1/2] netdev-dpdk: clean up mbuf initialization -
>https://patchwork.ozlabs.org/patch/585153/
>[ovs-dev,V5,2/2] netdev-dpdk: add jumbo frame support -
>https://patchwork.ozlabs.org/patch/585154/
>
>Part 1 of the patch set is now part of the mainline code. As I understand, patch 2 will not
>be upstreamed as it stands and is waiting upon further patchsets to allow runtime
>modification of netdev properties, such as MTU, as discussed here
>http://openvswitch.org/pipermail/dev/2016-February/066940.html
>
>I would be interested in testing jumbo frame support with the current OVS master and DPDK
>versions mentioned, however patch 2/2 fails to apply successfully with the latest OVS code.
>Is there a later version of the jumbo patch that can be shared with the current OVS code?
>
>Thanks, Jim

Hi Jim,

I posted the rebased patch as an RFC to ovs-dev - http://openvswitch.org/pipermail/dev/2016-May/070892.html.

Please note, though, that I've only resolved the compilation issues, and haven't tested the jumbo frame functionality itself.

Hope this helps.

Best regards,
Mark

_____________________

diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 7f76df8..9b83c78 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -913,10 +913,63 @@ by adding the following string:
 to <interface> sections of all network devices used by DPDK. Parameter 'N'
 determines how many queues can be used by the guest.
 
+Jumbo Frames
+------------
+
+Support for Jumbo Frames may be enabled at run-time for DPDK-type ports.
+
+To avail of Jumbo Frame support, add the 'mtu_request' option to the ovs-vsctl
+'add-port' command-line, along with the required MTU for the port.
+e.g.
+
+     ```
+     ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:mtu_request=9000
+     ```
+
+When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments is
+increased, such that a full Jumbo Frame may be accommodated inside a single
+mbuf segment. Once set, the MTU for a DPDK port is immutable.
+
+Note that from an OVSDB perspective, the `mtu_request` option for a specific
+port takes effect only when first set; subsequent modifications to this field
+are ignored by the DPDK port. As with non-DPDK ports, the MTU of DPDK ports is
+reported by the `Interface` table's 'mtu' field.
+
+Jumbo frame support has been validated against 13312B frames, using the
+DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers may
+theoretically be supported. Supported port types exclude vHost-Cuse ports, as
+that feature is pending deprecation.
+
+vHost Ports and Jumbo Frames
+----------------------------
+Jumbo frame support is available for DPDK vHost-User ports only. Some additional
+configuration is needed to take advantage of this feature:
+
+  1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
+      the QEMU command line snippet below:
+
+      ```
+      '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
+      '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
+      ```
+
+  2. Where virtio devices are bound to the Linux kernel driver in a guest
+     environment (i.e. interfaces are not bound to an in-guest DPDK driver), the
+     MTU of those logical network interfaces must also be increased. This
+     avoids segmentation of Jumbo Frames in the guest. Note that 'MTU' refers
+     to the length of the IP packet only, and not that of the entire frame.
+
+     e.g. To calculate the exact MTU of a standard IPv4 frame, subtract the L2
+     header and CRC lengths (i.e. 18B) from the max supported frame size.
+     So, to set the MTU for a 13312B Jumbo Frame:
+
+      ```
+      ifconfig eth1 mtu 13294
+      ```
+
 Restrictions:
 -------------
 
-  - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue.
   - Currently DPDK port does not make use any offload functionality.
   - DPDK-vHost support works with 1G huge pages.
 
@@ -945,6 +998,11 @@ Restrictions:
     increased to the desired number of queues. Both DPDK and OVS must be
     recompiled for this change to take effect.
 
+  Jumbo Frames:
+  - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. This is a DPDK
+     issue that is currently being investigated.
+  - vHost-Cuse: Jumbo Frame support is not available for vHost-Cuse ports.
+
 Bug Reporting:
 --------------
 
diff --git a/NEWS b/NEWS
index ea7f3a1..4bc0371 100644
--- a/NEWS
+++ b/NEWS
@@ -26,6 +26,7 @@ Post-v2.5.0
        assignment.
      * Type of log messages from PMD threads changed from INFO to DBG.
      * QoS functionality with sample egress-policer implementation.
+     * Support Jumbo Frames
    - ovs-benchmark: This utility has been removed due to lack of use and
      bitrot.
    - ovs-appctl:
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 208c5f5..98e8c3a 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -79,6 +79,8 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
                                     + sizeof(struct dp_packet)    \
                                     + RTE_PKTMBUF_HEADROOM)
 #define NETDEV_DPDK_MBUF_ALIGN      1024
+#define NETDEV_DPDK_MAX_FRAME_LEN   13312
+#define MTU_NOT_SET                 0
 
 /* Max and min number of packets in the mempool.  OVS tries to allocate a
  * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have
@@ -531,6 +533,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
 {
     int diag = 0;
     int i;
+    struct rte_eth_conf conf = port_conf;
 
     /* A device may report more queues than it makes available (this has
      * been observed for Intel xl710, which reserves some of them for
@@ -542,7 +545,15 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
             VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq);
         }
 
-        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &port_conf);
+        if (dev->mtu > ETHER_MTU) {
+            conf.rxmode.jumbo_frame = 1;
+            conf.rxmode.max_rx_pkt_len = dev->max_packet_len;
+        } else {
+            conf.rxmode.jumbo_frame = 0;
+            conf.rxmode.max_rx_pkt_len = 0;
+        }
+
+        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &conf);
         if (diag) {
             break;
         }
@@ -686,8 +697,6 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
 {
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
     int sid;
-    int err = 0;
-    uint32_t buf_size;
 
     ovs_mutex_init(&dev->mutex);
     ovs_mutex_lock(&dev->mutex);
@@ -707,15 +716,7 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
     dev->port_id = port_no;
     dev->type = type;
     dev->flags = 0;
-    dev->mtu = ETHER_MTU;
-    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
-
-    buf_size = dpdk_buf_size(dev->mtu);
-    dev->dpdk_mp = dpdk_mp_get(dev->socket_id, FRAME_LEN_TO_MTU(buf_size));
-    if (!dev->dpdk_mp) {
-        err = ENOMEM;
-        goto unlock;
-    }
+    dev->mtu = MTU_NOT_SET;
 
     /* Initialise QoS configuration to NULL and qos lock to unlocked */
     dev->qos_conf = NULL;
@@ -728,22 +729,14 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
 
     if (type == DPDK_DEV_ETH) {
         netdev_dpdk_alloc_txq(dev, NR_QUEUE);
-        err = dpdk_eth_dev_init(dev);
-        if (err) {
-            goto unlock;
-        }
     } else {
         netdev_dpdk_alloc_txq(dev, OVS_VHOST_MAX_QUEUE_NUM);
     }
 
     ovs_list_push_back(&dpdk_list, &dev->list_node);
 
-unlock:
-    if (err) {
-        rte_free(dev->tx_q);
-    }
     ovs_mutex_unlock(&dev->mutex);
-    return err;
+    return 0;
 }
 
 /* dev_name must be the prefix followed by a positive decimal number.
@@ -767,6 +760,31 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
     }
 }
 
+static void
+dpdk_dev_parse_mtu(const struct smap *args, int *mtu)
+{
+    const char *mtu_str = smap_get(args, "mtu_request");
+    char *end_ptr = NULL;
+    int local_mtu;
+
+    if (!mtu_str) {
+        local_mtu = ETHER_MTU;
+    } else {
+        local_mtu = strtoul(mtu_str, &end_ptr, 0);
+        if (local_mtu < ETHER_MTU ||
+            local_mtu > FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) ||
+            *end_ptr != '\0') {
+            local_mtu = ETHER_MTU;
+            VLOG_WARN("Invalid mtu_request parameter - defaulting to %d.\n",
+                    local_mtu);
+        } else {
+            VLOG_INFO("mtu_request parameter %d detected.\n", local_mtu);
+        }
+    }
+
+    *mtu = local_mtu;
+}
+
 static int
 vhost_construct_helper(struct netdev *netdev) OVS_REQUIRES(dpdk_mutex)
 {
@@ -913,15 +931,72 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
     smap_add_format(args, "configured_rx_queues", "%d", netdev->n_rxq);
     smap_add_format(args, "requested_tx_queues", "%d", netdev->n_txq);
     smap_add_format(args, "configured_tx_queues", "%d", dev->real_n_txq);
+    smap_add_format(args, "mtu", "%d", dev->mtu);
     ovs_mutex_unlock(&dev->mutex);
 
     return 0;
 }
 
+/* Set the mtu of DPDK_DEV_ETH ports */
+static int
+netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int err, dpdk_mtu;
+    uint32_t buf_size;
+    struct dpdk_mp *mp;
+
+    ovs_mutex_lock(&dpdk_mutex);
+    ovs_mutex_lock(&dev->mutex);
+    if (dev->mtu == mtu) {
+        err = 0;
+        goto out;
+    }
+
+    buf_size = dpdk_buf_size(mtu);
+    dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
+
+    mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
+    if (!mp) {
+        err = ENOMEM;
+        goto out;
+    }
+
+    rte_eth_dev_stop(dev->port_id);
+
+    dev->dpdk_mp = mp;
+    dev->mtu = mtu;
+    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
+
+    err = dpdk_eth_dev_init(dev);
+    if (err) {
+        VLOG_WARN("Unable to set MTU '%d' for '%s'; falling back to default "
+                  "MTU '%d'\n", mtu, dev->up.name, ETHER_MTU);
+        dpdk_mp_put(mp);
+        dev->mtu = ETHER_MTU;
+        mp = dpdk_mp_get(dev->socket_id, dev->mtu);
+        if (!mp) {
+            err = ENOMEM;
+            goto out;
+        }
+        dev->dpdk_mp = mp;
+        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
+        dpdk_eth_dev_init(dev);
+        goto out;
+    } else {
+        netdev_change_seq_changed(netdev);
+    }
+out:
+    ovs_mutex_unlock(&dev->mutex);
+    ovs_mutex_unlock(&dpdk_mutex);
+    return err;
+}
+
 static int
 netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
 {
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int mtu;
 
     ovs_mutex_lock(&dev->mutex);
     netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq",
@@ -929,6 +1004,14 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
     netdev_change_seq_changed(netdev);
     ovs_mutex_unlock(&dev->mutex);
 
+    dpdk_dev_parse_mtu(args, &mtu);
+
+    if (!dev->mtu) {
+        return netdev_dpdk_set_mtu(netdev, mtu);
+    } else if (mtu != dev->mtu) {
+        VLOG_WARN("Unable to set MTU %d for port %d; this port has immutable MTU "
+                  "%d\n", mtu, dev->port_id, dev->mtu);
+    }
     return 0;
 }
 
@@ -1580,57 +1663,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup)
 }
 
 static int
-netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
-{
-    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-    int old_mtu, err, dpdk_mtu;
-    struct dpdk_mp *old_mp;
-    struct dpdk_mp *mp;
-    uint32_t buf_size;
-
-    ovs_mutex_lock(&dpdk_mutex);
-    ovs_mutex_lock(&dev->mutex);
-    if (dev->mtu == mtu) {
-        err = 0;
-        goto out;
-    }
-
-    buf_size = dpdk_buf_size(mtu);
-    dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
-
-    mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
-    if (!mp) {
-        err = ENOMEM;
-        goto out;
-    }
-
-    rte_eth_dev_stop(dev->port_id);
-
-    old_mtu = dev->mtu;
-    old_mp = dev->dpdk_mp;
-    dev->dpdk_mp = mp;
-    dev->mtu = mtu;
-    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
-
-    err = dpdk_eth_dev_init(dev);
-    if (err) {
-        dpdk_mp_put(mp);
-        dev->mtu = old_mtu;
-        dev->dpdk_mp = old_mp;
-        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
-        dpdk_eth_dev_init(dev);
-        goto out;
-    }
-
-    dpdk_mp_put(old_mp);
-    netdev_change_seq_changed(netdev);
-out:
-    ovs_mutex_unlock(&dev->mutex);
-    ovs_mutex_unlock(&dpdk_mutex);
-    return err;
-}
-
-static int
 netdev_dpdk_get_carrier(const struct netdev *netdev, bool *carrier);
 
 static int
@@ -2276,6 +2307,61 @@ dpdk_vhost_user_class_init(void)
     return 0;
 }
 
+/* Set the mtu of DPDK_DEV_VHOST ports */
+static int
+netdev_dpdk_vhost_set_mtu(const struct netdev *netdev, int mtu)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int err = 0;
+    struct dpdk_mp *mp;
+
+    ovs_mutex_lock(&dpdk_mutex);
+    ovs_mutex_lock(&dev->mutex);
+    if (dev->mtu == mtu) {
+        err = 0;
+        goto out;
+    }
+
+    mp = dpdk_mp_get(dev->socket_id, mtu);
+    if (!mp) {
+        err = ENOMEM;
+        goto out;
+    }
+
+    dev->dpdk_mp = mp;
+    dev->mtu = mtu;
+    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
+
+    netdev_change_seq_changed(netdev);
+out:
+    ovs_mutex_unlock(&dev->mutex);
+    ovs_mutex_unlock(&dpdk_mutex);
+    return err;
+}
+
+static int
+netdev_dpdk_vhost_set_config(struct netdev *netdev, const struct smap *args)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int mtu;
+
+    ovs_mutex_lock(&dev->mutex);
+    netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq",
+                                               netdev->requested_n_rxq), 1);
+    netdev_change_seq_changed(netdev);
+    ovs_mutex_unlock(&dev->mutex);
+
+    dpdk_dev_parse_mtu(args, &mtu);
+
+    if (!dev->mtu) {
+        return netdev_dpdk_vhost_set_mtu(netdev, mtu);
+    } else if (mtu != dev->mtu) {
+        VLOG_WARN("Unable to set MTU %d for vhost port; this port has immutable MTU "
+                  "%d\n", mtu, dev->mtu);
+    }
+    return 0;
+}
+
 static void
 dpdk_common_init(void)
 {
@@ -2661,8 +2747,9 @@ static const struct dpdk_qos_ops egress_policer_ops = {
     egress_policer_run
 };
 
-#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, SEND, \
-    GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV)          \
+#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, SET_CONFIG, \
+        MULTIQ, SEND, SET_MTU, GET_CARRIER, GET_STATS, GET_FEATURES,   \
+        GET_STATUS, RXQ_RECV)                                          \
 {                                                             \
     NAME,                                                     \
     true,                       /* is_pmd */                  \
@@ -2675,7 +2762,7 @@ static const struct dpdk_qos_ops egress_policer_ops = {
     DESTRUCT,                                                 \
     netdev_dpdk_dealloc,                                      \
     netdev_dpdk_get_config,                                   \
-    netdev_dpdk_set_config,                                   \
+    SET_CONFIG            ,                                   \
     NULL,                       /* get_tunnel_config */       \
     NULL,                       /* build header */            \
     NULL,                       /* push header */             \
@@ -2689,7 +2776,7 @@ static const struct dpdk_qos_ops egress_policer_ops = {
     netdev_dpdk_set_etheraddr,                                \
     netdev_dpdk_get_etheraddr,                                \
     netdev_dpdk_get_mtu,                                      \
-    netdev_dpdk_set_mtu,                                      \
+    SET_MTU,                                                  \
     netdev_dpdk_get_ifindex,                                  \
     GET_CARRIER,                                              \
     netdev_dpdk_get_carrier_resets,                           \
@@ -2834,8 +2921,10 @@ static const struct netdev_class dpdk_class =
         NULL,
         netdev_dpdk_construct,
         netdev_dpdk_destruct,
+        netdev_dpdk_set_config,
         netdev_dpdk_set_multiq,
         netdev_dpdk_eth_send,
+        netdev_dpdk_set_mtu,
         netdev_dpdk_get_carrier,
         netdev_dpdk_get_stats,
         netdev_dpdk_get_features,
@@ -2848,8 +2937,10 @@ static const struct netdev_class dpdk_ring_class =
         NULL,
         netdev_dpdk_ring_construct,
         netdev_dpdk_destruct,
+        netdev_dpdk_set_config,
         netdev_dpdk_set_multiq,
         netdev_dpdk_ring_send,
+        netdev_dpdk_set_mtu,
         netdev_dpdk_get_carrier,
         netdev_dpdk_get_stats,
         netdev_dpdk_get_features,
@@ -2862,8 +2953,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_cuse_class =
         dpdk_vhost_cuse_class_init,
         netdev_dpdk_vhost_cuse_construct,
         netdev_dpdk_vhost_destruct,
+        netdev_dpdk_set_config,
         netdev_dpdk_vhost_cuse_set_multiq,
         netdev_dpdk_vhost_send,
+        NULL,
         netdev_dpdk_vhost_get_carrier,
         netdev_dpdk_vhost_get_stats,
         NULL,
@@ -2876,8 +2969,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_user_class =
         dpdk_vhost_user_class_init,
         netdev_dpdk_vhost_user_construct,
         netdev_dpdk_vhost_destruct,
+        netdev_dpdk_vhost_set_config,
         netdev_dpdk_vhost_set_multiq,
         netdev_dpdk_vhost_send,
+        netdev_dpdk_vhost_set_mtu,
         netdev_dpdk_vhost_get_carrier,
         netdev_dpdk_vhost_get_stats,
         NULL,
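
For readers following the MTU arithmetic in the documentation hunk and the range check in dpdk_dev_parse_mtu() above, here is a minimal standalone sketch. It is not part of the patch. It assumes an L2 overhead of 18B (14B Ethernet header plus 4B CRC, the figure quoted in the INSTALL.DPDK.md text) and reuses the 13312B maximum frame length from NETDEV_DPDK_MAX_FRAME_LEN. The names L2_OVERHEAD, MAX_FRAME_LEN, mtu_to_frame_len(), frame_len_to_mtu() and parse_mtu_request() are hypothetical stand-ins for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define ETHER_MTU      1500   /* standard Ethernet MTU */
#define L2_OVERHEAD    18     /* assumed: 14B Ethernet header + 4B CRC */
#define MAX_FRAME_LEN  13312  /* mirrors NETDEV_DPDK_MAX_FRAME_LEN above */

/* Frame length on the wire for a given MTU (IP packet length). */
static int
mtu_to_frame_len(int mtu)
{
    return mtu + L2_OVERHEAD;
}

/* Largest MTU that fits inside a frame of 'frame_len' bytes. */
static int
frame_len_to_mtu(int frame_len)
{
    assert(frame_len >= L2_OVERHEAD);
    return frame_len - L2_OVERHEAD;
}

/* Hypothetical helper: parse an mtu_request string.  Falls back to
 * ETHER_MTU when the string is absent, empty, has trailing characters,
 * or lies outside [ETHER_MTU, frame_len_to_mtu(MAX_FRAME_LEN)]. */
static int
parse_mtu_request(const char *mtu_str)
{
    char *end = NULL;
    long val;

    if (!mtu_str) {
        return ETHER_MTU;
    }

    errno = 0;
    val = strtol(mtu_str, &end, 0);
    if (errno || end == mtu_str || *end != '\0'
        || val < ETHER_MTU || val > frame_len_to_mtu(MAX_FRAME_LEN)) {
        return ETHER_MTU;
    }
    return (int) val;
}
```

With these assumptions, a 13312B frame corresponds to an MTU of 13294, which matches the ifconfig example in the documentation hunk, and a request of 9000 yields a 9018B frame.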

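The "set once, then immutable" behaviour that netdev_dpdk_set_config() and netdev_dpdk_vhost_set_config() implement above can be reduced to the following sketch. It is an illustration only, with no locking, mempool handling or device reinitialisation. struct port_sketch and request_mtu() are hypothetical names, and the warning text only approximates the VLOG_WARN message in the patch.

```c
#include <assert.h>
#include <stdio.h>

#define MTU_NOT_SET 0   /* same sentinel value as in the patch */

/* Hypothetical stand-in for the MTU-related part of struct netdev_dpdk. */
struct port_sketch {
    int mtu;
};

/* Apply 'mtu' only if the port has no MTU yet.
 * Returns 1 if the request was applied, 0 if it was ignored. */
static int
request_mtu(struct port_sketch *port, int mtu)
{
    assert(port != NULL);

    if (port->mtu == MTU_NOT_SET) {
        port->mtu = mtu;
        return 1;
    }

    if (mtu != port->mtu) {
        /* A later, different request is reported and dropped. */
        fprintf(stderr, "Unable to set MTU %d; this port has immutable "
                "MTU %d\n", mtu, port->mtu);
    }
    return 0;
}
```

In this model the first request on a freshly initialised port is applied, and every later request leaves the stored MTU unchanged, which is why the documentation hunk describes the MTU of a DPDK port as immutable once set.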
