[ovs-discuss] OVS jumbo frame support for DPDK port types
Kavanagh, Mark B
mark.b.kavanagh at intel.com
Thu May 12 09:40:07 UTC 2016
>Hi,
>
>I am working on a lab environment with OVS and DPDK attempting to get Jumbo frames and mbuf
>initialisation working, for DPDK port types only, with the current OVS master (testing with
>commit hash of 5d24608388bcf5610018cb51369adc2e6f3816e1) and the DPDK 16.04 release.
>
>I have come across the development of supporting Jumbo frames with OVS with these patches:
>
>[ovs-dev,V5,1/2] netdev-dpdk: clean up mbuf initialization -
>https://patchwork.ozlabs.org/patch/585153/
>[ovs-dev,V5,2/2] netdev-dpdk: add jumbo frame support -
>https://patchwork.ozlabs.org/patch/585154/
>
>Part 1 of the patch set is now part of the mainline code. As I understand, patch 2 will not
>be upstreamed as it stands and is waiting upon further patchsets to allow runtime
>modification of netdev properties, such as MTU, as discussed here
>http://openvswitch.org/pipermail/dev/2016-February/066940.html
>
>I would be interested in testing jumbo frame support with the current OVS master and DPDK
>versions mentioned, however patch 2/2 fails to apply successfully with the latest OVS code.
>Is there a later version of the jumbo patch that can be shared with the current OVS code?
>
>Thanks, Jim
Hi Jim,
I posted the rebased patch as an RFC to ovs-dev: http://openvswitch.org/pipermail/dev/2016-May/070892.html
Please note, though, that I've only resolved the compilation issues, and haven't tested the jumbo frame functionality itself.
Hope this helps.
Best regards,
Mark
_____________________
diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 7f76df8..9b83c78 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -913,10 +913,63 @@ by adding the following string:
to <interface> sections of all network devices used by DPDK. Parameter 'N'
determines how many queues can be used by the guest.
+Jumbo Frames
+------------
+
+Support for Jumbo Frames may be enabled at run-time for DPDK-type ports.
+
+To avail of Jumbo Frame support, add the 'mtu_request' option to the ovs-vsctl
+'add-port' command-line, along with the required MTU for the port.
+e.g.
+
+ ```
+ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:mtu_request=9000
+ ```
+
+When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments is
+increased, such that a full Jumbo Frame may be accommodated inside a single
+mbuf segment. Once set, the MTU for a DPDK port is immutable.
+
+Note that from an OVSDB perspective, the `mtu_request` option for a specific
+port may be disregarded once initially set, as subsequent modifications to this
+field are disregarded by the DPDK port. As with non-DPDK ports, the MTU of DPDK
+ports is reported by the `Interface` table's 'mtu' field.
+
+Jumbo frame support has been validated against 13312B frames, using the
+DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers may
+theoretically be supported. Supported port types exclude vHost-Cuse ports, as
+that feature is pending deprecation.
+
+vHost Ports and Jumbo Frames
+----------------------------
+Jumbo frame support is available for DPDK vHost-User ports only. Some additional
+configuration is needed to take advantage of this feature:
+
+ 1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
+ the QEMU command line snippet below:
+
+ ```
+ '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
+ '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
+ ```
+
+ 2. Where virtio devices are bound to the Linux kernel driver in a guest
+ environment (i.e. interfaces are not bound to an in-guest DPDK driver), the
+ MTU of those logical network interfaces must also be increased. This
+ avoids segmentation of Jumbo Frames in the guest. Note that 'MTU' refers
+ to the length of the IP packet only, and not that of the entire frame.
+
+ e.g. To calculate the exact MTU of a standard IPv4 frame, subtract the L2
+ header and CRC lengths (i.e. 18B) from the max supported frame size.
+ So, to set the MTU for a 13312B Jumbo Frame:
+
+ ```
+ ifconfig eth1 mtu 13294
+ ```
+
Restrictions:
-------------
- - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue.
- Currently DPDK port does not make use any offload functionality.
- DPDK-vHost support works with 1G huge pages.
@@ -945,6 +998,11 @@ Restrictions:
increased to the desired number of queues. Both DPDK and OVS must be
recompiled for this change to take effect.
+ Jumbo Frames:
+ - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. This is a DPDK
+ issue that is currently being investigated.
+ - vHost-Cuse: Jumbo Frame support is not available for vHost Cuse ports.
+
Bug Reporting:
--------------
diff --git a/NEWS b/NEWS
index ea7f3a1..4bc0371 100644
--- a/NEWS
+++ b/NEWS
@@ -26,6 +26,7 @@ Post-v2.5.0
assignment.
* Type of log messages from PMD threads changed from INFO to DBG.
* QoS functionality with sample egress-policer implementation.
+ * Support Jumbo Frames
- ovs-benchmark: This utility has been removed due to lack of use and
bitrot.
- ovs-appctl:
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 208c5f5..98e8c3a 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -79,6 +79,8 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+ sizeof(struct dp_packet) \
+ RTE_PKTMBUF_HEADROOM)
#define NETDEV_DPDK_MBUF_ALIGN 1024
+#define NETDEV_DPDK_MAX_FRAME_LEN 13312
+#define MTU_NOT_SET 0
/* Max and min number of packets in the mempool. OVS tries to allocate a
* mempool with MAX_NB_MBUF: if this fails (because the system doesn't have
@@ -531,6 +533,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
{
int diag = 0;
int i;
+ struct rte_eth_conf conf = port_conf;
/* A device may report more queues than it makes available (this has
* been observed for Intel xl710, which reserves some of them for
@@ -542,7 +545,15 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq);
}
- diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &port_conf);
+ if (dev->mtu > ETHER_MTU) {
+ conf.rxmode.jumbo_frame = 1;
+ conf.rxmode.max_rx_pkt_len = dev->max_packet_len;
+ } else {
+ conf.rxmode.jumbo_frame = 0;
+ conf.rxmode.max_rx_pkt_len = 0;
+ }
+
+ diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &conf);
if (diag) {
break;
}
@@ -686,8 +697,6 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
{
struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
int sid;
- int err = 0;
- uint32_t buf_size;
ovs_mutex_init(&dev->mutex);
ovs_mutex_lock(&dev->mutex);
@@ -707,15 +716,7 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
dev->port_id = port_no;
dev->type = type;
dev->flags = 0;
- dev->mtu = ETHER_MTU;
- dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
-
- buf_size = dpdk_buf_size(dev->mtu);
- dev->dpdk_mp = dpdk_mp_get(dev->socket_id, FRAME_LEN_TO_MTU(buf_size));
- if (!dev->dpdk_mp) {
- err = ENOMEM;
- goto unlock;
- }
+ dev->mtu = MTU_NOT_SET;
/* Initialise QoS configuration to NULL and qos lock to unlocked */
dev->qos_conf = NULL;
@@ -728,22 +729,14 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
if (type == DPDK_DEV_ETH) {
netdev_dpdk_alloc_txq(dev, NR_QUEUE);
- err = dpdk_eth_dev_init(dev);
- if (err) {
- goto unlock;
- }
} else {
netdev_dpdk_alloc_txq(dev, OVS_VHOST_MAX_QUEUE_NUM);
}
ovs_list_push_back(&dpdk_list, &dev->list_node);
-unlock:
- if (err) {
- rte_free(dev->tx_q);
- }
ovs_mutex_unlock(&dev->mutex);
- return err;
+ return 0;
}
/* dev_name must be the prefix followed by a positive decimal number.
@@ -767,6 +760,31 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
}
}
+static void
+dpdk_dev_parse_mtu(const struct smap *args, int *mtu)
+{
+ const char *mtu_str = smap_get(args, "mtu_request");
+ char *end_ptr = NULL;
+ int local_mtu;
+
+ if (!mtu_str) {
+ local_mtu = ETHER_MTU;
+ } else {
+ local_mtu = strtoul(mtu_str, &end_ptr, 0);
+ if (local_mtu < ETHER_MTU ||
+ local_mtu > FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) ||
+ *end_ptr != '\0') {
+ local_mtu = ETHER_MTU;
+ VLOG_WARN("Invalid mtu_request parameter - defaulting to %d.\n",
+ local_mtu);
+ } else {
+ VLOG_INFO("mtu_request parameter %d detected.\n", local_mtu);
+ }
+ }
+
+ *mtu = local_mtu;
+}
+
static int
vhost_construct_helper(struct netdev *netdev) OVS_REQUIRES(dpdk_mutex)
{
@@ -913,15 +931,72 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
smap_add_format(args, "configured_rx_queues", "%d", netdev->n_rxq);
smap_add_format(args, "requested_tx_queues", "%d", netdev->n_txq);
smap_add_format(args, "configured_tx_queues", "%d", dev->real_n_txq);
+ smap_add_format(args, "mtu", "%d", dev->mtu);
ovs_mutex_unlock(&dev->mutex);
return 0;
}
+/* Set the mtu of DPDK_DEV_ETH ports */
+static int
+netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
+{
+ struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+ int err, dpdk_mtu;
+ uint32_t buf_size;
+ struct dpdk_mp *mp;
+
+ ovs_mutex_lock(&dpdk_mutex);
+ ovs_mutex_lock(&dev->mutex);
+ if (dev->mtu == mtu) {
+ err = 0;
+ goto out;
+ }
+
+ buf_size = dpdk_buf_size(mtu);
+ dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
+
+ mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
+ if (!mp) {
+ err = ENOMEM;
+ goto out;
+ }
+
+ rte_eth_dev_stop(dev->port_id);
+
+ dev->dpdk_mp = mp;
+ dev->mtu = mtu;
+ dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
+
+ err = dpdk_eth_dev_init(dev);
+ if (err) {
+ VLOG_WARN("Unable to set MTU '%d' for '%s'; falling back to default "
+ "MTU '%d'\n", mtu, dev->up.name, ETHER_MTU);
+ dpdk_mp_put(mp);
+ dev->mtu = ETHER_MTU;
+ mp = dpdk_mp_get(dev->socket_id, dev->mtu);
+ if (!mp) {
+ err = ENOMEM;
+ goto out;
+ }
+ dev->dpdk_mp = mp;
+ dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
+ dpdk_eth_dev_init(dev);
+ goto out;
+ } else {
+ netdev_change_seq_changed(netdev);
+ }
+out:
+ ovs_mutex_unlock(&dev->mutex);
+ ovs_mutex_unlock(&dpdk_mutex);
+ return err;
+}
+
static int
netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
{
struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+ int mtu;
ovs_mutex_lock(&dev->mutex);
netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq",
@@ -929,6 +1004,14 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
netdev_change_seq_changed(netdev);
ovs_mutex_unlock(&dev->mutex);
+ dpdk_dev_parse_mtu(args, &mtu);
+
+ if (!dev->mtu) {
+ return netdev_dpdk_set_mtu(netdev, mtu);
+ } else if (mtu != dev->mtu) {
+ VLOG_WARN("Unable to set MTU %d for port %d; this port has immutable MTU "
+ "%d\n", mtu, dev->port_id, dev->mtu);
+ }
return 0;
}
@@ -1580,57 +1663,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup)
}
static int
-netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
-{
- struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
- int old_mtu, err, dpdk_mtu;
- struct dpdk_mp *old_mp;
- struct dpdk_mp *mp;
- uint32_t buf_size;
-
- ovs_mutex_lock(&dpdk_mutex);
- ovs_mutex_lock(&dev->mutex);
- if (dev->mtu == mtu) {
- err = 0;
- goto out;
- }
-
- buf_size = dpdk_buf_size(mtu);
- dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
-
- mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
- if (!mp) {
- err = ENOMEM;
- goto out;
- }
-
- rte_eth_dev_stop(dev->port_id);
-
- old_mtu = dev->mtu;
- old_mp = dev->dpdk_mp;
- dev->dpdk_mp = mp;
- dev->mtu = mtu;
- dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
-
- err = dpdk_eth_dev_init(dev);
- if (err) {
- dpdk_mp_put(mp);
- dev->mtu = old_mtu;
- dev->dpdk_mp = old_mp;
- dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
- dpdk_eth_dev_init(dev);
- goto out;
- }
-
- dpdk_mp_put(old_mp);
- netdev_change_seq_changed(netdev);
-out:
- ovs_mutex_unlock(&dev->mutex);
- ovs_mutex_unlock(&dpdk_mutex);
- return err;
-}
-
-static int
netdev_dpdk_get_carrier(const struct netdev *netdev, bool *carrier);
static int
@@ -2276,6 +2307,61 @@ dpdk_vhost_user_class_init(void)
return 0;
}
+/* Set the mtu of DPDK_DEV_VHOST ports */
+static int
+netdev_dpdk_vhost_set_mtu(const struct netdev *netdev, int mtu)
+{
+ struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+ int err = 0;
+ struct dpdk_mp *mp;
+
+ ovs_mutex_lock(&dpdk_mutex);
+ ovs_mutex_lock(&dev->mutex);
+ if (dev->mtu == mtu) {
+ err = 0;
+ goto out;
+ }
+
+ mp = dpdk_mp_get(dev->socket_id, mtu);
+ if (!mp) {
+ err = ENOMEM;
+ goto out;
+ }
+
+ dev->dpdk_mp = mp;
+ dev->mtu = mtu;
+ dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
+
+ netdev_change_seq_changed(netdev);
+out:
+ ovs_mutex_unlock(&dev->mutex);
+ ovs_mutex_unlock(&dpdk_mutex);
+ return err;
+}
+
+static int
+netdev_dpdk_vhost_set_config(struct netdev *netdev, const struct smap *args)
+{
+ struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+ int mtu;
+
+ ovs_mutex_lock(&dev->mutex);
+ netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq",
+ netdev->requested_n_rxq), 1);
+ netdev_change_seq_changed(netdev);
+ ovs_mutex_unlock(&dev->mutex);
+
+ dpdk_dev_parse_mtu(args, &mtu);
+
+ if (!dev->mtu) {
+ return netdev_dpdk_vhost_set_mtu(netdev, mtu);
+ } else if (mtu != dev->mtu) {
+ VLOG_WARN("Unable to set MTU %d for vhost port; this port has immutable MTU "
+ "%d\n", mtu, dev->mtu);
+ }
+ return 0;
+}
+
static void
dpdk_common_init(void)
{
@@ -2661,8 +2747,9 @@ static const struct dpdk_qos_ops egress_policer_ops = {
egress_policer_run
};
-#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, SEND, \
- GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV) \
+#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, SET_CONFIG, \
+ MULTIQ, SEND, SET_MTU, GET_CARRIER, GET_STATS, GET_FEATURES, \
+ GET_STATUS, RXQ_RECV) \
{ \
NAME, \
true, /* is_pmd */ \
@@ -2675,7 +2762,7 @@ static const struct dpdk_qos_ops egress_policer_ops = {
DESTRUCT, \
netdev_dpdk_dealloc, \
netdev_dpdk_get_config, \
- netdev_dpdk_set_config, \
+ SET_CONFIG , \
NULL, /* get_tunnel_config */ \
NULL, /* build header */ \
NULL, /* push header */ \
@@ -2689,7 +2776,7 @@ static const struct dpdk_qos_ops egress_policer_ops = {
netdev_dpdk_set_etheraddr, \
netdev_dpdk_get_etheraddr, \
netdev_dpdk_get_mtu, \
- netdev_dpdk_set_mtu, \
+ SET_MTU, \
netdev_dpdk_get_ifindex, \
GET_CARRIER, \
netdev_dpdk_get_carrier_resets, \
@@ -2834,8 +2921,10 @@ static const struct netdev_class dpdk_class =
NULL,
netdev_dpdk_construct,
netdev_dpdk_destruct,
+ netdev_dpdk_set_config,
netdev_dpdk_set_multiq,
netdev_dpdk_eth_send,
+ netdev_dpdk_set_mtu,
netdev_dpdk_get_carrier,
netdev_dpdk_get_stats,
netdev_dpdk_get_features,
@@ -2848,8 +2937,10 @@ static const struct netdev_class dpdk_ring_class =
NULL,
netdev_dpdk_ring_construct,
netdev_dpdk_destruct,
+ netdev_dpdk_set_config,
netdev_dpdk_set_multiq,
netdev_dpdk_ring_send,
+ netdev_dpdk_set_mtu,
netdev_dpdk_get_carrier,
netdev_dpdk_get_stats,
netdev_dpdk_get_features,
@@ -2862,8 +2953,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_cuse_class =
dpdk_vhost_cuse_class_init,
netdev_dpdk_vhost_cuse_construct,
netdev_dpdk_vhost_destruct,
+ netdev_dpdk_set_config,
netdev_dpdk_vhost_cuse_set_multiq,
netdev_dpdk_vhost_send,
+ NULL,
netdev_dpdk_vhost_get_carrier,
netdev_dpdk_vhost_get_stats,
NULL,
@@ -2876,8 +2969,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_user_class =
dpdk_vhost_user_class_init,
netdev_dpdk_vhost_user_construct,
netdev_dpdk_vhost_destruct,
+ netdev_dpdk_vhost_set_config,
netdev_dpdk_vhost_set_multiq,
netdev_dpdk_vhost_send,
+ netdev_dpdk_vhost_set_mtu,
netdev_dpdk_vhost_get_carrier,
netdev_dpdk_vhost_get_stats,
NULL,
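
For anyone testing the patch, the MTU arithmetic and the range check applied to `mtu_request` can be sketched in isolation as below. This is a minimal standalone illustration, not code from the patch: every `sketch_`/`SKETCH_` name is hypothetical, the constants are redefined locally (14B Ethernet header, 4B CRC, 1500B default MTU, 13312B maximum frame length as in the patch), and it parses with `strtol` in base 10, whereas the patch uses `strtoul` with base 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the values used by the patch, redefined here
 * so that the sketch compiles on its own. */
#define SKETCH_ETHER_HDR_LEN 14     /* dst MAC + src MAC + EtherType */
#define SKETCH_ETHER_CRC_LEN 4      /* frame check sequence */
#define SKETCH_ETHER_MTU 1500       /* default MTU */
#define SKETCH_MAX_FRAME_LEN 13312  /* largest frame validated in the patch */

/* Frame length on the wire for a given MTU (no VLAN tag assumed). */
int
sketch_mtu_to_frame_len(int mtu)
{
    return mtu + SKETCH_ETHER_HDR_LEN + SKETCH_ETHER_CRC_LEN;
}

/* MTU for a given frame length: subtract the 18B of L2 header and CRC. */
int
sketch_frame_len_to_mtu(int frame_len)
{
    assert(frame_len >= SKETCH_ETHER_HDR_LEN + SKETCH_ETHER_CRC_LEN);
    return frame_len - SKETCH_ETHER_HDR_LEN - SKETCH_ETHER_CRC_LEN;
}

/* Returns the MTU to use for an mtu_request string.  A missing (NULL),
 * malformed, or out-of-range request falls back to the default MTU. */
int
sketch_parse_mtu_request(const char *mtu_str)
{
    char *end = NULL;
    long value;

    if (!mtu_str) {
        return SKETCH_ETHER_MTU;
    }

    errno = 0;
    value = strtol(mtu_str, &end, 10);
    if (errno || end == mtu_str || *end != '\0'
        || value < SKETCH_ETHER_MTU
        || value > sketch_frame_len_to_mtu(SKETCH_MAX_FRAME_LEN)) {
        return SKETCH_ETHER_MTU;
    }

    return (int) value;
}
```

With these definitions, `sketch_frame_len_to_mtu(13312)` gives 13294, which matches the `ifconfig eth1 mtu 13294` example in the documentation hunk, and a request above that value falls back to 1500.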