[ovs-dev] [PATCH V5 2/2] netdev-dpdk: add jumbo frame support
Flavio Leitner
fbl at sysclose.org
Mon Feb 22 22:59:01 UTC 2016
On Fri, 19 Feb 2016 11:25:12 +0000
Mark Kavanagh <mark.b.kavanagh at intel.com> wrote:
> Add support for Jumbo Frames to DPDK-enabled port types,
> using single-segment-mbufs.
>
> Using this approach, the amount of memory allocated for each mbuf
> to store frame data is increased to a value greater than 1518B
> (typical Ethernet maximum frame length). The increased space
> available in the mbuf means that an entire Jumbo Frame can be carried
> in a single mbuf, as opposed to partitioning it across multiple mbuf
> segments.
>
> The amount of space allocated to each mbuf to hold frame data is
> defined dynamically by the user when adding a DPDK port to a bridge.
> If an MTU value is not supplied, or the user-supplied value is invalid,
> the MTU for the port defaults to standard Ethernet MTU (i.e. 1500B).
>
> Signed-off-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
> ---
> INSTALL.DPDK.md | 60 ++++++++++++-
> NEWS | 3 +-
> lib/netdev-dpdk.c | 248 +++++++++++++++++++++++++++++++++++++-----------------
> 3 files changed, 233 insertions(+), 78 deletions(-)
>
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index d892788..4ca98cb 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -878,10 +878,63 @@ by adding the following string:
> to <interface> sections of all network devices used by DPDK. Parameter 'N'
> determines how many queues can be used by the guest.
>
> +Jumbo Frames
> +------------
> +
> +Support for Jumbo Frames may be enabled at run-time for DPDK-type ports.
> +
> +To avail of Jumbo Frame support, add the 'mtu_request' option to the ovs-vsctl
> +'add-port' command-line, along with the required MTU for the port.
> +e.g.
> +
> + ```
> + ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:mtu_request=9000
> + ```
> +
> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are
> +increased, such that a full Jumbo Frame may be accommodated inside a single
> +mbuf segment. Once set, the MTU for a DPDK port is immutable.
> +
> +Note that from an OVSDB perspective, the `mtu_request` option for a specific
> +port may be disregarded once initially set, as subsequent modifications to this
> +field are disregarded by the DPDK port. As with non-DPDK ports, the MTU of DPDK
> +ports is reported by the `Interface` table's 'mtu' field.
> +
> +Jumbo frame support has been validated against 13312B frames, using the
> +DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers may
> +theoretically be supported. Supported port types excludes vHost-Cuse ports, as
> +that feature is pending deprecation.
> +
> +vHost Ports and Jumbo Frames
> +----------------------------
> +Jumbo frame support is available for DPDK vHost-User ports only. Some additional
> +configuration is needed to take advantage of this feature:
> +
> + 1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
> + the QEMU command line snippet below:
> +
> + ```
> + '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
> + '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
> + ```
> +
> + 2. Where virtio devices are bound to the Linux kernel driver in a guest
> + environment (i.e. interfaces are not bound to an in-guest DPDK driver), the
> + MTU of those logical network interfaces must also be increased. This
> + avoids segmentation of Jumbo Frames in the guest. Note that 'MTU' refers
> + to the length of the IP packet only, and not that of the entire frame.
> +
> + e.g. To calculate the exact MTU of a standard IPv4 frame, subtract the L2
> + header and CRC lengths (i.e. 18B) from the max supported frame size.
> + So, to set the MTU for a 13312B Jumbo Frame:
> +
> + ```
> + ifconfig eth1 mtu 13294
> + ```
> +
> Restrictions:
> -------------
>
> - - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue.
> - Currently DPDK port does not make use any offload functionality.
> - DPDK-vHost support works with 1G huge pages.
>
> @@ -922,6 +975,11 @@ Restrictions:
> the next release of DPDK (which includes the above patch) is available and
> integrated into OVS.
>
> + Jumbo Frames:
> + - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. This is a DPDK
> + issue that is currently being investigated.
> + - vHost-Cuse: Jumbo Frame support is not available for vHost Cuse ports.
> +
> Bug Reporting:
> --------------
>
> diff --git a/NEWS b/NEWS
> index 3e33871..43127f9 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -10,10 +10,11 @@ Post-v2.5.0
> - DPDK:
> * New option "n_rxq" for PMD interfaces.
> Old 'other_config:n-dpdk-rxqs' is no longer supported.
> + * Support Jumbo Frames
> +
> - ovs-benchmark: This utility has been removed due to lack of use and
> bitrot.
>
> -
> v2.5.0 - xx xxx xxxx
> ---------------------
> - Dropped support for Python older than version 2.7. As a consequence,
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 2a06bb5..ac89ee6 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -77,6 +77,8 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> + sizeof(struct dp_packet) \
> + RTE_PKTMBUF_HEADROOM)
> #define NETDEV_DPDK_MBUF_ALIGN 1024
> +#define NETDEV_DPDK_MAX_FRAME_LEN 13312
> +#define MTU_NOT_SET 0
>
> /* Max and min number of packets in the mempool. OVS tries to allocate a
> * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have
> @@ -422,6 +424,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
> {
> int diag = 0;
> int i;
> + struct rte_eth_conf conf = port_conf;
>
> /* A device may report more queues than it makes available (this has
> * been observed for Intel xl710, which reserves some of them for
> @@ -433,7 +436,15 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
> VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq);
> }
>
> - diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &port_conf);
> + if (dev->mtu > ETHER_MTU) {
> + conf.rxmode.jumbo_frame = 1;
> + conf.rxmode.max_rx_pkt_len = dev->max_packet_len;
> + } else {
> + conf.rxmode.jumbo_frame = 0;
> + conf.rxmode.max_rx_pkt_len = 0;
> + }
> +
> + diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &conf);
> if (diag) {
> break;
> }
> @@ -574,8 +585,6 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
> {
> struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> int sid;
> - int err = 0;
> - uint32_t buf_size;
>
> ovs_mutex_init(&netdev->mutex);
> ovs_mutex_lock(&netdev->mutex);
> @@ -595,15 +604,7 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
> netdev->port_id = port_no;
> netdev->type = type;
> netdev->flags = 0;
> - netdev->mtu = ETHER_MTU;
> - netdev->max_packet_len = MTU_TO_FRAME_LEN(netdev->mtu);
> -
> - buf_size = dpdk_buf_size(netdev->mtu);
> - netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, FRAME_LEN_TO_MTU(buf_size));
> - if (!netdev->dpdk_mp) {
> - err = ENOMEM;
> - goto unlock;
> - }
> + netdev->mtu = MTU_NOT_SET;
>
> netdev_->n_txq = NR_QUEUE;
> netdev_->n_rxq = NR_QUEUE;
> @@ -612,20 +613,12 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
>
> if (type == DPDK_DEV_ETH) {
> netdev_dpdk_alloc_txq(netdev, NR_QUEUE);
> - err = dpdk_eth_dev_init(netdev);
> - if (err) {
> - goto unlock;
> - }
> }
>
> list_push_back(&dpdk_list, &netdev->list_node);
>
> -unlock:
> - if (err) {
> - rte_free(netdev->tx_q);
> - }
> ovs_mutex_unlock(&netdev->mutex);
> - return err;
> + return 0;
> }
>
> static int
> @@ -643,6 +636,31 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
> return 0;
> }
>
> +static void
> +dpdk_dev_parse_mtu(const struct smap *args, int *mtu)
> +{
> + const char *mtu_str = smap_get(args, "mtu_request");
> + char *end_ptr = NULL;
> + int local_mtu;
> +
> + if (!mtu_str) {
> + local_mtu = ETHER_MTU;
> + } else {
> + local_mtu = strtoul(mtu_str, &end_ptr, 0);
> + if (local_mtu < ETHER_MTU ||
> + local_mtu > FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) ||
> + *end_ptr != '\0') {
> + local_mtu = ETHER_MTU;
> + VLOG_WARN("Invalid mtu_request parameter - defaulting to %d.\n",
> + local_mtu);
> + } else {
> + VLOG_INFO("mtu_request parameter %d detected.\n", local_mtu);
That message could be VLOG_DBG because it only tells about a parameter being
detected and not if the mtu request succeed. Other than that the patch looks
good to me.
If no one else has comments to address, maybe a commiter can fix the message
level to avoid re-spinning the patchset.
Thanks for the patch!
Acked-by: Flavio Leitner <fbl at sysclose.org>
--
fbl
> + }
> + }
> +
> + *mtu = local_mtu;
> +}
> +
> static int
> vhost_construct_helper(struct netdev *netdev_) OVS_REQUIRES(dpdk_mutex)
> {
> @@ -773,15 +791,72 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
> smap_add_format(args, "configured_rx_queues", "%d", netdev->n_rxq);
> smap_add_format(args, "requested_tx_queues", "%d", netdev->n_txq);
> smap_add_format(args, "configured_tx_queues", "%d", dev->real_n_txq);
> + smap_add_format(args, "mtu", "%d", dev->mtu);
> ovs_mutex_unlock(&dev->mutex);
>
> return 0;
> }
>
> +/* Set the mtu of DPDK_DEV_ETH ports */
> +static int
> +netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
> +{
> + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> + int err, dpdk_mtu;
> + uint32_t buf_size;
> + struct dpdk_mp *mp;
> +
> + ovs_mutex_lock(&dpdk_mutex);
> + ovs_mutex_lock(&dev->mutex);
> + if (dev->mtu == mtu) {
> + err = 0;
> + goto out;
> + }
> +
> + buf_size = dpdk_buf_size(mtu);
> + dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
> +
> + mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
> + if (!mp) {
> + err = ENOMEM;
> + goto out;
> + }
> +
> + rte_eth_dev_stop(dev->port_id);
> +
> + dev->dpdk_mp = mp;
> + dev->mtu = mtu;
> + dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> +
> + err = dpdk_eth_dev_init(dev);
> + if (err) {
> + VLOG_WARN("Unable to set MTU '%d' for '%s'; falling back to default "
> + "MTU '%d'\n", mtu, dev->up.name, ETHER_MTU);
> + dpdk_mp_put(mp);
> + dev->mtu = ETHER_MTU;
> + mp = dpdk_mp_get(dev->socket_id, dev->mtu);
> + if (!mp) {
> + err = ENOMEM;
> + goto out;
> + }
> + dev->dpdk_mp = mp;
> + dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> + dpdk_eth_dev_init(dev);
> + goto out;
> + } else {
> + netdev_change_seq_changed(netdev);
> + }
> +out:
> + ovs_mutex_unlock(&dev->mutex);
> + ovs_mutex_unlock(&dpdk_mutex);
> + return err;
> +}
> +
> static int
> netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
> {
> struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> + int mtu;
>
> ovs_mutex_lock(&dev->mutex);
> netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq",
> @@ -789,6 +864,14 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
> netdev_change_seq_changed(netdev);
> ovs_mutex_unlock(&dev->mutex);
>
> + dpdk_dev_parse_mtu(args, &mtu);
> +
> + if (!dev->mtu) {
> + return netdev_dpdk_set_mtu(netdev, mtu);
> + } else if (mtu != dev->mtu) {
> + VLOG_WARN("Unable to set MTU %d for port %d; this port has immutable MTU "
> + "%d\n", mtu, dev->port_id, dev->mtu);
> + }
> return 0;
> }
>
> @@ -1407,57 +1490,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup)
> }
>
> static int
> -netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
> -{
> - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> - int old_mtu, err, dpdk_mtu;
> - struct dpdk_mp *old_mp;
> - struct dpdk_mp *mp;
> - uint32_t buf_size;
> -
> - ovs_mutex_lock(&dpdk_mutex);
> - ovs_mutex_lock(&dev->mutex);
> - if (dev->mtu == mtu) {
> - err = 0;
> - goto out;
> - }
> -
> - buf_size = dpdk_buf_size(mtu);
> - dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
> -
> - mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
> - if (!mp) {
> - err = ENOMEM;
> - goto out;
> - }
> -
> - rte_eth_dev_stop(dev->port_id);
> -
> - old_mtu = dev->mtu;
> - old_mp = dev->dpdk_mp;
> - dev->dpdk_mp = mp;
> - dev->mtu = mtu;
> - dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> -
> - err = dpdk_eth_dev_init(dev);
> - if (err) {
> - dpdk_mp_put(mp);
> - dev->mtu = old_mtu;
> - dev->dpdk_mp = old_mp;
> - dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> - dpdk_eth_dev_init(dev);
> - goto out;
> - }
> -
> - dpdk_mp_put(old_mp);
> - netdev_change_seq_changed(netdev);
> -out:
> - ovs_mutex_unlock(&dev->mutex);
> - ovs_mutex_unlock(&dpdk_mutex);
> - return err;
> -}
> -
> -static int
> netdev_dpdk_get_carrier(const struct netdev *netdev_, bool *carrier);
>
> static int
> @@ -2000,6 +2032,61 @@ dpdk_vhost_user_class_init(void)
> return 0;
> }
>
> +/* Set the mtu of DPDK_DEV_VHOST ports */
> +static int
> +netdev_dpdk_vhost_set_mtu(const struct netdev *netdev, int mtu)
> +{
> + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> + int err = 0;
> + struct dpdk_mp *mp;
> +
> + ovs_mutex_lock(&dpdk_mutex);
> + ovs_mutex_lock(&dev->mutex);
> + if (dev->mtu == mtu) {
> + err = 0;
> + goto out;
> + }
> +
> + mp = dpdk_mp_get(dev->socket_id, mtu);
> + if (!mp) {
> + err = ENOMEM;
> + goto out;
> + }
> +
> + dev->dpdk_mp = mp;
> + dev->mtu = mtu;
> + dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> +
> + netdev_change_seq_changed(netdev);
> +out:
> + ovs_mutex_unlock(&dev->mutex);
> + ovs_mutex_unlock(&dpdk_mutex);
> + return err;
> +}
> +
> +static int
> +netdev_dpdk_vhost_set_config(struct netdev *netdev, const struct smap *args)
> +{
> + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> + int mtu;
> +
> + ovs_mutex_lock(&dev->mutex);
> + netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq",
> + netdev->requested_n_rxq), 1);
> + netdev_change_seq_changed(netdev);
> + ovs_mutex_unlock(&dev->mutex);
> +
> + dpdk_dev_parse_mtu(args, &mtu);
> +
> + if (!dev->mtu) {
> + return netdev_dpdk_vhost_set_mtu(netdev, mtu);
> + } else if (mtu != dev->mtu) {
> + VLOG_WARN("Unable to set MTU %d for vhost port; this port has immutable MTU "
> + "%d\n", mtu, dev->mtu);
> + }
> + return 0;
> +}
> +
> static void
> dpdk_common_init(void)
> {
> @@ -2136,8 +2223,9 @@ unlock_dpdk:
> return err;
> }
>
> -#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, SEND, \
> - GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV) \
> +#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, SET_CONFIG, \
> + MULTIQ, SEND, SET_MTU, GET_CARRIER, GET_STATS, GET_FEATURES, \
> + GET_STATUS, RXQ_RECV) \
> { \
> NAME, \
> INIT, /* init */ \
> @@ -2149,7 +2237,7 @@ unlock_dpdk:
> DESTRUCT, \
> netdev_dpdk_dealloc, \
> netdev_dpdk_get_config, \
> - netdev_dpdk_set_config, \
> + SET_CONFIG , \
> NULL, /* get_tunnel_config */ \
> NULL, /* build header */ \
> NULL, /* push header */ \
> @@ -2163,7 +2251,7 @@ unlock_dpdk:
> netdev_dpdk_set_etheraddr, \
> netdev_dpdk_get_etheraddr, \
> netdev_dpdk_get_mtu, \
> - netdev_dpdk_set_mtu, \
> + SET_MTU, \
> netdev_dpdk_get_ifindex, \
> GET_CARRIER, \
> netdev_dpdk_get_carrier_resets, \
> @@ -2309,8 +2397,10 @@ static const struct netdev_class dpdk_class =
> NULL,
> netdev_dpdk_construct,
> netdev_dpdk_destruct,
> + netdev_dpdk_set_config,
> netdev_dpdk_set_multiq,
> netdev_dpdk_eth_send,
> + netdev_dpdk_set_mtu,
> netdev_dpdk_get_carrier,
> netdev_dpdk_get_stats,
> netdev_dpdk_get_features,
> @@ -2323,8 +2413,10 @@ static const struct netdev_class dpdk_ring_class =
> NULL,
> netdev_dpdk_ring_construct,
> netdev_dpdk_destruct,
> + netdev_dpdk_set_config,
> netdev_dpdk_set_multiq,
> netdev_dpdk_ring_send,
> + netdev_dpdk_set_mtu,
> netdev_dpdk_get_carrier,
> netdev_dpdk_get_stats,
> netdev_dpdk_get_features,
> @@ -2337,8 +2429,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_cuse_class =
> dpdk_vhost_cuse_class_init,
> netdev_dpdk_vhost_cuse_construct,
> netdev_dpdk_vhost_destruct,
> + netdev_dpdk_set_config,
> netdev_dpdk_vhost_cuse_set_multiq,
> netdev_dpdk_vhost_send,
> + NULL,
> netdev_dpdk_vhost_get_carrier,
> netdev_dpdk_vhost_get_stats,
> NULL,
> @@ -2351,8 +2445,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_user_class =
> dpdk_vhost_user_class_init,
> netdev_dpdk_vhost_user_construct,
> netdev_dpdk_vhost_destruct,
> + netdev_dpdk_vhost_set_config,
> netdev_dpdk_vhost_set_multiq,
> netdev_dpdk_vhost_send,
> + netdev_dpdk_vhost_set_mtu,
> netdev_dpdk_vhost_get_carrier,
> netdev_dpdk_vhost_get_stats,
> NULL,
More information about the dev
mailing list