[ovs-dev] [PATCH V3] netdev-dpdk: Add Jumbo Frame Support.
Kavanagh, Mark B
mark.b.kavanagh at intel.com
Mon Feb 1 16:36:28 UTC 2016
Hi Zoltan. Thanks for your mail - responses inline below.
Cheers,
Mark
>
>Hi,
>
>I have a related question: is it a possible alternative with the current
>codebase to use KNI for configuring the MTU?
This would (as far as I'm aware) require the implementation of a new KNI netdev class, along with a handler/set of handlers in order to set the MTU (for that netdev class only), based on requests from ifconfig/ip link/etc.
It wouldn't make a difference as far as IVSHMEM/Phy ports are concerned, so I don't think this approach is workable. If I've missed something though, please do let me know.
>I never actually tried KNI, does it have a performance penalty to just
>have it?
I haven't used it in some time, but the fact that it passes packets to/accepts packets from the Linux Network Stack means that it's subject to all of the inherent performance bottlenecks that you'd expect.
>The reason I'm asking these questions because OVS normally doesn't
>configure the MTU, but it is outside of its scope. If there is a
>reasonable alternative to do that for DPDK (like with KNI), we don't
>need this change.
>Sorry for the noise if this idea have been discussed previously.
>
>Regards,
>
>Zoltan
>
>On 01/02/16 10:18, Mark Kavanagh wrote:
>> Add support for Jumbo Frames to DPDK-enabled port types,
>> using single-segment-mbufs.
>>
>> Using this approach, the amount of memory allocated for each mbuf
>> to store frame data is increased to a value greater than 1518B
>> (typical Ethernet maximum frame length). The increased space
>> available in the mbuf means that an entire Jumbo Frame can be carried
>> in a single mbuf, as opposed to partitioning it across multiple mbuf
>> segments.
>>
>> The amount of space allocated to each mbuf to hold frame data is
>> defined dynamically by the user when adding a DPDK port to a bridge.
>> If an MTU value is not supplied, or the user-supplied value is invalid,
>> the MTU for the port defaults to standard Ethernet MTU (i.e. 1500B).
>>
>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
>> ---
>> INSTALL.DPDK.md | 59 +++++++++-
>> NEWS | 2 +
>> lib/netdev-dpdk.c | 347 +++++++++++++++++++++++++++++++++++++++++-------------
>> 3 files changed, 328 insertions(+), 80 deletions(-)
>>
>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>> index 96b686c..64ccd15 100644
>> --- a/INSTALL.DPDK.md
>> +++ b/INSTALL.DPDK.md
>> @@ -859,10 +859,61 @@ by adding the following string:
>> to <interface> sections of all network devices used by DPDK. Parameter 'N'
>> determines how many queues can be used by the guest.
>>
>> +
>> +Jumbo Frames
>> +------------
>> +
>> +Support for Jumbo Frames may be enabled at run-time for DPDK-type ports.
>> +
>> +To avail of Jumbo Frame support, add the 'dpdk-mtu' option to the ovs-vsctl
>> +'add-port' command-line, along with the required MTU for the port.
>> +e.g.
>> +
>> + ```
>> + ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-mtu=9000
>> + ```
>> +
>> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are
>> +increased, such that a full Jumbo Frame may be accommodated inside a single
>> +mbuf segment. Once set, the MTU for a DPDK port is immutable.
>> +
>> +Jumbo frame support has been validated against 13312B frames, using the
>> +DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers may
>> +theoretically be supported. Supported port types excludes vHost-Cuse ports, as
>> +this feature is pending deprecation.
>> +
>> +
>> +vHost Ports and Jumbo Frames
>> +----------------------------
>> +Jumbo frame support is available for DPDK vHost-User ports only. Some additional
>> +configuration is needed to take advantage of this feature:
>> +
>> + 1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
>> + the QEMU command line snippet below:
>> +
>> + ```
>> + '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
>> + '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
>> + ```
>> +
>> + 2. Where virtio devices are bound to the Linux kernel driver in a guest
>> + environment (i.e. interfaces are not bound to an in-guest DPDK driver), the
>> + MTU of those logical network interfaces must also be increased. This
>> + avoids segmentation of Jumbo Frames in the guest. Note that 'MTU' refers
>> + to the length of the IP packet only, and not that of the entire frame.
>> +
>> + e.g. To calculate the exact MTU of a standard IPv4 frame, subtract the L2
>> + header and CRC lengths (i.e. 18B) from the max supported frame size.
>> + So, to set the MTU for a 13312B Jumbo Frame:
>> +
>> + ```
>> + ifconfig eth1 mtu 13294
>> + ```
>> +
>> +
>> Restrictions:
>> -------------
>>
>> - - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue.
>> - Currently DPDK port does not make use any offload functionality.
>> - DPDK-vHost support works with 1G huge pages.
>>
>> @@ -903,6 +954,12 @@ Restrictions:
>> the next release of DPDK (which includes the above patch) is available and
>> integrated into OVS.
>>
>> + Jumbo Frames:
>> + - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. The source of
>> + this issue is currently being investigated.
>> + - vHost-Cuse: Jumbo Frame support is not available for vHost Cuse ports.
>> +
>> +
>> Bug Reporting:
>> --------------
>>
>> diff --git a/NEWS b/NEWS
>> index 5c18867..cd563e0 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -46,6 +46,8 @@ v2.5.0 - xx xxx xxxx
>> abstractions, such as virtual L2 and L3 overlays and security groups.
>> - RHEL packaging:
>> * DPDK ports may now be created via network scripts (see README.RHEL).
>> + - netdev-dpdk:
>> + * Add Jumbo Frame Support
>>
>>
>> v2.4.0 - 20 Aug 2015
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index de7e488..76a5dcc 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -62,20 +62,25 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>> #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
>> #define OVS_VPORT_DPDK "ovs_dpdk"
>>
>> +#define NETDEV_DPDK_JUMBO_FRAME_ENABLED 1
>> +#define NETDEV_DPDK_DEFAULT_RX_BUFSIZE 1024
>> +
>> /*
>> * need to reserve tons of extra space in the mbufs so we can align the
>> * DMA addresses to 4KB.
>> * The minimum mbuf size is limited to avoid scatter behaviour and drop in
>> * performance for standard Ethernet MTU.
>> */
>> -#define MTU_TO_MAX_LEN(mtu) ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
>> -#define MBUF_SIZE_MTU(mtu) (MTU_TO_MAX_LEN(mtu) \
>> - + sizeof(struct dp_packet) \
>> - + RTE_PKTMBUF_HEADROOM)
>> -#define MBUF_SIZE_DRIVER (2048 \
>> - + sizeof (struct rte_mbuf) \
>> - + RTE_PKTMBUF_HEADROOM)
>> -#define MBUF_SIZE(mtu) MAX(MBUF_SIZE_MTU(mtu), MBUF_SIZE_DRIVER)
>> +#define MTU_TO_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
>> +#define FRAME_LEN_TO_MTU(frame_len) ((frame_len)- ETHER_HDR_LEN - ETHER_CRC_LEN)
>> +#define MBUF_SEGMENT_SIZE(mtu) ( MTU_TO_FRAME_LEN(mtu) \
>> + + sizeof(struct dp_packet) \
>> + + RTE_PKTMBUF_HEADROOM)
>> +
>> +/* This value should be specified as a multiple of the DPDK NIC driver's
>> + * 'min_rx_bufsize' attribute (currently 1024B for 'igb_uio').
>> + */
>> +#define NETDEV_DPDK_MAX_FRAME_LEN 13312
>>
>> /* Max and min number of packets in the mempool. OVS tries to allocate a
>> * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have
>> @@ -86,6 +91,8 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>> #define MIN_NB_MBUF (4096 * 4)
>> #define MP_CACHE_SZ RTE_MEMPOOL_CACHE_MAX_SIZE
>>
>> +#define DPDK_VLAN_TAG_LEN 4
>> +
>> /* MAX_NB_MBUF can be divided by 2 many times, until MIN_NB_MBUF */
>> BUILD_ASSERT_DECL(MAX_NB_MBUF % ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF) == 0);
>>
>> @@ -114,7 +121,6 @@ static const struct rte_eth_conf port_conf = {
>> .header_split = 0, /* Header Split disabled */
>> .hw_ip_checksum = 0, /* IP checksum offload disabled */
>> .hw_vlan_filter = 0, /* VLAN filtering disabled */
>> - .jumbo_frame = 0, /* Jumbo Frame Support disabled */
>> .hw_strip_crc = 0,
>> },
>> .rx_adv_conf = {
>> @@ -254,6 +260,41 @@ is_dpdk_class(const struct netdev_class *class)
>> return class->construct == netdev_dpdk_construct;
>> }
>>
>> +/* DPDK NIC drivers allocate RX buffers at a particular granularity
>> + * (specified by rte_eth_dev_info.min_rx_bufsize - currently 1K for igb_uio).
>> + * If 'frame_len' is not a multiple of this value, insufficient buffers are
>> + * allocated to accomodate the packet in its entirety. Furthermore, the igb_uio
>> + * driver needs to ensure that there is also sufficient space in the Rx buffer
>> + * to accommodate two VLAN tags (for QinQ frames). If the RX buffer is too
>> + * small, then the driver enables scatter RX behaviour, which reduces
>> + * performance. To prevent this, use a buffer size that is closest to
>> + * 'frame_len', but which satisfies the aforementioned criteria.
>> + */
>> +static uint32_t
>> +dpdk_buf_size(struct netdev_dpdk *netdev, int frame_len)
>> +{
>> + struct rte_eth_dev_info info;
>> + uint32_t buf_size;
>> + /* XXX: This is a workaround for DPDK v2.2, and needs to be refactored with a
>> + * future DPDK release. */
>> + uint32_t len = frame_len + (2 * DPDK_VLAN_TAG_LEN);
>> +
>> + if(netdev->type == DPDK_DEV_ETH) {
>> + rte_eth_dev_info_get(netdev->port_id, &info);
>> + buf_size = (info.min_rx_bufsize == 0) ?
>> + NETDEV_DPDK_DEFAULT_RX_BUFSIZE :
>> + info.min_rx_bufsize;
>> + } else {
>> + buf_size = NETDEV_DPDK_DEFAULT_RX_BUFSIZE;
>> + }
>> +
>> + if(len % buf_size) {
>> + len = buf_size * ((len/buf_size) + 1);
>> + }
>> +
>> + return len;
>> +}
>> +
>> /* XXX: use dpdk malloc for entire OVS. in fact huge page should be used
>> * for all other segments data, bss and text. */
>>
>> @@ -280,26 +321,65 @@ free_dpdk_buf(struct dp_packet *p)
>> }
>>
>> static void
>> -__rte_pktmbuf_init(struct rte_mempool *mp,
>> - void *opaque_arg OVS_UNUSED,
>> - void *_m,
>> - unsigned i OVS_UNUSED)
>> +ovs_rte_pktmbuf_pool_init(struct rte_mempool *mp, void *opaque_arg)
>> {
>> - struct rte_mbuf *m = _m;
>> - uint32_t buf_len = mp->elt_size - sizeof(struct dp_packet);
>> + struct rte_pktmbuf_pool_private *user_mbp_priv, *mbp_priv;
>> + struct rte_pktmbuf_pool_private default_mbp_priv;
>> + uint16_t roomsz;
>>
>> RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet));
>>
>> + /* if no structure is provided, assume no mbuf private area */
>> +
>> + user_mbp_priv = opaque_arg;
>> + if (user_mbp_priv == NULL) {
>> + default_mbp_priv.mbuf_priv_size = 0;
>> + if (mp->elt_size > sizeof(struct dp_packet)) {
>> + roomsz = mp->elt_size - sizeof(struct dp_packet);
>> + } else {
>> + roomsz = 0;
>> + }
>> + default_mbp_priv.mbuf_data_room_size = roomsz;
>> + user_mbp_priv = &default_mbp_priv;
>> + }
>> +
>> + RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet) +
>> + user_mbp_priv->mbuf_data_room_size +
>> + user_mbp_priv->mbuf_priv_size);
>> +
>> + mbp_priv = rte_mempool_get_priv(mp);
>> + memcpy(mbp_priv, user_mbp_priv, sizeof(*mbp_priv));
>> +}
>> +
>> +/* Initialise some fields in the mbuf structure that are not modified by the
>> + * user once created (origin pool, buffer start address, etc.*/
>> +static void
>> +__ovs_rte_pktmbuf_init(struct rte_mempool *mp,
>> + void *opaque_arg OVS_UNUSED,
>> + void *_m,
>> + unsigned i OVS_UNUSED)
>> +{
>> + struct rte_mbuf *m = _m;
>> + uint32_t buf_size, buf_len, priv_size;
>> +
>> + priv_size = rte_pktmbuf_priv_size(mp);
>> + buf_size = sizeof(struct dp_packet) + priv_size;
>> + buf_len = rte_pktmbuf_data_room_size(mp);
>> +
>> + RTE_MBUF_ASSERT(RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) == priv_size);
>> + RTE_MBUF_ASSERT(mp->elt_size >= buf_size);
>> + RTE_MBUF_ASSERT(buf_len <= UINT16_MAX);
>> +
>> memset(m, 0, mp->elt_size);
>>
>> - /* start of buffer is just after mbuf structure */
>> - m->buf_addr = (char *)m + sizeof(struct dp_packet);
>> - m->buf_physaddr = rte_mempool_virt2phy(mp, m) +
>> - sizeof(struct dp_packet);
>> + /* start of buffer is after dp_packet structure and priv data */
>> + m->priv_size = priv_size;
>> + m->buf_addr = (char *)m + buf_size;
>> + m->buf_physaddr = rte_mempool_virt2phy(mp, m) + buf_size;
>> m->buf_len = (uint16_t)buf_len;
>>
>> /* keep some headroom between start of buffer and data */
>> - m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len);
>> + m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, (uint16_t)m->buf_len);
>>
>> /* init some constant fields */
>> m->pool = mp;
>> @@ -315,7 +395,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp,
>> {
>> struct rte_mbuf *m = _m;
>>
>> - __rte_pktmbuf_init(mp, opaque_arg, _m, i);
>> + __ovs_rte_pktmbuf_init(mp, opaque_arg, m, i);
>>
>> dp_packet_init_dpdk((struct dp_packet *) m, m->buf_len);
>> }
>> @@ -326,6 +406,7 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
>> struct dpdk_mp *dmp = NULL;
>> char mp_name[RTE_MEMPOOL_NAMESIZE];
>> unsigned mp_size;
>> + struct rte_pktmbuf_pool_private mbp_priv;
>>
>> LIST_FOR_EACH (dmp, list_node, &dpdk_mp_list) {
>> if (dmp->socket_id == socket_id && dmp->mtu == mtu) {
>> @@ -338,6 +419,8 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
>> dmp->socket_id = socket_id;
>> dmp->mtu = mtu;
>> dmp->refcount = 1;
>> + mbp_priv.mbuf_data_room_size = MTU_TO_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM;
>> + mbp_priv.mbuf_priv_size = 0;
>>
>> mp_size = MAX_NB_MBUF;
>> do {
>> @@ -346,10 +429,10 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
>> return NULL;
>> }
>>
>> - dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SIZE(mtu),
>> + dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SEGMENT_SIZE(mtu),
>> MP_CACHE_SZ,
>> sizeof(struct rte_pktmbuf_pool_private),
>> - rte_pktmbuf_pool_init, NULL,
>> + ovs_rte_pktmbuf_pool_init, &mbp_priv,
>> ovs_rte_pktmbuf_init, NULL,
>> socket_id, 0);
>> } while (!dmp->mp && rte_errno == ENOMEM && (mp_size /= 2) >= MIN_NB_MBUF);
>> @@ -433,6 +516,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>> {
>> int diag = 0;
>> int i;
>> + struct rte_eth_conf conf = port_conf;
>>
>> /* A device may report more queues than it makes available (this has
>> * been observed for Intel xl710, which reserves some of them for
>> @@ -444,7 +528,12 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int
>n_txq)
>> VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq);
>> }
>>
>> - diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &port_conf);
>> + if(OVS_UNLIKELY(dev->mtu > ETHER_MTU)) {
>> + conf.rxmode.jumbo_frame = NETDEV_DPDK_JUMBO_FRAME_ENABLED;
>> + conf.rxmode.max_rx_pkt_len = MTU_TO_FRAME_LEN(dev->mtu);
>> + }
>> +
>> + diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &conf);
>> if (diag) {
>> break;
>> }
>> @@ -586,6 +675,7 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
>> struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>> int sid;
>> int err = 0;
>> + uint32_t buf_size;
>>
>> ovs_mutex_init(&netdev->mutex);
>> ovs_mutex_lock(&netdev->mutex);
>> @@ -605,10 +695,16 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
>> netdev->port_id = port_no;
>> netdev->type = type;
>> netdev->flags = 0;
>> +
>> + /* Initialize port's MTU and frame len to the default Ethernet values.
>> + * Larger, user-specified (jumbo) frame buffers are accommodated in
>> + * netdev_dpdk_set_config.
>> + */
>> + netdev->max_packet_len = ETHER_MAX_LEN;
>> netdev->mtu = ETHER_MTU;
>> - netdev->max_packet_len = MTU_TO_MAX_LEN(netdev->mtu);
>>
>> - netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, netdev->mtu);
>> + buf_size = dpdk_buf_size(netdev, ETHER_MAX_LEN);
>> + netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, FRAME_LEN_TO_MTU(buf_size));
>> if (!netdev->dpdk_mp) {
>> err = ENOMEM;
>> goto unlock;
>> @@ -651,6 +747,27 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
>> return 0;
>> }
>>
>> +static void
>> +dpdk_dev_parse_mtu(const struct smap *args, int *mtu)
>> +{
>> + const char *mtu_str = smap_get(args, "dpdk-mtu");
>> + char *end_ptr = NULL;
>> + int local_mtu;
>> +
>> + if(mtu_str) {
>> + local_mtu = strtoul(mtu_str, &end_ptr, 0);
>> + }
>> + if(!mtu_str || local_mtu < ETHER_MTU ||
>> + local_mtu > FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) ||
>> + *end_ptr != '\0') {
>> + local_mtu = ETHER_MTU;
>> + VLOG_WARN("Invalid or missing dpdk-mtu parameter - defaulting to %d.\n",
>> + local_mtu);
>> + }
>> +
>> + *mtu = local_mtu;
>> +}
>> +
>> static int
>> vhost_construct_helper(struct netdev *netdev_) OVS_REQUIRES(dpdk_mutex)
>> {
>> @@ -777,11 +894,77 @@ netdev_dpdk_get_config(const struct netdev *netdev_, struct smap
>*args)
>> smap_add_format(args, "configured_rx_queues", "%d", netdev_->n_rxq);
>> smap_add_format(args, "requested_tx_queues", "%d", netdev_->n_txq);
>> smap_add_format(args, "configured_tx_queues", "%d", dev->real_n_txq);
>> + smap_add_format(args, "mtu", "%d", dev->mtu);
>> ovs_mutex_unlock(&dev->mutex);
>>
>> return 0;
>> }
>>
>> +/* Set the mtu of DPDK_DEV_ETH ports */
>> +static int
>> +netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
>> +{
>> + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>> + int old_mtu, err;
>> + uint32_t buf_size;
>> + int dpdk_mtu;
>> + struct dpdk_mp *old_mp;
>> + struct dpdk_mp *mp;
>> +
>> + ovs_mutex_lock(&dpdk_mutex);
>> + ovs_mutex_lock(&dev->mutex);
>> + if (dev->mtu == mtu) {
>> + err = 0;
>> + goto out;
>> + }
>> +
>> + buf_size = dpdk_buf_size(dev, MTU_TO_FRAME_LEN(mtu));
>> + dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
>> +
>> + mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
>> + if (!mp) {
>> + err = ENOMEM;
>> + goto out;
>> + }
>> +
>> + rte_eth_dev_stop(dev->port_id);
>> +
>> + old_mtu = dev->mtu;
>> + old_mp = dev->dpdk_mp;
>> + dev->dpdk_mp = mp;
>> + dev->mtu = mtu;
>> + dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>> +
>> + err = dpdk_eth_dev_init(dev);
>> + if (err) {
>> + VLOG_WARN("Unable to set MTU '%d' for '%s'; reverting to last known "
>> + "good value '%d'\n", mtu, dev->up.name, old_mtu);
>> + dpdk_mp_put(mp);
>> + dev->mtu = old_mtu;
>> + dev->dpdk_mp = old_mp;
>> + dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>> + dpdk_eth_dev_init(dev);
>> + goto out;
>> + } else {
>> + dpdk_mp_put(old_mp);
>> + netdev_change_seq_changed(netdev);
>> + }
>> +out:
>> + ovs_mutex_unlock(&dev->mutex);
>> + ovs_mutex_unlock(&dpdk_mutex);
>> + return err;
>> +}
>> +
>> +static int
>> +netdev_dpdk_set_config(struct netdev *netdev_, const struct smap *args)
>> +{
>> + int mtu;
>> +
>> + dpdk_dev_parse_mtu(args, &mtu);
>> +
>> + return netdev_dpdk_set_mtu(netdev_, mtu);
>> +}
>> +
>> static int
>> netdev_dpdk_get_numa_id(const struct netdev *netdev_)
>> {
>> @@ -1358,54 +1541,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup)
>>
>> return 0;
>> }
>> -
>> -static int
>> -netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
>> -{
>> - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>> - int old_mtu, err;
>> - struct dpdk_mp *old_mp;
>> - struct dpdk_mp *mp;
>> -
>> - ovs_mutex_lock(&dpdk_mutex);
>> - ovs_mutex_lock(&dev->mutex);
>> - if (dev->mtu == mtu) {
>> - err = 0;
>> - goto out;
>> - }
>> -
>> - mp = dpdk_mp_get(dev->socket_id, dev->mtu);
>> - if (!mp) {
>> - err = ENOMEM;
>> - goto out;
>> - }
>> -
>> - rte_eth_dev_stop(dev->port_id);
>> -
>> - old_mtu = dev->mtu;
>> - old_mp = dev->dpdk_mp;
>> - dev->dpdk_mp = mp;
>> - dev->mtu = mtu;
>> - dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
>> -
>> - err = dpdk_eth_dev_init(dev);
>> - if (err) {
>> - dpdk_mp_put(mp);
>> - dev->mtu = old_mtu;
>> - dev->dpdk_mp = old_mp;
>> - dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
>> - dpdk_eth_dev_init(dev);
>> - goto out;
>> - }
>> -
>> - dpdk_mp_put(old_mp);
>> - netdev_change_seq_changed(netdev);
>> -out:
>> - ovs_mutex_unlock(&dev->mutex);
>> - ovs_mutex_unlock(&dpdk_mutex);
>> - return err;
>> -}
>> -
>> static int
>> netdev_dpdk_get_carrier(const struct netdev *netdev_, bool *carrier);
>>
>> @@ -1682,7 +1817,7 @@ netdev_dpdk_get_status(const struct netdev *netdev_, struct smap
>*args)
>> smap_add_format(args, "numa_id", "%d", rte_eth_dev_socket_id(dev->port_id));
>> smap_add_format(args, "driver_name", "%s", dev_info.driver_name);
>> smap_add_format(args, "min_rx_bufsize", "%u", dev_info.min_rx_bufsize);
>> - smap_add_format(args, "max_rx_pktlen", "%u", dev_info.max_rx_pktlen);
>> + smap_add_format(args, "max_rx_pktlen", "%u", dev->max_packet_len);
>> smap_add_format(args, "max_rx_queues", "%u", dev_info.max_rx_queues);
>> smap_add_format(args, "max_tx_queues", "%u", dev_info.max_tx_queues);
>> smap_add_format(args, "max_mac_addrs", "%u", dev_info.max_mac_addrs);
>> @@ -1904,6 +2039,51 @@ dpdk_vhost_user_class_init(void)
>> return 0;
>> }
>>
>> +/* Set the mtu of DPDK_DEV_VHOST ports */
>> +static int
>> +netdev_dpdk_vhost_set_mtu(const struct netdev *netdev, int mtu)
>> +{
>> + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>> + int err = 0;
>> + struct dpdk_mp *old_mp;
>> + struct dpdk_mp *mp;
>> +
>> + ovs_mutex_lock(&dpdk_mutex);
>> + ovs_mutex_lock(&dev->mutex);
>> + if (dev->mtu == mtu) {
>> + err = 0;
>> + goto out;
>> + }
>> +
>> + mp = dpdk_mp_get(dev->socket_id, mtu);
>> + if (!mp) {
>> + err = ENOMEM;
>> + goto out;
>> + }
>> +
>> + old_mp = dev->dpdk_mp;
>> + dev->dpdk_mp = mp;
>> + dev->mtu = mtu;
>> + dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>> +
>> + dpdk_mp_put(old_mp);
>> + netdev_change_seq_changed(netdev);
>> +out:
>> + ovs_mutex_unlock(&dev->mutex);
>> + ovs_mutex_unlock(&dpdk_mutex);
>> + return err;
>> +}
>> +
>> +static int
>> +netdev_dpdk_vhost_set_config(struct netdev *netdev_, const struct smap *args)
>> +{
>> + int mtu;
>> +
>> + dpdk_dev_parse_mtu(args, &mtu);
>> +
>> + return netdev_dpdk_vhost_set_mtu(netdev_, mtu);
>> +}
>> +
>> static void
>> dpdk_common_init(void)
>> {
>> @@ -2040,8 +2220,9 @@ unlock_dpdk:
>> return err;
>> }
>>
>> -#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, SEND, \
>> - GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV) \
>> +#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, SET_CONFIG, \
>> + MULTIQ, SEND, SET_MTU, GET_CARRIER, GET_STATS, GET_FEATURES, \
>> + GET_STATUS, RXQ_RECV) \
>> { \
>> NAME, \
>> INIT, /* init */ \
>> @@ -2053,7 +2234,7 @@ unlock_dpdk:
>> DESTRUCT, \
>> netdev_dpdk_dealloc, \
>> netdev_dpdk_get_config, \
>> - NULL, /* netdev_dpdk_set_config */ \
>> + SET_CONFIG, \
>> NULL, /* get_tunnel_config */ \
>> NULL, /* build header */ \
>> NULL, /* push header */ \
>> @@ -2067,7 +2248,7 @@ unlock_dpdk:
>> netdev_dpdk_set_etheraddr, \
>> netdev_dpdk_get_etheraddr, \
>> netdev_dpdk_get_mtu, \
>> - netdev_dpdk_set_mtu, \
>> + SET_MTU, \
>> netdev_dpdk_get_ifindex, \
>> GET_CARRIER, \
>> netdev_dpdk_get_carrier_resets, \
>> @@ -2213,8 +2394,10 @@ static const struct netdev_class dpdk_class =
>> NULL,
>> netdev_dpdk_construct,
>> netdev_dpdk_destruct,
>> + netdev_dpdk_set_config,
>> netdev_dpdk_set_multiq,
>> netdev_dpdk_eth_send,
>> + netdev_dpdk_set_mtu,
>> netdev_dpdk_get_carrier,
>> netdev_dpdk_get_stats,
>> netdev_dpdk_get_features,
>> @@ -2227,8 +2410,10 @@ static const struct netdev_class dpdk_ring_class =
>> NULL,
>> netdev_dpdk_ring_construct,
>> netdev_dpdk_destruct,
>> + netdev_dpdk_set_config,
>> netdev_dpdk_set_multiq,
>> netdev_dpdk_ring_send,
>> + netdev_dpdk_set_mtu,
>> netdev_dpdk_get_carrier,
>> netdev_dpdk_get_stats,
>> netdev_dpdk_get_features,
>> @@ -2241,8 +2426,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_cuse_class =
>> dpdk_vhost_cuse_class_init,
>> netdev_dpdk_vhost_cuse_construct,
>> netdev_dpdk_vhost_destruct,
>> + NULL,
>> netdev_dpdk_vhost_set_multiq,
>> netdev_dpdk_vhost_send,
>> + NULL,
>> netdev_dpdk_vhost_get_carrier,
>> netdev_dpdk_vhost_get_stats,
>> NULL,
>> @@ -2255,8 +2442,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_user_class =
>> dpdk_vhost_user_class_init,
>> netdev_dpdk_vhost_user_construct,
>> netdev_dpdk_vhost_destruct,
>> + netdev_dpdk_vhost_set_config,
>> netdev_dpdk_vhost_set_multiq,
>> netdev_dpdk_vhost_send,
>> + netdev_dpdk_vhost_set_mtu,
>> netdev_dpdk_vhost_get_carrier,
>> netdev_dpdk_vhost_get_stats,
>> NULL,
>>
More information about the dev
mailing list