[ovs-dev] [ovs-dev, 7/7] netdev-dpdk: add support for Jumbo Frames

Ilya Maximets i.maximets at samsung.com
Wed Aug 3 12:34:38 UTC 2016


Hi, Mark.

On 03.08.2016 15:14, Kavanagh, Mark B wrote:
>>
>> Hi Daniele. Thanks for posting this.
> 
> Hi Ilya,
> 
> I actually implemented this patch as part of Daniele's MTU patchset, based on my earlier patch - Daniele mainly rebased it to head of master :)
> 
> Thanks for your feedback - I've responded inline.
> 
> Cheers,
> Mark
> 
>> I have an almost identical patch in my local branch.
>>
>> I haven't tested this with physical DPDK NICs yet, but I have a few
>> high-level comments:
>>
>> 1. Have you thought about renaming 'mtu_request' inside netdev-dpdk
>>   to 'requested_mtu'? I think this would be clearer and more
>>   consistent with other configurable parameters (n_rxq, n_txq, ...).
> 
> 'mtu_request' was the name suggested by Daniele, following a discussion with colleagues.
> I don't have strong feelings either way, so I'll leave Daniele to comment.

I meant only renaming 'netdev_dpdk->mtu_request' to 'netdev_dpdk->requested_mtu'.
The database column should stay 'mtu_request' as it is now.

>>
>> 2. I'd prefer not to fail reconfiguration if there is not enough memory
>>   for the new mempool. I think it'll be a common situation when we are
>>   requesting more memory than we have. Failure leads to destruction
>>   of the port, and to the inability to reconnect to a vhost-user port
>>   after re-creation if vhost is in server mode. We can just keep the
>>   old mempool and inform the user via VLOG_ERR.
>>
> Agreed - I'll modify V2 accordingly.
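The fallback described above could look roughly like the sketch below (standalone C with a stubbed allocator standing in for dpdk_mp_get(); names mirror the patch, but this is only an illustration, not the actual V2 code):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-ins for the structures in netdev-dpdk.c. */
struct dpdk_mp { int mtu; int refcount; };
struct netdev_dpdk_sim { struct dpdk_mp *dpdk_mp; int mtu; int requested_mtu; };

static int alloc_should_fail;           /* test knob simulating ENOMEM */

static struct dpdk_mp *
dpdk_mp_get_stub(int mtu)
{
    if (alloc_should_fail) {
        return NULL;
    }
    struct dpdk_mp *mp = malloc(sizeof *mp);
    mp->mtu = mtu;
    mp->refcount = 1;
    return mp;
}

static void
dpdk_mp_put_stub(struct dpdk_mp *mp)
{
    if (mp && --mp->refcount == 0) {
        free(mp);
    }
}

/* On allocation failure, keep the old mempool and the old MTU instead of
 * failing reconfiguration: log the problem and report success so the port
 * is not destroyed.  Note there is no 'mp_old' temporary: the old mempool
 * is released right before the new one is installed. */
static int
dpdk_mp_configure_sim(struct netdev_dpdk_sim *dev)
{
    struct dpdk_mp *mp = dpdk_mp_get_stub(dev->requested_mtu);
    if (!mp) {
        fprintf(stderr, "Insufficient memory for MTU %d, keeping old mempool\n",
                dev->requested_mtu);
        dev->requested_mtu = dev->mtu;  /* roll the request back */
        return 0;                       /* the port stays usable */
    }
    dpdk_mp_put_stub(dev->dpdk_mp);     /* release old pool... */
    dev->dpdk_mp = mp;                  /* ...then install the new one */
    dev->mtu = dev->requested_mtu;
    return 0;
}
```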
> 
> 
>> 3. Minor issues inline.
> 
> Comments on these inline also.
> 
>>
>> What do you think?
>>
>> Best regards, Ilya Maximets.
>>
>> On 30.07.2016 04:22, Daniele Di Proietto wrote:
>>> From: Mark Kavanagh <mark.b.kavanagh at intel.com>
>>>
>>> Add support for Jumbo Frames to DPDK-enabled port types,
>>> using single-segment-mbufs.
>>>
>>> Using this approach, the amount of memory allocated to each mbuf
>>> to store frame data is increased to a value greater than 1518B
>>> (typical Ethernet maximum frame length). The increased space
>>> available in the mbuf means that an entire Jumbo Frame of a specific
>>> size can be carried in a single mbuf, as opposed to partitioning
>>> it across multiple mbuf segments.
>>>
>>> The amount of space allocated to each mbuf to hold frame data is
>>> defined dynamically by the user with ovs-vsctl, via the 'mtu_request'
>>> parameter.
>>>
>>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
>>> [diproiettod at vmware.com rebased]
>>> Signed-off-by: Daniele Di Proietto <diproiettod at vmware.com>
>>> ---
>>>  INSTALL.DPDK-ADVANCED.md |  59 +++++++++++++++++-
>>>  INSTALL.DPDK.md          |   1 -
>>>  NEWS                     |   1 +
>>>  lib/netdev-dpdk.c        | 151 +++++++++++++++++++++++++++++++++++++++--------
>>>  4 files changed, 185 insertions(+), 27 deletions(-)
>>>
>>> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
>>> index 191e69e..5cd64bf 100755
>>> --- a/INSTALL.DPDK-ADVANCED.md
>>> +++ b/INSTALL.DPDK-ADVANCED.md
>>> @@ -1,5 +1,5 @@
>>>  OVS DPDK ADVANCED INSTALL GUIDE
>>> -=================================
>>> +===============================
>>>
>>>  ## Contents
>>>
>>> @@ -12,7 +12,8 @@ OVS DPDK ADVANCED INSTALL GUIDE
>>>  7. [QOS](#qos)
>>>  8. [Rate Limiting](#rl)
>>>  9. [Flow Control](#fc)
>>> -10. [Vsperf](#vsperf)
>>> +10. [Jumbo Frames](#jumbo)
>>> +11. [Vsperf](#vsperf)
>>>
>>>  ## <a name="overview"></a> 1. Overview
>>>
>>> @@ -862,7 +863,59 @@ respective parameter. To disable the flow control at tx side,
>>>
>>>  `ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false`
>>>
>>> -## <a name="vsperf"></a> 10. Vsperf
>>> +## <a name="jumbo"></a> 10. Jumbo Frames
>>> +
>>> +By default, DPDK ports are configured with the standard Ethernet MTU (1500B).
>>> +To enable Jumbo Frame support for a DPDK port, set the Interface's
>>> +`mtu_request` attribute to a sufficiently large value.
>>> +
>>> +e.g. Add a DPDK Phy port with MTU of 9000:
>>> +
>>> +`ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk -- set Interface dpdk0
>> mtu_request=9000`
>>> +
>>> +e.g. Change the MTU of an existing port to 6200:
>>> +
>>> +`ovs-vsctl set Interface dpdk0 mtu_request=6200`
>>> +
>>> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments is
>>> +increased, such that a full Jumbo Frame of a specific size may be accommodated
>>> +within a single mbuf segment.
>>> +
>>> +Jumbo Frame support has been validated against 9728B frames (the largest
>>> +frame size supported by the Fortville NIC) using the DPDK `i40e` driver;
>>> +larger frames (particularly in use cases involving only East-West traffic)
>>> +and other DPDK NIC drivers may also work, but have not been verified.
>>> +
>>> +### 10.1 vHost Ports and Jumbo Frames
>>> +
>>> +Some additional configuration is needed to take advantage of jumbo frames with
>>> +vhost ports:
>>> +
>>> +    1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
>>> +        the QEMU command line snippet below:
>>> +
>>> +        ```
>>> +        '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
>>> +        '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
>>> +        ```
>>> +
>>> +    2. Where virtio devices are bound to the Linux kernel driver in a guest
>>> +       environment (i.e. interfaces are not bound to an in-guest DPDK driver),
>>> +       the MTU of those logical network interfaces must also be increased to a
>>> +       sufficiently large value. This avoids segmentation of Jumbo Frames
>>> +       received in the guest. Note that 'MTU' refers to the length of the IP
>>> +       packet only, and not that of the entire frame.
>>> +
>>> +       To calculate the exact MTU of a standard IPv4 frame, subtract the L2
>>> +       header and CRC lengths (i.e. 18B) from the max supported frame size.
>>> +       So, to set the MTU for a 9018B Jumbo Frame:
>>> +
>>> +       ```
>>> +       ifconfig eth1 mtu 9000
>>> +       ```
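As an aside, the 18B arithmetic above is the same conversion the patch performs through its MTU_TO_FRAME_LEN/FRAME_LEN_TO_MTU macros. A minimal standalone illustration, with the constant values written out here as assumptions:

```c
#include <assert.h>

#define ETHER_HDR_LEN 14   /* dst MAC (6) + src MAC (6) + ethertype (2) */
#define ETHER_CRC_LEN  4   /* frame check sequence */

/* Same shape as the conversion macros used by the patch. */
#define MTU_TO_FRAME_LEN(mtu)       ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
#define FRAME_LEN_TO_MTU(frame_len) ((frame_len) - ETHER_HDR_LEN - ETHER_CRC_LEN)
```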
>>> +>>>>>>> 5ec921d... netdev-dpdk: add support for Jumbo Frames
>>
>> Looks like a rebasing artefact.
> 
> Yup - I'll remove it in V2.
> 
>>
>>> +
>>> +## <a name="vsperf"></a> 11. Vsperf
>>>
>>>  Vsperf project goal is to develop vSwitch test framework that can be used to
>>>  validate the suitability of different vSwitch implementations in a Telco deployment
>>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>>> index 7609aa7..25c79de 100644
>>> --- a/INSTALL.DPDK.md
>>> +++ b/INSTALL.DPDK.md
>>> @@ -590,7 +590,6 @@ can be found in [Vhost Walkthrough].
>>>
>>>  ## <a name="ovslimits"></a> 6. Limitations
>>>
>>> -  - Supports MTU size 1500, MTU setting for DPDK netdevs will be in future OVS release.
>>>    - Currently DPDK ports does not use HW offload functionality.
>>>    - Network Interface Firmware requirements:
>>>      Each release of DPDK is validated against a specific firmware version for
>>> diff --git a/NEWS b/NEWS
>>> index 0ff5616..c004e5f 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -68,6 +68,7 @@ Post-v2.5.0
>>>         is enabled in DPDK.
>>>       * Basic connection tracking for the userspace datapath (no ALG,
>>>         fragmentation or NAT support yet)
>>> +     * Jumbo frame support
>>>     - Increase number of registers to 16.
>>>     - ovs-benchmark: This utility has been removed due to lack of use and
>>>       bitrot.
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 0b6e410..68639ae 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -82,6 +82,7 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>>>                                      + sizeof(struct dp_packet)    \
>>>                                      + RTE_PKTMBUF_HEADROOM)
>>>  #define NETDEV_DPDK_MBUF_ALIGN      1024
>>> +#define NETDEV_DPDK_MAX_PKT_LEN     9728
>>>
>>>  /* Max and min number of packets in the mempool.  OVS tries to allocate a
>>>   * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have
>>> @@ -336,6 +337,7 @@ struct netdev_dpdk {
>>>      struct ovs_mutex mutex OVS_ACQ_AFTER(dpdk_mutex);
>>>
>>>      struct dpdk_mp *dpdk_mp;
>>> +    int mtu_request;
>>>      int mtu;
>>>      int socket_id;
>>>      int buf_size;
>>> @@ -474,10 +476,19 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
>>>      dmp->mtu = mtu;
>>>      dmp->refcount = 1;
>>>      mbp_priv.mbuf_data_room_size = MBUF_SIZE(mtu) - sizeof(struct dp_packet);
>>> -    mbp_priv.mbuf_priv_size = sizeof (struct dp_packet) -
>>> -                              sizeof (struct rte_mbuf);
>>> +    mbp_priv.mbuf_priv_size = sizeof (struct dp_packet)
>>> +                              - sizeof (struct rte_mbuf);
>>> +    /* XXX: this is a really rough method of provisioning memory.
>>> +     * It's impossible to determine what the exact memory requirements are when
>>> +     * the number of ports and rxqs that utilize a particular mempool can change
>>> +     * dynamically at runtime. For the moment, use this rough heuristic.
>>> +     */
>>> +    if (mtu >= ETHER_MTU) {
>>> +        mp_size = MAX_NB_MBUF;
>>> +    } else {
>>> +        mp_size = MIN_NB_MBUF;
>>> +    }
>>>
>>> -    mp_size = MAX_NB_MBUF;
>>>      do {
>>>          if (snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_mp_%d_%d_%u",
>>>                       dmp->mtu, dmp->socket_id, mp_size) < 0) {
>>> @@ -522,6 +533,32 @@ dpdk_mp_put(struct dpdk_mp *dmp)
>>>  #endif
>>>  }
>>>
>>> +static int
>>> +dpdk_mp_configure(struct netdev_dpdk *dev)
>>> +    OVS_REQUIRES(dpdk_mutex)
>>> +    OVS_REQUIRES(dev->mutex)
>>> +{
>>> +    uint32_t buf_size = dpdk_buf_size(dev->mtu);
>>> +    struct dpdk_mp *mp_old, *mp;
>>> +
>>> +    mp_old = dev->dpdk_mp;
>>
>> Do we need this variable? We can just put the old mempool right
>> before assigning the new one.
> 
> Agreed - will remove in V2.
> 
>>
>>> +
>>> +    mp = dpdk_mp_get(dev->socket_id, FRAME_LEN_TO_MTU(buf_size));
>>> +    if (!mp) {
>>> +        VLOG_ERR("Insufficient memory to create memory pool for netdev %s\n",
>>> +                dev->up.name);
>>
>> +1 space character.
> 
> Sure - will fix.
> 
>>
>>> +        return ENOMEM;
>>> +    }
>>> +
>>> +    dev->dpdk_mp = mp;
>>> +    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>>> +
>>> +    dpdk_mp_put(mp_old);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +
>>>  static void
>>>  check_link_status(struct netdev_dpdk *dev)
>>>  {
>>> @@ -573,7 +610,15 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int
>> n_txq)
>>>  {
>>>      int diag = 0;
>>>      int i;
>>> +    struct rte_eth_conf conf = port_conf;
>>>
>>> +    if (dev->mtu > ETHER_MTU) {
>>> +        conf.rxmode.jumbo_frame = 1;
>>> +        conf.rxmode.max_rx_pkt_len = dev->max_packet_len;
>>> +    } else {
>>> +        conf.rxmode.jumbo_frame = 0;
>>> +        conf.rxmode.max_rx_pkt_len = 0;
>>
>> I know it was implemented this way in the original patch, but I'm
>> not sure that all DPDK drivers will handle a zero value of
>> 'max_rx_pkt_len' in the right way.
> 
> Out of interest, can you point to any examples of DPDK drivers handling max_rx_pkt_len = 0 with unintended behaviour?
> This field should only be relevant if rxmode.jumbo_frame = 1, and 0 is its default value; if it is an issue, though,

I don't know exactly. One place in the DPDK i40e driver confuses me a little:
the function 'i40evf_rxq_init()' in drivers/net/i40e/i40e_ethdev_vf.c.
It checks the value of 'rxq->max_pkt_len' even when 'dev_conf.rxmode.jumbo_frame == 0'.
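Paraphrasing the pattern from memory (the constants and exact bounds below are assumptions; check the i40e sources for the real version): even with jumbo_frame == 0, the driver validates max_pkt_len against the standard Ethernet range, so a value of 0 would fail that check:

```c
#include <assert.h>
#include <stdint.h>

#define ETHER_MIN_LEN         64
#define ETHER_MAX_LEN       1518
#define I40E_FRAME_SIZE_MAX 9728

/* Sketch of the validation pattern in i40evf_rxq_init() (simplified;
 * exact bounds are assumed, not copied from the driver). */
static int
rxq_check_max_pkt_len(uint32_t max_pkt_len, int jumbo_frame)
{
    if (jumbo_frame) {
        if (max_pkt_len <= ETHER_MAX_LEN || max_pkt_len > I40E_FRAME_SIZE_MAX) {
            return -1;   /* invalid for jumbo mode */
        }
    } else {
        if (max_pkt_len < ETHER_MIN_LEN || max_pkt_len > ETHER_MAX_LEN) {
            return -1;   /* 0 falls here: rejected even with jumbo off */
        }
    }
    return 0;
}
```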

> I can just default instead to ETHER_MAX_LEN.

Will VLANs be handled properly in this case?
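To put numbers on the VLAN question: a frame carrying the standard 1500B MTU plus an 802.1Q tag is 1522B on the wire, 4B more than ETHER_MAX_LEN, so capping max_rx_pkt_len at ETHER_MAX_LEN could cut off tagged traffic:

```c
#include <assert.h>

#define ETHER_MTU     1500   /* standard Ethernet MTU */
#define ETHER_HDR_LEN   14
#define ETHER_CRC_LEN    4
#define VLAN_TAG_LEN     4   /* 802.1Q header */
#define ETHER_MAX_LEN 1518   /* max untagged frame length */

/* Wire length of a VLAN-tagged frame carrying the given MTU. */
static int
tagged_frame_len(int mtu)
{
    return mtu + ETHER_HDR_LEN + ETHER_CRC_LEN + VLAN_TAG_LEN;
}
```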


