[ovs-dev] [PATCH v11 01/14] netdev-dpdk: fix mbuf sizing

Lam, Tiago tiago.lam at intel.com
Thu Oct 18 07:41:15 UTC 2018


Hi Ilya,

Thanks for the comments, my replies are in-line.

On 17/10/2018 17:30, Ilya Maximets wrote:
> On 10.10.2018 19:22, Tiago Lam wrote:
>> From: Mark Kavanagh <mark.b.kavanagh at intel.com>
>>
>> There are numerous factors that must be considered when calculating
>> the size of an mbuf:
>> - the data portion of the mbuf must be sized in accordance with Rx
>>   buffer alignment (typically 1024B). So, for example, in order to
>>   successfully receive and capture a 1500B packet, mbufs with a
>>   data portion of size 2048B must be used.
>> - in OvS, the elements that comprise an mbuf are:
>>   * the dp_packet, which includes a struct rte_mbuf (704B)
>>   * RTE_PKTMBUF_HEADROOM (128B)
>>   * packet data (aligned to 1k, as previously described)
>>   * RTE_PKTMBUF_TAILROOM (typically 0)
>>
>> Some PMDs require that the total mbuf size (i.e. the total sum of all
>> of the above-listed components' lengths) is cache-aligned. To satisfy
>> this requirement, it may be necessary to round up the total mbuf size
>> with respect to cacheline size. In doing so, it's possible that the
>> dp_packet's data portion is inadvertently increased in size, such that
>> it no longer adheres to Rx buffer alignment. Consequently, the
>> following property of the mbuf no longer holds true:
>>
>>     mbuf.data_len == mbuf.buf_len - mbuf.data_off
>>
>> This creates a problem in the case of multi-segment mbufs, where that
>> property is assumed to hold for all but the final segment in an mbuf
>> chain. Resolve this issue by adjusting the size of the mbuf's
>> private data portion, as opposed to the packet data portion when
>> aligning mbuf size to cachelines.
>>
>> Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization")
>> Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size")
>> CC: Santosh Shukla <santosh.shukla at caviumnetworks.com>
>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
>> Acked-by: Santosh Shukla <santosh.shukla at caviumnetworks.com>
>> Acked-by: Eelco Chaudron <echaudro at redhat.com>
>> ---
>>  lib/netdev-dpdk.c | 56 +++++++++++++++++++++++++++++++++++++------------------
>>  1 file changed, 38 insertions(+), 18 deletions(-)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index f91aa27..11659eb 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -88,10 +88,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>>  #define MTU_TO_MAX_FRAME_LEN(mtu)   ((mtu) + ETHER_HDR_MAX_LEN)
>>  #define FRAME_LEN_TO_MTU(frame_len) ((frame_len)                    \
>>                                       - ETHER_HDR_LEN - ETHER_CRC_LEN)
>> -#define MBUF_SIZE(mtu)              ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) \
>> -                                             + sizeof(struct dp_packet) \
>> -                                             + RTE_PKTMBUF_HEADROOM),   \
>> -                                             RTE_CACHE_LINE_SIZE)
>>  #define NETDEV_DPDK_MBUF_ALIGN      1024
>>  #define NETDEV_DPDK_MAX_PKT_LEN     9728
>>  
>> @@ -637,7 +633,11 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>>      char mp_name[RTE_MEMPOOL_NAMESIZE];
>>      const char *netdev_name = netdev_get_name(&dev->up);
>>      int socket_id = dev->requested_socket_id;
>> -    uint32_t n_mbufs;
>> +    uint32_t n_mbufs = 0;
>> +    uint32_t mbuf_size = 0;
>> +    uint32_t aligned_mbuf_size = 0;
>> +    uint32_t mbuf_priv_data_len = 0;
>> +    uint32_t pkt_size = 0;
>>      uint32_t hash = hash_string(netdev_name, 0);
>>      struct dpdk_mp *dmp = NULL;
>>      int ret;
>> @@ -650,6 +650,9 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>>      dmp->mtu = mtu;
>>      dmp->refcount = 1;
>>  
>> +    /* Get the size of each mbuf, based on the MTU */
>> +    mbuf_size = dpdk_buf_size(dev->requested_mtu);
> 
> 1. You're using 'requested_mtu' here directly, which is different from
>    the 'mtu' passed to this function. This looks strange and misleading
>    to the reader. Could you unify this?

I see your point. The `mtu` passed to dpdk_mp_create() is the result of
applying the FRAME_LEN_TO_MTU macro to the buf_size already (in
netdev_dpdk_mempool_configure()). So, I can use the reverse,
MTU_TO_FRAME_LEN, to get back the mbuf_size. How does that sound?

> 
> 2. Almost the same as the previous concern from Flavio.
>    'mbuf_size' accounts for RTE_PKTMBUF_HEADROOM here and is rounded up.

Thanks, I've replied below.

> 
>> +
>>      n_mbufs = dpdk_calculate_mbufs(dev, mtu, per_port_mp);
>>  
>>      do {
>> @@ -661,8 +664,8 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>>           * so this is not an issue for tasks such as debugging.
>>           */
>>          ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
>> -                           "ovs%08x%02d%05d%07u",
>> -                           hash, socket_id, mtu, n_mbufs);
>> +                       "ovs%08x%02d%05d%07u",
>> +                        hash, socket_id, mtu, n_mbufs);
>>          if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
>>              VLOG_DBG("snprintf returned %d. "
>>                       "Failed to generate a mempool name for \"%s\". "
>> @@ -671,17 +674,34 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>>              break;
>>          }
>>  
>> -        VLOG_DBG("Port %s: Requesting a mempool of %u mbufs "
>> -                  "on socket %d for %d Rx and %d Tx queues.",
>> -                  netdev_name, n_mbufs, socket_id,
>> -                  dev->requested_n_rxq, dev->requested_n_txq);
>> -
>> -        dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs,
>> -                                          MP_CACHE_SZ,
>> -                                          sizeof (struct dp_packet)
>> -                                          - sizeof (struct rte_mbuf),
>> -                                          MBUF_SIZE(mtu)
>> -                                          - sizeof(struct dp_packet),
>> +        VLOG_DBG("Port %s: Requesting a mempool of %u mbufs of size %u "
>> +                  "on socket %d for %d Rx and %d Tx queues, "
>> +                  "cache line size of %u",
>> +                  netdev_name, n_mbufs, mbuf_size, socket_id,
>> +                  dev->requested_n_rxq, dev->requested_n_txq,
>> +                  RTE_CACHE_LINE_SIZE);
>> +
>> +        mbuf_priv_data_len = sizeof(struct dp_packet) -
>> +                                 sizeof(struct rte_mbuf);
>> +        /* The size of the entire dp_packet. */
>> +        pkt_size = sizeof (struct dp_packet) +
>> +                                mbuf_size + RTE_PKTMBUF_HEADROOM;
> 
> 2* Here RTE_PKTMBUF_HEADROOM accounted again. That is unclear.

This had already been removed locally; I only noticed after sending the
patchset.

Tiago.

> 
>> +        /* mbuf size, rounded up to cacheline size. */
>> +        aligned_mbuf_size = ROUND_UP(pkt_size, RTE_CACHE_LINE_SIZE);
>> +        /* If there is a size discrepancy, add padding to mbuf_priv_data_len.
>> +         * This maintains mbuf size cache alignment, while also honoring RX
>> +         * buffer alignment in the data portion of the mbuf. If this adjustment
>> +         * is not made, there is a possibility later on that for an element of
>> +         * the mempool, buf, buf->data_len < (buf->buf_len - buf->data_off).
>> +         * This is problematic in the case of multi-segment mbufs, particularly
>> +         * when an mbuf segment needs to be resized (when [push|popp]ing a VLAN
>> +         * header, for example).
>> +         */
>> +        mbuf_priv_data_len += (aligned_mbuf_size - pkt_size);
>> +
>> +        dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs, MP_CACHE_SZ,
>> +                                          mbuf_priv_data_len,
>> +                                          mbuf_size,
>>                                            socket_id);
>>  
>>          if (dmp->mp) {
>>

