[ovs-dev] [PATCH v11 01/14] netdev-dpdk: fix mbuf sizing

Ilya Maximets i.maximets at samsung.com
Wed Oct 17 16:30:46 UTC 2018


On 10.10.2018 19:22, Tiago Lam wrote:
> From: Mark Kavanagh <mark.b.kavanagh at intel.com>
> 
> There are numerous factors that must be considered when calculating
> the size of an mbuf:
> - the data portion of the mbuf must be sized in accordance With Rx
>   buffer alignment (typically 1024B). So, for example, in order to
>   successfully receive and capture a 1500B packet, mbufs with a
>   data portion of size 2048B must be used.
> - in OvS, the elements that comprise an mbuf are:
>   * the dp packet, which includes a struct rte mbuf (704B)
>   * RTE_PKTMBUF_HEADROOM (128B)
>   * packet data (aligned to 1k, as previously described)
>   * RTE_PKTMBUF_TAILROOM (typically 0)
> 
> Some PMDs require that the total mbuf size (i.e. the total sum of all
> of the above-listed components' lengths) is cache-aligned. To satisfy
> this requirement, it may be necessary to round up the total mbuf size
> with respect to cacheline size. In doing so, it's possible that the
> dp_packet's data portion is inadvertently increased in size, such that
> it no longer adheres to Rx buffer alignment. Consequently, the
> following property of the mbuf no longer holds true:
> 
>     mbuf.data_len == mbuf.buf_len - mbuf.data_off
> 
> This creates a problem in the case of multi-segment mbufs, where that
> assumption is assumed to be true for all but the final segment in an
> mbuf chain. Resolve this issue by adjusting the size of the mbuf's
> private data portion, as opposed to the packet data portion when
> aligning mbuf size to cachelines.
> 
> Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization")
> Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size")
> CC: Santosh Shukla <santosh.shukla at caviumnetworks.com>
> Signed-off-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
> Acked-by: Santosh Shukla <santosh.shukla at caviumnetworks.com>
> Acked-by: Eelco Chaudron <echaudro at redhat.com>
> ---
>  lib/netdev-dpdk.c | 56 +++++++++++++++++++++++++++++++++++++------------------
>  1 file changed, 38 insertions(+), 18 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index f91aa27..11659eb 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -88,10 +88,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>  #define MTU_TO_MAX_FRAME_LEN(mtu)   ((mtu) + ETHER_HDR_MAX_LEN)
>  #define FRAME_LEN_TO_MTU(frame_len) ((frame_len)                    \
>                                       - ETHER_HDR_LEN - ETHER_CRC_LEN)
> -#define MBUF_SIZE(mtu)              ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) \
> -                                             + sizeof(struct dp_packet) \
> -                                             + RTE_PKTMBUF_HEADROOM),   \
> -                                             RTE_CACHE_LINE_SIZE)
>  #define NETDEV_DPDK_MBUF_ALIGN      1024
>  #define NETDEV_DPDK_MAX_PKT_LEN     9728
>  
> @@ -637,7 +633,11 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>      char mp_name[RTE_MEMPOOL_NAMESIZE];
>      const char *netdev_name = netdev_get_name(&dev->up);
>      int socket_id = dev->requested_socket_id;
> -    uint32_t n_mbufs;
> +    uint32_t n_mbufs = 0;
> +    uint32_t mbuf_size = 0;
> +    uint32_t aligned_mbuf_size = 0;
> +    uint32_t mbuf_priv_data_len = 0;
> +    uint32_t pkt_size = 0;
>      uint32_t hash = hash_string(netdev_name, 0);
>      struct dpdk_mp *dmp = NULL;
>      int ret;
> @@ -650,6 +650,9 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>      dmp->mtu = mtu;
>      dmp->refcount = 1;
>  
> +    /* Get the size of each mbuf, based on the MTU */
> +    mbuf_size = dpdk_buf_size(dev->requested_mtu);

1. You're using 'requested_mtu' here directly, which is different to
   'mtu' passed to this function. This looks strange and misleading
   for reader. Could you unify this?

2  Almost same as previous concern from Flavio.
   'mbuf_size' accounts RTE_PKTMBUF_HEADROOM here and rounded up.

> +
>      n_mbufs = dpdk_calculate_mbufs(dev, mtu, per_port_mp);
>  
>      do {
> @@ -661,8 +664,8 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>           * so this is not an issue for tasks such as debugging.
>           */
>          ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
> -                           "ovs%08x%02d%05d%07u",
> -                           hash, socket_id, mtu, n_mbufs);
> +                       "ovs%08x%02d%05d%07u",
> +                        hash, socket_id, mtu, n_mbufs);
>          if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
>              VLOG_DBG("snprintf returned %d. "
>                       "Failed to generate a mempool name for \"%s\". "
> @@ -671,17 +674,34 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>              break;
>          }
>  
> -        VLOG_DBG("Port %s: Requesting a mempool of %u mbufs "
> -                  "on socket %d for %d Rx and %d Tx queues.",
> -                  netdev_name, n_mbufs, socket_id,
> -                  dev->requested_n_rxq, dev->requested_n_txq);
> -
> -        dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs,
> -                                          MP_CACHE_SZ,
> -                                          sizeof (struct dp_packet)
> -                                          - sizeof (struct rte_mbuf),
> -                                          MBUF_SIZE(mtu)
> -                                          - sizeof(struct dp_packet),
> +        VLOG_DBG("Port %s: Requesting a mempool of %u mbufs of size %u "
> +                  "on socket %d for %d Rx and %d Tx queues, "
> +                  "cache line size of %u",
> +                  netdev_name, n_mbufs, mbuf_size, socket_id,
> +                  dev->requested_n_rxq, dev->requested_n_txq,
> +                  RTE_CACHE_LINE_SIZE);
> +
> +        mbuf_priv_data_len = sizeof(struct dp_packet) -
> +                                 sizeof(struct rte_mbuf);
> +        /* The size of the entire dp_packet. */
> +        pkt_size = sizeof (struct dp_packet) +
> +                                mbuf_size + RTE_PKTMBUF_HEADROOM;

2* Here RTE_PKTMBUF_HEADROOM accounted again. That is unclear.

> +        /* mbuf size, rounded up to cacheline size. */
> +        aligned_mbuf_size = ROUND_UP(pkt_size, RTE_CACHE_LINE_SIZE);
> +        /* If there is a size discrepancy, add padding to mbuf_priv_data_len.
> +         * This maintains mbuf size cache alignment, while also honoring RX
> +         * buffer alignment in the data portion of the mbuf. If this adjustment
> +         * is not made, there is a possiblity later on that for an element of
> +         * the mempool, buf, buf->data_len < (buf->buf_len - buf->data_off).
> +         * This is problematic in the case of multi-segment mbufs, particularly
> +         * when an mbuf segment needs to be resized (when [push|popp]ing a VLAN
> +         * header, for example.
> +         */
> +        mbuf_priv_data_len += (aligned_mbuf_size - pkt_size);
> +
> +        dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs, MP_CACHE_SZ,
> +                                          mbuf_priv_data_len,
> +                                          mbuf_size,
>                                            socket_id);
>  
>          if (dmp->mp) {
> 


More information about the dev mailing list