[ovs-dev] [PATCH v5 01/14] netdev-dpdk: fix mbuf sizing
Ian Stokes
ian.stokes at intel.com
Thu Jul 12 13:37:00 UTC 2018
On 7/11/2018 7:23 PM, Tiago Lam wrote:
> From: Mark Kavanagh <mark.b.kavanagh at intel.com>
>
> There are numerous factors that must be considered when calculating
> the size of an mbuf:
> - the data portion of the mbuf must be sized in accordance With Rx
> buffer alignment (typically 1024B). So, for example, in order to
> successfully receive and capture a 1500B packet, mbufs with a
> data portion of size 2048B must be used.
> - in OvS, the elements that comprise an mbuf are:
> * the dp packet, which includes a struct rte mbuf (704B)
> * RTE_PKTMBUF_HEADROOM (128B)
> * packet data (aligned to 1k, as previously described)
> * RTE_PKTMBUF_TAILROOM (typically 0)
>
> Some PMDs require that the total mbuf size (i.e. the total sum of all
> of the above-listed components' lengths) is cache-aligned. To satisfy
> this requirement, it may be necessary to round up the total mbuf size
> with respect to cacheline size. In doing so, it's possible that the
> dp_packet's data portion is inadvertently increased in size, such that
> it no longer adheres to Rx buffer alignment. Consequently, the
> following property of the mbuf no longer holds true:
>
> mbuf.data_len == mbuf.buf_len - mbuf.data_off
>
> This creates a problem in the case of multi-segment mbufs, where that
> assumption is assumed to be true for all but the final segment in an
> mbuf chain. Resolve this issue by adjusting the size of the mbuf's
> private data portion, as opposed to the packet data portion when
> aligning mbuf size to cachelines.
Hi Tiago,
with this patch I still don't see mbuf.data_len == mbuf.buf_len -
mbuf.data_off to be true.
I've tested with both Jumbo frames and non jumbo packets by examining
the mbufs on both tx and rx. mbuf.data_len is always smaller than
mbuf.buf_len - mbuf.data_off.
Maybe I've missed something here?
>
> Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization")
> Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size")
> CC: Santosh Shukla <santosh.shukla at caviumnetworks.com>
> Signed-off-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
> Acked-by: Santosh Shukla <santosh.shukla at caviumnetworks.com>
> Acked-by: Eelco Chaudron <echaudro at redhat.com>
> ---
> lib/netdev-dpdk.c | 56 +++++++++++++++++++++++++++++++++++++------------------
> 1 file changed, 38 insertions(+), 18 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index bb4d60f..949b87b 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -88,10 +88,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> #define MTU_TO_MAX_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_MAX_LEN)
> #define FRAME_LEN_TO_MTU(frame_len) ((frame_len) \
> - ETHER_HDR_LEN - ETHER_CRC_LEN)
> -#define MBUF_SIZE(mtu) ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) \
> - + sizeof(struct dp_packet) \
> - + RTE_PKTMBUF_HEADROOM), \
> - RTE_CACHE_LINE_SIZE)
> #define NETDEV_DPDK_MBUF_ALIGN 1024
> #define NETDEV_DPDK_MAX_PKT_LEN 9728
>
> @@ -637,7 +633,11 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
> char mp_name[RTE_MEMPOOL_NAMESIZE];
> const char *netdev_name = netdev_get_name(&dev->up);
> int socket_id = dev->requested_socket_id;
> - uint32_t n_mbufs;
> + uint32_t n_mbufs = 0;
> + uint32_t mbuf_size = 0;
> + uint32_t aligned_mbuf_size = 0;
> + uint32_t mbuf_priv_data_len = 0;
> + uint32_t pkt_size = 0;
> uint32_t hash = hash_string(netdev_name, 0);
> struct dpdk_mp *dmp = NULL;
> int ret;
> @@ -650,6 +650,9 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
> dmp->mtu = mtu;
> dmp->refcount = 1;
>
> + /* Get the size of each mbuf, based on the MTU */
> + mbuf_size = dpdk_buf_size(dev->requested_mtu);
> +
> n_mbufs = dpdk_calculate_mbufs(dev, mtu, per_port_mp);
>
> do {
> @@ -661,8 +664,8 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
> * so this is not an issue for tasks such as debugging.
> */
> ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
> - "ovs%08x%02d%05d%07u",
> - hash, socket_id, mtu, n_mbufs);
> + "ovs%08x%02d%05d%07u",
> + hash, socket_id, mtu, n_mbufs);
> if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
> VLOG_DBG("snprintf returned %d. "
> "Failed to generate a mempool name for \"%s\". "
> @@ -671,17 +674,34 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
> break;
> }
>
> - VLOG_DBG("Port %s: Requesting a mempool of %u mbufs "
> - "on socket %d for %d Rx and %d Tx queues.",
> - netdev_name, n_mbufs, socket_id,
> - dev->requested_n_rxq, dev->requested_n_txq);
> -
> - dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs,
> - MP_CACHE_SZ,
> - sizeof (struct dp_packet)
> - - sizeof (struct rte_mbuf),
> - MBUF_SIZE(mtu)
> - - sizeof(struct dp_packet),
> + VLOG_DBG("Port %s: Requesting a mempool of %u mbufs of size %u "
> + "on socket %d for %d Rx and %d Tx queues, "
> + "cache line size of %u",
> + netdev_name, n_mbufs, mbuf_size, socket_id,
> + dev->requested_n_rxq, dev->requested_n_txq,
> + RTE_CACHE_LINE_SIZE);
> +
> + mbuf_priv_data_len = sizeof(struct dp_packet) -
> + sizeof(struct rte_mbuf);
> + /* The size of the entire dp_packet. */
> + pkt_size = sizeof (struct dp_packet) +
> + mbuf_size + RTE_PKTMBUF_HEADROOM;
> + /* mbuf size, rounded up to cacheline size. */
> + aligned_mbuf_size = ROUND_UP(pkt_size, RTE_CACHE_LINE_SIZE);
> + /* If there is a size discrepancy, add padding to mbuf_priv_data_len.
> + * This maintains mbuf size cache alignment, while also honoring RX
> + * buffer alignment in the data portion of the mbuf. If this adjustment
> + * is not made, there is a possiblity later on that for an element of
> + * the mempool, buf, buf->data_len < (buf->buf_len - buf->data_off).
> + * This is problematic in the case of multi-segment mbufs, particularly
> + * when an mbuf segment needs to be resized (when [push|popp]ing a VLAN
> + * header, for example.
> + */
> + mbuf_priv_data_len += (aligned_mbuf_size - pkt_size);
> +
> + dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs, MP_CACHE_SZ,
> + mbuf_priv_data_len,
> + mbuf_size + RTE_PKTMBUF_HEADROOM,
> socket_id);
>
> if (dmp->mp) {
>
More information about the dev
mailing list