[ovs-dev] [PATCH v11 01/14] netdev-dpdk: fix mbuf sizing
Lam, Tiago
tiago.lam at intel.com
Thu Oct 18 07:41:15 UTC 2018
Hi Ilya,
Thanks for the comments, my replies are in-line.
On 17/10/2018 17:30, Ilya Maximets wrote:
> On 10.10.2018 19:22, Tiago Lam wrote:
>> From: Mark Kavanagh <mark.b.kavanagh at intel.com>
>>
>> There are numerous factors that must be considered when calculating
>> the size of an mbuf:
>> - the data portion of the mbuf must be sized in accordance with Rx
>> buffer alignment (typically 1024B). So, for example, in order to
>> successfully receive and capture a 1500B packet, mbufs with a
>> data portion of size 2048B must be used.
>> - in OvS, the elements that comprise an mbuf are:
>> * the dp_packet, which includes a struct rte_mbuf (704B)
>> * RTE_PKTMBUF_HEADROOM (128B)
>> * packet data (aligned to 1k, as previously described)
>> * RTE_PKTMBUF_TAILROOM (typically 0)
>>
>> Some PMDs require that the total mbuf size (i.e. the total sum of all
>> of the above-listed components' lengths) is cache-aligned. To satisfy
>> this requirement, it may be necessary to round up the total mbuf size
>> with respect to cacheline size. In doing so, it's possible that the
>> dp_packet's data portion is inadvertently increased in size, such that
>> it no longer adheres to Rx buffer alignment. Consequently, the
>> following property of the mbuf no longer holds true:
>>
>> mbuf.data_len == mbuf.buf_len - mbuf.data_off
>>
>> This creates a problem in the case of multi-segment mbufs, where that
>> property is assumed to hold for all but the final segment in an
>> mbuf chain. Resolve this issue by adjusting the size of the mbuf's
>> private data portion, as opposed to the packet data portion when
>> aligning mbuf size to cachelines.
>>
>> Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization")
>> Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size")
>> CC: Santosh Shukla <santosh.shukla at caviumnetworks.com>
>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
>> Acked-by: Santosh Shukla <santosh.shukla at caviumnetworks.com>
>> Acked-by: Eelco Chaudron <echaudro at redhat.com>
>> ---
>> lib/netdev-dpdk.c | 56 +++++++++++++++++++++++++++++++++++++------------------
>> 1 file changed, 38 insertions(+), 18 deletions(-)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index f91aa27..11659eb 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -88,10 +88,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>> #define MTU_TO_MAX_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_MAX_LEN)
>> #define FRAME_LEN_TO_MTU(frame_len) ((frame_len) \
>> - ETHER_HDR_LEN - ETHER_CRC_LEN)
>> -#define MBUF_SIZE(mtu) ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) \
>> - + sizeof(struct dp_packet) \
>> - + RTE_PKTMBUF_HEADROOM), \
>> - RTE_CACHE_LINE_SIZE)
>> #define NETDEV_DPDK_MBUF_ALIGN 1024
>> #define NETDEV_DPDK_MAX_PKT_LEN 9728
>>
>> @@ -637,7 +633,11 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>> char mp_name[RTE_MEMPOOL_NAMESIZE];
>> const char *netdev_name = netdev_get_name(&dev->up);
>> int socket_id = dev->requested_socket_id;
>> - uint32_t n_mbufs;
>> + uint32_t n_mbufs = 0;
>> + uint32_t mbuf_size = 0;
>> + uint32_t aligned_mbuf_size = 0;
>> + uint32_t mbuf_priv_data_len = 0;
>> + uint32_t pkt_size = 0;
>> uint32_t hash = hash_string(netdev_name, 0);
>> struct dpdk_mp *dmp = NULL;
>> int ret;
>> @@ -650,6 +650,9 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>> dmp->mtu = mtu;
>> dmp->refcount = 1;
>>
>> + /* Get the size of each mbuf, based on the MTU */
>> + mbuf_size = dpdk_buf_size(dev->requested_mtu);
>
> 1. You're using 'requested_mtu' here directly, which is different to
> 'mtu' passed to this function. This looks strange and misleading
> for reader. Could you unify this?
I see your point. The `mtu` passed to dpdk_mp_create() is the result of
applying the FRAME_LEN_TO_MTU macro to the buf_size (in
netdev_dpdk_mempool_configure()). So, I can use the reverse,
MTU_TO_FRAME_LEN, to get back the mbuf_size. How does that sound?
>
> 2. Almost the same as the previous concern from Flavio:
> 'mbuf_size' accounts for RTE_PKTMBUF_HEADROOM here and is rounded up.
Thanks, I've replied below.
>
>> +
>> n_mbufs = dpdk_calculate_mbufs(dev, mtu, per_port_mp);
>>
>> do {
>> @@ -661,8 +664,8 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>> * so this is not an issue for tasks such as debugging.
>> */
>> ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
>> - "ovs%08x%02d%05d%07u",
>> - hash, socket_id, mtu, n_mbufs);
>> + "ovs%08x%02d%05d%07u",
>> + hash, socket_id, mtu, n_mbufs);
>> if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
>> VLOG_DBG("snprintf returned %d. "
>> "Failed to generate a mempool name for \"%s\". "
>> @@ -671,17 +674,34 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
>> break;
>> }
>>
>> - VLOG_DBG("Port %s: Requesting a mempool of %u mbufs "
>> - "on socket %d for %d Rx and %d Tx queues.",
>> - netdev_name, n_mbufs, socket_id,
>> - dev->requested_n_rxq, dev->requested_n_txq);
>> -
>> - dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs,
>> - MP_CACHE_SZ,
>> - sizeof (struct dp_packet)
>> - - sizeof (struct rte_mbuf),
>> - MBUF_SIZE(mtu)
>> - - sizeof(struct dp_packet),
>> + VLOG_DBG("Port %s: Requesting a mempool of %u mbufs of size %u "
>> + "on socket %d for %d Rx and %d Tx queues, "
>> + "cache line size of %u",
>> + netdev_name, n_mbufs, mbuf_size, socket_id,
>> + dev->requested_n_rxq, dev->requested_n_txq,
>> + RTE_CACHE_LINE_SIZE);
>> +
>> + mbuf_priv_data_len = sizeof(struct dp_packet) -
>> + sizeof(struct rte_mbuf);
>> + /* The size of the entire dp_packet. */
>> + pkt_size = sizeof (struct dp_packet) +
>> + mbuf_size + RTE_PKTMBUF_HEADROOM;
>
> 2*. Here RTE_PKTMBUF_HEADROOM is accounted for again. That is unclear.
This had already been removed locally; I only noticed after sending the
patchset.
Tiago.
>
>> + /* mbuf size, rounded up to cacheline size. */
>> + aligned_mbuf_size = ROUND_UP(pkt_size, RTE_CACHE_LINE_SIZE);
>> + /* If there is a size discrepancy, add padding to mbuf_priv_data_len.
>> + * This maintains mbuf size cache alignment, while also honoring RX
>> + * buffer alignment in the data portion of the mbuf. If this adjustment
>> * is not made, there is a possibility later on that for an element of
>> + * the mempool, buf, buf->data_len < (buf->buf_len - buf->data_off).
>> + * This is problematic in the case of multi-segment mbufs, particularly
>> * when an mbuf segment needs to be resized (when [push|popp]ing a VLAN
>> * header, for example).
>> + */
>> + mbuf_priv_data_len += (aligned_mbuf_size - pkt_size);
>> +
>> + dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs, MP_CACHE_SZ,
>> + mbuf_priv_data_len,
>> + mbuf_size,
>> socket_id);
>>
>> if (dmp->mp) {
>>