[ovs-dev] [RFC PATCH V4 1/9] netdev-dpdk: fix mbuf sizing
Chandran, Sugesh
sugesh.chandran at intel.com
Tue Dec 12 20:42:27 UTC 2017
Regards
_Sugesh
> -----Original Message-----
> From: Kavanagh, Mark B
> Sent: Tuesday, December 12, 2017 11:52 AM
> To: Chandran, Sugesh <sugesh.chandran at intel.com>; dev at openvswitch.org;
> qiudayu at chinac.com
> Cc: Stokes, Ian <ian.stokes at intel.com>; Loftus, Ciara <ciara.loftus at intel.com>;
> santosh.shukla at caviumnetworks.com
> Subject: RE: [ovs-dev][RFC PATCH V4 1/9] netdev-dpdk: fix mbuf sizing
>
> >From: Chandran, Sugesh
> >Sent: Friday, December 8, 2017 5:48 PM
> >To: Kavanagh, Mark B <mark.b.kavanagh at intel.com>; dev at openvswitch.org;
> >qiudayu at chinac.com
> >Cc: Stokes, Ian <ian.stokes at intel.com>; Loftus, Ciara
> ><ciara.loftus at intel.com>; santosh.shukla at caviumnetworks.com
> >Subject: RE: [ovs-dev][RFC PATCH V4 1/9] netdev-dpdk: fix mbuf sizing
> >
> >
> >
> >Regards
> >_Sugesh
> >
> >
> >> -----Original Message-----
> >> From: Kavanagh, Mark B
> >> Sent: Friday, December 8, 2017 12:02 PM
> >> To: dev at openvswitch.org; qiudayu at chinac.com
> >> Cc: Stokes, Ian <ian.stokes at intel.com>; Loftus, Ciara
> ><ciara.loftus at intel.com>;
> >> santosh.shukla at caviumnetworks.com; Chandran, Sugesh
> >> <sugesh.chandran at intel.com>; Kavanagh, Mark B
> >> <mark.b.kavanagh at intel.com>
> >> Subject: [ovs-dev][RFC PATCH V4 1/9] netdev-dpdk: fix mbuf sizing
> >>
> >> There are numerous factors that must be considered when calculating
> >> the size of an mbuf:
> >> - the data portion of the mbuf must be sized in accordance with Rx
> >>   buffer alignment (typically 1024B). So, for example, in order to
> >>   successfully receive and capture a 1500B packet, mbufs with a
> >>   data portion of size 2048B must be used.
> >> - in OvS, the elements that comprise an mbuf are (see the worked
> >>   example below):
> >>   * the dp_packet, which includes a struct rte_mbuf (704B)
> >>   * RTE_PKTMBUF_HEADROOM (128B)
> >>   * packet data (aligned to 1k, as previously described)
> >>   * RTE_PKTMBUF_TAILROOM (typically 0)
> >>
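> >> As a rough worked example (assuming a 1500B MTU, i.e. a 1526B max
> >> frame length once Ethernet, VLAN and CRC overhead is added, and the
> >> element sizes listed above):
> >>
> >>     pkt data room   = ROUND_UP(1526 + 128, 1024);  /* 2048B */
> >>     total mbuf size = 704 + 2048 + 128;            /* 2880B */
> >>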
> >> Some PMDs require that the total mbuf size (i.e. the total sum of all
> >> of the above-listed components' lengths) is cache-aligned. To satisfy
> >> this requirement, it may be necessary to round up the total mbuf size
> >> with respect to cacheline size. In doing so, it's possible that the
> >> dp_packet's data portion is inadvertently increased in size, such
> >> that it no longer adheres to Rx buffer alignment. Consequently, the
> >> following property of the mbuf no longer holds true:
> >>
> >> mbuf.data_len == mbuf.buf_len - mbuf.data_off
> >>
> >> This creates a problem in the case of multi-segment mbufs, where that
> >> property is assumed to hold true for all but the final segment in an
> >> mbuf chain. Resolve this issue by adjusting the size of the mbuf's
> >> private data portion, as opposed to the packet data portion, when
> >> aligning mbuf size to cachelines.
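> >>
> >> Continuing the worked example above on a platform with 128B
> >> cachelines (same assumed sizes), the adjustment amounts to:
> >>
> >>     aligned_mbuf_size   = ROUND_UP(2880, 128);  /* 2944B */
> >>     mbuf_priv_data_len += 2944 - 2880;          /* 64B of padding */
> >>
> >> The 64B of padding thus lands in the mbuf's private area, while the
> >> packet data room stays at 2048B, i.e. still Rx-buffer-aligned.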
> >>
> >> Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization")
> >> Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size")
> >> CC: Santosh Shukla <santosh.shukla at caviumnetworks.com>
> >> Signed-off-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
> >> ---
> >>  lib/netdev-dpdk.c | 46 ++++++++++++++++++++++++++++++----------------
> >>  1 file changed, 30 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >> index 9715c39..4167497 100644
> >> --- a/lib/netdev-dpdk.c
> >> +++ b/lib/netdev-dpdk.c
> >> @@ -81,12 +81,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> >>                                       + (2 * VLAN_HEADER_LEN))
> >>  #define MTU_TO_FRAME_LEN(mtu)       ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
> >>  #define MTU_TO_MAX_FRAME_LEN(mtu)   ((mtu) + ETHER_HDR_MAX_LEN)
> >> -#define FRAME_LEN_TO_MTU(frame_len) ((frame_len)                    \
> >> -                                     - ETHER_HDR_LEN - ETHER_CRC_LEN)
> >> -#define MBUF_SIZE(mtu)              ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) \
> >> -                                             + sizeof(struct dp_packet) \
> >> -                                             + RTE_PKTMBUF_HEADROOM),   \
> >> -                                             RTE_CACHE_LINE_SIZE)
> >>  #define NETDEV_DPDK_MBUF_ALIGN      1024
> >>  #define NETDEV_DPDK_MAX_PKT_LEN     9728
> >> 
> >> @@ -447,7 +441,7 @@ is_dpdk_class(const struct netdev_class *class)
> >>   * behaviour, which reduces performance. To prevent this, use a buffer size
> >>   * that is closest to 'mtu', but which satisfies the aforementioned criteria.
> >>   */
> >> -static uint32_t
> >> +static uint16_t
> >>  dpdk_buf_size(int mtu)
> >>  {
> >>      return ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM),
> >> @@ -486,7 +480,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp OVS_UNUSED,
> >>   * - a new mempool was just created;
> >>   * - a matching mempool already exists. */
> >>  static struct rte_mempool *
> >> -dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
> >> +dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len)
> >>  {
> >>      char mp_name[RTE_MEMPOOL_NAMESIZE];
> >>      const char *netdev_name = netdev_get_name(&dev->up);
> >> @@ -494,6 +488,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
> >>      uint32_t n_mbufs;
> >>      uint32_t hash = hash_string(netdev_name, 0);
> >>      struct rte_mempool *mp = NULL;
> >> +    uint16_t mbuf_size, aligned_mbuf_size, mbuf_priv_data_len;
> >> 
> >>      /*
> >>       * XXX: rough estimation of number of mbufs required for this port:
> >> @@ -513,12 +508,13 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
> >>       * longer than RTE_MEMPOOL_NAMESIZE. */
> >>      int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
> >>                         "ovs%08x%02d%05d%07u",
> >> -                       hash, socket_id, mtu, n_mbufs);
> >> +                       hash, socket_id, mbuf_pkt_data_len, n_mbufs);
> >>      if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
> >>          VLOG_DBG("snprintf returned %d. "
> >>                   "Failed to generate a mempool name for \"%s\". "
> >> -                 "Hash:0x%x, socket_id: %d, mtu:%d, mbufs:%u.",
> >> -                 ret, netdev_name, hash, socket_id, mtu, n_mbufs);
> >> +                 "Hash:0x%x, socket_id: %d, pkt data room:%d, mbufs:%u.",
> >> +                 ret, netdev_name, hash, socket_id, mbuf_pkt_data_len,
> >> +                 n_mbufs);
> >>          break;
> >>      }
> >>
> >> @@ -527,13 +523,31 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
> >>               netdev_name, n_mbufs, socket_id,
> >>               dev->requested_n_rxq, dev->requested_n_txq);
> >> 
> >> +        mbuf_priv_data_len = sizeof(struct dp_packet) -
> >> +                             sizeof(struct rte_mbuf);
> >> +        /* The size of the entire mbuf. */
> >> +        mbuf_size = sizeof (struct dp_packet) +
> >> +                    mbuf_pkt_data_len + RTE_PKTMBUF_HEADROOM;
> >> +        /* mbuf size, rounded up to cacheline size. */
> >> +        aligned_mbuf_size = ROUND_UP(mbuf_size, RTE_CACHE_LINE_SIZE);
> >> +        /* If there is a size discrepancy, add padding to mbuf_priv_data_len.
> >> +         * This maintains mbuf size cache alignment, while also honoring RX
> >> +         * buffer alignment in the data portion of the mbuf. If this
> >> +         * adjustment is not made, there is a possibility later on that for an
> >> +         * element of the mempool, buf, buf->data_len < (buf->buf_len -
> >> +         * buf->data_off). This is problematic in the case of multi-segment
> >> +         * mbufs, particularly when an mbuf segment needs to be resized (when
> >> +         * [push|pop]ing a VLAN header, for example). */
> >> +        mbuf_priv_data_len += (aligned_mbuf_size - mbuf_size);
> >> +
> >>          mp = rte_pktmbuf_pool_create(mp_name, n_mbufs, MP_CACHE_SZ,
> >> -                 sizeof (struct dp_packet) - sizeof (struct rte_mbuf),
> >> -                 MBUF_SIZE(mtu) - sizeof(struct dp_packet), socket_id);
> >> +                 mbuf_priv_data_len,
> >> +                 mbuf_pkt_data_len + RTE_PKTMBUF_HEADROOM,
> >> +                 socket_id);
> >[Sugesh] Is there any chance this will cause any performance drop?
> >The earlier implementation made the data cache-aligned, so the
> >packet data handling is more cache friendly. Now the alignment is
> >moved to the private data, which may
>
> Hi Sugesh,
>
> The size adjustment only comes into play for 128B cachelines, in which case
> it actually ensures that the packet data within the mbuf is cache-aligned.
> Without this adjustment, alignment is achieved by adding space to the data
> portion of the mbuf, but this introduces a 64B (i.e. half a cacheline) gap
> between the tail of the dp_packet and the mbuf's headroom. By increasing the
> size of the private data instead (by 64B), the start of the mbuf data is once
> again aligned to a 128B cacheline. For 64B cachelines it doesn't make a
> difference, so no performance degradation is expected, nor have I observed any.
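>
> In terms of offsets within a mempool element (assuming a 128B struct
> rte_mbuf, and hence 576B of private data in the 704B dp_packet), the data
> buffer then starts at:
>
>     without adjustment: 128 + 576      = 704B   (704 % 128 == 64)
>     with adjustment:    128 + 576 + 64 = 768B   (768 % 128 == 0)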
>
> Thanks,
> Mark
[Sugesh] Thank you Mark for the confirmation. In fact, I did some quick testing and didn't find any degradation as such.
>
> >be touched/accessed by the CPU
> >very rarely. Does it make any difference in performance when running
> >different size packets? I don't see any issues when I did some basic
> >testing. However, I haven't done any testing that involved a lot of
> >packet data/header handling.
> >Do you expect anything like that?
> >>
> >>          if (mp) {
> >>              VLOG_DBG("Allocated \"%s\" mempool with %u mbufs",
> >> -                     mp_name, n_mbufs);
> >> +                    mp_name, n_mbufs);
> >>              /* rte_pktmbuf_pool_create has done some initialization of the
> >>               * rte_mbuf part of each dp_packet. Some OvS specific fields
> >>               * of the packet still need to be initialized by
> >> @@ -582,11 +596,11 @@ static int
> >>  netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
> >>      OVS_REQUIRES(dev->mutex)
> >>  {
> >> -    uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);
> >> +    uint16_t buf_size = dpdk_buf_size(dev->requested_mtu);
> >>      struct rte_mempool *mp;
> >>      int ret = 0;
> >> 
> >> -    mp = dpdk_mp_create(dev, FRAME_LEN_TO_MTU(buf_size));
> >> +    mp = dpdk_mp_create(dev, buf_size);
> >>      if (!mp) {
> >>          VLOG_ERR("Failed to create memory pool for netdev "
> >>                   "%s, with MTU %d on socket %d: %s\n",
> >> --
> >> 1.9.3