[ovs-dev] [RFC PATCH V4 1/9] netdev-dpdk: fix mbuf sizing

Chandran, Sugesh sugesh.chandran at intel.com
Tue Dec 12 20:43:49 UTC 2017



Regards
_Sugesh


> -----Original Message-----
> From: Chandran, Sugesh
> Sent: Tuesday, December 12, 2017 8:42 PM
> To: Kavanagh, Mark B <mark.b.kavanagh at intel.com>; dev at openvswitch.org;
> qiudayu at chinac.com
> Cc: Stokes, Ian <ian.stokes at intel.com>; Loftus, Ciara <ciara.loftus at intel.com>;
> santosh.shukla at caviumnetworks.com
> Subject: RE: [ovs-dev][RFC PATCH V4 1/9] netdev-dpdk: fix mbuf sizing
> 
> 
> 
> Regards
> _Sugesh
> 
> 
> > -----Original Message-----
> > From: Kavanagh, Mark B
> > Sent: Tuesday, December 12, 2017 11:52 AM
> > To: Chandran, Sugesh <sugesh.chandran at intel.com>; dev at openvswitch.org;
> > qiudayu at chinac.com
> > Cc: Stokes, Ian <ian.stokes at intel.com>; Loftus, Ciara
> > <ciara.loftus at intel.com>; santosh.shukla at caviumnetworks.com
> > Subject: RE: [ovs-dev][RFC PATCH V4 1/9] netdev-dpdk: fix mbuf sizing
> >
> > >From: Chandran, Sugesh
> > >Sent: Friday, December 8, 2017 5:48 PM
> > >To: Kavanagh, Mark B <mark.b.kavanagh at intel.com>;
> > >dev at openvswitch.org; qiudayu at chinac.com
> > >Cc: Stokes, Ian <ian.stokes at intel.com>; Loftus, Ciara
> > ><ciara.loftus at intel.com>; santosh.shukla at caviumnetworks.com
> > >Subject: RE: [ovs-dev][RFC PATCH V4 1/9] netdev-dpdk: fix mbuf sizing
> > >
> > >
> > >
> > >Regards
> > >_Sugesh
> > >
> > >
> > >> -----Original Message-----
> > >> From: Kavanagh, Mark B
> > >> Sent: Friday, December 8, 2017 12:02 PM
> > >> To: dev at openvswitch.org; qiudayu at chinac.com
> > >> Cc: Stokes, Ian <ian.stokes at intel.com>; Loftus, Ciara
> > ><ciara.loftus at intel.com>;
> > >> santosh.shukla at caviumnetworks.com; Chandran, Sugesh
> > >> <sugesh.chandran at intel.com>; Kavanagh, Mark B
> > >> <mark.b.kavanagh at intel.com>
> > >> Subject: [ovs-dev][RFC PATCH V4 1/9] netdev-dpdk: fix mbuf sizing
> > >>
> > >> There are numerous factors that must be considered when calculating
> > >> the size of an mbuf:
> > >> - the data portion of the mbuf must be sized in accordance with Rx
> > >>   buffer alignment (typically 1024B). So, for example, in order to
> > >>   successfully receive and capture a 1500B packet, mbufs with a
> > >>   data portion of size 2048B must be used.
> > >> - in OvS, the elements that comprise an mbuf are:
> > >>   * the dp_packet, which includes a struct rte_mbuf (704B)
> > >>   * RTE_PKTMBUF_HEADROOM (128B)
> > >>   * packet data (aligned to 1k, as previously described)
> > >>   * RTE_PKTMBUF_TAILROOM (typically 0)
> > >>
> > >> Some PMDs require that the total mbuf size (i.e. the total sum of
> > >> all of the above-listed components' lengths) is cache-aligned. To
> > >> satisfy this requirement, it may be necessary to round up the total
> > >> mbuf size with respect to cacheline size. In doing so, it's
> > >> possible that the dp_packet's data portion is inadvertently
> > >> increased in size, such that it no longer adheres to Rx buffer
> > >> alignment. Consequently, the following property of the mbuf no longer
> > >> holds true:
> > >>
> > >>     mbuf.data_len == mbuf.buf_len - mbuf.data_off
> > >>
> > >> This creates a problem in the case of multi-segment mbufs, where
> > >> that property is assumed to hold for all but the final segment
> > >> in an mbuf chain. Resolve this issue by adjusting the size of the
> > >> mbuf's private data portion, as opposed to the packet data portion,
> > >> when aligning mbuf size to cachelines.
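> > >>
> > >> To make the arithmetic concrete, the adjustment amounts to something like
> > >> the following sketch (illustration only, not part of the patch; the 704B
> > >> dp_packet, 128B headroom and 2048B data size are assumed example values
> > >> for a 1500B MTU on a 128B-cacheline platform; ROUND_UP is OvS's existing
> > >> macro):
> > >>
> > >>     uint16_t pkt_data_len = 2048;                /* Rx-buffer-aligned data size */
> > >>     uint16_t priv_len     = 704 - 128;           /* dp_packet minus rte_mbuf    */
> > >>     uint16_t mbuf_size    = 704 + pkt_data_len + 128;   /* = 2880 */
> > >>     uint16_t aligned_size = ROUND_UP(mbuf_size, 128);   /* = 2944 */
> > >>     priv_len += aligned_size - mbuf_size;        /* absorb the 64B of slack */
> > >>
> > >> i.e. the rounding slack is added to the mbuf's private area rather than to
> > >> the packet data room, so the data room remains Rx-buffer-aligned.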
> > >>
> > >> Fixes: 4be4d22 ("netdev-dpdk: clean up mbuf initialization")
> > >> Fixes: 31b88c9 ("netdev-dpdk: round up mbuf_size to cache_line_size")
> > >> CC: Santosh Shukla <santosh.shukla at caviumnetworks.com>
> > >> Signed-off-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
> > >> ---
> > >>  lib/netdev-dpdk.c | 46 ++++++++++++++++++++++++++++++----------------
> > >>  1 file changed, 30 insertions(+), 16 deletions(-)
> > >>
> > >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > >> index 9715c39..4167497 100644
> > >> --- a/lib/netdev-dpdk.c
> > >> +++ b/lib/netdev-dpdk.c
> > >> @@ -81,12 +81,6 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> > >>                                       + (2 * VLAN_HEADER_LEN))
> > >>  #define MTU_TO_FRAME_LEN(mtu)       ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
> > >>  #define MTU_TO_MAX_FRAME_LEN(mtu)   ((mtu) + ETHER_HDR_MAX_LEN)
> > >> -#define FRAME_LEN_TO_MTU(frame_len) ((frame_len)                    \
> > >> -                                     - ETHER_HDR_LEN - ETHER_CRC_LEN)
> > >> -#define MBUF_SIZE(mtu)              ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu)  \
> > >> -                                             + sizeof(struct dp_packet) \
> > >> -                                             + RTE_PKTMBUF_HEADROOM),   \
> > >> -                                             RTE_CACHE_LINE_SIZE)
> > >>  #define NETDEV_DPDK_MBUF_ALIGN      1024
> > >>  #define NETDEV_DPDK_MAX_PKT_LEN     9728
> > >>
> > >> @@ -447,7 +441,7 @@ is_dpdk_class(const struct netdev_class *class)
> > >>   * behaviour, which reduces performance. To prevent this, use a buffer size
> > >>   * that is closest to 'mtu', but which satisfies the aforementioned criteria.
> > >>   */
> > >> -static uint32_t
> > >> +static uint16_t
> > >>  dpdk_buf_size(int mtu)
> > >>  {
> > >>      return ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM),
> > >> @@ -486,7 +480,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp OVS_UNUSED,
> > >>   *  - a new mempool was just created;
> > >>   *  - a matching mempool already exists. */
> > >>  static struct rte_mempool *
> > >> -dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
> > >> +dpdk_mp_create(struct netdev_dpdk *dev, uint16_t mbuf_pkt_data_len)
> > >>  {
> > >>      char mp_name[RTE_MEMPOOL_NAMESIZE];
> > >>      const char *netdev_name = netdev_get_name(&dev->up);
> > >> @@ -494,6 +488,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
> > >>      uint32_t n_mbufs;
> > >>      uint32_t hash = hash_string(netdev_name, 0);
> > >>      struct rte_mempool *mp = NULL;
> > >> +    uint16_t mbuf_size, aligned_mbuf_size, mbuf_priv_data_len;
> > >>
> > >>      /*
> > >>       * XXX: rough estimation of number of mbufs required for this port:
> > >> @@ -513,12 +508,13 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
> > >>           * longer than RTE_MEMPOOL_NAMESIZE. */
> > >>          int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE,
> > >>                             "ovs%08x%02d%05d%07u",
> > >> -                           hash, socket_id, mtu, n_mbufs);
> > >> +                           hash, socket_id, mbuf_pkt_data_len, n_mbufs);
> > >>          if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
> > >>              VLOG_DBG("snprintf returned %d. "
> > >>                       "Failed to generate a mempool name for \"%s\". "
> > >> -                     "Hash:0x%x, socket_id: %d, mtu:%d, mbufs:%u.",
> > >> -                     ret, netdev_name, hash, socket_id, mtu, n_mbufs);
> > >> +                     "Hash:0x%x, socket_id: %d, pkt data room:%d, mbufs:%u.",
> > >> +                     ret, netdev_name, hash, socket_id, mbuf_pkt_data_len,
> > >> +                     n_mbufs);
> > >>              break;
> > >>          }
> > >>
> > >> @@ -527,13 +523,31 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
> > >>                    netdev_name, n_mbufs, socket_id,
> > >>                    dev->requested_n_rxq, dev->requested_n_txq);
> > >>
> > >> +        mbuf_priv_data_len = sizeof(struct dp_packet) -
> > >> +                                 sizeof(struct rte_mbuf);
> > >> +        /* The size of the entire mbuf. */
> > >> +        mbuf_size = sizeof (struct dp_packet) +
> > >> +                                mbuf_pkt_data_len + RTE_PKTMBUF_HEADROOM;
> > >> +        /* mbuf size, rounded up to cacheline size. */
> > >> +        aligned_mbuf_size = ROUND_UP(mbuf_size, RTE_CACHE_LINE_SIZE);
> > >> +        /* If there is a size discrepancy, add padding to mbuf_priv_data_len.
> > >> +         * This maintains mbuf size cache alignment, while also honoring RX
> > >> +         * buffer alignment in the data portion of the mbuf. If this adjustment
> > >> +         * is not made, there is a possibility later on that for an element of
> > >> +         * the mempool, buf, buf->data_len < (buf->buf_len - buf->data_off).
> > >> +         * This is problematic in the case of multi-segment mbufs, particularly
> > >> +         * when an mbuf segment needs to be resized (when [push|pop]ping a VLAN
> > >> +         * header, for example).
> > >> +         */
> > >> +        mbuf_priv_data_len += (aligned_mbuf_size - mbuf_size);
> > >> +
> > >>          mp = rte_pktmbuf_pool_create(mp_name, n_mbufs, MP_CACHE_SZ,
> > >> -                 sizeof (struct dp_packet) - sizeof (struct rte_mbuf),
> > >> -                 MBUF_SIZE(mtu) - sizeof(struct dp_packet), socket_id);
> > >> +                mbuf_priv_data_len,
> > >> +                mbuf_pkt_data_len + RTE_PKTMBUF_HEADROOM, socket_id);
> > >[Sugesh] Is there any chance this will cause a performance drop?
> > >The earlier implementation made the data cache-aligned, so packet data
> > >handling was more cache-friendly. Now the alignment padding has moved to
> > >the private data, which may
> >
> > Hi Sugesh,
> >
> > The size adjustment only affects 128B cachelines, in which case it
> > actually ensures that the packet data within the mbuf is cache
> > aligned. Without this adjustment, alignment is achieved by adding
> > space to the data portion of the mbuf, but this introduces a 64B (i.e.
> > half a cacheline) gap between the tail of the dp_packet, and the
> > mbuf's headroom. By increasing the size of the private data instead
> > (by 64B), the start of the mbuf data is once again aligned to a 128B
> > cacheline. For 64B cachelines it doesn't make a difference, so no performance
> > degradation is expected, nor have I observed any.
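> >
> > To put rough numbers on that (using the 704B dp_packet figure from the
> > commit message; exact sizes depend on the build): without the adjustment,
> > the data room starts at offset 704 within the cache-aligned mempool
> > element, so packet data begins at 704 + 128 (headroom) = 832, i.e. halfway
> > through a 128B line. With the 64B of slack moved into the private area,
> > the data room starts at 768 and packet data at 768 + 128 = 896 = 7 * 128,
> > i.e. back on a 128B boundary.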
> >
> > Thanks,
> > Mark
> [Sugesh] Thank you, Mark, for the confirmation. In fact, I did some quick
> testing and didn't find any degradation as such.
> 
[Sugesh] Addendum: I will run the test cases and validate whether there are any issues when you release the next version of the patches.

Thanks!

> >
> > >be touched/accessed by the CPU very rarely. Does it make any difference
> > >in performance when running different packet sizes? I don't see any
> > >issues in the basic testing I did; however, I haven't done any testing
> > >that involves a lot of packet data/header handling.
> > >Do you expect anything like that?
> > >>
> > >>          if (mp) {
> > >>              VLOG_DBG("Allocated \"%s\" mempool with %u mbufs",
> > >> -                     mp_name, n_mbufs);
> > >> +                    mp_name, n_mbufs);
> > >>              /* rte_pktmbuf_pool_create has done some initialization of the
> > >>               * rte_mbuf part of each dp_packet. Some OvS specific fields
> > >>               * of the packet still need to be initialized by
> > >> @@ -582,11 +596,11 @@ static int
> > >>  netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
> > >>      OVS_REQUIRES(dev->mutex)
> > >>  {
> > >> -    uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);
> > >> +    uint16_t buf_size = dpdk_buf_size(dev->requested_mtu);
> > >>      struct rte_mempool *mp;
> > >>      int ret = 0;
> > >>
> > >> -    mp = dpdk_mp_create(dev, FRAME_LEN_TO_MTU(buf_size));
> > >> +    mp = dpdk_mp_create(dev, buf_size);
> > >>      if (!mp) {
> > >>          VLOG_ERR("Failed to create memory pool for netdev "
> > >>                   "%s, with MTU %d on socket %d: %s\n",
> > >> --
> > >> 1.9.3


