[ovs-dev] [PATCH v1] dpdk: Support both shared and per port mempools.

Kevin Traynor ktraynor at redhat.com
Tue Jun 26 14:33:09 UTC 2018


On 06/25/2018 12:56 PM, Ian Stokes wrote:
> This commit re-introduces the concept of shared mempools as the default
> memory model for DPDK devices. Per port mempools are still available but
> must be enabled explicitly by a user.
> 
> OVS previously used a shared mempool model for ports with the same MTU
> and socket configuration. This was replaced by a per port mempool model
> to address issues flagged by users such as:
> 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2016-September/042560.html
> 
> However the per port model potentially requires an increase in memory
> resource requirements to support the same number of ports and configuration
> as the shared port model.
> 
> This is considered a blocking factor for current deployments of OVS
> when upgrading to future OVS releases as a user may have to redimension
> memory for the same deployment configuration. This may not be possible for
> users.
> 
> This commit resolves the issue by re-introducing shared mempools as
> the default memory behaviour in OVS DPDK but also refactors the memory
> configuration code to allow for per port mempools.
> 
> This patch adds a new global config option, per-port-memory, that
> controls the enablement of per port mempools for DPDK devices.
> 
>     ovs-vsctl set Open_vSwitch . other_config:per-port-memory=true
> 
> This value defaults to false; to enable per port memory support,
> this field should be set to true when setting other global parameters
> on init (such as "dpdk-socket-mem", for example). Changing the value at
> runtime is not supported, and requires restarting the vswitch
> daemon.
> 
> The mempool sweep functionality is also replaced with the
> sweep functionality from OVS 2.9 found in commits
> 
> c77f692 (netdev-dpdk: Free mempool only when no in-use mbufs.)
> a7fb0a4 (netdev-dpdk: Add mempool reuse/free debug.)
> 
> A new document to discuss the specifics of the memory models and example
> memory requirement calculations is also added.
> 
> Signed-off-by: Ian Stokes <ian.stokes at intel.com>
> ---
>  Documentation/automake.mk            |   1 +
>  Documentation/intro/install/dpdk.rst |   6 +
>  Documentation/topics/dpdk/index.rst  |   1 +
>  Documentation/topics/dpdk/memory.rst | 216 +++++++++++++++++++++++++
>  NEWS                                 |   1 +
>  lib/dpdk-stub.c                      |   6 +
>  lib/dpdk.c                           |  12 ++
>  lib/dpdk.h                           |   1 +
>  lib/netdev-dpdk.c                    | 296 +++++++++++++++++++++++------------
>  vswitchd/vswitch.xml                 |  17 ++
>  10 files changed, 457 insertions(+), 100 deletions(-)
>  create mode 100644 Documentation/topics/dpdk/memory.rst
> 
> diff --git a/Documentation/automake.mk b/Documentation/automake.mk
> index bc728df..2444794 100644
> --- a/Documentation/automake.mk
> +++ b/Documentation/automake.mk
> @@ -36,6 +36,7 @@ DOC_SOURCE = \
>  	Documentation/topics/dpdk/index.rst \
>  	Documentation/topics/dpdk/bridge.rst \
>  	Documentation/topics/dpdk/jumbo-frames.rst \
> +	Documentation/topics/dpdk/memory.rst \
>  	Documentation/topics/dpdk/pdump.rst \
>  	Documentation/topics/dpdk/phy.rst \
>  	Documentation/topics/dpdk/pmd.rst \
> diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
> index 085e479..7dc75ef 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -170,6 +170,12 @@ Mount the hugepages, if not already mounted by default::
>  
>      $ mount -t hugetlbfs none /dev/hugepages``
>  
> +.. note::
> +
> +   The amount of hugepage memory required can be affected by various
> +   aspects of the datapath and device configuration. Refer to
> +   :doc:`/topics/dpdk/memory` for more details.
> +
>  .. _dpdk-vfio:
>  
>  Setup DPDK devices using VFIO
> diff --git a/Documentation/topics/dpdk/index.rst b/Documentation/topics/dpdk/index.rst
> index 181f61a..cf24a7b 100644
> --- a/Documentation/topics/dpdk/index.rst
> +++ b/Documentation/topics/dpdk/index.rst
> @@ -40,3 +40,4 @@ The DPDK Datapath
>     /topics/dpdk/qos
>     /topics/dpdk/pdump
>     /topics/dpdk/jumbo-frames
> +   /topics/dpdk/memory
> diff --git a/Documentation/topics/dpdk/memory.rst b/Documentation/topics/dpdk/memory.rst
> new file mode 100644
> index 0000000..2e0b3f5
> --- /dev/null
> +++ b/Documentation/topics/dpdk/memory.rst
> @@ -0,0 +1,216 @@
> +..
> +      Copyright (c) 2018 Intel Corporation
> +
> +      Licensed under the Apache License, Version 2.0 (the "License"); you may
> +      not use this file except in compliance with the License. You may obtain
> +      a copy of the License at
> +
> +          http://www.apache.org/licenses/LICENSE-2.0
> +
> +      Unless required by applicable law or agreed to in writing, software
> +      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
> +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
> +      License for the specific language governing permissions and limitations
> +      under the License.
> +
> +      Convention for heading levels in Open vSwitch documentation:
> +
> +      =======  Heading 0 (reserved for the title in a document)
> +      -------  Heading 1
> +      ~~~~~~~  Heading 2
> +      +++++++  Heading 3
> +      '''''''  Heading 4
> +
> +      Avoid deeper levels because they do not render well.
> +
> +=========================
> +DPDK Device Memory Models
> +=========================
> +
> +DPDK device memory can be allocated in one of two ways in OVS DPDK,
> +**shared memory** or **per port memory**. The specifics of both are
> +detailed below.
> +
> +Shared Memory
> +-------------
> +
> +By default OVS DPDK uses a shared memory model. This means that multiple
> +ports can share the same mempool. For example when a port is added it will
> +have a given MTU and socket ID associated with it. If a mempool has been
> +created previously for an existing port that has the same MTU and socket ID,
> +that mempool is used for both ports. If there is no existing mempool
> +supporting these parameters then a new mempool is created.
> +
> +Per Port Memory
> +---------------
> +
> +In the per port memory model, mempools are created per device and are not
> +shared. The benefit of this is a more transparent memory model where mempools
> +will not be exhausted by other DPDK devices. However this comes at a potential
> +increase in cost for memory dimensioning for a given deployment. Users should
> +be aware of the memory requirements for their deployment before using this
> +model and allocate the required hugepage memory.
> +
> +Per port mempool support may be enabled via a global config value,
> +```per-port-memory```. Setting this to true enables the per port memory
> +model for all DPDK devices in OVS::
> +
> +   $ ovs-vsctl set Open_vSwitch . other_config:per-port-memory=true
> +
> +.. important::
> +
> +    This value should be set before setting dpdk-init=true. If set after
> +    dpdk-init=true then the daemon must be restarted to use per-port-memory.
> +
> +Calculating Memory Requirements
> +-------------------------------
> +
> +The amount of memory required for a given mempool can be calculated by the
> +**number mbufs in the mempool \* mbuf size**.
> +
> +Users should be aware of the following:
> +
> +* The **number of mbufs** per mempool will differ between memory models.
> +
> +* The **size of each mbuf** will be affected by the requested **MTU** size.
> +
> +.. important::
> +
> +   An mbuf size in bytes can be larger than the requested MTU size due to

s/can be/is always/

> +   allignment and rounding needed in OVS DPDK.

alignment

> +
> +Below are a number of examples of memory requirement calculations for both
> +shared and per port memory models.
> +

-----8<-----

>  
> -/* Tries to allocate a new mempool - or re-use an existing one where
> - * appropriate - on requested_socket_id with a size determined by
> - * requested_mtu and requested Rx/Tx queues.
> - * On success - or when re-using an existing mempool - the new configuration
> - * will be applied.
> +/* Depending on the memory model being used this function tries

tries to

> + * identify and reuse an existing mempool or tries to allocate new
> + * mempool on requested_socket_id with mbuf size corresponding to
> + * requested_mtu. On success new configuration will be applied.
>   * On error, device will be left unchanged. */
>  static int
>  netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
>      OVS_REQUIRES(dev->mutex)
>  {
>      uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);
> -    struct rte_mempool *mp;
> +    struct dpdk_mp *dmp;
>      int ret = 0;
> +    bool per_port_mp = dpdk_per_port_memory();
>  
> -    dpdk_mp_sweep();
> +    /* With shared mempools we do not need to configure a mempool if the MTU
> +     * and socket ID have not changed, the previous configuration is still
> +     * valid so return 0 */
> +    if (!per_port_mp && dev->mtu == dev->requested_mtu
> +        && dev->socket_id == dev->requested_socket_id) {
> +        return ret;
> +    }
>  
> -    mp = dpdk_mp_create(dev, FRAME_LEN_TO_MTU(buf_size));
> -    if (!mp) {
> +    dmp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size), per_port_mp);
> +    if (!dmp) {
>          VLOG_ERR("Failed to create memory pool for netdev "
>                   "%s, with MTU %d on socket %d: %s\n",
>                   dev->up.name, dev->requested_mtu, dev->requested_socket_id,
>                   rte_strerror(rte_errno));
> -        ret = rte_errno;
> +        return rte_errno;

No need to return here and have the else below. You could set ret or
remove the else.

>      } else {
> -        /* If a new MTU was requested and its rounded value equals the one
> -         * that is currently used, then the existing mempool is returned. */
> -        if (dev->mp != mp) {
> -            /* A new mempool was created, release the previous one. */
> -            dpdk_mp_release(dev->mp);
> -        } else {
> -            ret = EEXIST;
> +        /* Check for any pre-existing dpdk_mp for the device before accessing
> +         * the associated mempool.
> +         */
> +        if (dev->dpdk_mp != NULL) {
> +            /* A new MTU was requested and its rounded value does not
> +             * equal the rounded value of the current mempool, decrement
> +             * the reference count to the devices current dpdk_mp as its
> +             * associated mempool will not be used for this device.
> +             */

The comment needs updating. You could be reusing the existing mempool,
in which case it's still ok to mp_put because you have just incremented
it...

About that, I know it was my comment to remove an unnecessary check here
:-/ but i'm wondering if the refcount inc and then dec for the EEXISTS
case is unintuitive and it would better the way you had it with set
refcount =1 in mp_get, and then have the check for "if (dev->dpdk_mp !=
dmp)" before you mp_put. What do you think?

> +            dpdk_mp_put(dev>dpdk_mp);



>          }
> -        dev->mp = mp;
> +        dev->dpdk_mp = dmp;
>          dev->mtu = dev->requested_mtu;
>          dev->socket_id = dev->requested_socket_id;
>          dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> @@ -855,7 +950,8 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>  
>          for (i = 0; i < n_rxq; i++) {
>              diag = rte_eth_rx_queue_setup(dev->port_id, i, dev->rxq_size,
> -                                          dev->socket_id, NULL, dev->mp);
> +                                          dev->socket_id, NULL,
> +                                          dev->dpdk_mp->mp);
>              if (diag) {
>                  VLOG_INFO("Interface %s unable to setup rxq(%d): %s",
>                            dev->up.name, i, rte_strerror(-diag));
> @@ -950,7 +1046,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
>      memcpy(dev->hwaddr.ea, eth_addr.addr_bytes, ETH_ADDR_LEN);
>      rte_eth_link_get_nowait(dev->port_id, &dev->link);
>  
> -    mbp_priv = rte_mempool_get_priv(dev->mp);
> +    mbp_priv = rte_mempool_get_priv(dev->dpdk_mp->mp);
>      dev->buf_size = mbp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM;
>  
>      /* Get the Flow control configuration for DPDK-ETH */
> @@ -1204,7 +1300,7 @@ common_destruct(struct netdev_dpdk *dev)
>      OVS_EXCLUDED(dev->mutex)
>  {
>      rte_free(dev->tx_q);
> -    dpdk_mp_release(dev->mp);
> +    dpdk_mp_put(dev->dpdk_mp);
>  
>      ovs_list_remove(&dev->list_node);
>      free(ovsrcu_get_protected(struct ingress_policer *,
> @@ -1957,7 +2053,7 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
>          return EAGAIN;
>      }
>  
> -    nb_rx = rte_vhost_dequeue_burst(vid, qid, dev->mp,
> +    nb_rx = rte_vhost_dequeue_burst(vid, qid, dev->dpdk_mp->mp,
>                                      (struct rte_mbuf **) batch->packets,
>                                      NETDEV_MAX_BURST);
>      if (!nb_rx) {
> @@ -2196,7 +2292,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet_batch *batch)
>              continue;
>          }
>  
> -        pkts[txcnt] = rte_pktmbuf_alloc(dev->mp);
> +        pkts[txcnt] = rte_pktmbuf_alloc(dev->dpdk_mp->mp);
>          if (OVS_UNLIKELY(!pkts[txcnt])) {
>              dropped += cnt - i;
>              break;
> @@ -3075,7 +3171,7 @@ netdev_dpdk_get_mempool_info(struct unixctl_conn *conn,
>          ovs_mutex_lock(&dev->mutex);
>          ovs_mutex_lock(&dpdk_mp_mutex);
>  
> -        rte_mempool_dump(stream, dev->mp);
> +        rte_mempool_dump(stream, dev->dpdk_mp->mp);
>  
>          ovs_mutex_unlock(&dpdk_mp_mutex);
>          ovs_mutex_unlock(&dev->mutex);
> @@ -3772,7 +3868,7 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
>  
>      err = netdev_dpdk_mempool_configure(dev);
>      if (!err) {
> -        /* A new mempool was created. */
> +        /* A new mempool was created or re-used. */
>          netdev_change_seq_changed(&dev->up);
>      } else if (err != EEXIST){
>          return err;
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index b93da69..b9aba8c 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -360,6 +360,23 @@
>          </p>
>        </column>
>  
> +      <column name="other_config" key="per-port-memory"
> +              type='{"type": "boolean"}'>
> +        <p>
> +          By default OVS DPDK uses a shared memory model wherein devices
> +          that have the same MTU and socket values can share the same
> +          mempool. Setting this value to <code>true</code> changes this
> +          behaviour. Per port memory allow DPDK devices to use private
> +          memory per device. This can provide greater transapraency as
> +          regards memory usage but potentially at the cost of greater memory
> +          requirements.
> +        </p>
> +        <p>
> +          Changing this value requires restarting the daemon if dpdk-init has
> +          already been set to true.
> +        </p>
> +      </column>
> +
>        <column name="other_config" key="tx-flush-interval"
>                type='{"type": "integer",
>                       "minInteger": 0, "maxInteger": 1000000}'>
> 



More information about the dev mailing list