[ovs-dev] [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing mempools.

Kavanagh, Mark B mark.b.kavanagh at intel.com
Tue Oct 17 13:33:32 UTC 2017


>From: Fischetti, Antonio
>Sent: Monday, October 16, 2017 2:15 PM
>To: dev at openvswitch.org
>Cc: Kavanagh, Mark B <mark.b.kavanagh at intel.com>; Kevin Traynor
><ktraynor at redhat.com>; Aaron Conole <aconole at redhat.com>; Darrell Ball
><dlu998 at gmail.com>; Fischetti, Antonio <antonio.fischetti at intel.com>
>Subject: [PATCH v6 1/5] netdev-dpdk: fix management of pre-existing mempools.
>
>Fix issues on reconfiguration of pre-existing mempools.
>This patch avoids to call dpdk_mp_put() - and erroneously
>release the mempool - when it already exists.
>Create mempool names by considering also the NUMA socket number.
>So a name reflects what socket the mempool is allocated on.
>This change is needed for the NUMA-awareness feature.

Hi Antonio,

Is there any particular reason why you've combined patches 1 and 2 of the previous series in a single patch here?

I would have thought that these two separate issues would warrant two individual patches (particularly with respect to the reported-by, tested-by tags).

Maybe it's not a big deal, but noted here nonetheless.

Apart from that, there are some comments inline.

Thanks again,
Mark

>
>CC: Mark B Kavanagh <mark.b.kavanagh at intel.com>
>CC: Kevin Traynor <ktraynor at redhat.com>
>CC: Aaron Conole <aconole at redhat.com>
>CC: Darrell Ball <dlu998 at gmail.com>
>Reported-by: Ciara Loftus <ciara.loftus at intel.com>
>Tested-by: Ciara Loftus <ciara.loftus at intel.com>
>Reported-by: Róbert Mulik <robert.mulik at ericsson.com>
>Fixes: d555d9bded5f ("netdev-dpdk: Create separate memory pool for each
>port.")
>Signed-off-by: Antonio Fischetti <antonio.fischetti at intel.com>
>---
> Test on releasing pre-existing mempools
> =======================================
>I've tested this patch by
>  - changing at run-time the number of Rx queues:
>      ovs-vsctl set Interface dpdk0 type=dpdk options:n_rxq=4
>
>  - reducing the MTU of the dpdk ports of 1 byte to force
>    the configuration of an existing mempool:
>      ovs-vsctl set Interface dpdk0 mtu_request=1499
>
>This issue was observed in a PVP test topology with dpdkvhostuserclient
>ports. It can happen also with dpdk type ports, eg by reducing the MTU
>of 1 byte.
>
>To replicate the bug scenario in the PVP case it's sufficient to
>set 1 dpdkvhostuserclient port, and just boot the VM.
>
>Below some more details on my own test setup.
>
> PVP test setup
> --------------
>CLIENT_SOCK_DIR=/tmp
>SOCK0=dpdkvhostuser0
>SOCK1=dpdkvhostuser1
>
>1 PMD
>Add 2 dpdk ports, n_rxq=1
>Add 2 vhu ports both of type dpdkvhostuserclient and specify vhost-server-path
> ovs-vsctl set Interface dpdkvhostuser0 options:vhost-server-
>path="$CLIENT_SOCK_DIR/$SOCK0"
> ovs-vsctl set Interface dpdkvhostuser1 options:vhost-server-
>path="$CLIENT_SOCK_DIR/$SOCK1"
>
>Set port-based rules: dpdk0 <--> vhu0 and dpdk1 <--> vhu1
> add-flow br0 in_port=1,action=output:3
> add-flow br0 in_port=3,action=output:1
> add-flow br0 in_port=4,action=output:2
> add-flow br0 in_port=2,action=output:4
>
> Launch QEMU
> -----------
>As OvS vhu ports are acting as clients, we must specify 'server' in the next
>command.
>VM_IMAGE=<path/to/your/vm/image>
>
> sudo -E taskset 0x3F00 $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64 -name us-
>vhost-vm1 -cpu host -enable-kvm -m 4096M -object memory-backend-
>file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on -numa node,memdev=mem
>-mem-prealloc -smp 4 -drive file=$VM_IMAGE -chardev
>socket,id=char0,path=$CLIENT_SOCK_DIR/$SOCK0,server -netdev type=vhost-
>user,id=mynet1,chardev=char0,vhostforce -device virtio-net-
>pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev
>socket,id=char1,path=$CLIENT_SOCK_DIR/$SOCK1,server -netdev type=vhost-
>user,id=mynet2,chardev=char1,vhostforce -device virtio-net-
>pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off --nographic
>
> Expected behavior
> -----------------
>With this fix OvS shouldn't crash.
>
> Test NUMA-Awareness feature
> ===========================
>Mempool names now contains the requested socket id and become like:
>"ovs_4adb057e_1_2030_20512".
>
>Tested with DPDK 17.05.2 (from dpdk-stable branch).
>NUMA-awareness feature enabled (DPDK/config/common_base).
>
>Created 1 single dpdkvhostuser port type.
>OvS pmd-cpu-mask=FF00003     # enable cores on both numa nodes
>QEMU core mask = 0xFC000     # cores for qemu on numa node 1 only
>
> Before launching the VM:
> ------------------------
>ovs-appctl dpif-netdev/pmd-rxq-show
>shows core #1 is serving the vhu port.
>
>pmd thread numa_id 0 core_id 1:
>        isolated : false
>        port: dpdkvhostuser0    queue-id: 0
>
> After launching the VM:
> -----------------------
>the vhu port is now managed by core #27
>pmd thread numa_id 1 core_id 27:
>        isolated : false
>        port: dpdkvhostuser0    queue-id: 0
>
>and the log shows a new mempool is allocated on NUMA node 1, while
>the previous one is deleted:
>
>2017-10-06T14:04:55Z|00105|netdev_dpdk|DBG|Allocated
>"ovs_4adb057e_1_2030_20512" mempool with 20512 mbufs
>2017-10-06T14:04:55Z|00106|netdev_dpdk|DBG|Releasing
>"ovs_4adb057e_0_2030_20512" mempool
>---
> lib/netdev-dpdk.c | 42 +++++++++++++++++++++++++-----------------
> 1 file changed, 25 insertions(+), 17 deletions(-)
>
>diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>index c60f46f..7f2d7ed 100644
>--- a/lib/netdev-dpdk.c
>+++ b/lib/netdev-dpdk.c
>@@ -499,8 +499,8 @@ dpdk_mp_name(struct dpdk_mp *dmp)
> {
>     uint32_t h = hash_string(dmp->if_name, 0);
>     char *mp_name = xcalloc(RTE_MEMPOOL_NAMESIZE, sizeof *mp_name);
>-    int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%u",
>-                       h, dmp->mtu, dmp->mp_size);
>+    int ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_%x_%d_%d_%u",
>+                       h, dmp->socket_id, dmp->mtu, dmp->mp_size);
>     if (ret < 0 || ret >= RTE_MEMPOOL_NAMESIZE) {
>         return NULL;
>     }
>@@ -508,12 +508,13 @@ dpdk_mp_name(struct dpdk_mp *dmp)
> }
>
> static struct dpdk_mp *
>-dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
>+dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool *mp_exists)
> {
>     struct dpdk_mp *dmp = dpdk_rte_mzalloc(sizeof *dmp);
>     if (!dmp) {
>         return NULL;
>     }
>+    *mp_exists = false;
>     dmp->socket_id = dev->requested_socket_id;
>     dmp->mtu = mtu;
>     ovs_strzcpy(dmp->if_name, dev->up.name, IFNAMSIZ);
>@@ -530,15 +531,14 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
>             + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
>             + MIN_NB_MBUF;
>
>-    bool mp_exists = false;
>-
>     do {
>         char *mp_name = dpdk_mp_name(dmp);
>
>         VLOG_DBG("Requesting a mempool of %u mbufs for netdev %s "
>-                 "with %d Rx and %d Tx queues.",
>+                 "with %d Rx and %d Tx queues, socket id:%d.",
>                  dmp->mp_size, dev->up.name,
>-                 dev->requested_n_rxq, dev->requested_n_txq);
>+                 dev->requested_n_rxq, dev->requested_n_txq,
>+                 dev->requested_socket_id);
>
>         dmp->mp = rte_pktmbuf_pool_create(mp_name, dmp->mp_size,
>                                           MP_CACHE_SZ,
>@@ -559,7 +559,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
>             /* As the mempool create returned EEXIST we can expect the
>              * lookup has returned a valid pointer.  If for some reason
>              * that's not the case we keep track of it. */
>-            mp_exists = true;
>+            *mp_exists = true;
>         } else {
>             VLOG_ERR("Failed mempool \"%s\" create request of %u mbufs",
>                      mp_name, dmp->mp_size);
>@@ -573,7 +573,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
>             rte_mempool_obj_iter(dmp->mp, ovs_rte_pktmbuf_init, NULL);
>             return dmp;
>         }
>-    } while (!mp_exists &&
>+    } while (!(*mp_exists) &&
>             (rte_errno == ENOMEM && (dmp->mp_size /= 2) >= MIN_NB_MBUF));
>
>     rte_free(dmp);
>@@ -581,12 +581,12 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
> }
>
> static struct dpdk_mp *
>-dpdk_mp_get(struct netdev_dpdk *dev, int mtu)
>+dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool *mp_exists)
> {
>     struct dpdk_mp *dmp;
>
>     ovs_mutex_lock(&dpdk_mp_mutex);
>-    dmp = dpdk_mp_create(dev, mtu);
>+    dmp = dpdk_mp_create(dev, mtu, mp_exists);
>     ovs_mutex_unlock(&dpdk_mp_mutex);
>
>     return dmp;
>@@ -620,14 +620,22 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)

Apologies for not noticing this in my previous review, but the comment for this function needs to be updated.

It currently reads:
   " /* Tries to allocate new mempool on requested_socket_id with
	 * mbuf size corresponding to requested_mtu.
	 * On success new configuration will be applied.
	 * On error, device will be left unchanged. */"

It should be updated to reflect the fact that it tries to allocate a new mempool, or reuse an existing one, where appropriate.
Furthermore, if the error value is EEXIST, the new configuration (i.e. dev->mtu, dev->max_packet_len) is applied, so the device is not unchanged, as the comment suggests.

> {
>     uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);
>     struct dpdk_mp *mp;
>+    bool mp_exists;

You don't need the 'mp_exists' variable.

If, as part of the dpdk_mp_get() call stack, rte_mempool_lookup() fails, then dpdk_mp_create() returns NULL, which is already handled by netdev_dpdk_mempool_configure(),
(marked as [1], below).

So, to handle the case where the mempool already exists - i.e. when in the callstack, rte_pkt_mbuf_pool_create() failed and set rte_errno to EEXIST - you only need to check rte_errno here;
(see [2], below).

>
>-    mp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size));
>+    mp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size), &mp_exists);

[1]
>     if (!mp) {
>         VLOG_ERR("Failed to create memory pool for netdev "
>                  "%s, with MTU %d on socket %d: %s\n",
>                  dev->up.name, dev->requested_mtu, dev->requested_socket_id,
>                  rte_strerror(rte_errno));
>         return rte_errno;
>+    } else if (mp_exists) {

[2]
"if (rte_errno == EEXIST)" is sufficient here

>+        /* If a new MTU was requested and its rounded value equals the one
>+         * that is currently used, then the existing mempool is returned.
>+         * Update dev with the new values. */
>+        dev->mtu = dev->requested_mtu;
>+        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>+        return EEXIST;

Replace with "return rte_errno" for consistency with the previous return (since the value of rte_errno is guaranteed to be EEXIST here). 

>     } else {
>         dpdk_mp_put(dev->dpdk_mp);
>         dev->dpdk_mp = mp;
>@@ -3207,7 +3215,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>     rte_eth_dev_stop(dev->port_id);
>
>     err = netdev_dpdk_mempool_configure(dev);
>-    if (err) {
>+    if (err && err != EEXIST) {
>         goto out;
>     }
>
>@@ -3247,12 +3255,12 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
>     netdev_dpdk_remap_txqs(dev);
>
>     err = netdev_dpdk_mempool_configure(dev);
>-    if (err) {
>-        return err;
>-    } else {
>+    if (!err) {
>+        /* A new mempool was created. */
>         netdev_change_seq_changed(&dev->up);
>+    } else if (err != EEXIST){
>+        return err;
>     }
>-
>     if (netdev_dpdk_get_vid(dev) >= 0) {
>         if (dev->vhost_reconfigured == false) {
>             dev->vhost_reconfigured = true;
>--
>2.4.11



More information about the dev mailing list