[ovs-dev] [PATCH RFC 1/1] netdev-dpdk: NUMA Aware vHost

Daniele Di Proietto diproiettod at vmware.com
Thu Mar 10 01:22:42 UTC 2016


Thanks for the patch, I'll put this in the use case list for
my series if I need to resend it!

It would be nice to get the NUMA socket information without
linking OVS with libnuma, perhaps through some DPDK API. From
a quick look I didn't find one, but maybe you know of a
better way.
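
One possibility, as a rough sketch only: get_mempolicy() is a Linux system
call and the libnuma function is a wrapper around it, so the query could be
issued through syscall() without adding -lnuma to the link line. The helper
name sketch_numa_node_of() is made up for illustration, and the MPOL_F_*
values are copied here on the assumption that <numaif.h> is not available.
I have not checked how this behaves inside OVS.

```c
#define _GNU_SOURCE             /* For the syscall() declaration. */
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Flag values as defined by the Linux mempolicy ABI; only defined here
 * if no header has provided them already. */
#ifndef MPOL_F_NODE
#define MPOL_F_NODE (1 << 0)
#endif
#ifndef MPOL_F_ADDR
#define MPOL_F_ADDR (1 << 1)
#endif

/* Hypothetical helper: returns the NUMA node that backs 'addr', or -1 if
 * the kernel refuses or does not support the query. */
static int
sketch_numa_node_of(void *addr)
{
#ifdef SYS_get_mempolicy
    int node = -1;
    long err = syscall(SYS_get_mempolicy, &node, NULL, 0UL, addr,
                       (unsigned long) (MPOL_F_NODE | MPOL_F_ADDR));

    return err ? -1 : node;
#else
    (void) addr;
    return -1;                  /* Not a Linux build: no information. */
#endif
}
```

In the patch this would be called with the same pointer that is passed to
get_mempolicy() today (dev). The call can still fail, for example when a
sandbox filters the syscall, so the fallback to the current socket_id is
still needed.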

Some preliminary comments inline.

On 04/03/2016 02:08, "dev on behalf of Ciara Loftus"
<dev-bounces at openvswitch.org on behalf of ciara.loftus at intel.com> wrote:

>This commit allows for vHost memory from QEMU, DPDK and OVS, as well
>as the servicing PMD, to all come from the same socket.
>
>DPDK v2.2 introduces a new configuration option:
>CONFIG_RTE_LIBRTE_VHOST_NUMA. If enabled, DPDK detects the socket
>from which a vhost device's memory has been allocated by QEMU, and
>accordingly reallocates device memory managed by DPDK to that same
>socket.
>
>OVS by default sets the socket id of a vhost port to that of the
>master lcore. This commit introduces the ability to update the
>socket id of the port if it is detected (during VM boot) that the
>port memory is not on the default NUMA node. If this is the case, the
>mempool of the port is also changed to the new node, and a PMD
>thread currently servicing the port will no longer, in favour of a
>thread from the new node (if enabled in the CPU mask).
>
>Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
>---
> INSTALL.DPDK.md   |  6 +++++-
> acinclude.m4      |  2 +-
> lib/netdev-dpdk.c | 25 +++++++++++++++++++++++--
> 3 files changed, 29 insertions(+), 4 deletions(-)
>
>diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>index dca79bd..82e6908 100644
>--- a/INSTALL.DPDK.md
>+++ b/INSTALL.DPDK.md
>@@ -33,6 +33,10 @@ on Debian/Ubuntu)
> 
>      `CONFIG_RTE_BUILD_COMBINE_LIBS=y`
> 
>+     Enable NUMA-aware vHost by modifying the following in the same file:
>+
>+     `CONFIG_RTE_LIBRTE_VHOST_NUMA=y`
>+

I guess we should also update install_dpdk() in .travis/build.sh to do
this if it's required.

>      Then run `make install` to build and install the library.
>      For default install without IVSHMEM:
> 
>@@ -383,7 +387,7 @@ Performance Tuning:
> 
> 	It is good practice to ensure that threads that are in the datapath are
> 	pinned to cores in the same NUMA area. e.g. pmd threads and QEMU vCPUs
>-	responsible for forwarding.
>+	responsible for forwarding. This is now default behavior for vHost ports.
> 
>   9. Rx Mergeable buffers
> 
>diff --git a/acinclude.m4 b/acinclude.m4
>index 11c7787..432bdbd 100644
>--- a/acinclude.m4
>+++ b/acinclude.m4
>@@ -199,7 +199,7 @@ AC_DEFUN([OVS_CHECK_DPDK], [
>     found=false
>     save_LIBS=$LIBS
>     for extras in "" "-ldl"; do
>-        LIBS="$DPDK_LIB $extras $save_LIBS $DPDK_EXTRA_LIB"
>+        LIBS="$DPDK_LIB $extras $save_LIBS $DPDK_EXTRA_LIB -lnuma"

I guess we should also list libnuma-dev in .travis.yml and something
similar in rhel/openvswitch-fedora.spec

>         AC_LINK_IFELSE(
>            [AC_LANG_PROGRAM([#include <rte_config.h>
>                              #include <rte_eal.h>],
>diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>index 17b8d51..4e1ce53 100644
>--- a/lib/netdev-dpdk.c
>+++ b/lib/netdev-dpdk.c
>@@ -29,6 +29,7 @@
> #include <stdio.h>
> #include <sys/types.h>
> #include <sys/stat.h>
>+#include <numaif.h>
> 
> #include "dirs.h"
> #include "dp-packet.h"
>@@ -1878,6 +1879,8 @@ new_device(struct virtio_net *dev)
> {
>     struct netdev_dpdk *netdev;
>     bool exists = false;
>+    int newnode = 0;
>+    long err = 0;
> 
>     ovs_mutex_lock(&dpdk_mutex);
>     /* Add device to the vhost port with the same name as that passed down. */
>@@ -1891,6 +1894,24 @@ new_device(struct virtio_net *dev)
>             }
>             ovsrcu_set(&netdev->virtio_dev, dev);
>             exists = true;
>+
>+            /* Get NUMA information */
>+            err = get_mempolicy(&newnode, NULL, 0, dev, MPOL_F_NODE | MPOL_F_ADDR);
>+            if (err) {
>+                VLOG_INFO("Error getting NUMA info for vHost Device '%s'",
>+                        dev->ifname);
>+                newnode = netdev->socket_id;
>+            } else if (newnode != netdev->socket_id) {
>+                netdev->socket_id = newnode;
>+                /* Change mempool to new NUMA Node */
>+                dpdk_mp_put(netdev->dpdk_mp);
>+                netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, netdev->mtu);
>+                /* Request netdev reconfiguration. The port may now be
>+                 * serviced by a PMD on the new node if enabled in the cpu
>+                 * mask */
>+                netdev_request_reconfigure(&netdev->up);

I think here I would prefer:

1) Remembering the configuration change request:
    netdev->requested_socket_id = newnode
2) Calling netdev_request_reconfigure()
3) In the netdev_dpdk_vhost_user_reconfigure() method:
    netdev->socket_id = netdev->requested_socket_id

Otherwise the datapath might be confused, because it assumes that the
socket_id doesn't change while the device is polled by the pmd threads.

It's safe to change almost everything inside
netdev_dpdk_vhost_user_reconfigure(), because the device will not be
polled by the datapath when the function is called.

The same applies to the mempool: the actual change should be done
in netdev_dpdk_vhost_user_reconfigure(), because there might still be
threads using it.
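
To make the ordering concrete, here is a minimal standalone sketch of the
request-then-apply pattern. It is not OVS code: struct sketch_netdev, both
function names, and the reconfigure_pending flag (which stands in for
netdev_request_reconfigure()) are hypothetical, and the mempool swap is
only marked with a comment.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct netdev_dpdk. */
struct sketch_netdev {
    int socket_id;            /* Value the datapath relies on while polling. */
    int requested_socket_id;  /* Value recorded by the vhost callback. */
    bool reconfigure_pending; /* Stands in for netdev_request_reconfigure(). */
};

/* Would run in the new_device() path: record the request, touch nothing
 * that a PMD thread might be reading. */
static void
sketch_request_socket(struct sketch_netdev *dev, int newnode)
{
    dev->requested_socket_id = newnode;
    if (newnode != dev->socket_id) {
        dev->reconfigure_pending = true;
    }
}

/* Would run in the reconfigure path, where the device is assumed not to
 * be polled. Returns true if a pending request was applied. */
static bool
sketch_reconfigure(struct sketch_netdev *dev)
{
    if (!dev->reconfigure_pending) {
        return false;
    }
    dev->socket_id = dev->requested_socket_id;
    /* The mempool for the new socket would be acquired here, and the old
     * one released, since no thread can be using it at this point. */
    dev->reconfigure_pending = false;
    return true;
}
```

With this split, a request for node 1 on a device currently on node 0
leaves socket_id at 0 until sketch_reconfigure() runs, which is the
property the datapath depends on.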

>+            }
>+
>             dev->flags |= VIRTIO_DEV_RUNNING;
>             /* Disable notifications. */
>             set_irq_status(dev);
>@@ -1907,8 +1928,8 @@ new_device(struct virtio_net *dev)
>         return -1;
>     }
> 
>-    VLOG_INFO("vHost Device '%s' %"PRIu64" has been added", dev->ifname,
>-              dev->device_fh);
>+    VLOG_INFO("vHost Device '%s' %"PRIu64" has been added on socket %i",
>+              dev->ifname, dev->device_fh, newnode);
>     return 0;
> }
> 
>-- 
>2.4.3