[ovs-dev] [PATCH RFC 1/1] netdev-dpdk: NUMA Aware vHost

Ben Pfaff blp at ovn.org
Thu Mar 10 22:38:43 UTC 2016


How does this numa library relate to lib/ovs-numa.[ch]?

On Thu, Mar 10, 2016 at 01:22:42AM +0000, Daniele Di Proietto wrote:
> Thanks for the patch, I'll put this in the use case list for
> my series if I need to resend it!
> 
> It would be nice to get the numa socket information without
> linking OVS with libnuma, maybe using some DPDK api. From
> a quick look I didn't find any way, but maybe you know a
> better way.
> 
> Some preliminary comments inline
> 
> On 04/03/2016 02:08, "dev on behalf of Ciara Loftus"
> <dev-bounces at openvswitch.org on behalf of ciara.loftus at intel.com> wrote:
> 
> >This commit allows for vHost memory from QEMU, DPDK and OVS, as well
> >as the servicing PMD, to all come from the same socket.
> >
> >DPDK v2.2 introduces a new configuration option:
> >CONFIG_RTE_LIBRTE_VHOST_NUMA. If enabled, DPDK detects the socket
> >from which a vhost device's memory has been allocated by QEMU, and
> >accordingly reallocates device memory managed by DPDK to that same
> >socket.
> >
> >OVS by default sets the socket id of a vhost port to that of the
> >master lcore. This commit introduces the ability to update the
> >socket id of the port if it is detected (during VM boot) that the
> >port memory is not on the default NUMA node. If this is the case, the
> >mempool of the port is also changed to the new node, and a PMD
> >thread currently servicing the port will no longer, in favour of a
> >thread from the new node (if enabled in the CPU mask).
> >
> >Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
> >---
> > INSTALL.DPDK.md   |  6 +++++-
> > acinclude.m4      |  2 +-
> > lib/netdev-dpdk.c | 25 +++++++++++++++++++++++--
> > 3 files changed, 29 insertions(+), 4 deletions(-)
> >
> >diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> >index dca79bd..82e6908 100644
> >--- a/INSTALL.DPDK.md
> >+++ b/INSTALL.DPDK.md
> >@@ -33,6 +33,10 @@ on Debian/Ubuntu)
> > 
> >      `CONFIG_RTE_BUILD_COMBINE_LIBS=y`
> > 
> >+     Enable NUMA-aware vHost by modifying the following in the same file:
> >+
> >+     `CONFIG_RTE_LIBRTE_VHOST_NUMA=y`
> >+
> 
> I guess we should also update install_dpdk() in ./travis/build.sh to do
> this if it's required
> 
> >      Then run `make install` to build and install the library.
> >      For default install without IVSHMEM:
> > 
> >@@ -383,7 +387,7 @@ Performance Tuning:
> > 
> > 	It is good practice to ensure that threads that are in the datapath are
> > 	pinned to cores in the same NUMA area. e.g. pmd threads and QEMU vCPUs
> >-	responsible for forwarding.
> >+	responsible for forwarding. This is now default behavior for vHost
> >ports.
> > 
> >   9. Rx Mergeable buffers
> > 
> >diff --git a/acinclude.m4 b/acinclude.m4
> >index 11c7787..432bdbd 100644
> >--- a/acinclude.m4
> >+++ b/acinclude.m4
> >@@ -199,7 +199,7 @@ AC_DEFUN([OVS_CHECK_DPDK], [
> >     found=false
> >     save_LIBS=$LIBS
> >     for extras in "" "-ldl"; do
> >-        LIBS="$DPDK_LIB $extras $save_LIBS $DPDK_EXTRA_LIB"
> >+        LIBS="$DPDK_LIB $extras $save_LIBS $DPDK_EXTRA_LIB -lnuma"
> 
> I guess we should also list libnuma-dev in .travis.yml and something
> similar
> in rhel/openvswitch-fedora.spec
> 
> >         AC_LINK_IFELSE(
> >            [AC_LANG_PROGRAM([#include <rte_config.h>
> >                              #include <rte_eal.h>],
> >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >index 17b8d51..4e1ce53 100644
> >--- a/lib/netdev-dpdk.c
> >+++ b/lib/netdev-dpdk.c
> >@@ -29,6 +29,7 @@
> > #include <stdio.h>
> > #include <sys/types.h>
> > #include <sys/stat.h>
> >+#include <numaif.h>
> > 
> > #include "dirs.h"
> > #include "dp-packet.h"
> >@@ -1878,6 +1879,8 @@ new_device(struct virtio_net *dev)
> > {
> >     struct netdev_dpdk *netdev;
> >     bool exists = false;
> >+    int newnode = 0;
> >+    long err = 0;
> > 
> >     ovs_mutex_lock(&dpdk_mutex);
> >     /* Add device to the vhost port with the same name as that passed
> >down. */
> >@@ -1891,6 +1894,24 @@ new_device(struct virtio_net *dev)
> >             }
> >             ovsrcu_set(&netdev->virtio_dev, dev);
> >             exists = true;
> >+
> >+            /* Get NUMA information */
> >+            err = get_mempolicy(&newnode, NULL, 0, dev, MPOL_F_NODE |
> >MPOL_F_ADDR);
> >+            if (err) {
> >+                VLOG_INFO("Error getting NUMA info for vHost Device
> >'%s'",
> >+                        dev->ifname);
> >+                newnode = netdev->socket_id;
> >+            } else if (newnode != netdev->socket_id) {
> >+                netdev->socket_id = newnode;
> >+                /* Change mempool to new NUMA Node */
> >+                dpdk_mp_put(netdev->dpdk_mp);
> >+                netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id,
> >netdev->mtu);
> >+                /* Request netdev reconfiguration. The port may now be
> >+                 * serviced by a PMD on the new node if enabled in the
> >cpu
> >+                 * mask */
> >+                netdev_request_reconfigure(&netdev->up);
> 
> I think here I would prefer:
> 
> 1) Remembering the configuration change request
>     netdev->requested_socket_id = newnode
> 2) Calling netdev_request_reconfigure()
> 3) In the netdev_dpdk_vhost_user_reconfigure() method:
>     netdev->socket_id = netdev_requested_socket_id
> 
> Otherwise the datapath might be confused, because it assumes that the
> socket_id doesn't change while the device is polled by the pmd threads.
> 
> It's safe to change almost everything inside the
> netdev_dpdk_vhost_user_reconfigure(), because the device will not be
> polled by the datapath when the function is called.
> 
> The same applies for the mempool: the actual change should be done
> in netdev_dpdk_vhost_user_reconfigure(), because there might still be
> threads using it.
> 
> >+            }
> >+
> >             dev->flags |= VIRTIO_DEV_RUNNING;
> >             /* Disable notifications. */
> >             set_irq_status(dev);
> >@@ -1907,8 +1928,8 @@ new_device(struct virtio_net *dev)
> >         return -1;
> >     }
> > 
> >-    VLOG_INFO("vHost Device '%s' %"PRIu64" has been added", dev->ifname,
> >-              dev->device_fh);
> >+    VLOG_INFO("vHost Device '%s' %"PRIu64" has been added on socket %i",
> >+              dev->ifname, dev->device_fh, newnode);
> >     return 0;
> > }
> > 
> >-- 
> >2.4.3
> >
> >_______________________________________________
> >dev mailing list
> >dev at openvswitch.org
> >http://openvswitch.org/mailman/listinfo/dev
> 
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev



More information about the dev mailing list