[ovs-dev] [PATCH RFC 1/1] netdev-dpdk: NUMA Aware vHost

Loftus, Ciara ciara.loftus at intel.com
Wed Mar 16 14:48:59 UTC 2016


> 
> Thanks for the patch, I'll put this in the use case list for
> my series if I need to resend it!
> 
> It would be nice to get the numa socket information without
> linking OVS with libnuma, maybe using some DPDK api. From
> a quick look I didn't find any way, but maybe you know a
> better way.
> 
> Some preliminary comments inline

Thanks for the feedback. I've sent out a v2 of the patch that incorporates most of your requests.
I just noticed I misspelled your name in the cover letter, apologies!

Some small comments below.

Thanks,
Ciara
> 
> On 04/03/2016 02:08, "dev on behalf of Ciara Loftus"
> <dev-bounces at openvswitch.org on behalf of ciara.loftus at intel.com>
> wrote:
> 
> >This commit allows for vHost memory from QEMU, DPDK and OVS, as well
> >as the servicing PMD, to all come from the same socket.
> >
> >DPDK v2.2 introduces a new configuration option:
> >CONFIG_RTE_LIBRTE_VHOST_NUMA. If enabled, DPDK detects the socket
> >from which a vhost device's memory has been allocated by QEMU, and
> >accordingly reallocates device memory managed by DPDK to that same
> >socket.
> >
> >OVS by default sets the socket id of a vhost port to that of the
> >master lcore. This commit introduces the ability to update the
> >socket id of the port if it is detected (during VM boot) that the
> >port memory is not on the default NUMA node. If this is the case, the
> >mempool of the port is also changed to the new node, and a PMD
> >thread currently servicing the port will no longer, in favour of a
> >thread from the new node (if enabled in the CPU mask).
> >
> >Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
> >---
> > INSTALL.DPDK.md   |  6 +++++-
> > acinclude.m4      |  2 +-
> > lib/netdev-dpdk.c | 25 +++++++++++++++++++++++--
> > 3 files changed, 29 insertions(+), 4 deletions(-)
> >
> >diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> >index dca79bd..82e6908 100644
> >--- a/INSTALL.DPDK.md
> >+++ b/INSTALL.DPDK.md
> >@@ -33,6 +33,10 @@ on Debian/Ubuntu)
> >
> >      `CONFIG_RTE_BUILD_COMBINE_LIBS=y`
> >
> >+     Enable NUMA-aware vHost by modifying the following in the same file:
> >+
> >+     `CONFIG_RTE_LIBRTE_VHOST_NUMA=y`
> >+
> 
> I guess we should also update install_dpdk() in ./travis/build.sh to do
> this if it's required
I left this out because everything will still work ok (ie. as before) without this option. It can be optionally enabled if the functionality is desired.
However if we think it should be always enabled I can include this in the next revision.

> 
> >      Then run `make install` to build and install the library.
> >      For default install without IVSHMEM:
> >
> >@@ -383,7 +387,7 @@ Performance Tuning:
> >
> > 	It is good practice to ensure that threads that are in the datapath are
> > 	pinned to cores in the same NUMA area. e.g. pmd threads and
> QEMU vCPUs
> >-	responsible for forwarding.
> >+	responsible for forwarding. This is now default behavior for vHost
> >ports.
> >
> >   9. Rx Mergeable buffers
> >
> >diff --git a/acinclude.m4 b/acinclude.m4
> >index 11c7787..432bdbd 100644
> >--- a/acinclude.m4
> >+++ b/acinclude.m4
> >@@ -199,7 +199,7 @@ AC_DEFUN([OVS_CHECK_DPDK], [
> >     found=false
> >     save_LIBS=$LIBS
> >     for extras in "" "-ldl"; do
> >-        LIBS="$DPDK_LIB $extras $save_LIBS $DPDK_EXTRA_LIB"
> >+        LIBS="$DPDK_LIB $extras $save_LIBS $DPDK_EXTRA_LIB -lnuma"
> 
> I guess we should also list libnuma-dev in .travis.yml and something
> similar
> in rhel/openvswitch-fedora.spec
I updated these in the v2. 

> 
> >         AC_LINK_IFELSE(
> >            [AC_LANG_PROGRAM([#include <rte_config.h>
> >                              #include <rte_eal.h>],
> >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >index 17b8d51..4e1ce53 100644
> >--- a/lib/netdev-dpdk.c
> >+++ b/lib/netdev-dpdk.c
> >@@ -29,6 +29,7 @@
> > #include <stdio.h>
> > #include <sys/types.h>
> > #include <sys/stat.h>
> >+#include <numaif.h>
> >
> > #include "dirs.h"
> > #include "dp-packet.h"
> >@@ -1878,6 +1879,8 @@ new_device(struct virtio_net *dev)
> > {
> >     struct netdev_dpdk *netdev;
> >     bool exists = false;
> >+    int newnode = 0;
> >+    long err = 0;
> >
> >     ovs_mutex_lock(&dpdk_mutex);
> >     /* Add device to the vhost port with the same name as that passed
> >down. */
> >@@ -1891,6 +1894,24 @@ new_device(struct virtio_net *dev)
> >             }
> >             ovsrcu_set(&netdev->virtio_dev, dev);
> >             exists = true;
> >+
> >+            /* Get NUMA information */
> >+            err = get_mempolicy(&newnode, NULL, 0, dev, MPOL_F_NODE |
> >MPOL_F_ADDR);
> >+            if (err) {
> >+                VLOG_INFO("Error getting NUMA info for vHost Device
> >'%s'",
> >+                        dev->ifname);
> >+                newnode = netdev->socket_id;
> >+            } else if (newnode != netdev->socket_id) {
> >+                netdev->socket_id = newnode;
> >+                /* Change mempool to new NUMA Node */
> >+                dpdk_mp_put(netdev->dpdk_mp);
> >+                netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id,
> >netdev->mtu);
> >+                /* Request netdev reconfiguration. The port may now be
> >+                 * serviced by a PMD on the new node if enabled in the
> >cpu
> >+                 * mask */
> >+                netdev_request_reconfigure(&netdev->up);
> 
> I think here I would prefer:
> 
> 1) Remembering the configuration change request
>     netdev->requested_socket_id = newnode
> 2) Calling netdev_request_reconfigure()
> 3) In the netdev_dpdk_vhost_user_reconfigure() method:
>     netdev->socket_id = netdev_requested_socket_id
> 
> Otherwise the datapath might be confused, because it assumes that the
> socket_id doesn't change while the device is polled by the pmd threads.
> 
> It's safe to change almost everything inside the
> netdev_dpdk_vhost_user_reconfigure(), because the device will not be
> polled by the datapath when the function is called.
> 
> The same applies for the mempool: the actual change should be done
> in netdev_dpdk_vhost_user_reconfigure(), because there might still be
> threads using it.
Makes sense thanks for the suggestion. This is implemented in the v2.

> 
> >+            }
> >+
> >             dev->flags |= VIRTIO_DEV_RUNNING;
> >             /* Disable notifications. */
> >             set_irq_status(dev);
> >@@ -1907,8 +1928,8 @@ new_device(struct virtio_net *dev)
> >         return -1;
> >     }
> >
> >-    VLOG_INFO("vHost Device '%s' %"PRIu64" has been added", dev-
> >ifname,
> >-              dev->device_fh);
> >+    VLOG_INFO("vHost Device '%s' %"PRIu64" has been added on socket
> %i",
> >+              dev->ifname, dev->device_fh, newnode);
> >     return 0;
> > }
> >
> >--
> >2.4.3
> >
> >_______________________________________________
> >dev mailing list
> >dev at openvswitch.org
> >http://openvswitch.org/mailman/listinfo/dev




More information about the dev mailing list