[ovs-dev] [PATCH RFC 1/1] netdev-dpdk: NUMA Aware vHost
Ben Pfaff
blp at ovn.org
Thu Mar 10 22:38:43 UTC 2016
How does this numa library relate to lib/ovs-numa.[ch]?
On Thu, Mar 10, 2016 at 01:22:42AM +0000, Daniele Di Proietto wrote:
> Thanks for the patch, I'll put this in the use case list for
> my series if I need to resend it!
>
> It would be nice to get the numa socket information without
> linking OVS with libnuma, maybe using some DPDK api. From
> a quick look I didn't find any way, but maybe you know a
> better way.
>
> Some preliminary comments inline
>
> On 04/03/2016 02:08, "dev on behalf of Ciara Loftus"
> <dev-bounces at openvswitch.org on behalf of ciara.loftus at intel.com> wrote:
>
> >This commit allows for vHost memory from QEMU, DPDK and OVS, as well
> >as the servicing PMD, to all come from the same socket.
> >
> >DPDK v2.2 introduces a new configuration option:
> >CONFIG_RTE_LIBRTE_VHOST_NUMA. If enabled, DPDK detects the socket
> >from which a vhost device's memory has been allocated by QEMU, and
> >accordingly reallocates device memory managed by DPDK to that same
> >socket.
> >
> >OVS by default sets the socket id of a vhost port to that of the
> >master lcore. This commit introduces the ability to update the
> >socket id of the port if it is detected (during VM boot) that the
> >port memory is not on the default NUMA node. If this is the case, the
> >mempool of the port is also changed to the new node, and a PMD
> >thread currently servicing the port will no longer, in favour of a
> >thread from the new node (if enabled in the CPU mask).
> >
> >Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
> >---
> > INSTALL.DPDK.md | 6 +++++-
> > acinclude.m4 | 2 +-
> > lib/netdev-dpdk.c | 25 +++++++++++++++++++++++--
> > 3 files changed, 29 insertions(+), 4 deletions(-)
> >
> >diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> >index dca79bd..82e6908 100644
> >--- a/INSTALL.DPDK.md
> >+++ b/INSTALL.DPDK.md
> >@@ -33,6 +33,10 @@ on Debian/Ubuntu)
> >
> > `CONFIG_RTE_BUILD_COMBINE_LIBS=y`
> >
> >+ Enable NUMA-aware vHost by modifying the following in the same file:
> >+
> >+ `CONFIG_RTE_LIBRTE_VHOST_NUMA=y`
> >+
>
> I guess we should also update install_dpdk() in ./travis/build.sh to do
> this if it's required
>
> > Then run `make install` to build and install the library.
> > For default install without IVSHMEM:
> >
> >@@ -383,7 +387,7 @@ Performance Tuning:
> >
> > It is good practice to ensure that threads that are in the datapath are
> > pinned to cores in the same NUMA area. e.g. pmd threads and QEMU vCPUs
> >- responsible for forwarding.
> >+ responsible for forwarding. This is now default behavior for vHost
> >ports.
> >
> > 9. Rx Mergeable buffers
> >
> >diff --git a/acinclude.m4 b/acinclude.m4
> >index 11c7787..432bdbd 100644
> >--- a/acinclude.m4
> >+++ b/acinclude.m4
> >@@ -199,7 +199,7 @@ AC_DEFUN([OVS_CHECK_DPDK], [
> > found=false
> > save_LIBS=$LIBS
> > for extras in "" "-ldl"; do
> >- LIBS="$DPDK_LIB $extras $save_LIBS $DPDK_EXTRA_LIB"
> >+ LIBS="$DPDK_LIB $extras $save_LIBS $DPDK_EXTRA_LIB -lnuma"
>
> I guess we should also list libnuma-dev in .travis.yml and something
> similar
> in rhel/openvswitch-fedora.spec
>
> > AC_LINK_IFELSE(
> > [AC_LANG_PROGRAM([#include <rte_config.h>
> > #include <rte_eal.h>],
> >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> >index 17b8d51..4e1ce53 100644
> >--- a/lib/netdev-dpdk.c
> >+++ b/lib/netdev-dpdk.c
> >@@ -29,6 +29,7 @@
> > #include <stdio.h>
> > #include <sys/types.h>
> > #include <sys/stat.h>
> >+#include <numaif.h>
> >
> > #include "dirs.h"
> > #include "dp-packet.h"
> >@@ -1878,6 +1879,8 @@ new_device(struct virtio_net *dev)
> > {
> > struct netdev_dpdk *netdev;
> > bool exists = false;
> >+ int newnode = 0;
> >+ long err = 0;
> >
> > ovs_mutex_lock(&dpdk_mutex);
> > /* Add device to the vhost port with the same name as that passed
> >down. */
> >@@ -1891,6 +1894,24 @@ new_device(struct virtio_net *dev)
> > }
> > ovsrcu_set(&netdev->virtio_dev, dev);
> > exists = true;
> >+
> >+ /* Get NUMA information */
> >+ err = get_mempolicy(&newnode, NULL, 0, dev, MPOL_F_NODE |
> >MPOL_F_ADDR);
> >+ if (err) {
> >+ VLOG_INFO("Error getting NUMA info for vHost Device
> >'%s'",
> >+ dev->ifname);
> >+ newnode = netdev->socket_id;
> >+ } else if (newnode != netdev->socket_id) {
> >+ netdev->socket_id = newnode;
> >+ /* Change mempool to new NUMA Node */
> >+ dpdk_mp_put(netdev->dpdk_mp);
> >+ netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id,
> >netdev->mtu);
> >+ /* Request netdev reconfiguration. The port may now be
> >+ * serviced by a PMD on the new node if enabled in the
> >cpu
> >+ * mask */
> >+ netdev_request_reconfigure(&netdev->up);
>
> I think here I would prefer:
>
> 1) Remembering the configuration change request
> netdev->requested_socket_id = newnode
> 2) Calling netdev_request_reconfigure()
> 3) In the netdev_dpdk_vhost_user_reconfigure() method:
> netdev->socket_id = netdev_requested_socket_id
>
> Otherwise the datapath might be confused, because it assumes that the
> socket_id doesn't change while the device is polled by the pmd threads.
>
> It's safe to change almost everything inside the
> netdev_dpdk_vhost_user_reconfigure(), because the device will not be
> polled by the datapath when the function is called.
>
> The same applies for the mempool: the actual change should be done
> in netdev_dpdk_vhost_user_reconfigure(), because there might still be
> threads using it.
>
> >+ }
> >+
> > dev->flags |= VIRTIO_DEV_RUNNING;
> > /* Disable notifications. */
> > set_irq_status(dev);
> >@@ -1907,8 +1928,8 @@ new_device(struct virtio_net *dev)
> > return -1;
> > }
> >
> >- VLOG_INFO("vHost Device '%s' %"PRIu64" has been added", dev->ifname,
> >- dev->device_fh);
> >+ VLOG_INFO("vHost Device '%s' %"PRIu64" has been added on socket %i",
> >+ dev->ifname, dev->device_fh, newnode);
> > return 0;
> > }
> >
> >--
> >2.4.3
> >
> >_______________________________________________
> >dev mailing list
> >dev at openvswitch.org
> >http://openvswitch.org/mailman/listinfo/dev
>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
More information about the dev
mailing list