[ovs-dev] [PATCH RFC 1/1] netdev-dpdk: NUMA Aware vHost
Daniele Di Proietto
diproiettod at vmware.com
Thu Mar 10 01:22:42 UTC 2016
Thanks for the patch, I'll put this in the use case list for
my series if I need to resend it!
It would be nice to get the numa socket information without
linking OVS with libnuma, maybe using some DPDK api. From
a quick look I didn't find any way, but maybe you know a
better way.
Some preliminary comments inline
On 04/03/2016 02:08, "dev on behalf of Ciara Loftus"
<dev-bounces at openvswitch.org on behalf of ciara.loftus at intel.com> wrote:
>This commit allows for vHost memory from QEMU, DPDK and OVS, as well
>as the servicing PMD, to all come from the same socket.
>
>DPDK v2.2 introduces a new configuration option:
>CONFIG_RTE_LIBRTE_VHOST_NUMA. If enabled, DPDK detects the socket
>from which a vhost device's memory has been allocated by QEMU, and
>accordingly reallocates device memory managed by DPDK to that same
>socket.
>
>OVS by default sets the socket id of a vhost port to that of the
>master lcore. This commit introduces the ability to update the
>socket id of the port if it is detected (during VM boot) that the
>port memory is not on the default NUMA node. If this is the case, the
>mempool of the port is also changed to the new node, and a PMD
>thread currently servicing the port will no longer, in favour of a
>thread from the new node (if enabled in the CPU mask).
>
>Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
>---
> INSTALL.DPDK.md | 6 +++++-
> acinclude.m4 | 2 +-
> lib/netdev-dpdk.c | 25 +++++++++++++++++++++++--
> 3 files changed, 29 insertions(+), 4 deletions(-)
>
>diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>index dca79bd..82e6908 100644
>--- a/INSTALL.DPDK.md
>+++ b/INSTALL.DPDK.md
>@@ -33,6 +33,10 @@ on Debian/Ubuntu)
>
> `CONFIG_RTE_BUILD_COMBINE_LIBS=y`
>
>+ Enable NUMA-aware vHost by modifying the following in the same file:
>+
>+ `CONFIG_RTE_LIBRTE_VHOST_NUMA=y`
>+
I guess we should also update install_dpdk() in ./travis/build.sh to do
this if it's required
> Then run `make install` to build and install the library.
> For default install without IVSHMEM:
>
>@@ -383,7 +387,7 @@ Performance Tuning:
>
> It is good practice to ensure that threads that are in the datapath are
> pinned to cores in the same NUMA area. e.g. pmd threads and QEMU vCPUs
>- responsible for forwarding.
>+ responsible for forwarding. This is now default behavior for vHost
>ports.
>
> 9. Rx Mergeable buffers
>
>diff --git a/acinclude.m4 b/acinclude.m4
>index 11c7787..432bdbd 100644
>--- a/acinclude.m4
>+++ b/acinclude.m4
>@@ -199,7 +199,7 @@ AC_DEFUN([OVS_CHECK_DPDK], [
> found=false
> save_LIBS=$LIBS
> for extras in "" "-ldl"; do
>- LIBS="$DPDK_LIB $extras $save_LIBS $DPDK_EXTRA_LIB"
>+ LIBS="$DPDK_LIB $extras $save_LIBS $DPDK_EXTRA_LIB -lnuma"
I guess we should also list libnuma-dev in .travis.yml and something
similar
in rhel/openvswitch-fedora.spec
> AC_LINK_IFELSE(
> [AC_LANG_PROGRAM([#include <rte_config.h>
> #include <rte_eal.h>],
>diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>index 17b8d51..4e1ce53 100644
>--- a/lib/netdev-dpdk.c
>+++ b/lib/netdev-dpdk.c
>@@ -29,6 +29,7 @@
> #include <stdio.h>
> #include <sys/types.h>
> #include <sys/stat.h>
>+#include <numaif.h>
>
> #include "dirs.h"
> #include "dp-packet.h"
>@@ -1878,6 +1879,8 @@ new_device(struct virtio_net *dev)
> {
> struct netdev_dpdk *netdev;
> bool exists = false;
>+ int newnode = 0;
>+ long err = 0;
>
> ovs_mutex_lock(&dpdk_mutex);
> /* Add device to the vhost port with the same name as that passed
>down. */
>@@ -1891,6 +1894,24 @@ new_device(struct virtio_net *dev)
> }
> ovsrcu_set(&netdev->virtio_dev, dev);
> exists = true;
>+
>+ /* Get NUMA information */
>+ err = get_mempolicy(&newnode, NULL, 0, dev, MPOL_F_NODE |
>MPOL_F_ADDR);
>+ if (err) {
>+ VLOG_INFO("Error getting NUMA info for vHost Device
>'%s'",
>+ dev->ifname);
>+ newnode = netdev->socket_id;
>+ } else if (newnode != netdev->socket_id) {
>+ netdev->socket_id = newnode;
>+ /* Change mempool to new NUMA Node */
>+ dpdk_mp_put(netdev->dpdk_mp);
>+ netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id,
>netdev->mtu);
>+ /* Request netdev reconfiguration. The port may now be
>+ * serviced by a PMD on the new node if enabled in the
>cpu
>+ * mask */
>+ netdev_request_reconfigure(&netdev->up);
I think here I would prefer:
1) Remembering the configuration change request
netdev->requested_socket_id = newnode
2) Calling netdev_request_reconfigure()
3) In the netdev_dpdk_vhost_user_reconfigure() method:
netdev->socket_id = netdev_requested_socket_id
Otherwise the datapath might be confused, because it assumes that the
socket_id doesn't change while the device is polled by the pmd threads.
It's safe to change almost everything inside the
netdev_dpdk_vhost_user_reconfigure(), because the device will not be
polled by the datapath when the function is called.
The same applies for the mempool: the actual change should be done
in netdev_dpdk_vhost_user_reconfigure(), because there might still be
threads using it.
>+ }
>+
> dev->flags |= VIRTIO_DEV_RUNNING;
> /* Disable notifications. */
> set_irq_status(dev);
>@@ -1907,8 +1928,8 @@ new_device(struct virtio_net *dev)
> return -1;
> }
>
>- VLOG_INFO("vHost Device '%s' %"PRIu64" has been added", dev->ifname,
>- dev->device_fh);
>+ VLOG_INFO("vHost Device '%s' %"PRIu64" has been added on socket %i",
>+ dev->ifname, dev->device_fh, newnode);
> return 0;
> }
>
>--
>2.4.3
>
>_______________________________________________
>dev mailing list
>dev at openvswitch.org
>http://openvswitch.org/mailman/listinfo/dev
More information about the dev
mailing list