[ovs-dev] [PATCH RFC v3 1/1] netdev-dpdk: add dpdk vhost-user ports

Ciara Loftus ciara.loftus at intel.com
Wed Apr 29 20:51:58 UTC 2015


This patch adds support for a new port type to the userspace
datapath called dpdkvhostuser. It builds on the existing
vhost-cuse infrastructure, but makes vhost-user the default
port type in place of vhost-cuse.
vhost-cuse 'dpdkvhost' ports are still available and can be
enabled using a configure flag, steps for which are available
in INSTALL.DPDK.md.

A new dpdkvhostuser port creates a unix domain socket which,
when provided to QEMU, carries communication between the
virtio-net device in the VM and the OVS port on the host.

Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
---
 INSTALL.DPDK.md         | 143 ++++++++++++++++++++++++++++++++++++++++--------
 acinclude.m4            |  13 +++++
 configure.ac            |   1 +
 lib/netdev-dpdk.c       | 108 +++++++++++++++++++++++++++++-------
 lib/netdev.c            |   4 ++
 vswitchd/ovs-vswitchd.c |   9 ++-
 6 files changed, 231 insertions(+), 47 deletions(-)

diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index aae97a5..7fb2449 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -306,40 +306,132 @@ the vswitchd.
 DPDK vhost:
 -----------
 
-vhost-cuse is only supported at present i.e. not using the standard QEMU
-vhost-user interface. It is intended that vhost-user support will be added
-in future releases when supported in DPDK and that vhost-cuse will eventually
-be deprecated. See [DPDK Docs] for more info on vhost.
+DPDK 2.0 supports two types of vhost:
+
+1. vhost-user
+2. vhost-cuse
+
+By default, vhost-user is enabled in DPDK, and OVS follows the same
+default.
+
+Should you wish to use vhost-cuse instead of vhost-user, you must enable
+vhost-cuse in OVS and re-build. This can be achieved by using the
+`--with-vhostcuse` flag in the `./configure` step like so:
+
+`./configure --with-dpdk=$DPDK_BUILD --with-vhostcuse`
 
 Prerequisites:
-1.  Insert the Cuse module:
 
-      `modprobe cuse`
+1. DPDK 2.0 with vhost support enabled as documented in the "Building and
+   Installing" section:
+
+  1. Update `config/common_linuxapp` so that DPDK is built with vhost-user
+     libraries:
+
+     `CONFIG_RTE_LIBRTE_VHOST=y`
+
+  2. (Optional) If using vhost-cuse, update the same file as above and
+     build DPDK with vhost-user libraries turned off. This in turn enables
+     vhost-cuse:
+
+     `CONFIG_RTE_LIBRTE_VHOST=y`
+     `CONFIG_RTE_LIBRTE_VHOST_USER=n`
+
+2. (Optional) If using vhost-cuse:
+
+  1. Insert the Cuse module:
+
+     `modprobe cuse`
 
-2.  Build and insert the `eventfd_link` module:
+  2. Build and insert the `eventfd_link` module:
 
      `cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
      `make`
      `insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko`
 
+3. QEMU version v2.1.0+
+
+   Both vhost-user and vhost-cuse will work with QEMU v2.1.0 and above,
+   however it is recommended to use v2.2.0 if providing your VM with memory
+   greater than 1GB due to potential issues with memory mapping larger areas.
+   Note: For vhost-cuse, QEMU v1.6.2 will also work, with slightly different
+   command line parameters, which are specified later in this document.
+
 Following the steps above to create a bridge, you can now add DPDK vhost
-as a port to the vswitch.
+as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost ports can have
+arbitrary names.
+
+When adding vhost ports to the switch, use the port type that matches the
+type of vhost you are using.
 
-`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
+  -  For vhost-user (default), the name of the port type is `dpdkvhostuser`
 
-Unlike DPDK ring ports, DPDK vhost ports can have arbitrary names:
+      `ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1 type=dpdkvhostuser`
 
-`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
+     This action creates a socket located at
+     `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
+     to your VM on the QEMU command line. More instructions on this can be
+     found in the next section "DPDK vhost-user VM configuration".
+     Note: If you wish for the vhost-user sockets to be created in a
+     directory other than `/usr/local/var/run/openvswitch`, you may specify
+     another location on the ovs-vswitchd command line like so:
 
-However, please note that when attaching userspace devices to QEMU, the
-name provided during the add-port operation must match the ifname parameter
-on the QEMU command line.
+      `./vswitchd/ovs-vswitchd --dpdk --vhost_sock_dir /my-dir -c 0x1 ...`
 
+  -  For vhost-cuse, the name of the port type is `dpdkvhost`
 
-DPDK vhost VM configuration:
-----------------------------
+      `ovs-vsctl add-port br0 vhost-cuse1 -- set Interface vhost-cuse1 type=dpdkvhost`
 
-   vhost ports use a Linux* character device to communicate with QEMU.
+     When attaching vhost-cuse ports to QEMU, the name provided during the
+     add-port operation must match the ifname parameter on the QEMU command
+     line. More instructions on this can be found in the section "DPDK
+     vhost-cuse VM configuration".
+
+DPDK vhost-user VM configuration:
+---------------------------------
+Follow the steps below to attach vhost-user port(s) to a VM.
+
+1. Configure sockets.
+   Pass the following parameters to QEMU to attach a vhost-user device:
+
+   ```
+   -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
+   -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
+   -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
+   ```
+
+   ...where vhost-user-1 is the name of the vhost-user port added
+   to the switch.
+   Repeat the above parameters for multiple devices, changing the
+   chardev path and id as necessary. Note that a separate and different
+   chardev path needs to be specified for each vhost-user device. For
+   example, if you have a second vhost-user port named 'vhost-user-2', you
+   append an additional set of parameters to your QEMU command line:
+
+
+   ```
+   -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
+   -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
+   -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
+   ```
+
+2. Configure huge pages.
+   QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
+   virtio-net device's virtual rings and packet buffers mapping the VM's
+   physical memory on hugetlbfs. To enable vhost-ports to map the VM's
+   memory into their process address space, pass the following parameters
+   to QEMU:
+
+   ```
+   -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
+   share=on
+   -numa node,memdev=mem -mem-prealloc
+   ```
+
+DPDK vhost-cuse VM configuration:
+---------------------------------
+
+   vhost-cuse ports use a Linux* character device to communicate with QEMU.
    By default it is set to `/dev/vhost-net`. It is possible to reuse this
    standard device for DPDK vhost, which makes setup a little simpler but it
    is better practice to specify an alternative character device in order to
@@ -405,16 +497,19 @@ DPDK vhost VM configuration:
    QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
    virtio-net device's virtual rings and packet buffers mapping the VM's
    physical memory on hugetlbfs. To enable vhost-ports to map the VM's
-   memory into their process address space, pass the following paramters
+   memory into their process address space, pass the following parameters
    to QEMU:
 
      `-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
       share=on -numa node,memdev=mem -mem-prealloc`
 
+   Note: For use with an earlier QEMU version such as v1.6.2, use the following
+   instead:
 
-DPDK vhost VM configuration with QEMU wrapper:
-----------------------------------------------
+     `-mem-path /dev/hugepages -mem-prealloc`
 
+DPDK vhost-cuse VM configuration with QEMU wrapper:
+---------------------------------------------------
 The QEMU wrapper script automatically detects and calls QEMU with the
 necessary parameters. It performs the following actions:
 
@@ -440,8 +535,8 @@ qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4
   netdev=net1,mac=00:00:00:00:00:01
 ```
 
-DPDK vhost VM configuration with libvirt:
------------------------------------------
+DPDK vhost-cuse VM configuration with libvirt:
+----------------------------------------------
 
 If you are using libvirt, you must enable libvirt to access the character
 device by adding it to controllers cgroup for libvirtd using the following
@@ -515,7 +610,7 @@ Now you may launch your VM using virt-manager, or like so:
 
     `virsh create my_vhost_vm.xml`
 
-DPDK vhost VM configuration with libvirt and QEMU wrapper:
+DPDK vhost-cuse VM configuration with libvirt and QEMU wrapper:
 ----------------------------------------------------------
 
 To use the qemu-wrapper script in conjuntion with libvirt, follow the
@@ -543,7 +638,7 @@ steps in the previous section before proceeding with the following steps:
   the correct emulator location and set any additional options. If you are
   using a alternative character device name, please set "us_vhost_path" to the
   location of that device. The script will automatically detect and insert
-  the correct "vhostfd" value in the QEMU command line arguements.
+  the correct "vhostfd" value in the QEMU command line arguments.
 
   5. Use virt-manager to launch the VM
 
diff --git a/acinclude.m4 b/acinclude.m4
index 070f120..f7a6da4 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -225,6 +225,19 @@ AC_DEFUN([OVS_CHECK_DPDK], [
   AM_CONDITIONAL([DPDK_NETDEV], test -n "$RTE_SDK")
 ])
 
+dnl OVS_CHECK_VHOST_CUSE
+dnl
+dnl Enable DPDK vhost-cuse support in favour of vhost-user
+AC_DEFUN([OVS_CHECK_VHOST_CUSE], [
+  AC_ARG_WITH(vhostcuse,
+              [AC_HELP_STRING([--with-vhostcuse],
+                              [Enable DPDK vhost-cuse])])
+
+  if test X"$with_vhostcuse" != X; then
+    AC_DEFINE([VHOST_CUSE], [1], [DPDK vhost-cuse support enabled, vhost-user disabled.])
+  fi
+])
+
 dnl OVS_GREP_IFELSE(FILE, REGEX, [IF-MATCH], [IF-NO-MATCH])
 dnl
 dnl Greps FILE for REGEX.  If it matches, runs IF-MATCH, otherwise IF-NO-MATCH.
diff --git a/configure.ac b/configure.ac
index 068674e..3f635b4 100644
--- a/configure.ac
+++ b/configure.ac
@@ -165,6 +165,7 @@ AC_ARG_VAR(KARCH, [Kernel Architecture String])
 AC_SUBST(KARCH)
 OVS_CHECK_LINUX
 OVS_CHECK_DPDK
+OVS_CHECK_VHOST_CUSE
 OVS_CHECK_PRAGMA_MESSAGE
 AC_SUBST([OVS_CFLAGS])
 AC_SUBST([OVS_LDFLAGS])
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 8c59d9c..15f57a6 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -28,6 +28,7 @@
 #include <unistd.h>
 #include <stdio.h>
 
+#include "dirs.h"
 #include "dp-packet.h"
 #include "dpif-netdev.h"
 #include "list.h"
@@ -101,8 +102,18 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF / ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))
 
 #define MAX_PKT_BURST 32           /* Max burst size for RX/TX */
 
-/* Character device cuse_dev_name. */
-char *cuse_dev_name = NULL;
+/* For vhost-user, this is the path where sockets will be created.
+ * For vhost-cuse, this is the name of the character device. */
+char *vhost_dev_or_sock = NULL;
+
+#ifdef VHOST_CUSE
+char vhost_flag[] = "--cuse_dev_name";
+char vhost_flag_default_val[] = "vhost-net";
+#else
+#define VHOST_USER
+char vhost_flag[] = "--vhost_sock_dir";
+char vhost_flag_default_val[PATH_MAX]; /* Initialized at runtime via ovs_rundir */
+#endif
 
 static const struct rte_eth_conf port_conf = {
     .rxmode = {
@@ -231,6 +242,11 @@ struct netdev_dpdk {
     /* virtio-net structure for vhost device */
     OVSRCU_TYPE(struct virtio_net *) virtio_dev;
 
+#ifdef VHOST_USER
+    /* socket location for vhost-user device */
+    char socket_path[IF_NAME_SZ];
+#endif
+
     /* In dpdk_list. */
     struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
     rte_spinlock_t txq_lock;
@@ -245,6 +261,8 @@ static bool thread_is_pmd(void);
 
 static int netdev_dpdk_construct(struct netdev *);
 
+static void *start_vhost_loop(void *dummy);
+
 struct virtio_net * netdev_dpdk_get_virtio(const struct netdev_dpdk *dev);
 
 static bool
@@ -557,6 +575,21 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
     netdev_->n_txq = NR_QUEUE;
     netdev_->n_rxq = NR_QUEUE;
 
+#ifdef VHOST_USER
+    if (type == DPDK_DEV_VHOST) {
+        snprintf(netdev->socket_path, sizeof(netdev->socket_path), "%s/%s",
+                vhost_dev_or_sock, netdev_->name);
+        err = rte_vhost_driver_register(netdev->socket_path);
+        if (err != 0) {
+            VLOG_ERR("vhost-user socket device setup failure for socket %s",
+                     netdev->socket_path);
+            goto unlock;
+        }
+
+        VLOG_INFO("Socket %s created for vhost-user port %s", netdev->socket_path, netdev_->name);
+    }
+#endif
+
     if (type == DPDK_DEV_ETH) {
         netdev_dpdk_alloc_txq(netdev, NR_QUEUE);
         err = dpdk_eth_dev_init(netdev);
@@ -1526,6 +1559,21 @@ set_irq_status(struct virtio_net *dev)
 }
 
 /*
+ * Compare the name of the QEMU device with the name of the vhost port.
+ */
+static int
+compare_vhost_name(char* ifname, struct netdev_dpdk *netdev) {
+#ifdef VHOST_CUSE
+    if (strncmp(ifname, netdev->up.name, IFNAMSIZ) == 0)
+        return 1;
+#else
+    if (strncmp(ifname, netdev->socket_path, IF_NAME_SZ) == 0)
+        return 1;
+#endif
+    return 0;
+}
+
+/*
  * A new virtio-net device is added to a vhost port.
  */
 static int
@@ -1537,7 +1585,7 @@ new_device(struct virtio_net *dev)
     ovs_mutex_lock(&dpdk_mutex);
     /* Add device to the vhost port with the same name as that passed down. */
     LIST_FOR_EACH(netdev, list_node, &dpdk_list) {
-        if (strncmp(dev->ifname, netdev->up.name, IFNAMSIZ) == 0) {
+        if (compare_vhost_name(dev->ifname, netdev)) {
             ovs_mutex_lock(&netdev->mutex);
             ovsrcu_set(&netdev->virtio_dev, dev);
             ovs_mutex_unlock(&netdev->mutex);
@@ -1617,7 +1665,7 @@ const struct virtio_net_device_ops virtio_net_device_ops =
 };
 
 static void *
-start_cuse_session_loop(void *dummy OVS_UNUSED)
+start_vhost_loop(void *dummy OVS_UNUSED)
 {
      pthread_detach(pthread_self());
      /* Put the cuse thread into quiescent state. */
@@ -1629,22 +1677,23 @@ start_cuse_session_loop(void *dummy OVS_UNUSED)
 static int
 dpdk_vhost_class_init(void)
 {
-    int err = -1;
-
     rte_vhost_driver_callback_register(&virtio_net_device_ops);
 
+#ifdef VHOST_CUSE
+    int err = -1;
     /* Register CUSE device to handle IOCTLs.
-     * Unless otherwise specified on the vswitchd command line, cuse_dev_name
+     * Unless otherwise specified on the vswitchd command line, vhost_dev_or_sock
      * is set to vhost-net.
      */
-    err = rte_vhost_driver_register(cuse_dev_name);
+    err = rte_vhost_driver_register(vhost_dev_or_sock);
 
     if (err != 0) {
         VLOG_ERR("CUSE device setup failure.");
         return -1;
     }
+#endif
 
-    ovs_thread_create("cuse_thread", start_cuse_session_loop, NULL);
+    ovs_thread_create("vhost_thread", start_vhost_loop, NULL);
     return 0;
 }
 
@@ -1862,6 +1911,7 @@ dpdk_init(int argc, char **argv)
     int result;
     int base = 0;
     char *pragram_name = argv[0];
+    char *vhost_flag_val = NULL;
 
     if (argc < 2 || strcmp(argv[1], "--dpdk"))
         return 0;
@@ -1870,29 +1920,41 @@ dpdk_init(int argc, char **argv)
     argc--;
     argv++;
 
-    /* If the cuse_dev_name parameter has been provided, set 'cuse_dev_name' to
-     * this string if it meets the correct criteria. Otherwise, set it to the
-     * default (vhost-net).
+    /* Depending on which version of vhost is in use, process the vhost-specific
+     * flag if it is provided on the vswitchd command line, otherwise resort to
+     * a default value.
+     *
+     * For vhost-user: Process "--vhost_sock_dir" to set the custom location of
+     * the vhost-user socket(s).
+     * For vhost-cuse: Process "--cuse_dev_name" to set the custom name of the
+     * vhost-cuse character device.
      */
-    if (!strcmp(argv[1], "--cuse_dev_name") &&
-        (strlen(argv[2]) <= NAME_MAX)) {
 
-        cuse_dev_name = strdup(argv[2]);
+#ifdef VHOST_USER
+    strncpy(vhost_flag_default_val, ovs_rundir(), PATH_MAX - 1);
+#endif
+
+    if (!strcmp(argv[1], vhost_flag) &&
+       (strlen(argv[2]) <= NAME_MAX)) {
 
-        /* Remove the cuse_dev_name configuration parameters from the argument
+        vhost_flag_val = strdup(argv[2]);
+
+        /* Remove the vhost flag configuration parameters from the argument
          * list, so that the correct elements are passed to the DPDK
          * initialization function
          */
         argc -= 2;
-        argv += 2;    /* Increment by two to bypass the cuse_dev_name arguments */
+        argv += 2;    /* Increment by two to bypass the vhost flag arguments */
         base = 2;
 
-        VLOG_ERR("User-provided cuse_dev_name in use: /dev/%s", cuse_dev_name);
-    } else {
-        cuse_dev_name = "vhost-net";
-        VLOG_INFO("No cuse_dev_name provided - defaulting to /dev/vhost-net");
+        VLOG_INFO("User-provided %s in use: %s", vhost_flag, vhost_flag_val);
+    } else {
+        vhost_flag_val = vhost_flag_default_val;
+        VLOG_INFO("No %s provided - defaulting to %s", vhost_flag, vhost_flag_val);
     }
 
+    vhost_dev_or_sock = (char*)vhost_flag_val;
+
     /* Keep the program name argument as this is needed for call to
      * rte_eal_init()
      */
@@ -1947,7 +2009,11 @@ const struct netdev_class dpdk_ring_class =
 
 const struct netdev_class dpdk_vhost_class =
     NETDEV_DPDK_CLASS(
+#ifdef VHOST_CUSE
         "dpdkvhost",
+#else
+        "dpdkvhostuser",
+#endif
         dpdk_vhost_class_init,
         netdev_dpdk_vhost_construct,
         netdev_dpdk_vhost_destruct,
diff --git a/lib/netdev.c b/lib/netdev.c
index 45f7f29..77513fa 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -111,7 +111,11 @@ netdev_is_pmd(const struct netdev *netdev)
 {
     return (!strcmp(netdev->netdev_class->type, "dpdk") ||
             !strcmp(netdev->netdev_class->type, "dpdkr") ||
+#ifdef VHOST_CUSE
             !strcmp(netdev->netdev_class->type, "dpdkvhost"));
+#else
+            !strcmp(netdev->netdev_class->type, "dpdkvhostuser"));
+#endif
 }
 
 static void
diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
index a1b33da..ea9560b 100644
--- a/vswitchd/ovs-vswitchd.c
+++ b/vswitchd/ovs-vswitchd.c
@@ -252,9 +252,14 @@ usage(void)
     daemon_usage();
     vlog_usage();
     printf("\nDPDK options:\n"
-           "  --dpdk options            Initialize DPDK datapath.\n"
-           "  --cuse_dev_name BASENAME  override default character device name\n"
+           "  --dpdk options            Initialize DPDK datapath.\n");
+#ifdef VHOST_CUSE
+    printf("  --cuse_dev_name BASENAME  override default character device name\n"
            "                            for use with userspace vHost.\n");
+#else
+    printf("  --vhost_sock_dir DIR      override default directory where\n"
+           "                            vhost-user sockets are created.\n");
+#endif
     printf("\nOther options:\n"
            "  --unixctl=SOCKET          override default control socket name\n"
            "  -h, --help                display this help message\n"
-- 
1.9.3



