[ovs-dev] [PATCH RFC v3 1/1] netdev-dpdk: add dpdk vhost-user ports
Ciara Loftus
ciara.loftus at intel.com
Wed Apr 29 20:51:58 UTC 2015
This patch adds support for a new port type to the userspace
datapath called dpdkvhostuser. It builds on the existing vhost-cuse
infrastructure, but makes vhost-user ports the default port type in
place of vhost-cuse ports.
vhost-cuse 'dpdkvhost' ports are still available and can be
enabled using a configure flag. The steps for doing so are
described in INSTALL.DPDK.md.
Adding a dpdkvhostuser port creates a Unix domain socket which,
when provided to QEMU, is used for communication between
the virtio-net device on the VM and the OVS port on the host.
Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
---
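Note (not part of the commit): the socket path for a dpdkvhostuser port is
formed by joining the socket directory and the port name. The sketch below
illustrates that one step in isolation, with an explicit truncation check
added. `build_vhost_sock_path` is a hypothetical helper name used only for
this illustration; it does not exist in this patch or in OVS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of the patch: join the socket directory and
 * the port name into 'dst'. Returns 0 on success, or -1 if the joined path
 * would not fit in 'dst_size' bytes (snprintf would otherwise truncate it
 * silently). */
static int
build_vhost_sock_path(char *dst, size_t dst_size,
                      const char *sock_dir, const char *port_name)
{
    int n;

    assert(dst != NULL && sock_dir != NULL && port_name != NULL);

    n = snprintf(dst, dst_size, "%s/%s", sock_dir, port_name);
    if (n < 0 || (size_t) n >= dst_size) {
        return -1;
    }
    return 0;
}
```

With the directory `/usr/local/var/run/openvswitch` and the port name
`vhost-user-1`, the helper writes
`/usr/local/var/run/openvswitch/vhost-user-1` into the buffer, which is the
path that QEMU's `-chardev socket,path=...` parameter would then point at.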
INSTALL.DPDK.md | 143 ++++++++++++++++++++++++++++++++++++++++--------
acinclude.m4 | 13 +++++
configure.ac | 1 +
lib/netdev-dpdk.c | 108 +++++++++++++++++++++++++++++-------
lib/netdev.c | 4 ++
vswitchd/ovs-vswitchd.c | 9 ++-
6 files changed, 231 insertions(+), 47 deletions(-)
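Note (not part of the commit): the vhost-specific flag is only recognised
when it directly follows "--dpdk" and is itself followed by a value. The
sketch below shows that lookup as a standalone function with a guard on the
argument count. `parse_vhost_flag` and its parameters are hypothetical names
chosen for this illustration, and the NAME_MAX length check from the patch is
left out to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the patch. 'argv[0]' is the element that
 * precedes the flag. If argv[1] equals 'flag' and a value follows it, return
 * that value and set '*consumed' to 2; otherwise return 'default_val' and
 * set '*consumed' to 0. */
static const char *
parse_vhost_flag(int argc, char **argv, const char *flag,
                 const char *default_val, int *consumed)
{
    assert(argv != NULL && flag != NULL && consumed != NULL);

    /* argc >= 3 guarantees that both argv[1] and argv[2] are valid. */
    if (argc >= 3 && !strcmp(argv[1], flag)) {
        *consumed = 2;
        return argv[2];
    }
    *consumed = 0;
    return default_val;
}
```

A caller would advance argv by the returned `consumed` count before handing
the remaining arguments to the DPDK initialization function, so that DPDK
never sees the OVS-specific flag.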
diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index aae97a5..7fb2449 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -306,40 +306,132 @@ the vswitchd.
DPDK vhost:
-----------
-vhost-cuse is only supported at present i.e. not using the standard QEMU
-vhost-user interface. It is intended that vhost-user support will be added
-in future releases when supported in DPDK and that vhost-cuse will eventually
-be deprecated. See [DPDK Docs] for more info on vhost.
+DPDK 2.0 supports two types of vhost:
+
+1. vhost-user
+2. vhost-cuse
+
+By default, vhost-user is enabled in DPDK, and OVS follows the same
+default.
+
+Should you wish to use vhost-cuse instead of vhost-user, you must enable
+vhost-cuse in OVS and re-build. This can be achieved by using the
+`--with-vhostcuse` flag in the `./configure` step like so:
+
+`./configure --with-dpdk=$DPDK_BUILD --with-vhostcuse`
Prerequisites:
-1. Insert the Cuse module:
- `modprobe cuse`
+1. DPDK 2.0 with vhost support enabled as documented in the "Building and
+ Installing" section:
+
+ 1. Update `config/common_linuxapp` so that DPDK is built with vhost-user
+ libraries:
+
+ `CONFIG_RTE_LIBRTE_VHOST=y`
+
+ 2. (Optional) If using vhost-cuse, update the same file as above and
+ build DPDK with vhost-user libraries turned off. This in turn enables
+ vhost-cuse:
+
+ `CONFIG_RTE_LIBRTE_VHOST=y`
+ `CONFIG_RTE_LIBRTE_VHOST_USER=n`
+
+2. (Optional) If using vhost-cuse:
+
+ 1. Insert the Cuse module:
+
+ `modprobe cuse`
-2. Build and insert the `eventfd_link` module:
+ 2. Build and insert the `eventfd_link` module:
`cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
`make`
`insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko`
+3. QEMU version v2.1.0+
+
+ Both vhost-user and vhost-cuse will work with QEMU v2.1.0 and above,
+ however it is recommended to use v2.2.0 if providing your VM with memory
+ greater than 1GB due to potential issues with memory mapping larger areas.
+ Note: For vhost-cuse, QEMU v1.6.2 will also work, with slightly different
+ command line parameters, which are specified later in this document.
+
Following the steps above to create a bridge, you can now add DPDK vhost
-as a port to the vswitch.
+as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost ports can have
+arbitrary names.
+
+When adding vhost ports to the switch, take care depending on which
+type of vhost you are using.
-`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
+ - For vhost-user (default), the name of the port type is `dpdkvhostuser`
-Unlike DPDK ring ports, DPDK vhost ports can have arbitrary names:
+ `ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1 type=dpdkvhostuser`
-`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
+ This action creates a socket located at
+ `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
+ to your VM on the QEMU command line. More instructions on this can be
+ found in the next section "DPDK vhost-user VM configuration"
+ Note: If you wish for the vhost-user sockets to be created in a
+ directory other than `/usr/local/var/run/openvswitch`, you may specify
+ another location on the ovs-vswitchd command line like so:
-However, please note that when attaching userspace devices to QEMU, the
-name provided during the add-port operation must match the ifname parameter
-on the QEMU command line.
+ `./vswitchd/ovs-vswitchd --dpdk --vhost_sock_dir /my-dir -c 0x1 ...`
+ - For vhost-cuse, the name of the port type is `dpdkvhost`
-DPDK vhost VM configuration:
-----------------------------
+ `ovs-vsctl add-port br0 vhost-cuse1 -- set Interface vhost-cuse1 type=dpdkvhost`
- vhost ports use a Linux* character device to communicate with QEMU.
+ When attaching vhost-cuse ports to QEMU, the name provided during the
+ add-port operation must match the ifname parameter on the QEMU command
+ line. More instructions on this can be found in the section "DPDK
+ vhost-cuse VM configuration"
+
+DPDK vhost-user VM configuration:
+---------------------------------
+Follow the steps below to attach vhost-user port(s) to a VM.
+
+1. Configure sockets.
+ Pass the following parameters to QEMU to attach a vhost-user device:
+
+ ```
+ -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
+ -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
+ -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
+ ```
+
+ ...where vhost-user-1 is the name of the vhost-user port added
+ to the switch.
+ Repeat the above parameters for multiple devices, changing the
+ chardev path and id as necessary. Note that a separate and different
+ chardev path needs to be specified for each vhost-user device. For
+ example, if you have a second vhost-user port named 'vhost-user-2', you
+ append an additional set of parameters to your QEMU command line:
+
+
+ ```
+ -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
+ -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
+ -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
+ ```
+
+2. Configure huge pages.
+ QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
+ virtio-net device's virtual rings and packet buffers by mapping the VM's
+ physical memory on hugetlbfs. To enable vhost-ports to map the VM's
+ memory into their process address space, pass the following parameters
+ to QEMU:
+
+ ```
+ -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on
+ -numa node,memdev=mem
+ -mem-prealloc
+ ```
+
+DPDK vhost-cuse VM configuration:
+---------------------------------
+
+ vhost-cuse ports use a Linux* character device to communicate with QEMU.
By default it is set to `/dev/vhost-net`. It is possible to reuse this
standard device for DPDK vhost, which makes setup a little simpler but it
is better practice to specify an alternative character device in order to
@@ -405,16 +497,19 @@ DPDK vhost VM configuration:
QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
virtio-net device's virtual rings and packet buffers mapping the VM's
physical memory on hugetlbfs. To enable vhost-ports to map the VM's
- memory into their process address space, pass the following paramters
+ memory into their process address space, pass the following parameters
to QEMU:
`-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
share=on -numa node,memdev=mem -mem-prealloc`
+ Note: For use with an earlier QEMU version such as v1.6.2, use the following
+ instead:
-DPDK vhost VM configuration with QEMU wrapper:
-----------------------------------------------
+ `-mem-path /dev/hugepages -mem-prealloc`
+DPDK vhost-cuse VM configuration with QEMU wrapper:
+---------------------------------------------------
The QEMU wrapper script automatically detects and calls QEMU with the
necessary parameters. It performs the following actions:
@@ -440,8 +535,8 @@ qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4
netdev=net1,mac=00:00:00:00:00:01
```
-DPDK vhost VM configuration with libvirt:
------------------------------------------
+DPDK vhost-cuse VM configuration with libvirt:
+----------------------------------------------
If you are using libvirt, you must enable libvirt to access the character
device by adding it to controllers cgroup for libvirtd using the following
@@ -515,7 +610,7 @@ Now you may launch your VM using virt-manager, or like so:
`virsh create my_vhost_vm.xml`
-DPDK vhost VM configuration with libvirt and QEMU wrapper:
+DPDK vhost-cuse VM configuration with libvirt and QEMU wrapper:
----------------------------------------------------------
To use the qemu-wrapper script in conjuntion with libvirt, follow the
@@ -543,7 +638,7 @@ steps in the previous section before proceeding with the following steps:
the correct emulator location and set any additional options. If you are
using a alternative character device name, please set "us_vhost_path" to the
location of that device. The script will automatically detect and insert
- the correct "vhostfd" value in the QEMU command line arguements.
+ the correct "vhostfd" value in the QEMU command line arguments.
5. Use virt-manager to launch the VM
diff --git a/acinclude.m4 b/acinclude.m4
index 070f120..f7a6da4 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -225,6 +225,19 @@ AC_DEFUN([OVS_CHECK_DPDK], [
AM_CONDITIONAL([DPDK_NETDEV], test -n "$RTE_SDK")
])
+dnl OVS_CHECK_VHOST_CUSE
+dnl
+dnl Enable DPDK vhost-cuse support in place of vhost-user
+AC_DEFUN([OVS_CHECK_VHOST_CUSE], [
+ AC_ARG_WITH(vhostcuse,
+ [AC_HELP_STRING([--with-vhostcuse],
+ [Enable DPDK vhost-cuse])])
+
+ if test -n "$with_vhostcuse" && test X"$with_vhostcuse" != Xno; then
+ AC_DEFINE([VHOST_CUSE], [1], [DPDK vhost-cuse support enabled, vhost-user disabled.])
+ fi
+])
+
dnl OVS_GREP_IFELSE(FILE, REGEX, [IF-MATCH], [IF-NO-MATCH])
dnl
dnl Greps FILE for REGEX. If it matches, runs IF-MATCH, otherwise IF-NO-MATCH.
diff --git a/configure.ac b/configure.ac
index 068674e..3f635b4 100644
--- a/configure.ac
+++ b/configure.ac
@@ -165,6 +165,7 @@ AC_ARG_VAR(KARCH, [Kernel Architecture String])
AC_SUBST(KARCH)
OVS_CHECK_LINUX
OVS_CHECK_DPDK
+OVS_CHECK_VHOST_CUSE
OVS_CHECK_PRAGMA_MESSAGE
AC_SUBST([OVS_CFLAGS])
AC_SUBST([OVS_LDFLAGS])
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 8c59d9c..15f57a6 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -28,6 +28,7 @@
#include <unistd.h>
#include <stdio.h>
+#include "dirs.h"
#include "dp-packet.h"
#include "dpif-netdev.h"
#include "list.h"
@@ -101,8 +102,18 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF / ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))
#define MAX_PKT_BURST 32 /* Max burst size for RX/TX */
-/* Character device cuse_dev_name. */
-char *cuse_dev_name = NULL;
+/* For vhost-user, this is the path where sockets will be created.
+ * For vhost-cuse, this is the name of the character device. */
+char *vhost_dev_or_sock = NULL;
+
+#ifdef VHOST_CUSE
+char vhost_flag[] = "--cuse_dev_name";
+char vhost_flag_default_val[] = "vhost-net";
+#else
+#define VHOST_USER
+char vhost_flag[] = "--vhost_sock_dir";
+char vhost_flag_default_val[PATH_MAX]; /* Initialized at runtime via ovs_rundir */
+#endif
static const struct rte_eth_conf port_conf = {
.rxmode = {
@@ -231,6 +242,11 @@ struct netdev_dpdk {
/* virtio-net structure for vhost device */
OVSRCU_TYPE(struct virtio_net *) virtio_dev;
+#ifdef VHOST_USER
+ /* socket location for vhost-user device */
+ char socket_path[IF_NAME_SZ];
+#endif
+
/* In dpdk_list. */
struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
rte_spinlock_t txq_lock;
@@ -245,6 +261,8 @@ static bool thread_is_pmd(void);
static int netdev_dpdk_construct(struct netdev *);
+static void *start_vhost_loop(void *dummy);
+
struct virtio_net * netdev_dpdk_get_virtio(const struct netdev_dpdk *dev);
static bool
@@ -557,6 +575,21 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
netdev_->n_txq = NR_QUEUE;
netdev_->n_rxq = NR_QUEUE;
+#ifdef VHOST_USER
+ if (type == DPDK_DEV_VHOST) {
+ snprintf(netdev->socket_path, sizeof(netdev->socket_path), "%s/%s",
+ vhost_dev_or_sock, netdev_->name);
+ err = rte_vhost_driver_register(netdev->socket_path);
+ if (err != 0) {
+ VLOG_ERR("vhost-user socket device setup failure for socket %s",
+ netdev->socket_path);
+ goto unlock;
+ }
+
+ VLOG_INFO("Socket %s created for vhost-user port %s", netdev->socket_path, netdev_->name);
+ }
+#endif
+
if (type == DPDK_DEV_ETH) {
netdev_dpdk_alloc_txq(netdev, NR_QUEUE);
err = dpdk_eth_dev_init(netdev);
@@ -1526,6 +1559,21 @@ set_irq_status(struct virtio_net *dev)
}
/*
+ * Compare the name of the QEMU device with the name of the vhost port.
+ */
+static int
+compare_vhost_name(char* ifname, struct netdev_dpdk *netdev) {
+#ifdef VHOST_CUSE
+ if (strncmp(ifname, netdev->up.name, IFNAMSIZ) == 0)
+ return 1;
+#else
+ if (strncmp(ifname, netdev->socket_path, IF_NAME_SZ) == 0)
+ return 1;
+#endif
+ return 0;
+}
+
+/*
* A new virtio-net device is added to a vhost port.
*/
static int
@@ -1537,7 +1585,7 @@ new_device(struct virtio_net *dev)
ovs_mutex_lock(&dpdk_mutex);
/* Add device to the vhost port with the same name as that passed down. */
LIST_FOR_EACH(netdev, list_node, &dpdk_list) {
- if (strncmp(dev->ifname, netdev->up.name, IFNAMSIZ) == 0) {
+ if (compare_vhost_name(dev->ifname, netdev)) {
ovs_mutex_lock(&netdev->mutex);
ovsrcu_set(&netdev->virtio_dev, dev);
ovs_mutex_unlock(&netdev->mutex);
@@ -1617,7 +1665,7 @@ const struct virtio_net_device_ops virtio_net_device_ops =
};
static void *
-start_cuse_session_loop(void *dummy OVS_UNUSED)
+start_vhost_loop(void *dummy OVS_UNUSED)
{
pthread_detach(pthread_self());
/* Put the cuse thread into quiescent state. */
@@ -1629,22 +1677,23 @@ start_cuse_session_loop(void *dummy OVS_UNUSED)
static int
dpdk_vhost_class_init(void)
{
- int err = -1;
-
rte_vhost_driver_callback_register(&virtio_net_device_ops);
+#ifdef VHOST_CUSE
+ int err = -1;
/* Register CUSE device to handle IOCTLs.
- * Unless otherwise specified on the vswitchd command line, cuse_dev_name
+ * Unless otherwise specified on the vswitchd command line, vhost_dev_or_sock
* is set to vhost-net.
*/
- err = rte_vhost_driver_register(cuse_dev_name);
+ err = rte_vhost_driver_register(vhost_dev_or_sock);
if (err != 0) {
VLOG_ERR("CUSE device setup failure.");
return -1;
}
+#endif
- ovs_thread_create("cuse_thread", start_cuse_session_loop, NULL);
+ ovs_thread_create("vhost_thread", start_vhost_loop, NULL);
return 0;
}
@@ -1862,6 +1911,7 @@ dpdk_init(int argc, char **argv)
int result;
int base = 0;
char *pragram_name = argv[0];
+ char *vhost_flag_val = NULL;
if (argc < 2 || strcmp(argv[1], "--dpdk"))
return 0;
@@ -1870,29 +1920,41 @@ dpdk_init(int argc, char **argv)
argc--;
argv++;
- /* If the cuse_dev_name parameter has been provided, set 'cuse_dev_name' to
- * this string if it meets the correct criteria. Otherwise, set it to the
- * default (vhost-net).
+ /* Depending on which version of vhost is in use, process the vhost-specific
+ * flag if it is provided on the vswitchd command line, otherwise resort to
+ * a default value.
+ *
+ * For vhost-user: Process "--vhost_sock_dir" to set the custom location of
+ * the vhost-user socket(s).
+ * For vhost-cuse: Process "--cuse_dev_name" to set the custom name of the
+ * vhost-cuse character device.
*/
- if (!strcmp(argv[1], "--cuse_dev_name") &&
- (strlen(argv[2]) <= NAME_MAX)) {
- cuse_dev_name = strdup(argv[2]);
+#ifdef VHOST_USER
+ strncpy(vhost_flag_default_val, ovs_rundir(), PATH_MAX - 1);
+#endif
+
+ if (argc > 2 && !strcmp(argv[1], vhost_flag) &&
+ (strlen(argv[2]) <= NAME_MAX)) {
- /* Remove the cuse_dev_name configuration parameters from the argument
+ vhost_flag_val = strdup(argv[2]);
+
+ /* Remove the vhost flag configuration parameters from the argument
* list, so that the correct elements are passed to the DPDK
* initialization function
*/
argc -= 2;
- argv += 2; /* Increment by two to bypass the cuse_dev_name arguments */
+ argv += 2; /* Increment by two to bypass the vhost flag arguments */
base = 2;
- VLOG_ERR("User-provided cuse_dev_name in use: /dev/%s", cuse_dev_name);
- } else {
- cuse_dev_name = "vhost-net";
- VLOG_INFO("No cuse_dev_name provided - defaulting to /dev/vhost-net");
+ VLOG_INFO("User-provided %s in use: %s", vhost_flag, vhost_flag_val);
+ } else {
+ vhost_flag_val = vhost_flag_default_val;
+ VLOG_INFO("No %s provided - defaulting to %s", vhost_flag, vhost_flag_val);
}
+ vhost_dev_or_sock = vhost_flag_val;
+
/* Keep the program name argument as this is needed for call to
* rte_eal_init()
*/
@@ -1947,7 +2009,11 @@ const struct netdev_class dpdk_ring_class =
const struct netdev_class dpdk_vhost_class =
NETDEV_DPDK_CLASS(
+#ifdef VHOST_CUSE
"dpdkvhost",
+#else
+ "dpdkvhostuser",
+#endif
dpdk_vhost_class_init,
netdev_dpdk_vhost_construct,
netdev_dpdk_vhost_destruct,
diff --git a/lib/netdev.c b/lib/netdev.c
index 45f7f29..77513fa 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -111,7 +111,11 @@ netdev_is_pmd(const struct netdev *netdev)
{
return (!strcmp(netdev->netdev_class->type, "dpdk") ||
!strcmp(netdev->netdev_class->type, "dpdkr") ||
+#ifdef VHOST_CUSE
!strcmp(netdev->netdev_class->type, "dpdkvhost"));
+#else
+ !strcmp(netdev->netdev_class->type, "dpdkvhostuser"));
+#endif
}
static void
diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
index a1b33da..ea9560b 100644
--- a/vswitchd/ovs-vswitchd.c
+++ b/vswitchd/ovs-vswitchd.c
@@ -252,9 +252,14 @@ usage(void)
daemon_usage();
vlog_usage();
printf("\nDPDK options:\n"
- " --dpdk options Initialize DPDK datapath.\n"
- " --cuse_dev_name BASENAME override default character device name\n"
+ " --dpdk options Initialize DPDK datapath.\n");
+#ifdef VHOST_CUSE
+ printf(" --cuse_dev_name BASENAME override default character device name\n"
" for use with userspace vHost.\n");
+#else
+ printf(" --vhost_sock_dir DIR override default directory where\n"
+ " vhost-user sockets are created.\n");
+#endif
printf("\nOther options:\n"
" --unixctl=SOCKET override default control socket name\n"
" -h, --help display this help message\n"
--
1.9.3