[ovs-dev] [PATCH v3] netdev-dpdk: add dpdk vhost-user ports
Ciara Loftus
ciara.loftus at intel.com
Mon May 11 10:56:00 UTC 2015
This patch adds support for a new port type to the userspace
datapath called dpdkvhostuser.
A new dpdkvhostuser port creates a Unix domain socket which, when
provided to QEMU, is used for communication between the virtio-net
device in the VM and the OVS port on the host.
vhost-cuse ('dpdkvhost') ports are still available, and will be
enabled if vhost-cuse support is detected in the DPDK build
specified during compilation of the switch. Otherwise, vhost-user
ports are enabled.
Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
---
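Note on socket naming: the commit message and netdev_dpdk_init() describe the socket path as the socket directory joined with the port name. The following is a minimal sketch of that composition with an explicit truncation check. It is not code from this patch; `build_socket_path` and the buffer sizes used with it are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: join 'dir' and 'name' into
 * 'dst' as "dir/name".  Returns 0 on success, or -1 if the result would not
 * fit in 'dst_len' bytes (including the terminating NUL). */
static int
build_socket_path(char *dst, size_t dst_len, const char *dir, const char *name)
{
    int n;

    assert(dst != NULL && dst_len > 0);

    n = snprintf(dst, dst_len, "%s/%s", dir, name);
    if (n < 0 || (size_t) n >= dst_len) {
        dst[0] = '\0';      /* Do not leave a truncated path behind. */
        return -1;
    }
    return 0;
}
```

With a 64-byte buffer, the directory `/usr/local/var/run/openvswitch` and the port name `vhost-user-1` fit and yield the path quoted in INSTALL.DPDK.md. A too-small buffer makes the helper return -1 instead of silently registering a truncated path.
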
INSTALL.DPDK.md | 174 ++++++++++++++++++++++++++++++++++++++----------
acinclude.m4 | 3 +
lib/netdev-dpdk.c | 168 +++++++++++++++++++++++++++++++++++++++-------
lib/netdev.c | 3 +-
vswitchd/ovs-vswitchd.c | 5 ++
5 files changed, 295 insertions(+), 58 deletions(-)
diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 899763f..51671a0 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -16,7 +16,9 @@ OVS needs a system with 1GB hugepages support.
Building and Installing:
------------------------
-Required DPDK 2.0, `fuse`, `fuse-devel` (`libfuse-dev` on Debian/Ubuntu)
+Required: DPDK 2.0
+Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev`
+on Debian/Ubuntu)
1. Configure build & install DPDK:
1. Set `$DPDK_DIR`
@@ -31,13 +33,10 @@ Required DPDK 2.0, `fuse`, `fuse-devel` (`libfuse-dev` on Debian/Ubuntu)
`CONFIG_RTE_BUILD_COMBINE_LIBS=y`
- Update `config/common_linuxapp` so that DPDK is built with vhost
- libraries; currently, OVS only supports vhost-cuse, so DPDK vhost-user
- libraries should be explicitly turned off (they are enabled by default
- in DPDK 2.0).
+ Update `config/common_linuxapp` so that DPDK is built with vhost-user
+ libraries.
`CONFIG_RTE_LIBRTE_VHOST=y`
- `CONFIG_RTE_LIBRTE_VHOST_USER=n`
Then run `make install` to build and install the library.
For default install without IVSHMEM:
@@ -311,40 +310,144 @@ the vswitchd.
DPDK vhost:
-----------
-vhost-cuse is only supported at present i.e. not using the standard QEMU
-vhost-user interface. It is intended that vhost-user support will be added
-in future releases when supported in DPDK and that vhost-cuse will eventually
-be deprecated. See [DPDK Docs] for more info on vhost.
+DPDK 2.0 supports two types of vhost:
-Prerequisites:
-1. Insert the Cuse module:
+1. vhost-user
+2. vhost-cuse
- `modprobe cuse`
+This document assumes the use of vhost-user, unless otherwise specified.
+At the moment, vhost-cuse support is enabled in OVS only if it is detected
+in the DPDK build specified during OVS compilation.
+Please note that support for vhost-cuse is intended to be deprecated in OVS
+in a future release.
-2. Build and insert the `eventfd_link` module:
+(Optional) Building with vhost-cuse ports:
+------------------------------------------
- `cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
- `make`
- `insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko`
+Should you wish to use vhost-cuse instead of vhost-user, you must
+enable vhost-cuse in DPDK by setting the following additional flag in
+`config/common_linuxapp`:
+
+ `CONFIG_RTE_LIBRTE_VHOST_USER=n`
+
+Following this, rebuild DPDK as per the instructions in the "Building and
+Installing" section. Finally, rebuild OVS as per step 3 in the "Building
+and Installing" section - OVS will detect that DPDK has vhost-cuse libraries
+compiled and in turn will enable support for it in the switch and disable
+vhost-user support.
+
+DPDK vhost Prerequisites:
+-------------------------
+
+1. DPDK 2.0 with vhost support enabled as documented in the "Building and
+ Installing" section.
+
+2. (Optional) If using vhost-cuse:
+
+ 1. Insert the Cuse module:
+
+ `modprobe cuse`
+
+ 2. Build and insert the `eventfd_link` module:
+
+ ```
+ cd $DPDK_DIR/lib/librte_vhost/eventfd_link/
+ make
+ insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko
+ ```
+
+3. QEMU version v2.1.0+
+
+ Both vhost-user and vhost-cuse work with QEMU v2.1.0 and above. However,
+ v2.2.0 is recommended if the VM is given more than 1GB of memory, due to
+ potential issues with memory mapping larger areas.
+ Note: For vhost-cuse, QEMU v1.6.2 will also work, with slightly different
+ command line parameters, which are specified later in this document.
+
+Adding DPDK vhost ports to the Switch:
+--------------------------------------
Following the steps above to create a bridge, you can now add DPDK vhost
-as a port to the vswitch.
+as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost ports can have
+arbitrary names.
+
+When adding vhost ports to the switch, use the port type that matches the
+type of vhost you are using.
-`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
+ - For vhost-user (default), the name of the port type is `dpdkvhostuser`
+
+ ```
+ ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1
+ type=dpdkvhostuser
+ ```
-Unlike DPDK ring ports, DPDK vhost ports can have arbitrary names:
+ This action creates a socket located at
+ `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
+ to your VM on the QEMU command line. More instructions on this can be
+ found in the next section "DPDK vhost-user VM configuration".
+ Note: If you wish for the vhost-user sockets to be created in a
+ directory other than `/usr/local/var/run/openvswitch`, you may specify
+ another location on the ovs-vswitchd command line like so:
-`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
+ `./vswitchd/ovs-vswitchd --dpdk --vhost_sock_dir /my-dir -c 0x1 ...`
-However, please note that when attaching userspace devices to QEMU, the
-name provided during the add-port operation must match the ifname parameter
-on the QEMU command line.
+ - For vhost-cuse, the name of the port type is `dpdkvhost`
+ ```
+ ovs-vsctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1
+ type=dpdkvhost
+ ```
+
+ When attaching vhost-cuse ports to QEMU, the name provided during the
+ add-port operation must match the ifname parameter on the QEMU command
+ line. More instructions on this can be found in the section "DPDK
+ vhost-cuse VM configuration".
+
+DPDK vhost-user VM configuration:
+---------------------------------
+Follow the steps below to attach vhost-user port(s) to a VM.
-DPDK vhost VM configuration:
-----------------------------
+1. Configure sockets.
+ Pass the following parameters to QEMU to attach a vhost-user device:
- vhost ports use a Linux* character device to communicate with QEMU.
+ ```
+ -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
+ -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
+ -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
+ ```
+
+ ...where vhost-user-1 is the name of the vhost-user port added
+ to the switch.
+ Repeat the above parameters for multiple devices, changing the
+ chardev path and id as necessary. Note that a separate and different
+ chardev path needs to be specified for each vhost-user device. For
+ example, if you have a second vhost-user port named 'vhost-user-2', you
+ append an additional set of parameters to your QEMU command line:
+
+
+ ```
+ -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
+ -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
+ -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
+ ```
+
+2. Configure huge pages.
+ QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
+ virtio-net device's virtual rings and packet buffers mapping the VM's
+ physical memory on hugetlbfs. To enable vhost-ports to map the VM's
+ memory into their process address space, pass the following parameters
+ to QEMU:
+
+ ```
+ -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
+ share=on
+ -numa node,memdev=mem -mem-prealloc
+ ```
+
+DPDK vhost-cuse VM configuration:
+---------------------------------
+
+ vhost-cuse ports use a Linux* character device to communicate with QEMU.
By default it is set to `/dev/vhost-net`. It is possible to reuse this
standard device for DPDK vhost, which makes setup a little simpler but it
is better practice to specify an alternative character device in order to
@@ -410,16 +513,19 @@ DPDK vhost VM configuration:
QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
virtio-net device's virtual rings and packet buffers mapping the VM's
physical memory on hugetlbfs. To enable vhost-ports to map the VM's
- memory into their process address space, pass the following paramters
+ memory into their process address space, pass the following parameters
to QEMU:
`-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
share=on -numa node,memdev=mem -mem-prealloc`
+ Note: For use with an earlier QEMU version such as v1.6.2, use the
+ following to configure hugepages instead:
-DPDK vhost VM configuration with QEMU wrapper:
-----------------------------------------------
+ `-mem-path /dev/hugepages -mem-prealloc`
+DPDK vhost-cuse VM configuration with QEMU wrapper:
+---------------------------------------------------
The QEMU wrapper script automatically detects and calls QEMU with the
necessary parameters. It performs the following actions:
@@ -445,8 +551,8 @@ qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4
netdev=net1,mac=00:00:00:00:00:01
```
-DPDK vhost VM configuration with libvirt:
------------------------------------------
+DPDK vhost-cuse VM configuration with libvirt:
+----------------------------------------------
If you are using libvirt, you must enable libvirt to access the character
device by adding it to controllers cgroup for libvirtd using the following
@@ -520,7 +626,7 @@ Now you may launch your VM using virt-manager, or like so:
`virsh create my_vhost_vm.xml`
-DPDK vhost VM configuration with libvirt and QEMU wrapper:
+DPDK vhost-cuse VM configuration with libvirt and QEMU wrapper:
----------------------------------------------------------
To use the qemu-wrapper script in conjuntion with libvirt, follow the
@@ -548,7 +654,7 @@ steps in the previous section before proceeding with the following steps:
the correct emulator location and set any additional options. If you are
using a alternative character device name, please set "us_vhost_path" to the
location of that device. The script will automatically detect and insert
- the correct "vhostfd" value in the QEMU command line arguements.
+ the correct "vhostfd" value in the QEMU command line arguments.
5. Use virt-manager to launch the VM
diff --git a/acinclude.m4 b/acinclude.m4
index e9d0ed9..2873480 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -218,6 +218,9 @@ AC_DEFUN([OVS_CHECK_DPDK], [
DPDK_vswitchd_LDFLAGS=-Wl,--whole-archive,$DPDK_LIB,--no-whole-archive
AC_SUBST([DPDK_vswitchd_LDFLAGS])
AC_DEFINE([DPDK_NETDEV], [1], [System uses the DPDK module.])
+
+ OVS_GREP_IFELSE([$RTE_SDK/include/rte_config.h], [define RTE_LIBRTE_VHOST_USER 1],
+ [], [AC_DEFINE([VHOST_CUSE], [1], [DPDK vhost-cuse support enabled, vhost-user disabled.])])
else
RTE_SDK=
fi
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 5af15d4..1edb9d0 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -28,6 +28,7 @@
#include <unistd.h>
#include <stdio.h>
+#include "dirs.h"
#include "dp-packet.h"
#include "dpif-netdev.h"
#include "list.h"
@@ -101,8 +102,8 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF / ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))
#define MAX_PKT_BURST 32 /* Max burst size for RX/TX */
-/* Character device cuse_dev_name. */
-char *cuse_dev_name = NULL;
+char *cuse_dev_name = NULL; /* Character device cuse_dev_name. */
+char *vhost_sock_dir = NULL; /* Location of vhost-user sockets */
static const struct rte_eth_conf port_conf = {
.rxmode = {
@@ -152,7 +153,8 @@ enum { DRAIN_TSC = 200000ULL };
enum dpdk_dev_type {
DPDK_DEV_ETH = 0,
- DPDK_DEV_VHOST = 1
+ DPDK_DEV_VHOST = 1,
+ DPDK_DEV_VHOST_USER = 2
};
static int rte_eal_init_ret = ENODEV;
@@ -230,6 +232,9 @@ struct netdev_dpdk {
/* virtio-net structure for vhost device */
OVSRCU_TYPE(struct virtio_net *) virtio_dev;
+ /* socket location for vhost-user device */
+ char socket_path[IF_NAME_SZ];
+
/* In dpdk_list. */
struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
rte_spinlock_t txq_lock;
@@ -556,6 +561,24 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
netdev_->n_txq = NR_QUEUE;
netdev_->n_rxq = NR_QUEUE;
+ /* Take the name of the vhost-user port and append it to the location where
+ * the socket is to be created, then register the socket.
+ */
+ if (type == DPDK_DEV_VHOST_USER) {
+ snprintf(netdev->socket_path, sizeof(netdev->socket_path), "%s/%s",
+ vhost_sock_dir, netdev_->name);
+ err = rte_vhost_driver_register(netdev->socket_path);
+ if (err) {
+ VLOG_ERR("vhost-user socket device setup failure for socket %s",
+ netdev->socket_path);
+ goto unlock;
+ }
+ VLOG_INFO("Socket %s created for vhost-user port %s",
+ netdev->socket_path, netdev_->name);
+ } else {
+ strncpy(netdev->socket_path, "", sizeof(netdev->socket_path));
+ }
+
if (type == DPDK_DEV_ETH) {
netdev_dpdk_alloc_txq(netdev, NR_QUEUE);
err = dpdk_eth_dev_init(netdev);
@@ -590,7 +613,7 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
}
static int
-netdev_dpdk_vhost_construct(struct netdev *netdev_)
+vhost_construct_helper(struct netdev *netdev_, int type)
{
int err;
@@ -599,13 +622,25 @@ netdev_dpdk_vhost_construct(struct netdev *netdev_)
}
ovs_mutex_lock(&dpdk_mutex);
- err = netdev_dpdk_init(netdev_, -1, DPDK_DEV_VHOST);
+ err = netdev_dpdk_init(netdev_, -1, type);
ovs_mutex_unlock(&dpdk_mutex);
return err;
}
static int
+netdev_dpdk_vhost_construct(struct netdev *netdev_)
+{
+ return vhost_construct_helper(netdev_, DPDK_DEV_VHOST);
+}
+
+static int
+netdev_dpdk_vhost_user_construct(struct netdev *netdev_)
+{
+ return vhost_construct_helper(netdev_, DPDK_DEV_VHOST_USER);
+}
+
+static int
netdev_dpdk_construct(struct netdev *netdev)
{
unsigned int port_no;
@@ -1009,7 +1044,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet **pkts,
ovs_mutex_unlock(&dev->mutex);
}
- if (dev->type == DPDK_DEV_VHOST) {
+ if (dev->type == DPDK_DEV_VHOST || dev->type == DPDK_DEV_VHOST_USER) {
__netdev_dpdk_vhost_send(netdev, (struct dp_packet **) mbufs, newcnt, true);
} else {
dpdk_queue_pkts(dev, qid, mbufs, newcnt);
@@ -1561,6 +1596,40 @@ new_device(struct virtio_net *dev)
return 0;
}
+static int
+new_device_vhost_user(struct virtio_net *dev)
+{
+ struct netdev_dpdk *netdev;
+ bool exists = false;
+
+ ovs_mutex_lock(&dpdk_mutex);
+ /* Add device to the vhost port with the same name as that passed down. */
+ LIST_FOR_EACH(netdev, list_node, &dpdk_list) {
+ if (strncmp(dev->ifname, netdev->socket_path, IF_NAME_SZ) == 0) {
+ ovs_mutex_lock(&netdev->mutex);
+ ovsrcu_set(&netdev->virtio_dev, dev);
+ ovs_mutex_unlock(&netdev->mutex);
+ exists = true;
+ dev->flags |= VIRTIO_DEV_RUNNING;
+ /* Disable notifications. */
+ set_irq_status(dev);
+ break;
+ }
+ }
+ ovs_mutex_unlock(&dpdk_mutex);
+
+ if (!exists) {
+ VLOG_INFO("vHost Device '%s' (%ld) can't be added - name not found",
+ dev->ifname, dev->device_fh);
+
+ return -1;
+ }
+
+ VLOG_INFO("vHost Device '%s' (%ld) has been added",
+ dev->ifname, dev->device_fh);
+ return 0;
+}
+
/*
* Remove a virtio-net device from the specific vhost port. Use dev->remove
* flag to stop any more packets from being sent or received to/from a VM and
@@ -1615,8 +1684,14 @@ const struct virtio_net_device_ops virtio_net_device_ops =
.destroy_device = destroy_device,
};
+const struct virtio_net_device_ops virtio_net_device_ops_vhost_user =
+{
+ .new_device = new_device_vhost_user,
+ .destroy_device = destroy_device,
+};
+
static void *
-start_cuse_session_loop(void *dummy OVS_UNUSED)
+start_vhost_loop(void *dummy OVS_UNUSED)
{
pthread_detach(pthread_self());
/* Put the cuse thread into quiescent state. */
@@ -1643,7 +1718,16 @@ dpdk_vhost_class_init(void)
return -1;
}
- ovs_thread_create("cuse_thread", start_cuse_session_loop, NULL);
+ ovs_thread_create("vhost_thread", start_vhost_loop, NULL);
+ return 0;
+}
+
+static int
+dpdk_vhost_user_class_init(void)
+{
+ rte_vhost_driver_callback_register(&virtio_net_device_ops_vhost_user);
+
+ ovs_thread_create("vhost_thread", start_vhost_loop, NULL);
return 0;
}
@@ -1855,12 +1939,42 @@ unlock_dpdk:
NULL, /* rxq_drain */ \
}
+static int
+process_vhost_flags(char *flag, char *default_val, int size, char **argv, char **new_val)
+{
+ int changed = 0;
+
+ /* Depending on which version of vhost is in use, process the vhost-specific
+ * flag if it is provided on the vswitchd command line, otherwise resort to
+ * a default value.
+ *
+ * For vhost-user: Process "--vhost_sock_dir" to set the custom location of
+ * the vhost-user socket(s).
+ * For vhost-cuse: Process "--cuse_dev_name" to set the custom name of the
+ * vhost-cuse character device.
+ */
+ if (!strcmp(argv[1], flag) &&
+ (strlen(argv[2]) <= size)) {
+
+ *new_val = strdup(argv[2]);
+
+ VLOG_INFO("User-provided %s in use: %s", flag, *new_val);
+ changed = 1;
+ } else {
+ *new_val = default_val;
+ VLOG_INFO("No %s provided - defaulting to %s", flag, default_val);
+ }
+
+ return changed;
+}
+
int
dpdk_init(int argc, char **argv)
{
int result;
int base = 0;
char *pragram_name = argv[0];
+ int flag_processed = 0;
if (argc < 2 || strcmp(argv[1], "--dpdk"))
return 0;
@@ -1869,27 +1983,17 @@ dpdk_init(int argc, char **argv)
argc--;
argv++;
- /* If the cuse_dev_name parameter has been provided, set 'cuse_dev_name' to
- * this string if it meets the correct criteria. Otherwise, set it to the
- * default (vhost-net).
- */
- if (!strcmp(argv[1], "--cuse_dev_name") &&
- (strlen(argv[2]) <= NAME_MAX)) {
-
- cuse_dev_name = strdup(argv[2]);
+ flag_processed = (process_vhost_flags("--cuse_dev_name", "vhost-net", NAME_MAX, argv, &cuse_dev_name) ||
+ process_vhost_flags("--vhost_sock_dir", strdup(ovs_rundir()), PATH_MAX, argv, &vhost_sock_dir));
- /* Remove the cuse_dev_name configuration parameters from the argument
+ if (flag_processed) {
+ /* Remove the vhost flag configuration parameters from the argument
* list, so that the correct elements are passed to the DPDK
* initialization function
*/
argc -= 2;
- argv += 2; /* Increment by two to bypass the cuse_dev_name arguments */
+ argv += 2; /* Increment by two to bypass the vhost flag arguments */
base = 2;
-
- VLOG_ERR("User-provided cuse_dev_name in use: /dev/%s", cuse_dev_name);
- } else {
- cuse_dev_name = "vhost-net";
- VLOG_INFO("No cuse_dev_name provided - defaulting to /dev/vhost-net");
}
/* Keep the program name argument as this is needed for call to
@@ -1958,6 +2062,20 @@ const struct netdev_class dpdk_vhost_class =
NULL,
netdev_dpdk_vhost_rxq_recv);
+const struct netdev_class dpdk_vhost_user_class =
+ NETDEV_DPDK_CLASS(
+ "dpdkvhostuser",
+ dpdk_vhost_user_class_init,
+ netdev_dpdk_vhost_user_construct,
+ netdev_dpdk_vhost_destruct,
+ netdev_dpdk_vhost_set_multiq,
+ netdev_dpdk_vhost_send,
+ netdev_dpdk_vhost_get_carrier,
+ netdev_dpdk_vhost_get_stats,
+ NULL,
+ NULL,
+ netdev_dpdk_vhost_rxq_recv);
+
void
netdev_dpdk_register(void)
{
@@ -1971,7 +2089,11 @@ netdev_dpdk_register(void)
dpdk_common_init();
netdev_register_provider(&dpdk_class);
netdev_register_provider(&dpdk_ring_class);
+#ifdef VHOST_CUSE
netdev_register_provider(&dpdk_vhost_class);
+#else
+ netdev_register_provider(&dpdk_vhost_user_class);
+#endif
ovsthread_once_done(&once);
}
}
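Note on flag handling: process_vhost_flags() above expects the vhost flag at argv[1] and its value at argv[2], and dpdk_init() skips two arguments when a flag was consumed. The following is a minimal standalone sketch of that pattern with an added argc bound check. It is not the patch's function; `parse_leading_flag` and its return convention are hypothetical, and unlike the patch it does not strdup() the value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not the patch's process_vhost_flags(): looks for
 * 'flag' at argv[1] with its value at argv[2].  On a match with a value of
 * at most 'max_len' characters, stores the value in '*out' and returns 2,
 * the number of arguments the caller should skip.  Otherwise stores
 * 'default_val' in '*out' and returns 0. */
static int
parse_leading_flag(int argc, char **argv, const char *flag,
                   const char *default_val, size_t max_len, const char **out)
{
    assert(argv != NULL && out != NULL);

    /* Check argc first so argv[1] and argv[2] are never read out of range. */
    if (argc >= 3 && !strcmp(argv[1], flag) && strlen(argv[2]) <= max_len) {
        *out = argv[2];
        return 2;
    }
    *out = default_val;
    return 0;
}
```

A caller would advance argv and decrement argc by the returned count before handing the remaining arguments to the DPDK initialization, mirroring the `argc -= 2; argv += 2;` step in dpdk_init().
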
diff --git a/lib/netdev.c b/lib/netdev.c
index 45f7f29..24351b1 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -111,7 +111,8 @@ netdev_is_pmd(const struct netdev *netdev)
{
return (!strcmp(netdev->netdev_class->type, "dpdk") ||
!strcmp(netdev->netdev_class->type, "dpdkr") ||
- !strcmp(netdev->netdev_class->type, "dpdkvhost"));
+ !strcmp(netdev->netdev_class->type, "dpdkvhost") ||
+ !strcmp(netdev->netdev_class->type, "dpdkvhostuser"));
}
static void
diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
index a1b33da..48651df 100644
--- a/vswitchd/ovs-vswitchd.c
+++ b/vswitchd/ovs-vswitchd.c
@@ -253,8 +253,13 @@ usage(void)
vlog_usage();
printf("\nDPDK options:\n"
" --dpdk options Initialize DPDK datapath.\n"
+#ifdef VHOST_CUSE
" --cuse_dev_name BASENAME override default character device name\n"
" for use with userspace vHost.\n");
+#else
+ " --vhost_sock_dir DIR override default directory where\n"
+ " vhost-user sockets are created.\n");
+#endif
printf("\nOther options:\n"
" --unixctl=SOCKET override default control socket name\n"
" -h, --help display this help message\n"
--
1.9.3