[ovs-dev] [PATCH v7] netdev-dpdk: add dpdk vhost-user ports

Pravin Shelar pshelar at nicira.com
Sat May 23 04:22:05 UTC 2015


On Fri, May 22, 2015 at 8:40 AM, Ciara Loftus <ciara.loftus at intel.com> wrote:
> This patch adds support for a new port type to the userspace
> datapath called dpdkvhostuser.
>
> A new dpdkvhostuser port will create a unix domain socket which
> when provided to QEMU is used to facilitate communication between
> the virtio-net device on the VM and the OVS port on the host.
>
> vhost-cuse ('dpdkvhost') ports are still available, and will be
> enabled if vhost-cuse support is detected in the DPDK build
> specified during compilation of the switch. Otherwise, vhost-user
> ports are enabled.
>
> v4:
> - Included helper function for the new_device callbacks to minimise
> code duplication.
> - Fixed indentation & line-wrap.
> - Simplified and corrected the processing of vhost ovs-vswitchd flags.
>
> v5:
> - Removed unnecessary strdup()
> - Fixed spacing
>
> v6:
> - Rebased to master
>
> v7:
> - Rebased to master
>
> Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
> ---

Thanks for all changes.
The change log should not be part of commit message but should be
written here after "---".

>  INSTALL.DPDK.md         | 174 ++++++++++++++++++++++++++++++++++++++----------
>  acinclude.m4            |   3 +
>  lib/netdev-dpdk.c       | 153 ++++++++++++++++++++++++++++++++++--------
>  lib/netdev.c            |   3 +-
>  vswitchd/ovs-vswitchd.c |   5 ++
>  5 files changed, 277 insertions(+), 61 deletions(-)
>
...

>  Following the steps above to create a bridge, you can now add DPDK vhost
> -as a port to the vswitch.
> +as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost ports can have
> +arbitrary names.
> +
> +When adding vhost ports to the switch, take care depending on which type of
> +vhost you are using.
>
> -`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
> +  -  For vhost-user (default), the name of the port type is `dpdkvhostuser`
> +
> +     ```
> +     ovs-ofctl add-port br0 vhost-user-1 -- set Interface vhost-user-1
> +     type=dpdkvhostuser
> +     ```
>
> -Unlike DPDK ring ports, DPDK vhost ports can have arbitrary names:
> +     This action creates a socket located at
> +     `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
> +     to your VM on the QEMU command line. More instructions on this can be
> +     found in the next section "DPDK vhost-user VM configuration"
> +     Note: If you wish for the vhost-user sockets to be created in a
> +     directory other than `/usr/local/var/run/openvswitch`, you may specify
> +     another location on the ovs-vswitchd command line like so:
>
> -`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
> +      `./vswitchd/ovs-vswitchd --dpdk --vhost_sock_dir /my-dir -c 0x1 ...`
>
Since we are going to deprecate cuse in future release, we should
switch the type naming. vhost user should be dpdkvhost and cuse can be
dpdkvhostcuse.

> -However, please note that when attaching userspace devices to QEMU, the
> -name provided during the add-port operation must match the ifname parameter
> -on the QEMU command line.
> +  -  For vhost-cuse, the name of the port type is `dpdkvhost`
>
> +     ```
> +     ovs-ofctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1
> +     type=dpdkvhost
> +     ```
> +
> +     When attaching vhost-cuse ports to QEMU, the name provided during the
> +     add-port operation must match the ifname parameter on the QEMU command
> +     line. More instructions on this can be found in the section "DPDK
> +     vhost-cuse VM configuration"
> +
> +DPDK vhost-user VM configuration:
> +---------------------------------
> +Follow the steps below to attach vhost-user port(s) to a VM.
>
> -DPDK vhost VM configuration:
> -----------------------------
> +1. Configure sockets.
> +   Pass the following parameters to QEMU to attach a vhost-user device:
>
> -   vhost ports use a Linux* character device to communicate with QEMU.
> +   ```
> +   -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
> +   -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
> +   -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
> +   ```
> +
> +   ...where vhost-user-1 is the name of the vhost-user port added
> +   to the switch.
> +   Repeat the above parameters for multiple devices, changing the
> +   chardev path and id as necessary. Note that a separate and different
> +   chardev path needs to be specified for each vhost-user device. For
> +   example you have a second vhost-user port named 'vhost-user-2', you
> +   append your QEMU command line with an additional set of parameters:
> +
> +
> +   ```
> +   -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
> +   -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
> +   -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
> +   ```
> +
> +2. Configure huge pages.
> +   QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
> +   virtio-net device's virtual rings and packet buffers mapping the VM's
> +   physical memory on hugetlbfs. To enable vhost-ports to map the VM's
> +   memory into their process address space, pass the following paramters
> +   to QEMU:
> +
> +   ```
> +   -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
> +   share=on
> +   -numa node,memdev=mem -mem-prealloc
> +   ```
> +
> +DPDK vhost-cuse VM configuration:
> +---------------------------------
> +
> +   vhost-cuse ports use a Linux* character device to communicate with QEMU.
>     By default it is set to `/dev/vhost-net`. It is possible to reuse this
>     standard device for DPDK vhost, which makes setup a little simpler but it
>     is better practice to specify an alternative character device in order to
> @@ -418,16 +521,19 @@ DPDK vhost VM configuration:
>     QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
>     virtio-net device's virtual rings and packet buffers mapping the VM's
>     physical memory on hugetlbfs. To enable vhost-ports to map the VM's
> -   memory into their process address space, pass the following paramters
> +   memory into their process address space, pass the following parameters
>     to QEMU:
>
>       `-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
>        share=on -numa node,memdev=mem -mem-prealloc`
>
> +   Note: For use with an earlier QEMU version such as v1.6.2, use the
> +   following to configure hugepages instead:
>
> -DPDK vhost VM configuration with QEMU wrapper:
> -----------------------------------------------
> +     `-mem-path /dev/hugepages -mem-prealloc`
>
> +DPDK vhost-cuse VM configuration with QEMU wrapper:
> +---------------------------------------------------
>  The QEMU wrapper script automatically detects and calls QEMU with the
>  necessary parameters. It performs the following actions:
>
I think we need to better organize documentation for DPDK vHost
support. User will be mostly interested in vhost user, so there should
be two sections under vHost section. first for vhost user
configuration and next cuse. So that it is easier to follow one setup
in continuous fashion.

> @@ -453,8 +559,8 @@ qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4
>    netdev=net1,mac=00:00:00:00:00:01
>  ```
>
...
>  /*
>   * Maximum amount of time in micro seconds to try and enqueue to vhost.
> @@ -126,7 +127,8 @@ enum { DRAIN_TSC = 200000ULL };
>
>  enum dpdk_dev_type {
>      DPDK_DEV_ETH = 0,
> -    DPDK_DEV_VHOST = 1
> +    DPDK_DEV_VHOST = 1,
> +    DPDK_DEV_VHOST_USER = 2
>  };
>
>  static int rte_eal_init_ret = ENODEV;
> @@ -204,6 +206,9 @@ struct netdev_dpdk {
>      /* virtio-net structure for vhost device */
>      OVSRCU_TYPE(struct virtio_net *) virtio_dev;
>
> +    /* socket location for vhost-user device */
> +    char socket_path[IF_NAME_SZ];
> +
Since this full path name it is better to use PATH_MAX, rather than
dpdk defined IF_NAME_SZ.

>      /* In dpdk_list. */
>      struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
>      rte_spinlock_t txq_lock;
> @@ -530,6 +535,24 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
>      netdev_->n_txq = NR_QUEUE;
>      netdev_->n_rxq = NR_QUEUE;
>
> +    /* Take the name of the vhost-user port and append it to the location where
> +     * the socket is to be created, then register the socket.
> +     */
> +    if (type == DPDK_DEV_VHOST_USER) {
> +        snprintf(netdev->socket_path, sizeof(netdev->socket_path), "%s/%s",
> +                vhost_sock_dir, netdev_->name);
> +        err = rte_vhost_driver_register(netdev->socket_path);
> +        if (err) {
> +            VLOG_ERR("vhost-user socket device setup failure for socket %s\n",
> +                     netdev->socket_path);
> +            goto unlock;
> +        }
> +
> +        VLOG_INFO("Socket %s created for vhost-user port %s\n", netdev->socket_path, netdev_->name);
> +    } else {
> +        strncpy(netdev->socket_path, "", sizeof(netdev->socket_path));
> +    }
> +
This can moved to netdev_dpdk_vhost_user_construct() to avoid checking
for DPDK_DEV_VHOST_USER. This way we can get rid of
DPDK_DEV_VHOST_USER enum value.

>      if (type == DPDK_DEV_ETH) {
>          netdev_dpdk_alloc_txq(netdev, NR_QUEUE);
>          err = dpdk_eth_dev_init(netdev);
> @@ -564,7 +587,7 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
>  }
>
>  static int
> -netdev_dpdk_vhost_construct(struct netdev *netdev_)
> +vhost_construct_helper(struct netdev *netdev_, int type)
>  {
>      int err;
>
...

> +
> +static int
> +netdev_dpdk_vhost_user_construct(struct netdev *netdev_)
> +{
> +     return vhost_construct_helper(netdev_, DPDK_DEV_VHOST_USER);
> +}
> +
> +static int
>  netdev_dpdk_construct(struct netdev *netdev)
>  {
>      unsigned int port_no;
> @@ -1025,7 +1060,7 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid, struct dp_packet **pkts,
>          ovs_mutex_unlock(&dev->mutex);
>      }
>
> -    if (dev->type == DPDK_DEV_VHOST) {
> +    if (dev->type == DPDK_DEV_VHOST || dev->type == DPDK_DEV_VHOST_USER) {

It is slightly simple if DPDK_DEV_VHOST_USER is removed.

>          __netdev_dpdk_vhost_send(netdev, (struct dp_packet **) mbufs, newcnt, true);
>      } else {
>          dpdk_queue_pkts(dev, qid, mbufs, newcnt);
> @@ -1544,15 +1579,17 @@ set_irq_status(struct virtio_net *dev)
>   * A new virtio-net device is added to a vhost port.
>   */
>  static int
> -new_device(struct virtio_net *dev)
> +new_device_helper(struct virtio_net *dev, bool sock, int size)
>  {
>      struct netdev_dpdk *netdev;
>      bool exists = false;
> +    char* netdev_name;
>
>      ovs_mutex_lock(&dpdk_mutex);
>      /* Add device to the vhost port with the same name as that passed down. */
>      LIST_FOR_EACH(netdev, list_node, &dpdk_list) {
> -        if (strncmp(dev->ifname, netdev->up.name, IFNAMSIZ) == 0) {
> +        netdev_name = sock ? netdev->socket_path : netdev->up.name;
> +        if (strncmp(dev->ifname, netdev_name, size) == 0) {
We can store dev->ifname for vhost-cuse devices. So that new_device
can be a common function for both type of devices.

>              ovs_mutex_lock(&netdev->mutex);
>              ovsrcu_set(&netdev->virtio_dev, dev);
>              ovs_mutex_unlock(&netdev->mutex);
> @@ -1577,6 +1614,18 @@ new_device(struct virtio_net *dev)
>      return 0;
>  }
>
> +static int
> +new_device(struct virtio_net *dev)
> +{
> +    return new_device_helper(dev, false, IFNAMSIZ);
> +}
> +
> +static int
> +new_device_vhost_user(struct virtio_net *dev)
> +{
> +    return new_device_helper(dev, true, IF_NAME_SZ);
> +}
> +
>  /*
>   * Remove a virtio-net device from the specific vhost port.  Use dev->remove
>   * flag to stop any more packets from being sent or received to/from a VM and
> @@ -1631,8 +1680,14 @@ static const struct virtio_net_device_ops virtio_net_device_ops =
>      .destroy_device = destroy_device,
>  };
>
> +const struct virtio_net_device_ops virtio_net_device_ops_vhost_user =
> +{
> +    .new_device =  new_device_vhost_user,
> +    .destroy_device = destroy_device,
> +};
> +
>  static void *
> -start_cuse_session_loop(void *dummy OVS_UNUSED)
> +start_vhost_loop(void *dummy OVS_UNUSED)
>  {
>       pthread_detach(pthread_self());
>       /* Put the cuse thread into quiescent state. */
> @@ -1659,7 +1714,16 @@ dpdk_vhost_class_init(void)
>          return -1;
>      }
>
> -    ovs_thread_create("cuse_thread", start_cuse_session_loop, NULL);
> +    ovs_thread_create("vhost_thread", start_vhost_loop, NULL);
> +    return 0;
> +}
> +
> +static int
> +dpdk_vhost_user_class_init(void)
> +{
> +    rte_vhost_driver_callback_register(&virtio_net_device_ops_vhost_user);
> +
> +    ovs_thread_create("vhost_thread", start_vhost_loop, NULL);
>      return 0;
>  }
>
> @@ -1871,6 +1935,32 @@ unlock_dpdk:
>      NULL,                       /* rxq_drain */               \
>  }
>
> +static int
> +process_vhost_flags(char* flag, char* default_val, int size, char** argv, char** new_val)
> +{
> +    int changed = 0;
> +
> +    /* Depending on which version of vhost is in use, process the vhost-specific
> +     * flag if it is provided on the vswitchd command line, otherwise resort to
> +     * a default value.
> +     *
> +     * For vhost-user: Process "--cuse_dev_name" to set the custom location of
> +     * the vhost-user socket(s).
> +     * For vhost-cuse: Process "--vhost_sock_dir" to set the custom name of the
> +     * vhost-cuse character device.
> +     */
> +    if (!strcmp(argv[1], flag) && (strlen(argv[2]) <= size)) {
> +        *new_val = strdup(argv[2]);
> +        VLOG_INFO("User-provided %s in use: %s", flag, *new_val);
> +        changed = 1;
> +    } else {
> +        *new_val = default_val;
> +        VLOG_INFO("No %s provided - defaulting to %s", flag, default_val);
> +    }
> +
> +    return changed;
> +}
> +
>  int
>  dpdk_init(int argc, char **argv)
>  {
> @@ -1885,27 +1975,20 @@ dpdk_init(int argc, char **argv)
>      argc--;
>      argv++;
>
> -    /* If the cuse_dev_name parameter has been provided, set 'cuse_dev_name' to
> -     * this string if it meets the correct criteria. Otherwise, set it to the
> -     * default (vhost-net).
> -     */
> -    if (!strcmp(argv[1], "--cuse_dev_name") &&
> -        (strlen(argv[2]) <= NAME_MAX)) {
> -
> -        cuse_dev_name = strdup(argv[2]);
> -
> -        /* Remove the cuse_dev_name configuration parameters from the argument
> +#ifdef VHOST_CUSE
> +    if (process_vhost_flags("--cuse_dev_name", strdup("vhost-net"),
> +            PATH_MAX, argv, &cuse_dev_name)) {
> +#else
> +    if (process_vhost_flags("--vhost_sock_dir", strdup(ovs_rundir()),
> +            NAME_MAX, argv, &vhost_sock_dir)) {

We need to check if the directory exist so that we can reject invalid
directory at vswitchd startup.

> +#endif
> +        /* Remove the vhost flag configuration parameters from the argument
>           * list, so that the correct elements are passed to the DPDK
>           * initialization function
>           */
>          argc -= 2;
> -        argv += 2;    /* Increment by two to bypass the cuse_dev_name arguments */
> +        argv += 2;    /* Increment by two to bypass the vhost flag arguments */
>          base = 2;
> -
> -        VLOG_ERR("User-provided cuse_dev_name in use: /dev/%s", cuse_dev_name);
> -    } else {
> -        cuse_dev_name = "vhost-net";
> -        VLOG_INFO("No cuse_dev_name provided - defaulting to /dev/vhost-net");
>      }
>
>      /* Keep the program name argument as this is needed for call to
> @@ -1974,6 +2057,20 @@ static const struct netdev_class dpdk_vhost_class =
>          NULL,
>          netdev_dpdk_vhost_rxq_recv);
>
> +const struct netdev_class dpdk_vhost_user_class =
> +    NETDEV_DPDK_CLASS(
> +        "dpdkvhostuser",
> +        dpdk_vhost_user_class_init,
> +        netdev_dpdk_vhost_user_construct,
> +        netdev_dpdk_vhost_destruct,
> +        netdev_dpdk_vhost_set_multiq,
> +        netdev_dpdk_vhost_send,
> +        netdev_dpdk_vhost_get_carrier,
> +        netdev_dpdk_vhost_get_stats,
> +        NULL,
> +        NULL,
> +        netdev_dpdk_vhost_rxq_recv);
> +
Since cuse suport is going to be around for some time, All cuse
related functions should have cuse in its name like
netdev_dpdk_vhost_user* function. Common functions can be just as it
just netdev_dpdk_vhost_*

>  void
>  netdev_dpdk_register(void)
>  {
> @@ -1987,7 +2084,11 @@ netdev_dpdk_register(void)
>          dpdk_common_init();
>          netdev_register_provider(&dpdk_class);
>          netdev_register_provider(&dpdk_ring_class);
> +#ifdef VHOST_CUSE
>          netdev_register_provider(&dpdk_vhost_class);
> +#else
> +        netdev_register_provider(&dpdk_vhost_user_class);
> +#endif
>          ovsthread_once_done(&once);
>      }
>  }
> diff --git a/lib/netdev.c b/lib/netdev.c
> index 45f7f29..24351b1 100644
> --- a/lib/netdev.c
> +++ b/lib/netdev.c
> @@ -111,7 +111,8 @@ netdev_is_pmd(const struct netdev *netdev)
>  {
>      return (!strcmp(netdev->netdev_class->type, "dpdk") ||
>              !strcmp(netdev->netdev_class->type, "dpdkr") ||
> -            !strcmp(netdev->netdev_class->type, "dpdkvhost"));
> +            !strcmp(netdev->netdev_class->type, "dpdkvhost") ||
> +            !strcmp(netdev->netdev_class->type, "dpdkvhostuser"));
>  }
>
>  static void
> diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
> index a1b33da..48651df 100644
> --- a/vswitchd/ovs-vswitchd.c
> +++ b/vswitchd/ovs-vswitchd.c
> @@ -253,8 +253,13 @@ usage(void)
>      vlog_usage();
>      printf("\nDPDK options:\n"
>             "  --dpdk options            Initialize DPDK datapath.\n"
> +#ifdef VHOST_CUSE
>             "  --cuse_dev_name BASENAME  override default character device name\n"
>             "                            for use with userspace vHost.\n");
> +#else
> +           "  --vhost_sock_dir DIR      override default directory where\n"
> +           "                            vhost-user sockets are created.\n");

since --cuse_dev_name and --vhost_sock_dir is sub-argument under
--dpdk, it should have single hyphen prefix.



More information about the dev mailing list