[ovs-dev] [PATCH RFC v6 1/1] netdev-dpdk: add dpdk vhost ports

Michael S. Tsirkin mst at redhat.com
Wed Jan 21 11:19:07 UTC 2015


On Thu, Jan 08, 2015 at 11:05:02PM +0000, Kevin Traynor wrote:
> This patch adds support for a new port type to userspace datapath
> called dpdkvhost. This allows KVM (QEMU) to offload the servicing
> of virtio-net devices to its associated dpdkvhost port. Instructions
> for use are in INSTALL.DPDK.
> 
> This has been tested on Intel multi-core platforms and with clients
> that have virtio-net interfaces.
> 
>  ver 6:
>    - rebased with master
>    - modified to use DPDK v1.8.0 vhost library
>    - reworked for review comments
>  ver 5:
>    - rebased against latest master
>  ver 4:
>    - added eventfd_link.h and eventfd_link.c to EXTRA_DIST in
>  utilities/automake.mk
>    - rebased with master to work with DPDK 1.7
>  ver 3:
>    - rebased with master
>  ver 2:
>    - rebased with master
> 
> Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
> Signed-off-by: Kevin Traynor <kevin.traynor at intel.com>
> Signed-off-by: Maryam Tahhan <maryam.tahhan at intel.com>
> ---
>  INSTALL.DPDK.md         |  236 +++++++++++++++++
>  Makefile.am             |    4 +
>  lib/automake.mk         |    1 +
>  lib/netdev-dpdk.c       |  649 +++++++++++++++++++++++++++++++++++++++--------
>  lib/netdev.c            |    3 +-
>  utilities/automake.mk   |    3 +-
>  utilities/qemu-wrap.py  |  389 ++++++++++++++++++++++++++++
>  vswitchd/ovs-vswitchd.c |    4 +-
>  8 files changed, 1177 insertions(+), 112 deletions(-)
>  mode change 100644 => 100755 lib/netdev-dpdk.c
>  create mode 100755 utilities/qemu-wrap.py
> 
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 2cc7636..da8116d 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -17,6 +17,7 @@ Building and Installing:
>  ------------------------
>  
>  Required DPDK 1.7
> +Optional `fuse`, `fuse-devel`
>  
>  1. Configure build & install DPDK:
>    1. Set `$DPDK_DIR`
> @@ -264,6 +265,241 @@ A general rule of thumb for better performance is that the client
>  application should not be assigned the same dpdk core mask "-c" as
>  the vswitchd.
>  
> +DPDK vHost:
> +-----------
> +
> +Prerequisites:
> +1.  DPDK 1.8 with vHost support enabled and recompile OVS as above.
> +
> +     Update `config/common_linuxapp` so that DPDK is built with vHost
> +     libraries:
> +
> +     `CONFIG_RTE_LIBRTE_VHOST=y`
> +
> +2.  Insert the Fuse module:
> +
> +      `modprobe fuse`
> +
> +3.  Build and insert the `eventfd_link` module:
> +
> +     `cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
> +     `make`
> +     `insmod $DPDK_DIR/lib/librte_vhost/eventfd_link/eventfd_link.ko`
> +
> +4.  Remove /dev/vhost-net character device:
> +
> +      `rm -rf /dev/vhost-net`

I don't think it's a good idea to tell people to do this.
Best to drop this step and put the "with standard vhost"
section here instead.

> +
> +Following the steps above to create a bridge, you can now add DPDK vHost
> +as a port to the vswitch.
> +
> +`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
> +
> +Unlike DPDK ring ports, DPDK vHost ports can have arbitrary names:
> +
> +`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
> +
> +However, please note that when attaching userspace devices to QEMU, the
> +name provided during the add-port operation must match the ifname parameter
> +on the QEMU command line.
> +
> +DPDK vHost VM configuration:
> +----------------------------
> +
> +1. Configure virtio-net adaptors:
> +   The guest must be configured with virtio-net adapters and offloads
> +   MUST BE DISABLED.

Any plans to address this?

> +    This means the following parameters should be passed
> +   to the QEMU binary:
> +
> +     ```
> +     -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
> +     -device virtio-net-pci,netdev=net1,mac=<mac>,csum=off,gso=off,
> +     guest_tso4=off,guest_tso6=off,guest_ecn=off
> +     ```
> +
> +     Repeat the above parameters for multiple devices.
> +
> +2. Configure huge pages:
> +   QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
> +   virtio-net device's virtual rings and packet buffers mapping the VM's
> +   physical memory on hugetlbfs. To enable vhost-ports to map the VM's
> +   memory into their process address space, pass the following parameters
> +   to QEMU:
> +
> +     `-mem-path /dev/hugepages -mem-prealloc`

I guess you also need to request MAP_SHARED mappings - otherwise
I think you won't be able to poke at them.
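
To illustrate the difference, here is a minimal sketch in Python. It is not
taken from the patch: an ordinary temporary file stands in for the hugetlbfs
backing file, and the helper names are made up for the example. A write made
through a MAP_PRIVATE mapping stays in the writer's private copy, while a
write made through a MAP_SHARED mapping is visible to a second, independent
mapping of the same file, which is the position the vhost side is in.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def write_through_mapping(path, flags, payload):
    """Map `path` with the given mmap flags, write `payload` at offset 0, unmap."""
    fd = os.open(path, os.O_RDWR)
    try:
        with mmap.mmap(fd, PAGE, flags=flags,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as mapping:
            mapping[:len(payload)] = payload
    finally:
        os.close(fd)


def read_through_second_mapping(path, length):
    """Stand-in for the vhost side: map the same file separately and read it."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, PAGE, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ) as mapping:
            return bytes(mapping[:length])
    finally:
        os.close(fd)


def demo():
    """Return what the second mapping sees after a private and a shared write."""
    with tempfile.NamedTemporaryFile() as backing:
        # Hypothetical stand-in for a file under /dev/hugepages.
        os.ftruncate(backing.fileno(), PAGE)

        # Copy-on-write: the write never reaches the backing file.
        write_through_mapping(backing.name, mmap.MAP_PRIVATE, b"private")
        after_private = read_through_second_mapping(backing.name, 7)

        # Shared: the write lands in the pages every mapping of the file sees.
        write_through_mapping(backing.name, mmap.MAP_SHARED, b"shared!")
        after_shared = read_through_second_mapping(backing.name, 7)
        return after_private, after_shared
```

With a private mapping the second reader still sees the zero-filled file;
only the shared write shows up on the other side.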

> +
> +DPDK vHost with standard vHost:
> +-------------------------------
> +
> +DPDK vHost ports use a Linux* character device to communicate with QEMU.
> +By default it is set to `/dev/vhost-net`. This conflicts with the kernel
> +vHost device, hence the need to remove `/dev/vhost-net` above. However,
> +if you wish to use kernel vhost in parallel, you can specify an
> +alternative basename on the vswitchd command line like so:
> +
> +     `./vswitchd/ovs-vswitchd --dpdk --basename my-vhost-net -c 0x1 ...`
> +
> +Note that the basename argument and associated string must be the first
> +arguments after `--dpdk` and come before the EAL arguments.
> +
> +DPDK vHost VM configuration with standard vHost:
> +------------------------------------------------
> +
> +1. As with the "normal" (i.e. using `/dev/vhost-net`) DPDK vHost setup,
> +the guest must be configured with virtio-net adapters and offloads
> +MUST BE DISABLED. However, this time you must also pass in a `vhostfd`
> +argument:
> +
> +     ```
> +     -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on,
> +     vhostfd=<open_fd>
> +     -device virtio-net-pci,netdev=net1,mac=<mac>,csum=off,gso=off,
> +     guest_tso4=off,guest_tso6=off,guest_ecn=off
> +     ```
> +
> +     The open file descriptor must be passed to QEMU running as a child
> +     process.

You might as well tell people how to do this. E.g. with bash:

vhostfd=42 42<>/path/to/vhost/chardev

42 is, of course, The Answer.
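
For anyone launching QEMU from a script rather than a shell, the same idea
can be sketched in Python. This is only an illustration under my own
assumptions: the helper names `build_qemu_net_args` and `launch_with_vhostfd`
are hypothetical, and so is the use of `subprocess`; the option strings are
the ones from the documentation quoted above, with `netdev=` tied to the same
`id`.

```python
import os
import subprocess


def build_qemu_net_args(net_id, ifname, mac, vhostfd):
    """Return the -netdev/-device arguments from the docs, with vhostfd added."""
    netdev = (
        f"tap,id={net_id},script=no,downscript=no,"
        f"ifname={ifname},vhost=on,vhostfd={vhostfd}"
    )
    device = (
        f"virtio-net-pci,netdev={net_id},mac={mac},csum=off,gso=off,"
        "guest_tso4=off,guest_tso6=off,guest_ecn=off"
    )
    return ["-netdev", netdev, "-device", device]


def launch_with_vhostfd(make_argv, chardev_path):
    """Open chardev_path read-write and start a child that inherits the fd.

    `make_argv` is a callable that maps the fd number to the child's argv,
    so the number can be embedded in the vhostfd= option.
    """
    fd = os.open(chardev_path, os.O_RDWR)
    try:
        # pass_fds keeps the descriptor open, under the same number, in the child.
        return subprocess.Popen(make_argv(fd), pass_fds=(fd,))
    finally:
        # The parent's copy is not needed once the child has been spawned.
        os.close(fd)
```

A caller would pass something like
`lambda fd: ["qemu-system-x86_64", ...] + build_qemu_net_args("net1", "dpdkvhost0", mac, fd)`
together with the path of the vhost character device.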


> +2. As above, QEMU must allocate the VM's memory on hugetlbfs:
> +
> +     `-mem-path /dev/hugepages -mem-prealloc`
> +
> +3. (Optional) If you are using libvirt, you must enable libvirt to access
> +the userspace device file by adding it to controllers cgroup for libvirtd
> +using the following steps:
> +
> +     1. In `/etc/libvirt/qemu.conf` add/edit the following lines:
> +
> +        ```
> +        1) cgroup_controllers = [ ... "devices", ... ]
> +        2) clear_emulator_capabilities = 0
> +        3) user = "root"
> +        4) group = "root"
> +        5) cgroup_device_acl = [
> +               "/dev/null", "/dev/full", "/dev/zero",
> +               "/dev/random", "/dev/urandom",
> +               "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
> +               "/dev/rtc", "/dev/hpet", "/dev/net/tun",
> +               "/dev/<devbase-name>-<index>",
> +               "/dev/hugepages"]
> +        ```
> +
> +     2. Disable SELinux or set to permissive mode


It's a work-around, but the right thing to do is really
to write up correct SELinux policies.
Any plans to do this?

> +     3. Mount cgroup device controller:
> +
> +        ```
> +        mkdir /dev/cgroup
> +        mount -t cgroup none /dev/cgroup -o devices
> +        ```
> +
> +     4. Restart the libvirtd process
> +        For example, on Fedora:
> +
> +          `systemctl restart libvirtd.service`
> +
> +The easiest way to setup a Guest that isn't using `/dev/vhost-net` is to
> +use the `qemu-wrap.py` script located in utilities. This Python script
> +automates the requirements specified above and can be used in conjunction
> +with libvirt.

I notice that newer libvirt versions should have the ability to specify
everything directly in the conf; that would be preferable if available.
Should that be documented too?

> +
> +DPDK vHost VM configuration with QEMU wrapper:

...
