[ovs-dev] [PATCH RFC v6 1/1] netdev-dpdk: add dpdk vhost ports
Michael S. Tsirkin
mst at redhat.com
Wed Jan 21 11:19:07 UTC 2015
On Thu, Jan 08, 2015 at 11:05:02PM +0000, Kevin Traynor wrote:
> This patch adds support for a new port type to userspace datapath
> called dpdkvhost. This allows KVM (QEMU) to offload the servicing
> of virtio-net devices to its associated dpdkvhost port. Instructions
> for use are in INSTALL.DPDK.
>
> This has been tested on Intel multi-core platforms and with clients
> that have virtio-net interfaces.
>
> ver 6:
> - rebased with master
> - modified to use DPDK v1.8.0 vhost library
> - reworked for review comments
> ver 5:
> - rebased against latest master
> ver 4:
> - added eventfd_link.h and eventfd_link.c to EXTRA_DIST in
> utilities/automake.mk
> - rebased with master to work with DPDK 1.7
> ver 3:
> - rebased with master
> ver 2:
> - rebased with master
>
> Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
> Signed-off-by: Kevin Traynor <kevin.traynor at intel.com>
> Signed-off-by: Maryam Tahhan <maryam.tahhan at intel.com>
> ---
> INSTALL.DPDK.md | 236 +++++++++++++++++
> Makefile.am | 4 +
> lib/automake.mk | 1 +
> lib/netdev-dpdk.c | 649 +++++++++++++++++++++++++++++++++++++++--------
> lib/netdev.c | 3 +-
> utilities/automake.mk | 3 +-
> utilities/qemu-wrap.py | 389 ++++++++++++++++++++++++++++
> vswitchd/ovs-vswitchd.c | 4 +-
> 8 files changed, 1177 insertions(+), 112 deletions(-)
> mode change 100644 => 100755 lib/netdev-dpdk.c
> create mode 100755 utilities/qemu-wrap.py
>
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 2cc7636..da8116d 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -17,6 +17,7 @@ Building and Installing:
> ------------------------
>
> Required DPDK 1.7
> +Optional `fuse`, `fuse-devel`
>
> 1. Configure build & install DPDK:
> 1. Set `$DPDK_DIR`
> @@ -264,6 +265,241 @@ A general rule of thumb for better performance is that the client
> application should not be assigned the same dpdk core mask "-c" as
> the vswitchd.
>
> +DPDK vHost:
> +-----------
> +
> +Prerequisites:
> +1. DPDK 1.8 with vHost support enabled and recompile OVS as above.
> +
> + Update `config/common_linuxapp` so that DPDK is built with vHost
> + libraries:
> +
> + `CONFIG_RTE_LIBRTE_VHOST=y`
> +
> +2. Insert the Fuse module:
> +
> + `modprobe fuse`
> +
> +3. Build and insert the `eventfd_link` module:
> +
> + `cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
> + `make`
> + `insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko`
> +
> +4. Remove /dev/vhost-net character device:
> +
> + `rm -rf /dev/vhost-net`
I don't think it's a good idea to tell people to do this.
Best to drop this step and put the "with standard vhost"
section here instead.
> +
> +Following the steps above to create a bridge, you can now add DPDK vHost
> +as a port to the vswitch.
> +
> +`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
> +
> +Unlike DPDK ring ports, DPDK vHost ports can have arbitrary names:
> +
> +`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
> +
> +However, please note that when attaching userspace devices to QEMU, the
> +name provided during the add-port operation must match the ifname parameter
> +on the QEMU command line.
> +
> +DPDK vHost VM configuration:
> +----------------------------
> +
> +1. Configure virtio-net adaptors:
> + The guest must be configured with virtio-net adapters and offloads
> + MUST BE DISABLED.
Any plans to address this?
> + This means the following parameters should be passed
> + to the QEMU binary:
> +
> + ```
> + -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
> + -device virtio-net-pci,netdev=net1,mac=<mac>,csum=off,gso=off,
> + guest_tso4=off,guest_tso6=off,guest_ecn=off
> + ```
> +
> + Repeat the above parameters for multiple devices.
> +
> +2. Configure huge pages:
> + QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
> + virtio-net device's virtual rings and packet buffers mapping the VM's
> + physical memory on hugetlbfs. To enable vhost-ports to map the VM's
> + memory into their process address space, pass the following parameters
> + to QEMU:
> +
> + `-mem-path /dev/hugepages -mem-prealloc`
I guess you also need to request MAP_SHARED mappings - otherwise
I think you won't be able to poke at them.
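To illustrate the MAP_SHARED point, here is a minimal sketch, not taken from the patch. It maps one file twice and checks whether a write made through the first mapping is visible through the second. The helper name `peer_sees_write` is made up for the example, and a temporary file stands in for the hugetlbfs-backed guest memory. With a private mapping the write stays in a copy-on-write page, which is why a second process mapping the same file would never see the guest's rings.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE


def peer_sees_write(writer_flags):
    """Write through one mapping of a file and report whether a second,
    MAP_SHARED mapping of the same file observes that write.

    The "writer" plays the role of QEMU mapping guest memory; the "peer"
    plays the role of the vhost port mapping the same backing file.
    """
    with tempfile.TemporaryFile() as backing:
        backing.truncate(PAGE)  # give the file one page of zeroes
        fd = backing.fileno()
        writer = mmap.mmap(fd, PAGE, flags=writer_flags,
                           prot=mmap.PROT_READ | mmap.PROT_WRITE)
        peer = mmap.mmap(fd, PAGE, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ)
        try:
            writer[0:4] = b"ring"
            return peer[0:4] == b"ring"
        finally:
            writer.close()
            peer.close()


if __name__ == "__main__":
    print("MAP_SHARED writer visible to peer: ", peer_sees_write(mmap.MAP_SHARED))
    print("MAP_PRIVATE writer visible to peer:", peer_sees_write(mmap.MAP_PRIVATE))
```

As far as I know, newer QEMU versions expose this as a `share=on` property on the `memory-backend-file` object, but please check that against the QEMU version you target.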
> +
> +DPDK vHost with standard vHost:
> +-------------------------------
> +
> +DPDK vHost ports use a Linux* character device to communicate with QEMU.
> +By default it is set to `/dev/vhost-net`. This conflicts with the kernel
> +vHost device, hence the need to remove `/dev/vhost-net` above. However,
> +if you wish to use kernel vhost in parallel, you can specify an
> +alternative basename on the vswitchd command line like so:
> +
> + `./vswitchd/ovs-vswitchd --dpdk --basename my-vhost-net -c 0x1 ...`
> +
> +Note that the basename argument and associated string must be the first
> +arguments after `--dpdk` and come before the EAL arguments.
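The ordering rule is easy to get wrong when the command line is assembled by a script, so here is a small sketch of how a launcher could enforce it. The function name `vswitchd_argv` is hypothetical; only `--dpdk`, `--basename` and the `-c 0x1` EAL argument come from the text above.

```python
def vswitchd_argv(eal_args, basename=None, binary="./vswitchd/ovs-vswitchd"):
    """Build an ovs-vswitchd argument vector with --basename placed
    immediately after --dpdk and before any EAL arguments."""
    argv = [binary, "--dpdk"]
    if basename is not None:
        # Must come first after --dpdk, per the note above.
        argv += ["--basename", basename]
    argv += list(eal_args)
    return argv


if __name__ == "__main__":
    print(" ".join(vswitchd_argv(["-c", "0x1"], basename="my-vhost-net")))
```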
> +
> +DPDK vHost VM configuration with standard vHost:
> +------------------------------------------------
> +
> +1. As with the "normal" (i.e. using `/dev/vhost-net`) DPDK vHost setup,
> +the guest must be configured with virtio-net adapters and offloads
> +MUST BE DISABLED. However, this time you must also pass in a `vhostfd`
> +argument:
> +
> + ```
> + -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on,
> + vhostfd=<open_fd>
> + -device virtio-net-pci,netdev=net1,mac=<mac>,csum=off,gso=off,
> + guest_tso4=off,guest_tso6=off,guest_ecn=off
> + ```
> +
> + The open file descriptor must be passed to QEMU running as a child
> + process.
You might as well tell people how to do this. E.g. with bash:
vhostfd=42 42<>/path/to/vhost/chardev
42 is, of course, The Answer.
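For anyone launching QEMU from a script rather than a shell, the same thing can be sketched in Python. This assumes nothing beyond the standard library: `os.open` takes the place of the `42<>` redirection, and `pass_fds` keeps the descriptor open under the same number in the child. The helper name `spawn_with_vhostfd` and the `qemu-system-x86_64` command line in the comment are illustrative placeholders, not something the patch provides.

```python
import os
import subprocess
import sys


def spawn_with_vhostfd(chardev_path, argv_for_fd):
    """Open chardev_path read/write, run a child that inherits the
    descriptor, and return the completed process.

    argv_for_fd is a callable that receives the fd number and returns the
    child's argument vector, so the number can be placed in vhostfd=<fd>.
    """
    fd = os.open(chardev_path, os.O_RDWR)
    try:
        # pass_fds keeps fd open in the child under the same number.
        return subprocess.run(argv_for_fd(fd), pass_fds=(fd,), check=False)
    finally:
        os.close(fd)


# Illustrative use only; binary name and option values are placeholders:
#
# spawn_with_vhostfd(
#     "/path/to/vhost/chardev",
#     lambda fd: ["qemu-system-x86_64",
#                 "-netdev",
#                 "tap,id=net1,script=no,downscript=no,"
#                 "ifname=dpdkvhost0,vhost=on,vhostfd=%d" % fd])
```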
> +2. As above, QEMU must allocate the VM's memory on hugetlbfs:
> +
> + `-mem-path /dev/hugepages -mem-prealloc`
> +
> +3. (Optional) If you are using libvirt, you must enable libvirt to access
> +the userspace device file by adding it to controllers cgroup for libvirtd
> +using the following steps:
> +
> + 1. In `/etc/libvirt/qemu.conf` add/edit the following lines:
> +
> + ```
> + 1) cgroup_controllers = [ ... "devices", ... ]
> + 2) clear_emulator_capabilities = 0
> + 3) user = "root"
> + 4) group = "root"
> + 5) cgroup_device_acl = [
> + "/dev/null", "/dev/full", "/dev/zero",
> + "/dev/random", "/dev/urandom",
> + "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
> + "/dev/rtc", "/dev/hpet", "/dev/net/tun",
> + "/dev/<devbase-name>-<index>",
> + "/dev/hugepages"]
> + ```
> +
> + 2. Disable SELinux or set to permissive mode
That's a workaround; the right thing to do is really
to write up correct SELinux policies.
Any plans to do this?
> + 3. Mount cgroup device controller:
> +
> + ```
> + mkdir /dev/cgroup
> + mount -t cgroup none /dev/cgroup -o devices
> + ```
> +
> + 4. Restart the libvirtd process
> + For example, on Fedora:
> +
> + `systemctl restart libvirtd.service`
> +
> +The easiest way to set up a guest that isn't using `/dev/vhost-net` is to
> +use the `qemu-wrap.py` script located in utilities. This Python script
> +automates the requirements specified above and can be used in conjunction
> +with libvirt.
I notice that newer libvirt versions should be able to specify everything
directly in the conf; that would be preferable if available.
Should that be documented too?
> +
> +DPDK vHost VM configuration with QEMU wrapper:
...