[ovs-dev] [PATCH RFC v6 1/1] netdev-dpdk: add dpdk vhost ports
Traynor, Kevin
kevin.traynor at intel.com
Thu Feb 12 12:59:17 UTC 2015
> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst at redhat.com]
> Sent: Wednesday, January 21, 2015 11:19 AM
> To: Traynor, Kevin
> Cc: dev at openvswitch.org
> Subject: Re: [ovs-dev] [PATCH RFC v6 1/1] netdev-dpdk: add dpdk vhost ports
>
> On Thu, Jan 08, 2015 at 11:05:02PM +0000, Kevin Traynor wrote:
> > This patch adds support for a new port type to userspace datapath
> > called dpdkvhost. This allows KVM (QEMU) to offload the servicing
> > of virtio-net devices to its associated dpdkvhost port. Instructions
> > for use are in INSTALL.DPDK.
> >
> > This has been tested on Intel multi-core platforms and with clients
> > that have virtio-net interfaces.
> >
> > ver 6:
> > - rebased with master
> > - modified to use DPDK v1.8.0 vhost library
> > - reworked for review comments
> > ver 5:
> > - rebased against latest master
> > ver 4:
> > - added eventfd_link.h and eventfd_link.c to EXTRA_DIST in
> > utilities/automake.mk
> > - rebased with master to work with DPDK 1.7
> > ver 3:
> > - rebased with master
> > ver 2:
> > - rebased with master
> >
> > Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
> > Signed-off-by: Kevin Traynor <kevin.traynor at intel.com>
> > Signed-off-by: Maryam Tahhan <maryam.tahhan at intel.com>
> > ---
> > INSTALL.DPDK.md | 236 +++++++++++++++++
> > Makefile.am | 4 +
> > lib/automake.mk | 1 +
> > lib/netdev-dpdk.c | 649 +++++++++++++++++++++++++++++++++++++++--------
> > lib/netdev.c | 3 +-
> > utilities/automake.mk | 3 +-
> > utilities/qemu-wrap.py | 389 ++++++++++++++++++++++++++++
> > vswitchd/ovs-vswitchd.c | 4 +-
> > 8 files changed, 1177 insertions(+), 112 deletions(-)
> > mode change 100644 => 100755 lib/netdev-dpdk.c
> > create mode 100755 utilities/qemu-wrap.py
> >
> > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> > index 2cc7636..da8116d 100644
> > --- a/INSTALL.DPDK.md
> > +++ b/INSTALL.DPDK.md
> > @@ -17,6 +17,7 @@ Building and Installing:
> > ------------------------
> >
> > Required DPDK 1.7
> > +Optional `fuse`, `fuse-devel`
> >
> > 1. Configure build & install DPDK:
> > 1. Set `$DPDK_DIR`
> > @@ -264,6 +265,241 @@ A general rule of thumb for better performance is that the client
> > application should not be assigned the same dpdk core mask "-c" as
> > the vswitchd.
> >
> > +DPDK vHost:
> > +-----------
> > +
> > +Prerequisites:
> > +1. DPDK 1.8 with vHost support enabled and recompile OVS as above.
> > +
> > + Update `config/common_linuxapp` so that DPDK is built with vHost
> > + libraries:
> > +
> > + `CONFIG_RTE_LIBRTE_VHOST=y`
> > +
> > +2. Insert the Fuse module:
> > +
> > + `modprobe fuse`
> > +
> > +3. Build and insert the `eventfd_link` module:
> > +
> > + `cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
> > + `make`
> > + `insmod $DPDK_DIR/lib/librte_vhost/eventfd_link/eventfd_link.ko`
> > +
> > +4. Remove /dev/vhost-net character device:
> > +
> > + `rm -rf /dev/vhost-net`
>
> I think it's not a good idea to tell people to do this,
> best to drop this section and put "with standard vhost"
> here instead.
It's not clear what you'd like to see dropped. This step is necessary
when using the default vhost file, so we can change the text to make that
clearer.
>
> > +
> > +Following the steps above to create a bridge, you can now add DPDK vHost
> > +as a port to the vswitch.
> > +
> > +`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
> > +
> > +Unlike DPDK ring ports, DPDK vHost ports can have arbitrary names:
> > +
> > +`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
> > +
> > +However, please note that when attaching userspace devices to QEMU, the
> > +name provided during the add-port operation must match the ifname parameter
> > +on the QEMU command line.
> > +
> > +DPDK vHost VM configuration:
> > +----------------------------
> > +
> > +1. Configure virtio-net adapters:
> > + The guest must be configured with virtio-net adapters and offloads
> > + MUST BE DISABLED.
>
> Any plans to address this?
There are no plans at present.
>
> > + This means the following parameters should be passed
> > + to the QEMU binary:
> > +
> > + ```
> > + -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
> > + -device virtio-net-pci,netdev=<id>,mac=<mac>,csum=off,gso=off,
> > + guest_tso4=off,guest_tso6=off,guest_ecn=off
> > + ```
> > +
> > + Repeat the above parameters for multiple devices.
> > +
> > +2. Configure huge pages:
> > + QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
> > + virtio-net device's virtual rings and packet buffers by mapping the VM's
> > + physical memory on hugetlbfs. To enable vhost-ports to map the VM's
> > + memory into their process address space, pass the following parameters
> > + to QEMU:
> > +
> > + `-mem-path /dev/hugepages -mem-prealloc`
>
> I guess you also need to request MAP_SHARED mappings - otherwise
> I think you won't be able to poke at them.
OK, it depends on the version of QEMU, so we can call that out. We've tested
with QEMU 1.6.2.
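
For illustration only (this is not part of the patch), the QEMU parameters
from steps 1 and 2 above could be assembled with a small Python helper. The
function names `vhost_qemu_args` and `hugepage_args` are made up for this
sketch. It reuses the `id=` value for `netdev=`, since QEMU pairs a device
with its backend through that identifier, and it does not attempt to request
shared mappings, which would need to be handled per QEMU version.

```python
def vhost_qemu_args(netdev_id, ifname, mac):
    """Return QEMU arguments for one virtio-net device on a dpdkvhost port.

    ifname must match the port name given to `ovs-vsctl add-port`.
    All guest offloads are switched off, as the instructions require.
    """
    netdev = ("tap,id=%s,script=no,downscript=no,ifname=%s,vhost=on"
              % (netdev_id, ifname))
    device = ("virtio-net-pci,netdev=%s,mac=%s,csum=off,gso=off,"
              "guest_tso4=off,guest_tso6=off,guest_ecn=off"
              % (netdev_id, mac))
    return ["-netdev", netdev, "-device", device]


def hugepage_args(mem_path="/dev/hugepages"):
    """Return QEMU arguments that place the VM's memory on hugetlbfs."""
    return ["-mem-path", mem_path, "-mem-prealloc"]
```

Calling `vhost_qemu_args` once per port and concatenating the results with
`hugepage_args()` gives the argument list for a guest with multiple devices.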
>
> > +
> > +DPDK vHost with standard vHost:
> > +-------------------------------
> > +
> > +DPDK vHost ports use a Linux* character device to communicate with QEMU.
> > +By default it is set to `/dev/vhost-net`. This conflicts with the kernel
> > +vHost device, hence the need to remove `/dev/vhost-net` above. However,
> > +if you wish to use kernel vhost in parallel, you can specify an
> > +alternative basename on the vswitchd command line like so:
> > +
> > + `./vswitchd/ovs-vswitchd --dpdk --basename my-vhost-net -c 0x1 ...`
> > +
> > +Note that the basename argument and associated string must be the first
> > +arguments after `--dpdk` and come before the EAL arguments.
> > +
> > +DPDK vHost VM configuration with standard vHost:
> > +------------------------------------------------
> > +
> > +1. As with the "normal" (i.e. using `/dev/vhost-net`) DPDK vHost setup,
> > +the guest must be configured with virtio-net adapters and offloads
> > +MUST BE DISABLED. However, this time you must also pass in a `vhostfd`
> > +argument:
> > +
> > + ```
> > + -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on,
> > + vhostfd=<open_fd>
> > + -device virtio-net-pci,netdev=<id>,mac=<mac>,csum=off,gso=off,
> > + guest_tso4=off,guest_tso6=off,guest_ecn=off
> > + ```
> > +
> > + The open file descriptor must be passed to QEMU running as a child
> > + process.
>
> You might as well tell people how to do this. E.g. with bash:
>
> vhostfd=42 42<>/path/to/vhost/chardev
>
> 42 is, of course, The Answer.
True - I think we need to distinguish between default and specified
file better.
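
As a rough sketch of the same idea outside of bash (again, not part of the
patch), a launcher could open the character device itself and hand the
descriptor to the child process. The helper name `run_with_vhostfd` and the
`build_cmd` callback are hypothetical, and the device path is whatever
character device the vswitch created. This relies on `pass_fds`, which is
available on POSIX systems only.

```python
import os
import subprocess


def run_with_vhostfd(chardev_path, build_cmd):
    """Open chardev_path read/write and run a child that inherits the fd.

    build_cmd is a callable that receives the descriptor number and returns
    the argv list for the child, so the number can be placed in `vhostfd=`.
    Returns the child's exit status.
    """
    fd = os.open(chardev_path, os.O_RDWR)
    try:
        cmd = build_cmd(fd)
        # pass_fds keeps this descriptor open, under the same number,
        # in the child process.
        return subprocess.call(cmd, pass_fds=(fd,))
    finally:
        os.close(fd)


# Example use (paths and names are placeholders):
# run_with_vhostfd(
#     "/path/to/vhost/chardev",
#     lambda fd: ["qemu-system-x86_64", "-netdev",
#                 "tap,id=net1,script=no,downscript=no,"
#                 "ifname=dpdkvhost0,vhost=on,vhostfd=%d" % fd])
```

Taking the command as a callback keeps the descriptor number out of the
caller's hands until the device has actually been opened.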
>
>
> > +2. As above, QEMU must allocate the VM's memory on hugetlbfs:
> > +
> > + `-mem-path /dev/hugepages -mem-prealloc`
> > +
> > +3. (Optional) If you are using libvirt, you must enable libvirt to access
> > +the userspace device file by adding it to controllers cgroup for libvirtd
> > +using the following steps:
> > +
> > + 1. In `/etc/libvirt/qemu.conf` add/edit the following lines:
> > +
> > + ```
> > + 1) cgroup_controllers = [ ... "devices", ... ]
> > + 2) clear_emulator_capabilities = 0
> > + 3) user = "root"
> > + 4) group = "root"
> > + 5) cgroup_device_acl = [
> > + "/dev/null", "/dev/full", "/dev/zero",
> > + "/dev/random", "/dev/urandom",
> > + "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
> > + "/dev/rtc", "/dev/hpet", "/dev/net/tun",
> > + "/dev/<devbase-name>-<index>",
> > + "/dev/hugepages"]
> > + ```
> > +
> > + 2. Disable SELinux or set to permissive mode
>
>
> It's a work-around, but the right thing to do is really
> to write up correct selinux policies.
> Any plans to do this?
No plans for this at present
>
> > + 3. Mount cgroup device controller:
> > +
> > + ```
> > + mkdir /dev/cgroup
> > + mount -t cgroup none /dev/cgroup -o devices
> > + ```
> > +
> > + 4. Restart the libvirtd process
> > + For example, on Fedora:
> > +
> > + `systemctl restart libvirtd.service`
> > +
> > +The easiest way to set up a guest that isn't using `/dev/vhost-net` is to
> > +use the `qemu-wrap.py` script located in utilities. This Python script
> > +automates the requirements specified above and can be used in conjunction
> > +with libvirt.
>
> I notice that new libvirt versions should have ability to specify everything
> directly in the conf, that would be preferable if available.
> Should be documented too?
It's not something we've looked at, but we will bring it up with the DPDK team.
>
> > +
> > +DPDK vHost VM configuration with QEMU wrapper:
>
> ...
>