[ovs-dev] [PATCH RFC v6 1/1] netdev-dpdk: add dpdk vhost ports

Traynor, Kevin kevin.traynor at intel.com
Thu Feb 12 12:59:17 UTC 2015


> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst at redhat.com]
> Sent: Wednesday, January 21, 2015 11:19 AM
> To: Traynor, Kevin
> Cc: dev at openvswitch.org
> Subject: Re: [ovs-dev] [PATCH RFC v6 1/1] netdev-dpdk: add dpdk vhost ports
> 
> On Thu, Jan 08, 2015 at 11:05:02PM +0000, Kevin Traynor wrote:
> > This patch adds support for a new port type to userspace datapath
> > called dpdkvhost. This allows KVM (QEMU) to offload the servicing
> > of virtio-net devices to its associated dpdkvhost port. Instructions
> > for use are in INSTALL.DPDK.
> >
> > This has been tested on Intel multi-core platforms and with clients
> > that have virtio-net interfaces.
> >
> >  ver 6:
> >    - rebased with master
> >    - modified to use DPDK v1.8.0 vhost library
> >    - reworked for review comments
> >  ver 5:
> >    - rebased against latest master
> >  ver 4:
> >    - added eventfd_link.h and eventfd_link.c to EXTRA_DIST in
> >  utilities/automake.mk
> >    - rebased with master to work with DPDK 1.7
> >  ver 3:
> >    - rebased with master
> >  ver 2:
> >    - rebased with master
> >
> > Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
> > Signed-off-by: Kevin Traynor <kevin.traynor at intel.com>
> > Signed-off-by: Maryam Tahhan <maryam.tahhan at intel.com>
> > ---
> >  INSTALL.DPDK.md         |  236 +++++++++++++++++
> >  Makefile.am             |    4 +
> >  lib/automake.mk         |    1 +
> >  lib/netdev-dpdk.c       |  649 +++++++++++++++++++++++++++++++++++++++--------
> >  lib/netdev.c            |    3 +-
> >  utilities/automake.mk   |    3 +-
> >  utilities/qemu-wrap.py  |  389 ++++++++++++++++++++++++++++
> >  vswitchd/ovs-vswitchd.c |    4 +-
> >  8 files changed, 1177 insertions(+), 112 deletions(-)
> >  mode change 100644 => 100755 lib/netdev-dpdk.c
> >  create mode 100755 utilities/qemu-wrap.py
> >
> > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> > index 2cc7636..da8116d 100644
> > --- a/INSTALL.DPDK.md
> > +++ b/INSTALL.DPDK.md
> > @@ -17,6 +17,7 @@ Building and Installing:
> >  ------------------------
> >
> >  Required DPDK 1.7
> > +Optional `fuse`, `fuse-devel`
> >
> >  1. Configure build & install DPDK:
> >    1. Set `$DPDK_DIR`
> > @@ -264,6 +265,241 @@ A general rule of thumb for better performance is that the client
> >  application should not be assigned the same dpdk core mask "-c" as
> >  the vswitchd.
> >
> > +DPDK vHost:
> > +-----------
> > +
> > +Prerequisites:
> > +1.  Build DPDK 1.8 with vHost support enabled, and recompile OVS as above.
> > +
> > +     Update `config/common_linuxapp` so that DPDK is built with vHost
> > +     libraries:
> > +
> > +     `CONFIG_RTE_LIBRTE_VHOST=y`
> > +
> > +2.  Insert the Fuse module:
> > +
> > +      `modprobe fuse`
> > +
> > +3.  Build and insert the `eventfd_link` module:
> > +
> > +     `cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
> > +     `make`
> > +     `insmod $DPDK_DIR/lib/librte_vhost/eventfd_link/eventfd_link.ko`
> > +
> > +4.  Remove /dev/vhost-net character device:
> > +
> > +      `rm -rf /dev/vhost-net`
> 
> I think it's not a good idea to tell people to do this,
> best to drop this section and put "with standard vhost"
> here instead.

It's not clear what you'd like to see dropped. Removing `/dev/vhost-net` is
only necessary when using the default vhost file, so I can change the text
to make that clearer.
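To make the default-versus-custom distinction concrete, here is a minimal Python sketch. The helper names `vswitchd_argv` and `must_remove_kernel_vhost_net` are hypothetical; nothing in the patch defines them. They only encode two rules from the quoted text. First, `--basename` and its value must directly follow `--dpdk` and precede the EAL arguments. Second, only the default `vhost-net` basename collides with the kernel's `/dev/vhost-net`.

```python
import shlex

# Default basename: DPDK vHost then uses /dev/vhost-net, like kernel vhost.
DEFAULT_BASENAME = "vhost-net"


def vswitchd_argv(eal_args, basename=None):
    """Hypothetical helper: build an ovs-vswitchd command line.

    --basename and its value must come directly after --dpdk and
    before any EAL arguments such as -c.
    """
    argv = ["./vswitchd/ovs-vswitchd", "--dpdk"]
    if basename is not None:
        argv += ["--basename", basename]
    argv += list(eal_args)
    return argv


def must_remove_kernel_vhost_net(basename=None):
    """Hypothetical helper: True when the DPDK vHost device would be
    /dev/vhost-net and therefore clash with the kernel vhost device."""
    return basename is None or basename == DEFAULT_BASENAME


if __name__ == "__main__":
    argv = vswitchd_argv(["-c", "0x1"], basename="my-vhost-net")
    print(shlex.join(argv))  # shlex.join needs Python 3.8+
    print(must_remove_kernel_vhost_net("my-vhost-net"))
```

With a custom basename the `rm -rf /dev/vhost-net` step is not needed, and kernel vhost can keep running in parallel.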

> 
> > +
> > +Following the steps above to create a bridge, you can now add DPDK vHost
> > +as a port to the vswitch.
> > +
> > +`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
> > +
> > +Unlike DPDK ring ports, DPDK vHost ports can have arbitrary names:
> > +
> > +`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
> > +
> > +However, please note that when attaching userspace devices to QEMU, the
> > +name provided during the add-port operation must match the ifname parameter
> > +on the QEMU command line.
> > +
> > +DPDK vHost VM configuration:
> > +----------------------------
> > +
> > +1. Configure virtio-net adapters:
> > +   The guest must be configured with virtio-net adapters and offloads
> > +   MUST BE DISABLED.
> 
> Any plans to address this?

There are no plans at present.
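Until offload support is addressed, every virtio-net device has to switch the offloads off explicitly. Below is a small sketch that emits the `-netdev`/`-device` pair from the quoted documentation. It assumes a hypothetical helper, `virtio_net_args`, and has the `-device` entry reference the same `<id>` as its `-netdev`. `ifname` must match the name given to `ovs-vsctl add-port`.

```python
# Offloads that the quoted documentation says MUST BE DISABLED.
DISABLED_OFFLOADS = ("csum", "gso", "guest_tso4", "guest_tso6", "guest_ecn")


def virtio_net_args(netdev_id, ifname, mac):
    """Hypothetical helper: QEMU arguments for one virtio-net device
    serviced by the dpdkvhost port named ifname."""
    netdev = "tap,id={},script=no,downscript=no,ifname={},vhost=on".format(
        netdev_id, ifname)
    # The -device entry must reference the same id as its -netdev.
    device = "virtio-net-pci,netdev={},mac={}".format(netdev_id, mac)
    device += "".join(",{}=off".format(opt) for opt in DISABLED_OFFLOADS)
    return ["-netdev", netdev, "-device", device]


if __name__ == "__main__":
    # One call per device; concatenate the lists for multiple devices.
    args = (virtio_net_args("net1", "dpdkvhost0", "00:00:00:00:00:01")
            + virtio_net_args("net2", "dpdkvhost1", "00:00:00:00:00:02"))
    print(" ".join(args))
```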

> 
> > +    This means the following parameters should be passed
> > +   to the QEMU binary:
> > +
> > +     ```
> > +     -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
> > +     -device virtio-net-pci,netdev=<id>,mac=<mac>,csum=off,gso=off,
> > +     guest_tso4=off,guest_tso6=off,guest_ecn=off
> > +     ```
> > +
> > +     Repeat the above parameters for multiple devices.
> > +
> > +2. Configure huge pages:
> > +   QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
> > +   virtio-net device's virtual rings and packet buffers by mapping the VM's
> > +   physical memory on hugetlbfs. To enable vhost ports to map the VM's
> > +   memory into their process address space, pass the following parameters
> > +   to QEMU:
> > +
> > +     `-mem-path /dev/hugepages -mem-prealloc`
> 
> I guess you also need to request MAP_SHARED mappings - otherwise
> I think you won't be able to poke at them.

OK, it depends on the QEMU version, so we can call that out. We've tested
with QEMU 1.6.2.
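Here is a sketch of the two memory configurations, under stated assumptions. `hugepage_mem_args` is a hypothetical helper. The non-shared form is the `-mem-path /dev/hugepages -mem-prealloc` line from the patch, plus `-m` for the size. The shared form relies on the `memory-backend-file` object with `share=on`, which newer QEMU releases provide to request a MAP_SHARED mapping. It is not available in older releases such as 1.6.2.

```python
def hugepage_mem_args(size_mb, mem_path="/dev/hugepages", shared=False):
    """Hypothetical helper: QEMU arguments placing guest RAM on hugetlbfs.

    shared=False: the -mem-path form from the quoted documentation.
    shared=True:  a memory-backend-file object with share=on (MAP_SHARED),
                  only understood by QEMU versions that have that object.
    """
    if not shared:
        return ["-m", str(size_mb), "-mem-path", mem_path, "-mem-prealloc"]
    backend = "memory-backend-file,id=mem,size={}M,mem-path={},share=on".format(
        size_mb, mem_path)
    return ["-m", str(size_mb),
            "-object", backend,
            # Attach the shared backend to the guest's (single) NUMA node.
            "-numa", "node,memdev=mem",
            "-mem-prealloc"]


if __name__ == "__main__":
    print(" ".join(hugepage_mem_args(1024)))
    print(" ".join(hugepage_mem_args(1024, shared=True)))
```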

> 
> > +
> > +DPDK vHost with standard vHost:
> > +-------------------------------
> > +
> > +DPDK vHost ports use a Linux* character device to communicate with QEMU.
> > +By default it is set to `/dev/vhost-net`. This conflicts with the kernel
> > +vHost device, hence the need to remove `/dev/vhost-net` above. However,
> > +if you wish to use kernel vhost in parallel, you can specify an
> > +alternative basename on the vswitchd command line like so:
> > +
> > +     `./vswitchd/ovs-vswitchd --dpdk --basename my-vhost-net -c 0x1 ...`
> > +
> > +Note that the `--basename` argument and its associated string must be the
> > +first arguments after `--dpdk` and must come before the EAL arguments.
> > +
> > +DPDK vHost VM configuration with standard vHost:
> > +------------------------------------------------
> > +
> > +1. As with the "normal" (i.e. using `/dev/vhost-net`) DPDK vHost setup,
> > +the guest must be configured with virtio-net adapters and offloads
> > +MUST BE DISABLED. However, this time you must also pass in a `vhostfd`
> > +argument:
> > +
> > +     ```
> > +     -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on,
> > +     vhostfd=<open_fd>
> > +     -device virtio-net-pci,netdev=<id>,mac=<mac>,csum=off,gso=off,
> > +     guest_tso4=off,guest_tso6=off,guest_ecn=off
> > +     ```
> > +
> > +     The open file descriptor must be passed to QEMU running as a child
> > +     process.
> 
> You might as well tell people how to do this. E.g. with bash:
> 
> vhostfd=42 42<>/path/to/vhost/chardev
> 
> 42 is, of course, The Answer.

True. I think we need to distinguish better between the default and a
specified file.
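For the specified-file case, the bash one-liner above (`vhostfd=42 42<>/path/to/vhost/chardev`) translates to Python roughly as follows. This is a sketch and not a description of what `qemu-wrap.py` does. `launch_qemu_with_vhostfd` is hypothetical, and `chardev_path` is whatever character device vswitchd created. The remaining QEMU arguments (virtio-net-pci device with offloads off, hugepage memory) go in `extra_args`.

```python
import os
import subprocess


def launch_qemu_with_vhostfd(qemu_binary, chardev_path, netdev_id, ifname,
                             extra_args=()):
    """Hypothetical helper: open the vhost character device read/write and
    start QEMU as a child that inherits the descriptor (bash: 42<>path)."""
    fd = os.open(chardev_path, os.O_RDWR)
    try:
        netdev = ("tap,id={},script=no,downscript=no,ifname={},vhost=on,"
                  "vhostfd={}".format(netdev_id, ifname, fd))
        argv = [qemu_binary, "-netdev", netdev, *extra_args]
        # pass_fds keeps exactly this descriptor open in the child;
        # all other descriptors are closed (close_fds defaults to True).
        return subprocess.Popen(argv, pass_fds=(fd,))
    finally:
        # The child holds its own copy, so the parent can drop its one.
        os.close(fd)
```

Keeping the descriptor open only in the child, rather than marking it inheritable process-wide, avoids leaking it into other children the wrapper might spawn.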

> 
> 
> > +2. As above, QEMU must allocate the VM's memory on hugetlbfs:
> > +
> > +     `-mem-path /dev/hugepages -mem-prealloc`
> > +
> > +3. (Optional) If you are using libvirt, you must allow libvirt to access
> > +the userspace device file by adding it to the device cgroup ACL for libvirtd
> > +using the following steps:
> > +
> > +     1. In `/etc/libvirt/qemu.conf` add/edit the following lines:
> > +
> > +        ```
> > +        1) cgroup_controllers = [ ... "devices", ... ]
> > +        2) clear_emulator_capabilities = 0
> > +        3) user = "root"
> > +        4) group = "root"
> > +        5) cgroup_device_acl = [
> > +               "/dev/null", "/dev/full", "/dev/zero",
> > +               "/dev/random", "/dev/urandom",
> > +               "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
> > +               "/dev/rtc", "/dev/hpet", "/dev/net/tun",
> > +               "/dev/<devbase-name>-<index>",
> > +               "/dev/hugepages"]
> > +        ```
> > +
> > +     2. Disable SELinux or set to permissive mode
> 
> 
> It's a work-around, but the right thing to do is really
> to write up correct selinux policies.
> Any plans to do this?

No plans for this at present
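On the libvirt side, the `cgroup_device_acl` list from step 1 of the quoted steps can be generated instead of hand-edited. Below is a minimal sketch. It assumes a hypothetical helper, `cgroup_device_acl`, and the `/dev/<devbase-name>-<index>` naming shown in the patch. It does nothing about SELinux, which still has to be disabled or set to permissive mode.

```python
# Base device list copied from the quoted qemu.conf example.
DEFAULT_ACL = [
    "/dev/null", "/dev/full", "/dev/zero",
    "/dev/random", "/dev/urandom",
    "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
    "/dev/rtc", "/dev/hpet", "/dev/net/tun",
]


def cgroup_device_acl(devbase_name, indices, hugepages="/dev/hugepages"):
    """Hypothetical helper: render cgroup_device_acl for qemu.conf with one
    /dev/<devbase-name>-<index> entry per vhost device, then hugepages."""
    entries = DEFAULT_ACL + ["/dev/{}-{}".format(devbase_name, i) for i in indices]
    entries.append(hugepages)
    body = ",\n".join('    "{}"'.format(e) for e in entries)
    return "cgroup_device_acl = [\n{}\n]".format(body)


if __name__ == "__main__":
    print(cgroup_device_acl("my-vhost-net", [0]))
```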

> 
> > +     3. Mount cgroup device controller:
> > +
> > +        ```
> > +        mkdir /dev/cgroup
> > +        mount -t cgroup none /dev/cgroup -o devices
> > +        ```
> > +
> > +     4. Restart the libvirtd process
> > +        For example, on Fedora:
> > +
> > +          `systemctl restart libvirtd.service`
> > +
> > +The easiest way to set up a guest that isn't using `/dev/vhost-net` is to
> > +use the `qemu-wrap.py` script located in `utilities`. This Python script
> > +automates the requirements specified above and can be used in conjunction
> > +with libvirt.
> 
> I notice that new libvirt versions should have the ability to specify everything
> directly in the conf; that would be preferable if available.
> Should it be documented too?

It's not something we've looked at, but we will bring it up with the DPDK team.

> 
> > +
> > +DPDK vHost VM configuration with QEMU wrapper:
> 
> ...
> 



