[ovs-dev] [Patch] Documentation for DPDK IVSHMEM VM Communications

Polehn, Mike A mike.a.polehn at intel.com
Fri Aug 15 14:07:12 UTC 2014


Adds documentation on how to run IVSHMEM communication through a VM.

Signed-off-by: Mike A. Polehn <mike.a.polehn at intel.com>

diff --git a/INSTALL.DPDK b/INSTALL.DPDK
index 4551f4c..8d866e9 100644
--- a/INSTALL.DPDK
+++ b/INSTALL.DPDK
@@ -19,10 +19,14 @@ Recommended to use DPDK 1.6.
 DPDK:
Set dir e.g.:   export DPDK_DIR=/usr/src/dpdk-1.6.0r2
 cd $DPDK_DIR
-update config/defconfig_x86_64-default-linuxapp-gcc so that dpdk generate single lib file.
+update config/defconfig_x86_64-default-linuxapp-gcc so that DPDK generates a
+single lib file (this modification is also required for the IVSHMEM build).
 CONFIG_RTE_BUILD_COMBINE_LIBS=y
 
-make install T=x86_64-default-linuxapp-gcc
+For a default install without IVSHMEM (as before):
+  make install T=x86_64-default-linuxapp-gcc
+To include IVSHMEM (shared memory):
+  make install T=x86_64-ivshmem-linuxapp-gcc
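+A quick check that the combined library was built (the library name here
+assumes DPDK 1.6 with CONFIG_RTE_BUILD_COMBINE_LIBS=y):
+  ls $DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/lib/libintel_dpdk.a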
 For details refer to  http://dpdk.org/
 
 Linux kernel:
@@ -32,7 +36,10 @@ DPDK kernel requirement.
 OVS:
 cd $(OVS_DIR)/openvswitch
 ./boot.sh
-export DPDK_BUILD=/usr/src/dpdk-1.6.0r2/x86_64-default-linuxapp-gcc
+Without IVSHMEM:
+  export DPDK_BUILD=/usr/src/dpdk-1.6.0r2/x86_64-default-linuxapp-gcc
+With IVSHMEM:
+  export DPDK_BUILD=/usr/src/dpdk-1.6.0r2/x86_64-ivshmem-linuxapp-gcc
 ./configure --with-dpdk=$DPDK_BUILD
 make
 
@@ -44,12 +51,18 @@ Using the DPDK with ovs-vswitchd:
 
 Setup system boot:
    kernel bootline, add: default_hugepagesz=1GB hugepagesz=1G hugepages=1
+To also reserve 3 GB of memory for the VM (2 socket system, half of the
+hugepages land on each NUMA node):
+   kernel bootline, add: default_hugepagesz=1GB hugepagesz=1G hugepages=8
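+A quick sanity check of the allocation after reboot:
+   grep Huge /proc/meminfo
+   cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages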
 
 First setup DPDK devices:
   - insert uio.ko
     e.g. modprobe uio
-  - insert igb_uio.ko
+
+  - insert igb_uio.ko (non-IVSHMEM case)
     e.g. insmod DPDK/x86_64-default-linuxapp-gcc/kmod/igb_uio.ko
+  - insert igb_uio.ko (IVSHMEM case)
+    e.g. insmod DPDK/x86_64-ivshmem-linuxapp-gcc/kmod/igb_uio.ko
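+  - optionally confirm the modules loaded (a quick check):
+    e.g. lsmod | grep uio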
+
  - Bind network device to igb_uio.
     e.g. DPDK/tools/pci_unbind.py --bind=igb_uio eth1
     Alternate binding method:
@@ -73,7 +86,7 @@ First setup DPDK devices:
 
 Prepare system:
   - mount hugetlbfs
-    e.g. mount -t hugetlbfs -o pagesize=1G none /mnt/huge/
+    e.g. mount -t hugetlbfs -o pagesize=1G none /dev/hugepages
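+    Confirm the mount with: mount | grep hugetlbfs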
 
 Ref to http://www.dpdk.org/doc/quick-start for verifying DPDK setup.
 
@@ -91,7 +104,7 @@ Start ovsdb-server as discussed in INSTALL doc:
       ./ovsdb/ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
           --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
           --private-key=db:Open_vSwitch,SSL,private_key \
-          --certificate=dbitch,SSL,certificate \
+          --certificate=db:Open_vSwitch,SSL,certificate \
           --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach
     First time after db creation, initialize:
       cd $OVS_DIR
@@ -105,12 +118,13 @@ for dpdk initialization.
 
    e.g.
    export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
-   ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK  --pidfile --detach
+   ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach
 
-If allocated more than 1 GB huge pages, set amount and use NUMA node 0 memory:
+If more than one 1 GB hugepage was allocated (as for IVSHMEM), set the amount
+and use NUMA node 0 memory:
 
    ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 \
-      -- unix:$DB_SOCK  --pidfile --detach
+      -- unix:$DB_SOCK --pidfile --detach
 
 To use ovs-vswitchd with DPDK, create a bridge with datapath_type
 "netdev" in the configuration database.  For example:
@@ -136,9 +150,7 @@ Test flow script across NICs (assuming ovs in /usr/src/ovs):
 ############################# Script:
 
 #! /bin/sh
-
 # Move to command directory
-
 cd /usr/src/ovs/utilities/
 
 # Clear current flows
@@ -158,7 +170,8 @@ help.
 
 At this time all ovs-vswitchd tasks end up being affinitized to cpu core 0
but this may change. Let's pick a target core for the 100% task to run on, e.g. core 7.
-Also assume a dual 8 core sandy bridge system with hyperthreading enabled.
+Also assume a dual 8-core Sandy Bridge system with hyperthreading enabled,
+where CPU1 has cores 0-7 and 16-23, and CPU2 has cores 8-15 and 24-31.
 (A different cpu configuration will have different core mask requirements).
 
To give better ownership of the 100% core, isolation may be useful.
@@ -178,11 +191,11 @@ taskset -p 080 1762
   pid 1762's new affinity mask: 80
 
Assume that all other ovs-vswitchd threads are to run on the other socket 0 cores.
-Affinitize the rest of the ovs-vswitchd thread ids to 0x0FF007F
+Affinitize the rest of the ovs-vswitchd thread ids to 0x07F007F
 
-taskset -p 0x0FF007F {thread pid, e.g 1738}
+taskset -p 0x07F007F {thread pid, e.g 1738}
   pid 1738's current affinity mask: 1
-  pid 1738's new affinity mask: ff007f
+  pid 1738's new affinity mask: 7f007f
 . . .
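+A sketch to sweep the mask over all ovs-vswitchd thread ids in one pass
+(illustrative only; 1762 is this example's 100% thread, left untouched):
+
+  for tid in $(ls /proc/$(pidof ovs-vswitchd)/task); do
+    [ $tid -eq 1762 ] || taskset -p 0x07F007F $tid
+  done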
 
 The core 23 is left idle, which allows core 7 to run at full rate.
@@ -207,8 +220,8 @@ with the ring naming used within ovs.
 location tests/ovs_client
 
 To run the client :
-
-    ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr"
+  cd /usr/src/ovs/tests/
+  ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr"
 
 In the case of the dpdkr example above the "port id you gave dpdkr" is 0.
 
@@ -218,6 +231,9 @@ The application simply receives an mbuf on the receive queue of the
 ethernet ring and then places that same mbuf on the transmit ring of
 the ethernet ring.  It is a trivial loopback application.
 
+DPDK rings in VM (IVSHMEM shared memory communications)
+-------------------------------------------------------
+
 In addition to executing the client in the host, you can execute it within
 a guest VM. To do so you will need a patched qemu.  You can download the
 patch and getting started guide at :
@@ -228,6 +244,281 @@ A general rule of thumb for better performance is that the client
 application should not be assigned the same dpdk core mask "-c" as
 the vswitchd.
 
+Alternative method to get QEMU: download and build from OVDK
+------------------------------------------------------------
+
+##### On Host
+
+Rebuild DPDK and OVS with IVSHMEM support (above).
+
+Example Fedora 20 host tools which will build qemu:
+Infrastructure Server install + Virtualization, C development tools,
+development tools, RPM tools
+  yum install tunctl rpmdevtools yum-utils ncurses-devel qt3-devel \
+  libXi-devel gcc-c++ openssl-devel glibc.i686 libgcc.i686 libstdc++.i686 \
+  glibc-devel.i686 kernel-devel libcap-devel gcc coreutils make nasm \
+  glibc-devel autoconf automake zlib-devel glib2-devel libtool fuse-devel \
+  pixman-devel fuse kernel-modules-extra
+
+Get and build qemu for OVDK:
+  cd /usr/src
+  git clone git://github.com/01org/dpdk-ovs
+  cd /usr/src/dpdk-ovs/qemu/
+  export DPDK_DIR=/usr/src/dpdk-1.6.0r2
+  ./configure --enable-kvm --target-list=x86_64-softmmu --disable-pie
+  make
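+A quick check that the build succeeded:
+  ./x86_64-softmmu/qemu-system-x86_64 --version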
+
+Start OVS for IVSHMEM test
+--------------------------
+
+Start ovsdb-server as above.
+Start ovs-vswitchd with 1 GB NUMA node 0 memory
+  cd /usr/src/ovs
+  ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 -- \
+  unix:/usr/local/var/run/openvswitch/db.sock --pidfile --detach
+
+Copy information for IVSHMEM VM use:
+  mkdir /tmp/share
+  chmod 777 /tmp/share
+  cp -a /run/.rte_* /tmp/share
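+Verify the DPDK runtime files were copied:
+  ls -a /tmp/share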
+
+Prepare and start VM
+--------------------
+
+Example Fedora 20 VM created with:
+Minimal Install + Guest Agents, Standard Tools, C Dev Tools, Dev Tools, RPM Tools
+  yum install rpmdevtools yum-utils ncurses-devel gcc make qt3-devel \
+  libXi-devel gcc-c++ openssl-devel coreutils kernel-devel glibc.i686 \
+  libgcc.i686 libstdc++.i686 glibc-devel.i686 kernel-devel libcap-devel
+
+Start VM for IVSHMEM test
+-------------------------
+Note: the VM runs under the QEMU built in the OVDK directory above.
+
+Example: VM at /vm/Fed20-vm.qcow2, VM scripts in /vm/vm_ctl,
+and a Linux br-mgt management bridge already set up.
+
+############################## Example VM start script
+
+#!/bin/sh
+vm=/vm/Fed20-vm.qcow2
+vm_name="IVSHMEM1"
+
+vnc=10
+
+n1=tap46
+bra=br-mgt
+dn_scrp_a=/vm/vm_ctl/br-mgt-ifdown
+mac1=00:1f:33:16:64:44
+
+if [ ! -f $vm ];
+then
+    echo "VM $vm not found!"
+else
+    echo "VM $vm started! VNC: $vnc, management network: $n1"
+    tunctl -t $n1
+    brctl addif $bra $n1
+    ifconfig $n1 0.0.0.0 up
+
+    taskset 0x30 /usr/src/dpdk-ovs/qemu/x86_64-softmmu/qemu-system-x86_64 \
+-cpu host -hda $vm -m 3072 -boot c -smp 2 -pidfile /tmp/vm1.pid \
+-monitor unix:/tmp/vm1monitor,server,nowait -mem-path /dev/hugepages \
+-mem-prealloc -enable-kvm -net nic,model=virtio,netdev=eth0,macaddr=$mac1 \
+-netdev tap,ifname=$n1,id=eth0,vhost=on,script=no,downscript=$dn_scrp_a \
+-device ivshmem,size=1024M,shm=fd:/dev/hugepages/rtemap_0:0x0:0x40000000 \
+-name $vm_name -vnc :$vnc -drive file=fat:/tmp/share &
+fi
+
+############################## Example br-mgt-ifdown script
+
+#!/bin/sh
+
+bridge='br-mgt'
+/sbin/ifconfig $1 0.0.0.0 down
+brctl delif ${bridge} $1
+
+##############################
+
+On the host, taskset sets the host CPU cores (0x030 in this case) used by
+QEMU. Core selection is system specific; the selected cores should be on the
+same NUMA node or CPU socket. On Linux, memory is generally allocated from
+the NUMA node a process is running on at allocation time, provided memory is
+available on that node at that time.
+
+VM Setup
+--------
+
+Set VM Kernel Bootline parameters and reboot:
+  default_hugepagesz=1GB hugepagesz=1G hugepages=1 isolcpus=1
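+Verify the settings took effect after the reboot:
+  cat /proc/cmdline
+  grep Huge /proc/meminfo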
+
+Copy from host to VM (10.4.0.160):
+  scp /usr/src/dpdk-1.6.0r2.tar.gz 10.4.0.160:/root
+
+Build DPDK in VM
+  cd /root
+  tar -xf dpdk-1.6.0r2.tar.gz
+  cd /root/dpdk-1.6.0r2/
+update config/defconfig_x86_64-default-linuxapp-gcc so that DPDK generates a
+single lib file (modification also required for IVSHMEM build).
+CONFIG_RTE_BUILD_COMBINE_LIBS=y
+  make install T=x86_64-ivshmem-linuxapp-gcc
+
+  mkdir -p /root/ovs_client
+Copy ovsclient code and Makefile from host:
+  scp /usr/src/ovs/tests/ovs_client/ovs_client.c 10.4.0.160:/root/ovs_client
+  scp /usr/src/dpdk-ovs/guest/ovs_client/Makefile 10.4.0.160:/root/ovs_client
+
+On the VM, patch or change /root/ovs_client/Makefile:
+
+################# Makefile Patch
+
+diff --git a/Makefile b/Makefile
+index 9df37ef..cef1903 100755
+--- a/Makefile
++++ b/Makefile
+@@ -39,13 +39,12 @@ endif
+ include $(RTE_SDK)/mk/rte.vars.mk
+
+ # binary name
+-APP = ovs_client
++APP = ovsclient
+
+ # all source are stored in SRCS-y
+-SRCS-y := ovs_client.c libvport/ovs-vport.c
++SRCS-y := ovs_client.c
+
+ CFLAGS += -O3
+ CFLAGS += $(WERROR_FLAGS)
+-CFLAGS += -I$(SRCDIR)/libvport
+
+ include $(RTE_SDK)/mk/rte.extapp.mk
+
+##################
+
+On the VM, patch or change /root/ovs_client/ovs_client.c to remove the printf
+in the processing loop. The printf makes processing unstable in the VM, but
+works OK in the host test.
+
+################## ovs_client.c Patch
+
+diff --git a/ovs_client.c b/ovs_client.c
+index dbd99b1..8240275 100644
+--- a/ovs_client.c
++++ b/ovs_client.c
+@@ -217,10 +217,5 @@ main(int argc, char *argv[])
+         } else {
+                no_pkt++;
+         }
+-
+-        if (!(pkt %  100000)) {
+-            printf("pkt %d %d\n", pkt, no_pkt);
+-            pkt = no_pkt = 0;
+-        }
+     }
+ }
+
+##################
+
+Build ovsclient:
+  cd /root/ovs_client
+  export RTE_SDK=/root/dpdk-1.6.0r2
+  export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
+  make
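+The binary lands under build/ and is used in the run step below:
+  ls -l build/ovsclient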
+
+Setup of VM to run ovsclient
+----------------------------
+
+Mount VM internal hugepage memory
+  mkdir -p /mnt/hugepages
+  mount -t hugetlbfs hugetlbfs /mnt/hugepages
+
+Copy host info for IVSHMEM memory (the QEMU fat: share shows up as /dev/sdb1)
+  mkdir -p /mnt/ovs_client
+  mount -o iocharset=utf8 /dev/sdb1 /mnt/ovs_client
+  cp -a /mnt/ovs_client/.rte_* /run
+
+Find IVSHMEM memory PCI device, set hugepage link (first time)
+  lspci |grep RAM
+    00:04.0 RAM memory: Red Hat, Inc Inter-VM shared memory
+  Find device info by tab completion
+    dir /sys/devices/pci0000\:00/0000\:00\:04.0/resource2
+  Use the last line and modify it to create the link
+    ln -s /sys/devices/pci0000\:00/0000\:00\:04.0/resource2 \
+    /dev/hugepages/rtemap_0
+  Verify link:
+    ls -l /dev/hugepages
+
+Run ovsclient
+-------------
+
+Start IVSHMEM Client task
+  cd /root/ovs_client
+  ./build/ovsclient -c 1 -n 4 --proc-type=secondary -- -n 0 &
+
+Find ovsclient task process ids:
+  ps -eLF |grep ovsclient
+    root   3538  392  3538 11   2 399513 1092   0 13:09 pts/0    00:00:15 ...
+    root   3538  392  3539  0   2 399513 1092   0 13:09 pts/0    00:00:00 ...
+Pin the process that is accumulating time to vCPU1 and make it high
+priority; pin the others to vCPU0:
+  taskset -p 2 3538
+  taskset -p 1 3539
+  renice -20 -p 3538
+Verify in the VM with top (press 1) that the task load is on vCPU1 and not
+vCPU0.
+
+Affinitize host QEMU for VM vCPUs
+---------------------------------
+CPU core affinitization differs from system to system. For this example,
+cpu cores 4 (0x10) and 5 (0x20) are on the same physical CPU.
+
+Example:
+  ps -eLF |grep qemu
+    root    2256    1   2256  0   5 1123016 21828 4 13:38 pts/1   00:00:02 ...
+    root    2256    1   2261  5   5 1123016 21828 4 13:38 pts/1   00:08:36 ...
+    root    2256    1   2262  4   5 1123016 21828 4 13:38 pts/1   00:07:21 ...
+    root    2256    1   2264  0   5 1123016 21828 4 13:38 pts/1   00:00:00 ...
+Note: the VM 100% task started on vCPU0 and later moved to vCPU1, so both
+vCPU0 (2261) and vCPU1 (2262) have accumulated process time.
+  taskset -p 10 2256
+    pid 2256's current affinity mask: 30
+    pid 2256's new affinity mask: 10
+  taskset -p 10 2261
+    pid 2261's current affinity mask: 30
+    pid 2261's new affinity mask: 10
+  taskset -p 20 2262
+    pid 2262's current affinity mask: 30
+    pid 2262's new affinity mask: 20
+  taskset -p 10 2264
+    pid 2264's current affinity mask: 30
+    pid 2264's new affinity mask: 10
+Verify in the VM with top (press 1) that the task load is on vCPU1 (host
+core 5) and not vCPU0 (host core 4).
+
+Set packet flows from ports through VM out ports
+------------------------------------------------
+
+Set the flow IP addresses as needed for the test.
+
+############################## Example flow test script:
+
+#! /bin/sh
+# Move to command directory
+cd /usr/src/ovs/utilities/
+
+# Clear current flows
+./ovs-ofctl del-flows br0
+
+#  Add bidirectional flows between
+#   port 1 (dpdk0) <--> VM port 3 (dpdkr0) <--> port 2 (dpdk1)
+./ovs-ofctl add-flow br0 in_port=1,dl_type=0x800,nw_src=1.1.1.1,\
+nw_dst=1.1.1.2,idle_timeout=0,action=output:3
+./ovs-ofctl add-flow br0 in_port=2,dl_type=0x800,nw_src=1.1.1.2,\
+nw_dst=1.1.1.1,idle_timeout=0,action=output:3
+
+./ovs-ofctl add-flow br0 in_port=3,dl_type=0x800,nw_src=1.1.1.1,\
+nw_dst=1.1.1.2,idle_timeout=0,action=output:2
+./ovs-ofctl add-flow br0 in_port=3,dl_type=0x800,nw_src=1.1.1.2,\
+nw_dst=1.1.1.1,idle_timeout=0,action=output:1
+
+###############################
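+
+To confirm the flows were installed:
+  ./ovs-ofctl dump-flows br0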
+
 Restrictions:
 -------------



