[ovs-dev] [PATCH 1/1] netdev IVSHMEM shared memory usage documentation

Polehn, Mike A mike.a.polehn at intel.com
Fri Jun 19 15:04:21 UTC 2015


This adds documentation for the DPDK netdev describing an IVSHMEM shared-memory
test between OVS and a host app or VM app, using current OVS code. The example
lets people learn how this is done so that they can develop their own IVSHMEM
shared-memory applications. It also adds guidance on better system setup for
realtime task operation.

Signed-off-by: Mike A. Polehn <mike.a.polehn at intel.com>

diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index cdef6cf..64ae6f1 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -49,6 +49,41 @@ on Debian/Ubuntu)
      For further details refer to http://dpdk.org/
+1b. Alternative DPDK 2.0 install
+  1. Get DPDK from git repository
+
+     cd /usr/src
+     git clone git://dpdk.org/dpdk
+     cd /usr/src/dpdk
+     git checkout -b test_v2.0.0 v2.0.0
+     export DPDK_DIR=/usr/src/dpdk
+
+  2. If DPDK is already installed with a different version or build parameters,
+     uninstall it first. Ideally this is done before checking out a new version.
+
+     cd $DPDK_DIR
+     make uninstall
+
+  3. Build DPDK with IVSHMEM and user-side vHost support.
+     Note: The split ring option (CONFIG_RTE_RING_SPLIT_PROD_CONS=y, optional)
+     has notably better performance for two simultaneous data sources, as in
+     the case of two simultaneous port tasks or threads writing into an
+     IVSHMEM ring (in either host or VM) at the same time. However, for just
+     one task or thread, for example one port of data being switched at full
+     rate into an IVSHMEM ring buffer, little or no data rate difference will
+     be observed.
+
+     cd $DPDK_DIR
+     make install T=x86_64-ivshmem-linuxapp-gcc CONFIG_RTE_LIBRTE_VHOST=y \
+     CONFIG_RTE_BUILD_COMBINE_LIBS=y CONFIG_RTE_LIBRTE_VHOST_USER=n \
+     CONFIG_RTE_RING_SPLIT_PROD_CONS=y
+
+     Note: Any host or VM task using shared memory, as in the case of IVSHMEM,
+     must have DPDK built and installed in exactly the same way as every other
+     DPDK program on the system. The DPDK install in the VM needs the same DPDK
+     source and build. Any change to the DPDK build requires all apps, including
+     OVS, host apps, and VM apps, to be rebuilt and relinked.
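+
+     As a quick, partial sanity check that the host and VM are on the same DPDK
+     source revision, the checked-out tag can be compared on both sides (this
+     assumes DPDK was cloned via git as above; the build options must of course
+     also match):
+
+     git -C /usr/src/dpdk describe --tags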
+
2. Configure & build the Linux kernel:
    Refer to intel-dpdk-getting-started-guide.pdf for understanding
@@ -85,9 +120,24 @@ Using the DPDK with ovs-vswitchd:
---------------------------------
 1. Setup system boot
-   Add the following options to the kernel bootline:
+   Add the following options to the kernel bootline for both 1 GB and 2 MB support:

-   `default_hugepagesz=1GB hugepagesz=1G hugepages=1`
+   `default_hugepagesz=1GB hugepagesz=1GB hugepages=16 hugepagesz=2M hugepages=2048`
+
+   For just 1 GB hugepage support:
+
+   `default_hugepagesz=1GB hugepagesz=1GB hugepages=16`
+
+   This kernel bootline allocates half the hugepages on each NUMA node.  The
+   IVSHMEM test below needs 4 GB of 1 GB hugepages (1 GB for OVS and 3 GB for
+   the VM).  This requires at least eight 1 GB pages so that four 1 GB pages of
+   NUMA node 0 hugepage memory are available, since half will be allocated on
+   NUMA node 1 (assuming a 2 CPU socket system). If the system has a limited
+   amount of memory or only 1 NUMA node, this may need to be adjusted. At this
+   time 1 GB pages are required and 2 MB pages are optional, but it is very
+   desirable to have both 1 GB and 2 MB hugepage memory available on the host
+   at the same time.  Dual hugepage sizes in the VM are also very desirable
+   (see the IVSHMEM VM setup information below).
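+
+   After rebooting, it can be confirmed that the bootline took effect and the
+   pages were reserved, for example:
+
+   `cat /proc/cmdline`
+   `grep -i huge /proc/meminfo`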
 2. Setup DPDK devices:
@@ -112,9 +162,14 @@ Using the DPDK with ovs-vswitchd:
      3. Bind network device to vfio-pci:
         `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1`
-3. Mount the hugetable filsystem
-
+3. Mount the hugetlbfs filesystem
+
+   The following may or may not be needed, depending on the host OS:
+   `mkdir /dev/hugepages`
+
+   Mount for 1 GB hugepages:
    `mount -t hugetlbfs -o pagesize=1G none /dev/hugepages`
+   For additional 2 MB hugepage support:
+   `mkdir /dev/hugepages_2mb`
+   `mount -t hugetlbfs nodev /dev/hugepages_2mb -o pagesize=2MB`
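+
+   Optionally, to make these mounts persistent across reboots, entries along
+   these lines can be added to /etc/fstab (an example only; not required for
+   the test below):
+
+   `nodev /dev/hugepages hugetlbfs pagesize=1GB 0 0`
+   `nodev /dev/hugepages_2mb hugetlbfs pagesize=2MB 0 0`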
    Ref to http://www.dpdk.org/doc/quick-start for verifying DPDK setup.
@@ -267,7 +322,7 @@ Using the DPDK with ovs-vswitchd:
    ovs-appctl dpif-netdev/pmd-stats-show
    ```
-DPDK Rings :
+DPDK Rings:
------------
 Following the steps above to create a bridge, you can now add dpdk rings
@@ -299,12 +354,124 @@ The application simply receives an mbuf on the receive queue of the
ethernet ring and then places that same mbuf on the transmit ring of
the ethernet ring.  It is a trivial loopback application.
+DPDK Ring access on Host using IVSHMEM:
+---------------------------------------
+
+Use the test program ring_client for the IVSHMEM flow test. This requires DPDK
+to have been built with IVSHMEM support. Rebuild DPDK and OVS with IVSHMEM
+support (above) if they were not built that way already.
+
+1: Move to the directory containing ring_client.c
+
+  cd $OVS_DIR/tests/dpdk
+
+  If desired, copy ring_client.c outside of the OVS code tree and work there,
+  to create an example of a DPDK host app with an IVSHMEM ring accessible
+  from OVS.
+
+2: Patch or edit ring_client.c to remove the configuration include file
+  `<config.h>`:
+
+  Patch for ring_client.c:
+  ```
+--- org_ring_client.c   2015-05-22 07:44:59.390000000 -0700
++++ ring_client.c       2015-05-22 07:44:39.781000000 -0700
+@@ -34,7 +34,6 @@
+
+ #include <getopt.h>
+
+-#include <config.h>
+ #include <rte_config.h>
+ #include <rte_mbuf.h>
+ #include <rte_ether.h>
+  ```
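+
+  Alternatively, instead of applying the patch, the include line can be removed
+  with a one-line edit (an equivalent change, shown only for convenience):
+
+  sed -i '/#include <config.h>/d' ring_client.c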
+
+3: Use a DPDK Makefile to build ring_client, copied from a DPDK example
+   (l3fwd was used in this case). A quick way to make a Makefile for a new
+   DPDK app is to copy an existing one and change the app name and source
+   filename.
+
+  cp $DPDK_DIR/examples/l3fwd/Makefile .
+
+  Patch or edit Makefile to build target ring_client program:
+  ```
+--- org_Makefile        2015-05-22 07:43:28.062000000 -0700
++++ Makefile    2015-05-22 07:44:07.527000000 -0700
+@@ -39,10 +39,10 @@ RTE_TARGET ?= x86_64-native-linuxapp-gcc
+ include $(RTE_SDK)/mk/rte.vars.mk
+
+ # binary name
+-APP = l3fwd
++APP = ring_client
+
+ # all source are stored in SRCS-y
+-SRCS-y := main.c
++SRCS-y := ring_client.c
+
+ CFLAGS += -O3 $(USER_FLAGS)
+ CFLAGS += $(WERROR_FLAGS)
+  ```
+
+4: Build ring_client:
+
+  export RTE_SDK=$DPDK_DIR
+  export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
+  make
+
+5: Start OVS for IVSHMEM test
+
+  Start ovsdb-server as above.
+  Start ovs-vswitchd with 1 GB of NUMA node 0 memory (it may work with more
+  than 1 page of 1 GB memory; this appears to work but has not been verified).
+
+  cd $OVS_DIR
+  ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 -- \
+  unix:/usr/local/var/run/openvswitch/db.sock --pidfile --detach
+
+6: Set up the OVS bridge and ring port for use with IVSHMEM
+
+  cd $OVS_DIR
+  ./utilities/ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
+  ./utilities/ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
+  ./utilities/ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
+  ./utilities/ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 \
+  type=dpdkr
+  ./utilities/ovs-vsctl show
+
+7: Run ring_client:
+
+  build/ring_client -c 8 -n 4 --proc-type=secondary -- -n 0 &
+
+8: Add test flows:
+
+  The following test flow script directs traffic from 2 NICs bidirectionally
+  through the ring (it assumes OVS is in /usr/src/ovs; the IPs may need to be
+  adjusted for your own system). Execute the script:
+  ```
+  #! /bin/sh
+  # Move to command directory
+  cd /usr/src/ovs/utilities/
+
+  # Clear current flows
+  ./ovs-ofctl del-flows br0
+
+  #  Add bidirectional flows between
+  #   port 1 (dpdk0) <--> VM port 3 (dpdkr0) <--> port 2 (dpdk1)
+  ./ovs-ofctl add-flow br0 in_port=1,dl_type=0x800,nw_src=1.1.1.1,\
+  nw_dst=1.1.1.2,idle_timeout=0,action=output:3
+  ./ovs-ofctl add-flow br0 in_port=2,dl_type=0x800,nw_src=1.1.1.2,\
+  nw_dst=1.1.1.1,idle_timeout=0,action=output:3
+
+  ./ovs-ofctl add-flow br0 in_port=3,dl_type=0x800,nw_src=1.1.1.1,\
+  nw_dst=1.1.1.2,idle_timeout=0,action=output:2
+  ./ovs-ofctl add-flow br0 in_port=3,dl_type=0x800,nw_src=1.1.1.2,\
+  nw_dst=1.1.1.1,idle_timeout=0,action=output:1
+  ```
+
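+  The installed flows can be verified, and their packet counters watched
+  during the test, with:
+
+  ./utilities/ovs-ofctl dump-flows br0
+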
DPDK rings in VM (IVSHMEM shared memory communications)
-------------------------------------------------------
 In addition to executing the client in the host, you can execute it within
a guest VM. To do so you will need a patched qemu.  You can download the
-patch and getting started guide at :
+patch and getting started guide at:
 https://01.org/packet-processing/downloads
@@ -312,6 +479,432 @@ A general rule of thumb for better performance is that the client
application should not be assigned the same dpdk core mask "-c" as
the vswitchd.
+If the patch cannot be obtained, IVSHMEM can instead use the modified QEMU from
+an early DPDK prototype vSwitch (dpdk-ovs), used here with OVS:
+
+1: On the host, rebuild DPDK and OVS with IVSHMEM support (above).
+
+2: Add the system tools needed for the DPDK and QEMU builds:
+
+  Example Fedora 21 host install and tools which will build QEMU:
+  Fedora 21 was installed from the Server DVD, selecting Virt, C development
+  tools, development tools, RPM tools, and System tools for install.
+  The firewall and irqbalance were off for the test.
+
+  Packages needed on Fedora 21 (also Fedora 20) to build QEMU; these may be
+  different on other types of systems:
+
+  yum install tunctl rpmdevtools yum-utils ncurses-devel gcc qt3-devel \
+    make libXi-devel gcc-c++ openssl-devel coreutils kernel-devel \
+    glibc.i686 libgcc.i686 libstdc++.i686 fuse fuse-devel \
+    glibc-devel.i686 libcap-devel nasm glibc-devel \
+    autoconf automake zlib-devel glib2-devel libtool \
+    pixman-devel kernel-modules-extra
+
+  You may need to review the QEMU build documentation for your particular
+  system to set up the correct tools to build the modified QEMU version.
+
+3: Get the modified QEMU:
+
+  Get the QEMU with modified ivshmem support located in the prototype code
+  base:
+
+  cd /usr/src
+  git clone git://github.com/01org/dpdk-ovs
+
+4: Build modified QEMU portion:
+
+  cd /usr/src/dpdk-ovs/qemu/
+  export DPDK_DIR=/usr/src/dpdk
+  ./configure --enable-kvm --target-list=x86_64-softmmu --disable-pie
+  make
+
+5: Start OVS for IVSHMEM test
+
+  Start ovsdb-server as above.
+  Start ovs-vswitchd with 1 GB of NUMA node 0 memory (it may work with more
+  than 1 page of 1 GB memory; this appears to work but has not been verified).
+
+  cd $OVS_DIR
+  ./vswitchd/ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,0 -- \
+  unix:/usr/local/var/run/openvswitch/db.sock --pidfile --detach
+
+  Additional parameters useful for testing: the PMD CPU mask and a 30 second
+  exact-match flow timeout:
+  ./utilities/ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=2
+  ./utilities/ovs-vsctl set Open_vSwitch . other_config:max-idle=30000
+
+6: Set up the OVS bridge and ring port for use with IVSHMEM (if not already set up)
+
+  cd $OVS_DIR
+  ./utilities/ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
+  ./utilities/ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
+  ./utilities/ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
+  ./utilities/ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 \
+  type=dpdkr
+  ./utilities/ovs-vsctl show
+
+7: Copy information for IVSHMEM VM to use:
+
+  mkdir /tmp/share
+  chmod 777 /tmp/share
+  cp -a /run/.rte_* /tmp/share
+
+  Note: The copy needs to be redone every time ovs-vswitchd is restarted. Also,
+  VMs and shared-memory host tasks remain attached to the shared memory, and
+  must be terminated to allow a stopped ovs-vswitchd to start again and
+  allocate and create the new shared memory region.
+
+8: Prepare a VM for the test (the VM needs to run with host-type vCPUs,
+  i.e. -cpu host):
+
+  Example Fedora 21 VM created from Server DVD:
+  Installed Fedora 21 from Server DVD selecting:
+  Guest Agents, C Dev Tools, Dev tools
+  Firewall and irqbalance were off for test.
+
+    yum install rpmdevtools yum-utils ncurses-devel gcc make qt3-devel \
+     libXi-devel gcc-c++ openssl-devel coreutils kernel-devel glibc.i686 \
+     libgcc.i686 libstdc++.i686 glibc-devel.i686 fuse fuse-devel \
+     kernel-devel libcap-devel
+
+  A different VM OS version and/or type may require a different set of tools
+  and/or installer (yum) to build DPDK. See the DPDK documentation if the
+  above packages do not build DPDK in the target VM.
+
+9: Start the VM for the IVSHMEM test:
+
+  Note: The VM runs QEMU from the OVDK directory built above, not the
+  host-supplied QEMU version.
+
+  Example: the VM is at /vm/Fed21-mp.qcow2, VM scripts are in /vm/vm_ctl
+  Linux br-mgt: management network bridge (non-OVS Linux bridge)
+  QEMU at: /usr/src/dpdk-ovs/qemu/x86_64-softmmu/qemu-system-x86_64
+
+  Example VM start script:
+  ```
+#!/bin/sh
+
+vm=/vm/Fed21-mp.qcow2
+vm_name=Fed21_Test_VM
+vnc=14
+n1=tap55
+bra=br-mgt
+dn_scrp_a=/vm/vm_ctl/br-mgt-ifdown
+mac1=00:00:14:42:04:29
+
+if [ ! -f $vm ];
+then
+    echo "VM $vm not found!"
+else
+    echo "VM $vm started! VNC: $vnc, net0: $n1"
+    tunctl -t $n1
+    brctl addif $bra $n1
+    ifconfig $n1 0.0.0.0 up
+    taskset 0x70 /usr/src/dpdk-ovs/qemu/x86_64-softmmu/qemu-system-x86_64 \
+-enable-kvm -m 3072 -boot c -smp 3 -cpu host \
+-pidfile /tmp/vm1.pid -monitor unix:/tmp/vm1monitor,server,nowait \
+-mem-path /dev/hugepages -mem-prealloc \
+-net nic,model=virtio,netdev=eth0,macaddr=$mac1 \
+-netdev tap,ifname=$n1,id=eth0,script=no,downscript=$dn_scrp_a \
+-device ivshmem,size=1024M,shm=fd:/dev/hugepages/rtemap_0:0x0:0x40000000 \
+-drive file=$vm -vnc :$vnc -name $vm_name -drive file=fat:rw:/tmp/share &
+fi
+  ```
+  Notes on QEMU arguments:
+  The VM is started on cores 4, 5, and 6 of a 14-core CPU system. (This is
+    non-optimal GP (General Purpose) CPU core planning, since cores 4, 5, and
+    6 are isolated cores on my test system. The VM should be started on the
+    target NUMA node using cores reserved for GP processing, of which there
+    should be a moderate number based on the expected non-realtime and control
+    plane system load; the selected virtual cores for realtime operation are
+    then moved to the target isolated cores, after which the realtime
+    processing is started in the VM. However, this is just a test on a fully
+    idle system, so extra cores are available.)
+  mem-path: allocates 1 GB pages for the VM to execute out of (minimum three
+    1 GB pages for -m 3072).
+  net & netdev: the control network interface (virtio; vhost is not used here
+    and this interface is not part of the IVSHMEM operation).
+  device ivshmem: maps the host's 1 GB hugepage (rtemap_0) into the VM at the
+    same virtual address as on the host.
+  drive file=fat:rw:/tmp/share: exposes the host /tmp/share directory (the
+    DPDK runtime configuration copied in step 7) to the VM as a FAT drive.
+
+  Example br-mgt-ifdown script
+  ```
+#!/bin/sh
+
+bridge='br-mgt'
+/sbin/ifconfig $1 0.0.0.0 down
+brctl delif  ${bridge} $1
+  ```
+
+  On the host, taskset sets the host CPU cores (0x070 in this case) used by
+  QEMU. Core selection is system specific; the selected cores need to be on
+  the same NUMA node or CPU socket. Under Linux, memory is generally allocated
+  from the NUMA node the process is running on at the time of allocation,
+  provided memory is available on that NUMA node at that time.
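+
+  One way to see which cores belong to which NUMA node (assuming the numactl
+  package is installed; `lscpu` shows similar information):
+
+  numactl --hardware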
+
+10: (HOST) Verify that the 1 GB memory was allocated from NUMA node 0.
+
+  cat /proc/meminfo
+  . . .
+  HugePages_Total:      16
+  HugePages_Free:       12
+  HugePages_Rsvd:        0
+  HugePages_Surp:        0
+  . . .
+
+  Half of the pages are allocated on NUMA node 0 and half on NUMA node 1
+  (2-node system), i.e. 8 on each in this case. 1 GB is used by OVS
+  (--socket-mem 1024,0) and three 1 GB pages by the VM. A closer check shows
+  that these 4 pages came from NUMA node 0. If the free count drops below
+  half, mixed-node pages are obtained and performance will be poor.
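+
+  The per-node allocation can also be checked directly through sysfs (a
+  standard Linux hugepage sysfs layout is assumed):
+
+  cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/free_hugepages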
+
+
+11: (VM) VM kernel setup
+
+  Set the VM kernel bootline parameters to support both 1 GB and 2 MB page
+  sizes and to isolate cores 1 and 2 for realtime operation. The sample
+  ring_client below only uses one realtime core, but this is a multi-core
+  realtime setup to show how to handle more than one core. At least one
+  virtual core needs to run non-realtime; vCPU0 requires minimal effort here
+  since CPU 0 is the default interrupt core.
+
+  default_hugepagesz=1GB hugepagesz=1GB hugepages=1 hugepagesz=2M \
+  hugepages=256 isolcpus=1,2
+
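+  On a Fedora 21 VM, one way to append these arguments to the existing kernel
+  bootline (an example only; adjust to your distribution's boot configuration):
+
+  grubby --update-kernel=ALL \
+    --args="default_hugepagesz=1GB hugepagesz=1GB hugepages=1 hugepagesz=2M hugepages=256 isolcpus=1,2"
+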
+  Reboot VM after changes to take effect.
+
+  After the reboot, check that 1 GB hugepage memory is available:
+
+  cat /proc/meminfo
+  . . .
+  HugePages_Total:       1
+  HugePages_Free:        1
+  HugePages_Rsvd:        0
+  HugePages_Surp:        0
+  Hugepagesize:    1048576 kB
+  . . .
+
+  This shows one 1 GB hugepage available.
+
+12: (VM) Get DPDK and build it the same as on the host:
+
+  cd /usr/src
+  git clone git://dpdk.org/dpdk
+  cd /usr/src/dpdk
+  git checkout -b test_v2.0.0 v2.0.0
+  make install T=x86_64-ivshmem-linuxapp-gcc CONFIG_RTE_LIBRTE_VHOST=y \
+     CONFIG_RTE_BUILD_COMBINE_LIBS=y CONFIG_RTE_LIBRTE_VHOST_USER=n \
+     CONFIG_RTE_RING_SPLIT_PROD_CONS=y
+
+13: Copy ring_client.c from host
+
+  (VM):
+  cd /usr/src
+  mkdir ring_client
+
+  (HOST): (in this case the VM management network IP was 10.4.0.160; change it
+  to match your VM):
+  cd /usr/src/ovs/tests/dpdk/
+  scp ring_client.c 10.4.0.160:/usr/src/ring_client
+
+14: (VM) Remove the config.h reference:
+
+  cd /usr/src/ring_client
+
+  Patch to remove config.h reference
+  ```
+--- org_ring_client.c   2015-05-22 07:44:59.390000000 -0700
++++ ring_client.c       2015-05-22 07:44:39.781000000 -0700
+@@ -34,7 +34,6 @@
+
+ #include <getopt.h>
+
+-#include <config.h>
+ #include <rte_config.h>
+ #include <rte_mbuf.h>
+ #include <rte_ether.h>
+  ```
+
+15: (VM) Makefile to build ring_client
+
+  cd /usr/src/ring_client
+  cp /usr/src/dpdk/examples/l3fwd/Makefile .
+
+  Patch or edit the Makefile to change the build name and select the source
+  file:
+  ```
+--- org_Makefile        2015-05-22 07:43:28.062000000 -0700
++++ Makefile    2015-05-22 07:44:07.527000000 -0700
+@@ -39,10 +39,10 @@ RTE_TARGET ?= x86_64-native-linuxapp-gcc
+ include $(RTE_SDK)/mk/rte.vars.mk
+
+ # binary name
+-APP = l3fwd
++APP = ring_client
+
+ # all source are stored in SRCS-y
+-SRCS-y := main.c
++SRCS-y := ring_client.c
+
+ CFLAGS += -O3 $(USER_FLAGS)
+ CFLAGS += $(WERROR_FLAGS)
+  ```
+
+16: (VM) Build ring_client
+
+  cd /usr/src/ring_client
+  export RTE_SDK=/usr/src/dpdk
+  export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
+  make
+
+17: (VM) Remove default hugepage map directory if present
+
+  For Fedora 21 VM, the default hugepage mapping needs to be removed.
+  umount /dev/hugepages
+  rmdir /dev/hugepages
+
+18: (VM) Set up the IVSHMEM hugepage link
+
+  mkdir /dev/hugepages
+
+  Find the IVSHMEM memory PCI device:
+    lspci |grep RAM
+      00:04.0 RAM memory: Red Hat, Inc Inter-VM shared memory
+
+  Find the device info (tab completion helps):
+    ls /sys/devices/pci0000\:00/0000\:00\:04.0/resource2
+
+  Use the path found above to create the soft link:
+    ln -s /sys/devices/pci0000\:00/0000\:00\:04.0/resource2 \
+     /dev/hugepages/rtemap_0
+
+  Verify link:
+    ls -l /dev/hugepages
+      total 0
+      lrwxrwxrwx 1 root root 46 May 22 08:51 rtemap_0 ->
+      /sys/devices/pci0000:00/0000:00:04.0/resource2
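+
+  The size of the linked resource should match the 1 GB IVSHMEM region
+  (sysfs reports the PCI BAR size as the file size):
+
+    ls -lLh /dev/hugepages/rtemap_0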
+
+19: (VM) Create 1 GB and 2 MB huge page memory mounts:
+
+  mkdir /dev/hugepages_1gb
+  mount -t hugetlbfs nodev /dev/hugepages_1gb
+  mkdir /dev/hugepages_2mb
+  mount -t hugetlbfs nodev /dev/hugepages_2mb -o pagesize=2MB
+
+20: (VM) Copy IVSHMEM DPDK mapping information:
+
+  mkdir /mnt/ovs_client
+  mount -o iocharset=utf8 /dev/sdb1 /mnt/ovs_client
+  cp -a /mnt/ovs_client/.rte_* /run
+
+21: (VM) Start IVSHMEM Client task
+
+  cd /usr/src/ring_client
+  ./build/ring_client -c 2 -n 4 --proc-type=secondary -- -n 0 &
+
+22: (VM) Find the ring_client task process ids and set their affinity:
+
+  # ps -eLF |grep ring
+  root     12048     1 12048 99    2 399795 2744   1 08:57 ?        00:01:01
+     build/ring_client -c 2 -n 4 --proc-type=secondary -- -n 0
+  root     12048     1 12049  0    2 399795 2744   0 08:57 ?        00:00:00
+     build/ring_client -c 2 -n 4 --proc-type=secondary -- -n 0
+  ...
+
+  Verify that the thread accumulating CPU time is on vCPU1; if not, set that
+  thread to vCPU1 and the other threads to vCPU0:
+
+    taskset -p 12048
+      pid 12048 current affinity mask: 2
+    if not correct: taskset -p 2 12048
+    taskset -p 12049
+      pid 12049 current affinity mask: 1
+    if not correct: taskset -p 1 12049
+
+  The task scheduling priority can also be increased (useful when isolation is
+  not set on the kernel bootline):
+  renice -20 -p 12048
+
+  Verify with top (press 1) in the VM that the task load is on vCPU1 and not
+  vCPU0.
+
+23: (HOST): Affinitize host QEMU for VM vCPUs (10+ core CPU)
+
+ps -eLF |grep qemu
+root     113479      1 113479  0    6 1141940 21340 4 12:53 pts/1    00:00:01
+/usr/src/dpdk-ovs/qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 3072
+-boot c -smp 3 -cpu host -pidfile /tmp/vm1.pid -monitor unix:/tmp/vm1monitor,
+server,nowait -mem-path /dev/hugepages -mem-prealloc -net nic,model=virtio,
+netdev=eth0,macaddr=00:00:14:42:04:29 -netdev tap,ifname=tap55,id=eth0,
+script=no,downscript=/vm/vm_ctl/br-mgt-ifdown -device ivshmem,size=1024M,
+shm=fd:/dev/hugepages/rtemap_0:0x0:0x40000000 -drive file=/vm/Fed21-mp.qcow2
+-vnc :14 -name Fed21_Test_VM -drive file=fat:rw:/tmp/share
+root     113479      1 113481  2    6 1141940 21340 4 12:53 pts/1    00:00:12
+. . .
+root     113479      1 113482 14    6 1141940 21340 4 12:53 pts/1    00:01:17
+. . .
+root     113479      1 113483  0    6 1141940 21340 4 12:53 pts/1    00:00:02
+. . .
+root     113479      1 113485  0    6 1141940 21340 4 12:53 pts/1    00:00:00
+. . .
+root     113479      1 113526  0    6 1141940 21340 4 13:01 pts/1    00:00:00
+
+  taskset -p 113481
+    pid 113481 current affinity mask: 70
+  taskset -p 80 113481
+    pid 113481 current affinity mask: 70
+    pid 113481 new affinity mask: 80
+  taskset -p 100 113482
+    pid 113482 current affinity mask: 70
+    pid 113482 new affinity mask: 100
+  taskset -p 200 113483
+    pid 113483 current affinity mask: 70
+    pid 113483 new affinity mask: 200
+
+  This moves vCPU1 and vCPU2 to host CPU cores 8 and 9 respectively. In this
+  case cores 8 and 9 are on NUMA node 0, since 14-core CPUs were used when the
+  test was performed. (vCPU2 is not running a realtime task in this case; it
+  is only included to show how to handle multiple vCPUs.)
+
+  When VMs are started on isolated cores, sometimes multiple vCPUs end up on
+  the same host CPU core and the Linux scheduler does not move them, causing
+  realtime tasks to share a core, which is non-realtime operation. There are
+  more QEMU tasks than vCPUs, and all of them must be moved for correct
+  operation. The solution is to start the VM on the GP (General Purpose) cores
+  of the target NUMA node, under Linux scheduler control. Taskset or another
+  core control method is needed on VM startup to specify only GP cores in the
+  same NUMA node, excluding GP cores of other NUMA nodes. Then move just the
+  vCPU threads used for realtime operation to the target realtime cores,
+  leaving the rest under general Linux scheduler control. Generally the VM's
+  vCPU0 is the non-realtime operating system virtual core and should be left
+  running on host GP cores. For minimal disruption of current system
+  performance, start the realtime VM task(s) only after the vCPU threads have
+  been moved to the target host physical cores.
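+
+  Rather than guessing which QEMU thread is which vCPU from accumulated CPU
+  time, the vCPU-to-thread mapping can be queried through the QEMU monitor
+  socket opened by the start script (assuming socat is installed):
+
+  echo "info cpus" | socat - UNIX-CONNECT:/tmp/vm1monitor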
+
+24: (HOST): Use top (press 1) on the host to verify the load is on the target
+  CPU (core 8 in this case).
+
+25: (VM): Use top (press 1) in the VM to check that the load is close to 100%
+  on vCPU1.
+
+26: (HOST): Set packet flows from the ports through the VM and out the other
+  port.
+
+  The following test flow script directs traffic from 2 NICs bidirectionally
+  through the ring (it assumes OVS is in /usr/src/ovs; the IPs may need to be
+  adjusted for your own system). Execute the script:
+  ```
+  #! /bin/sh
+  # Move to command directory
+  cd /usr/src/ovs/utilities/
+
+  # Clear current flows
+  ./ovs-ofctl del-flows br0
+
+  #  Add bidirectional flows between
+  #   port 1 (dpdk0) <--> VM port 3 (dpdkr0) <--> port 2 (dpdk1)
+  ./ovs-ofctl add-flow br0 in_port=1,dl_type=0x800,nw_src=1.1.1.1,\
+  nw_dst=1.1.1.2,idle_timeout=0,action=output:3
+  ./ovs-ofctl add-flow br0 in_port=2,dl_type=0x800,nw_src=1.1.1.2,\
+  nw_dst=1.1.1.1,idle_timeout=0,action=output:3
+
+  ./ovs-ofctl add-flow br0 in_port=3,dl_type=0x800,nw_src=1.1.1.1,\
+  nw_dst=1.1.1.2,idle_timeout=0,action=output:2
+  ./ovs-ofctl add-flow br0 in_port=3,dl_type=0x800,nw_src=1.1.1.2,\
+  nw_dst=1.1.1.1,idle_timeout=0,action=output:1
+  ```
+
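+  To confirm that traffic is passing through the VM, watch the flow packet
+  counters while the traffic generator is running:
+
+  watch -d /usr/src/ovs/utilities/ovs-ofctl dump-flows br0
+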
DPDK vhost:
-----------



