[ovs-dev] [patch_v1] docs/dpdk: Consolidate pmd-cpu-mask references.

Darrell Ball dlu998 at gmail.com
Thu Jul 27 23:50:26 UTC 2017


The DPDK introductory documentation has various references to
pmd-cpu-mask, including a section devoted to it.  These parts of
the documentation seem to have been written at different times
and look like they were individually ported from other sources.
They all include an example command which gets repeated several times.
Here, we consolidate those references to make the documentation
easier to maintain. At the same time, we add cross-references to the
pmd-cpu-mask section from other sections to keep the documentation
coherent.

Signed-off-by: Darrell Ball <dlu998 at gmail.com>
---
 Documentation/intro/install/dpdk.rst | 141 ++++++++++++++++-------------------
 1 file changed, 65 insertions(+), 76 deletions(-)

diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
index a05aa1a..2e9cf8d 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -237,20 +237,12 @@ or::
     $ ovs-vsctl --no-wait set Open_vSwitch . \
         other_config:dpdk-socket-mem="1024"
 
-Similarly, if you wish to better scale the workloads across cores, then
-multiple pmd threads can be created and pinned to CPU cores by explicity
-specifying ``pmd-cpu-mask``. Cores are numbered from 0, so to spawn two pmd
-threads and pin them to cores 1,2, run::
-
-    $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6
-
-Refer to ovs-vswitchd.conf.db(5) for additional information on configuration
-options.
-
 .. note::
   Changing any of these options requires restarting the ovs-vswitchd
   application
 
+See the ``Performance Tuning`` section below for important DPDK
+customizations.
+
 Validating
 ----------
 
@@ -373,32 +365,6 @@ Now mount the huge pages, if not already done so::
 
     $ mount -t hugetlbfs -o pagesize=1G none /dev/hugepages
 
-Enable HyperThreading
-~~~~~~~~~~~~~~~~~~~~~
-
-With HyperThreading, or SMT, enabled, a physical core appears as two logical
-cores. SMT can be utilized to spawn worker threads on logical cores of the same
-physical core there by saving additional cores.
-
-With DPDK, when pinning pmd threads to logical cores, care must be taken to set
-the correct bits of the ``pmd-cpu-mask`` to ensure that the pmd threads are
-pinned to SMT siblings.
-
-Take a sample system configuration, with 2 sockets, 2 * 10 core processors, HT
-enabled. This gives us a total of 40 logical cores. To identify the physical
-core shared by two logical cores, run::
-
-    $ cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list
-
-where ``N`` is the logical core number.
-
-In this example, it would show that cores ``1`` and ``21`` share the same
-physical core. As cores are counted from 0, the ``pmd-cpu-mask`` can be used
-to enable these two pmd threads running on these two logical cores (one
-physical core) is::
-
-    $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x200002
-
 Isolate Cores
 ~~~~~~~~~~~~~
 
@@ -413,23 +379,6 @@ cmdline.
   It has been verified that core isolation has minimal advantage due to mature
   Linux scheduler in some circumstances.
 
-NUMA/Cluster-on-Die
-~~~~~~~~~~~~~~~~~~~
-
-Ideally inter-NUMA datapaths should be avoided where possible as packets will
-go across QPI and there may be a slight performance penalty when compared with
-intra NUMA datapaths. On Intel Xeon Processor E5 v3, Cluster On Die is
-introduced on models that have 10 cores or more.  This makes it possible to
-logically split a socket into two NUMA regions and again it is preferred where
-possible to keep critical datapaths within the one cluster.
-
-It is good practice to ensure that threads that are in the datapath are pinned
-to cores in the same NUMA area. e.g. pmd threads and QEMU vCPUs responsible for
-forwarding. If DPDK is built with ``CONFIG_RTE_LIBRTE_VHOST_NUMA=y``, vHost
-User ports automatically detect the NUMA socket of the QEMU vCPUs and will be
-serviced by a PMD from the same node provided a core on this node is enabled in
-the ``pmd-cpu-mask``. ``libnuma`` packages are required for this feature.
-
 Compiler Optimizations
 ~~~~~~~~~~~~~~~~~~~~~~
 
@@ -439,6 +388,31 @@ gcc (verified on 5.3.1) can produce performance gains though not siginificant.
 ``-march=native`` will produce optimized code on local machine and should be
 used when software compilation is done on Testbed.
 
+Multiple Poll-Mode Driver Threads
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+With pmd multi-threading support, OVS by default creates one pmd thread for
+each NUMA node that has at least one DPDK interface added to OVS. However, in
+cases where multiple ports/rxqs are producing traffic, performance can be
+improved by creating multiple pmd threads running on separate cores. These
+pmd threads can share the workload, with each being responsible for different
+ports/rxqs. Assignment of ports/rxqs to pmd threads is done automatically.
+
+Multiple pmd threads can be created and pinned to CPU cores by explicitly
+setting ``pmd-cpu-mask``. Each set bit in the mask means a pmd thread is
+created and pinned to the corresponding CPU core. Cores are numbered from 0,
+so to run pmd threads on cores 1 and 2::
+
+    $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6
+
+When using dpdk and dpdkvhostuser ports in a bi-directional VM loopback as
+shown below, spreading the workload over 2 or 4 pmd threads shows significant
+improvements, as more total CPU cycles are available for packet processing::
+
+    NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
+
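+The resulting assignment of rxqs to the pmd threads created by the mask
+above, and per-thread statistics, can be checked at runtime (a quick sanity
+check; the exact output format varies between OVS versions)::
+
+    $ ovs-appctl dpif-netdev/pmd-rxq-show
+    $ ovs-appctl dpif-netdev/pmd-stats-show
+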
+Refer to ovs-vswitchd.conf.db(5) for additional information on configuration
+options.
+
 Affinity
 ~~~~~~~~
 
@@ -452,14 +426,8 @@ affinitized accordingly.
   switch the packets and send to tx port.  pmd thread is CPU bound, and needs
   to be affinitized to isolated cores for optimum performance.
 
-  By setting a bit in the mask, a pmd thread is created and pinned to the
-  corresponding CPU core. e.g. to run a pmd thread on core 2::
-
-      $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x4
-
-  .. note::
-    pmd thread on a NUMA node is only created if there is at least one DPDK
-    interface from that NUMA node added to OVS.
+  Binding pmd threads is described above in the
+  ``Multiple Poll-Mode Driver Threads`` section.
 
 - QEMU vCPU thread Affinity
 
@@ -472,26 +440,47 @@ affinitized accordingly.
   the ``taskset`` command should be used to affinitize the vCPU threads to the
   dedicated isolated cores on the host system.
 
-Multiple Poll-Mode Driver Threads
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Enable HyperThreading
+~~~~~~~~~~~~~~~~~~~~~
 
-With pmd multi-threading support, OVS creates one pmd thread for each NUMA node
-by default. However, in cases where there are multiple ports/rxq's producing
-traffic, performance can be improved by creating multiple pmd threads running
-on separate cores. These pmd threads can share the workload by each being
-responsible for different ports/rxq's. Assignment of ports/rxq's to pmd threads
-is done automatically.
+With HyperThreading, or SMT, enabled, a physical core appears as two logical
+cores. SMT can be utilized to spawn worker threads on logical cores of the
+same physical core, thereby saving additional cores.
 
-A set bit in the mask means a pmd thread is created and pinned to the
-corresponding CPU core. For example, to run pmd threads on core 1 and 2::
+With DPDK, when pinning pmd threads to logical cores, care must be taken to set
+the correct bits of the ``pmd-cpu-mask`` to ensure that the pmd threads are
+pinned to SMT siblings.
 
-    $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6
+Take a sample system configuration, with 2 sockets, 2 * 10 core processors, HT
+enabled. This gives us a total of 40 logical cores. To identify the physical
+core shared by two logical cores, run::
 
-When using dpdk and dpdkvhostuser ports in a bi-directional VM loopback as
-shown below, spreading the workload over 2 or 4 pmd threads shows significant
-improvements as there will be more total CPU occupancy available::
+    $ cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list
 
-    NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
+where ``N`` is the logical core number.
+
+In this example, it would show that cores ``1`` and ``21`` share the same
+physical core. Logical cores can be specified in ``pmd-cpu-mask`` in the same
+way as physical cores, as described in ``Multiple Poll-Mode Driver Threads``.
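+
+For example, checking logical core ``1`` on the sample system above might
+show the following (illustrative output; the actual sibling pairs depend on
+the platform)::
+
+    $ cat /sys/devices/system/cpu/cpu1/topology/thread_siblings_list
+    1,21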
+
+NUMA/Cluster-on-Die
+~~~~~~~~~~~~~~~~~~~
+
+Ideally inter-NUMA datapaths should be avoided where possible as packets will
+go across QPI and there may be a slight performance penalty when compared with
+intra NUMA datapaths. On Intel Xeon Processor E5 v3, Cluster On Die is
+introduced on models that have 10 cores or more.  This makes it possible to
+logically split a socket into two NUMA regions and again it is preferred where
+possible to keep critical datapaths within the one cluster.
+
+It is good practice to ensure that threads that are in the datapath are pinned
+to cores in the same NUMA area. e.g. pmd threads and QEMU vCPUs responsible for
+forwarding. If DPDK is built with ``CONFIG_RTE_LIBRTE_VHOST_NUMA=y``, vHost
+User ports automatically detect the NUMA socket of the QEMU vCPUs and will be
+serviced by a PMD from the same node provided a core on this node is enabled in
+the ``pmd-cpu-mask``. ``libnuma`` packages are required for this feature.
+Binding pmd threads is described above in the
+``Multiple Poll-Mode Driver Threads`` section.
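+
+To help keep the datapath threads on one node, the cores belonging to each
+NUMA node can be listed first (a minimal sketch; the output shown is
+illustrative for the 2-socket, 40-logical-core sample system described
+above)::
+
+    $ lscpu | grep -i numa
+    NUMA node(s):        2
+    NUMA node0 CPU(s):   0-9,20-29
+    NUMA node1 CPU(s):   10-19,30-39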
 
 DPDK Physical Port Rx Queues
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-- 
1.9.1


