[ovs-dev] [PATCH v6] dpif-netdev: Assign ports to pmds on non-local numa node.

Darrell Ball dball at vmware.com
Mon Jun 26 07:03:51 UTC 2017


I think this is helpful in cases where lower performance is an acceptable
tradeoff for more frugal and/or more flexible use of CPU resources.

I did not test it since Ian has already done that, but I reviewed
the code change and other related code.

One comment inline regarding the added documentation.

Otherwise, Acked-by: Darrell Ball <dlu998 at gmail.com>



On 5/10/17, 8:59 AM, "ovs-dev-bounces at openvswitch.org on behalf of Billy O'Mahony" <ovs-dev-bounces at openvswitch.org on behalf of billy.o.mahony at intel.com> wrote:

    From: billyom <billy.o.mahony at intel.com>
    
    Previously if there is no available (non-isolated) pmd on the numa node
    for a port then the port is not polled at all. This can result in a
    non-operational system until such time as nics are physically
    repositioned. It is preferable to operate with a pmd on the 'wrong' numa
    node albeit with lower performance. Local pmds are still chosen when
    available.
    
    Signed-off-by: Billy O'Mahony <billy.o.mahony at intel.com>
    ---
    v6: Change 'port' to 'queue' in a warning msg
    v5: Fix warning msg; Update same in docs
    v4: Fix a checkpatch error
    v3: Fix warning messages not appearing when using multiqueue
    v2: Add details of warning messages into docs
    
     Documentation/intro/install/dpdk.rst | 10 +++++++++
     lib/dpif-netdev.c                    | 43 +++++++++++++++++++++++++++++++-----
     2 files changed, 48 insertions(+), 5 deletions(-)
    
    diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
    index d1c0e65..7a66bff 100644
    --- a/Documentation/intro/install/dpdk.rst
    +++ b/Documentation/intro/install/dpdk.rst
    @@ -460,6 +460,16 @@ affinitized accordingly.
         pmd thread on a NUMA node is only created if there is at least one DPDK
         interface from that NUMA node added to OVS.
     
    +  .. note::
    +   On NUMA systems PCI devices are also local to a NUMA node.  Rx queues for
    +   a PCI device will be assigned to a pmd on its local NUMA node if
    +   pmd-cpu-mask has created a pmd thread on that NUMA node.  If not, the
    +   queue will be assigned to a pmd on a remote NUMA node.



I think the text below is a bit more accurate:
+  .. note::
+   On NUMA systems PCI devices are also local to a NUMA node.  Rx queues for
+   a PCI device will be assigned to a pmd on its local NUMA node.  A pmd is
+   created if at least one DPDK interface is added to OVS on that NUMA node or
+   if the pmd-cpu-mask has created a pmd thread on that NUMA node.
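(Side note, not part of the patch: the pmd-cpu-mask referred to above is set
via other_config.  A mask that covers cores on both NUMA nodes avoids the
cross-node fallback entirely; the 0x6 value here is only illustrative, since
core-to-node mapping is topology-dependent.)

```shell
# Create pmd threads on cores 1 and 2 (mask 0x6).  Whether these span both
# NUMA nodes depends on the host topology -- check with e.g. lscpu first.
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x6
```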

    + This will result in reduced
    +   maximum throughput on that device.  In the case such a queue assignment
    +   is made a warning message will be logged: "There's no available (non-
    +   isolated) pmd thread on numa node N. Queue Q on port P will be assigned to
    +   the pmd on core C (numa node N'). Expect reduced performance."
    +
     - QEMU vCPU thread Affinity
     
       A VM performing simple packet forwarding or running complex packet pipelines
    diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
    index b3a0806..34f1963 100644
    --- a/lib/dpif-netdev.c
    +++ b/lib/dpif-netdev.c
    @@ -3149,10 +3149,13 @@ rr_numa_list_lookup(struct rr_numa_list *rr, int numa_id)
     }
     
     static void
    -rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
    +rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr,
    +                      int *all_numa_ids, unsigned all_numa_ids_sz,
    +                      int *num_ids_written)
     {
         struct dp_netdev_pmd_thread *pmd;
         struct rr_numa *numa;
    +    unsigned idx = 0;
     
         hmap_init(&rr->numas);
     
    @@ -3170,7 +3173,11 @@ rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
             numa->n_pmds++;
             numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa->pmds);
             numa->pmds[numa->n_pmds - 1] = pmd;
    +
    +        all_numa_ids[idx % all_numa_ids_sz] = pmd->numa_id;
    +        idx++;
         }
    +    *num_ids_written = idx;
     }
     
     static struct dp_netdev_pmd_thread *
    @@ -3202,8 +3209,15 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
     {
         struct dp_netdev_port *port;
         struct rr_numa_list rr;
    +    int all_numa_ids [64];
    +    int all_numa_ids_sz = sizeof all_numa_ids / sizeof all_numa_ids[0];
    +    unsigned all_numa_ids_idx = 0;
    +    int all_numa_ids_max_idx = 0;
    +    int num_numa_ids = 0;
     
    -    rr_numa_list_populate(dp, &rr);
    +    rr_numa_list_populate(dp, &rr, all_numa_ids, all_numa_ids_sz,
    +                          &num_numa_ids);
    +    all_numa_ids_max_idx = MIN(num_numa_ids - 1, all_numa_ids_sz - 1);
     
         HMAP_FOR_EACH (port, node, &dp->ports) {
             struct rr_numa *numa;
    @@ -3234,10 +3248,29 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
                     }
                 } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
                     if (!numa) {
    -                    VLOG_WARN("There's no available (non isolated) pmd thread "
    +                    if (all_numa_ids_max_idx < 0) {
    +                        VLOG_ERR("There is no available (non-isolated) pmd "
    +                                 "thread for port \'%s\' queue %d. This queue "
    +                                 "will not be polled. Is pmd-cpu-mask set to "
    +                                 "zero? Or are all PMDs isolated to other "
    +                                 "queues?", netdev_get_name(port->netdev),
    +                                 qid);
    +                        continue;
    +                    }
    +                    int alt_numa_id = all_numa_ids[all_numa_ids_idx];
    +                    struct rr_numa *alt_numa;
    +                    alt_numa = rr_numa_list_lookup(&rr, alt_numa_id);
    +                    q->pmd = rr_numa_get_pmd(alt_numa);
    +                    VLOG_WARN("There's no available (non-isolated) pmd thread "
                                   "on numa node %d. Queue %d on port \'%s\' will "
    -                              "not be polled.",
    -                              numa_id, qid, netdev_get_name(port->netdev));
    +                              "be assigned to the pmd on core %d "
    +                              "(numa node %d). Expect reduced performance.",
    +                              numa_id, qid, netdev_get_name(port->netdev),
    +                              q->pmd->core_id, q->pmd->numa_id);
    +                    all_numa_ids_idx++;
    +                    if (all_numa_ids_idx > all_numa_ids_max_idx) {
    +                        all_numa_ids_idx = 0;
    +                    }
                     } else {
                         q->pmd = rr_numa_get_pmd(numa);
                     }
    -- 
    2.7.4
    
    _______________________________________________
    dev mailing list
    dev at openvswitch.org
    https://mail.openvswitch.org/mailman/listinfo/ovs-dev
    