[ovs-dev] [PATCH v2] dpif-netdev: Allow PMD auto load balance with cross-numa.

Kevin Traynor ktraynor at redhat.com
Thu Mar 18 11:37:15 UTC 2021


On 17/03/2021 15:59, Ilya Maximets wrote:
> On 3/15/21 4:43 PM, Kevin Traynor wrote:
>> Previously auto load balance did not trigger a reassignment when
>> there was any cross-numa polling as an rxq could be polled from a
>> different numa after reassign and it could impact estimates.
>>
>> In the case where there is only one numa with pmds available, the
>> same numa will always poll before and after reassignment, so estimates
>> are valid. Allow PMD auto load balance to trigger a reassignment in
>> this case.
>>
>> Signed-off-by: Kevin Traynor <ktraynor at redhat.com>
>> Acked-by: Eelco Chaudron <echaudro at redhat.com>
>>
>> ---
>> v2:
>> - Same logic as v1, combined two "ifs" as per David suggestion
>> - Updated comments/logs
>> - Updated the doc note that said it will not work for cross NUMA to
>>   include new condition
>> - Kept Eelco's Ack, as no logic changed
>> ---
>>  Documentation/topics/dpdk/pmd.rst |  9 ++++++---
>>  lib/dpif-netdev.c                 | 16 +++++++++++++---
>>  2 files changed, 19 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
>> index caa7d97be..1f61bddb6 100644
>> --- a/Documentation/topics/dpdk/pmd.rst
>> +++ b/Documentation/topics/dpdk/pmd.rst
>> @@ -238,7 +238,10 @@ If not set, the default variance improvement threshold is 25%.
>>  .. note::
>>  
>> -    PMD Auto Load Balancing doesn't currently work if queues are assigned
>> -    cross NUMA as actual processing load could get worse after assignment
>> -    as compared to what dry run predicts.
>> +    PMD Auto Load Balancing doesn't request a reassignment if queues are
>> +    assigned cross NUMA and there are multiple NUMA nodes available for
>> +    reassignment. This is because reassignment to a different NUMA node could
>> +    lead to an unpredictable change in processing cycles required for a queue.
>> +    However, if there is only one cross NUMA node available then a dry run and
>> +    possible request to reassign may continue as normal.
> 
> This note looks very cryptic.  What is 'cross NUMA node'?  Request a reassignment
> from who?  What is dry run (this was understandable from the old version of the
> note)?
> 
> Way too complex.
> I'd not expect that normal user who doesn't know internals of the code to
> understand what is going on here.
> 
> Maybe we can keep the old note and only add an exceptional case? e.g.:
> 
>     PMD Auto Load Balancing doesn't currently work if queues are assigned
>     cross NUMA as actual processing load could get worse after assignment
>     as compared to what dry run predicts.  The only exception is when all
>     PMD threads are running on cores from a single NUMA node.  In this case
>     Auto Load Balancing is still possible.
> 
>>  
>>  The minimum time between 2 consecutive PMD auto load balancing iterations can
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 816945375..29e74ee43 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -4888,4 +4888,10 @@ struct rr_numa {
>>  };
>>  
>> +static size_t
>> +rr_numa_list_count(struct rr_numa_list *rr)
>> +{
>> +    return hmap_count(&rr->numas);
>> +}
>> +
>>  static struct rr_numa *
>>  rr_numa_list_lookup(struct rr_numa_list *rr, int numa_id)
>> @@ -5600,8 +5606,12 @@ get_dry_run_variance(struct dp_netdev *dp, uint32_t *core_list,
>>          int numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
>>          numa = rr_numa_list_lookup(&rr, numa_id);
>> +        /* If there is no available pmd on the local numa but there is only one
>> +         * numa for cross-numa polling, we can estimate the dry run. */
>> +        if (!numa && rr_numa_list_count(&rr) == 1) {
>> +            numa = rr_numa_list_next(&rr, NULL);
>> +        }
>>          if (!numa) {
>> -            /* Abort if cross NUMA polling. */
>> -            VLOG_DBG("PMD auto lb dry run."
>> -                     " Aborting due to cross-numa polling.");
>> +            VLOG_DBG("PMD auto lb dry run. Aborting due to "
>> +                     "multiple numa nodes available for cross-numa polling.");
> 
> Same here.  This message is hard to understand.
> Maybe:
>             VLOG_DBG("PMD auto lb dry run: "
>                      "There's no available (non-isolated) PMD thread on NUMA "
>                      "node %d for port '%s' and there are PMD threads on more "
>                      "than one NUMA node available for cross-NUMA polling. "
>                      "Aborting.", numa_id, netdev_rxq_get_name(rxqs[i]->rx));
> 
> What do you think?
> 

Sounds good. Thanks for including suggested wording too, it helps a lot
when the comments are about text. I've incorporated them into v3:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-March/381310.html
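
For readers following the thread, the condition being discussed can be
sketched as a small standalone C function. This is only a simplified model
of the dry-run NUMA selection described in the commit message: `struct
numa_set` and `dry_run_pick_numa()` are hypothetical names made up for
illustration and do not exist in lib/dpif-netdev.c, which uses
rr_numa_list_lookup(), rr_numa_list_count() and rr_numa_list_next() instead.

```c
#include <stddef.h>

/* Hypothetical stand-in for the set of NUMA nodes that have at least one
 * available (non-isolated) PMD thread.  Not the OVS 'struct rr_numa_list'. */
struct numa_set {
    const int *ids;   /* NUMA node ids with available PMD threads. */
    size_t n;         /* Number of such NUMA nodes. */
};

/* Returns the NUMA node id assumed to poll an rxq whose port is local to
 * 'local_numa', or -1 if the dry run would have to be aborted. */
int
dry_run_pick_numa(const struct numa_set *set, int local_numa)
{
    /* Preferred case: a PMD thread is available on the local NUMA node. */
    for (size_t i = 0; i < set->n; i++) {
        if (set->ids[i] == local_numa) {
            return local_numa;
        }
    }

    /* No local PMD thread, but only one NUMA node has PMD threads, so the
     * same node polls before and after reassignment and the estimate
     * remains valid. */
    if (set->n == 1) {
        return set->ids[0];
    }

    /* PMD threads on more than one NUMA node are available for cross-NUMA
     * polling (or none at all): the estimate is not reliable, so abort. */
    return -1;
}
```

With PMD threads only on NUMA node 1, an rxq local to node 0 would be
estimated on node 1. With PMD threads on nodes 1 and 2, the same rxq would
make the sketch return -1, which corresponds to the "Aborting" debug log.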

> Best regards, Ilya Maximets.
> 


