[ovs-dev] [PATCH v3 1/2] dpif-netdev: Add percentage of pmd/core used by each rxq.

Kevin Traynor ktraynor at redhat.com
Tue Jan 16 15:08:00 UTC 2018


On 01/16/2018 12:27 PM, Ilya Maximets wrote:
> Not a full review. A few comments inline.
> 
> Best regards, Ilya Maximets.
> 
> On 15.01.2018 20:58, Kevin Traynor wrote:
>> It is based on the length of history that is stored about an
>> rxq (currently 1 min).
>>
>> $ ovs-appctl dpif-netdev/pmd-rxq-show
>> pmd thread numa_id 0 core_id 4:
>>         isolated : false
>>         port:         dpdkphy1  queue-id:  0    pmd usage: 70 %
>>         port:       dpdkvhost0  queue-id:  0    pmd usage:  0 %
>> pmd thread numa_id 0 core_id 6:
>>         isolated : false
>>         port:         dpdkphy0  queue-id:  0    pmd usage: 64 %
>>         port:       dpdkvhost1  queue-id:  0    pmd usage:  0 %
>>
>> These values are what would be used as part of rxq to pmd
>> assignment due to a reconfiguration event e.g. adding pmds,
>> adding rxqs or with the command:
>>
>> ovs-appctl dpif-netdev/pmd-rxq-rebalance
>>
>> Signed-off-by: Jan Scheurich <jan.scheurich at ericsson.com>
>> Co-authored-by: Jan Scheurich <jan.scheurich at ericsson.com>
>> Signed-off-by: Kevin Traynor <ktraynor at redhat.com>
>> ---
>>
>> V3:
>> - Rebased to DPDK_MERGE branch.
>>
>> V2:
>> - Re-worked to calculate and store cycles needed for stats directly in pmd.
>>
>>  Documentation/howto/dpdk.rst | 11 +++++++++
>>  NEWS                         |  1 +
>>  lib/dpif-netdev.c            | 53 +++++++++++++++++++++++++++++++++-----------
>>  tests/pmd.at                 | 51 +++++++++++++++++++++++++++++++-----------
>>  4 files changed, 90 insertions(+), 26 deletions(-)
>>
>> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
>> index 587aaed..40f9d96 100644
>> --- a/Documentation/howto/dpdk.rst
>> +++ b/Documentation/howto/dpdk.rst
>> @@ -140,4 +140,15 @@ Core 7: Q4 (70%) | Q5 (10%)
>>  core 8: Q3 (60%) | Q0 (30%)
>>  
>> +To see the current measured usage history of pmd core cycles for each rxq::
>> +
>> +    $ ovs-appctl dpif-netdev/pmd-rxq-show
>> +
>> +.. note::
>> +
>> +  A history of one minute is recorded and shown for each rxq to allow for
>> +  traffic pattern spikes. Changes in an rxq's pmd core cycles usage, due to
>> +  traffic pattern or reconfiguration changes, will take one minute to be
>> +  fully reflected in the stats.
>> +
>>  Rxq to pmds assignment takes place whenever there are configuration changes
>>  or can be triggered by using::
>> diff --git a/NEWS b/NEWS
>> index 2c28456..96c7c8d 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -43,4 +43,5 @@ Post-v2.8.0
>>          - ovs-ofctl dump-ports command now prints new of set custom statistics
>>            if available (for OpenFlow 1.4+).
>> +     * Add rxq utilization of pmd to appctl 'dpif-netdev/pmd-rxq-show'.
>>     - Userspace datapath:
>>       * Output packet batching support.
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 48a8ebb..28a418f 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -568,4 +568,9 @@ struct dp_netdev_pmd_thread {
>>      long long int rxq_next_cycle_store;
>>  
>> +    /* Last interval timestamp. */
>> +    uint64_t intrvl_tsc_prev;
>> +    /* Last interval cycles. */
>> +    atomic_ullong intrvl_cycles;
>> +
>>      /* Current context of the PMD thread. */
>>      struct dp_netdev_pmd_thread_ctx ctx;
>> @@ -933,7 +938,7 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
>>  {
>>      if (pmd->core_id != NON_PMD_CORE_ID) {
>> -        const char *prev_name = NULL;
>>          struct rxq_poll *list;
>> -        size_t i, n;
>> +        size_t n_rxq;
>> +        uint64_t total_cycles = 0;
>>  
>>          ds_put_format(reply,
>> @@ -943,20 +948,32 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
>>  
>>          ovs_mutex_lock(&pmd->port_mutex);
>> -        sorted_poll_list(pmd, &list, &n);
>> -        for (i = 0; i < n; i++) {
>> -            const char *name = netdev_rxq_get_name(list[i].rxq->rx);
>> +        sorted_poll_list(pmd, &list, &n_rxq);
>>  
>> -            if (!prev_name || strcmp(name, prev_name)) {
>> -                if (prev_name) {
>> -                    ds_put_cstr(reply, "\n");
>> -                }
>> -                ds_put_format(reply, "\tport: %s\tqueue-id:", name);
>> +        /* Get the total pmd cycles for an interval. */
>> +        atomic_read_relaxed(&pmd->intrvl_cycles, &total_cycles);
>> +        /* Estimate the cycles to cover all intervals. */
>> +        total_cycles *= PMD_RXQ_INTERVAL_MAX;
>> +
>> +        for (int i = 0; i < n_rxq; i++) {
>> +            struct dp_netdev_rxq *rxq = list[i].rxq;
>> +            const char *name = netdev_rxq_get_name(rxq->rx);
>> +            uint64_t proc_cycles = 0;
>> +
>> +            for (int j = 0; j < PMD_RXQ_INTERVAL_MAX; j++) {
>> +                proc_cycles += dp_netdev_rxq_get_intrvl_cycles(rxq, j);
>>              }
>> -            ds_put_format(reply, " %d",
>> +            ds_put_format(reply, "\tport: %16s\tqueue-id: %2d", name,
>>                            netdev_rxq_get_queue_id(list[i].rxq->rx));
> 
> Can we use left-justified output for the port name (i.e. "%-16s")?
> It looks more natural, especially if port names have different lengths.
> 

Sure, changed it and updated the UTs to match in v4.

>> -            prev_name = name;
>> +            ds_put_format(reply, "\tpmd usage: ");
>> +            if (total_cycles) {
>> +                ds_put_format(reply, "%2"PRIu64"",
>> +                              proc_cycles * 100 / total_cycles);
>> +                ds_put_cstr(reply, " %");
>> +            } else {
>> +                ds_put_format(reply, "%s", "NOT AVAIL");
>> +            }
>> +            ds_put_cstr(reply, "\n");
>>          }
>>          ovs_mutex_unlock(&pmd->port_mutex);
>> -        ds_put_cstr(reply, "\n");
>>          free(list);
>>      }
>> @@ -4118,4 +4135,6 @@ reload:
>>      }
>>  
>> +    pmd->intrvl_tsc_prev = 0;
>> +    atomic_store_relaxed(&pmd->intrvl_cycles, 0);
>>      cycles_counter_update(s);
>>      for (;;) {
>> @@ -6117,4 +6136,5 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
>>  
>>      if (pmd->ctx.now > pmd->rxq_next_cycle_store) {
>> +        uint64_t curr_tsc;
>>          /* Get the cycles that were used to process each queue and store. */
>>          for (unsigned i = 0; i < poll_cnt; i++) {
>> @@ -6125,4 +6145,11 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
>>                                       0);
>>          }
>> +        curr_tsc = cycles_counter_update(&pmd->perf_stats);
>> +        if (pmd->intrvl_tsc_prev) {
>> +            /* There is a prev timestamp, store a new intrvl cycle count. */
>> +            atomic_store_relaxed(&pmd->intrvl_cycles,
>> +                                 curr_tsc - pmd->intrvl_tsc_prev);
>> +        }
>> +        pmd->intrvl_tsc_prev = curr_tsc;
>>          /* Start new measuring interval */
>>          pmd->rxq_next_cycle_store = pmd->ctx.now + PMD_RXQ_INTERVAL_LEN;
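
In other words, each time the interval timer fires, the pmd publishes
the cycle delta since the previous firing. As a sketch (the helper
name is invented; the tree does this inline as above):

    /* Publish the pmd cycles spent in the interval that just ended,
     * for pmd-rxq-show to read later with a relaxed load. */
    static void
    publish_interval(struct dp_netdev_pmd_thread *pmd, uint64_t curr_tsc)
    {
        if (pmd->intrvl_tsc_prev) {
            /* A previous timestamp exists, so a full interval has
             * elapsed since the last firing. */
            atomic_store_relaxed(&pmd->intrvl_cycles,
                                 curr_tsc - pmd->intrvl_tsc_prev);
        }
        /* Until the first full interval after a reload, intrvl_cycles
         * stays 0 and pmd-rxq-show reports "NOT AVAIL". */
        pmd->intrvl_tsc_prev = curr_tsc;
    }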
>> diff --git a/tests/pmd.at b/tests/pmd.at
>> index 83d60f8..25ea996 100644
>> --- a/tests/pmd.at
>> +++ b/tests/pmd.at
>> @@ -7,5 +7,13 @@ m4_divert_push([PREPARE_TESTS])
>>  # port_name rxq_id numa_id core_id
>>  parse_pmd_rxq_show () {
>> -    awk '/pmd/ {numa=$4; core=substr($6, 1, length($6) - 1)} /\t/{for (i=4; i<=NF; i++) print  $2, $i, numa, core}' | sort
>> +    awk '/pmd thread/ {numa=$4; core=substr($6, 1, length($6) - 1)} /\tport:/ {print  $2, $4, numa, core}' | sort
>> +}
>> +
>> +# Given the output of `ovs-appctl dpif-netdev/pmd-rxq-show`,
>> +# with the queues for each core joined onto one line, prints the
>> +# rxqs of each core on one line:
>> +# 'port:' port_name 'queue-id:' rxq_id rxq_id rxq_id rxq_id
>> +parse_pmd_rxq_show_group () {
>> +   awk '/port:/ {print  $1, $2, $3, $4, $12, $20, $28}'
>>  }
>>  
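
For example, given the new output format, a line like

    	port:               p0	queue-id:  0	pmd usage: NOT AVAIL

under a "pmd thread numa_id 0 core_id 4:" header makes
parse_pmd_rxq_show print "p0 0 0 4" (port, rxq id, numa id, core id).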
>> @@ -54,5 +62,5 @@ m4_define([CHECK_PMD_THREADS_CREATED], [
>>  
>>  m4_define([SED_NUMA_CORE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id \)[[0-9]]*:/\1<cleared>\2<cleared>:/"])
>> -m4_define([SED_NUMA_CORE_QUEUE_PATTERN], ["s/\(numa_id \)[[0-9]]*\( core_id \)[[0-9]]*:/\1<cleared>\2<cleared>:/;s/\(queue-id: \)1 2 5 6/\1<cleared>/;s/\(queue-id: \)0 3 4 7/\1<cleared>/"])
>> +m4_define([SED_NUMA_CORE_QUEUE_PATTERN], ["s/1 2 5 6/<group>/;s/0 3 4 7/<group>/"])
>>  m4_define([DUMMY_NUMA], [--dummy-numa="0,0,0,0"])
>>  
>> @@ -66,5 +74,5 @@ AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed SED_NUMA_CORE_PATTERN], [0],
>>  pmd thread numa_id <cleared> core_id <cleared>:
>>  	isolated : false
>> -	port: p0	queue-id: 0
>> +	port:               p0	queue-id:  0	pmd usage: NOT AVAIL
>>  ])
>>  
>> @@ -97,5 +105,12 @@ AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed SED_NUMA_CORE_PATTERN], [0],
>>  pmd thread numa_id <cleared> core_id <cleared>:
>>  	isolated : false
>> -	port: p0	queue-id: 0 1 2 3 4 5 6 7
>> +	port:               p0	queue-id:  0	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  1	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  2	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  3	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  4	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  5	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  6	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  7	pmd usage: NOT AVAIL
>>  ])
>>  
>> @@ -121,5 +136,12 @@ AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed SED_NUMA_CORE_PATTERN], [0],
>>  pmd thread numa_id <cleared> core_id <cleared>:
>>  	isolated : false
>> -	port: p0	queue-id: 0 1 2 3 4 5 6 7
>> +	port:               p0	queue-id:  0	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  1	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  2	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  3	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  4	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  5	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  6	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  7	pmd usage: NOT AVAIL
>>  ])
>>  
>> @@ -128,11 +150,7 @@ AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x3])
>>  CHECK_PMD_THREADS_CREATED([2], [], [+$TMP])
>>  
>> -AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed SED_NUMA_CORE_QUEUE_PATTERN], [0], [dnl
>> -pmd thread numa_id <cleared> core_id <cleared>:
>> -	isolated : false
>> -	port: p0	queue-id: <cleared>
>> -pmd thread numa_id <cleared> core_id <cleared>:
>> -	isolated : false
>> -	port: p0	queue-id: <cleared>
>> +AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed ':a;/AVAIL$/{N;s/\n//;ba}' | parse_pmd_rxq_show_group | sed SED_NUMA_CORE_QUEUE_PATTERN], [0], [dnl
>> +port: p0 queue-id: <group>
>> +port: p0 queue-id: <group>
>>  ])
> 
> Why do we need the above change? It really complicates reading the test.
> We can just add 3 more lines to each pmd thread with cleared queue ids.

What I saw was that the existing test needs cleared queue ids because
which pmd is associated with which group of queues can vary depending
on timing.

If we were to just clear each queue id individually then we would only
be testing that an equal number of queues was distributed to each pmd -
we would no longer be testing whether the queue groupings were correct.
So I parsed the queues onto one line to ensure the grouping was correct
and then marked it as a group.

Actually, compared to other tests where output needs to be stripped, I
deliberately kept some extra information, like "port: p0 queue-id:", so
there would be context for someone debugging a failure.

> On the other hand, if you want to decrease the size of the output,
> IMHO, you should enhance your commands to not strip 'pmd thread ...'.
> The current test tries to check that we have 2 different threads and
> that queues are distributed between them. That's not obvious if the
> headers are stripped.
> 
> Something like this could be used:
> ---
> m4_define([SED_QUEUE_ID_PATTERN], ["s/\(queue-id: \) *[[0-9]]*/\1<cleared>/"])
> 
> AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed SED_NUMA_CORE_PATTERN | sed SED_QUEUE_ID_PATTERN | uniq -c], [0], [dnl
>       1 pmd thread numa_id <cleared> core_id <cleared>:
>       1 	isolated : false
>       4 	port:               p0	queue-id: <cleared>	pmd usage: NOT AVAIL
>       1 pmd thread numa_id <cleared> core_id <cleared>:
>       1 	isolated : false
>       4 	port:               p0	queue-id: <cleared>	pmd usage: NOT AVAIL
> ])
> ----
> This will guarantee that we have 4 queues on each pmd thread.

Right, but it would lose the existing ability to detect if the
groupings were not as expected, which would imply that something had
changed or broken in the algorithm that distributes queues to pmds.

> If you wish, the same technique could be used to decrease the size of
> the output in other tests where exact queue-ids are not important.
> 

I hope it makes sense now why I made the changes above. In general, I
don't want to strip the output down; I found it easier to debug
failures with more natural-looking output.

thanks,
Kevin.

>>  
>> @@ -144,5 +162,12 @@ AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed SED_NUMA_CORE_PATTERN], [0],
>>  pmd thread numa_id <cleared> core_id <cleared>:
>>  	isolated : false
>> -	port: p0	queue-id: 0 1 2 3 4 5 6 7
>> +	port:               p0	queue-id:  0	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  1	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  2	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  3	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  4	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  5	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  6	pmd usage: NOT AVAIL
>> +	port:               p0	queue-id:  7	pmd usage: NOT AVAIL
>>  ])
>>  
>>


