[ovs-dev] [PATCH v8 2/2] dpif-netdev: Refactor cycle counting
Ilya Maximets
i.maximets at samsung.com
Fri Jan 12 12:57:27 UTC 2018
On 12.01.2018 15:52, Jan Scheurich wrote:
> Thanks for the info and hint. I will implement that in v9.
>
> I will also include "rte_config.h" in dpif-netdev.c. It didn't break ARM build so far because it happened to be included indirectly.
Maybe it's better to remove include for rte_cycles.h from dpif-netdev.c instead?
It looks like dpif-netdev does not use any dpdk functions anymore.
> BR, Jan
>
>> -----Original Message-----
>> From: ovs-dev-bounces at openvswitch.org [mailto:ovs-dev-bounces at openvswitch.org] On Behalf Of Ilya Maximets
>> Sent: Friday, 12 January, 2018 11:07
>> To: Jan Scheurich <jan.scheurich at ericsson.com>; dev at openvswitch.org
>> Subject: Re: [ovs-dev] [PATCH v8 2/2] dpif-netdev: Refactor cycle counting
>>
>> Patch breaks ARMv8 build:
>>
>> In file included from lib/dpif-netdev-perf.c:20:
>> In file included from dpdk/build//include/rte_cycles.h:39:
>> In file included from dpdk/build//include/rte_cycles_32.h:49:
>> In file included from dpdk/build//include/generic/rte_cycles.h:46:
>> In file included from dpdk/build//include/rte_atomic.h:39:
>> dpdk/build//include/rte_atomic_32.h:37:4: error: Platform must be built with CONFIG_RTE_FORCE_INTRINSICS
>> # error Platform must be built with CONFIG_RTE_FORCE_INTRINSICS
>> ^
>>
>> The issue is that "rte_config.h" is not included.
>> You need following incremental for this patch:
>> ------------------------------------------------------------
>> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
>> index 297c184..f6e995b 100644
>> --- a/lib/dpif-netdev-perf.h
>> +++ b/lib/dpif-netdev-perf.h
>> @@ -23,6 +23,11 @@
>> #include <string.h>
>> #include <math.h>
>>
>> +#ifdef DPDK_NETDEV
>> +#include <rte_config.h>
>> +#include <rte_cycles.h>
>> +#endif
>> +
>> #include "openvswitch/vlog.h"
>> #include "ovs-atomic.h"
>> #include "timeval.h"
>> ------------------------------------------------------------
>>
>> And following for the patch #1, because dpif-netdev-perf.c doesn't use
>> any DPDK functions:
>> ------------------------------------------------------------
>> diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
>> index d609545..53fb41f 100644
>> --- a/lib/dpif-netdev-perf.c
>> +++ b/lib/dpif-netdev-perf.c
>> @@ -16,10 +16,6 @@
>>
>> #include <config.h>
>>
>> -#ifdef DPDK_NETDEV
>> -#include <rte_cycles.h>
>> -#endif
>> -
>> #include "openvswitch/dynamic-string.h"
>> #include "openvswitch/vlog.h"
>> #include "dpif-netdev-perf.h"
>> ------------------------------------------------------------
>>
>>
>> Best regards, Ilya Maximets.
>>
>> On 11.01.2018 19:20, Jan Scheurich wrote:
>>> Simplify the historically grown TSC cycle counting in PMD threads.
>>> Cycles are currently counted for the following purposes:
>>>
>>> 1. Measure PMD ustilization
>>>
>>> PMD utilization is defined as ratio of cycles spent in busy iterations
>>> (at least one packet received or sent) over the total number of cycles.
>>>
>>> This is already done in pmd_perf_start_iteration() and
>>> pmd_perf_end_iteration() based on a TSC timestamp saved in current
>>> iteration at start_iteration() and the actual TSC at end_iteration().
>>> No dependency on intermediate cycle accounting.
>>>
>>> 2. Measure the processing load per RX queue
>>>
>>> This comprises cycles spend on polling and processing packets received
>>> from the rx queue and the cycles spent on delayed sending of these packets
>>> to tx queues (with time-based batching).
>>>
>>> The previous scheme using cycles_count_start(), cycles_count_intermediate()
>>> and cycles-count_end() originally introduced to simplify cycle counting
>>> and saving calls to rte_get_tsc_cycles() was rather obscuring things.
>>>
>>> Replace by a nestable cycle_timer with with start and stop functions to
>>> embrace a code segment to be timed. The timed code may contain arbitrary
>>> nested cycle_timers. The duration of nested timers is excluded from the
>>> outer timer.
>>>
>>> The caller must ensure that each call to cycle_timer_start() is
>>> followed by a call to cycle_timer_end(). Failure to do so will lead to
>>> assertion failure or a memory leak.
>>>
>>> The new cycle_timer is used to measure the processing cycles per rx queue.
>>> This is not yet strictly necessary but will be made use of in a subsequent
>>> commit.
>>>
>>> All cycle count functions and data are relocated to module
>>> dpif-netdev-perf.
>>>
>>> Signed-off-by: Jan Scheurich <jan.scheurich at ericsson.com>
>>> ---
>>> lib/dpif-netdev-perf.c | 1 +
>>> lib/dpif-netdev-perf.h | 99 +++++++++++++++++++++++++++++++++++++----
>>> lib/dpif-netdev.c | 118 ++++++++++++++-----------------------------------
>>> 3 files changed, 125 insertions(+), 93 deletions(-)
>>>
>>> diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
>>> index bb4cf9d..d609545 100644
>>> --- a/lib/dpif-netdev-perf.c
>>> +++ b/lib/dpif-netdev-perf.c
>>> @@ -31,6 +31,7 @@ void
>>> pmd_perf_stats_init(struct pmd_perf_stats *s)
>>> {
>>> memset(s, 0 , sizeof(*s));
>>> + cycles_counter_update(s);
>>> }
>>>
>>> void
>>> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
>>> index 53d60d3..297c184 100644
>>> --- a/lib/dpif-netdev-perf.h
>>> +++ b/lib/dpif-netdev-perf.h
>>> @@ -59,10 +59,6 @@ enum pmd_stat_type {
>>> * recirculation. */
>>> PMD_STAT_SENT_PKTS, /* Packets that have been sent. */
>>> PMD_STAT_SENT_BATCHES, /* Number of batches sent. */
>>> - PMD_CYCLES_POLL_IDLE, /* Cycles spent unsuccessful polling. */
>>> - PMD_CYCLES_POLL_BUSY, /* Cycles spent successfully polling and
>>> - * processing polled packets. */
>>> - PMD_CYCLES_OVERHEAD, /* Cycles spent for other tasks. */
>>> PMD_CYCLES_ITER_IDLE, /* Cycles spent in idle iterations. */
>>> PMD_CYCLES_ITER_BUSY, /* Cycles spent in busy iterations. */
>>> PMD_N_STATS
>>> @@ -85,11 +81,95 @@ struct pmd_counters {
>>>
>>> struct pmd_perf_stats {
>>> /* Start of the current PMD iteration in TSC cycles.*/
>>> + uint64_t start_it_tsc;
>>> + /* Latest TSC time stamp taken in PMD. */
>>> uint64_t last_tsc;
>>> + /* If non-NULL, outermost cycle timer currently running in PMD. */
>>> + struct cycle_timer *cur_timer;
>>> /* Set of PMD counters with their zero offsets. */
>>> struct pmd_counters counters;
>>> };
>>>
>>> +/* Support for accurate timing of PMD execution on TSC clock cycle level.
>>> + * These functions are intended to be invoked in the context of pmd threads. */
>>> +
>>> +/* Read the TSC cycle register and cache it. Any function not requiring clock
>>> + * cycle accuracy should read the cached value using cycles_counter_get() to
>>> + * avoid the overhead of reading the TSC register. */
>>> +
>>> +static inline uint64_t
>>> +cycles_counter_update(struct pmd_perf_stats *s)
>>> +{
>>> +#ifdef DPDK_NETDEV
>>> + return s->last_tsc = rte_get_tsc_cycles();
>>> +#else
>>> + return s->last_tsc = 0;
>>> +#endif
>>> +}
>>> +
>>> +static inline uint64_t
>>> +cycles_counter_get(struct pmd_perf_stats *s)
>>> +{
>>> + return s->last_tsc;
>>> +}
>>> +
>>> +/* A nestable timer for measuring execution time in TSC cycles.
>>> + *
>>> + * Usage:
>>> + * struct cycle_timer timer;
>>> + *
>>> + * cycle_timer_start(pmd, &timer);
>>> + * <Timed execution>
>>> + * uint64_t cycles = cycle_timer_stop(pmd, &timer);
>>> + *
>>> + * The caller must guarantee that a call to cycle_timer_start() is always
>>> + * paired with a call to cycle_stimer_stop().
>>> + *
>>> + * Is is possible to have nested cycles timers within the timed code. The
>>> + * execution time measured by the nested timers is excluded from the time
>>> + * measured by the embracing timer.
>>> + */
>>> +
>>> +struct cycle_timer {
>>> + uint64_t start;
>>> + uint64_t suspended;
>>> + struct cycle_timer *interrupted;
>>> +};
>>> +
>>> +static inline void
>>> +cycle_timer_start(struct pmd_perf_stats *s,
>>> + struct cycle_timer *timer)
>>> +{
>>> + struct cycle_timer *cur_timer = s->cur_timer;
>>> + uint64_t now = cycles_counter_update(s);
>>> +
>>> + if (cur_timer) {
>>> + cur_timer->suspended = now;
>>> + }
>>> + timer->interrupted = cur_timer;
>>> + timer->start = now;
>>> + timer->suspended = 0;
>>> + s->cur_timer = timer;
>>> +}
>>> +
>>> +static inline uint64_t
>>> +cycle_timer_stop(struct pmd_perf_stats *s,
>>> + struct cycle_timer *timer)
>>> +{
>>> + /* Assert that this is the current cycle timer. */
>>> + ovs_assert(s->cur_timer == timer);
>>> + uint64_t now = cycles_counter_update(s);
>>> + struct cycle_timer *intr_timer = timer->interrupted;
>>> +
>>> + if (intr_timer) {
>>> + /* Adjust the start offset by the suspended cycles. */
>>> + intr_timer->start += now - intr_timer->suspended;
>>> + }
>>> + /* Restore suspended timer, if any. */
>>> + s->cur_timer = intr_timer;
>>> + return now - timer->start;
>>> +}
>>> +
>>> void pmd_perf_stats_init(struct pmd_perf_stats *s);
>>> void pmd_perf_stats_clear(struct pmd_perf_stats *s);
>>> void pmd_perf_read_counters(struct pmd_perf_stats *s,
>>> @@ -115,16 +195,17 @@ pmd_perf_update_counter(struct pmd_perf_stats *s,
>>> }
>>>
>>> static inline void
>>> -pmd_perf_start_iteration(struct pmd_perf_stats *s, uint64_t now_tsc)
>>> +pmd_perf_start_iteration(struct pmd_perf_stats *s)
>>> {
>>> - s->last_tsc = now_tsc;
>>> + /* We rely on here that the last_tsc was updated immediately prior
>>> + * at the end of the previous iteration, or before the first iteration. */
>>> + s->start_it_tsc = s->last_tsc;
>>> }
>>>
>>> static inline void
>>> -pmd_perf_end_iteration(struct pmd_perf_stats *s, uint64_t now_tsc,
>>> - int rx_packets)
>>> +pmd_perf_end_iteration(struct pmd_perf_stats *s, int rx_packets)
>>> {
>>> - uint64_t cycles = now_tsc - s->last_tsc;
>>> + uint64_t cycles = cycles_counter_update(s) - s->start_it_tsc;
>>>
>>> if (rx_packets > 0) {
>>> pmd_perf_update_counter(s, PMD_CYCLES_ITER_BUSY, cycles);
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index 82d29bb..dc26026 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -509,8 +509,6 @@ struct tx_port {
>>> struct dp_netdev_pmd_thread_ctx {
>>> /* Latest measured time. See 'pmd_thread_ctx_time_update()'. */
>>> long long now;
>>> - /* Used to count cycles. See 'cycles_count_end()' */
>>> - unsigned long long last_cycles;
>>> };
>>>
>>> /* PMD: Poll modes drivers. PMD accesses devices via polling to eliminate
>>> @@ -3111,64 +3109,20 @@ dp_netdev_actions_free(struct dp_netdev_actions *actions)
>>> free(actions);
>>> }
>>>
>
>>> -static inline unsigned long long
>>> -cycles_counter(void)
>>> -{
>>> -#ifdef DPDK_NETDEV
>>> - return rte_get_tsc_cycles();
>>> -#else
>>> - return 0;
>>> -#endif
>>> -}
>>> -
>>> -/* Fake mutex to make sure that the calls to cycles_count_* are balanced */
>>> -extern struct ovs_mutex cycles_counter_fake_mutex;
>>> -
>>> -/* Start counting cycles. Must be followed by 'cycles_count_end()' */
>>> -static inline void
>>> -cycles_count_start(struct dp_netdev_pmd_thread *pmd)
>>> - OVS_ACQUIRES(&cycles_counter_fake_mutex)
>>> - OVS_NO_THREAD_SAFETY_ANALYSIS
>>> -{
>>> - pmd->ctx.last_cycles = cycles_counter();
>>> -}
>>> -
>>> -/* Stop counting cycles and add them to the counter 'type' */
>>> -static inline void
>>> -cycles_count_end(struct dp_netdev_pmd_thread *pmd,
>>> - enum pmd_stat_type type)
>>> - OVS_RELEASES(&cycles_counter_fake_mutex)
>>> - OVS_NO_THREAD_SAFETY_ANALYSIS
>>> -{
>>> - unsigned long long interval = cycles_counter() - pmd->ctx.last_cycles;
>>> -
>>> - pmd_perf_update_counter(&pmd->perf_stats, type, interval);
>>> -}
>>> -
>>> -/* Calculate the intermediate cycle result and add to the counter 'type' */
>>> -static inline void
>>> -cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
>>> - struct dp_netdev_rxq *rxq,
>>> - enum pmd_stat_type type)
>>> - OVS_NO_THREAD_SAFETY_ANALYSIS
>>> +static void
>>> +dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx,
>>> + enum rxq_cycles_counter_type type,
>>> + unsigned long long cycles)
>>> {
>>> - unsigned long long new_cycles = cycles_counter();
>>> - unsigned long long interval = new_cycles - pmd->ctx.last_cycles;
>>> - pmd->ctx.last_cycles = new_cycles;
>>> -
>>> - pmd_perf_update_counter(&pmd->perf_stats, type, interval);
>>> - if (rxq && (type == PMD_CYCLES_POLL_BUSY)) {
>>> - /* Add to the amount of current processing cycles. */
>>> - non_atomic_ullong_add(&rxq->cycles[RXQ_CYCLES_PROC_CURR], interval);
>>> - }
>>> + atomic_store_relaxed(&rx->cycles[type], cycles);
>>> }
>>>
>>> static void
>>> -dp_netdev_rxq_set_cycles(struct dp_netdev_rxq *rx,
>>> +dp_netdev_rxq_add_cycles(struct dp_netdev_rxq *rx,
>>> enum rxq_cycles_counter_type type,
>>> unsigned long long cycles)
>>> {
>>> - atomic_store_relaxed(&rx->cycles[type], cycles);
>>> + non_atomic_ullong_add(&rx->cycles[type], cycles);
>>> }
>>>
>>> static uint64_t
>>> @@ -3234,27 +3188,40 @@ dp_netdev_pmd_flush_output_packets(struct dp_netdev_pmd_thread *pmd)
>>>
>>> static int
>>> dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
>>> - struct netdev_rxq *rx,
>>> + struct dp_netdev_rxq *rxq,
>>> odp_port_t port_no)
>>> {
>>> struct dp_packet_batch batch;
>>> + struct cycle_timer timer;
>>> int error;
>>> int batch_cnt = 0;
>>>
>>> + /* Measure duration for polling and processing rx burst. */
>>> + cycle_timer_start(&pmd->perf_stats, &timer);
>>> dp_packet_batch_init(&batch);
>>> - error = netdev_rxq_recv(rx, &batch);
>>> + error = netdev_rxq_recv(rxq->rx, &batch);
>>> if (!error) {
>>> + /* At least one packet received. */
>>> *recirc_depth_get() = 0;
>>> pmd_thread_ctx_time_update(pmd);
>>>
>>> batch_cnt = batch.count;
>>> dp_netdev_input(pmd, &batch, port_no);
>>> dp_netdev_pmd_flush_output_packets(pmd);
>>> - } else if (error != EAGAIN && error != EOPNOTSUPP) {
>>> - static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
>>>
>>> - VLOG_ERR_RL(&rl, "error receiving data from %s: %s",
>>> - netdev_rxq_get_name(rx), ovs_strerror(error));
>>> + /* Assign processing cycles to rx queue. */
>>> + uint64_t cycles = cycle_timer_stop(&pmd->perf_stats, &timer);
>>> + dp_netdev_rxq_add_cycles(rxq, RXQ_CYCLES_PROC_CURR, cycles);
>>> +
>>> + } else {
>>> + /* Discard cycles. */
>>> + cycle_timer_stop(&pmd->perf_stats, &timer);
>>> + if (error != EAGAIN && error != EOPNOTSUPP) {
>>> + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
>>> +
>>> + VLOG_ERR_RL(&rl, "error receiving data from %s: %s",
>>> + netdev_rxq_get_name(rxq->rx), ovs_strerror(error));
>>> + }
>>> }
>>>
>>> return batch_cnt;
>>> @@ -3880,30 +3847,22 @@ dpif_netdev_run(struct dpif *dpif)
>>> struct dp_netdev *dp = get_dp_netdev(dpif);
>>> struct dp_netdev_pmd_thread *non_pmd;
>>> uint64_t new_tnl_seq;
>>> - int process_packets = 0;
>>>
>>> ovs_mutex_lock(&dp->port_mutex);
>>> non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
>>> if (non_pmd) {
>>> ovs_mutex_lock(&dp->non_pmd_mutex);
>>> - cycles_count_start(non_pmd);
>>> HMAP_FOR_EACH (port, node, &dp->ports) {
>>> if (!netdev_is_pmd(port->netdev)) {
>>> int i;
>>>
>>> for (i = 0; i < port->n_rxq; i++) {
>>> - process_packets =
>>> - dp_netdev_process_rxq_port(non_pmd,
>>> - port->rxqs[i].rx,
>>> - port->port_no);
>>> - cycles_count_intermediate(non_pmd, NULL,
>>> - process_packets
>>> - ? PMD_CYCLES_POLL_BUSY
>>> - : PMD_CYCLES_POLL_IDLE);
>>> + dp_netdev_process_rxq_port(non_pmd,
>>> + &port->rxqs[i],
>>> + port->port_no);
>>> }
>>> }
>>> }
>>> - cycles_count_end(non_pmd, PMD_CYCLES_POLL_IDLE);
>>> pmd_thread_ctx_time_update(non_pmd);
>>> dpif_netdev_xps_revalidate_pmd(non_pmd, false);
>>> ovs_mutex_unlock(&dp->non_pmd_mutex);
>>> @@ -4082,18 +4041,14 @@ reload:
>>> lc = UINT_MAX;
>>> }
>>>
>>> - cycles_count_start(pmd);
>>> + cycles_counter_update(s);
>>> for (;;) {
>>> uint64_t iter_packets = 0;
>>> - pmd_perf_start_iteration(s, pmd->ctx.last_cycles);
>>> + pmd_perf_start_iteration(s);
>>> for (i = 0; i < poll_cnt; i++) {
>>> process_packets =
>>> - dp_netdev_process_rxq_port(pmd, poll_list[i].rxq->rx,
>>> + dp_netdev_process_rxq_port(pmd, poll_list[i].rxq,
>>> poll_list[i].port_no);
>>> - cycles_count_intermediate(pmd, poll_list[i].rxq,
>>> - process_packets
>>> - ? PMD_CYCLES_POLL_BUSY
>>> - : PMD_CYCLES_POLL_IDLE);
>>> iter_packets += process_packets;
>>> }
>>>
>>> @@ -4115,13 +4070,10 @@ reload:
>>> if (reload) {
>>> break;
>>> }
>>> - cycles_count_intermediate(pmd, NULL, PMD_CYCLES_OVERHEAD);
>>> }
>>> - pmd_perf_end_iteration(s, pmd->ctx.last_cycles, iter_packets);
>>> + pmd_perf_end_iteration(s, iter_packets);
>>> }
>>>
>>> - cycles_count_end(pmd, PMD_CYCLES_OVERHEAD);
>>> -
>>> poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
>>> exiting = latch_is_set(&pmd->exit_latch);
>>> /* Signal here to make sure the pmd finishes
>>> @@ -5066,9 +5018,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
>>> ovs_mutex_unlock(&pmd->flow_mutex);
>>> emc_probabilistic_insert(pmd, key, netdev_flow);
>>> }
>>> - /* Only error ENOSPC can reach here. We process the packet but do not
>>> - * install a datapath flow. Treat as successful. */
>>> - return 0;
>>> + return error;
>>> }
>>>
>>> static inline void
>>>
>> _______________________________________________
>> dev mailing list
>> dev at openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
>
More information about the dev
mailing list