[ovs-dev] [PATCH RFC] coverage: fix coverage accounting for pmd threads
Alex Wang
alexw at nicira.com
Thu Aug 13 18:44:58 UTC 2015
On Thu, Aug 13, 2015 at 8:26 AM, Ilya Maximets <i.maximets at samsung.com>
wrote:
> Ok, thanks for explanation.
> In that case, I think, try-lock solution will be good.
>
> Best regards, Ilya Maximets.
>
Thx Ilya, for the suggestions and discussion
I posted a patch here:
http://openvswitch.org/pipermail/dev/2015-August/058855.html
> On 12.08.2015 23:52, Alex Wang wrote:
> > Hey Ilya,
> >
> > Thanks for the reply please see my comments inline,
> >
> > Thanks,
> > Alex Wang,
> >
> > On Wed, Aug 12, 2015 at 4:19 AM, Ilya Maximets <i.maximets at samsung.com
> <mailto:i.maximets at samsung.com>> wrote:
> >
> > I agree, that this solution is much more simple, but I don't like
> this race for
> > coverage_mutex.
> > Why this all done so?
> >
> >
> > Before implementing pmd threads, waiting on the ‘coverage_mutex’ is
> > affordable for all other threads.
> >
> >
> >
> >
> > I mean, why we should run coverage_clear every second?
> >
> >
> > Because, we (the main thread) need to calculate the per
> second/minute/hour
> > rate..
> >
> >
> >
> > What if we change type of
> > thread_local unsigned int counter_##COUNTER from unsigned int to
> unsigned long long
> > and will call coverage_clear() in coverage_read() ? (May be possible
> with my patch
> > applied) Also, we may collect stats for all threads separately.
> >
> >
> > There is one major issue with you change to 'COVERAGE_DEFNE()'
> >
> >> /* Defines COUNTER. There must be exactly one such definition
> at file scope
> >> * within a program. */
> >> #define COVERAGE_DEFINE(COUNTER)
> \
> >> DEFINE_STATIC_PER_THREAD_DATA(unsigned int,
> \
> >> counter_##COUNTER, 0);
> \
> >> - static unsigned int COUNTER##_count(void)
> \
> >> + static unsigned int COUNTER##_count(unsigned int i)
> \
> >> {
> \
> >> - unsigned int *countp = counter_##COUNTER##_get();
> \
> >> + volatile unsigned int *countp = coverage_val[i];
> \
> >> unsigned int count = *countp;
> \
> >> *countp = 0;
> \
> >> return count;
> \
> >> @@ -72,11 +77,17 @@ void coverage_counter_register(struct
> coverage_counter*);
> >> {
> \
> >> *counter_##COUNTER##_get() += n;
> \
> >> }
> \
> >> - extern struct coverage_counter counter_##COUNTER;
> \
> >> - struct coverage_counter counter_##COUNTER
> \
> >> + extern thread_local struct coverage_counter
> counter_##COUNTER; \
> >> + thread_local struct coverage_counter
> counter_##COUNTER \
> >> = { #COUNTER, COUNTER##_count, 0, 0, {0}, {0} };
> >
> >
> > Here a minor issue is that 'thread_local' is not portable. You need to
> define
> > it using the ovs wrappers documented in lib/ovs-thread.h
> >
> >
> >
> >> - OVS_CONSTRUCTOR(COUNTER##_init) {
> \
> >> + static void COUNTER##_initializer(void) {
> \
> >> coverage_counter_register(&counter_##COUNTER);
> \
> >> + coverage_val[n_coverage_counters - 1] =
> \
> >> +
> counter_##COUNTER##_get(); \
> >> + }
> \
> >> + OVS_CONSTRUCTOR(COUNTER##_init) {
> \
> >> +
> coverage_initializer_register(&COUNTER##_initializer); \
> >> + COUNTER##_initializer();
> \
> >> }
> >>
> >
> > We really should not have main thread accessing the per thread data of
> other
> > thread. As mentioned in the comments in lib/ovs-thread.h,
> >
> > """
> > * - The thread_local feature newly defined in C11 <threads.h> works
> with
> > * any data type and initializer, and it is fast. thread_local
> does not
> > * require once-only initialization like pthread_key_t. * C11 does
> not*
> > * * define what happens if one attempts to access a thread_local
> object*
> > * * from a thread other than the one to which that object
> belongs.* There
> > * is no provision to call a user-specified destructor when a
> thread
> > * ends. Typical implementations allow for an arbitrary amount of
> > * thread_local storage, but statically allocated only.
> > """
> >
> > We really should not play with any undefined behavior. (since has no way
> > to guarantee the portability). In other words, we really should not use
> > thread local variables, if we ever want to re-architect the coverage
> module
> > to have main thread taking care of collecting the stats.
> >
> > So, we think the try lock solution should do good now,
> >
> > What do you think?
> >
> > Thanks,
> > Alex Wang,
> >
> >
> >
> > On 12.08.2015 08:01, Alex Wang wrote:
> > > Just trying share an alternative, could we still do the coverage
> clear inside 'pmd_thread_main()'. However,
> > > instead of contending for the 'coverage_mutex', use
> 'ovs_mutex_trylock()'. Since pmd thread runs in infinite
> > > loop, it can constantly try grabbing the lock and will eventually
> acquire it.
> > >
> > > Something like this:
> > >
> > > diff --git a/lib/coverage.c b/lib/coverage.c
> > > index 6121956..9f23f99 100644
> > > --- a/lib/coverage.c
> > > +++ b/lib/coverage.c
> > > @@ -272,6 +272,37 @@ coverage_clear(void)
> > > }
> > > }
> > >
> > > +/* Runs approximately every COVERAGE_CLEAR_INTERVAL amount of
> time to
> > > + * synchronize per-thread counters with global counters. Exits
> immediately
> > > + * without updating the 'thread_time' if cannot acquire the
> 'coverage_mutex'.
> > > + * This is to avoid lock contention for some performance-critical
> threads.
> > > + */
> > > +void
> > > +coverage_try_clear(void)
> > > +{
> > > + long long int now, *thread_time;
> > > +
> > > + now = time_msec();
> > > + thread_time = coverage_clear_time_get();
> > > +
> > > + /* Initialize the coverage_clear_time. */
> > > + if (*thread_time == LLONG_MIN) {
> > > + *thread_time = now + COVERAGE_CLEAR_INTERVAL;
> > > + }
> > > +
> > > + if (now >= *thread_time) {
> > > + if (ovs_mutex_trylock(&coverage_mutex)) {
> > > + size_t i;
> > > + for (i = 0; i < n_coverage_counters; i++) {
> > > + struct coverage_counter *c = coverage_counters[i];
> > > + c->total += c->count();
> > > + }
> > > + ovs_mutex_unlock(&coverage_mutex);
> > > + *thread_time = now + COVERAGE_CLEAR_INTERVAL;
> > > + }
> > > + }
> > > +}
> > > +
> > > /* Runs approximately every COVERAGE_RUN_INTERVAL amount of time
> to update the
> > > * coverage counters' 'min' and 'hr' array. 'min' array is for
> cumulating
> > > * per second counts into per minute count. 'hr' array is for
> cumulating per
> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > index c144352..96884ec 100644
> > > --- a/lib/dpif-netdev.c
> > > +++ b/lib/dpif-netdev.c
> > > @@ -2695,6 +2695,7 @@ reload:
> > >
> > > lc = 0;
> > >
> > > + coverage_try_clear();
> > > emc_cache_slow_sweep(&pmd->flow_cache);
> > > ovsrcu_quiesce();
> > >
> > >
> > >
> > > On Tue, Aug 11, 2015 at 6:24 PM, Alex Wang <alexw at nicira.com
> <mailto:alexw at nicira.com> <mailto:alexw at nicira.com <mailto:
> alexw at nicira.com>>> wrote:
> > >
> > > Hi IIya,
> > >
> > > Thx a lot for writing up this fix,
> > >
> > > The reason that you did not see the pmd related coverage
> counter stats
> > > (in "ovs-appctl coverage/show") is that we do not call
> coverage_clear()
> > > anywhere in pmd_thread_main(). This is in that the
> coverage_clear()
> > > requires holding the global mutex in the coverage module which
> can
> > > be super expensive for dpdk packet processing.
> > >
> > > Want to admit that your fix is more complicated than I
> expect. Especially,
> > > changing the global (extern) struct 'counter_##COUNTER' to
> per-thread
> > > and expanding the array size by number of threads times can
> also be
> > > expensive for main thread...
> > >
> > > Do you think there could be a simpler way to achieve this?
> I'd like to think
> > > about other possible solutions more,
> > >
> > > Thanks,
> > > Alex Wang,
> > >
> > >
> > > On Mon, Aug 10, 2015 at 7:45 AM, Ilya Maximets <
> i.maximets at samsung.com <mailto:i.maximets at samsung.com> <mailto:
> i.maximets at samsung.com <mailto:i.maximets at samsung.com>>> wrote:
> > >
> > > Currently coverage_counter_register() is called only from
> > > constructor functions at program initialization time by
> > > main thread. Coverage counter values are static and
> > > thread local.
> > >
> > > This means, that all COVERAGE_INC() calls from pmd threads
> > > leads to increment of thread local variables of that
> threads
> > > that can't be accessed by anyone else. And they are not
> used
> > > in final coverage accounting.
> > >
> > > Fix that by adding ability to add/remove coverage counters
> > > in runtime for newly created threads and counting coverage
> > > using counters from all threads.
> > >
> > > Signed-off-by: Ilya Maximets <i.maximets at samsung.com
> <mailto:i.maximets at samsung.com> <mailto:i.maximets at samsung.com <mailto:
> i.maximets at samsung.com>>>
> > > ---
> > > lib/coverage.c | 83
> +++++++++++++++++++++++++++++++++++++++++++++----------
> > > lib/coverage.h | 27 ++++++++++++------
> > > lib/dpif-netdev.c | 4 ++-
> > > 3 files changed, 91 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/lib/coverage.c b/lib/coverage.c
> > > index 6121956..2ae9e7a 100644
> > > --- a/lib/coverage.c
> > > +++ b/lib/coverage.c
> > > @@ -30,14 +30,22 @@ VLOG_DEFINE_THIS_MODULE(coverage);
> > >
> > > /* The coverage counters. */
> > > static struct coverage_counter **coverage_counters = NULL;
> > > -static size_t n_coverage_counters = 0;
> > > static size_t allocated_coverage_counters = 0;
> > >
> > > +static void (**coverage_initializers)(void) = NULL;
> > > +static size_t n_coverage_initializers = 0;
> > > +static size_t allocated_coverage_initializers = 0;
> > > +
> > > static struct ovs_mutex coverage_mutex =
> OVS_MUTEX_INITIALIZER;
> > >
> > > DEFINE_STATIC_PER_THREAD_DATA(long long int,
> coverage_clear_time, LLONG_MIN);
> > > static long long int coverage_run_time = LLONG_MIN;
> > >
> > > +volatile unsigned int *coverage_val[2048] = {NULL};
> > > +volatile size_t n_coverage_counters = 0;
> > > +
> > > +DEFINE_STATIC_PER_THREAD_DATA(unsigned int,
> coverage_thread_start, 0);
> > > +
> > > /* Index counter used to compute the moving average
> array's index. */
> > > static unsigned int idx_count = 0;
> > >
> > > @@ -54,7 +62,44 @@ coverage_counter_register(struct
> coverage_counter* counter)
> > >
> &allocated_coverage_counters,
> > > sizeof(struct
> coverage_counter*));
> > > }
> > > - coverage_counters[n_coverage_counters++] = counter;
> > > + coverage_counters[n_coverage_counters] = counter;
> > > + coverage_val[n_coverage_counters++] = NULL;
> > > +}
> > > +
> > > +void
> > > +coverage_initializer_register(void (*f)(void))
> > > +{
> > > + if (n_coverage_initializers >=
> allocated_coverage_initializers) {
> > > + coverage_initializers =
> x2nrealloc(coverage_initializers,
> > > +
> &allocated_coverage_initializers,
> > > + sizeof
> *coverage_initializers);
> > > + }
> > > + coverage_initializers[n_coverage_initializers++] = f;
> > > +}
> > > +
> > > +void
> > > +coverage_register_new_thread(void)
> > > +{
> > > + int i;
> > > + ovs_mutex_lock(&coverage_mutex);
> > > + *coverage_thread_start_get() = n_coverage_counters;
> > > + for (i = 0; i < n_coverage_initializers; i++)
> > > + coverage_initializers[i]();
> > > + ovs_mutex_unlock(&coverage_mutex);
> > > +}
> > > +
> > > +void
> > > +coverage_unregister_thread(void)
> > > +{
> > > + int i;
> > > + ovs_mutex_lock(&coverage_mutex);
> > > + for (i = *coverage_thread_start_get() +
> n_coverage_initializers;
> > > + i < n_coverage_counters; i++) {
> > > + coverage_counters[i - n_coverage_initializers] =
> coverage_counters[i];
> > > + coverage_val[i - n_coverage_initializers] =
> coverage_val[i];
> > > + }
> > > + n_coverage_counters -= n_coverage_initializers;
> > > + ovs_mutex_unlock(&coverage_mutex);
> > > }
> > >
> > > static void
> > > @@ -102,13 +147,12 @@ coverage_hash(void)
> > > uint32_t hash = 0;
> > > int n_groups, i;
> > >
> > > + ovs_mutex_lock(&coverage_mutex);
> > > /* Sort coverage counters into groups with equal
> totals. */
> > > c = xmalloc(n_coverage_counters * sizeof *c);
> > > - ovs_mutex_lock(&coverage_mutex);
> > > for (i = 0; i < n_coverage_counters; i++) {
> > > c[i] = coverage_counters[i];
> > > }
> > > - ovs_mutex_unlock(&coverage_mutex);
> > > qsort(c, n_coverage_counters, sizeof *c,
> compare_coverage_counters);
> > >
> > > /* Hash the names in each group along with the rank.
> */
> > > @@ -130,6 +174,7 @@ coverage_hash(void)
> > > i = j;
> > > }
> > >
> > > + ovs_mutex_unlock(&coverage_mutex);
> > > free(c);
> > >
> > > return hash_int(n_groups, hash);
> > > @@ -200,9 +245,11 @@ coverage_read(struct svec *lines)
> > > {
> > > struct coverage_counter **c = coverage_counters;
> > > unsigned long long int *totals;
> > > + unsigned long long int *total_sec, *total_min,
> *total_hr;
> > > size_t n_never_hit;
> > > uint32_t hash;
> > > size_t i;
> > > + int initial_n = n_coverage_initializers;
> > >
> > > hash = coverage_hash();
> > >
> > > @@ -213,14 +260,20 @@ coverage_read(struct svec *lines)
> > > "hash=%08"PRIx32":",
> > > COVERAGE_RUN_INTERVAL/1000,
> hash));
> > >
> > > - totals = xmalloc(n_coverage_counters * sizeof
> *totals);
> > > ovs_mutex_lock(&coverage_mutex);
> > > + totals = xcalloc(n_coverage_counters, sizeof *totals);
> > > + total_sec = xcalloc(n_coverage_counters, sizeof
> *total_sec);
> > > + total_min = xcalloc(n_coverage_counters, sizeof
> *total_min);
> > > + total_hr = xcalloc(n_coverage_counters, sizeof
> *total_hr);
> > > +
> > > for (i = 0; i < n_coverage_counters; i++) {
> > > - totals[i] = c[i]->total;
> > > + totals[i % initial_n] += c[i]->total;
> > > + total_sec[i % initial_n] += c[i]->min[(idx_count
> - 1) % MIN_AVG_LEN];
> > > + total_min[i % initial_n] +=
> coverage_array_sum(c[i]->min, MIN_AVG_LEN);
> > > + total_hr[i % initial_n] +=
> coverage_array_sum(c[i]->hr, HR_AVG_LEN);
> > > }
> > > - ovs_mutex_unlock(&coverage_mutex);
> > >
> > > - for (i = 0; i < n_coverage_counters; i++) {
> > > + for (i = 0; i < initial_n; i++) {
> > > if (totals[i]) {
> > > /* Shows the averaged per-second rates for
> the last
> > > * COVERAGE_RUN_INTERVAL interval, the last
> minute and
> > > @@ -229,18 +282,22 @@ coverage_read(struct svec *lines)
> > > xasprintf("%-24s %5.1f/sec %9.3f/sec "
> > > "%13.4f/sec total: %llu",
> > > c[i]->name,
> > > - (c[i]->min[(idx_count - 1) %
> MIN_AVG_LEN]
> > > + (total_sec[i]
> > > * 1000.0 /
> COVERAGE_RUN_INTERVAL),
> > > - coverage_array_sum(c[i]->min,
> MIN_AVG_LEN) / 60.0,
> > > - coverage_array_sum(c[i]->hr,
> HR_AVG_LEN) / 3600.0,
> > > + total_min[i] / 60.0,
> > > + total_hr[i] / 3600.0,
> > > totals[i]));
> > > } else {
> > > n_never_hit++;
> > > }
> > > }
> > > + ovs_mutex_unlock(&coverage_mutex);
> > >
> > > svec_add_nocopy(lines, xasprintf("%"PRIuSIZE" events
> never hit", n_never_hit));
> > > free(totals);
> > > + free(total_sec);
> > > + free(total_min);
> > > + free(total_hr);
> > > }
> > >
> > > /* Runs approximately every COVERAGE_CLEAR_INTERVAL
> amount of time to
> > > @@ -265,7 +322,7 @@ coverage_clear(void)
> > > ovs_mutex_lock(&coverage_mutex);
> > > for (i = 0; i < n_coverage_counters; i++) {
> > > struct coverage_counter *c =
> coverage_counters[i];
> > > - c->total += c->count();
> > > + c->total += c->count(i);
> > > }
> > > ovs_mutex_unlock(&coverage_mutex);
> > > *thread_time = now + COVERAGE_CLEAR_INTERVAL;
> > > @@ -340,10 +397,8 @@ coverage_array_sum(const unsigned int
> *arr, const unsigned int len)
> > > unsigned int sum = 0;
> > > size_t i;
> > >
> > > - ovs_mutex_lock(&coverage_mutex);
> > > for (i = 0; i < len; i++) {
> > > sum += arr[i];
> > > }
> > > - ovs_mutex_unlock(&coverage_mutex);
> > > return sum;
> > > }
> > > diff --git a/lib/coverage.h b/lib/coverage.h
> > > index 832c433..1f5ae17 100644
> > > --- a/lib/coverage.h
> > > +++ b/lib/coverage.h
> > > @@ -45,9 +45,9 @@ BUILD_ASSERT_DECL(COVERAGE_RUN_INTERVAL
> % COVERAGE_CLEAR_INTERVAL == 0);
> > >
> > > /* A coverage counter. */
> > > struct coverage_counter {
> > > - const char *const name; /* Textual name. */
> > > - unsigned int (*const count)(void); /* Gets, zeros
> this thread's count. */
> > > - unsigned long long int total; /* Total count. */
> > > + const char *const name; /* Textual
> name. */
> > > + unsigned int (*const count)(unsigned int); /* Gets,
> zeros this thread's count. */
> > > + unsigned long long int total; /* Total
> count. */
> > > unsigned long long int last_total;
> > > /* The moving average arrays. */
> > > unsigned int min[MIN_AVG_LEN];
> > > @@ -55,15 +55,20 @@ struct coverage_counter {
> > > };
> > >
> > > void coverage_counter_register(struct coverage_counter*);
> > > +void coverage_initializer_register(void (*f)(void));
> > > +void coverage_register_new_thread(void);
> > > +void coverage_unregister_thread(void);
> > >
> > > +extern volatile unsigned int *coverage_val[2048];
> > > +extern volatile size_t n_coverage_counters;
> > > /* Defines COUNTER. There must be exactly one such
> definition at file scope
> > > * within a program. */
> > > #define COVERAGE_DEFINE(COUNTER)
> \
> > > DEFINE_STATIC_PER_THREAD_DATA(unsigned int,
> \
> > > counter_##COUNTER,
> 0); \
> > > - static unsigned int COUNTER##_count(void)
> \
> > > + static unsigned int COUNTER##_count(unsigned int
> i) \
> > > {
> \
> > > - unsigned int *countp =
> counter_##COUNTER##_get(); \
> > > + volatile unsigned int *countp =
> coverage_val[i]; \
> > > unsigned int count = *countp;
> \
> > > *countp = 0;
> \
> > > return count;
> \
> > > @@ -72,11 +77,17 @@ void coverage_counter_register(struct
> coverage_counter*);
> > > {
> \
> > > *counter_##COUNTER##_get() += n;
> \
> > > }
> \
> > > - extern struct coverage_counter
> counter_##COUNTER; \
> > > - struct coverage_counter counter_##COUNTER
> \
> > > + extern thread_local struct coverage_counter
> counter_##COUNTER; \
> > > + thread_local struct coverage_counter
> counter_##COUNTER \
> > > = { #COUNTER, COUNTER##_count, 0, 0, {0}, {0}
> }; \
> > > - OVS_CONSTRUCTOR(COUNTER##_init) {
> \
> > > + static void COUNTER##_initializer(void) {
> \
> > >
> coverage_counter_register(&counter_##COUNTER); \
> > > + coverage_val[n_coverage_counters - 1] =
> \
> > > +
> counter_##COUNTER##_get(); \
> > > + }
> \
> > > + OVS_CONSTRUCTOR(COUNTER##_init) {
> \
> > > +
> coverage_initializer_register(&COUNTER##_initializer); \
> > > + COUNTER##_initializer();
> \
> > > }
> > >
> > > /* Adds 1 to COUNTER. */
> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > index c144352..a92dd6a 100644
> > > --- a/lib/dpif-netdev.c
> > > +++ b/lib/dpif-netdev.c
> > > @@ -34,6 +34,7 @@
> > > #include "bitmap.h"
> > > #include "cmap.h"
> > > #include "csum.h"
> > > +#include "coverage.h"
> > > #include "dp-packet.h"
> > > #include "dpif.h"
> > > #include "dpif-provider.h"
> > > @@ -2666,7 +2667,7 @@ pmd_thread_main(void *f_)
> > >
> > > poll_cnt = 0;
> > > poll_list = NULL;
> > > -
> > > + coverage_register_new_thread();
> > > /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
> > > ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
> > > pmd_thread_setaffinity_cpu(pmd->core_id);
> > > @@ -2717,6 +2718,7 @@ reload:
> > > }
> > >
> > > dp_netdev_pmd_reload_done(pmd);
> > > + coverage_unregister_thread();
> > >
> > > free(poll_list);
> > > return NULL;
> > > --
> > > 2.1.4
> > >
> > >
> > >
> >
> >
>
More information about the dev
mailing list