[ovs-dev] [PATCH RFC] coverage: fix coverage accounting for pmd threads

Ilya Maximets i.maximets at samsung.com
Thu Aug 13 15:26:40 UTC 2015


Ok, thanks for explanation.
In that case, I think, try-lock solution will be good.

Best regards, Ilya Maximets.

On 12.08.2015 23:52, Alex Wang wrote:
> Hey Ilya,
> 
> Thanks for the reply please see my comments inline,
> 
> Thanks,
> Alex Wang,
> 
> On Wed, Aug 12, 2015 at 4:19 AM, Ilya Maximets <i.maximets at samsung.com <mailto:i.maximets at samsung.com>> wrote:
> 
>     I agree, that this solution is much more simple, but I don't like this race for
>     coverage_mutex.
>     Why this all done so?
> 
> 
> Before implementing pmd threads, waiting on the ‘coverage_mutex’ is 
> affordable for all other threads.
> 
> 
>  
> 
>     I mean, why we should run coverage_clear every second? 
> 
> 
> Because, we (the main thread) need to calculate the per second/minute/hour 
> rate..
> 
>  
> 
>     What if we change type of
>     thread_local unsigned int counter_##COUNTER from unsigned int to unsigned long long
>     and will call coverage_clear() in coverage_read() ? (May be possible with my patch
>     applied) Also, we may collect stats for all threads separately.
> 
>  
> There is one major issue with you change to 'COVERAGE_DEFNE()'
> 
>>          /* Defines COUNTER.  There must be exactly one such definition at file scope
>>           * within a program. */
>>          #define COVERAGE_DEFINE(COUNTER)                                        \
>>                  DEFINE_STATIC_PER_THREAD_DATA(unsigned int,                     \
>>                                                counter_##COUNTER, 0);            \
>>         -        static unsigned int COUNTER##_count(void)                       \
>>         +        static unsigned int COUNTER##_count(unsigned int i)             \
>>                  {                                                               \
>>         -            unsigned int *countp = counter_##COUNTER##_get();           \
>>         +            volatile unsigned int *countp = coverage_val[i];            \
>>                      unsigned int count = *countp;                               \
>>                      *countp = 0;                                                \
>>                      return count;                                               \
>>         @@ -72,11 +77,17 @@ void coverage_counter_register(struct coverage_counter*);
>>                  {                                                               \
>>                      *counter_##COUNTER##_get() += n;                            \
>>                  }                                                               \
>>         -        extern struct coverage_counter counter_##COUNTER;               \
>>         -        struct coverage_counter counter_##COUNTER                       \
>>         +        extern thread_local struct coverage_counter counter_##COUNTER;  \
>>         +        thread_local struct coverage_counter counter_##COUNTER          \
>>                      = { #COUNTER, COUNTER##_count, 0, 0, {0}, {0} };           
> 
> 
> Here a minor issue is that 'thread_local' is not portable.  You need to define
> it using the ovs wrappers documented in lib/ovs-thread.h
> 
> 
> 
>>         -        OVS_CONSTRUCTOR(COUNTER##_init) {                               \
>>         +        static void COUNTER##_initializer(void) {                       \
>>                      coverage_counter_register(&counter_##COUNTER);              \
>>         +            coverage_val[n_coverage_counters - 1] =                     \
>>         +                                      counter_##COUNTER##_get();       \
>>         +        }                                                               \
>>         +        OVS_CONSTRUCTOR(COUNTER##_init) {                               \
>>         +            coverage_initializer_register(&COUNTER##_initializer);      \
>>         +            COUNTER##_initializer();                                    \
>>                  }
>>
> 
> We really should not have main thread accessing the per thread data of other
> thread.  As mentioned in the comments in lib/ovs-thread.h,
> 
> """
>  *     - The thread_local feature newly defined in C11 <threads.h> works with
>  *       any data type and initializer, and it is fast.  thread_local does not
>  *       require once-only initialization like pthread_key_t. * C11 does not*
> * *       define what happens if one attempts to access a thread_local object*
> * *       from a thread other than the one to which that object belongs.*  There
>  *       is no provision to call a user-specified destructor when a thread
>  *       ends.  Typical implementations allow for an arbitrary amount of
>  *       thread_local storage, but statically allocated only.
> """
> 
> We really should not play with any undefined behavior.  (since has no way
> to guarantee the portability).  In other words, we really should not use 
> thread local variables, if we ever want to re-architect the coverage module
> to have main thread taking care of collecting the stats.
> 
> So, we think the try lock solution should do good now,
> 
> What do you think?
> 
> Thanks,
> Alex Wang,
> 
>  
> 
>     On 12.08.2015 08:01, Alex Wang wrote:
>     > Just trying share an alternative, could we still do the coverage clear inside 'pmd_thread_main()'.  However,
>     > instead of contending for the 'coverage_mutex', use 'ovs_mutex_trylock()'.  Since pmd thread runs in infinite
>     > loop, it can constantly try grabbing the lock and will eventually acquire it.
>     >
>     > Something like this:
>     >
>     > diff --git a/lib/coverage.c b/lib/coverage.c
>     > index 6121956..9f23f99 100644
>     > --- a/lib/coverage.c
>     > +++ b/lib/coverage.c
>     > @@ -272,6 +272,37 @@ coverage_clear(void)
>     >      }
>     >  }
>     >
>     > +/* Runs approximately every COVERAGE_CLEAR_INTERVAL amount of time to
>     > + * synchronize per-thread counters with global counters.  Exits immediately
>     > + * without updating the 'thread_time' if cannot acquire the 'coverage_mutex'.
>     > + * This is to avoid lock contention for some performance-critical threads.
>     > + */
>     > +void
>     > +coverage_try_clear(void)
>     > +{
>     > +    long long int now, *thread_time;
>     > +
>     > +    now = time_msec();
>     > +    thread_time = coverage_clear_time_get();
>     > +
>     > +    /* Initialize the coverage_clear_time. */
>     > +    if (*thread_time == LLONG_MIN) {
>     > +        *thread_time = now + COVERAGE_CLEAR_INTERVAL;
>     > +    }
>     > +
>     > +    if (now >= *thread_time) {
>     > +        if (ovs_mutex_trylock(&coverage_mutex)) {
>     > +            size_t i;
>     > +            for (i = 0; i < n_coverage_counters; i++) {
>     > +                struct coverage_counter *c = coverage_counters[i];
>     > +                c->total += c->count();
>     > +            }
>     > +            ovs_mutex_unlock(&coverage_mutex);
>     > +            *thread_time = now + COVERAGE_CLEAR_INTERVAL;
>     > +        }
>     > +    }
>     > +}
>     > +
>     >  /* Runs approximately every COVERAGE_RUN_INTERVAL amount of time to update the
>     >   * coverage counters' 'min' and 'hr' array.  'min' array is for cumulating
>     >   * per second counts into per minute count.  'hr' array is for cumulating per
>     > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>     > index c144352..96884ec 100644
>     > --- a/lib/dpif-netdev.c
>     > +++ b/lib/dpif-netdev.c
>     > @@ -2695,6 +2695,7 @@ reload:
>     >
>     >              lc = 0;
>     >
>     > +            coverage_try_clear();
>     >              emc_cache_slow_sweep(&pmd->flow_cache);
>     >              ovsrcu_quiesce();
>     >
>     >
>     >
>     > On Tue, Aug 11, 2015 at 6:24 PM, Alex Wang <alexw at nicira.com <mailto:alexw at nicira.com> <mailto:alexw at nicira.com <mailto:alexw at nicira.com>>> wrote:
>     >
>     >     Hi IIya,
>     >
>     >     Thx a lot for writing up this fix,
>     >
>     >     The reason that you did not see the pmd related coverage counter stats
>     >     (in "ovs-appctl coverage/show") is that we do not call coverage_clear()
>     >     anywhere in pmd_thread_main().  This is in that the coverage_clear()
>     >     requires holding the global mutex in the coverage module which can
>     >     be super expensive for dpdk packet processing.
>     >
>     >     Want to admit that your fix is more complicated than I expect.  Especially,
>     >     changing the global (extern) struct 'counter_##COUNTER' to per-thread
>     >     and expanding the array size by number of threads times can also be
>     >     expensive for main thread...
>     >
>     >     Do you think there could be a simpler way to achieve this?  I'd like to think
>     >     about other possible solutions more,
>     >
>     >     Thanks,
>     >     Alex Wang,
>     >
>     >
>     >     On Mon, Aug 10, 2015 at 7:45 AM, Ilya Maximets <i.maximets at samsung.com <mailto:i.maximets at samsung.com> <mailto:i.maximets at samsung.com <mailto:i.maximets at samsung.com>>> wrote:
>     >
>     >         Currently coverage_counter_register() is called only from
>     >         constructor functions at program initialization time by
>     >         main thread. Coverage counter values are static and
>     >         thread local.
>     >
>     >         This means, that all COVERAGE_INC() calls from pmd threads
>     >         leads to increment of thread local variables of that threads
>     >         that can't be accessed by anyone else. And they are not used
>     >         in final coverage accounting.
>     >
>     >         Fix that by adding ability to add/remove coverage counters
>     >         in runtime for newly created threads and counting coverage
>     >         using counters from all threads.
>     >
>     >         Signed-off-by: Ilya Maximets <i.maximets at samsung.com <mailto:i.maximets at samsung.com> <mailto:i.maximets at samsung.com <mailto:i.maximets at samsung.com>>>
>     >         ---
>     >          lib/coverage.c    | 83 +++++++++++++++++++++++++++++++++++++++++++++----------
>     >          lib/coverage.h    | 27 ++++++++++++------
>     >          lib/dpif-netdev.c |  4 ++-
>     >          3 files changed, 91 insertions(+), 23 deletions(-)
>     >
>     >         diff --git a/lib/coverage.c b/lib/coverage.c
>     >         index 6121956..2ae9e7a 100644
>     >         --- a/lib/coverage.c
>     >         +++ b/lib/coverage.c
>     >         @@ -30,14 +30,22 @@ VLOG_DEFINE_THIS_MODULE(coverage);
>     >
>     >          /* The coverage counters. */
>     >          static struct coverage_counter **coverage_counters = NULL;
>     >         -static size_t n_coverage_counters = 0;
>     >          static size_t allocated_coverage_counters = 0;
>     >
>     >         +static void (**coverage_initializers)(void) = NULL;
>     >         +static size_t n_coverage_initializers = 0;
>     >         +static size_t allocated_coverage_initializers = 0;
>     >         +
>     >          static struct ovs_mutex coverage_mutex = OVS_MUTEX_INITIALIZER;
>     >
>     >          DEFINE_STATIC_PER_THREAD_DATA(long long int, coverage_clear_time, LLONG_MIN);
>     >          static long long int coverage_run_time = LLONG_MIN;
>     >
>     >         +volatile unsigned int *coverage_val[2048] = {NULL};
>     >         +volatile size_t n_coverage_counters = 0;
>     >         +
>     >         +DEFINE_STATIC_PER_THREAD_DATA(unsigned int, coverage_thread_start, 0);
>     >         +
>     >          /* Index counter used to compute the moving average array's index. */
>     >          static unsigned int idx_count = 0;
>     >
>     >         @@ -54,7 +62,44 @@ coverage_counter_register(struct coverage_counter* counter)
>     >                                                 &allocated_coverage_counters,
>     >                                                 sizeof(struct coverage_counter*));
>     >              }
>     >         -    coverage_counters[n_coverage_counters++] = counter;
>     >         +    coverage_counters[n_coverage_counters] = counter;
>     >         +    coverage_val[n_coverage_counters++] = NULL;
>     >         +}
>     >         +
>     >         +void
>     >         +coverage_initializer_register(void (*f)(void))
>     >         +{
>     >         +    if (n_coverage_initializers >= allocated_coverage_initializers) {
>     >         +        coverage_initializers = x2nrealloc(coverage_initializers,
>     >         +                                       &allocated_coverage_initializers,
>     >         +                                       sizeof *coverage_initializers);
>     >         +    }
>     >         +    coverage_initializers[n_coverage_initializers++] = f;
>     >         +}
>     >         +
>     >         +void
>     >         +coverage_register_new_thread(void)
>     >         +{
>     >         +    int i;
>     >         +    ovs_mutex_lock(&coverage_mutex);
>     >         +    *coverage_thread_start_get() = n_coverage_counters;
>     >         +    for (i = 0; i < n_coverage_initializers; i++)
>     >         +        coverage_initializers[i]();
>     >         +    ovs_mutex_unlock(&coverage_mutex);
>     >         +}
>     >         +
>     >         +void
>     >         +coverage_unregister_thread(void)
>     >         +{
>     >         +    int i;
>     >         +    ovs_mutex_lock(&coverage_mutex);
>     >         +    for (i = *coverage_thread_start_get() + n_coverage_initializers;
>     >         +         i < n_coverage_counters; i++) {
>     >         +        coverage_counters[i - n_coverage_initializers] = coverage_counters[i];
>     >         +        coverage_val[i - n_coverage_initializers] = coverage_val[i];
>     >         +    }
>     >         +    n_coverage_counters -= n_coverage_initializers;
>     >         +    ovs_mutex_unlock(&coverage_mutex);
>     >          }
>     >
>     >          static void
>     >         @@ -102,13 +147,12 @@ coverage_hash(void)
>     >              uint32_t hash = 0;
>     >              int n_groups, i;
>     >
>     >         +    ovs_mutex_lock(&coverage_mutex);
>     >              /* Sort coverage counters into groups with equal totals. */
>     >              c = xmalloc(n_coverage_counters * sizeof *c);
>     >         -    ovs_mutex_lock(&coverage_mutex);
>     >              for (i = 0; i < n_coverage_counters; i++) {
>     >                  c[i] = coverage_counters[i];
>     >              }
>     >         -    ovs_mutex_unlock(&coverage_mutex);
>     >              qsort(c, n_coverage_counters, sizeof *c, compare_coverage_counters);
>     >
>     >              /* Hash the names in each group along with the rank. */
>     >         @@ -130,6 +174,7 @@ coverage_hash(void)
>     >                  i = j;
>     >              }
>     >
>     >         +    ovs_mutex_unlock(&coverage_mutex);
>     >              free(c);
>     >
>     >              return hash_int(n_groups, hash);
>     >         @@ -200,9 +245,11 @@ coverage_read(struct svec *lines)
>     >          {
>     >              struct coverage_counter **c = coverage_counters;
>     >              unsigned long long int *totals;
>     >         +    unsigned long long int *total_sec, *total_min, *total_hr;
>     >              size_t n_never_hit;
>     >              uint32_t hash;
>     >              size_t i;
>     >         +    int initial_n = n_coverage_initializers;
>     >
>     >              hash = coverage_hash();
>     >
>     >         @@ -213,14 +260,20 @@ coverage_read(struct svec *lines)
>     >                                        "hash=%08"PRIx32":",
>     >                                        COVERAGE_RUN_INTERVAL/1000, hash));
>     >
>     >         -    totals = xmalloc(n_coverage_counters * sizeof *totals);
>     >              ovs_mutex_lock(&coverage_mutex);
>     >         +    totals = xcalloc(n_coverage_counters, sizeof *totals);
>     >         +    total_sec = xcalloc(n_coverage_counters, sizeof *total_sec);
>     >         +    total_min = xcalloc(n_coverage_counters, sizeof *total_min);
>     >         +    total_hr = xcalloc(n_coverage_counters, sizeof *total_hr);
>     >         +
>     >              for (i = 0; i < n_coverage_counters; i++) {
>     >         -        totals[i] = c[i]->total;
>     >         +        totals[i % initial_n] += c[i]->total;
>     >         +        total_sec[i % initial_n] += c[i]->min[(idx_count - 1) % MIN_AVG_LEN];
>     >         +        total_min[i % initial_n] += coverage_array_sum(c[i]->min, MIN_AVG_LEN);
>     >         +        total_hr[i % initial_n] += coverage_array_sum(c[i]->hr,  HR_AVG_LEN);
>     >              }
>     >         -    ovs_mutex_unlock(&coverage_mutex);
>     >
>     >         -    for (i = 0; i < n_coverage_counters; i++) {
>     >         +    for (i = 0; i < initial_n; i++) {
>     >                  if (totals[i]) {
>     >                      /* Shows the averaged per-second rates for the last
>     >                       * COVERAGE_RUN_INTERVAL interval, the last minute and
>     >         @@ -229,18 +282,22 @@ coverage_read(struct svec *lines)
>     >                          xasprintf("%-24s %5.1f/sec %9.3f/sec "
>     >                                    "%13.4f/sec   total: %llu",
>     >                                    c[i]->name,
>     >         -                          (c[i]->min[(idx_count - 1) % MIN_AVG_LEN]
>     >         +                          (total_sec[i]
>     >                                     * 1000.0 / COVERAGE_RUN_INTERVAL),
>     >         -                          coverage_array_sum(c[i]->min, MIN_AVG_LEN) / 60.0,
>     >         -                          coverage_array_sum(c[i]->hr,  HR_AVG_LEN) / 3600.0,
>     >         +                          total_min[i] / 60.0,
>     >         +                          total_hr[i] / 3600.0,
>     >                                    totals[i]));
>     >                  } else {
>     >                      n_never_hit++;
>     >                  }
>     >              }
>     >         +    ovs_mutex_unlock(&coverage_mutex);
>     >
>     >              svec_add_nocopy(lines, xasprintf("%"PRIuSIZE" events never hit", n_never_hit));
>     >              free(totals);
>     >         +    free(total_sec);
>     >         +    free(total_min);
>     >         +    free(total_hr);
>     >          }
>     >
>     >          /* Runs approximately every COVERAGE_CLEAR_INTERVAL amount of time to
>     >         @@ -265,7 +322,7 @@ coverage_clear(void)
>     >                  ovs_mutex_lock(&coverage_mutex);
>     >                  for (i = 0; i < n_coverage_counters; i++) {
>     >                      struct coverage_counter *c = coverage_counters[i];
>     >         -            c->total += c->count();
>     >         +            c->total += c->count(i);
>     >                  }
>     >                  ovs_mutex_unlock(&coverage_mutex);
>     >                  *thread_time = now + COVERAGE_CLEAR_INTERVAL;
>     >         @@ -340,10 +397,8 @@ coverage_array_sum(const unsigned int *arr, const unsigned int len)
>     >              unsigned int sum = 0;
>     >              size_t i;
>     >
>     >         -    ovs_mutex_lock(&coverage_mutex);
>     >              for (i = 0; i < len; i++) {
>     >                  sum += arr[i];
>     >              }
>     >         -    ovs_mutex_unlock(&coverage_mutex);
>     >              return sum;
>     >          }
>     >         diff --git a/lib/coverage.h b/lib/coverage.h
>     >         index 832c433..1f5ae17 100644
>     >         --- a/lib/coverage.h
>     >         +++ b/lib/coverage.h
>     >         @@ -45,9 +45,9 @@ BUILD_ASSERT_DECL(COVERAGE_RUN_INTERVAL % COVERAGE_CLEAR_INTERVAL == 0);
>     >
>     >          /* A coverage counter. */
>     >          struct coverage_counter {
>     >         -    const char *const name;            /* Textual name. */
>     >         -    unsigned int (*const count)(void); /* Gets, zeros this thread's count. */
>     >         -    unsigned long long int total;      /* Total count. */
>     >         +    const char *const name;                    /* Textual name. */
>     >         +    unsigned int (*const count)(unsigned int); /* Gets, zeros this thread's count. */
>     >         +    unsigned long long int total;              /* Total count. */
>     >              unsigned long long int last_total;
>     >              /* The moving average arrays. */
>     >              unsigned int min[MIN_AVG_LEN];
>     >         @@ -55,15 +55,20 @@ struct coverage_counter {
>     >          };
>     >
>     >          void coverage_counter_register(struct coverage_counter*);
>     >         +void coverage_initializer_register(void (*f)(void));
>     >         +void coverage_register_new_thread(void);
>     >         +void coverage_unregister_thread(void);
>     >
>     >         +extern volatile unsigned int *coverage_val[2048];
>     >         +extern volatile size_t n_coverage_counters;
>     >          /* Defines COUNTER.  There must be exactly one such definition at file scope
>     >           * within a program. */
>     >          #define COVERAGE_DEFINE(COUNTER)                                        \
>     >                  DEFINE_STATIC_PER_THREAD_DATA(unsigned int,                     \
>     >                                                counter_##COUNTER, 0);            \
>     >         -        static unsigned int COUNTER##_count(void)                       \
>     >         +        static unsigned int COUNTER##_count(unsigned int i)             \
>     >                  {                                                               \
>     >         -            unsigned int *countp = counter_##COUNTER##_get();           \
>     >         +            volatile unsigned int *countp = coverage_val[i];            \
>     >                      unsigned int count = *countp;                               \
>     >                      *countp = 0;                                                \
>     >                      return count;                                               \
>     >         @@ -72,11 +77,17 @@ void coverage_counter_register(struct coverage_counter*);
>     >                  {                                                               \
>     >                      *counter_##COUNTER##_get() += n;                            \
>     >                  }                                                               \
>     >         -        extern struct coverage_counter counter_##COUNTER;               \
>     >         -        struct coverage_counter counter_##COUNTER                       \
>     >         +        extern thread_local struct coverage_counter counter_##COUNTER;  \
>     >         +        thread_local struct coverage_counter counter_##COUNTER          \
>     >                      = { #COUNTER, COUNTER##_count, 0, 0, {0}, {0} };            \
>     >         -        OVS_CONSTRUCTOR(COUNTER##_init) {                               \
>     >         +        static void COUNTER##_initializer(void) {                       \
>     >                      coverage_counter_register(&counter_##COUNTER);              \
>     >         +            coverage_val[n_coverage_counters - 1] =                     \
>     >         +                                      counter_##COUNTER##_get();       \
>     >         +        }                                                               \
>     >         +        OVS_CONSTRUCTOR(COUNTER##_init) {                               \
>     >         +            coverage_initializer_register(&COUNTER##_initializer);      \
>     >         +            COUNTER##_initializer();                                    \
>     >                  }
>     >
>     >          /* Adds 1 to COUNTER. */
>     >         diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>     >         index c144352..a92dd6a 100644
>     >         --- a/lib/dpif-netdev.c
>     >         +++ b/lib/dpif-netdev.c
>     >         @@ -34,6 +34,7 @@
>     >          #include "bitmap.h"
>     >          #include "cmap.h"
>     >          #include "csum.h"
>     >         +#include "coverage.h"
>     >          #include "dp-packet.h"
>     >          #include "dpif.h"
>     >          #include "dpif-provider.h"
>     >         @@ -2666,7 +2667,7 @@ pmd_thread_main(void *f_)
>     >
>     >              poll_cnt = 0;
>     >              poll_list = NULL;
>     >         -
>     >         +    coverage_register_new_thread();
>     >              /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
>     >              ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
>     >              pmd_thread_setaffinity_cpu(pmd->core_id);
>     >         @@ -2717,6 +2718,7 @@ reload:
>     >              }
>     >
>     >              dp_netdev_pmd_reload_done(pmd);
>     >         +    coverage_unregister_thread();
>     >
>     >              free(poll_list);
>     >              return NULL;
>     >         --
>     >         2.1.4
>     >
>     >
>     >
> 
> 



More information about the dev mailing list