[ovs-dev] [PATCH ovs] lib: export calc_percentile function

Aaron Conole aconole at redhat.com
Tue Oct 19 15:18:08 UTC 2021


Xavier Simonart <xsimonar at redhat.com> writes:

> Hi Aaron
>
> We might for instance want to have some counters giving statistical data on flow_mods added per ovn iteration. 
> So, for instance, we would be able to compare those statistics with existing stopwatches indicating the time spent for
> those iterations, and see whether a high time stopwatch is linked to a high value of the maximum flow_mod....
>
> Coverage counters would not be sufficient. They provide very useful info (average over a few seconds, minute, hour),
> but they do not provide (as far as I know at least) this statistical view (max, percentile, ...) per iteration - e.g. the
> maximum. Adding this statistical view to the coverage counter does not seem to make too much sense for the existing
> coverage counters, so I would not modify the coverage counters.

Okay.

> As I mentioned, we could make the stopwatch (slightly) more generic by making two small modifications:
> - Add an empty unit, so the stopwatch could cover all the "counts" statistics, and not only the time related ones (as the
> existing stopwatches).
> - Add to the stopwatch printout the "Sum" of all items (and not only some averages). This addition might be useful as
> well for the existing stopwatches.
> With those two minor changes, OVN could add its own statistics.

I guess it becomes more than just a stopwatch at that point.  It's more
like a stats window.  Maybe it makes sense to decouple stopwatch from
that, and have a generic set of stats window functions, and then build
stopwatch on top.

> Do you think that such changes to the OVS stopwatch could be accepted upstream?

Seems like there are some use cases that you have considered - although
they do seem a bit more like OVN specific use cases.  It make sense
since OVS has this functionality to expose it.

Mark, any thoughts on how best to proceed?

> Thanks
> Xavier
>
> On Thu, Oct 14, 2021 at 9:39 PM Aaron Conole <aconole at redhat.com> wrote:
>
>  Xavier Simonart <xsimonar at redhat.com> writes:
>
>  > Hi Aaron
>  >
>  > Thanks for looking into this.
>  >
>  > You are right when saying that OVN uses the stopwatch API just fine.
>  > The goal of the proposed modification is to be able provide extra counters/statistics from OVN.
>  > For instance, there was some interest about how many flows (or groups) are added (removed, updated, ...) by
>  > ovn-controller.
>  > In addition to the raw count, we'd like to provide some statistical view per iteration - hence the need of being
>  able to
>  > calculate things like percentiles (in addition to average, max, ...).
>  >
>  > The stopwatch API provides most of what would be needed, but would have required some changes:
>  > - stopwatch always uses some "units" (msec, ...) while we would use it here to report non time-based statistics
>  => we
>  > would need to add some other "units". We could add an "empty" unit, to avoid having to require OVS changes
>  > everytime we need different statistics (today flows, tomorrow something else).
>  > - more important, stopwatch does not provide the total count we are looking for (i.e. the sum of all samples). We
>  could
>  > add a "Total" or a "Sum" to stopwatch, but this would have changed the stopwatch outputs for all existing
>  counters. I
>  > felt that this might be perceived as an API change or a compatibility issue.
>  >
>  > Hence I was proposing to only export the percentile related function, to avoid any change to OVS which might
>  impact
>  > backward compatibility.
>  > But I would be happy to modify the stopwatch, adding the "Sum" and the empty unit. This would simplify OVN
>  code as
>  > well compared to what I was initially proposing.
>  > Adding stopwatch_add_sample(...) would not change much in this case.
>
>  I see - what you're looking for is some kind of generic statistics -
>  would the coverage counters work instead?  We do keep some flow
>  statistics, so I guess it's useful to understand what you're looking
>  to track.  Maybe 'stopwatch' might not work perfectly, or maybe we can
>  make some changes to make it seem more generic.
>
>  > Thanks
>  > Xavier
>  >
>  > On Wed, Oct 13, 2021 at 6:42 PM Aaron Conole <aconole at redhat.com> wrote:
>  >
>  >  Xavier Simonart <xsimonar at redhat.com> writes:
>  >
>  >  > export calc_percentile function (and percentile struct) so that
>  >  > it can be used in other libraries such as OVN.
>  >  >
>  >  > Signed-off-by: Xavier Simonart <xsimonar at redhat.com>
>  >  > ---
>  >
>  >  What is the intent to use this in other libraries?  It would be nice to
>  >  understand why just running the existing stopwatch API doesn't work
>  >  (maybe you cannot start samples).  AFAIK, OVN uses the stopwatch API
>  >  just fine?
>  >
>  >  Perhaps it could make sense to expose the functionality in a different
>  >  fashion like:
>  >
>  >    void stopwatch_add_sample(const char *, unsigned long long);
>  >
>  >  This seems a useful API and doesn't expose all of the internal
>  >  information, but I don't know if it really is needed.  Can you expand
>  >  why you need calc_percentile exposed?
>  >
>  >  >  lib/stopwatch.c | 24 +-----------------------
>  >  >  lib/stopwatch.h | 26 ++++++++++++++++++++++++++
>  >  >  2 files changed, 27 insertions(+), 23 deletions(-)
>  >  >
>  >  > diff --git a/lib/stopwatch.c b/lib/stopwatch.c
>  >  > index f5602163b..003c3a05f 100644
>  >  > --- a/lib/stopwatch.c
>  >  > +++ b/lib/stopwatch.c
>  >  > @@ -35,25 +35,6 @@ struct average {
>  >  >      double alpha;   /* Weight given to new samples */
>  >  >  };
>  >  >  
>  >  > -#define MARKERS 5
>  >  > -
>  >  > -/* Number of samples to collect before reporting P-square calculated
>  >  > - * percentile
>  >  > - */
>  >  > -#define P_SQUARE_MIN 50
>  >  > -
>  >  > -/* The naming of these fields is based on the naming used in the
>  >  > - * P-square algorithm paper.
>  >  > - */
>  >  > -struct percentile {
>  >  > -    int n[MARKERS];
>  >  > -    double n_prime[MARKERS];
>  >  > -    double q[MARKERS];
>  >  > -    double dn[MARKERS];
>  >  > -    unsigned long long samples[P_SQUARE_MIN];
>  >  > -    double percentile;
>  >  > -};
>  >  > -
>  >  >  struct stopwatch {
>  >  >      enum stopwatch_units units;
>  >  >      unsigned long long n_samples;
>  >  > @@ -107,10 +88,7 @@ comp_samples(const void *left, const void *right)
>  >  >      return *right_d > *left_d ? -1 : *right_d < *left_d;
>  >  >  }
>  >  >  
>  >  > -/* Calculate the percentile using the P-square algorithm. For more
>  >  > - * information, see https://www1.cse.wustl.edu/~jain/papers/ftp/psqr.pdf
>  >  > - */
>  >  > -static void
>  >  > +void
>  >  >  calc_percentile(unsigned long long n_samples, struct percentile *pctl,
>  >  >                  unsigned long long new_sample)
>  >  >  {
>  >  > diff --git a/lib/stopwatch.h b/lib/stopwatch.h
>  >  > index 91abd64e4..efb9a9e8a 100644
>  >  > --- a/lib/stopwatch.h
>  >  > +++ b/lib/stopwatch.h
>  >  > @@ -36,6 +36,32 @@ struct stopwatch_stats {
>  >  >                                      (alpha 0.01). */
>  >  >  };
>  >  >  
>  >  > +#define MARKERS 5
>  >  > +
>  >  > +/* Number of samples to collect before reporting P-square calculated
>  >  > + * percentile
>  >  > + */
>  >  > +#define P_SQUARE_MIN 50
>  >  > +
>  >  > +/* The naming of these fields is based on the naming used in the
>  >  > + * P-square algorithm paper.
>  >  > + */
>  >  > +struct percentile {
>  >  > +    int n[MARKERS];
>  >  > +    double n_prime[MARKERS];
>  >  > +    double q[MARKERS];
>  >  > +    double dn[MARKERS];
>  >  > +    unsigned long long samples[P_SQUARE_MIN];
>  >  > +    double percentile;
>  >  > +};
>  >  > +
>  >  > +/* Calculate the percentile using the P-square algorithm. For more
>  >  > + * information, see https://www1.cse.wustl.edu/~jain/papers/ftp/psqr.pdf
>  >  > + */
>  >  > +void
>  >  > +calc_percentile(unsigned long long n_samples, struct percentile *pctl,
>  >  > +                unsigned long long new_sample);
>  >  > +
>  >  >  /* Create a new stopwatch.
>  >  >   * The "units" are not used for any calculations but are printed when
>  >  >   * statistics are requested.



More information about the dev mailing list