[ovs-dev] [PATCH ovs] lib: export calc_percentile function
Aaron Conole
aconole at redhat.com
Tue Oct 19 15:18:08 UTC 2021
Xavier Simonart <xsimonar at redhat.com> writes:
> Hi Aaron
>
> We might for instance want to have some counters giving statistical data on flow_mods added per ovn iteration.
> So, for instance, we would be able to compare those statistics with existing stopwatches indicating the time spent for
> those iterations, and see whether a high time stopwatch is linked to a high value of the maximum flow_mod....
>
> Coverage counters would not be sufficient. They provide very useful info (average over a few seconds, minute, hour),
> but they do not provide (as far as I know at least) this statistical view (max, percentile, ...) per iteration - e.g. the
> maximum. Adding this statistical view to the coverage counter does not seem to make too much sense for the existing
> coverage counters, so I would not modify the coverage counters.
Okay.
> As I mentioned, we could make the stopwatch (slightly) more generic by making two small modifications:
> - Add an empty unit, so the stopwatch could cover all the "counts" statistics, and not only the time related ones (as the
> existing stopwatches).
> - Add to the stopwatch printout the "Sum" of all items (and not only some averages). This addition might be useful as
> well for the existing stopwatches.
> With those two minor changes, OVN could add its own statistics.
I guess it becomes more than just a stopwatch at that point. It's more
like a stats window. Maybe it makes sense to decouple stopwatch from
that, and have a generic set of stats window functions, and then build
stopwatch on top.
> Do you think that such changes to the OVS stopwatch could be accepted upstream?
Seems like there are some use cases that you have considered - although
they do seem a bit more like OVN specific use cases. It make sense
since OVS has this functionality to expose it.
Mark, any thoughts on how best to proceed?
> Thanks
> Xavier
>
> On Thu, Oct 14, 2021 at 9:39 PM Aaron Conole <aconole at redhat.com> wrote:
>
> Xavier Simonart <xsimonar at redhat.com> writes:
>
> > Hi Aaron
> >
> > Thanks for looking into this.
> >
> > You are right when saying that OVN uses the stopwatch API just fine.
> > The goal of the proposed modification is to be able provide extra counters/statistics from OVN.
> > For instance, there was some interest about how many flows (or groups) are added (removed, updated, ...) by
> > ovn-controller.
> > In addition to the raw count, we'd like to provide some statistical view per iteration - hence the need of being
> able to
> > calculate things like percentiles (in addition to average, max, ...).
> >
> > The stopwatch API provides most of what would be needed, but would have required some changes:
> > - stopwatch always uses some "units" (msec, ...) while we would use it here to report non time-based statistics
> => we
> > would need to add some other "units". We could add an "empty" unit, to avoid having to require OVS changes
> > everytime we need different statistics (today flows, tomorrow something else).
> > - more important, stopwatch does not provide the total count we are looking for (i.e. the sum of all samples). We
> could
> > add a "Total" or a "Sum" to stopwatch, but this would have changed the stopwatch outputs for all existing
> counters. I
> > felt that this might be perceived as an API change or a compatibility issue.
> >
> > Hence I was proposing to only export the percentile related function, to avoid any change to OVS which might
> impact
> > backward compatibility.
> > But I would be happy to modify the stopwatch, adding the "Sum" and the empty unit. This would simplify OVN
> code as
> > well compared to what I was initially proposing.
> > Adding stopwatch_add_sample(...) would not change much in this case.
>
> I see - what you're looking for is some kind of generic statistics -
> would the coverage counters work instead? We do keep some flow
> statistics, so I guess it's useful to understand what you're looking
> to track. Maybe 'stopwatch' might not work perfectly, or maybe we can
> make some changes to make it seem more generic.
>
> > Thanks
> > Xavier
> >
> > On Wed, Oct 13, 2021 at 6:42 PM Aaron Conole <aconole at redhat.com> wrote:
> >
> > Xavier Simonart <xsimonar at redhat.com> writes:
> >
> > > export calc_percentile function (and percentile struct) so that
> > > it can be used in other libraries such as OVN.
> > >
> > > Signed-off-by: Xavier Simonart <xsimonar at redhat.com>
> > > ---
> >
> > What is the intent to use this in other libraries? It would be nice to
> > understand why just running the existing stopwatch API doesn't work
> > (maybe you cannot start samples). AFAIK, OVN uses the stopwatch API
> > just fine?
> >
> > Perhaps it could make sense to expose the functionality in a different
> > fashion like:
> >
> > void stopwatch_add_sample(const char *, unsigned long long);
> >
> > This seems a useful API and doesn't expose all of the internal
> > information, but I don't know if it really is needed. Can you expand
> > why you need calc_percentile exposed?
> >
> > > lib/stopwatch.c | 24 +-----------------------
> > > lib/stopwatch.h | 26 ++++++++++++++++++++++++++
> > > 2 files changed, 27 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/lib/stopwatch.c b/lib/stopwatch.c
> > > index f5602163b..003c3a05f 100644
> > > --- a/lib/stopwatch.c
> > > +++ b/lib/stopwatch.c
> > > @@ -35,25 +35,6 @@ struct average {
> > > double alpha; /* Weight given to new samples */
> > > };
> > >
> > > -#define MARKERS 5
> > > -
> > > -/* Number of samples to collect before reporting P-square calculated
> > > - * percentile
> > > - */
> > > -#define P_SQUARE_MIN 50
> > > -
> > > -/* The naming of these fields is based on the naming used in the
> > > - * P-square algorithm paper.
> > > - */
> > > -struct percentile {
> > > - int n[MARKERS];
> > > - double n_prime[MARKERS];
> > > - double q[MARKERS];
> > > - double dn[MARKERS];
> > > - unsigned long long samples[P_SQUARE_MIN];
> > > - double percentile;
> > > -};
> > > -
> > > struct stopwatch {
> > > enum stopwatch_units units;
> > > unsigned long long n_samples;
> > > @@ -107,10 +88,7 @@ comp_samples(const void *left, const void *right)
> > > return *right_d > *left_d ? -1 : *right_d < *left_d;
> > > }
> > >
> > > -/* Calculate the percentile using the P-square algorithm. For more
> > > - * information, see https://www1.cse.wustl.edu/~jain/papers/ftp/psqr.pdf
> > > - */
> > > -static void
> > > +void
> > > calc_percentile(unsigned long long n_samples, struct percentile *pctl,
> > > unsigned long long new_sample)
> > > {
> > > diff --git a/lib/stopwatch.h b/lib/stopwatch.h
> > > index 91abd64e4..efb9a9e8a 100644
> > > --- a/lib/stopwatch.h
> > > +++ b/lib/stopwatch.h
> > > @@ -36,6 +36,32 @@ struct stopwatch_stats {
> > > (alpha 0.01). */
> > > };
> > >
> > > +#define MARKERS 5
> > > +
> > > +/* Number of samples to collect before reporting P-square calculated
> > > + * percentile
> > > + */
> > > +#define P_SQUARE_MIN 50
> > > +
> > > +/* The naming of these fields is based on the naming used in the
> > > + * P-square algorithm paper.
> > > + */
> > > +struct percentile {
> > > + int n[MARKERS];
> > > + double n_prime[MARKERS];
> > > + double q[MARKERS];
> > > + double dn[MARKERS];
> > > + unsigned long long samples[P_SQUARE_MIN];
> > > + double percentile;
> > > +};
> > > +
> > > +/* Calculate the percentile using the P-square algorithm. For more
> > > + * information, see https://www1.cse.wustl.edu/~jain/papers/ftp/psqr.pdf
> > > + */
> > > +void
> > > +calc_percentile(unsigned long long n_samples, struct percentile *pctl,
> > > + unsigned long long new_sample);
> > > +
> > > /* Create a new stopwatch.
> > > * The "units" are not used for any calculations but are printed when
> > > * statistics are requested.
More information about the dev
mailing list