[ovs-dev] [PATCH ovn] inc-proc-eng: Improve debug logging.

Dumitru Ceara dceara at redhat.com
Fri Jul 2 13:56:03 UTC 2021


On 7/2/21 1:36 PM, Dumitru Ceara wrote:
> On 7/2/21 1:32 PM, Mark Gray wrote:
>> On 02/07/2021 10:07, Dumitru Ceara wrote:
>>> Time how long change/run handlers take and log this at debug level.
>>> I-P engine debug logs are not so verbose so enabling them is quite
>>> common when debugging scale/control plane latency related issues.
>>>
>>> One of the major missing pieces though was a log about how long each I-P
>>> node change/run handler took to run.  This commit adds it and also logs
>>> a message when a node is recomputed because it misses an input change
>>> handler.
>>>
>>> Signed-off-by: Dumitru Ceara <dceara at redhat.com>
>>> ---
>>>  lib/inc-proc-eng.c | 16 +++++++++++++++-
>>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/inc-proc-eng.c b/lib/inc-proc-eng.c
>>> index c349efb22..bb96ef1a3 100644
>>> --- a/lib/inc-proc-eng.c
>>> +++ b/lib/inc-proc-eng.c
>>> @@ -27,6 +27,7 @@
>>>  #include "openvswitch/hmap.h"
>>>  #include "openvswitch/vlog.h"
>>>  #include "inc-proc-eng.h"
>>> +#include "timeval.h"
>>>  #include "unixctl.h"
>>>  
>>>  VLOG_DEFINE_THIS_MODULE(inc_proc_eng);
>>> @@ -333,8 +334,10 @@ engine_recompute(struct engine_node *node, bool forced, bool allowed)
>>>      }
>>>  
>>>      /* Run the node handler which might change state. */
>>> +    long long int now = time_msec();
>>>      node->run(node, node->data);
>>>      node->stats.recompute++;
>>> +    VLOG_DBG("node: %s, recompute took %lldms", node->name, time_msec() - now);
>>>  }
>>>  
>>>  /* Return true if the node could be computed, false otherwise. */
>>> @@ -350,7 +353,12 @@ engine_compute(struct engine_node *node, bool recompute_allowed)
>>>              /* If the input change can't be handled incrementally, run
>>>               * the node handler.
>>>               */
>>> -            if (!node->inputs[i].change_handler(node, node->data)) {
>>> +            long long int now = time_msec();
>>> +            bool handled = node->inputs[i].change_handler(node, node->data);
>>> +            VLOG_DBG("node: %s, handling change for input %s took %lldms",
>>> +                     node->name, node->inputs[i].node->name,
>>> +                     time_msec() - now);
>>
>> Why not add a stopwatch?
>>
> 
> Because I keep forgetting they exist. :)
> 
> I'll switch to stopwatches in v2, thanks for the review!
> 

After giving it a try and looking at the stopwatch implementation closer
I'm starting to think that it might be better to keep the logs as in
this patch instead of using stopwatch.  Some reasons:

1. For each incremental processing engine node we'll be creating:
- 1 stopwatch for its run callback
- 1 stopwatch for each of its input change handlers

This will result in quite a few stopwatches, we currently have ~30 nodes
in the I-P graph.  stopwatch/show, while dumping useful information in
general, would become extremely verbose.  Maybe that's not a blocking
issue though, however..

2. There's no way to get the value of a specific stopwatch sample.  In
the case of the I-P engine run times of the same callback vary a lot
based on what changes happen in the Southbound.  It's true that we could
enable DBG logs for inc_proc_eng and then look at timestamps for each
node and (kind of) deduce how long each handler took.  But if we're
enabling logs, why not make it easier for the engineer looking at them
and just log how long the callbacks took?

3. This is probably minor, but stopwatch_create/stop lock a mutex.
There might not be huge contention between the main thread and the
stopwatch but still.

Based on the above I won't be sending a v2 for now and I'll move back
the v1 in patchwork to "New" to continue the discussion here.

Thanks,
Dumitru



More information about the dev mailing list