[ovs-dev] [PATCH ovn] inc-proc-eng: Improve debug logging.

Dumitru Ceara dceara at redhat.com
Tue Jul 6 13:49:00 UTC 2021


On 7/2/21 8:39 PM, Han Zhou wrote:
> On Fri, Jul 2, 2021 at 2:08 AM Dumitru Ceara <dceara at redhat.com> wrote:
>>
>> Time how long change/run handlers take and log this at debug level.
>> I-P engine debug logs are not so verbose so enabling them is quite
>> common when debugging scale/control plane latency related issues.
>>
>> One of the major missing pieces though was a log about how long each I-P
>> node change/run handler took to run.  This commit adds it and also logs
>> a message when a node is recomputed because it misses an input change
>> handler.
>>
>> Signed-off-by: Dumitru Ceara <dceara at redhat.com>
> 
> Thanks Dumitru. This will be very helpful. I have just some minor
> suggestions.
> 

Thanks for reviewing this!

> 
>> ---
>>  lib/inc-proc-eng.c | 16 +++++++++++++++-
>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/inc-proc-eng.c b/lib/inc-proc-eng.c
>> index c349efb22..bb96ef1a3 100644
>> --- a/lib/inc-proc-eng.c
>> +++ b/lib/inc-proc-eng.c
>> @@ -27,6 +27,7 @@
>>  #include "openvswitch/hmap.h"
>>  #include "openvswitch/vlog.h"
>>  #include "inc-proc-eng.h"
>> +#include "timeval.h"
>>  #include "unixctl.h"
>>
>>  VLOG_DEFINE_THIS_MODULE(inc_proc_eng);
>> @@ -333,8 +334,10 @@ engine_recompute(struct engine_node *node, bool
> forced, bool allowed)
>>      }
>>
>>      /* Run the node handler which might change state. */
>> +    long long int now = time_msec();
>>      node->run(node, node->data);
>>      node->stats.recompute++;
>> +    VLOG_DBG("node: %s, recompute took %lldms", node->name, time_msec()
> - now);
> 
> We already log an entry at the beginning of this function. I think it is
> better to move that one here and combine with the time spent into just one
> log.
> 
> Secondly, the log in the beginning prints if it is triggered or forced, but
> as you mentioned there can be two cases for "triggered" and we didn't have
> an extra log to tell when it is triggered by missing a handler, I'd suggest
> instead of adding another log, we can change the "forced" argument to
> "reason" so we can log the reason directly in the same log entry in this
> function, and remove the log for calling this function. We need to add
> another argument to this function to tell the current input that triggers
> this recompute. It can be something like:
> 
> "node: abc, recompute (forced/missing handler/failed handler), current
> input: xyz, time spent: ..."
> 

Ack.

>>  }
>>
>>  /* Return true if the node could be computed, false otherwise. */
>> @@ -350,7 +353,12 @@ engine_compute(struct engine_node *node, bool
> recompute_allowed)
>>              /* If the input change can't be handled incrementally, run
>>               * the node handler.
>>               */
>> -            if (!node->inputs[i].change_handler(node, node->data)) {
>> +            long long int now = time_msec();
>> +            bool handled = node->inputs[i].change_handler(node,
> node->data);
>> +            VLOG_DBG("node: %s, handling change for input %s took
> %lldms",
>> +                     node->name, node->inputs[i].node->name,
>> +                     time_msec() - now);
>> +            if (!handled) {
>>                  VLOG_DBG("node: %s, can't handle change for input %s, "
>>                           "fall back to recompute",
>>                           node->name, node->inputs[i].node->name);
>> @@ -369,8 +377,11 @@ engine_run_node(struct engine_node *node, bool
> recompute_allowed)
>>  {
>>      if (!node->n_inputs) {
>>          /* Run the node handler which might change state. */
>> +        long long int now = time_msec();
>>          node->run(node, node->data);
>>          node->stats.recompute++;
>> +        VLOG_DBG("node: %s, recompute took %lldms", node->name,
>> +                 time_msec() - now);
> 
> This is a pure input node, so it's better not calling it "recompute". Since
> they are evaluated in every iteration and usually very fast, I wonder if we
> should avoid this log, which could be just noise.
> 

Makes sense.

>>          return;
>>      }
>>
>> @@ -389,6 +400,9 @@ engine_run_node(struct engine_node *node, bool
> recompute_allowed)
>>
>>              /* Trigger a recompute if we don't have a change handler. */
>>              if (!node->inputs[i].change_handler) {
>> +                VLOG_DBG("node %s, no handler for input %s, "
>> +                         "fall back to recompute",
>> +                         node->name, node->inputs[i].node->name);
> 
> This is the one I mentioned above that could be avoided by passing the
> reason and current input node name to the function call and get logged in
> engine_recompute().
> 

Right.

I sent v2:
http://patchwork.ozlabs.org/project/ovn/patch/20210706134559.2335-1-dceara@redhat.com/

> Thanks,
> Han
> 

Thanks,
Dumitru



More information about the dev mailing list