[ovs-dev] [PATCH ovn] inc-proc-eng: Improve debug logging.
Dumitru Ceara
dceara at redhat.com
Tue Jul 6 13:49:00 UTC 2021
On 7/2/21 8:39 PM, Han Zhou wrote:
> On Fri, Jul 2, 2021 at 2:08 AM Dumitru Ceara <dceara at redhat.com> wrote:
>>
>> Time how long change/run handlers take and log this at debug level.
>> I-P engine debug logs are not so verbose so enabling them is quite
>> common when debugging scale/control plane latency related issues.
>>
>> One of the major missing pieces though was a log about how long each I-P
>> node change/run handler took to run. This commit adds it and also logs
>> a message when a node is recomputed because it misses an input change
>> handler.
>>
>> Signed-off-by: Dumitru Ceara <dceara at redhat.com>
>
> Thanks Dumitru. This will be very helpful. I have just some minor
> suggestions.
>
Thanks for reviewing this!
>
>> ---
>> lib/inc-proc-eng.c | 16 +++++++++++++++-
>> 1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/inc-proc-eng.c b/lib/inc-proc-eng.c
>> index c349efb22..bb96ef1a3 100644
>> --- a/lib/inc-proc-eng.c
>> +++ b/lib/inc-proc-eng.c
>> @@ -27,6 +27,7 @@
>> #include "openvswitch/hmap.h"
>> #include "openvswitch/vlog.h"
>> #include "inc-proc-eng.h"
>> +#include "timeval.h"
>> #include "unixctl.h"
>>
>> VLOG_DEFINE_THIS_MODULE(inc_proc_eng);
>> @@ -333,8 +334,10 @@ engine_recompute(struct engine_node *node, bool
> forced, bool allowed)
>> }
>>
>> /* Run the node handler which might change state. */
>> + long long int now = time_msec();
>> node->run(node, node->data);
>> node->stats.recompute++;
>> + VLOG_DBG("node: %s, recompute took %lldms", node->name, time_msec()
> - now);
>
> We already log an entry at the beginning of this function. I think it is
> better to move that one here and combine with the time spent into just one
> log.
>
> Secondly, the log in the beginning prints if it is triggered or forced, but
> as you mentioned there can be two cases for "triggered" and we didn't have
> an extra log to tell when it is triggered by missing a handler, I'd suggest
> instead of adding another log, we can change the "forced" argument to
> "reason" so we can log the reason directly in the same log entry in this
> function, and remove the log for calling this function. We need to add
> another argument to this function to tell the current input that triggers
> this recompute. It can be something like:
>
> "node: abc, recompute (forced/missing handler/failed handler), current
> input: xyz, time spent: ..."
>
Ack.
>> }
>>
>> /* Return true if the node could be computed, false otherwise. */
>> @@ -350,7 +353,12 @@ engine_compute(struct engine_node *node, bool
> recompute_allowed)
>> /* If the input change can't be handled incrementally, run
>> * the node handler.
>> */
>> - if (!node->inputs[i].change_handler(node, node->data)) {
>> + long long int now = time_msec();
>> + bool handled = node->inputs[i].change_handler(node,
> node->data);
>> + VLOG_DBG("node: %s, handling change for input %s took
> %lldms",
>> + node->name, node->inputs[i].node->name,
>> + time_msec() - now);
>> + if (!handled) {
>> VLOG_DBG("node: %s, can't handle change for input %s, "
>> "fall back to recompute",
>> node->name, node->inputs[i].node->name);
>> @@ -369,8 +377,11 @@ engine_run_node(struct engine_node *node, bool
> recompute_allowed)
>> {
>> if (!node->n_inputs) {
>> /* Run the node handler which might change state. */
>> + long long int now = time_msec();
>> node->run(node, node->data);
>> node->stats.recompute++;
>> + VLOG_DBG("node: %s, recompute took %lldms", node->name,
>> + time_msec() - now);
>
> This is a pure input node, so it's better not calling it "recompute". Since
> they are evaluated in every iteration and usually very fast, I wonder if we
> should avoid this log, which could be just noise.
>
Makes sense.
>> return;
>> }
>>
>> @@ -389,6 +400,9 @@ engine_run_node(struct engine_node *node, bool
> recompute_allowed)
>>
>> /* Trigger a recompute if we don't have a change handler. */
>> if (!node->inputs[i].change_handler) {
>> + VLOG_DBG("node %s, no handler for input %s, "
>> + "fall back to recompute",
>> + node->name, node->inputs[i].node->name);
>
> This is the one I mentioned above that could be avoided by passing the
> reason and current input node name to the function call and get logged in
> engine_recompute().
>
Right.
I sent v2:
http://patchwork.ozlabs.org/project/ovn/patch/20210706134559.2335-1-dceara@redhat.com/
> Thanks,
> Han
>
Thanks,
Dumitru
More information about the dev
mailing list