[ovs-dev] [PATCH] ofproto-dpif-upcall: Log the emergency flow flush.

Ilya Maximets i.maximets at ovn.org
Tue Sep 29 15:27:31 UTC 2020


On 9/15/20 9:21 PM, Flavio Leitner wrote:
> When the number of flows in the datapath reaches twice the
> maximum, revalidators will delete all flows as an emergency
> action to recover. In that case, log a message with values
> and increase a coverage counter.
> 
> Signed-off-by: Flavio Leitner <fbl at sysclose.org>
> ---
>  ofproto/ofproto-dpif-upcall.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 05a912f57..12e94287a 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -57,6 +57,7 @@ COVERAGE_DEFINE(upcall_ukey_contention);
>  COVERAGE_DEFINE(upcall_ukey_replace);
>  COVERAGE_DEFINE(revalidate_missed_dp_flow);
>  COVERAGE_DEFINE(upcall_flow_limit_hit);
> +COVERAGE_DEFINE(upcall_flow_limit_kill);
>  
>  /* A thread that reads upcalls from dpif, forwards each upcall's packet,
>   * and possibly sets up a kernel flow as a cache. */
> @@ -2622,6 +2623,7 @@ revalidate(struct revalidator *revalidator)
>  
>          long long int max_idle;
>          long long int now;
> +        size_t kill_all_limit;
>          size_t n_dp_flows;
>          bool kill_them_all;
>  
> @@ -2649,7 +2651,18 @@ revalidate(struct revalidator *revalidator)
>              COVERAGE_INC(upcall_flow_limit_hit);
>          }
>  
> -        kill_them_all = n_dp_flows > flow_limit * 2;
> +        kill_them_all = false;
> +        kill_all_limit = flow_limit * 2;
> +        if (OVS_UNLIKELY(n_dp_flows > kill_all_limit)) {
> +            static struct vlog_rate_limit rlem = VLOG_RATE_LIMIT_INIT(1, 5);
> +
> +            VLOG_WARN_RL(&rlem, "Emergency: deleting all flows "
> +                         "(now: %"PRIuSIZE", max: %"PRIuSIZE")",

Almost he same concern here as in the other patch.  But this message is even
more misleading since it prints doubled dynamically adjusted limit which is
OK to print, but how to use this value in practice if you do not know internals
of this code?

Maybe something like "Number of datapath flows (%PRIuSIZE) twice as high as
current dynamic flow limit (%PRIuSIZE).  Deleting all flows as an emergency
measure."?

Another thing is that this message is actually misleading anyway because
revalidators recalculates the number of flows for each batch (50) of dumped
flows (every 100ms, to be honest), so we will not actually flush all the
flows in many cases.  We will remove them until situation is normalized.

So, probably:
"Number of datapath flows (%PRIuSIZE) twice as high as current dynamic flow
limit (%PRIuSIZE).  Starting to delete flows unconditionally as an emergency
measure."?

Also, this message should be printed only once per revalidation cycle, I guess,
not for every flow batch.

> +                         n_dp_flows, kill_all_limit);
> +            COVERAGE_INC(upcall_flow_limit_kill);
> +            kill_them_all = true;
> +        }
> +
>          max_idle = n_dp_flows > flow_limit ? 100 : ofproto_max_idle;
>  
>          udpif->dpif->current_ms = time_msec();
> 



More information about the dev mailing list