[ovs-dev] [PATCH] ofproto-dpif-upcall: Log the emergency flow flush.

Flavio Leitner fbl at sysclose.org
Tue Sep 29 20:09:30 UTC 2020


On Tue, Sep 29, 2020 at 05:27:31PM +0200, Ilya Maximets wrote:
> On 9/15/20 9:21 PM, Flavio Leitner wrote:
> > When the number of flows in the datapath reaches twice the
> > maximum, revalidators will delete all flows as an emergency
> > action to recover. In that case, log a message with values
> > and increase a coverage counter.
> > 
> > Signed-off-by: Flavio Leitner <fbl at sysclose.org>
> > ---
> >  ofproto/ofproto-dpif-upcall.c | 15 ++++++++++++++-
> >  1 file changed, 14 insertions(+), 1 deletion(-)
> > 
> > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> > index 05a912f57..12e94287a 100644
> > --- a/ofproto/ofproto-dpif-upcall.c
> > +++ b/ofproto/ofproto-dpif-upcall.c
> > @@ -57,6 +57,7 @@ COVERAGE_DEFINE(upcall_ukey_contention);
> >  COVERAGE_DEFINE(upcall_ukey_replace);
> >  COVERAGE_DEFINE(revalidate_missed_dp_flow);
> >  COVERAGE_DEFINE(upcall_flow_limit_hit);
> > +COVERAGE_DEFINE(upcall_flow_limit_kill);
> >  
> >  /* A thread that reads upcalls from dpif, forwards each upcall's packet,
> >   * and possibly sets up a kernel flow as a cache. */
> > @@ -2622,6 +2623,7 @@ revalidate(struct revalidator *revalidator)
> >  
> >          long long int max_idle;
> >          long long int now;
> > +        size_t kill_all_limit;
> >          size_t n_dp_flows;
> >          bool kill_them_all;
> >  
> > @@ -2649,7 +2651,18 @@ revalidate(struct revalidator *revalidator)
> >              COVERAGE_INC(upcall_flow_limit_hit);
> >          }
> >  
> > -        kill_them_all = n_dp_flows > flow_limit * 2;
> > +        kill_them_all = false;
> > +        kill_all_limit = flow_limit * 2;
> > +        if (OVS_UNLIKELY(n_dp_flows > kill_all_limit)) {
> > +            static struct vlog_rate_limit rlem = VLOG_RATE_LIMIT_INIT(1, 5);
> > +
> > +            VLOG_WARN_RL(&rlem, "Emergency: deleting all flows "
> > +                         "(now: %"PRIuSIZE", max: %"PRIuSIZE")",
> 
> Almost he same concern here as in the other patch.  But this message is even
> more misleading since it prints doubled dynamically adjusted limit which is
> OK to print, but how to use this value in practice if you do not know internals
> of this code?
> 
> Maybe something like "Number of datapath flows (%PRIuSIZE) twice as high as
> current dynamic flow limit (%PRIuSIZE).  Deleting all flows as an emergency
> measure."?
> 
> Another thing is that this message is actually misleading anyway because
> revalidators recalculates the number of flows for each batch (50) of dumped
> flows (every 100ms, to be honest), so we will not actually flush all the
> flows in many cases.  We will remove them until situation is normalized.
> 
> So, probably:
> "Number of datapath flows (%PRIuSIZE) twice as high as current dynamic flow
> limit (%PRIuSIZE).  Starting to delete flows unconditionally as an emergency
> measure."?
> 
> Also, this message should be printed only once per revalidation cycle, I guess,
> not for every flow batch.
> 
> > +                         n_dp_flows, kill_all_limit);
> > +            COVERAGE_INC(upcall_flow_limit_kill);
> > +            kill_them_all = true;
> > +        }
> > +
> >          max_idle = n_dp_flows > flow_limit ? 100 : ofproto_max_idle;
> >  
> >          udpif->dpif->current_ms = time_msec();
> > 
> 

Sent v2:
https://mail.openvswitch.org/pipermail/ovs-dev/2020-September/375647.html

-- 
fbl


More information about the dev mailing list