[ovs-dev] [PATCH] ovn-controller: Fix flow installation latency caused by recompute.

Mon Aug 5 15:57:14 UTC 2019

I resent it for backporting to 2.12 as a series followed by a fix from
Dumitru, which has been merged in OVN repo, too.
https://patchwork.ozlabs.org/project/openvswitch/list/?series=123382

Thanks Dumitru again for the fix.

On Tue, Jul 30, 2019 at 12:16 AM Han Zhou <zhouhan at gmail.com> wrote:

> Thanks Mark and Numan for the review.
> Ben, now this is merged to OVN repo. Could you help backporting it to 2.12?
>
> Thanks,
> Han
>
>
> On Thu, Jul 25, 2019 at 1:23 PM Mark Michelson <mmichels at redhat.com>
> wrote:
> >
> > Forgot the e-mail address after my name :)
> >
> > Acked-by: Mark Michelson <mmichels at redhat.com>
> >
> > On 7/25/19 4:16 PM, Mark Michelson wrote:
> > > Acked-by: Mark Michelson
> > >
> > > Thanks for the informative commit message!
> > >
> > > On 7/23/19 10:52 PM, Han Zhou wrote:
> > >> From: Han Zhou <hzhou8 at ebay.com>
> > >>
> > >> When there are in-flight flow-installations pending to ovs-vswitchd,
> > >> current incremental processing logic prioritizes new change handling.
> > >> However, in scenarios where frequent recomputes are triggered, the
> > >> later recompute would block the flow-installation for previously
> > >> computed flows because recompute usually takes long time, especially
> > >> when there are large number of flows. This results in worse latency
> > >> than the version without incremental processing in specific scale
> > >> test scenarios.
> > >>
> > >> While we can simply fix the problem by prioritizing flow installation
> > >> rather than new change handling, it can cause the incremental
> > >> processing to degrade to always recompute in certain scenarios when
> > >> there are some changes triggering recomputes, followed by a lot of
> > >> continously coming changes that can be handled incrementally. Because
> > >> OVSDB change tracker cannot preserve changes across iterations, once
> > >> the recompute is triggered and resulted in a lot of pending messages
> > >> to ovs-vswitchd, and if we choose to skip the engine_run()
> > >> in the next iteration when a incrementally processible change comes,
> > >> we miss the opportunity to handle that tracked change and will have
> > >> to trigger recompute again in the next next iteration, and so on, if
> > >> such changes come continously.
> > >>
> > >> This patch solves the problem by introducing
> > >> engine_set_abort_recompute(),
> > >> so that we can prioritize new change handling if the change can be
> > >> incrementally processed, but if the change triggers recompute, we
> > >> abort there without spending CPU on the recompute to avoid blocking
> > >> the previous computed flow installation.
> > >>
> > >> Reported-by: Daniel Alvarez Sanchez <dalvarez at redhat.com>
> > >> Reported-by: Numan Siddique <nusiddiq at redhat.com>
> > >> Reported-at:
> > >>
> https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048822.html
> > >> Tested-by: Numan Siddique <nusiddiq at redhat.com>
> > >> Signed-off-by: Han Zhou <hzhou8 at ebay.com>
> > >> ---
> > >>   ovn/controller/ofctrl.c         |  2 +-
> > >>   ovn/controller/ofctrl.h         |  1 +
> > >>   ovn/controller/ovn-controller.c | 30 +++++++++++++++++++++++++++---
> > >>   ovn/lib/inc-proc-eng.c          | 26 ++++++++++++++++++++++----
> > >>   ovn/lib/inc-proc-eng.h          |  9 +++++++--
> > >>   5 files changed, 58 insertions(+), 10 deletions(-)
> > >>
> > >> diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
> > >> index 043abd6..0fcaa72 100644
> > >> --- a/ovn/controller/ofctrl.c
> > >> +++ b/ovn/controller/ofctrl.c
> > >> @@ -985,7 +985,7 @@ add_meter(struct ovn_extend_table_info *m_desired,
> > >>    * in the correct state and not backlogged with existing flow_mods.
> > >> (Our
> > >>    * criteria for being backlogged appear very conservative, but the
> > >> socket
> > >>    * between ovn-controller and OVS provides some buffering.) */
> > >> -static bool
> > >> +bool
> > >>   ofctrl_can_put(void)
> > >>   {
> > >>       if (state != S_UPDATE_FLOWS
> > >> diff --git a/ovn/controller/ofctrl.h b/ovn/controller/ofctrl.h
> > >> index ed8918a..2b21c11 100644
> > >> --- a/ovn/controller/ofctrl.h
> > >> +++ b/ovn/controller/ofctrl.h
> > >> @@ -51,6 +51,7 @@ void ofctrl_put(struct ovn_desired_flow_table *,
> > >>                   const struct sbrec_meter_table *,
> > >>                   int64_t nb_cfg,
> > >>                   bool flow_changed);
> > >> +bool ofctrl_can_put(void);
> > >>   void ofctrl_wait(void);
> > >>   void ofctrl_destroy(void);
> > >>   int64_t ofctrl_get_cur_cfg(void);
> > >> diff --git a/ovn/controller/ovn-controller.c
> > >> b/ovn/controller/ovn-controller.c
> > >> index c4883aa..866cc1c 100644
> > >> --- a/ovn/controller/ovn-controller.c
> > >> +++ b/ovn/controller/ovn-controller.c
> > >> @@ -1869,6 +1869,7 @@ main(int argc, char *argv[])
> > >>       uint64_t engine_run_id = 0;
> > >>       uint64_t old_engine_run_id = 0;
> > >> +    bool engine_aborted = false;
> > >>       unsigned int ovs_cond_seqno = UINT_MAX;
> > >>       unsigned int ovnsb_cond_seqno = UINT_MAX;
> > >> @@ -1955,7 +1956,30 @@ main(int argc, char *argv[])
> > >>                       stopwatch_start(CONTROLLER_LOOP_STOPWATCH_NAME,
> > >>                                       time_msec());
> > >>                       if (ovnsb_idl_txn) {
> > >> -                        engine_run(&en_flow_output, ++engine_run_id);
> > >> +                        if (!ofctrl_can_put()) {
> > >> +                            /* When there are in-flight messages
> > >> pending to
> > >> +                             * ovs-vswitchd, we should hold on
> > >> recomputing so
> > >> +                             * that the previous flow installations
> > >> won't be
> > >> +                             * delayed.  However, we still want to
> > >> try if
> > >> +                             * recompute is not needed and we can
> > >> quickly
> > >> +                             * incrementally process the new changes,
> > >> to avoid
> > >> +                             * unnecessarily forced recomputes later
> > >> on.  This
> > >> +                             * is because the OVSDB change tracker
> > >> cannot
> > >> +                             * preserve tracked changes across
> > >> iterations.  If
> > >> +                             * change tracking is improved, we can
> > >> simply skip
> > >> +                             * this round of engine_run and continue
> > >> processing
> > >> +                             * acculated changes incrementally later
> > >> when
> > >> +                             * ofctrl_can_put() returns true. */
> > >> +                            if (!engine_aborted) {
> > >> +                                engine_set_abort_recompute(true);
> > >> +                                engine_aborted =
> > >> engine_run(&en_flow_output,
> > >> +
> > >> ++engine_run_id);
> > >> +                            }
> > >> +                        } else {
> > >> +                            engine_set_abort_recompute(false);
> > >> +                            engine_aborted = false;
> > >> +                            engine_run(&en_flow_output,
> > >> ++engine_run_id);
> > >> +                        }
> > >>                       }
> > >>                       stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME,
> > >>                                      time_msec());
> > >> @@ -1993,8 +2017,8 @@ main(int argc, char *argv[])
> > >>                   }
> > >>               }
> > >> -            if (old_engine_run_id == engine_run_id) {
> > >> -                if (engine_need_run(&en_flow_output)) {
> > >> +            if (old_engine_run_id == engine_run_id ||
> engine_aborted) {
> > >> +                if (engine_aborted ||
> > >> engine_need_run(&en_flow_output)) {
> > >>                       VLOG_DBG("engine did not run, force recompute
> > >> next time: "
> > >>                                "br_int %p, chassis %p", br_int,
> chassis);
> > >>                       engine_set_force_recompute(true);
> > >> diff --git a/ovn/lib/inc-proc-eng.c b/ovn/lib/inc-proc-eng.c
> > >> index 1ddea1a..1064a08 100644
> > >> --- a/ovn/lib/inc-proc-eng.c
> > >> +++ b/ovn/lib/inc-proc-eng.c
> > >> @@ -31,6 +31,7 @@
> > >>   VLOG_DEFINE_THIS_MODULE(inc_proc_eng);
> > >>   static bool engine_force_recompute = false;
> > >> +static bool engine_abort_recompute = false;
> > >>   static const struct engine_context *engine_context;
> > >>   void
> > >> @@ -39,6 +40,12 @@ engine_set_force_recompute(bool val)
> > >>       engine_force_recompute = val;
> > >>   }
> > >> +void
> > >> +engine_set_abort_recompute(bool val)
> > >> +{
> > >> +    engine_abort_recompute = val;
> > >> +}
> > >> +
> > >>   const struct engine_context *
> > >>   engine_get_context(void)
> > >>   {
> > >> @@ -121,11 +128,11 @@ engine_ovsdb_node_add_index(struct engine_node
> > >> *node, const char *name,
> > >>       ed->n_indexes ++;
> > >>   }
> > >> -void
> > >> +bool
> > >>   engine_run(struct engine_node *node, uint64_t run_id)
> > >>   {
> > >>       if (node->run_id == run_id) {
> > >> -        return;
> > >> +        return true;
> > >>       }
> > >>       node->run_id = run_id;
> > >> @@ -133,11 +140,13 @@ engine_run(struct engine_node *node, uint64_t
> > >> run_id)
> > >>       if (!node->n_inputs) {
> > >>           node->run(node);
> > >>           VLOG_DBG("node: %s, changed: %d", node->name,
> node->changed);
> > >> -        return;
> > >> +        return true;
> > >>       }
> > >>       for (size_t i = 0; i < node->n_inputs; i++) {
> > >> -        engine_run(node->inputs[i].node, run_id);
> > >> +        if (!engine_run(node->inputs[i].node, run_id)) {
> > >> +            return false;
> > >> +        }
> > >>       }
> > >>       bool need_compute = false;
> > >> @@ -160,6 +169,10 @@ engine_run(struct engine_node *node, uint64_t
> > >> run_id)
> > >>       if (need_recompute) {
> > >>           VLOG_DBG("node: %s, recompute (%s)", node->name,
> > >>                    engine_force_recompute ? "forced" : "triggered");
> > >> +        if (engine_abort_recompute) {
> > >> +            VLOG_DBG("node: %s, recompute aborted", node->name);
> > >> +            return false;
> > >> +        }
> > >>           node->run(node);
> > >>       } else if (need_compute) {
> > >>           for (size_t i = 0; i < node->n_inputs; i++) {
> > >> @@ -170,6 +183,10 @@ engine_run(struct engine_node *node, uint64_t
> > >> run_id)
> > >>                       VLOG_DBG("node: %s, can't handle change for
> > >> input %s, "
> > >>                                "fall back to recompute",
> > >>                                node->name,
> node->inputs[i].node->name);
> > >> +                    if (engine_abort_recompute) {
> > >> +                        VLOG_DBG("node: %s, recompute aborted",
> > >> node->name);
> > >> +                        return false;
> > >> +                    }
> > >>                       node->run(node);
> > >>                       break;
> > >>                   }
> > >> @@ -178,6 +195,7 @@ engine_run(struct engine_node *node, uint64_t
> run_id)
> > >>       }
> > >>       VLOG_DBG("node: %s, changed: %d", node->name, node->changed);
> > >> +    return true;
> > >>   }
> > >>   bool
> > >> diff --git a/ovn/lib/inc-proc-eng.h b/ovn/lib/inc-proc-eng.h
> > >> index aab899e..c3d7b5e 100644
> > >> --- a/ovn/lib/inc-proc-eng.h
> > >> +++ b/ovn/lib/inc-proc-eng.h
> > >> @@ -121,8 +121,9 @@ struct engine_node {
> > >>   void engine_init(struct engine_node *);
> > >>   /* Execute the processing recursively, which should be called in the
> > >> main
> > >> - * loop. */
> > >> -void engine_run(struct engine_node *, uint64_t run_id);
> > >> + * loop. Returns true if the execution is compelte, false if it is
> > >> aborted,
> > >> + * which could happen when engine_abort_recompute is set. */
> > >> +bool engine_run(struct engine_node *, uint64_t run_id);
> > >>   /* Clean up the data for the engine nodes recursively. It calls each
> > >> node's
> > >>    * cleanup() method if not NULL. It should be called before the
> program
> > >> @@ -150,6 +151,10 @@ void engine_add_input(struct engine_node *node,
> > >> struct engine_node *input,
> > >>    * iteration, and the change can't be tracked across iterations */
> > >>   void engine_set_force_recompute(bool val);
> > >> +/* Set the flag to cause engine execution to be aborted when there
> > >> + * is any recompute to be triggered in any node. */
> > >> +void engine_set_abort_recompute(bool val);
> > >> +
> > >>   const struct engine_context * engine_get_context(void);
> > >>   void engine_set_context(const struct engine_context *);
> > >>
> > >
> >
>