[ovs-dev] [PATCH ovn] ovn-controller: Fix IP engine run with in-flight messages

Han Zhou zhouhan at gmail.com
Sun Aug 4 20:40:42 UTC 2019


On Fri, Aug 2, 2019 at 2:23 AM Dumitru Ceara <dceara at redhat.com> wrote:
>
> When trying to incrementally process changes even if there are in-flight
> messages we were incorrectly setting the engine_aborted variable to the
> value returned by engine_run. However, engine_run returns true if the
> run was completed.
>
> This was causing discrepancies between logical flows and openflow flows
> due to the fact that the rerun wasn't happening after an aborted run.
>
> In order to avoid confusion the engine_aborted variable is now renamed to
> engine_run_done thus avoiding the negated logic.
>
> CC: Han Zhou <hzhou8 at ebay.com>
> Fixes: a6b7d9f4f047 ("ovn-controller: Fix flow installation latency
caused by recompute.")
> Signed-off-by: Dumitru Ceara <dceara at redhat.com>
> ---
>  controller/ovn-controller.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index ad33d17..15b9f4e 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -1896,7 +1896,7 @@ main(int argc, char *argv[])
>
>      uint64_t engine_run_id = 0;
>      uint64_t old_engine_run_id = 0;
> -    bool engine_aborted = false;
> +    bool engine_run_done = true;
>
>      unsigned int ovs_cond_seqno = UINT_MAX;
>      unsigned int ovnsb_cond_seqno = UINT_MAX;
> @@ -1997,14 +1997,14 @@ main(int argc, char *argv[])
>                               * this round of engine_run and continue
processing
>                               * acculated changes incrementally later when
>                               * ofctrl_can_put() returns true. */
> -                            if (!engine_aborted) {
> +                            if (engine_run_done) {
>                                  engine_set_abort_recompute(true);
> -                                engine_aborted =
engine_run(&en_flow_output,
> -
 ++engine_run_id);
> +                                engine_run_done =
engine_run(&en_flow_output,
> +
++engine_run_id);
>                              }
>                          } else {
>                              engine_set_abort_recompute(false);
> -                            engine_aborted = false;
> +                            engine_run_done = true;
>                              engine_run(&en_flow_output, ++engine_run_id);
>                          }
>                      }
> @@ -2048,8 +2048,8 @@ main(int argc, char *argv[])
>                  }
>
>              }
> -            if (old_engine_run_id == engine_run_id || engine_aborted) {
> -                if (engine_aborted || engine_need_run(&en_flow_output)) {
> +            if (old_engine_run_id == engine_run_id || !engine_run_done) {
> +                if (!engine_run_done ||
engine_need_run(&en_flow_output)) {
>                      VLOG_DBG("engine did not run, force recompute next
time: "
>                               "br_int %p, chassis %p", br_int, chassis);
>                      engine_set_force_recompute(true);
> --
> 1.8.3.1
>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Acked-by: Han Zhou <hzhou8 at ebay.com>

Thanks Dumitru for this critical fix! It was my silly mistake. I tried to
reproduce the test case failure you reported, but it still didn't happen on
my laptop, even with -j10. Later we may need to add a test case that
triggers a lot of flows being installed so that the scenario with in-flight
messages are easily covered.

The original patch is still waiting to be backported to 2.12. I will send a
new version followed by your fix on 2.12.


More information about the dev mailing list