[ovs-dev] [PATCH] ovn-controller: Fix chassisredirect port flapping when ovs-vswitchd crashes

Mark Michelson mmichels at redhat.com
Mon Feb 4 15:40:15 UTC 2019


Acked-by: Mark Michelson <mmichels at redhat.com>

On 2/4/19 5:31 AM, nusiddiq at redhat.com wrote:
> From: Numan Siddique <nusiddiq at redhat.com>
> 
> On a chassis when ovs-vswitchd crashes for some reason, the BFD status doesn't
> get updated in the ovs db. ovn-controller will be reading the old BFD status
> even though ovs-vswitchd is crashed. This results in the chassiredirect port
> claim flapping between the master chassis and the chasiss with the next higher
> priority if ovs-vswitchd crashes in the master chassis.
> 
> All the other chassis notices the BFD status down with the master chassis
> and hence the next higher priority claims the port. But according to
> the master chassis, the BFD status is fine and it again claims back the
> chassisredirect port. And this results in flapping. The issue gets resolved
> when ovs-vswitchd comes back but until then it leads to lot of SB DB
> transactions and high CPU usage in ovn-controller's.
> 
> This patch fixes the issue by checking the OF connection status of the
> ovn-controller with ovs-vswitchd and calculates the active bfd tunnels
> only if it's connected.
> 
> Signed-off-by: Numan Siddique <nusiddiq at redhat.com>
> ---
>   ovn/controller/ofctrl.c         |  6 ++++++
>   ovn/controller/ofctrl.h         |  3 +++
>   ovn/controller/ovn-controller.c | 13 ++++++++++++-
>   tests/ovn.at                    | 30 +++++++++++++++++++++++++++++-
>   4 files changed, 50 insertions(+), 2 deletions(-)
> 
> diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
> index 218612787..95b95b607 100644
> --- a/ovn/controller/ofctrl.c
> +++ b/ovn/controller/ofctrl.c
> @@ -1265,3 +1265,9 @@ ofctrl_inject_pkt(const struct ovsrec_bridge *br_int, const char *flow_s,
>   
>       return NULL;
>   }
> +
> +bool
> +ofctrl_is_connected(void)
> +{
> +    return rconn_is_connected(swconn);
> +}
> diff --git a/ovn/controller/ofctrl.h b/ovn/controller/ofctrl.h
> index ae0cfa513..f7521801b 100644
> --- a/ovn/controller/ofctrl.h
> +++ b/ovn/controller/ofctrl.h
> @@ -60,4 +60,7 @@ void ofctrl_check_and_add_flow(struct hmap *desired_flows, uint8_t table_id,
>                                  const struct match *,
>                                  const struct ofpbuf *ofpacts,
>                                  bool log_duplicate_flow);
> +
> +bool ofctrl_is_connected(void);
> +
>   #endif /* ovn/ofctrl.h */
> diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
> index 4e9a5865f..b0f55a870 100644
> --- a/ovn/controller/ovn-controller.c
> +++ b/ovn/controller/ovn-controller.c
> @@ -689,7 +689,18 @@ main(int argc, char *argv[])
>                       ovsrec_bridge_table_get(ovs_idl_loop.idl), br_int,
>                       sbrec_chassis_table_get(ovnsb_idl_loop.idl), chassis_id,
>                       sbrec_sb_global_first(ovnsb_idl_loop.idl));
> -                bfd_calculate_active_tunnels(br_int, &active_tunnels);
> +
> +                if (ofctrl_is_connected()) {
> +                    /* Calculate the active tunnels only if have an an active
> +                     * OpenFlow connection to br-int.
> +                     * If we don't have a connection to br-int, it could mean
> +                     * ovs-vswitchd is down for some reason and the BFD status
> +                     * in the Interface rows could be stale. So its better to
> +                     * consider 'active_tunnels' set to be empty.
> +                     * */
> +                    bfd_calculate_active_tunnels(br_int, &active_tunnels);
> +                }
> +
>                   binding_run(ovnsb_idl_txn, ovs_idl_txn, sbrec_chassis_by_name,
>                               sbrec_datapath_binding_by_key,
>                               sbrec_port_binding_by_datapath,
> diff --git a/tests/ovn.at b/tests/ovn.at
> index f54f24c74..feafe1f00 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -8055,7 +8055,35 @@ ovn-nbctl --timeout=3 --wait=hv \
>   
>   test_ip_packet gw2 gw1
>   
> -OVN_CLEANUP([hv1],[gw1],[gw2],[ext1])
> +# Both gw1 and gw2 at this point should have claimed the cr-alice
> +# once each.
> +gw1_claim_ct=`grep "cr-alice: Claiming" gw1/ovn-controller.log | wc -l`
> +gw2_claim_ct=`grep "cr-alice: Claiming" gw2/ovn-controller.log | wc -l`
> +
> +# Kill ovs-vswitchd in gw2. gw1 should claim the gateway port.
> +# So gw1 should claim twice and gw1 only once.
> +
> +kill `cat gw2/ovs-vswitchd.pid`
> +
> +# gw1 should claim the cr-alice
> +gw1_claim_ct=$((gw1_claim_ct+1))
> +
> +OVS_WAIT_UNTIL([test $gw1_claim_ct = `cat gw1/ovn-controller.log \
> +| grep -c "cr-alice: Claiming"`])
> +
> +AT_CHECK([test $gw2_claim_ct = `cat gw2/ovn-controller.log | \
> +grep -c "cr-alice: Claiming"`])
> +
> +ovn-sbctl show
> +test_ip_packet gw1 gw2
> +
> +as gw2
> +ovs-appctl -t ovn-controller exit
> +ovs-appctl -t ovs-vswitchd exit
> +ps -aef | grep ovn-c
> +
> +OVN_CLEANUP([hv1],[gw1],[ext1])
> +
>   AT_CLEANUP
>   
>   AT_SETUP([ovn -- 4 HV, 3 LS, 2 LR, packet test with HA distributed router gateway port])
> 



More information about the dev mailing list