[ovs-dev] [PATCH ovn v2] northd: Don't merge ARP flood flows for all (un)reachable IPs.
Numan Siddique
numans at ovn.org
Fri Jul 30 17:57:23 UTC 2021
On Fri, Jul 30, 2021 at 10:42 AM Dumitru Ceara <dceara at redhat.com> wrote:
>
> Logical_Flows are immutable, that is, when the match or action change
> ovn-northd will actually remove the old logical flow and add a new one.
>
> Commit ecc941c70f16 added flows to flood ARPs to routers for unreachable
> addresses.
>
> In the following topology (2 switches connected to a router with a load
> balancer attached to it):
>
> S1 --- (IP1/MASK1) R (IP2/MASK2) --- S2
>
> LB1 (applied on R) with N VIPs.
>
> The following logical flows were generated:
>
> S1 (in_l2_lkup): if "arp.op == 1 && arp.tpa == { VIP1, .., VIPn, IP1 }" then ...
> S2 (in_l2_lkup): if "arp.op == 1 && arp.tpa == { VIP1, .., VIPn, IP2 }" then ...
>
> While this seemed a good idea initially because it would reduce the
> total number of logical flows, this approach has a few major downsides
> that severely affect scalability:
>
> 1. When logical datapath grouping is enabled (which is how high scale
> deployments should use OVN) there is no chance that the flows for
> reachable/unreachable router owned IPs are grouped together for all
> datapaths that correspond to the logical switches they are generated
> for.
> 2. Whenever a VIP (VIPn+1) is added to LB1 all these flows will be
> removed and new ones will be added with the only difference being the
> VIPn+1 aded to the flow match. As this is a new logical flow record,
> ovn-controller will have to remove all previously installed openflows
> (for the deleted logical flow) and add new ones. However, for most
> of these openflows, the only change is the cookie value (the first 32
> bits of the logical flow UUID). At scale this unnecessary openflow
> churn will induce significant latency. Moreover, this also creates
> unnecessary jsonrpc traffic from ovn-northd towards OVN_Southbound
> because the old flow needs to be removed and a new one (with a large
> match field) needs to be installed.
>
> To avoid this, we now install individual flows, one per IP, for
> managing ARP/ND traffic from logical switches towards router owned
> IPs that are reachable/unreachable on the logical port connecting
> the switch.
>
> On a deployment based on an ovn-kubernetes cluster with 20 nodes, 2500
> load balancer VIPs, 5000 pods, the following test was run:
> - measure number of logical flows in the SB DB, size of the SB DB
> - then create an additional 250 pods (logical ports), bind them, and add
> a load balancer VIP (with a single backend) for each of the additional
> pods, and measure how long it takes for all openflows corresponding to
> each of the pods to be installed in OVS.
>
> Results:
> - without fix:
> ------------
> SB lflows: 67976
> SB size (uncompacted): 46 MiB
> SB size (compacted): 40 MiB
>
> after 250 ports + VIPs added:
> SB lflows: 71479
> SB size (uncompacted): 248 MiB
> SB size (compacted): 42 MiB
> Max time to install all relevant openflows for a single pod: ~6sec
>
> - with fix:
> ---------
> SB lflows: 70399
> SB size (uncompacted): 46 MiB
> SB size (compacted): 40 MiB
>
> after 250 ports + VIPs added:
> SB lflows: 74149
> SB size (uncompacted): 165 MiB
> SB size (compacted): 42 MiB
> Max time to install all relevant openflows for a single pod: ~100msec
>
> NOTE:
> Since we now call build_lswitch_rport_arp_req_flow_for_unreachable_ip()
> for every load balancer VIP (instead of once for a list of VIPs) load on
> ovn-northd will be increased as it has to create more logical flows that
> will eventually be grouped together on the same datapath group.
>
> Fixes: ecc941c70f16 ("northd: Flood ARPs to routers for "unreachable" addresses.")
> Acked-by: Mark Michelson <mmichels at redhat.com>
> Signed-off-by: Dumitru Ceara <dceara at redhat.com>
Thanks Dumitru.
I applied this patch to the main branch.
Numan
> ---
> V2:
> - Address Mark Gray's comments:
> - update table id in code comments.
> - include the test results and northd performance note in the commit
> log.
> - Added Mark Michelson's ack.
> ---
> northd/ovn-northd.c | 108 ++++++++++++++++---------------------------
> northd/ovn_northd.dl | 28 +++++------
> tests/ovn-northd.at | 8 ++--
> 3 files changed, 57 insertions(+), 87 deletions(-)
>
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index 847a3e7c1..f91e662ee 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -6574,7 +6574,7 @@ lrouter_port_ipv6_reachable(const struct ovn_port *op,
> }
>
> /*
> - * Ingress table 19: Flows that flood self originated ARP/ND packets in the
> + * Ingress table 22: Flows that flood self originated ARP/ND packets in the
> * switching domain.
> */
> static void
> @@ -6651,7 +6651,7 @@ build_lswitch_rport_arp_req_self_orig_flow(struct ovn_port *op,
> }
>
> static void
> -arp_nd_ns_match(struct ds *ips, int addr_family, struct ds *match)
> +arp_nd_ns_match(const char *ips, int addr_family, struct ds *match)
> {
> /* Packets received from VXLAN tunnels have already been through the
> * router pipeline so we should skip them. Normally this is done by the
> @@ -6661,22 +6661,19 @@ arp_nd_ns_match(struct ds *ips, int addr_family, struct ds *match)
> ds_put_cstr(match, FLAGBIT_NOT_VXLAN " && ");
>
> if (addr_family == AF_INET) {
> - ds_put_cstr(match, "arp.op == 1 && arp.tpa == {");
> + ds_put_format(match, "arp.op == 1 && arp.tpa == %s", ips);
> } else {
> - ds_put_cstr(match, "nd_ns && nd.target == {");
> + ds_put_format(match, "nd_ns && nd.target == %s", ips);
> }
> -
> - ds_put_cstr(match, ds_cstr_ro(ips));
> - ds_put_cstr(match, "}");
> }
>
> /*
> - * Ingress table 19: Flows that forward ARP/ND requests only to the routers
> + * Ingress table 22: Flows that forward ARP/ND requests only to the routers
> * that own the addresses. Other ARP/ND packets are still flooded in the
> * switching domain as regular broadcast.
> */
> static void
> -build_lswitch_rport_arp_req_flow_for_reachable_ip(struct ds *ips,
> +build_lswitch_rport_arp_req_flow_for_reachable_ip(const char *ips,
> int addr_family, struct ovn_port *patch_op, struct ovn_datapath *od,
> uint32_t priority, struct hmap *lflows,
> const struct ovsdb_idl_row *stage_hint)
> @@ -6708,14 +6705,14 @@ build_lswitch_rport_arp_req_flow_for_reachable_ip(struct ds *ips,
> }
>
> /*
> - * Ingress table 19: Flows that forward ARP/ND requests for "unreachable" IPs
> + * Ingress table 22: Flows that forward ARP/ND requests for "unreachable" IPs
> * (NAT or load balancer IPs configured on a router that are outside the
> * router's configured subnets).
> * These ARP/ND packets are flooded in the switching domain as regular
> * broadcast.
> */
> static void
> -build_lswitch_rport_arp_req_flow_for_unreachable_ip(struct ds *ips,
> +build_lswitch_rport_arp_req_flow_for_unreachable_ip(const char *ips,
> int addr_family, struct ovn_datapath *od, uint32_t priority,
> struct hmap *lflows, const struct ovsdb_idl_row *stage_hint)
> {
> @@ -6732,7 +6729,7 @@ build_lswitch_rport_arp_req_flow_for_unreachable_ip(struct ds *ips,
> }
>
> /*
> - * Ingress table 19: Flows that forward ARP/ND requests only to the routers
> + * Ingress table 22: Flows that forward ARP/ND requests only to the routers
> * that own the addresses.
> * Priorities:
> * - 80: self originated GARPs that need to follow regular processing.
> @@ -6757,10 +6754,6 @@ build_lswitch_rport_arp_req_flows(struct ovn_port *op,
> * router port.
> * Priority: 80.
> */
> - struct ds ips_v4_match_reachable = DS_EMPTY_INITIALIZER;
> - struct ds ips_v6_match_reachable = DS_EMPTY_INITIALIZER;
> - struct ds ips_v4_match_unreachable = DS_EMPTY_INITIALIZER;
> - struct ds ips_v6_match_unreachable = DS_EMPTY_INITIALIZER;
>
> const char *ip_addr;
> SSET_FOR_EACH (ip_addr, &op->od->lb_ips_v4) {
> @@ -6771,9 +6764,13 @@ build_lswitch_rport_arp_req_flows(struct ovn_port *op,
> */
> if (ip_parse(ip_addr, &ipv4_addr)) {
> if (lrouter_port_ipv4_reachable(op, ipv4_addr)) {
> - ds_put_format(&ips_v4_match_reachable, "%s, ", ip_addr);
> + build_lswitch_rport_arp_req_flow_for_reachable_ip(
> + ip_addr, AF_INET, sw_op, sw_od, 80, lflows,
> + stage_hint);
> } else {
> - ds_put_format(&ips_v4_match_unreachable, "%s, ", ip_addr);
> + build_lswitch_rport_arp_req_flow_for_unreachable_ip(
> + ip_addr, AF_INET, sw_od, 90, lflows,
> + stage_hint);
> }
> }
> }
> @@ -6785,9 +6782,13 @@ build_lswitch_rport_arp_req_flows(struct ovn_port *op,
> */
> if (ipv6_parse(ip_addr, &ipv6_addr)) {
> if (lrouter_port_ipv6_reachable(op, &ipv6_addr)) {
> - ds_put_format(&ips_v6_match_reachable, "%s, ", ip_addr);
> + build_lswitch_rport_arp_req_flow_for_reachable_ip(
> + ip_addr, AF_INET6, sw_op, sw_od, 80, lflows,
> + stage_hint);
> } else {
> - ds_put_format(&ips_v6_match_unreachable, "%s, ", ip_addr);
> + build_lswitch_rport_arp_req_flow_for_unreachable_ip(
> + ip_addr, AF_INET6, sw_od, 90, lflows,
> + stage_hint);
> }
> }
> }
> @@ -6812,65 +6813,42 @@ build_lswitch_rport_arp_req_flows(struct ovn_port *op,
>
> if (!sset_contains(&op->od->lb_ips_v6, nat->external_ip)) {
> if (lrouter_port_ipv6_reachable(op, addr)) {
> - ds_put_format(&ips_v6_match_reachable, "%s, ",
> - nat->external_ip);
> + build_lswitch_rport_arp_req_flow_for_reachable_ip(
> + nat->external_ip, AF_INET6, sw_op, sw_od, 80, lflows,
> + stage_hint);
> } else {
> - ds_put_format(&ips_v6_match_unreachable, "%s, ",
> - nat->external_ip);
> + build_lswitch_rport_arp_req_flow_for_unreachable_ip(
> + nat->external_ip, AF_INET6, sw_od, 90, lflows,
> + stage_hint);
> }
> }
> } else {
> ovs_be32 addr = nat_entry->ext_addrs.ipv4_addrs[0].addr;
> if (!sset_contains(&op->od->lb_ips_v4, nat->external_ip)) {
> if (lrouter_port_ipv4_reachable(op, addr)) {
> - ds_put_format(&ips_v4_match_reachable, "%s, ",
> - nat->external_ip);
> + build_lswitch_rport_arp_req_flow_for_reachable_ip(
> + nat->external_ip, AF_INET, sw_op, sw_od, 80, lflows,
> + stage_hint);
> } else {
> - ds_put_format(&ips_v4_match_unreachable, "%s, ",
> - nat->external_ip);
> + build_lswitch_rport_arp_req_flow_for_unreachable_ip(
> + nat->external_ip, AF_INET, sw_od, 90, lflows,
> + stage_hint);
> }
> }
> }
> }
>
> for (size_t i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) {
> - ds_put_format(&ips_v4_match_reachable, "%s, ",
> - op->lrp_networks.ipv4_addrs[i].addr_s);
> - }
> - for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) {
> - ds_put_format(&ips_v6_match_reachable, "%s, ",
> - op->lrp_networks.ipv6_addrs[i].addr_s);
> - }
> -
> - if (ds_last(&ips_v4_match_reachable) != EOF) {
> - ds_chomp(&ips_v4_match_reachable, ' ');
> - ds_chomp(&ips_v4_match_reachable, ',');
> build_lswitch_rport_arp_req_flow_for_reachable_ip(
> - &ips_v4_match_reachable, AF_INET, sw_op, sw_od, 80, lflows,
> - stage_hint);
> + op->lrp_networks.ipv4_addrs[i].addr_s, AF_INET, sw_op, sw_od, 80,
> + lflows, stage_hint);
> }
> - if (ds_last(&ips_v6_match_reachable) != EOF) {
> - ds_chomp(&ips_v6_match_reachable, ' ');
> - ds_chomp(&ips_v6_match_reachable, ',');
> + for (size_t i = 0; i < op->lrp_networks.n_ipv6_addrs; i++) {
> build_lswitch_rport_arp_req_flow_for_reachable_ip(
> - &ips_v6_match_reachable, AF_INET6, sw_op, sw_od, 80, lflows,
> - stage_hint);
> + op->lrp_networks.ipv6_addrs[i].addr_s, AF_INET6, sw_op, sw_od, 80,
> + lflows, stage_hint);
> }
>
> - if (ds_last(&ips_v4_match_unreachable) != EOF) {
> - ds_chomp(&ips_v4_match_unreachable, ' ');
> - ds_chomp(&ips_v4_match_unreachable, ',');
> - build_lswitch_rport_arp_req_flow_for_unreachable_ip(
> - &ips_v4_match_unreachable, AF_INET, sw_od, 90, lflows,
> - stage_hint);
> - }
> - if (ds_last(&ips_v6_match_unreachable) != EOF) {
> - ds_chomp(&ips_v6_match_unreachable, ' ');
> - ds_chomp(&ips_v6_match_unreachable, ',');
> - build_lswitch_rport_arp_req_flow_for_unreachable_ip(
> - &ips_v6_match_unreachable, AF_INET6, sw_od, 90, lflows,
> - stage_hint);
> - }
> /* Self originated ARP requests/ND need to be flooded as usual.
> *
> * However, if the switch doesn't have any non-router ports we shouldn't
> @@ -6881,10 +6859,6 @@ build_lswitch_rport_arp_req_flows(struct ovn_port *op,
> if (sw_od->n_router_ports != sw_od->nbs->n_ports) {
> build_lswitch_rport_arp_req_self_orig_flow(op, 75, sw_od, lflows);
> }
> - ds_destroy(&ips_v4_match_reachable);
> - ds_destroy(&ips_v6_match_reachable);
> - ds_destroy(&ips_v4_match_unreachable);
> - ds_destroy(&ips_v6_match_unreachable);
> }
>
> static void
> @@ -7595,7 +7569,7 @@ build_lswitch_external_port(struct ovn_port *op,
> }
> }
>
> -/* Ingress table 19: Destination lookup, broadcast and multicast handling
> +/* Ingress table 22: Destination lookup, broadcast and multicast handling
> * (priority 70 - 100). */
> static void
> build_lswitch_destination_lookup_bmcast(struct ovn_datapath *od,
> @@ -7687,7 +7661,7 @@ build_lswitch_destination_lookup_bmcast(struct ovn_datapath *od,
> }
>
>
> -/* Ingress table 19: Add IP multicast flows learnt from IGMP/MLD
> +/* Ingress table 22: Add IP multicast flows learnt from IGMP/MLD
> * (priority 90). */
> static void
> build_lswitch_ip_mcast_igmp_mld(struct ovn_igmp_group *igmp_group,
> @@ -7765,7 +7739,7 @@ build_lswitch_ip_mcast_igmp_mld(struct ovn_igmp_group *igmp_group,
>
> static struct ovs_mutex mcgroup_mutex = OVS_MUTEX_INITIALIZER;
>
> -/* Ingress table 19: Destination lookup, unicast handling (priority 50), */
> +/* Ingress table 22: Destination lookup, unicast handling (priority 50), */
> static void
> build_lswitch_ip_unicast_lookup(struct ovn_port *op,
> struct hmap *lflows,
> diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
> index 2cf93d61a..011902c87 100644
> --- a/northd/ovn_northd.dl
> +++ b/northd/ovn_northd.dl
> @@ -4375,8 +4375,7 @@ Flow(.logical_datapath = sw._uuid,
> .stage = s_SWITCH_IN_L2_LKUP(),
> .priority = 80,
> .__match = fLAGBIT_NOT_VXLAN() ++
> - " && arp.op == 1 && arp.tpa == { " ++
> - ipv4.to_vec().join(", ") ++ "}",
> + " && arp.op == 1 && arp.tpa == " ++ ipv4,
> .actions = if (sw.has_non_router_port) {
> "clone {outport = ${sp.json_name}; output; }; "
> "outport = ${mc_flood_l2}; output;"
> @@ -4386,15 +4385,14 @@ Flow(.logical_datapath = sw._uuid,
> .external_ids = stage_hint(sp.lsp._uuid)) :-
> sp in &SwitchPort(.sw = sw, .peer = Some{rp}),
> rp.is_enabled(),
> - &SwitchPortARPForwards(.port = sp, .reachable_ips_v4 = ipv4),
> - not ipv4.is_empty(),
> + &SwitchPortARPForwards(.port = sp, .reachable_ips_v4 = ips_v4),
> + var ipv4 = FlatMap(ips_v4),
> var mc_flood_l2 = json_string_escape(mC_FLOOD_L2().0).
> Flow(.logical_datapath = sw._uuid,
> .stage = s_SWITCH_IN_L2_LKUP(),
> .priority = 80,
> .__match = fLAGBIT_NOT_VXLAN() ++
> - " && nd_ns && nd.target == { " ++
> - ipv6.to_vec().join(", ") ++ "}",
> + " && nd_ns && nd.target == " ++ ipv6,
> .actions = if (sw.has_non_router_port) {
> "clone {outport = ${sp.json_name}; output; }; "
> "outport = ${mc_flood_l2}; output;"
> @@ -4404,36 +4402,34 @@ Flow(.logical_datapath = sw._uuid,
> .external_ids = stage_hint(sp.lsp._uuid)) :-
> sp in &SwitchPort(.sw = sw, .peer = Some{rp}),
> rp.is_enabled(),
> - &SwitchPortARPForwards(.port = sp, .reachable_ips_v6 = ipv6),
> - not ipv6.is_empty(),
> + &SwitchPortARPForwards(.port = sp, .reachable_ips_v6 = ips_v6),
> + var ipv6 = FlatMap(ips_v6),
> var mc_flood_l2 = json_string_escape(mC_FLOOD_L2().0).
>
> Flow(.logical_datapath = sw._uuid,
> .stage = s_SWITCH_IN_L2_LKUP(),
> .priority = 90,
> .__match = fLAGBIT_NOT_VXLAN() ++
> - " && arp.op == 1 && arp.tpa == {" ++
> - ipv4.to_vec().join(", ") ++ "}",
> + " && arp.op == 1 && arp.tpa == " ++ ipv4,
> .actions = "outport = ${flood}; output;",
> .external_ids = stage_hint(sp.lsp._uuid)) :-
> sp in &SwitchPort(.sw = sw, .peer = Some{rp}),
> rp.is_enabled(),
> - &SwitchPortARPForwards(.port = sp, .unreachable_ips_v4 = ipv4),
> - not ipv4.is_empty(),
> + &SwitchPortARPForwards(.port = sp, .unreachable_ips_v4 = ips_v4),
> + var ipv4 = FlatMap(ips_v4),
> var flood = json_string_escape(mC_FLOOD().0).
>
> Flow(.logical_datapath = sw._uuid,
> .stage = s_SWITCH_IN_L2_LKUP(),
> .priority = 90,
> .__match = fLAGBIT_NOT_VXLAN() ++
> - " && nd_ns && nd.target == {" ++
> - ipv6.to_vec().join(", ") ++ "}",
> + " && nd_ns && nd.target == " ++ ipv6,
> .actions = "outport = ${flood}; output;",
> .external_ids = stage_hint(sp.lsp._uuid)) :-
> sp in &SwitchPort(.sw = sw, .peer = Some{rp}),
> rp.is_enabled(),
> - &SwitchPortARPForwards(.port = sp, .unreachable_ips_v6 = ipv6),
> - not ipv6.is_empty(),
> + &SwitchPortARPForwards(.port = sp, .unreachable_ips_v6 = ips_v6),
> + var ipv6 = FlatMap(ips_v6),
> var flood = json_string_escape(mC_FLOOD().0).
>
> for (SwitchPortNewDynamicAddress(.port = &SwitchPort{.lsp = lsp, .json_name = json_name, .sw = sw},
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index 9066b7415..66ee8a279 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -4276,20 +4276,20 @@ AT_CHECK([ovn-sbctl lflow-list sw | grep ls_in_l2_lkup | grep priority=90 -c], [
> ])
>
> # We expect that the installed flows will match the unreachable DNAT addresses only.
> -AT_CHECK([ovn-sbctl lflow-list sw | grep ls_in_l2_lkup | grep priority=90 | grep "arp.tpa == {30.0.0.100}" -c], [0], [dnl
> +AT_CHECK([ovn-sbctl lflow-list sw | grep ls_in_l2_lkup | grep priority=90 | grep "arp.tpa == 30.0.0.100" -c], [0], [dnl
> 1
> ])
>
> -AT_CHECK([ovn-sbctl lflow-list sw | grep ls_in_l2_lkup | grep priority=90 | grep "arp.tpa == {40.0.0.100}" -c], [0], [dnl
> +AT_CHECK([ovn-sbctl lflow-list sw | grep ls_in_l2_lkup | grep priority=90 | grep "arp.tpa == 40.0.0.100" -c], [0], [dnl
> 1
> ])
>
> # Ensure that we do not create flows for SNAT addresses
> -AT_CHECK([ovn-sbctl lflow-list sw | grep ls_in_l2_lkup | grep priority=90 | grep "arp.tpa == {30.0.0.200}" -c], [1], [dnl
> +AT_CHECK([ovn-sbctl lflow-list sw | grep ls_in_l2_lkup | grep priority=90 | grep "arp.tpa == 30.0.0.200" -c], [1], [dnl
> 0
> ])
>
> -AT_CHECK([ovn-sbctl lflow-list sw | grep ls_in_l2_lkup | grep priority=90 | grep "arp.tpa == {40.0.0.200}" -c], [1], [dnl
> +AT_CHECK([ovn-sbctl lflow-list sw | grep ls_in_l2_lkup | grep priority=90 | grep "arp.tpa == 40.0.0.200" -c], [1], [dnl
> 0
> ])
>
> --
> 2.27.0
>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
More information about the dev
mailing list