[ovs-dev] [PATCH ovn v5] ovn-northd: Limit ARP/ND broadcast domain whenever possible.

Han Zhou hzhou at ovn.org
Thu Nov 7 02:51:16 UTC 2019


On Mon, Nov 4, 2019 at 11:32 AM Numan Siddique <numans at ovn.org> wrote:
>
> On Mon, Nov 4, 2019 at 8:37 PM Dumitru Ceara <dceara at redhat.com> wrote:
> >
> > ARP request and ND NS packets for router owned IPs were being
> > flooded in the complete L2 domain (using the MC_FLOOD multicast group).
> > However this creates a scaling issue in scenarios where aggregation
> > logical switches are connected to more logical routers (~350). The
> > logical pipelines of all routers would have to be executed before the
> > packet is finally replied to by a single router, the owner of the IP
> > address.
> >
> > This commit limits the broadcast domain by bypassing the L2 Lookup stage
> > for ARP requests that will be replied by a single router. The packets
> > are still flooded in the L2 domain but not on any of the other patch
> > ports towards other routers connected to the switch. This restricted
> > flooding is done by using a new multicast group (MC_ARP_ND_FLOOD).
> >
> > IPs that are owned by the routers and for which this fix applies are:
> > - IP addresses configured on the router ports.
> > - VIPs.
> > - NAT IPs.
> >
> > This commit also fixes function get_router_load_balancer_ips() which
> > was incorrectly returning a single address_family even though the
> > IP set could contain a mix of IPv4 and IPv6 addresses.
> >
> > Reported-at: https://bugzilla.redhat.com/1756945
> > Reported-by: Anil Venkata <vkommadi at redhat.com>
> > Signed-off-by: Dumitru Ceara <dceara at redhat.com>
>
> Thanks Dumitru, for addressing the review comments.
>
> Acked-by: Numan Siddique <numans at ovn.org>
>
> Han, if you can take a look in this patch and provide your comments,
> that would be great.
>

Sorry for delayed response :(

The patch looks good to me except:

1. It changes the behavior that when an ARP request for a LRP's IP is
coming to the logical switch, other routers will no longer learn the MAC-IP
binding of the ARP sender. This has been discussed and I tend to agree with
Dumitru that it should not cause real issue. I think it just worth to be
documented clearly in the ovn-architecture, probably in the section:
Logical Routers and Logical Patch Ports, because this is something
different from a traditional switch's behavior, and network administrators
may get suprised.

2. Since we are anyway changing the behavior of ARP request handling and
bypassing logical router ports, and we think it should not cause real
problem, then I wonder why adding the MC_ARP_ND group to still flood to the
non-router ports. Is it really useful? Maybe it is not a big deal, but I
think the extra MC group would add some performance cost and it is almost
redundant with the regular MC_FLOOD except that there is no router ports in
it - it is a lot of redundancy considering that most LSPs are regular VIFs
in typical large scale environments. I would suggest to simplify this
unless there is clear concerns of not flooding to other ports.

3. Not a big thing, but the name of the functions
build_lswitch_rport_arp_responders() and build_lswitch_rport_arp_flow() are
a little bit confusing. It is not for arp-responder, but for bypassing
router ports. Would it be better to be something like:
build_lswitch_rport_arp_req_flows() and
build_lswitch_rport_arp_request_flow_for_ip()?

In addition, not for the patch, but for the problem itself, I think we'd
better quantify the number of router ports we support with current resubmit
limit (as suggested by Ben), and document somewhere the limit and the
impact if user exceeds the limit. If the limit is easily exceed in very
common deployments, we probably should consider increasing the resubmit
limit.

Thanks,
Han


More information about the dev mailing list