[ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table

Fri May 22 02:12:41 UTC 2020

On Thu, May 21, 2020 at 6:58 PM Tim Rozet <trozet at redhat.com> wrote:

> On Thu, May 21, 2020 at 8:45 PM Venugopal Iyer <venugopali at nvidia.com>
> wrote:
>
>> Hi, Han:
>>
>> ________________________________________
>> From: ovn-kubernetes at googlegroups.com <ovn-kubernetes at googlegroups.com>
>> on behalf of Han Zhou <zhouhan at gmail.com>
>> Sent: Thursday, May 21, 2020 4:42 PM
>> To: Tim Rozet
>> Cc: Venugopal Iyer; Dumitru Ceara; Girish Moodalbail; Han Zhou; Dan
>> Winship; ovs-discuss; ovn-kubernetes at googlegroups.com; Michael Cambria
>> Subject: Re: [ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table
>>
>> On Thu, May 21, 2020 at 2:35 PM Tim Rozet <trozet at redhat.com> wrote:
>> I think that if you directly connect GR to DR you don't need to learn any
>> ARP with packet_in and you can preprogram the static entries. Each GR will
>> have 1 entry for the DR, while the DR will have N number of entries for N
>> nodes.
>>
>> Hi Tim, as mentioned by Girish, directly connecting GRs to DR requires N
>> ports on the DR and also requires a lot of small subnets, which is not
>> desirable. And since changes are needed anyway in OVN to support that, we
>> moved forward with the current approach of avoiding the static ARP flows to
>> solve the problem instead of directly connecting GRs to DR.
>>
> Why is that not desirable? They are all private subnets with /30 (if
> using IPv4). If IPv6, it's even less of a concern from an addressing
> perspective.
>

It is not just about subnet management; the two ways of connecting the DR
and the GRs also differ in how many additional logical flows get created.

Say we have a fix that efficiently allows one to connect 1000s of GRs using
a single logical switch. Would you rather use that instead of 1000 patch
cables, each connecting a GR to the DR? The issue is not only subnet
management for those 1000 point-to-point connections. Those 1000 patch
ports are also local to each of the chassis, so we need to understand, for
such a topology, how many additional logical flows get created in the SB
and how many OpenFlow flows get created on each of the 1000 chassis for
those 1000 patch cables.
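
To put rough numbers on the addressing side of the point-to-point option,
here is a minimal Python sketch. The 100.64.0.0/20 parent range and the
helper name are assumptions made up for illustration, not anything OVN or
ovn-kubernetes defines. It only counts /30 links, ports and addresses; it
says nothing about logical flow or OpenFlow flow counts, which would have
to be measured on a real deployment.

```python
import ipaddress


def plan_point_to_point_links(parent_cidr, num_gateway_routers):
    """Carve one /30 per GR<->DR link out of parent_cidr.

    Hypothetical helper for illustration only; not an OVN API.
    """
    parent = ipaddress.ip_network(parent_cidr)
    links = []
    for index, subnet in enumerate(parent.subnets(new_prefix=30)):
        if index == num_gateway_routers:
            break
        # A /30 has exactly two usable host addresses: one per link end.
        dr_ip, gr_ip = list(subnet.hosts())
        links.append({"subnet": subnet, "dr_ip": dr_ip, "gr_ip": gr_ip})
    if len(links) < num_gateway_routers:
        raise ValueError("parent range too small for requested number of links")
    return links


# Assumed parent range; a /20 holds 1024 /30 subnets, enough for 1000 GRs.
links = plan_point_to_point_links("100.64.0.0/20", 1000)
summary = {
    "subnets": len(links),               # one /30 per GR
    "dr_ports": len(links),              # N router ports on the DR
    "gr_ports": len(links),              # one peer port on each GR
    "addresses_consumed": len(links) * 4,
}
print(summary)
```

With a single shared logical switch the same 1000 GRs would need one subnet
and one DR port, which is the contrast I am drawing above.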

>
>> The real issue with ARP learning comes from the GR-----External. You have
>> to learn these, and from my conversation with Girish it seems like every GR
>> is adding an entry on every ARP request it sees. This means 1 GR sends ARP
>> request to external L2 network and every GR sees the ARP request and adds
>> an entry. I think the behavior should be:
>>
>> GRs only add ARP entries when:
>>
>>   1.  An ARP Response is sent to it
>>   2.  The GR receives a GARP broadcast, and already has an entry in his
>> cache for that IP (Girish mentioned this is similar to linux arp_accept
>> behavior)
>>
>> For 2), it is expensive to do in OVN because OpenFlow doesn't support a
>> match condition of "field1 == field2", which is required to check if the
>> incoming ARP request is a GARP, i.e. SPA == TPA. However, it is ok to
>> support something similar like linux arp_accept configuration but slightly
>> different. In OVN we can configure it to allow/disable learning from all
>> ARP requests to IPs not belonging to the router, including GARPs. Would
>> that solve the problem here? (@Venugopal Iyer brought up the same thing
>> about "arp_accept". I hope this reply addresses that as well)
>>
>
> I think the issue there is if you have an external device, which is using
> a VIP and it fails over, it will usually send GARP to inform of the mac
> change. In this case if you ignore GARP, what happens? You won't send
> another ARP because OVN programs the arp entry forever and doesn't expire
> it right? So you won't learn the new mac and keep sending packets to a dead
> mac?
>

I think we will have to support GARP; otherwise VIPs will not work, as Tim
mentions. If we do learn from GARPs, then as long as the GARP itself is not
originated by any of the 1000s of GRs, we should be fine.
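
As a rough illustration of the learning policy being discussed, here is a
small Python sketch. It is not how OVN implements anything: as Han notes,
OpenFlow cannot match "field1 == field2", so this only expresses the intended
decision in plain code. The ArpPacket type, the should_learn function and all
addresses are hypothetical, and treating "GARP not originated by a GR" as a
filter is my own assumption about how that condition could be encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArpPacket:
    op: str   # "request" or "reply"
    spa: str  # sender protocol (IP) address
    sha: str  # sender hardware (MAC) address
    tpa: str  # target protocol (IP) address


def is_garp(pkt):
    """A gratuitous ARP announces its own address: SPA == TPA."""
    return pkt.spa == pkt.tpa


def should_learn(pkt, router_ips, cache, gr_ips):
    """Decide whether a GR should add or update a MAC binding.

    router_ips: IPs owned by this GR.
    cache:      existing bindings, IP -> MAC.
    gr_ips:     IPs of all GRs on the external network (assumed known).
    """
    # Case 1: an ARP reply addressed to this router.
    if pkt.op == "reply" and pkt.tpa in router_ips:
        return True
    # Case 2: a GARP for an IP we already track, not sent by another GR.
    if is_garp(pkt):
        return pkt.spa in cache and pkt.spa not in gr_ips
    # Anything else, e.g. another GR's broadcast ARP request, is ignored.
    return False


router_ips = {"10.0.0.2"}
gr_ips = {"10.0.0.2", "10.0.0.3"}
cache = {"10.0.0.100": "aa:aa:aa:aa:aa:01"}  # external VIP, old MAC

vip_failover = ArpPacket("request", "10.0.0.100", "aa:aa:aa:aa:aa:02", "10.0.0.100")
other_gr_request = ArpPacket("request", "10.0.0.3", "bb:bb:bb:bb:bb:03", "10.0.0.1")

if should_learn(vip_failover, router_ips, cache, gr_ips):
    cache[vip_failover.spa] = vip_failover.sha  # picks up the new VIP MAC
if should_learn(other_gr_request, router_ips, cache, gr_ips):
    cache[other_gr_request.spa] = other_gr_request.sha  # not reached

print(cache)
```

In this sketch the VIP failover GARP updates the existing binding, while the
broadcast ARP request from the other GR adds nothing.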

Regards,
~Girish