[ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table

Han Zhou zhouhan at gmail.com
Sun May 17 06:17:04 UTC 2020


On Sat, May 16, 2020 at 12:13 PM Girish Moodalbail <gmoodalbail at gmail.com>
wrote:

> Hello Han,
>
> Can you please explain how the dynamic resolution of the IP-to-MAC will
> work with this new option set?
>
> Say the packet is being forwarded from router2 towards the distributed
> router? So, nexthop (reg0) is set to IP1 and we need to find the MAC
> address M1 to set eth.dst to.
>
> +----------------+        +----------------+
> |   l3gateway    |        |   l3gateway    |
> |    router2     |        |    router3     |
> +-------------+--+        +-+--------------+
>             IP2,M2         IP3,M3
>               |             |
>            +--+-------------+---+
>            |    join switch     |
>            +---------+----------+
>                      |
>                   IP1,M1
>              +-------+--------+
>              |  distributed   |
>              |     router     |
>              +----------------+
>
> The MAC M1 will obviously not be in the MAC_binding table. On the hypervisor
> where the packet originated, the router2's port and the distributed
> router's port are locally present. So, does this result in a PACKET_IN to
> the ovn-controller and the resolution happens there?
>

Yes, there will be a PACKET_IN, and then:
1. ovn-controller will generate the ARP request for IP1, and send a
PACKET_OUT to OVS.
2. The ARP request will be delivered to the distributed router pipeline
only, because of special handling of ARP in OVN for IPs of router ports,
although it is a broadcast. (Without that special handling, it would have
been broadcast to all GRs.)
3. The distributed router pipeline should learn the IP-MAC binding of
IP2-M2 (through a PACKET_IN to ovn-controller), and at the same time send
an ARP reply to router2 in the distributed router pipeline.
4. The router2 pipeline will handle the ARP response and learn the IP-MAC
binding of IP1-M1 (through a PACKET_IN to ovn-controller).
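
As a rough illustration of the four steps above, here is a toy Python model of
the exchange. It is only a sketch and not OVN code: the Router class, the
resolve() function and the mac_binding dict are hypothetical names, and the
dict merely stands in for rows of the MAC_Binding table.

```python
# Toy model of the ARP exchange described above; not OVN code.
# Router, resolve and mac_binding are illustrative names only.
from dataclasses import dataclass, field


@dataclass
class Router:
    name: str
    ip: str
    mac: str
    # Dynamically learned IP -> MAC entries (stands in for MAC_Binding rows).
    mac_binding: dict = field(default_factory=dict)


def resolve(src: Router, dst: Router) -> list:
    """Replay the exchange triggered by a next-hop miss on src."""
    events = []
    # Step 1: the miss causes a PACKET_IN; the controller emits an ARP request.
    events.append(f"{src.name}: PACKET_IN, ARP request for {dst.ip}")
    # Step 2: the request reaches only the router that owns the target IP.
    events.append(f"ARP request delivered to {dst.name} only")
    # Step 3: the target learns the sender's binding and replies.
    dst.mac_binding[src.ip] = src.mac
    events.append(f"{dst.name}: learned {src.ip} -> {src.mac}, sent ARP reply")
    # Step 4: the requester learns the target's binding from the reply.
    src.mac_binding[dst.ip] = dst.mac
    events.append(f"{src.name}: learned {dst.ip} -> {dst.mac}")
    return events


router2 = Router("router2", "IP2", "M2")
dr = Router("distributed router", "IP1", "M1")
for line in resolve(router2, dr):
    print(line)
```

After one exchange each side holds exactly one learned entry, which is the
point of relying on dynamic resolution for the north-south path.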


>
> How would the resolution of IP3-to-M3 happen on gateway router2? Will
> there be an ARP request packet that will be broadcasted on the join switch
> for this case?
>

I think in the use case of ovn-k8s, as you described before, this should
not happen. However, if this does happen, it is similar to the above steps,
except that in steps 2) and 3) the ARP request and response will be sent
between the chassis through the tunnel. If this happens between all pairs of
GRs, then there will again be O(n^2) MAC_Binding entries.
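
To put rough numbers on this, here is a back-of-the-envelope count in Python.
It is a sketch under stated assumptions: 1000 gateway routers and two address
families (IPv4 and IPv6), inferred from the 2000-flows-per-router and 2M
figures quoted further down in this thread. The function names are made up
for illustration, and the counts ignore the distributed router's own entries.

```python
# Back-of-the-envelope entry counts; illustrative only, not measured.
# Assumed: 1000 gateway routers (GRs) and 2 address families (IPv4, IPv6).


def static_entries(n_gr: int, families: int = 2) -> int:
    """Static neighbour flows: every GR holds one per GR port and family."""
    return n_gr * n_gr * families


def dynamic_north_south(n_gr: int, families: int = 2) -> int:
    """Dynamic only: each GR learns just the distributed router port."""
    return n_gr * families


def dynamic_all_pairs(n_gr: int, families: int = 2) -> int:
    """Worst case: every GR also resolves every other GR."""
    return n_gr * (n_gr - 1) * families


n = 1000
print("static:", static_entries(n))
print("dynamic, north-south only:", dynamic_north_south(n))
print("dynamic, all GR pairs:", dynamic_all_pairs(n))
```

The first and last counts grow as n^2 while the middle one grows as n, so the
saving only holds as long as the gateway routers do not talk to each other.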

I haven't tested the GR scenario yet, so I can't guarantee it works as
expected. Please let me know if you see any problems. I will submit a formal
patch with more test cases if it is confirmed in your environment.
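
For trying it at scale, a small Python helper along these lines could generate
the ovn-nbctl commands for all gateway routers. This is only a sketch: the
option name is the one from the RFC patch mentioned earlier in this thread and
may change in the formal patch, and the router names (GR_node1, ...) are
hypothetical placeholders for whatever names your deployment uses. The helper
only builds the command strings; it does not run them.

```python
# Build (but do not execute) ovn-nbctl commands that set the RFC option
# on each gateway router. Router names below are hypothetical.
import shlex

# Option name as given in the RFC patch; may differ in the final patch.
RFC_OPTION = "options:dynamic_neigh_routes=true"


def enable_dynamic_neigh(router_names):
    """Return one 'ovn-nbctl set Logical_Router ...' command per router."""
    return [
        shlex.join(["ovn-nbctl", "set", "Logical_Router", name, RFC_OPTION])
        for name in router_names
    ]


gateway_routers = [f"GR_node{i}" for i in range(1, 4)]
for cmd in enable_dynamic_neigh(gateway_routers):
    print(cmd)
```

Printing the commands first makes it easy to review them before piping them to
a shell.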

Thanks,
Han


>
> Regards,
> ~Girish
>
> On Sat, May 16, 2020 at 10:25 AM Girish Moodalbail <gmoodalbail at gmail.com>
> wrote:
>
>>
>>
>> On Sat, May 16, 2020 at 12:36 AM Han Zhou <zhouhan at gmail.com> wrote:
>>
>>>
>>>
>>> On Tue, May 5, 2020 at 11:57 AM Han Zhou <hzhou at ovn.org> wrote:
>>> >
>>> >
>>> >
>>> > On Fri, May 1, 2020 at 2:14 PM Dan Winship <danwinship at redhat.com>
>>> wrote:
>>> > >
>>> > > On 5/1/20 12:37 PM, Girish Moodalbail wrote:
>>> > > > If we now look at table=12 (lr_in_arp_resolve) in the ingress
>>> pipeline
>>> > > > of Gateway Router-1, then you will see that there will be 2000
>>> logical
>>> > > > flow entries...
>>> > >
>>> > > > In the topology above, the only intended path is North-South
>>> between
>>> > > > each gateway router and the logical router. There is no east-west
>>> > > > traffic between the gateway routers
>>> > >
>>> > > > Is there another way to solve the above problem with just
>>> keeping the
>>> > > > single join logical switch?
>>> > >
>>> > > Two thoughts:
>>> > >
>>> > > 1. In openshift-sdn, the bridge doesn't try to handle ARP itself. It
>>> > > just lets ARP requests pass through normally, and lets ARP replies
>>> pass
>>> > > through normally as long as they are correct (ie, it doesn't let
>>> > > spoofing through). This means fewer flows but more traffic. Maybe
>>> that's
>>> > > the right tradeoff?
>>> > >
>>> > The 2M entries here are not for the ARP responder, but are more
>>> equivalent to the neighbour table (or ARP cache) on each LR. The ARP
>>> responder resides in the LS (join logical switch), which is O(n) instead
>>> of O(n^2), so it is not a problem here.
>>> >
>>> > However, a similar idea may work here to avoid the O(n^2) scale
>>> issue. For the neighbour table, OVN actually has two parts: one is
>>> statically built, which is the 2M entries mentioned in this case, and the
>>> other is the dynamic ARP resolution (the mac_binding table), which is
>>> dynamically populated by handling ARP messages. To solve the problem here,
>>> it is possible to change OVN to support configuring an LR to avoid the
>>> static neighbour table and rely only on dynamic ARP resolution. In this
>>> case, all the gateway routers can be configured as not using static ARP
>>> resolution, and eventually there will be only 2 entries (one for IPv4 and
>>> one for IPv6) for each gateway router in the mac_binding table for the
>>> north-south traffic to the join router. (Of course, there will still be the
>>> same amount of mac_bindings in each router for the external traffic on the
>>> other side of the gateway routers.)
>>> >
>>> > This change seems straightforward, but I am not sure if there are any
>>> corner cases.
>>>
>>> Hi Girish,
>>>
>>> I've sent a RFC patch here for the above proposal:
>>> https://patchwork.ozlabs.org/project/openvswitch/patch/1589614395-99499-1-git-send-email-hzhou@ovn.org/
>>> For this use case, just set options:dynamic_neigh_routes=true for all
>>> the Gateway Routers. Could you try it in your scale environment and see if
>>> it solves the problem?
>>>
>>> Thanks,
>>> Han
>>>
>>> >
>>> > > 2. In most places in ovn-kubernetes, our MAC addresses are
>>> > > programmatically related to the corresponding IP addresses, and in
>>> > > places where that's not currently true, we could try to make it true,
>>> > > and then perhaps the thousands of rules could just be replaced by a
>>> > > single rule?
>>> > >
>>> > This may be a good idea, but I am not sure how to implement it in OVN
>>> in a generic way, since most OVN users can't make such an assumption.
>>> >
>>> > On the other hand, why wouldn't splitting the join logical switch into
>>> 1000 LSes solve the problem? I understand that there will be 1000 more
>>> datapaths, and 1000 more LRPs, but these are all O(n), which is much more
>>> efficient than the O(n^2) explosion. What are the other scale issues
>>> created by this?
>>> >
>>> > In addition, Girish, for the external LS, I am not sure why it can't
>>> be shared, if all the nodes are connected to a single L2 network. (If they
>>> are connected to separate L2 networks, different external LSes should be
>>> created, at least according to the current OVN model.)
>>>
>>
>> Thanks Han for the patch. Will give it a try and let you know.
>>
>> Regards,
>> ~Girish
>>
>>
>>> >
>>> > Thanks,
>>> > Han
>>>
>>