[ovs-discuss] [OVN] logical flow explosion in lr_in_ip_input table for dnat_and_snat IPs

Girish Moodalbail gmoodalbail at gmail.com
Mon Jun 15 17:56:20 UTC 2020


Hello Han,

On Wed, Jun 3, 2020 at 9:39 PM Han Zhou <zhouhan at gmail.com> wrote:

>
>
> On Wed, Jun 3, 2020 at 7:16 PM Girish Moodalbail <gmoodalbail at gmail.com>
> wrote:
>
>> Hello all,
>>
>> While working on an extension (see the diagram below) to the existing
>> OVN logical topology for the ovn-kubernetes project, I am seeing an
>> explosion of the "Reply to ARP requests" logical flows in the
>> `lr_in_ip_input` table for the distributed router (ovn_cluster_router)
>> configured with a gateway port (rtol-LS).
>>
>>                         internet
>>                ---------+-------------->
>>                         |
>>                         |
>>       +----------localnet-port---------+
>>       |LS                              |
>>       +-----------------ltor-LS--------+
>>                            |
>>                            |
>>  +---------------------rtol-LS------------+
>>  |           ovn_cluster_router           |
>>  |          (Distributed Router)          |
>>  +-rtos-ls0------rtos-ls1--------rtos-ls2-+
>>       |              |              |
>>       |              |              |
>> +-----+-+       +----+--+     +-----+-+
>> |  LS0  |       |  LS1  |     |  LS2  |
>> +-+-----+       +-+-----+     +-+-----+
>>   |               |             |
>>   p0              p1            p2
>>  IA0             IA1           IA2
>>  EA0             EA1           EA2
>> (Node0)          (Node1)       (Node2)
>>
>> In the topology above, each of the three logical switch ports has an
>> internal address of IAx and an external address of EAx (dnat_and_snat IP).
>> They are all bound to their respective nodes (Nodex). A packet from `p0`
>> heading towards the internet will be SNAT'ed to EA0 on the local hypervisor
>> and then sent out through the LS's localnet-port on that hypervisor.
>> Basically, they are configured for distributed NATing.
>>
>> I am seeing interesting "Reply to ARP requests" flows for arp.tpa set to
>> "EAx". The flows look like this:
>>
>> For EA0
>> priority=90, match=(inport == "rtos-ls0" && arp.tpa == EA0 && arp.op ==
>> 1), action=(/* ARP reply */)
>> priority=90, match=(inport == "rtos-ls1" && arp.tpa == EA0 && arp.op ==
>> 1), action=(/* ARP reply */)
>> priority=90, match=(inport == "rtos-ls2" && arp.tpa == EA0 && arp.op ==
>> 1), action=(/* ARP reply */)
>>
>> For EA1
>> priority=90, match=(inport == "rtos-ls0" && arp.tpa == EA1 && arp.op ==
>> 1), action=(/* ARP reply */)
>> priority=90, match=(inport == "rtos-ls1" && arp.tpa == EA1 && arp.op ==
>> 1), action=(/* ARP reply */)
>> priority=90, match=(inport == "rtos-ls2" && arp.tpa == EA1 && arp.op ==
>> 1), action=(/* ARP reply */)
>>
>> Similarly, for EA2.
>>
>> So, we have N * N "Reply to ARP requests" flows for N nodes each with 1
>> dnat_and_snat ip.
>> This is causing scale issues.
>>
>> If you look at the flows for `EA0`, I am confused as to why they are needed:
>>
>>    1. When would one see an ARP request for EA0 from any of the logical
>>    switch ports of LS{0,1,2}?
>>    2. If they are needed at all, can't we just remove the `inport` match
>>    altogether, since the flow is configured for every logical router
>>    port except the distributed gateway port rtol-LS? For this port, we
>>    could add a higher-priority rule with action set to `next`.
>>    3. Say we don't need east-west NAT connectivity. Is there a way to
>>    make these ARPs be learnt dynamically, like we are doing for the join
>>    and external logical switches (the other thread [1])?
>>
>> Regards,
>> ~Girish
>>
>> [1]
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html
>>
>
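To make the N * N growth described above concrete, here is a minimal Python sketch that enumerates one "Reply to ARP requests" flow per (NAT IP, router port) pair, mirroring the flows listed in the quoted message. The function name `per_port_arp_flows` is made up for illustration; this is not how ovn-northd builds the flows, only a way to reproduce the count.

```python
from itertools import product


def per_port_arp_flows(router_ports, nat_ips, priority=90):
    """Enumerate one ARP-reply flow per (NAT IP, router port) pair.

    Hypothetical helper: it only reproduces the shape of the flows quoted
    in the thread so that they can be counted.
    """
    return [
        f'priority={priority}, match=(inport == "{port}" && '
        f"arp.tpa == {ip} && arp.op == 1), action=(/* ARP reply */)"
        for ip, port in product(nat_ips, router_ports)
    ]


# Three nodes, each with one node-facing router port and one dnat_and_snat IP.
ports = [f"rtos-ls{i}" for i in range(3)]
ips = [f"EA{i}" for i in range(3)]

flows = per_port_arp_flows(ports, ips)
for flow in flows:
    print(flow)
print(len(flows))  # 3 IPs * 3 ports = 9 flows
```

With 3 nodes this gives 9 flows; the count grows quadratically as nodes (and with them ports and NAT IPs) are added.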
> In general, these flows should be per router instead of per router port,
> since the NAT addresses are not attached to any router port. For
> distributed gateway ports, per-port flows will be needed to match on
> is_chassis_resident(gateway-chassis). I think this can be handled by:
> - priority X + 20 flows for each distributed gateway port with
> is_chassis_resident(), reply ARP
> - priority X + 10 flows for each distributed gateway port without
> is_chassis_resident(), drop
> - priority X flows for each router (no need to match inport), reply ARP
>
> This way, there are N * (2D + 1) flows per router. N = number of NAT IPs,
> D = number of distributed gateway ports. This would optimize the above
> scenario where there is only 1 distributed gateway port but many regular
> router ports. Thoughts?
>
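The three priority tiers proposed above can be sketched in Python to check the N * (2D + 1) count. Everything here is an assumption for illustration: the function name `proposed_arp_flows`, the base priority of 90 standing in for "X", and the `cr-rtol-LS` argument passed to `is_chassis_resident()`. It is a counting aid, not the ovn-northd implementation.

```python
def proposed_arp_flows(nat_ips, gateway_ports, base_priority=90):
    """Return (priority, match, action) tuples for the three-tier scheme.

    gateway_ports is a list of (port_name, resident_port) pairs, where
    resident_port is the name tested by is_chassis_resident(). Both the
    pair layout and base_priority are assumptions made for this sketch.
    """
    flows = []
    for ip in nat_ips:
        arp = f"arp.tpa == {ip} && arp.op == 1"
        for port, resident_port in gateway_ports:
            on_port = f'inport == "{port}" && {arp}'
            # X + 20: reply only on the chassis where the gateway port resides.
            resident = f'{on_port} && is_chassis_resident("{resident_port}")'
            flows.append((base_priority + 20, resident, "reply"))
            # X + 10: drop on the gateway port everywhere else.
            flows.append((base_priority + 10, on_port, "drop"))
        # X: one flow per NAT IP for the whole router, no inport match.
        flows.append((base_priority, arp, "reply"))
    return flows


# N = 100 NAT IPs, D = 1 distributed gateway port.
nat_ips = [f"EA{i}" for i in range(100)]
flows = proposed_arp_flows(nat_ips, [("rtol-LS", "cr-rtol-LS")])
print(len(flows))  # N * (2D + 1) = 100 * 3 = 300
```

For comparison, the per-port scheme with 100 NAT IPs and 100 node-facing router ports would need 100 * 100 = 10,000 flows, so the saving grows with the number of regular router ports.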

We went ahead and added support for this topology in the ovn-kubernetes
project in this commit:
https://github.com/ovn-org/ovn-kubernetes/commit/edb24e6a71142f2e835b67b29c11e1688c645683


Han, I was curious to know whether the above fix is on your radar. Thanks.

The number of OpenFlow flows on each hypervisor is extremely high and is
consuming a lot of memory.

Regards,
~Girish





>
> Thanks,
> Han
>