[ovs-discuss] [OVN] logical flow explosion in lr_in_ip_input table for dnat_and_snat IPs

Girish Moodalbail gmoodalbail at gmail.com
Tue Jun 16 15:18:36 UTC 2020


Thanks Han for the update.

Regards,
~Girish

On Mon, Jun 15, 2020 at 12:55 PM Han Zhou <zhouhan at gmail.com> wrote:

> Sorry Girish, I can't promise anything for now. I will see if I have time in
> the next couple of weeks, but anyone is welcome to volunteer on this if it is
> urgent.
>
> On Mon, Jun 15, 2020 at 10:56 AM Girish Moodalbail <gmoodalbail at gmail.com>
> wrote:
>
>> Hello Han,
>>
>> On Wed, Jun 3, 2020 at 9:39 PM Han Zhou <zhouhan at gmail.com> wrote:
>>
>>>
>>>
>>> On Wed, Jun 3, 2020 at 7:16 PM Girish Moodalbail <gmoodalbail at gmail.com>
>>> wrote:
>>>
>>>> Hello all,
>>>>
>>>> While working on an extension (see the diagram below) to the existing
>>>> OVN logical topology for the ovn-kubernetes project, I am seeing an
>>>> explosion of the "Reply to ARP requests" logical flows in the
>>>> `lr_in_ip_input` table for the distributed router (ovn_cluster_router)
>>>> configured with a gateway port (rtol-LS).
>>>>
>>>>                         internet
>>>>                ---------+-------------->
>>>>                         |
>>>>                         |
>>>>       +----------localnet-port---------+
>>>>       |LS                              |
>>>>       +-----------------ltor-LS--------+
>>>>                            |
>>>>                            |
>>>>  +---------------------rtol-LS------------+
>>>>  |           ovn_cluster_router           |
>>>>  |          (Distributed Router)          |
>>>>  +-rtos-ls0------rtos-ls1--------rtos-ls2-+
>>>>       |              |              |
>>>>       |              |              |
>>>> +-----+-+       +----+--+     +-----+-+
>>>> |  LS0  |       |  LS1  |     |  LS2  |
>>>> +-+-----+       +-+-----+     +-+-----+
>>>>   |               |             |
>>>>   p0              p1            p2
>>>>  IA0             IA1           IA2
>>>>  EA0             EA1           EA2
>>>> (Node0)          (Node1)       (Node2)
>>>>
>>>> In the topology above, each of the three logical switch ports has an
>>>> internal address of IAx and an external address of EAx (dnat_and_snat IP).
>>>> They are all bound to their respective nodes (Nodex). A packet from `p0`
>>>> heading towards the internet will be SNAT'ed to EA0 on the local hypervisor
>>>> and then sent out through the LS's localnet-port on that hypervisor.
>>>> Basically, they are configured for distributed NAT.
>>>>
>>>> I am seeing interesting "Reply to ARP requests" flows with arp.tpa set
>>>> to EAx. The flows look like this:
>>>>
>>>> For EA0
>>>> priority=90, match=(inport == "rtos-ls0" && arp.tpa == EA0 && arp.op ==
>>>> 1), action=(/* ARP reply */)
>>>> priority=90, match=(inport == "rtos-ls1" && arp.tpa == EA0 && arp.op ==
>>>> 1), action=(/* ARP reply */)
>>>> priority=90, match=(inport == "rtos-ls2" && arp.tpa == EA0 && arp.op ==
>>>> 1), action=(/* ARP reply */)
>>>>
>>>> For EA1
>>>> priority=90, match=(inport == "rtos-ls0" && arp.tpa == EA1 && arp.op ==
>>>> 1), action=(/* ARP reply */)
>>>> priority=90, match=(inport == "rtos-ls1" && arp.tpa == EA1 && arp.op ==
>>>> 1), action=(/* ARP reply */)
>>>> priority=90, match=(inport == "rtos-ls2" && arp.tpa == EA1 && arp.op ==
>>>> 1), action=(/* ARP reply */)
>>>>
>>>> Similarly, for EA2.
>>>>
>>>> So, we have N * N "Reply to ARP requests" flows for N nodes, each with one
>>>> dnat_and_snat IP.
>>>> This is causing scale issues.
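
The N * N growth described above can be sketched as follows. This is only
an illustration, assuming one flow per (dnat_and_snat IP, router port) pair
as in the listing; the function name arp_reply_flows is hypothetical and is
not part of OVN.

```python
def arp_reply_flows(router_ports, nat_ips):
    """Build one ARP-reply flow string per (NAT IP, router port) pair.

    Hypothetical helper: it mirrors the flow listing in the message,
    not the code that ovn-northd actually runs.
    """
    return [
        f'priority=90, match=(inport == "{port}" && arp.tpa == {ip} && arp.op == 1), '
        "action=(/* ARP reply */)"
        for ip in nat_ips
        for port in router_ports
    ]


# Three nodes, each with one router port and one dnat_and_snat IP.
nodes = 3
ports = [f"rtos-ls{i}" for i in range(nodes)]
ips = [f"EA{i}" for i in range(nodes)]
flows = arp_reply_flows(ports, ips)
print(len(flows))  # nodes * nodes flows in total
```

Under the same assumption, 100 nodes would yield 100 * 100 = 10000 such
flows.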
>>>>
>>>> If you look at the flows for `EA0`, I am confused as to why they are
>>>> needed:
>>>>
>>>>    1. When would one ever see an ARP request for EA0 from any of the
>>>>    LS{0,1,2} logical switch ports?
>>>>    2. If these flows are needed at all, can't we just remove the `inport`
>>>>    match altogether, since the flow is configured for every logical router
>>>>    port except the distributed gateway port rtol-LS? For that port, we
>>>>    could add a higher-priority rule with the action set to `next`.
>>>>    3. Say we don't need east-west NAT connectivity. Is there a way to
>>>>    make these ARPs be learnt dynamically, as we are doing for the join and
>>>>    external logical switches (the other thread [1])?
>>>>
>>>> Regards,
>>>> ~Girish
>>>>
>>>> [1]
>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html
>>>>
>>>>
>>>
>>> In general, these flows should be per router instead of per router port,
>>> since the NAT addresses are not attached to any router port. For
>>> distributed gateway ports, there will need to be per-port flows that match
>>> is_chassis_resident(gateway-chassis). I think this can be handled by:
>>> - priority X + 20 flows for each distributed gateway port with
>>> is_chassis_resident(), reply ARP
>>> - priority X + 10 flows for each distributed gateway port without
>>> is_chassis_resident(), drop
>>> - priority X flows for each router (no need to match inport), reply ARP
>>>
>>> This way, there are N * (2D + 1) flows per router. N = number of NAT
>>> IPs, D = number of distributed gateway ports. This would optimize the above
>>> scenario where there is only 1 distributed gateway port but many regular
>>> router ports. Thoughts?
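
To make the comparison concrete, here is a rough sketch of the two flow
counts. It assumes the current scheme installs one flow per NAT IP per
regular router port (as in the original report) and that the proposed
scheme installs N * (2D + 1) flows as stated above. The function names are
hypothetical.

```python
def flows_per_port(nat_ips: int, router_ports: int) -> int:
    """Current behaviour as reported: one ARP-reply flow per NAT IP per router port."""
    return nat_ips * router_ports


def flows_proposed(nat_ips: int, dgw_ports: int) -> int:
    """Proposed scheme: per NAT IP, a reply flow and a drop flow for each
    distributed gateway port, plus one router-wide reply flow."""
    return nat_ips * (2 * dgw_ports + 1)


# 100 nodes, each with one regular router port and one dnat_and_snat IP,
# and a single distributed gateway port (D = 1).
print(flows_per_port(100, 100))  # 10000
print(flows_proposed(100, 1))    # 300
```

With a single distributed gateway port, the count grows linearly in the
number of NAT IPs rather than with the product of NAT IPs and router ports.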
>>>
>>
>> We went ahead and added support for this topology in the ovn-kubernetes
>> project in this commit:
>>
>> https://github.com/ovn-org/ovn-kubernetes/commit/edb24e6a71142f2e835b67b29c11e1688c645683
>>
>>
>> Han, I was curious to know if the above fix is on your radar. Thanks.
>>
>> The number of OpenFlow flows in each of the hypervisors is insanely high
>> and is consuming a lot of memory.
>>
>> Regards,
>> ~Girish
>>
>>
>>
>>
>>
>>>
>>> Thanks,
>>> Han
>>>
>