[ovs-discuss] [OVN] logical flow explosion in lr_in_ip_input table for dnat_and_snat IPs
Han Zhou
zhouhan at gmail.com
Thu Jun 4 04:39:08 UTC 2020
On Wed, Jun 3, 2020 at 7:16 PM Girish Moodalbail <gmoodalbail at gmail.com>
wrote:
> Hello all,
>
> While working on an extension (see the diagram below) to the existing OVN
> logical topology for the ovn-kubernetes project, I am seeing an explosion
> of the "Reply to ARP requests" logical flows in the `lr_in_ip_input` table
> for the distributed router (ovn_cluster_router) configured with a gateway
> port (rtol-LS).
>
>             internet
>             ---------+-------------->
>                      |
>                      |
>     +----------localnet-port---------+
>     |               LS               |
>     +-----------------ltor-LS--------+
>                          |
>                          |
> +---------------------rtol-LS------------+
> |           ovn_cluster_router           |
> |          (Distributed Router)          |
> +-rtos-ls0------rtos-ls1--------rtos-ls2-+
>      |             |               |
>      |             |               |
>  +---+---+     +---+---+       +---+---+
>  |  LS0  |     |  LS1  |       |  LS2  |
>  +---+---+     +---+---+       +---+---+
>      |             |               |
>      p0            p1              p2
>     IA0           IA1             IA2
>     EA0           EA1             EA2
>   (Node0)       (Node1)         (Node2)
>
> In the topology above, each of the three logical switch ports has an
> internal address of IAx and an external address of EAx (dnat_and_snat IP).
> They are all bound to their respective nodes (Nodex). A packet from `p0`
> heading towards the internet will be SNAT'ed to EA0 on the local hypervisor
> and then sent out through the LS's localnet-port on that hypervisor.
> Basically, they are configured for distributed NATing.
>
> I am seeing interesting "Reply to ARP requests" flows for arp.tpa set to
> "EAx". The flows look like this:
>
> For EA0
> priority=90, match=(inport == "rtos-ls0" && arp.tpa == EA0 && arp.op ==
> 1), action=(/* ARP reply */)
> priority=90, match=(inport == "rtos-ls1" && arp.tpa == EA0 && arp.op ==
> 1), action=(/* ARP reply */)
> priority=90, match=(inport == "rtos-ls2" && arp.tpa == EA0 && arp.op ==
> 1), action=(/* ARP reply */)
>
> For EA1
> priority=90, match=(inport == "rtos-ls0" && arp.tpa == EA1 && arp.op ==
> 1), action=(/* ARP reply */)
> priority=90, match=(inport == "rtos-ls1" && arp.tpa == EA1 && arp.op ==
> 1), action=(/* ARP reply */)
> priority=90, match=(inport == "rtos-ls2" && arp.tpa == EA1 && arp.op ==
> 1), action=(/* ARP reply */)
>
> Similarly, for EA2.
>
> So, we have N * N "Reply to ARP requests" flows for N nodes each with 1
> dnat_and_snat ip.
> This is causing scale issues.
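To make the growth concrete, the per-port flows listed above can be modeled
with a short Python sketch. It only builds match strings in order to count
them; it is not ovn-northd code, the function name `per_port_arp_flows` is
hypothetical, and the action stays the same `/* ARP reply */` placeholder.
It assumes one router port and one dnat_and_snat IP per node, as in the
diagram.

```python
from typing import List


def per_port_arp_flows(nat_ips: List[str], router_ports: List[str]) -> List[str]:
    """Hypothetical model of the current scheme: one priority-90 ARP-reply
    flow per (dnat_and_snat IP, router port) pair, each matching on inport."""
    flows: List[str] = []
    for ip in nat_ips:
        for port in router_ports:
            flows.append(
                f'priority=90, match=(inport == "{port}" && '
                f"arp.tpa == {ip} && arp.op == 1), action=(/* ARP reply */)"
            )
    return flows


if __name__ == "__main__":
    n = 3  # assumption: one node, one LS and one dnat_and_snat IP per node
    ips = [f"EA{i}" for i in range(n)]
    ports = [f"rtos-ls{i}" for i in range(n)]
    for flow in per_port_arp_flows(ips, ports):
        print(flow)
    # The flow count is n * n: 9 here, 1,000,000 for 1000 nodes.
    print(len(per_port_arp_flows(ips, ports)))
```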
>
> If you look at the flows for `EA0`, I am confused as to why they are needed:
>
> 1. When would one ever see an ARP request for EA0 arriving from a logical
> switch port on any of LS{0,1,2}?
> 2. If these flows are needed at all, can't we just remove the `inport`
> match altogether, since the flow is configured for every logical router
> port except the distributed gateway port rtol-LS? For that port, we
> could add a higher-priority rule with the action set to `next`.
> 3. Say we don't need east-west NAT connectivity. Is there a way to
> make these ARPs be learned dynamically, like we are doing for the join and
> external logical switches (see the other thread [1])?
>
> Regards,
> ~Girish
>
> [1]
> https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html
>
In general, these flows should be per router instead of per router port,
since the NAT addresses are not attached to any router port. For
distributed gateway ports, there will need to be per-port flows to match
is_chassis_resident(gateway-chassis). I think this can be handled, for each
NAT IP, by:
- priority X + 20 flows for each distributed gateway port with
is_chassis_resident(), reply ARP
- priority X + 10 flows for each distributed gateway port without
is_chassis_resident(), drop
- priority X flows for each router (no need to match inport), reply ARP
This way, there are N * (2D + 1) flows per router, where N is the number of
NAT IPs and D is the number of distributed gateway ports. This would
optimize the above scenario, where there is only 1 distributed gateway port
but many regular router ports. Thoughts?
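The three priority tiers above can be sketched in Python to check the
N * (2D + 1) count. This is a minimal model under stated assumptions, not
ovn-northd code: X is taken as 90 purely for illustration, the argument to
is_chassis_resident() is passed in per gateway port (`cr-rtol-LS` below is a
hypothetical chassis-redirect port name), the ARP-reply action stays a
placeholder, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Flow:
    priority: int
    match: str
    action: str


def proposed_arp_flows(
    nat_ips: List[str],
    dgw_ports: Dict[str, str],  # hypothetical: DGP name -> port checked by is_chassis_resident()
    base_priority: int = 90,    # "X" in the proposal; value chosen for illustration only
) -> List[Flow]:
    flows: List[Flow] = []
    for ip in nat_ips:
        arp_req = f"arp.tpa == {ip} && arp.op == 1"
        for port, resident in dgw_ports.items():
            in_dgp = f'inport == "{port}" && {arp_req}'
            # X + 20: on the distributed gateway port, reply only where resident.
            flows.append(Flow(base_priority + 20,
                              f'{in_dgp} && is_chassis_resident("{resident}")',
                              "/* ARP reply */"))
            # X + 10: on the distributed gateway port anywhere else, drop.
            flows.append(Flow(base_priority + 10, in_dgp, "drop;"))
        # X: one router-wide flow without an inport match covers all other ports.
        flows.append(Flow(base_priority, arp_req, "/* ARP reply */"))
    return flows


def current_flow_count(n_nat_ips: int, n_router_ports: int) -> int:
    # Per-port scheme: one flow per (NAT IP, router port) pair.
    return n_nat_ips * n_router_ports


def proposed_flow_count(n_nat_ips: int, n_dgw_ports: int) -> int:
    # N * (2D + 1), independent of the number of regular router ports.
    return n_nat_ips * (2 * n_dgw_ports + 1)


if __name__ == "__main__":
    ips = [f"EA{i}" for i in range(3)]
    for flow in proposed_arp_flows(ips, {"rtol-LS": "cr-rtol-LS"}):
        print(flow.priority, flow.match, flow.action)
    for n in (3, 1000):
        print(n, current_flow_count(n, n), proposed_flow_count(n, 1))
```

With N = 3 and D = 1 both schemes produce 9 flows; the difference only shows
up as the number of router ports grows. With 1000 nodes, for example, the
per-port scheme gives 1,000,000 flows versus 3,000 with the per-router layout.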
Thanks,
Han