[ovs-discuss] [OVN] logical flow explosion in lr_in_ip_input table for dnat_and_snat IPs

Dumitru Ceara dceara at redhat.com
Wed Jun 24 15:55:20 UTC 2020


Hi Girish,

I sent a patch series to implement Han's suggestion:
https://patchwork.ozlabs.org/project/openvswitch/list/?series=185580
https://mail.openvswitch.org/pipermail/ovs-dev/2020-June/372005.html

It would be great if you could give it a run on your setup too.

Thanks,
Dumitru

On 6/16/20 5:18 PM, Girish Moodalbail wrote:
> Thanks Han for the update.
> 
> Regards,
> ~Girish 
> 
> On Mon, Jun 15, 2020 at 12:55 PM Han Zhou <zhouhan at gmail.com> wrote:
> 
>     Sorry Girish, I can't promise for now. I will see if I have time in
>     the next couple of weeks, but welcome anyone to volunteer on this if
>     it is urgent.
> 
>     On Mon, Jun 15, 2020 at 10:56 AM Girish Moodalbail
>     <gmoodalbail at gmail.com> wrote:
> 
>         Hello Han,
> 
>         On Wed, Jun 3, 2020 at 9:39 PM Han Zhou <zhouhan at gmail.com> wrote:
> 
> 
> 
>             On Wed, Jun 3, 2020 at 7:16 PM Girish Moodalbail
>             <gmoodalbail at gmail.com> wrote:
> 
>                 Hello all,
> 
>                 While working on an extension (see the diagram below) to
>                 the existing OVN logical topology for the ovn-kubernetes
>                 project, I am seeing an explosion of the "Reply to ARP
>                 requests" logical flows in the `lr_in_ip_input` table
>                 for the distributed router (ovn_cluster_router)
>                 configured with a gateway port (rtol-LS).
> 
>                                         internet          
>                                ---------+-------------->  
>                                         |                  
>                                         |                                  
>                       +----------localnet-port---------+  
>                       |LS                              |  
>                       +-----------------ltor-LS--------+  
>                                            |              
>                                            |              
>                  +---------------------rtol-LS------------+
>                  |           ovn_cluster_router           |
>                  |          (Distributed Router)          |
>                  +-rtos-ls0------rtos-ls1--------rtos-ls2-+
>                       |              |              |        
>                       |              |              |      
>                 +-----+-+       +----+--+     +-----+-+    
>                 |  LS0  |       |  LS1  |     |  LS2  |    
>                 +-+-----+       +-+-----+     +-+-----+        
>                   |               |             |          
>                   p0              p1            p2        
>                  IA0             IA1           IA2        
>                  EA0             EA1           EA2 
>                 (Node0)          (Node1)       (Node2)
> 
>                 In the topology above, each of the three logical switch
>                 ports has an internal address of IAx and an external
>                 address of EAx (dnat_and_snat IP). They are all bound to
>                 their respective nodes (Nodex). A packet from `p0`
>                 heading towards the internet will be SNAT'ed to EA0 on
>                 the local hypervisor and then sent out through the LS's
>                 localnet-port on that hypervisor. Basically, they are
>                 configured for distributed NATing.
> 
>                 I am seeing interesting "Reply to ARP requests" flows
>                 with arp.tpa set to "EAx". The flows look like this:
> 
>                 For EA0
>                 priority=90, match=(inport == "rtos-ls0" && arp.tpa ==
>                 EA0 && arp.op == 1), action=(/* ARP reply */)
>                 priority=90, match=(inport == "rtos-ls1" && arp.tpa ==
>                 EA0 && arp.op == 1), action=(/* ARP reply */)
>                 priority=90, match=(inport == "rtos-ls2" && arp.tpa ==
>                 EA0 && arp.op == 1), action=(/* ARP reply */)
> 
>                 For EA1
>                 priority=90, match=(inport == "rtos-ls0" && arp.tpa ==
>                 EA1 && arp.op == 1), action=(/* ARP reply */)
>                 priority=90, match=(inport == "rtos-ls1" && arp.tpa ==
>                 EA1 && arp.op == 1), action=(/* ARP reply */)
>                 priority=90, match=(inport == "rtos-ls2" && arp.tpa ==
>                 EA1 && arp.op == 1), action=(/* ARP reply */)
> 
>                 Similarly, for EA2.
> 
>                 So, we have N * N "Reply to ARP requests" flows for N
>                 nodes, each with one dnat_and_snat IP. This is causing
>                 scale issues.
> 
>                 If you look at the flows for `EA0`, I am confused as to
>                 why they are needed:
> 
>                  1. When will one see an ARP request for EA0 from any
>                     of the logical switch ports of LS{0,1,2}?
>                  2. If it is needed at all, can't we just remove the
>                     `inport` match altogether, since the flow is
>                     configured for every logical router port except
>                     the distributed gateway port rtol-LS? For this
>                     port, we could add a higher-priority rule with
>                     action set to `next`.
>                  3. Say we don't need east-west NAT connectivity. Is
>                     there a way to make these ARPs be learnt
>                     dynamically, like we are doing for the join and
>                     external logical switches (the other thread [1])?
> 
>                 Regards,
>                 ~Girish
> 
>                 [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html 
> 
> 
>             In general, these flows should be per router instead of per
>             router port, since the NAT addresses are not attached to any
>             router port. For distributed gateway ports, there will need
>             to be per-port flows that match
>             is_chassis_resident(gateway-chassis). I think this can be
>             handled by:
>             - priority X + 20 flows for each distributed gateway port
>             with is_chassis_resident(), reply ARP
>             - priority X + 10 flows for each distributed gateway port
>             without is_chassis_resident(), drop
>             - priority X flows for each router (no need to match
>             inport), reply ARP
> 
>             This way, there are N * (2D + 1) flows per router. N =
>             number of NAT IPs, D = number of distributed gateway ports.
>             This would optimize the above scenario where there is only 1
>             distributed gateway port but many regular router ports.
>             Thoughts?
> 
> 
>         We went ahead and added support for this topology in the
>         ovn-kubernetes project in this commit:
>         https://github.com/ovn-org/ovn-kubernetes/commit/edb24e6a71142f2e835b67b29c11e1688c645683
> 
>         Han, I was curious to know whether the above fix is on your radar. Thanks.
> 
>         The number of OpenFlow flows in each of the hypervisors is
>         insanely high and is consuming a lot of memory.
> 
>         Regards,
>         ~Girish
> 
> 
> 
>          
> 
> 
>             Thanks,
>             Han
> 
> 
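
To make the two flow counts from the quoted thread concrete, here is a rough Python sketch that enumerates both layouts for the topology in the diagram. It is only an illustration under stated assumptions and not what ovn-northd generates: the function names are hypothetical, the base priority of 90 is borrowed from the flows Girish pasted (Han only wrote "X"), and `is_chassis_resident(<gateway-chassis>)` is kept as an unfilled placeholder exactly as in Han's description.

```python
from itertools import product


def per_port_flows(router_ports, nat_ips, priority=90):
    """Layout reported in the thread: one ARP-reply flow per (NAT IP, router port)."""
    return [
        (priority, f'inport == "{port}" && arp.tpa == {ip} && arp.op == 1', "reply")
        for ip, port in product(nat_ips, router_ports)
    ]


def per_router_flows(gateway_ports, nat_ips, base_priority=90):
    """Layout Han proposed: per NAT IP, two flows per distributed gateway
    port plus one router-wide flow that does not match on inport."""
    flows = []
    for ip in nat_ips:
        arp = f"arp.tpa == {ip} && arp.op == 1"
        for gw in gateway_ports:
            # X + 20: reply only where the gateway chassis is resident.
            # The is_chassis_resident() argument is a placeholder, not real syntax.
            flows.append((
                base_priority + 20,
                f'inport == "{gw}" && is_chassis_resident(<gateway-chassis>) && {arp}',
                "reply",
            ))
            # X + 10: same gateway port on any other chassis, drop.
            flows.append((base_priority + 10, f'inport == "{gw}" && {arp}', "drop"))
        # X: router-wide reply, no inport match.
        flows.append((base_priority, arp, "reply"))
    return flows


if __name__ == "__main__":
    for n in (3, 100):
        ports = [f"rtos-ls{i}" for i in range(n)]
        ips = [f"EA{i}" for i in range(n)]
        old = per_port_flows(ports, ips)
        new = per_router_flows(["rtol-LS"], ips)
        print(f"N={n}: per-port={len(old)} per-router={len(new)}")
```

With a single distributed gateway port (D = 1), the three-node diagram gives 9 flows either way, but at N = 100 the per-port layout yields 10,000 logical flows against 300 for the per-router layout, which matches the N * N and N * (2D + 1) figures above.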
