[ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table

Girish Moodalbail gmoodalbail at gmail.com
Tue May 26 19:08:50 UTC 2020


Hello Dumitru,

There are several things that are being discussed on this thread. Let me
see if I can tease them out for clarity.

1. All the router IPs are known to OVN (the join switch case)
2. Some IPs are known and some are not known (the external logical switch
that connects to physical network case).

Let us look at each of the cases above:

1. Join Switch Case

+----------------+        +----------------+
|   l3gateway    |        |   l3gateway    |
|    router2     |        |    router3     |
+-------------+--+        +-+--------------+
            IP2,M2         IP3,M3
              |             |
           +--+-------------+---+
           |    join switch     |
           +---------+----------+
                     |
                  IP1,M1
             +-------+--------+
             |  distributed   |
             |     router     |
             +----------------+


Say GR router2 wants to send a packet to the DR, and we don't have static
MAC-to-IP mappings in the lr_in_arp_resolve table on GR router2 (with Han's
patch, dynamic_neigh_routes=true is set on all the Gateway Routers). In that
case, when router2's hypervisor sends out an ARP request, the packet
should be sent directly to the distributed router alone. Your
commit 32f5ebb0622 (ovn-northd: Limit ARP/ND broadcast domain whenever
possible) should have allowed only unicast. However, in the ls_in_l2_lkup
table we have

  table=19(ls_in_l2_lkup      ), priority=80   , match=(eth.src == { M2 } && (arp.op == 1 || nd_ns)), action=(outport = "_MC_flood"; output;)
  table=19(ls_in_l2_lkup      ), priority=75   , match=(flags[1] == 0 && arp.op == 1 && arp.tpa == { IP1}), action=(outport = "jtor-router2"; output;)

As you can see, the `priority=80` rule will always be hit, so the request is
flooded to all the GRs. The `priority=75` rule is never hit, and we will see
ARP packets on the GENEVE tunnels. So, we need to change the `priority=80`
rule to match only GARP request packets. That way, for the known OVN IPs case
we don't broadcast.
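The shadowing described above can be illustrated with a small Python model
of priority-ordered matching. This is only a sketch, not OVN code: the Flow
class, the lookup() helper and the packet dictionaries are hypothetical, the
flags[1] and nd_ns parts of the matches are left out, and the "narrowed"
variant assumes a GARP request is recognized by its TPA being one of the
router's own IPs (as in Dumitru's option 2 quoted below).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Packet = Dict[str, object]


@dataclass
class Flow:
    """Hypothetical stand-in for one logical flow in ls_in_l2_lkup."""
    priority: int
    match: Callable[[Packet], bool]
    action: str


def lookup(flows: List[Flow], pkt: Packet) -> Optional[str]:
    """Return the action of the highest-priority flow that matches pkt."""
    for flow in sorted(flows, key=lambda f: f.priority, reverse=True):
        if flow.match(pkt):
            return flow.action
    return None


M2, IP1, IP2 = "M2", "IP1", "IP2"
OWN_IPS = {IP2}  # assumption: IPs owned by router2's port on the join switch

# Simplified versions of the two flows shown in the mail.
current = [
    Flow(80, lambda p: p["eth_src"] == M2 and p["arp_op"] == 1, "flood"),
    Flow(75, lambda p: p["arp_op"] == 1 and p["arp_tpa"] == IP1, "unicast"),
]

# Assumed fix: the priority-80 flow only matches requests for the router's
# own IPs, i.e. GARP requests.
narrowed = [
    Flow(80, lambda p: p["eth_src"] == M2 and p["arp_op"] == 1
         and p["arp_tpa"] in OWN_IPS, "flood"),
    Flow(75, lambda p: p["arp_op"] == 1 and p["arp_tpa"] == IP1, "unicast"),
]

arp_for_dr = {"eth_src": M2, "arp_op": 1, "arp_spa": IP2, "arp_tpa": IP1}
garp_request = {"eth_src": M2, "arp_op": 1, "arp_spa": IP2, "arp_tpa": IP2}

for name, flows in (("current", current), ("narrowed", narrowed)):
    print(name, lookup(flows, arp_for_dr), lookup(flows, garp_request))
```

In this model the ARP request for IP1 is flooded under the current flows and
unicast under the narrowed ones, while the GARP request is flooded in both.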

2. External Logical Switch Case

                       10.10.10.0/24
   -------------------------+--------------------------
                            |
                         localnet
                      +-----+-----+
                      | external  |
         +------------+    LS1    +-------------+
         |            +-----+-----+             |
         |                  |                   |
     10.10.10.2         10.10.10.3          10.10.10.4
        SNAT               SNAT                SNAT
   +-----+-----+      +-----+-----+       +-----------+
   | l3gateway |      | l3gateway |       | l3gateway |
   |   node1   |      |   node2   |       |   node3   |
   +-----------+      +-----------+       +-----------+

In this case, some of the IPs are in OVN and some are in the physical
network. If we fix (1) above, all the ARP requests for OVN's router IPs
will be unicast. However, all the ARP requests for external IPs, say
10.10.10.1 on the "physical router", will be broadcast, and we will see
these ARP broadcasts on all the L3 gateway routers. With
'learn_from_arp_request=false' [a], the MAC_Binding table will not
explode for either ARP or GARP requests.
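
The learning rule that Han proposed (quoted further down) can be sketched in
Python as follows. This is an illustration of the proposed semantics only,
not ovn-northd or ovn-controller code: the handle_arp_request() function, its
arguments, and the MAC strings are hypothetical, and the MAC_Binding table is
modelled as a plain dict from IP to MAC.

```python
from typing import Dict, Set


def handle_arp_request(mac_bindings: Dict[str, str],
                       router_ips: Set[str],
                       spa: str, sha: str, tpa: str,
                       learn_from_arp_request: bool) -> str:
    """Apply the proposed learning rule to one received ARP request.

    spa/sha are the sender IP/MAC, tpa is the target IP of the request.
    Returns what happened to the (hypothetical) MAC binding table.
    """
    if spa in mac_bindings:
        # An existing entry is always refreshed if the MAC changed.
        if mac_bindings[spa] != sha:
            mac_bindings[spa] = sha
            return "updated"
        return "unchanged"
    # No entry yet: learn only if the option is on, or if the request
    # targets an IP owned by this router.
    if learn_from_arp_request or tpa in router_ips:
        mac_bindings[spa] = sha
        return "added"
    return "ignored"


# Gateway router on node1 (10.10.10.2) sees node2 (10.10.10.3) ARPing for
# the external router 10.10.10.1. MAC values are made up for the example.
table: Dict[str, str] = {}
print(handle_arp_request(table, {"10.10.10.2"}, "10.10.10.3",
                         "00:00:00:00:00:03", "10.10.10.1", False))
print(handle_arp_request(table, {"10.10.10.2"}, "10.10.10.3",
                         "00:00:00:00:00:03", "10.10.10.2", False))
```

With the option set to false, the first request (for an external IP) creates
no entry, while the second one (for the router's own IP) does.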

So, I don't think GARP requests and replies are the issue here. Furthermore,
learning from GARP replies is blocked on certain routers. For
example:
https://www.juniper.net/documentation/en_US/junose15.1/topics/concept/ip-gratuitous-arps-transmission-overview.html
says "By default, updating the ARP cache on GARP replies is disabled on the
router." So, with GARP replies, our NAT address mappings would not be learnt
by such routers.

Regards,
~Girish


[a] - From Han's mail, the meaning of learn_from_arp_request=false --> if
the TPA is on the router, add a new entry (it means the remote wants to
communicate with this node, so it makes sense to learn the remote as
well). Otherwise, ignore it and no new entry added.



On Mon, May 25, 2020 at 3:55 AM Dumitru Ceara <dceara at redhat.com> wrote:

> On 5/23/20 12:56 AM, Girish Moodalbail wrote:
> >
> >
> > On Fri, May 22, 2020 at 1:51 PM Han Zhou <zhouhan at gmail.com> wrote:
> >
> >
> >
> >     On Fri, May 22, 2020 at 8:39 AM Venugopal Iyer
> >     <venugopali at nvidia.com> wrote:
> >
> >         A couple of comments below:
> >
> >
> >
> >
> >         <vi> I suppose the use of GARP as a reply v/s response is not
> >         very clear; [1], Section 3 seems to offer a concise summary of
> >         this. If the application sends GARP as
> >         <vi> a reply we are covered, but the question is if the GARP is
> >         a request (which is allowed) then what our response should be.
> >         Tim is right, we can't ignore
> >         <vi> the request (more so, since aging is not supported
> >         currently), however "arp_accept" ignores the request for
> >         creating a new cache entry, not updating
> >         <vi> an existing one (see last para below)
> >
> >         [2]
> >         arp_accept - BOOLEAN
> >                 Define behavior for gratuitous ARP frames who's IP is not
> >                 already present in the ARP table:
> >                 0 - don't create new entries in the ARP table
> >                 1 - create new entries in the ARP table
> >
> >                 Both replies and requests type gratuitous arp will trigger the
> >                 ARP table to be updated, if this setting is on.
> >
> >                 If the ARP table already contains the IP address of the
> >                 gratuitous arp frame, the arp table will be updated regardless
> >                 if this setting is on or off.
> >
> >         <vi> if we lookup and get a hit, we should still process the
> >         GARP; only if we don't  have a hit, we should ignore (instead of
> >         <vi> creating an entry). BTW, do we update today? if I
> >         understand the use of reg9[2] / REGBIT_LOOKUP_NEIGHBOR_RESULT
> >         (assuming lookup_arp
> >         <vi> returns 1 if entry exists), I am not sure it does? maybe I
> >         missed it ..
> >
> >         thanks,
> >
> >         -venu
> >
> >         [1]https://www.ietf.org/rfc/rfc5227.txt
> >
> >
> >     (Not sure why the indent format of your reply is not correct at
> >     least on my client - it mixes all previous replies together so one
> >     cannot tell which part was from whom, so I truncated all of them.)
> >
> >     Thanks Venu. I think this would work: we can add an option similar
> >     but different from arp_accept (because it is not easy to OVN to tell
> >     if it is GARP on the ingress pipeline). The option can be named
> >     like: learn_from_arp_request.
> >     When ARP request is received, always check if an old entry existed
> >     for the SPA. If existed and MAC is different, then update the
> >     mac-binding entry. If the entry doesn't exist, check the option setting:
> >     "true" - add a new entry.
> >     "false" - if the TPA is on the router, add a new entry (it means the
> >     remote wants to communicate with this node, so it makes sense to
> >     learn the remote as well). Otherwise, ignore it and no new entry added.
> >
> >     Do you think this works?
> >
> >
> > I think this should work as well.
> >
> > For the single join switch connected to 1000 GRs, it should work as well
> > (assuming your other fix for dynamic learning is present as well).
> > However, in this case,  even with this option set we will still be
> > sending the ARP broadcast out from Node1 to each of the other 999 Nodes.
> > After the packets have travelled through the tunnel, we are going to
> > drop the packet on the target hypervisor, if
> > `learn_from_arp_request=true'. As I understand, we are waiting for reply
> > from @Dumitru Ceara to understand why such a
> > flow is required, correct?
> >
>
> As Han pointed out, commit 32f5ebb062 ("ovn-northd: Limit ARP/ND
> broadcast domain whenever possible.") added logical flows in the LS
> S_SWITCH_IN_L2_LKUP stage to explicitly flood ARP/ND requests originated
> from router owned IP interfaces. This was done for a couple of reasons:
>
> 1. ARP requests for destinations/next-hops outside OVN need to be
> flooded in the broadcast domain anyway and would otherwise match the
> lowest priority rule in S_SWITCH_IN_L2_LKUP that would flood them
> nevertheless.
>
> 2. OVN sends periodic GARP requests for router owned IPs (i.e., NAT
> addresses and logical_router_port addresses) to update external
> switch/router FDB/ARP caches in scenarios like VM migration:
> 6bfbb4c24187 ("ovn: Send GARP on localnet."). These packets should be
> flooded in the broadcast domain too.
>
> I think we have a few options:
>
> 1. Change OVN behavior and use GARP replies instead of GARP requests.
> The effect should be (almost [1]) the same from the external devices
> perspective but the advantage is that we can completely remove the
> logical flows that match on self originated ARP packets. This is quite
> easy to achieve and I have a patch ready for it if we decide to go this
> way.
>
> 2. Make the flows that match on self originated ARP traffic more
> explicit and restrict them to GARP requests. For example, for a logical
> router port with addresses MAC, IP1, IP2 and NAT entries with
> external_mac MAC-E and external IP IP-E:
>
> Right now we have a flow:
> if "eth.src == {MAC, MAC-E} && (arp_req || nd_ns)" then "flood"
>
> We could instead create:
> if "eth.src == MAC && arp.tpa == {IP1, IP2} && arp_req" then "flood"
> if "eth.src == MAC-E && arp.tpa == {IP-E} && arp_req" then "flood"
>
> I would prefer option 1 above but I'd like to hear more opinions about
> disadvantages of using GARP replies instead of GARP requests for OVN
> owned IP addresses.
>
> Option 2 is also relatively straightforward to implement but will
> generate a few more logical flows, still O(N) though, with N="number of
> logical routers connected to the logical switch".
>
> Thanks,
> Dumitru
>
> [1] https://tools.ietf.org/html/rfc5227#page-15
>
>
> > Regards,
> > ~Girish
>