[ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table

Han Zhou zhouhan at gmail.com
Thu May 28 06:34:56 UTC 2020


On Wed, May 27, 2020 at 1:10 AM Dumitru Ceara <dceara at redhat.com> wrote:
>
> Hi Girish, Han,
>
> On 5/26/20 11:51 PM, Han Zhou wrote:
> >
> >
> > On Tue, May 26, 2020 at 1:07 PM Girish Moodalbail <gmoodalbail at gmail.com> wrote:
> >>
> >>
> >>
> >> On Tue, May 26, 2020 at 12:42 PM Han Zhou <zhouhan at gmail.com> wrote:
> >>>
> >>> Hi Girish,
> >>>
> >>> Thanks for the summary. I agree with you that GARP request vs. reply
> > is irrelevant to the problem here.
>
> Well, actually I think GARP request vs reply is relevant (at least for
> case 1 below) because if OVN would be generating GARP replies we
> wouldn't need the priority 80 flow to determine if an ARP request packet
> is actually an OVN self originated GARP that needs to be flooded in the
> L2 broadcast domain.
>
> On the other hand, router3 would be learning mac_binding IP2,M2 from the
> GARP reply originated by router2 and vice versa so we'd have to restrict
> flooding of GARP replies to non-patch ports.
>

Hi Dumitru, the point was that, on the external LS, the GRs have to send
ARP requests to resolve unknown IPs (at least for the external GW), and
those requests have to be broadcast, which causes every GR to learn the
MACs of all the other GRs. This happens regardless of the GARP behavior. You
are right that if we only consider the join switch, then GARP request vs.
reply does make a difference. However, GARP requests/replies may really be
needed only on the external LS.
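
To make the scale concrete, here is a rough Python sketch. It is not OVN
code: the function name, the placeholder addresses and the simplified
learning rule are assumptions made purely for illustration. It counts the
MAC_Binding entries created when every GR on the external LS broadcasts one
ARP request for the external gateway. With learn_from_arp_request=False it
follows the semantics described in this thread: a router learns the sender
only if the TPA is one of its own IPs.

```python
def count_mac_bindings(num_grs, learn_from_arp_request=True):
    """Count MAC_Binding entries across all GRs on one external LS.

    Simplified model (an assumption, not OVN's implementation): each GR
    broadcasts one ARP request for the external gateway, every other GR
    receives it, and the gateway answers the sender with a unicast reply.
    """
    gateway_ip = "gw"  # placeholder for the physical router's IP
    owned_ip = {gr: f"ip-{gr}" for gr in range(num_grs)}
    bindings = {gr: set() for gr in range(num_grs)}

    for sender in range(num_grs):
        tpa = gateway_ip        # target of the ARP request
        spa = owned_ip[sender]  # sender address carried in the request
        for receiver in range(num_grs):
            if receiver == sender:
                continue
            # Learn unconditionally, or only when the request targets us.
            if learn_from_arp_request or tpa == owned_ip[receiver]:
                bindings[receiver].add(spa)
        # The unicast ARP reply teaches the sender the gateway's MAC.
        bindings[sender].add(gateway_ip)

    return sum(len(entries) for entries in bindings.values())


if __name__ == "__main__":
    for n in (4, 1000):
        print(n, count_mac_bindings(n, True), count_mac_bindings(n, False))
```

Under this model the entry count grows quadratically with the number of GRs
when every request is learned from, and linearly when learning is
restricted to requests that target the router itself.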

> >>> Please see my comment inline below.
> >>>
> >>> On Tue, May 26, 2020 at 12:09 PM Girish Moodalbail <gmoodalbail at gmail.com> wrote:
> >>> >
> >>> > Hello Dumitru,
> >>> >
> >>> > There are several things that are being discussed on this thread.
> > Let me see if I can tease them out for clarity.
> >>> >
> >>> > 1. All the router IPs are known to OVN (the join switch case)
> >>> > 2. Some IPs are known and some are not known (the external logical
> > switch that connects to physical network case).
> >>> >
> >>> > Let us look at each of the case above:
> >>> >
> >>> > 1. Join Switch Case
> >>> >
> >>> > +----------------+        +----------------+
> >>> > |   l3gateway    |        |   l3gateway    |
> >>> > |    router2     |        |    router3     |
> >>> > +-------------+--+        +-+--------------+
> >>> >             IP2,M2         IP3,M3
> >>> >               |             |
> >>> >            +--+-------------+---+
> >>> >            |    join switch     |
> >>> >            +---------+----------+
> >>> >                      |
> >>> >                   IP1,M1
> >>> >              +-------+--------+
> >>> >              |  distributed   |
> >>> >              |     router     |
> >>> >              +----------------+
> >>> >
> >>> >
> >>> > Say GR router2 wants to send a packet out to the DR, and we
> > don't have static mappings of MAC to IP in lr_in_arp_resolve table on GR
> > router2 (with Han's patch of dynamic_neigh_routers=true for all the
> > Gateway Routers). With this in mind, when an ARP request is sent out by
> > router2's hypervisor the packet should be directly sent to the
> > distributed router alone. Your commit 32f5ebb0622 (ovn-northd: Limit
> > ARP/ND broadcast domain whenever possible) should have allowed only
> > unicast. However, in ls_in_l2_lkup table we have
> >>> >
> >>> >   table=19(ls_in_l2_lkup      ), priority=80   , match=(eth.src == { M2 } && (arp.op == 1 || nd_ns)), action=(outport = "_MC_flood"; output;)
> >>> >   table=19(ls_in_l2_lkup      ), priority=75   , match=(flags[1] == 0 && arp.op == 1 && arp.tpa == { IP1 }), action=(outport = "jtor-router2"; output;)
> >>> > As you can see, `priority=80` rule will always be hit and sent out
> > to all the GRs. The `priority=75` rule is never hit. So, we will see ARP
> > packets on the GENEVE tunnel. So, we need to change `priority=80` to
> > match GARP request packets. That way, for the known OVN IPs case we
> > don't do broadcast.
> >>>
> >>> Since the solution to case 2) below (i.e.
> > learn_from_arp_request=false) solves the problem of case 1), too, I
> > think we don't need this change just for case 1). As @Dumitru Ceara
> > mentioned, there is some cost because it adds extra flows. It would be
> > a significant number of flows if there are a lot of dnat_and_snat IPs.
> > What do you think?
>
> I think the following might be a solution, although with the cost of
> adding as many flows as dnat_and_snat IPs are configured:
>
> - priority 80: explicitly determine if an ARP request is a self
> originated GARP for configured IP addresses and dnat_and_snat IPs (by
> matching on all eth.src and arp.tpa pairs) and if so flood on all
> non-patch ports.
> - priority 75: if arp.tpa is owned by an OVN logical router port,
> "unicast" it only on the patch port towards the router.
> - priority 1: flood any broadcast packet.
>
> Together with the learn_from_arp_request=false knob this would cover
> both case 1 (join switch) and case 2 (external switch).
>
> Wdyt?
>
Would the learn_from_arp_request=false knob alone cover both cases? If yes, we
don't need to add more priority-80 flows. More accurately: whether to
update the priority-80 flows is not directly related to the current problem.
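
For reference, the shadowing Girish described on the join switch can be
reproduced with a toy highest-priority-wins lookup. This is only a sketch:
the packet dict, the lookup() helper and the M9 source address are made up
for illustration, the match predicates are hand-translated from the two
ls_in_l2_lkup flows quoted above, and the nd_ns alternative is left out.

```python
# Toy model of a logical flow table: the highest-priority matching flow wins.
# Each entry is (priority, match predicate, outport).
FLOWS = [
    # priority 80: ARP request sourced from router2's MAC, flood it
    (80,
     lambda p: p["eth_src"] == "M2" and p["arp_op"] == 1,
     "_MC_flood"),
    # priority 75: ARP request for IP1, send it to a single port
    (75,
     lambda p: p["flags1"] == 0 and p["arp_op"] == 1 and p["arp_tpa"] == "IP1",
     "jtor-router2"),
]


def lookup(packet, flows=FLOWS):
    """Return (priority, outport) of the winning flow, or None if no match."""
    for priority, match, outport in sorted(flows, key=lambda f: -f[0]):
        if match(packet):
            return priority, outport
    return None


# ARP request from router2 (M2) asking for the DR's address (IP1).
arp_from_router2 = {"eth_src": "M2", "arp_op": 1, "arp_tpa": "IP1", "flags1": 0}
# The same request from a hypothetical source M9 that has no priority-80 flow.
arp_from_other = dict(arp_from_router2, eth_src="M9")

if __name__ == "__main__":
    print(lookup(arp_from_router2))  # flooded by the priority-80 flow
    print(lookup(arp_from_other))    # handled by the priority-75 flow
```

In this model any ARP request from M2 is flooded, whatever its target, which
is why the priority-75 flow is never reached for router2's own requests.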

> >>
> >>
> >> Han, yes it will work. However, my only concern is that we would send
> > all these ARP requests via tunnel to each of 1000 hypervisors, and these
> > hypervisors will just drop them on the floor when they see
> > learn_from_arp_request=false.
> >
> > I think maybe it is not a problem since it happens only once on the join
> > switch. Once the MAC is learned, the request won't be broadcast again. It
> > may be more of a problem on the external LS if periodic GARP is required
> > there. However, I'd suggest running some tests to see whether it is
> > really a problem before trying to solve it.
> >
> >>
> >> Han, Dumitru,
> >>
> >> Why can't we swap the priorities of the above two flows so that the
> > ARP request for a next-hop IP known to OVN will always be sent via
> > `unicast`?
> >
> > If swapped, even GARP won't get broadcast. Maybe that's not the
> > desired behavior.
> >
>
> This is definitely not desired as we'd be hitting the prio 75 flow that
> would send the self originated GARP request (IPx) packet back towards
> the router port that owns IPx.
>
> >>
> >> Regards,
> >> ~Girish
> >>
> >>>
> >>> >
> >>> > 2. External Logical Switch Case
> >>> >
> >>> >                        10.10.10.0/24
> >>> >    -------------------------+--------------------------
> >>> >                             |
> >>> >                          localnet
> >>> >                       +-----+-----+
> >>> >                       | external  |
> >>> >          +------------+    LS1    +-------------+
> >>> >          |            +-----+-----+             |
> >>> >          |                  |                   |
> >>> >      10.10.10.2         10.10.10.3          10.10.10.4
> >>> >         SNAT               SNAT                SNAT
> >>> >    +-----+-----+      +-----+-----+       +-----------+
> >>> >    | l3gateway |      | l3gateway |       | l3gateway |
> >>> >    |   node1   |      |   node2   |       |   node3   |
> >>> >    +-----------+      +-----------+       +-----------+
> >>> >
> >>> > In this case, we have some of the IPs in OVN and some in the
> > physical network. If we fix (1) above, all the ARP requests for the
> > OVN's router IPs will be unicast. However, all the ARP requests to
> > external IPs, say 10.10.10.1 on the "physical router", will be
> > broadcast. Now, we will see these ARP broadcasts on all the L3 gateway
> > routers. With 'learn_from_arp_request=false' [a], the MAC_Binding
> > table will not explode for either ARP or GARP requests.
> >>> >
> >>> > So, I don't think GARP requests vs. replies is the issue here.
> > Furthermore, learning from GARP replies is blocked on certain
> > routers. For example,
> > https://www.juniper.net/documentation/en_US/junose15.1/topics/concept/ip-gratuitous-arps-transmission-overview.html
> > says "By default, updating the ARP cache on GARP replies is disabled on
> > the router." So, our NAT address mappings will not be learnt.
>
> Just as a side note, the above doesn't mean Juniper boxes don't support
> learning from GARP replies, just that they'd need extra configuration. I
> don't necessarily think that's a bad thing if properly documented in OVN
> that we would be generating GARP replies.
>
> Regards,
> Dumitru
>
> >>> >
> >>> > Regards,
> >>> > ~Girish
> >>> >
> >>> >
> >>> > [a] - From Han's mail, the meaning of learn_from_arp_request=false
> > --> if the TPA is on the router, add a new entry (it means the
> >>> > >     remote wants to communicate with this node, so it makes sense to
> >>> > >     learn the remote as well). Otherwise, ignore it and no new
> > entry added.
> >>> >
> >>> >
> >>> >
> >>