[ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table

Girish Moodalbail gmoodalbail at gmail.com
Thu Jun 4 00:03:57 UTC 2020


No worries, thanks for the update Han.

Once you have the patch, we can test your changes on our cluster and
provide you an update.

Regards,
~Girish

On Wed, Jun 3, 2020 at 4:27 PM Han Zhou <zhouhan at gmail.com> wrote:

> Hi Girish, yes, that's what we concluded in the last OVN meeting, but sorry
> that I forgot to update here.
>
> On Wed, Jun 3, 2020 at 3:32 PM Girish Moodalbail <gmoodalbail at gmail.com>
> wrote:
> >
> > Hello all,
> >
> > To proceed with the proposed fixes with minimal impact, is the
> > following a reasonable approach?
> >
> > Add an option, namely dynamic_neigh_routes={true|false}, for a gateway
> > router. With this option enabled, the nextHop IP's MAC will be learned
> > through an ARP request on the physical network. The ARP request will be
> > flooded on the L2 broadcast domain (for both join switch and external
> > switch).
> >
>
> The RFC patch fulfils this purpose:
> https://patchwork.ozlabs.org/project/openvswitch/patch/1589614395-99499-1-git-send-email-hzhou@ovn.org/
> I am working on the formal patch.
>
> > Add an option, namely learn_from_arp_request={true|false}, for a gateway
> > router. The option is interpreted as below:
> > "true" - learn the MAC/IP binding and add a new MAC_Binding entry
> > (default behavior)
> > "false" - if there is a MAC_Binding for that IP and the MAC is
> > different, then update that MAC/IP binding. The external entity might be
> > trying to advertise the new MAC for that IP. (If we don't do this, then we
> > will never learn External VIP to MAC changes.)
> >
> > (Irrespective of whether learn_from_arp_request is true or false, always do
> > this: if the TPA is on the router, add a new entry. It means the remote
> > wants to communicate with this node, so it makes sense to learn the remote
> > as well.)
> >
>
> I am working on this as well, but delayed a little. I hope to have
> something this week.
>
> >
> > For now, I think it is fine for ARP packets to be broadcast on the
> > tunnel for the `join` switch case. If it becomes a problem, then we can
> > start looking at changing the logical flows.
> >
> > Thanks everyone for the lively discussion.
> >
> > Regards,
> > ~Girish
> >
> > On Thu, May 28, 2020 at 7:33 AM Tim Rozet <trozet at redhat.com> wrote:
> >>
> >>
> >>
> >> On Thu, May 28, 2020 at 7:26 AM Dumitru Ceara <dceara at redhat.com>
> wrote:
> >>>
> >>> On 5/28/20 12:48 PM, Daniel Alvarez Sanchez wrote:
> >>> > Hi all
> >>> >
> >>> > Sorry for top posting. I want to thank you all for the discussion and
> >>> > give also some feedback from OpenStack perspective which is affected
> >>> > by the problem described here.
> >>> >
> >>> > In OpenStack, it's kind of common to have a shared external network
> >>> > (logical switch with a localnet port) across many tenants. Each tenant
> >>> > user may create their own router where their instances will be
> >>> > connected to access the external network.
> >>> >
> >>> > In such a scenario, we are hitting the issue described here. In
> >>> > particular in our tests we exercise 3K VIFs (with 1 FIP) each spanning
> >>> > 300 LS; each LS connected to a LR (ie. 300 LRs) and that router
> >>> > connected to the public LS. This is creating a huge problem in terms
> >>> > of performance and tons of events due to the MAC_Binding entries
> >>> > generated as a consequence of the GARPs sent for the floating IPs.
> >>> >
> >>>
> >>> Just as an addition to this, GARPs wouldn't be the only reason why all
> >>> routers would learn the MAC_Binding. Even if we wouldn't be sending
> >>> GARPs for the FIPs, when a VM that's behind a FIP would send traffic to
> >>> the outside, the router will generate an ARP request for the next hop
> >>> using the FIP-IP and FIP-MAC. This will be broadcasted to all routers
> >>> connected to the public LS and will trigger them to learn the
> >>> FIP-IP:FIP-MAC binding.
> >>
> >>
> >> Yeah we shouldn't be learning on regular ARP requests.
> >>
> >>>
> >>>
> >>> > Thanks,
> >>> > Daniel
> >>> >
> >>> >
> >>> > On Thu, May 28, 2020 at 10:51 AM Dumitru Ceara <dceara at redhat.com>
> wrote:
> >>> >>
> >>> >> On 5/28/20 8:34 AM, Han Zhou wrote:
> >>> >>>
> >>> >>>
> >>> >>> On Wed, May 27, 2020 at 1:10 AM Dumitru Ceara <dceara at redhat.com>
> >>> >>> wrote:
> >>> >>>>
> >>> >>>> Hi Girish, Han,
> >>> >>>>
> >>> >>>> On 5/26/20 11:51 PM, Han Zhou wrote:
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> On Tue, May 26, 2020 at 1:07 PM Girish Moodalbail
> >>> >>>>> <gmoodalbail at gmail.com> wrote:
> >>> >>>>>>
> >>> >>>>>>
> >>> >>>>>>
> >>> >>>>>> On Tue, May 26, 2020 at 12:42 PM Han Zhou <zhouhan at gmail.com>
> >>> >>>>>> wrote:
> >>> >>>>>>>
> >>> >>>>>>> Hi Girish,
> >>> >>>>>>>
> >>> >>>>>>> Thanks for the summary. I agree with you that GARP request
> v.s. reply
> >>> >>>>> is irrelevant to the problem here.
> >>> >>>>
> >>> >>>> Well, actually I think GARP request vs reply is relevant (at
> least for
> >>> >>>> case 1 below) because if OVN would be generating GARP replies we
> >>> >>>> wouldn't need the priority 80 flow to determine if an ARP request
> packet
> >>> >>>> is actually an OVN self originated GARP that needs to be flooded
> in the
> >>> >>>> L2 broadcast domain.
> >>> >>>>
> >>> >>>> On the other hand, router3 would be learning mac_binding IP2,M2
> from the
> >>> >>>> GARP reply originated by router2 and vice versa so we'd have to
> restrict
> >>> >>>> flooding of GARP replies to non-patch ports.
> >>> >>>>
> >>> >>>
> >>> >>> Hi Dumitru, the point was that, on the external LS, the GRs will
> have to
> >>> >>> send ARP requests to resolve unknown IPs (at least for the
> external GW),
> >>> >>> and it has to be broadcasted, which will cause all the GRs to learn
> all
> >>> >>> MACs of other GRs. This is regardless of the GARP behavior. You are
> >>> >>> right that if we only consider the Join switch then the GARP
> request
> >>> >>> v.s. reply does make a difference. However, GARP request/reply may
> be
> >>> >>> really needed only on the external LS.
> >>> >>>
> >>> >>
> >>> >> Ok, but do you see an easy way to determine if we need to add the
> >>> >> logical flows that flood self originated GARP packets on a given
> logical
> >>> >> switch? Right now we add them on all switches.
> >>> >>
> >>> >>>>>>> Please see my comment inline below.
> >>> >>>>>>>
> >>> >>>>>>> On Tue, May 26, 2020 at 12:09 PM Girish Moodalbail
> >>> >>>>>>> <gmoodalbail at gmail.com> wrote:
> >>> >>>>>>>>
> >>> >>>>>>>> Hello Dumitru,
> >>> >>>>>>>>
> >>> >>>>>>>> There are several things that are being discussed on this
> thread.
> >>> >>>>> Let me see if I can tease them out for clarity.
> >>> >>>>>>>>
> >>> >>>>>>>> 1. All the router IPs are known to OVN (the join switch case)
> >>> >>>>>>>> 2. Some IPs are known and some are not known (the external
> logical
> >>> >>>>> switch that connects to physical network case).
> >>> >>>>>>>>
> >>> >>>>>>>> Let us look at each of the case above:
> >>> >>>>>>>>
> >>> >>>>>>>> 1. Join Switch Case
> >>> >>>>>>>>
> >>> >>>>>>>> +----------------+        +----------------+
> >>> >>>>>>>> |   l3gateway    |        |   l3gateway    |
> >>> >>>>>>>> |    router2     |        |    router3     |
> >>> >>>>>>>> +-------------+--+        +-+--------------+
> >>> >>>>>>>>             IP2,M2         IP3,M3
> >>> >>>>>>>>               |             |
> >>> >>>>>>>>            +--+-------------+---+
> >>> >>>>>>>>            |    join switch     |
> >>> >>>>>>>>            +---------+----------+
> >>> >>>>>>>>                      |
> >>> >>>>>>>>                   IP1,M1
> >>> >>>>>>>>              +-------+--------+
> >>> >>>>>>>>              |  distributed   |
> >>> >>>>>>>>              |     router     |
> >>> >>>>>>>>              +----------------+
> >>> >>>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>> Say, GR router2 wants to send the packet out to DR and that we
> >>> >>>>> don't have static mappings of MAC to IP in lr_in_arp_resolve
> table on GR
> >>> >>>>> router2 (with Han's patch of dynamic_neigh_routes=true for all
> the
> >>> >>>>> Gateway Routers). With this in mind, when an ARP request is sent
> out by
> >>> >>>>> router2's hypervisor the packet should be directly sent to the
> >>> >>>>> distributed router alone. Your commit 32f5ebb0622 (ovn-northd:
> Limit
> >>> >>>>> ARP/ND broadcast domain whenever possible) should have allowed
> only
> >>> >>>>> unicast. However, in ls_in_l2_lkup table we have
> >>> >>>>>>>>
> >>> >>>>>>>>   table=19(ls_in_l2_lkup      ), priority=80   , match=(eth.src == { M2 } && (arp.op == 1 || nd_ns)), action=(outport = "_MC_flood"; output;)
> >>> >>>>>>>>   table=19(ls_in_l2_lkup      ), priority=75   , match=(flags[1] == 0 && arp.op == 1 && arp.tpa == { IP1}), action=(outport = "jtor-router2"; output;)
> >>> >>>>>>>>
> >>> >>>>>>>> As you can see, `priority=80` rule will always be hit and
> sent out
> >>> >>>>> to all the GRs. The `priority=75` rule is never hit. So, we will
> see ARP
> >>> >>>>> packets on the GENEVE tunnel. So, we need to change
> `priority=80` to
> >>> >>>>> match GARP request packets. That way, for the known OVN IPs case
> we
> >>> >>>>> don't do broadcast.
> >>> >>>>>>>
> >>> >>>>>>> Since the solution to case 2) below (i.e.
> >>> >>>>> learn_from_arp_request=false) solves the problem of case 1),
> too, I
> >>> >>>>> think we don't need this change just for case 1). As @Dumitru
> Ceara
> >>> >>>>>  mentioned, there is some cost because it adds extra flows. It
> would be
> >>> >>>>> a significant number of flows if there are a lot of dnat_and_snat
> IPs.
> >>> >>>>> What do you think?
> >>> >>>>
> >>> >>>> I think the following might be a solution, although with the cost
> of
> >>> >>>> adding as many flows as dnat_and_snat IPs are configured:
> >>> >>>>
> >>> >>>> - priority 80: explicitly determine if an ARP request is a self
> >>> >>>> originated GARP for configured IP addresses and dnat_and_snat IPs
> (by
> >>> >>>> matching on all eth.src and arp.tpa pairs) and if so flood on all
> >>> >>>> non-patch ports.
> >>> >>>> - priority 75: if arp.tpa is owned by an OVN logical router port,
> >>> >>>> "unicast" it only on the patch port towards the router.
> >>> >>>> - priority 1: flood any broadcast packet.
> >>> >>>>
> >>> >>>> Together with the learn_from_arp_request=false knob this would
> cover
> >>> >>>> both case 1 (join switch) and case 2 (external switch).
> >>> >>>>
> >>> >>>> Wdyt?
> >>> >>>>
> >>> >>> Would the "learn_from_arp_request=false knob" cover both cases? If
> yes,
> >>> >>> we don't need to add more flows of priority 80, or more accurately:
> >>> >>> whether to update the priority-80 flows is not directly related to
> the
> >>> >>> current problem.
> >>> >>>
> >>> >>
> >>> >> Yes, it would, except for the fact that the ARP requests would
> still be
> >>> >> flooded to all routers (and ignored at the destination). Which is
> afaiu
> >>> >> what Girish was worried about. In order to address that part too I'm
> >>> >> afraid we have to update the priority-80 flows.
> >>> >>
> >>> >> Regards,
> >>> >> Dumitru
> >>> >>
> >>> >>>>>>
> >>> >>>>>>
> >>> >>>>>> Han, yes it will work. However, my only concern is that we
> would send
> >>> >>>>> all these ARP requests via tunnel to each of 1000 hypervisors
> and these
> >>> >>>>> hypervisors will just drop them on the floor when they see
> >>> >>>>> learn_from_arp_request=false.
> >>> >>>>>
> >>> >>>>> I think maybe it is not a problem since it happens only once on
> the Join
> >>> >>>>> switch. Once the MAC is learned, it won't broadcast again. It
> may be
> >>> >>>>> more of a problem on the external LS if periodical GARP is
> required
> >>> >>>>> there. However, I'd suggest to have some test and see if it is
> really a
> >>> >>>>> problem, before trying to solve it.
> >>> >>>>>
> >>> >>>>>>
> >>> >>>>>> Han, Dumitru,
> >>> >>>>>>
> >>> >>>>>> Why can't we swap the priorities of the above two flows so that
> the
> >>> >>>>> ARP request for NextHop IP known to OVN will be always sent via
> >>> >>> `unicast`?
> >>> >>>>>
> >>> >>>>> If swapped, even GARP won't get broadcasted. Maybe that's not the
> >>> >>>>> desired behavior.
> >>> >>>>>
> >>> >>>>
> >>> >>>> This is definitely not desired as we'd be hitting the prio 75
> flow that
> >>> >>>> would send the self originated GARP request (IPx) packet back
> towards
> >>> >>>> the router port that owns IPx.
> >>> >>>>
> >>> >>>>>>
> >>> >>>>>> Regards,
> >>> >>>>>> ~Girish
> >>> >>>>>>
> >>> >>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>> 2. External Logical Switch Case
> >>> >>>>>>>>
> >>> >>>>>>>>                        10.10.10.0/24
> >>> >>>>>
> >>> >>>>>>>>    -------------------------+--------------------------
> >>> >>>>>>>>                             |
> >>> >>>>>>>>                          localnet
> >>> >>>>>>>>                       +-----+-----+
> >>> >>>>>>>>                       | external  |
> >>> >>>>>>>>          +------------+    LS1    +-------------+
> >>> >>>>>>>>          |            +-----+-----+             |
> >>> >>>>>>>>          |                  |                   |
> >>> >>>>>>>>      10.10.10.2         10.10.10.3          10.10.10.4
> >>> >>>>>>>>         SNAT               SNAT                SNAT
> >>> >>>>>>>>    +-----+-----+      +-----+-----+       +-----------+
> >>> >>>>>>>>    | l3gateway |      | l3gateway |       | l3gateway |
> >>> >>>>>>>>    |   node1   |      |   node2   |       |   node3   |
> >>> >>>>>>>>    +-----------+      +-----------+       +-----------+
> >>> >>>>>>>>
> >>> >>>>>>>> In this case, we have some of the IPs in OVN and some in the
> >>> >>>>> physical network. If we fix (1) above, all the ARP requests for
> the
> >>> >>>>> OVN's router IPs will be unicast. However, all the ARP requests
> to
> >>> >>>>> external IPs, say 10.10.10.1 on the "physical router", will be
> >>> >>>>> broadcast. Now, we will see these ARP broadcasts on all the L3
> gateway
> >>> >>>>> routers. With 'learn_from_arp_request=false' [a], then the
> MAC_Binding
> >>> >>>>> table will not explode for both ARP and GARP requests.
> >>> >>>>>>>>
> >>> >>>>>>>> So, I don't think GARP requests and replies is the issue here?
> >>> >>>>> Furthermore, learning from the GARP replies is blocked on
> certain
> >>> >>>>> routers. For example:
> >>> >>>>>
> >>> >>>
> https://www.juniper.net/documentation/en_US/junose15.1/topics/concept/ip-gratuitous-arps-transmission-overview.html
> >>> >>>>>  says "By default, updating the ARP cache on GARP replies is
> disabled on
> >>> >>>>> the router.". So, our NAT addresses mapping will not be learnt.
> >>> >>>>
> >>> >>>> Just as a side note, the above doesn't mean Juniper boxes don't
> support
> >>> >>>> learning from GARP replies, just that they'd need extra
> configuration. I
> >>> >>>> don't necessarily think that's a bad thing if properly documented
> in OVN
> >>> >>>> that we would be generating GARP replies.
> >>> >>>>
> >>> >>>> Regards,
> >>> >>>> Dumitru
> >>> >>>>
> >>> >>>>>>>>
> >>> >>>>>>>> Regards,
> >>> >>>>>>>> ~Girish
> >>> >>>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>> [a] - From Han's mail, the meaning of learn_from_arp_request=false:
> >>> >>>>>>>>> if the TPA is on the router, add a new entry (it means the
> >>> >>>>>>>>> remote wants to communicate with this node, so it makes sense to
> >>> >>>>>>>>> learn the remote as well). Otherwise, ignore it and no new
> >>> >>>>>>>>> entry added.
> >>> >>>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>
> >>> >>>>>> --
> >>> >>>>>> You received this message because you are subscribed to the Google
> >>> >>>>>> Groups "ovn-kubernetes" group.
> >>> >>>>>> To unsubscribe from this group and stop receiving emails from it, send
> >>> >>>>>> an email to ovn-kubernetes+unsubscribe at googlegroups.com.
> >>> >>>>>> To view this discussion on the web visit
> >>> >>>>>> https://groups.google.com/d/msgid/ovn-kubernetes/CAAF2STRnem2PeSahuwhro1t%2BQJxchZNC7viq8n-ngM9KU%2B%2B-Xw%40mail.gmail.com
> >>> >>>>
> >>> >>>
> >>> >>
> >>> >> _______________________________________________
> >>> >> discuss mailing list
> >>> >> discuss at openvswitch.org
> >>> >> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> >>> >
> >>>
>
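
The learn_from_arp_request semantics discussed in this thread can be
sketched as a small decision function. This is only an illustrative model,
not OVN code: OVN expresses this behavior through logical flows, and the
class name GatewayRouter, the method on_arp_request, and the returned
action strings are hypothetical names chosen for the sketch. It assumes
that an existing binding whose MAC changed is always updated, that a
request whose TPA is owned by the router always creates an entry, and that
any other request creates an entry only when learn_from_arp_request is true.

```python
from dataclasses import dataclass, field


@dataclass
class GatewayRouter:
    """Toy model of one gateway router's MAC_Binding learning (hypothetical)."""
    own_ips: set                                   # IPs configured on this router
    learn_from_arp_request: bool = True            # proposed knob, default "true"
    mac_bindings: dict = field(default_factory=dict)  # IP -> MAC

    def on_arp_request(self, spa: str, sha: str, tpa: str) -> str:
        """Process an ARP request; return 'add', 'update', or 'ignore'."""
        known = self.mac_bindings.get(spa)
        if known == sha:
            return "ignore"                        # binding already up to date
        if known is not None:
            # Existing binding with a different MAC: the sender may be
            # advertising a new MAC for that IP, so always update it.
            self.mac_bindings[spa] = sha
            return "update"
        if self.learn_from_arp_request or tpa in self.own_ips:
            # Learn when the knob is on, or when the request targets this
            # router (the remote wants to talk to us).
            self.mac_bindings[spa] = sha
            return "add"
        return "ignore"                            # unrelated broadcast: no new entry


if __name__ == "__main__":
    gr = GatewayRouter(own_ips={"10.10.10.2"}, learn_from_arp_request=False)
    # node2 (10.10.10.3) resolves the physical router 10.10.10.1.
    print(gr.on_arp_request("10.10.10.3", "M3", "10.10.10.1"))
    # node2 resolves this router's own IP.
    print(gr.on_arp_request("10.10.10.3", "M3", "10.10.10.2"))
```

With the knob set to false, a broadcast ARP request for the physical
router's IP creates no entry on the other gateway routers, which is the
effect the thread relies on to keep the MAC_Binding table from growing with
every router on the external switch.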