[ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table

Han Zhou zhouhan at gmail.com
Thu May 21 21:00:48 UTC 2020


On Thu, May 21, 2020 at 10:33 AM Venugopal Iyer <venugopali at nvidia.com>
wrote:

> Han,
>
> just a quick question below..
>
> ________________________________________
> From: ovn-kubernetes at googlegroups.com <ovn-kubernetes at googlegroups.com>
> on behalf of Girish Moodalbail <gmoodalbail at gmail.com>
> Sent: Tuesday, May 19, 2020 11:09 PM
> To: Han Zhou
> Cc: Han Zhou; Dan Winship; ovs-discuss; ovn-kubernetes at googlegroups.com
> Subject: Re: [ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table
>
> Hello Han,
>
> Please see in-line:
>
> On Sat, May 16, 2020 at 11:17 PM Han Zhou <zhouhan at gmail.com> wrote:
>
>
> On Sat, May 16, 2020 at 12:13 PM Girish Moodalbail <gmoodalbail at gmail.com>
> wrote:
> Hello Han,
>
> Can you please explain how the dynamic resolution of the IP-to-MAC will
> work with this new option set?
>
> Say the packet is being forwarded from router2 towards the distributed
> router. The nexthop (reg0) is then set to IP1, and we need to find the MAC
> address M1 to set eth.dst to.
>
> +----------------+        +----------------+
> |   l3gateway    |        |   l3gateway    |
> |    router2     |        |    router3     |
> +-------------+--+        +-+--------------+
>             IP2,M2         IP3,M3
>               |             |
>            +--+-------------+---+
>            |    join switch     |
>            +---------+----------+
>                      |
>                   IP1,M1
>              +-------+--------+
>              |  distributed   |
>              |     router     |
>              +----------------+
>
> The MAC M1 will obviously not be in the MAC_binding table. On the hypervisor
> where the packet originated, router2's port and the distributed
> router's port are both locally present. So, does this result in a PACKET_IN to
> the ovn-controller, with the resolution happening there?
>
> Yes there will be a PACKET_IN, and then:
> 1. ovn-controller will generate the ARP request for IP1, and send
> PACKET_OUT to OVS.
> 2. The ARP request will be delivered to the distributed router pipeline
> only, because of a special handling of ARP in OVN for IPs of router ports,
> although it is a broadcast. (It would have been broadcast to all GRs
> without that special handling.)
> 3. The distributed router pipeline should learn the IP-MAC binding of
> IP2-M2 (through a PACKET_IN to ovn-controller), and at the same time send
> an ARP reply to router2 in the distributed router pipeline.
> 4. The router2 pipeline will handle the ARP response and learn the IP-MAC
> binding of IP1-M1 (through a PACKET_IN to ovn-controller).
>
> Unfortunately, the ARP request (who has IP1) from router2 is broadcast
> out to all of the chassis through the Geneve tunnels. The other gateway
> routers learn the source MAC 'M2'. Now, each of the gateway routers has an
> entry for (IP2, M2) in the MAC binding table on its respective rtoj-<blah>
> router port. So, the MAC_Binding table will now have N x N entries, where N
> is the number of gateway routers.
>
> Per your explanation above, the ARP request should not have been broadcast,
> right?
>
>
> <vi> probably obvious and I am missing it, but..
> <vi> I see the lflow to direct ARP request to the router port, instead of
> bcast. However,
> <vi> we also add flows to bcast self-originated (unsolicited?) arp
> requests (we should
> <vi> not see this  for router IPs, I suppose). But, given we just match on
> the source
> <vi> MAC address  of the packet for such packets, does it differ from the
> ARP
> <vi> request generated for Router IP?
>
Good catch! That seems to be the reason why it is broadcast. I thought
the feature only allowed GARP to be broadcast, but it actually allows any
(G)ARP, including regular ARP requests generated by the LRs. It can be an
easy fix to commit 32f5ebb062 ("ovn-northd: Limit ARP/ND broadcast domain
whenever possible."), but I am not sure if there are other concerns with
doing that. @Dumitru Ceara <dceara at redhat.com>, please comment on whether
we can restrict it to GARP only.

On the other hand, in this use case, if there is any ARP request from the
distributed router to any of the GRs, then all the GRs should have learned
the MAC binding IP1-M1, and they won't send ARP requests for IP1 any more,
so this would not result in N x N MAC bindings, right? In the real use case,
it may depend on which direction of traffic comes first. If it is always
from external to k8s workloads first, then yes, it will eventually end up
with N x N MAC bindings.
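
To make the dependence on traffic direction concrete, here is a minimal
Python sketch of the two orderings. It is only a toy model under the
assumptions discussed in this thread (every router-originated ARP request is
broadcast on the join switch, and every router port that sees a request
learns the sender's binding). The port and IP names are hypothetical, and it
does not call any OVN API.

```python
def simulate(n_gateways, gateways_arp_first):
    """Return the set of (logical_port, ip) MAC_Binding keys that get learned."""
    bindings = set()
    gr_ports = [f"gr{i}-port" for i in range(n_gateways)]  # hypothetical names
    dr_port = "dr-port"                                    # hypothetical name
    dr_ip = "IP1"

    if gateways_arp_first:
        # External-to-k8s traffic first: every GR sends its own ARP for IP1.
        for i, port in enumerate(gr_ports):
            sender_ip = f"GR-IP{i}"
            for other in gr_ports:
                if other != port:
                    bindings.add((other, sender_ip))  # peers learn the sender
            bindings.add((dr_port, sender_ip))        # DR learns the sender
            bindings.add((port, dr_ip))               # sender learns IP1 from the reply
    else:
        # k8s-to-external traffic first: the DR sends the ARP requests.
        for i in range(n_gateways):
            for port in gr_ports:
                bindings.add((port, dr_ip))           # every GR learns IP1
            bindings.add((dr_port, f"GR-IP{i}"))      # DR learns GR i from the reply
    return bindings


if __name__ == "__main__":
    for n in (4, 1000):
        print(n, len(simulate(n, True)), len(simulate(n, False)))
```

In this model the GR-first ordering grows as n * (n + 1) entries, while the
DR-first ordering stays at 2 * n.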


> thanks,
>
> -venu
>
> Note that the direction of  ARP request is from Gateway Router to
> Distributed Router.
>
> Regards,
> ~Girish
>
>
>
>
> How about when the resolution of IP3-to-M3 happens on gateway router2? Will
> there be an ARP request packet that is broadcast on the join switch
> for this case?
>
> I think in the use case of ovn-k8s, as you described before, this should
> not happen. However, if this does happen, it is similar to the above steps,
> except that in steps 2) and 3) the ARP request and response will be sent
> between the chassis through the tunnel. If this happens between all pairs of
> GRs, then there will again be O(n^2) MAC_Binding entries.
>
> I haven't tested the GR scenario yet, so I can't guarantee it works as
> expected. Please let me know if you see any problems. I will submit a formal
> patch with more test cases if it is confirmed in your environment.
>
> Thanks,
> Han
>
>
> Regards,
> ~Girish
>
> On Sat, May 16, 2020 at 10:25 AM Girish Moodalbail <gmoodalbail at gmail.com>
> wrote:
>
>
> On Sat, May 16, 2020 at 12:36 AM Han Zhou <zhouhan at gmail.com> wrote:
>
>
> On Tue, May 5, 2020 at 11:57 AM Han Zhou <hzhou at ovn.org> wrote:
> >
> >
> >
> > On Fri, May 1, 2020 at 2:14 PM Dan Winship <danwinship at redhat.com> wrote:
> > >
> > > On 5/1/20 12:37 PM, Girish Moodalbail wrote:
> > > > If we now look at table=12 (lr_in_arp_resolve) in the ingress
> > > > pipeline of Gateway Router-1, then you will see that there will be
> > > > 2000 logical flow entries...
> > >
> > > > In the topology above, the only intended path is North-South between
> > > > each gateway router and the logical router. There is no east-west
> > > > traffic between the gateway routers
> > >
> > > > Is there another way to solve the above problem while keeping
> > > > just the single join logical switch?
> > >
> > > Two thoughts:
> > >
> > > 1. In openshift-sdn, the bridge doesn't try to handle ARP itself. It
> > > just lets ARP requests pass through normally, and lets ARP replies pass
> > > through normally as long as they are correct (ie, it doesn't let
> > > spoofing through). This means fewer flows but more traffic. Maybe
> > > that's the right tradeoff?
> > >
> > The 2M entries here are not for the ARP responder, but are more equivalent
> > to the neighbour table (or ARP cache) on each LR. The ARP responder resides
> > in the LS (join logical switch), which is O(n) instead of O(n^2), so it is
> > not a problem here.
> >
> > However, a similar idea may work here to avoid the O(n^2) scale issue.
> > For the neighbour table, OVN actually has two parts: one is statically
> > built, which is the 2M entries mentioned in this case, and the other is
> > dynamic ARP resolution, i.e. the mac_binding table, which is dynamically
> > populated by handling ARP messages. To solve the problem here, it is
> > possible to change OVN to support configuring an LR to avoid the static
> > neighbour table and rely only on dynamic ARP resolution. In this case, all
> > the gateway routers can be configured not to use static ARP resolution, and
> > eventually there will be only 2 entries (one for IPv4 and one for IPv6) for
> > each gateway router in the mac_binding table for the north-south traffic to
> > the join router. (Of course there will still be the same number of
> > mac_bindings in each router for the external traffic on the other side of
> > the gateway routers.)
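
The scaling argument above can be checked with a few lines of arithmetic.
The following Python sketch is only an illustration and is not part of the
original thread. It assumes 1000 gateway routers plus one distributed router
on the join switch, with one IPv4 and one IPv6 address per router port; the
function names are hypothetical.

```python
def static_neighbour_flows(n_gateways: int, addrs_per_port: int = 2) -> int:
    """Statically built lr_in_arp_resolve entries summed over all gateway routers.

    Assumption: each gateway router holds one entry per address of every other
    router port on the join switch (n_gateways - 1 peer GRs + 1 distributed router).
    """
    peers_per_gateway = (n_gateways - 1) + 1
    return n_gateways * peers_per_gateway * addrs_per_port


def dynamic_neighbour_bindings(n_gateways: int, addrs_per_port: int = 2) -> int:
    """mac_binding entries if each gateway router only resolves the distributed router."""
    return n_gateways * addrs_per_port


if __name__ == "__main__":
    n = 1000  # assumed number of gateway routers
    print("static :", static_neighbour_flows(n))
    print("dynamic:", dynamic_neighbour_bindings(n))
```

With n = 1000 this gives 2,000,000 static entries (2000 per gateway router)
against 2000 dynamically learned bindings in total.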
> >
> > This change seems straightforward, but I am not sure if there are any
> > corner cases.
>
> Hi Girish,
>
> I've sent an RFC patch here for the above proposal:
> https://patchwork.ozlabs.org/project/openvswitch/patch/1589614395-99499-1-git-send-email-hzhou@ovn.org/
> For this use case, just set options:dynamic_neigh_routes=true for all the
> Gateway Routers. Could you try it in your scale environment and see if it
> solves the problem?
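
For a large number of Gateway Routers, setting the option by hand is tedious.
The Python sketch below only builds the command lines and does not run them.
It assumes the RFC patch is applied (the option does not exist otherwise) and
uses the generic "ovn-nbctl set TABLE RECORD COLUMN:KEY=VALUE" database
command form. The router names are hypothetical placeholders.

```python
import shlex

# Option name as given in the RFC patch; only meaningful with that patch applied.
OPTION = "dynamic_neigh_routes"


def build_set_commands(gateway_routers, enabled=True):
    """Return one ovn-nbctl argv list per gateway router name."""
    value = "true" if enabled else "false"
    return [
        ["ovn-nbctl", "set", "Logical_Router", name, f"options:{OPTION}={value}"]
        for name in gateway_routers
    ]


if __name__ == "__main__":
    routers = ["GR_node1", "GR_node2"]  # hypothetical gateway router names
    for argv in build_set_commands(routers):
        # Print for review; pass argv to subprocess.run() to actually apply it.
        print(shlex.join(argv))
```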
>
> Thanks,
> Han
>
> >
> > > 2. In most places in ovn-kubernetes, our MAC addresses are
> > > programmatically related to the corresponding IP addresses, and in
> > > places where that's not currently true, we could try to make it true,
> > > and then perhaps the thousands of rules could just be replaced by a
> > > single rule?
> > >
> > This may be a good idea, but I am not sure how to implement it in OVN in a
> > generic way, since most OVN users can't make such an assumption.
> >
> > On the other hand, why wouldn't splitting the join logical switch into
> > 1000 LSes solve the problem? I understand that there will be 1000 more
> > datapaths and 1000 more LRPs, but these are all O(n), which is much more
> > efficient than the O(n^2) explosion. What other scale issues are created
> > by this?
> >
> > In addition, Girish, for the external LS, I am not sure why it can't be
> > shared, if all the nodes are connected to a single L2 network. (If they are
> > connected to separate L2 networks, different external LSes should be
> > created, at least according to the current OVN model.)
>
> Thanks Han for the patch. Will give it a try and let you know.
>
> Regards,
> ~Girish
>
> >
> > Thanks,
> > Han
>