[ovs-discuss] [OVN] flow explosion in lr_in_arp_resolve table

Han Zhou hzhou at ovn.org
Wed May 6 18:11:15 UTC 2020


On Wed, May 6, 2020 at 12:49 AM Numan Siddique <numans at ovn.org> wrote:
>
>
>
> On Wed, May 6, 2020 at 12:56 PM Numan Siddique <numans at ovn.org> wrote:
>>
>>
>>
>> On Wed, May 6, 2020 at 12:28 AM Han Zhou <hzhou at ovn.org> wrote:
>>>
>>>
>>>
>>> On Fri, May 1, 2020 at 2:14 PM Dan Winship <danwinship at redhat.com> wrote:
>>> >
>>> > On 5/1/20 12:37 PM, Girish Moodalbail wrote:
>>> > > If we now look at table=12 (lr_in_arp_resolve) in the ingress
>>> > > pipeline of Gateway Router-1, then you will see that there will be
>>> > > 2000 logical flow entries...
>>> >
>>> > > In the topology above, the only intended path is North-South between
>>> > > each gateway router and the logical router. There is no east-west
>>> > > traffic between the gateway routers
>>> >
>>> > > Is there another way to solve the above problem while keeping
>>> > > just the single join logical switch?
>>> >
>>> > Two thoughts:
>>> >
>>> > 1. In openshift-sdn, the bridge doesn't try to handle ARP itself. It
>>> > just lets ARP requests pass through normally, and lets ARP replies
>>> > pass through normally as long as they are correct (i.e., it doesn't
>>> > let spoofing through). This means fewer flows but more traffic. Maybe
>>> > that's the right tradeoff?
>>> >
>>> The 2M entries here are not for the ARP responder, but are more
>>> equivalent to the neighbour table (or ARP cache) on each LR. The ARP
>>> responder resides in the LS (the join logical switch), which is O(n)
>>> instead of O(n^2), so it is not a problem here.
>>>
>>> However, a similar idea may work here to avoid the O(n^2) scale issue.
>>> For the neighbour table, OVN actually has two parts: one is statically
>>> built, which is the 2M entries mentioned in this case, and the other is
>>> the dynamic ARP resolution, i.e. the mac_binding table, which is
>>> dynamically populated by handling ARP messages. To solve the problem
>>> here, it is possible to change OVN to support configuring an LR to avoid
>>> the static neighbour table and rely only on dynamic ARP resolution. In
>>> this case, all the gateway routers can be configured as not using static
>>> ARP resolution, and eventually there will be only 2 entries (one for IPv4
>>> and one for IPv6) for each gateway router in the mac_binding table for
>>> the north-south traffic to the join router. (Of course there will still
>>> be the same number of mac_bindings in each router for the external
>>> traffic on the other side of the gateway routers.)
>>>
>>> This change seems straightforward, but I am not sure if there are any
>>> corner cases.
>>
>>
>> Maybe ovn-northd, instead of adding these lflows in lr_in_arp_resolve,
>> could create mac_binding table rows in the SB DB?
>> This would result in fewer logical flows at the cost of more mac_binding
>> entries. The number of OF flows would still remain the same.
>
>
> I forgot to mention, Lorenzo has similar ideas for moving the ARP
> resolve lflows for NAT entries to mac_binding rows.
>

I am hesitant about moving to mac_binding as a solution to this particular
problem, because:
1. Although the cost of each mac_binding entry may be much lower than that
of a logical flow entry, it would still be O(n^2), since the LRP is part of
the key in the table.
2. It is better to separate the static and dynamic parts clearly. Moving to
mac_binding would lose this clarity in the data, and also the ownership of
the data (currently mac_binding entries are added only by ovn-controllers).

Although I am not in favor of solving this problem with this approach
(because of 1), it may make sense to reduce the number of logical flows as
a general scalability improvement by moving all neighbour information to
mac_binding. If we do so, I would suggest figuring out a way to keep the
data clarity between the static and dynamic parts.

For this particular problem, we just don't want the static part populated,
because most of the entries are not needed, except one per LRP. However,
even before considering optionally disabling the static part, I would first
like to understand why separating the join LS would not solve the problem.
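
As a rough illustration of the scaling argument (a back-of-the-envelope
sketch, not how ovn-northd actually computes anything), the Python below
counts static neighbour entries under a simplified model. The function
names are made up for illustration, and the model assumes each gateway
router gets one entry per address family for every other router port on
the join LS it is attached to. With 1000 gateway routers this reproduces
the 2000-per-router and 2M figures quoted in this thread.

```python
def single_join_switch(num_gateway_routers, address_families=2):
    """Entries when all gateway routers share one join LS (assumed model).

    Each gateway router sees the other (n - 1) gateway routers plus the
    one distributed router on the shared switch, i.e. n peers in total.
    """
    peers = (num_gateway_routers - 1) + 1
    per_router = peers * address_families
    total = per_router * num_gateway_routers  # grows as O(n^2)
    return per_router, total


def split_join_switches(num_gateway_routers, address_families=2):
    """Entries when each gateway router has its own join LS (assumed model).

    The only peer on each per-router switch is the distributed router.
    """
    per_router = 1 * address_families
    total = per_router * num_gateway_routers  # grows as O(n)
    return per_router, total


if __name__ == "__main__":
    n = 1000
    print("single join LS (per router, total):", single_join_switch(n))
    print("split join LSes (per router, total):", split_join_switches(n))
```

The split case also adds n extra datapaths and LRPs, which the sketch does
not count, but those grow linearly as well.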

>>
>>
>> Thanks
>> Numan
>>
>>>
>>> > 2. In most places in ovn-kubernetes, our MAC addresses are
>>> > programmatically related to the corresponding IP addresses, and in
>>> > places where that's not currently true, we could try to make it true,
>>> > and then perhaps the thousands of rules could just be replaced by a
>>> > single rule?
>>> >
>>> This may be a good idea, but I am not sure how to implement it in OVN
>>> generically, since most OVN users can't make such an assumption.
>>>
>>> On the other hand, why wouldn't splitting the join logical switch into
>>> 1000 LSes solve the problem? I understand that there will be 1000 more
>>> datapaths and 1000 more LRPs, but these are all O(n), which is much more
>>> efficient than the O(n^2) explosion. What other scale issues does this
>>> create?
>>>
>>> In addition, Girish, for the external LS, I am not sure why it can't be
>>> shared if all the nodes are connected to a single L2 network. (If they
>>> are connected to separate L2 networks, different external LSes should be
>>> created, at least according to the current OVN model.)
>>>
>>> Thanks,
>>> Han