[ovs-dev] Scaling of Logical_Flows and MAC_Binding tables

Daniel Alvarez Sanchez dalvarez at redhat.com
Thu Nov 26 10:40:54 UTC 2020


On Wed, Nov 25, 2020 at 7:59 PM Dumitru Ceara <dceara at redhat.com> wrote:

> On 11/25/20 7:06 PM, Numan Siddique wrote:
> > On Wed, Nov 25, 2020 at 10:24 PM Renat Nurgaliyev <impleman at gmail.com>
> wrote:
> >>
> >>
> >>
> >> On 25.11.20 16:14, Dumitru Ceara wrote:
> >>> On 11/25/20 3:30 PM, Renat Nurgaliyev wrote:
> >>>> Hello folks,
> >>>>
> >>> Hi Renat,
> >>>
> >>>> we run a lab where we try to evaluate the scalability potential of OVN
> >>>> with OpenStack as the CMS.
> >>>> The current lab setup is the following:
> >>>>
> >>>> 500 networks
> >>>> 500 routers
> >>>> 1500 VM ports (3 per network/router)
> >>>> 1500 Floating IPs (one per VM port)
> >>>>
> >>>> There is an external network, which is bridged to br-provider on gateway
> >>>> nodes. There are 2000 ports
> >>>> connected to this external network (1500 Floating IPs + 500 SNAT router
> >>>> ports). So the setup is not
> >>>> very big, we'd say, but after applying this configuration via the ML2/OVN
> >>>> plugin, northd kicks in and does
> >>>> its job, and after it's done, the Logical_Flow table gets 645877 entries,
> >>>> which is way too much. But ok,
> >>>> we move on and start one controller on the gateway chassis, and here
> >>>> things get really messy.
> >>>> The MAC_Binding table grows from 0 to 999088 entries in one go, and
> >>>> after it's done, the sizes of the
> >>>> biggest SB tables look like this:
> >>>>
> >>>> 999088 MAC_Binding
> >>>> 645877 Logical_Flow
> >>>> 4726 Port_Binding
> >>>> 1117 Multicast_Group
> >>>> 1068 Datapath_Binding
> >>>> 1046 Port_Group
> >>>> 551 IP_Multicast
> >>>> 519 DNS
> >>>> 517 HA_Chassis_Group
> >>>> 517 HA_Chassis
> >>>> ...
> >>>>
> >>>> The MAC_Binding table gets huge: basically it now has an entry for every
> >>>> port that is connected to the external
> >>>> network, multiplied by the number of datapaths, which roughly makes one
> >>>> million entries. This table by itself increases
> >>>> the size of the SB by 200 megabytes. The Logical_Flow table also gets very
> >>>> heavy; we have already played a bit
> >>>> with the logical datapath patches that Ilya Maximets submitted, and it looks
> >>>> much better, but the size of
> >>>> the MAC_Binding table still feels inadequate.
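(As a sanity check on that number: roughly 2,000 addresses learned on the
provider network - the 1,500 FIPs plus the 500 SNAT router ports - times the
~500 routers attached to it gives about 1,000,000 rows, which roughly matches
the 999088 MAC_Binding entries above.)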
> >>>>
> >>>> We would like to start working at least on the MAC_Binding table
> >>>> optimisation, but it is a bit difficult
> >>>> to start from scratch. Can someone help us with ideas on how this
> >>>> could be optimised?
> >>>>
> >>>> Maybe it would also make sense to group entries in the MAC_Binding table
> >>>> in the same way as is proposed
> >>>> for logical flows in Ilya's patch?
> >>>>
> >>> Maybe it would work but I'm not really sure how, right now.  However,
> >>> what if we change the way MAC_Bindings are created?
> >>>
> >>> Right now a MAC Binding is created for each logical router port but in
> >>> your case there are a lot of logical router ports connected to the
> >>> single provider logical switch and they all learn the same ARPs.
> >>>
> >>> What if we instead store MAC_Bindings per logical switch?  Basically
> >>> sharing all these MAC_Bindings between all router ports connected to the
> >>> same LS.
> >>>
> >>> Do you see any problem with this approach?
> >>>
> >>> Thanks,
> >>> Dumitru
> >>>
> >>>
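For what it's worth, as I understand the proposal the change is essentially in
how a learned binding is keyed. A rough illustrative C sketch (made-up names,
not the actual OVN schema or structs):

    #include <netinet/in.h>   /* struct in6_addr */
    #include <stdint.h>

    /* Today (conceptually): one learned entry per logical router port, so
     * every router attached to the provider LS keeps its own copy of the
     * same ARP information. */
    struct mac_binding_key_per_lrp {
        char logical_port[64];      /* LRP name */
        struct in6_addr ip;         /* resolved IP address */
    };

    /* Proposed: key the binding on the logical switch datapath instead, so
     * all router ports attached to that LS share a single entry. */
    struct mac_binding_key_per_ls {
        int64_t ls_datapath_key;    /* tunnel key of the provider LS */
        struct in6_addr ip;         /* resolved IP address */
    };

With ~2,000 addresses learned on the provider network, that would mean roughly
2,000 shared rows for the provider LS instead of 2,000 * 500.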
> >> I believe that this approach is the way to go; at least nothing comes to my
> >> mind that could go wrong here. We will try to make a patch for that.
> >> However, if someone is familiar with the code and knows how to do it fast,
> >> it would also be very nice.
> >
> > This approach should work.
> >
> > I've another idea (I won't call it a solution yet). What if we drop
> > the usage of MAC_Binding altogether?
>
> This would be great!
>
> >
> > - When ovn-controller learns a mac binding, it will not create a row
> > in the SB MAC_Binding table.
> > - Instead it will maintain the learnt mac binding in its memory.
> > - ovn-controller will still program table 66 with the flow to set
> > the eth.dst (for the get_arp() action).
> >
> > This has a couple of advantages:
> >   - Right now we never flush old/stale mac_binding entries.
> >   - If the mac of an external IP has changed, but OVN still has an
> > entry for that IP with the old mac in the mac_binding table,
> >     we will use the old mac, causing the packet to be sent to the
> > wrong destination and possibly get lost.
> >   - So we would get rid of this problem.
> >   - We will also save SB DB space.
> >
> > There are a few disadvantages:
> >   - Other ovn-controllers will not add the flows in table 66. I guess
> > this should be fine as each ovn-controller
> > can generate the ARP request and learn the mac.
> >   - When ovn-controller restarts we lose the learnt macs and would
> > need to learn them again.
> >
> > Any thoughts on this?
>

It'd be great to have some sort of local ARP cache but I'm concerned about
the performance implications.
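If I understand the idea, each ovn-controller would keep something like the
following purely local state and derive the table 66 flows from it (a rough C
sketch with made-up names, not actual ovn-controller code):

    #include <netinet/in.h>   /* struct in6_addr */
    #include <stdint.h>
    #include <time.h>

    /* One locally learnt ARP/ND entry; never written to the SB DB. */
    struct local_mac_binding {
        int64_t datapath_key;       /* logical datapath it was learnt on */
        struct in6_addr ip;         /* resolved IP address */
        uint8_t mac[6];             /* learnt Ethernet address */
        time_t last_confirmed;      /* last time the entry was (re)validated */
    };

The controller would install the eth.dst flow in table 66 when an entry is
added and remove it when the entry is dropped from the cache. My concerns
below are about how that last part - dropping stale entries - would work: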

- How are you going to determine when an entry is stale?
If you slow-path the packets to reset the timeout every time a packet with
the source MAC is received, it doesn't look good. Maybe you have something
else in mind.


> >
> There's another scenario that we need to take care of, and it doesn't seem
> too obvious to address without MAC_Bindings.
>
> GARPs were being injected into the L2 broadcast domain of an LS for NAT
> addresses in case FIPs are reused by the CMS; this was introduced by:
>
>
> https://github.com/ovn-org/ovn/commit/069a32cbf443c937feff44078e8828d7a2702da8


Dumitru and I have been discussing the possibility of reverting this patch
and relying on CMSs to maintain the MAC_Binding entries associated with the
FIPs [0].
I'm against reverting this patch in OVN [1] for multiple reasons, the most
important one being that if we rely on workarounds on the CMS side, we'll be
creating a control plane dependency for something that is purely dataplane
(i.e. if the Neutron server is down - outage, upgrades, etc. - traffic is
going to be disrupted). On the other hand, one could argue that the same
dependency now exists on ovn-controller being up & running, but I believe
that this is better than a) relying on workarounds in CMSs and b) relying on
CMS availability.

In the short term I think that moving the MAC_Binding entries to the LS
instead of the LRP, as suggested upthread, would be a good idea, and in the
long run the *local* ARP cache seems to be the right solution. Brainstorming
with Dumitru, he suggested inspecting the flows regularly to see whether the
packet count on the flows that check src_mac == X has stopped increasing for
a while, and in that case removing the ARP responder flows locally.
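
Very roughly, that aging pass could look something like this (a sketch only;
the two helper functions are hypothetical, not existing OVS/OVN APIs):

    #include <stddef.h>
    #include <stdint.h>

    /* Minimal stand-in for a locally learnt binding, extended with the last
     * observed packet count of its "eth.src == learnt MAC" flow. */
    struct local_mac_binding {
        uint8_t mac[6];
        uint64_t last_packet_count;
    };

    /* Hypothetical helpers: read the packet counter of the OpenFlow flow
     * matching on the learnt source MAC, and remove the locally installed
     * table 66 / ARP responder flows for a binding. */
    uint64_t flow_packet_count_for_binding(const struct local_mac_binding *);
    void remove_local_binding_flows(struct local_mac_binding *);

    /* Periodic aging pass: if the "seen eth.src" flow stopped counting
     * packets since the previous pass, treat the entry as stale, drop it
     * locally and re-ARP on the next miss. */
    static void
    age_local_mac_bindings(struct local_mac_binding *bindings, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            uint64_t count = flow_packet_count_for_binding(&bindings[i]);
            if (count == bindings[i].last_packet_count) {
                remove_local_binding_flows(&bindings[i]);
            } else {
                bindings[i].last_packet_count = count;
            }
        }
    }

That would avoid slow-pathing packets just to refresh a timestamp: the
datapath already counts packets on those flows, so the aging pass only has to
read flow statistics periodically.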

[0]
https://github.com/openstack/networking-ovn/commit/5181f1106ff839d08152623c25c9a5f6797aa2d7

[1]
https://github.com/ovn-org/ovn/commit/069a32cbf443c937feff44078e8828d7a2702da8

>
>
> Recently, due to the dataplane scaling issue (4K resubmit limit being
> hit), we don't flood these packets on non-router ports and instead
> create the MAC Bindings directly from ovn-controller:
>
>
> https://github.com/ovn-org/ovn/commit/a2b88dc5136507e727e4bcdc4bf6fde559f519a9
>
> Without the MAC_Binding table we'd need to find a way to update or flush
> stale bindings when an IP is used for a VIF or FIP.
>
> Thanks,
> Dumitru
>

