[ovs-dev] Scaling of Logical_Flows and MAC_Binding tables

Numan Siddique numans at ovn.org
Thu Nov 26 11:02:43 UTC 2020


On Thu, Nov 26, 2020 at 4:11 PM Daniel Alvarez Sanchez
<dalvarez at redhat.com> wrote:
>
> On Wed, Nov 25, 2020 at 7:59 PM Dumitru Ceara <dceara at redhat.com> wrote:
>
> > On 11/25/20 7:06 PM, Numan Siddique wrote:
> > > On Wed, Nov 25, 2020 at 10:24 PM Renat Nurgaliyev <impleman at gmail.com> wrote:
> > >>
> > >>
> > >>
> > >> On 25.11.20 16:14, Dumitru Ceara wrote:
> > >>> On 11/25/20 3:30 PM, Renat Nurgaliyev wrote:
> > >>>> Hello folks,
> > >>>>
> > >>> Hi Renat,
> > >>>
> > >>>> we run a lab where we try to evaluate the scalability potential of
> > >>>> OVN with OpenStack as CMS.
> > >>>> The current lab setup is as follows:
> > >>>>
> > >>>> 500 networks
> > >>>> 500 routers
> > >>>> 1500 VM ports (3 per network/router)
> > >>>> 1500 Floating IPs (one per VM port)
> > >>>>
> > >>>> There is an external network, which is bridged to br-provider on
> > >>>> gateway nodes. There are 2000 ports connected to this external
> > >>>> network (1500 Floating IPs + 500 SNAT router ports). So the setup is not
> > >>>> very big, we'd say, but after applying this configuration via the
> > >>>> ML2/OVN plugin, northd kicks in and does its job, and after it's done,
> > >>>> the Logical_Flow table gets 645877 entries, which is way too much.
> > >>>> But OK, we move on and start one controller on the gateway chassis,
> > >>>> and here things get really messy. The MAC_Binding table grows from 0
> > >>>> to 999088 entries in one go, and after it's done, the biggest SB
> > >>>> tables look like this:
> > >>>>
> > >>>> 999088 MAC_Binding
> > >>>> 645877 Logical_Flow
> > >>>> 4726 Port_Binding
> > >>>> 1117 Multicast_Group
> > >>>> 1068 Datapath_Binding
> > >>>> 1046 Port_Group
> > >>>> 551 IP_Multicast
> > >>>> 519 DNS
> > >>>> 517 HA_Chassis_Group
> > >>>> 517 HA_Chassis
> > >>>> ...
> > >>>>
> > >>>> The MAC_Binding table gets huge: it now has an entry for every port
> > >>>> connected to the external network, multiplied by the number of
> > >>>> datapaths, which makes roughly one million entries. This table by
> > >>>> itself increases the size of the SB by 200 megabytes. The
> > >>>> Logical_Flow table also gets very heavy; we have already played a bit
> > >>>> with the logical datapath patches that Ilya Maximets submitted, and
> > >>>> it looks much better, but the size of the MAC_Binding table still
> > >>>> feels inadequate.
> > >>>>
> > >>>> We would like to start working on at least the MAC_Binding table
> > >>>> optimisation, but it is a bit difficult to start from scratch. Can
> > >>>> someone help us with ideas on how this could be optimised?
> > >>>>
> > >>>> Maybe it would also make sense to group entries in the MAC_Binding
> > >>>> table the same way as is proposed for logical flows in Ilya's patch?
> > >>>>
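[Editorial note: as a back-of-the-envelope check on the numbers quoted above, one learned binding per external-network port, per router datapath attached to the provider network, lines up with the observed table size. The per-router datapath count is an assumption based on the 500 routers in the lab; this is an illustrative sketch, not OVN code.]

```python
# Rough size estimate for the MAC_Binding table in this lab.
# Assumptions (hypothetical, inferred from the thread): every router
# datapath attached to the provider network learns every external port.
external_ports = 1500 + 500     # 1500 Floating IPs + 500 SNAT router ports
router_datapaths = 500          # routers attached to the provider network
print(external_ports * router_datapaths)  # 1000000, close to the 999088 observed
```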
> > >>> Maybe it would work but I'm not really sure how, right now.  However,
> > >>> what if we change the way MAC_Bindings are created?
> > >>>
> > >>> Right now a MAC_Binding is created for each logical router port, but in
> > >>> your case there are a lot of logical router ports connected to the
> > >>> single provider logical switch and they all learn the same ARPs.
> > >>>
> > >>> What if we instead store MAC_Bindings per logical switch?  Basically
> > >>> sharing all these MAC_Bindings between all router ports connected to
> > >>> the same LS.
> > >>>
> > >>> Do you see any problem with this approach?
> > >>>
> > >>> Thanks,
> > >>> Dumitru
> > >>>
> > >>>
> > >> I believe that this approach is the way to go; at least, nothing comes
> > >> to mind that could go wrong here. We will try to make a patch for that.
> > >> However, if someone is familiar with the code and knows how to do it
> > >> quickly, that would also be very nice.
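[Editorial note: for illustration only, the per-LS sharing idea above can be sketched as a re-keying of learned bindings. All names here are hypothetical; this is not the OVN SB schema, just a model of the deduplication it would buy.]

```python
# Model the proposal: collapse bindings keyed per logical router port
# (lrp, ip) into bindings keyed per logical switch (ls, ip), so all
# router ports attached to the same LS share one entry per IP.

def shared_bindings(per_lrp_bindings, lrp_to_ls):
    """Re-key per-LRP bindings to one entry per (logical switch, IP)."""
    shared = {}
    for (lrp, ip), mac in per_lrp_bindings.items():
        shared[(lrp_to_ls[lrp], ip)] = mac
    return shared

# 500 router ports all attached to the same provider logical switch,
# each learning the same 2000 external IPs (the lab's numbers):
lrp_to_ls = {f"lrp-{i}": "ls-provider" for i in range(500)}
per_lrp = {(lrp, f"ip-{j}"): f"mac-{j}"
           for lrp in lrp_to_ls for j in range(2000)}

shared = shared_bindings(per_lrp, lrp_to_ls)
print(len(per_lrp), "->", len(shared))   # 1000000 -> 2000
```

Under these assumptions the table shrinks by a factor equal to the number of router ports sharing the switch.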
> > >
> > > This approach should work.
> > >
> > > I've another idea (I won't call it a solution yet). What if we drop
> > > the usage of the MAC_Binding table altogether?
> >
> > This would be great!
> >
> > >
> > > - When ovn-controller learns a mac_binding, it will not create a row
> > > into the SB MAC_binding table
> > > - Instead it will maintain the learnt mac binding in its memory.
> > > - ovn-controller will still program the table 66 with the flow to set
> > > the eth.dst (for the get_arp() action)
> > >
> > > This has a couple of advantages:
> > >   - Right now we never flush old/stale mac_binding entries.
> > >   - Suppose the mac of an external IP has changed, but OVN still has an
> > >     entry for that IP with the old mac in the mac_binding table: we will
> > >     use the old mac, causing the packet to be sent to the wrong
> > >     destination and possibly lost.
> > >   - So we would get rid of this problem.
> > >   - We would also save SB DB space.
> > >
> > > There are a few disadvantages:
> > >   - Other ovn-controllers will not add the flows in table 66. I guess
> > >     this should be fine, as each ovn-controller can generate the ARP
> > >     request and learn the mac.
> > >   - When ovn-controller restarts we lose the learnt macs and would
> > >     need to learn them again.
> > >
> > > Any thoughts on this?
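[Editorial note: a minimal sketch of the in-memory cache described above, with illustrative names; these are not actual ovn-controller internals. The point is that entries live only in the controller's memory, never in the SB MAC_Binding table, and a re-learn simply overwrites.]

```python
# Local, per-controller cache of learned (datapath, IP) -> MAC entries,
# standing in for what ovn-controller would keep instead of SB rows.

class LocalMacCache:
    def __init__(self):
        self._entries = {}          # (datapath, ip) -> mac

    def learn(self, datapath, ip, mac):
        """Called when an ARP reply / GARP is seen; a later learn for the
        same IP overwrites the entry, so a changed MAC heals itself."""
        self._entries[(datapath, ip)] = mac

    def lookup(self, datapath, ip):
        """get_arp()-style lookup used to program the eth.dst flow in
        table 66; returns None when the MAC has not been learned yet."""
        return self._entries.get((datapath, ip))

cache = LocalMacCache()
cache.learn("dp1", "10.0.0.5", "aa:bb:cc:dd:ee:01")
cache.learn("dp1", "10.0.0.5", "aa:bb:cc:dd:ee:02")  # MAC changed upstream
print(cache.lookup("dp1", "10.0.0.5"))  # aa:bb:cc:dd:ee:02
```

The restart disadvantage is visible in the model too: the dict is empty on construction, so every MAC must be re-learned after a controller restart.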
> >
>
> It'd be great to have some sort of local ARP cache but I'm concerned about
> the performance implications.
>
> - How are you going to determine when an entry is stale?
> If you slow-path the packets to reset the timeout every time a packet with
> a matching source mac is received, it doesn't look good. Maybe you have
> something else in mind.

Right now we don't expire any mac_binding entries. If I understand you
correctly, your concern is the scenario where a floating IP is updated
with a different mac: how does the local cache get updated?

Right now networking-ovn (in the case of OpenStack) updates the
mac_binding entry in the Southbound DB for such cases, right?

Thanks
Numan

>
> > >
> > There's another scenario that we need to take care of and doesn't seem
> > too obvious to address without MAC_Bindings.
> >
> > GARPs were being injected in the L2 broadcast domain of an LS for NAT
> > addresses in case FIPs are reused by the CMS, introduced by:
> >
> >
> > https://github.com/ovn-org/ovn/commit/069a32cbf443c937feff44078e8828d7a2702da8
>
>
> Dumitru and I have been discussing the possibility of reverting this patch
> and relying on CMSs to maintain the MAC_Binding entries associated with the
> FIPs [0].
> I'm against reverting this patch in OVN [1] for multiple reasons, the most
> important being that if we rely on workarounds on the CMS side, we'll be
> creating a control plane dependency for something that is purely dataplane
> (i.e. if the Neutron server is down due to an outage, upgrades, etc.,
> traffic is going to be disrupted). On the other hand, one could argue that
> the same dependency now exists on ovn-controller being up and running, but
> I believe that this is better than a) relying on workarounds in CMSs or
> b) relying on CMS availability.
>
> In the short term I think that moving the MAC_Binding entries to the LS
> instead of the LRP, as suggested upthread, would be a good idea; in the
> long haul, the *local* ARP cache seems to be the right solution.
> Brainstorming with Dumitru, he suggested inspecting the flows regularly to
> see whether the packet count on flows that check src_mac == X has not
> increased in a while, and then removing the ARP responder flows locally.
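[Editorial note: the idle check Daniel describes could look roughly like the sketch below. All names are assumptions; a real implementation would read OpenFlow flow statistics from the switch rather than a plain dict, driven by ovn-controller's main loop.]

```python
import time

# Expire cached ARP entries whose src_mac-match flow has seen no traffic:
# remember each flow's last packet count and when it last changed; if the
# count stays flat for idle_timeout seconds, the entry is declared stale.

class ArpCacheExpirer:
    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self._state = {}            # entry -> (last_count, last_change_ts)

    def poll(self, flow_stats, now=None):
        """flow_stats maps entry -> current packet count of its flow.
        Returns the entries that went idle and should be removed."""
        now = time.monotonic() if now is None else now
        expired = []
        for entry, count in flow_stats.items():
            last_count, since = self._state.get(entry, (None, now))
            if count != last_count:
                self._state[entry] = (count, now)   # traffic seen: refresh
            elif now - since >= self.idle_timeout:
                expired.append(entry)
                del self._state[entry]
        return expired

exp = ArpCacheExpirer(idle_timeout=60)
assert exp.poll({"10.0.0.5": 10}, now=0) == []             # first sighting
assert exp.poll({"10.0.0.5": 10}, now=61) == ["10.0.0.5"]  # idle: expire
```

Polling stats avoids slow-pathing packets, at the cost of one flow-stats request per polling interval.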
>
> [0]
> https://github.com/openstack/networking-ovn/commit/5181f1106ff839d08152623c25c9a5f6797aa2d7
>
> [1]
> https://github.com/ovn-org/ovn/commit/069a32cbf443c937feff44078e8828d7a2702da8
>
> >
> >
> > Recently, due to the dataplane scaling issue (4K resubmit limit being
> > hit), we don't flood these packets on non-router ports and instead
> > create the MAC Bindings directly from ovn-controller:
> >
> >
> > https://github.com/ovn-org/ovn/commit/a2b88dc5136507e727e4bcdc4bf6fde559f519a9
> >
> > Without the MAC_Binding table we'd need to find a way to update or flush
> > stale bindings when an IP is used for a VIF or FIP.
> >
> > Thanks,
> > Dumitru
> >
> > _______________________________________________
> > dev mailing list
> > dev at openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> >
