[ovs-discuss] OVN Database sizes - Auto compact feature

Mark Michelson mmichels at redhat.com
Thu Mar 8 22:10:34 UTC 2018


On 03/08/2018 02:54 PM, Daniel Alvarez Sanchez wrote:
> I agree with you, Mark. I tried to check how much it would shrink with
> 1800 ports in the system:
> 
> [stack@ovn ovs]$ sudo ovn-nbctl list Logical_Switch_Port | grep uuid | wc -l
> 1809
> [stack@ovn ovs]$ sudo ovn-sbctl list Logical_Flow | grep uuid | wc -l
> 50780
> [stack@ovn ovs]$ ls -alh ovn*.db
> -rw-r--r--. 1 stack stack 15M Mar  8 15:56 ovnnb_db.db
> -rw-r--r--. 1 stack stack 61M Mar  8 15:56 ovnsb_db.db
> [stack@ovn ovs]$ sudo ovs-appctl -t /usr/local/var/run/openvswitch/ovnsb_db.ctl ovsdb-server/compact
> [stack@ovn ovs]$ sudo ovs-appctl -t /usr/local/var/run/openvswitch/ovnnb_db.ctl ovsdb-server/compact
> [stack@ovn ovs]$ ls -alh ovn*.db
> -rw-r--r--. 1 stack stack 5.8M Mar  8 20:45 ovnnb_db.db
> -rw-r--r--. 1 stack stack  23M Mar  8 20:45 ovnsb_db.db
> 
> As you can see, with ~50K lflows, the minimum SB database size would be
> ~23M, while the NB database is much smaller. Still, I think we need to do
> something so that the compaction task isn't delayed this much
> unnecessarily. Or maybe we want some sort of configuration for this
> (i.e. normal, aggressive, ...), since in some situations it may help to
> have the full log of the DB (although this can be achieved through
> periodic backups :?). That said, I'm not a big fan of such configs but...
> 

I'm also not a big fan of that sort of configuration. Based on Ben's 
replies here, I like the idea of being more aggressive with compaction. 
The two ideas proposed here, compacting at double the size instead of 4x 
and ensuring a compaction happens at least once every 24 hours, sound 
like good mitigations to me.
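
To make that concrete, here is a rough sketch of what such a trigger
could look like. To be clear, this is not the actual ovsdb-server code;
the struct, function, and constant names below are made up for
illustration, and the thresholds are just the values discussed in this
thread:

    /* Sketch only: a possible "should we compact now?" check that
     * combines the two mitigations above (2x growth instead of 4x,
     * plus a 24-hour backstop).  Names and values are illustrative. */
    #include <stdbool.h>
    #include <stdint.h>

    #define COMPACT_GROWTH_FACTOR 2                       /* instead of 4x */
    #define COMPACT_MAX_AGE_MSEC  (24LL * 60 * 60 * 1000) /* once a day */
    #define COMPACT_MIN_SIZE      (10 * 1024 * 1024)      /* skip tiny DBs */

    struct db_compact_state {
        uint64_t snapshot_size;      /* on-disk size right after last compaction */
        uint64_t current_size;       /* current on-disk size of the log */
        uint64_t txns_since_compact; /* transactions appended since last compaction */
        long long last_compact_msec; /* wall-clock time of the last compaction */
    };

    static bool
    should_compact(const struct db_compact_state *s, long long now_msec)
    {
        if (!s->txns_since_compact) {
            /* Nothing new to fold into the snapshot. */
            return false;
        }
        if (now_msec - s->last_compact_msec >= COMPACT_MAX_AGE_MSEC) {
            /* Time-based backstop: compact at least once every 24 hours. */
            return true;
        }
        /* Size-based trigger: the log has at least doubled since the
         * last snapshot (and is big enough to be worth rewriting). */
        return s->current_size >= COMPACT_MIN_SIZE
               && s->current_size >= COMPACT_GROWTH_FACTOR * s->snapshot_size;
    }

The minimum-size guard is only there so that a small, mostly idle
database doesn't get rewritten constantly; the interesting parts are the
2x factor and the 24-hour cap.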

> 
> 
> On Thu, Mar 8, 2018 at 9:31 PM, Mark Michelson <mmichels at redhat.com> wrote:
> 
>     Most of the data in this thread has been pretty easily explainable
>     based on what I've seen in the code, combined with the nature of the
>     data in the southbound database.
> 
>     The southbound database tends to have more data in it than other
>     databases in OVS, due especially to the Logical_Flow table. The
>     result is that automatic compaction does not shrink it down as much
>     as it does other databases. You can see in Daniel's graphs that each
>     time the southbound database is shrunk, its "base" size ends up
>     noticeably larger than it previously was.
> 
>     Couple that with the fact that the database has to increase to 4x
>     its previous snapshot size in order to be shrunk, and you can end up
>     with a situation after a while where the "shrunk" southbound
>     database is 750MB, and it won't shrink again until it exceeds 3GB.
> 
>     To fix this, I think there are a few things that can be done:
> 
>     * Somehow make the southbound database have less data in it. I don't
>     have any really good ideas for how to do this, and doing it in a
>     backwards-compatible way will be difficult.
> 
>     * Ease the requirements for shrinking a database. For instance, once
>     the database reaches a certain size, maybe it doesn't need to grow
>     by 4x in order to be a candidate for shrinking; maybe it only needs
>     to double in size. Or there could be some time cutoff after which
>     the database will always be shrunk. So, for instance, shrink the
>     database every hour, no matter how much activity has occurred in it
>     (okay, maybe not if there have been 0 transactions).
> 
> 
> Maybe we can just do the shrink if the last compaction took place >24h
> ago, regardless of the other conditions.
> I can send a patch for this if you guys like the idea. It's some sort of
> "cleanup task" just in case, and it seems harmless.
> What do you say?
> 
> 
> 
>     On 03/07/2018 02:50 PM, Ben Pfaff wrote:
> 
>         OK.
> 
>         I guess we need to investigate this issue from the basics.
> 
>         On Wed, Mar 07, 2018 at 09:02:02PM +0100, Daniel Alvarez Sanchez
>         wrote:
> 
>             With the OVS 2.8 branch it never shrank when I started to
>             delete the ports, since the DB sizes didn't grow, which
>             makes sense to me. The conditions for further compaction
>             weren't met.
>             See attached image.
> 
>             NB:
>             2018-03-07T18:25:49.269Z|00009|ovsdb_file|INFO|/opt/stack/data/ovs/ovnnb_db.db:
>             compacting database online (647.317 seconds old, 436 transactions, 10505382 bytes)
>             2018-03-07T18:35:51.414Z|00012|ovsdb_file|INFO|/opt/stack/data/ovs/ovnnb_db.db:
>             compacting database online (602.089 seconds old, 431 transactions, 29551917 bytes)
>             2018-03-07T18:45:52.263Z|00015|ovsdb_file|INFO|/opt/stack/data/ovs/ovnnb_db.db:
>             compacting database online (600.563 seconds old, 463 transactions, 52843231 bytes)
>             2018-03-07T18:55:53.810Z|00016|ovsdb_file|INFO|/opt/stack/data/ovs/ovnnb_db.db:
>             compacting database online (601.128 seconds old, 365 transactions, 57618931 bytes)
> 
> 
>             SB:
>             2018-03-07T18:33:24.927Z|00009|ovsdb_file|INFO|/opt/stack/data/ovs/ovnsb_db.db:
>             compacting database online (1102.840 seconds old, 775 transactions, 10505486 bytes)
>             2018-03-07T18:43:27.569Z|00012|ovsdb_file|INFO|/opt/stack/data/ovs/ovnsb_db.db:
>             compacting database online (602.394 seconds old, 445 transactions, 15293972 bytes)
>             2018-03-07T18:53:31.664Z|00015|ovsdb_file|INFO|/opt/stack/data/ovs/ovnsb_db.db:
>             compacting database online (603.605 seconds old, 385 transactions, 19282371 bytes)
>             2018-03-07T19:03:42.116Z|00031|ovsdb_file|INFO|/opt/stack/data/ovs/ovnsb_db.db:
>             compacting database online (607.542 seconds old, 371 transactions, 23538784 bytes)
> 
> 
> 
> 
>             On Wed, Mar 7, 2018 at 7:18 PM, Daniel Alvarez Sanchez
>             <dalvarez at redhat.com> wrote:
> 
>                 No worries, I just triggered the test, now running OVS
>                 compiled from the 2.8 branch (2.8.3). I'll post the
>                 results and investigate too.
>
>                 I have just sent a patch to fix the timing issue visible
>                 in the traces I posted. I applied it and it works. I
>                 believe it's worth fixing, as it gives us an idea of how
>                 frequent the compaction is, and also worth backporting
>                 if you agree with it.
> 
>                 Thanks!
> 
>                 On Wed, Mar 7, 2018 at 7:13 PM, Ben Pfaff
>                 <blp at ovn.org> wrote:
> 
>                     OK, thanks.
> 
>                     If this is a lot of trouble, let me know and I'll
>                     investigate directly
>                     instead of on the basis of a suspected regression.
> 
>                     On Wed, Mar 07, 2018 at 07:06:50PM +0100, Daniel
>                     Alvarez Sanchez wrote:
> 
>                         All right, I'll repeat it with code in branch-2.8.
>                         Will post the results once the test finishes.
>                         Daniel
> 
>                         On Wed, Mar 7, 2018 at 7:03 PM, Ben Pfaff
>                         <blp at ovn.org> wrote:
> 
>                             On Wed, Mar 07, 2018 at 05:53:15PM +0100,
>                             Daniel Alvarez Sanchez wrote:
> 
>                                 Repeated the test with 1000 ports this
>                                 time. See attached image.
>                                 For some reason, the sizes grow while
>                                 deleting the ports (the deletion task
>                                 starts at around x=2500). The weird
>                                 thing is that they keep growing and the
>                                 online compaction doesn't work the way
>                                 it does when I trigger it manually with
>                                 the ovs-appctl tool.
>
>                                 I suspect this is a bug and that the DB
>                                 will eventually grow and grow unless we
>                                 compact it manually.
> 
> 
>                             Would you mind trying out an older
>                             ovsdb-server, for example the one
>                             from OVS 2.8?  Some of the logic in
>                             ovsdb-server around compaction
>                             changed in OVS 2.9, so it would be nice to
>                             know whether this was a
>                             regression or an existing bug.
>


