[ovs-dev] RFC: OVN database options

Dan Mihai Dumitriu dmd17 at cornell.edu
Thu Mar 10 16:39:41 UTC 2016


On Fri, Mar 11, 2016 at 12:55 AM, Dan Mihai Dumitriu <dmd17 at cornell.edu>
wrote:

> Great writeup Ben.
>
> The NB DB does need HA and ACID transactions, but it has few clients, so
> it's probably not a very hard problem. It could even use BDB with log
> shipping:
> http://www.oracle.com/technetwork/database/database-technologies/berkeleydb/overview/index-085366.html
>
> However, one more potential requirement for the NB DB is secondary
> indices, because the NB clients may expect to query the NB models in
> various ways that weren't considered a priori. I bring this up because in
> the OpenStack context the NB DB could be used to store the Neutron data
> model entirely, thus obviating the need for the Neutron DB, and eliminating
> the "syncing problem" between Neutron and the NB DB. I could see the same
> applying in the context of containers.
>

My colleague Ivan pointed out that Zookeeper (ZK) could be used for the NB
DB. I think that could actually be a reasonable choice.



> Regarding the SB DB, as Liran pointed out, it doesn't necessarily need
> durable persistence. It would be possible to make the whole thing work with
> an in-memory SB DB. (I am waiting for you to start shooting holes in this
> hypothesis, but I'm reasonably confident those holes can be filled.) That
> said, it does need to be replicated for HA. Luckily, replicating an
> in-memory data structure is easier and more performant than replicating a
> durably persistent one. To support efficient syncing with clients
> (ovn-controller agents), the in-memory replication should be a form of log
> shipping, so that clients that disconnect from one SB DB instance and
> reconnect to a different SB DB instance can resync without a full table
> download. Is this premature optimization?
>
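
To make the resync idea above concrete, here is a minimal sketch in Python.
It assumes every SB DB instance numbers committed updates identically (which
log shipping would give us) and keeps only a bounded tail of the log in
memory. The names UpdateLog, append and since are made up for illustration;
nothing like this exists in OVSDB today.

```python
from collections import deque


class UpdateLog:
    """Bounded in-memory log of committed updates (hypothetical sketch)."""

    def __init__(self, capacity=1000):
        # Each entry is a (seqno, update) pair; old entries fall off the front.
        self.entries = deque(maxlen=capacity)
        self.next_seqno = 1

    def append(self, update):
        """Record one committed update and return its sequence number."""
        seqno = self.next_seqno
        self.entries.append((seqno, update))
        self.next_seqno += 1
        return seqno

    def since(self, last_seen):
        """Return the updates after last_seen, or None if a full download
        is needed (log trimmed too far, or client ahead of this instance)."""
        if last_seen >= self.next_seqno:
            return None
        oldest = self.entries[0][0] if self.entries else self.next_seqno
        if last_seen + 1 < oldest:
            return None
        return [update for seqno, update in self.entries if seqno > last_seen]
```

A reconnecting client would send the last sequence number it applied; only
when since() returns None does it fall back to a full table download.
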
> On Thu, Mar 10, 2016 at 4:11 PM, Ben Pfaff <blp at ovn.org> wrote:
>
>> Requirements
>> ============
>>
>> OVN uses two databases, the "northbound" and "southbound" databases,
>> in a somewhat idiosyncratic manner.  Each client of one of these
>> databases maintains an in-memory replica of the database (or some
>> subset of it), and the server sends updates to this replica as they
>> are committed.  Thus, at any given time, a client has a consistent
>> snapshot of the database, although it might be old if the database has
>> changed but the updates have not yet made it from the server to the
>> client.
>>
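
A rough sketch of the usage model just described, in Python. It assumes the
server delivers each committed transaction as one batch of row operations;
the Replica class, the tuple layout and the "Port" table used in the usage
note are illustrative only, not the OVSDB wire format.

```python
import copy


class Replica:
    """Client-side in-memory replica (hypothetical sketch)."""

    def __init__(self):
        self.tables = {}   # table name -> {row id -> {column -> value}}
        self.version = 0   # number of transactions applied so far

    def apply(self, txn):
        """Apply one committed transaction, all or nothing.

        txn is a list of (op, table, row_id, columns) tuples, where op is
        "insert", "update" or "delete".
        """
        # Work on a copy so a failure halfway through leaves the visible
        # snapshot untouched.  Copying everything is wasteful but simple.
        staged = copy.deepcopy(self.tables)
        for op, table, row_id, columns in txn:
            rows = staged.setdefault(table, {})
            if op == "insert":
                rows[row_id] = dict(columns)
            elif op == "update":
                rows[row_id].update(columns)
            elif op == "delete":
                del rows[row_id]
            else:
                raise ValueError("unknown operation: %r" % (op,))
        self.tables = staged
        self.version += 1
```

For example, apply([("insert", "Port", "p1", {"name": "lp1"})]) makes the new
row visible in one step. Readers of tables always see the state after some
whole number of transactions, which is the "consistent but possibly old"
snapshot property.
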
>> Beyond supporting this usage model, the basic requirements for the OVN
>> use case are:
>>
>>     - Size: 20 MB to 100 MB of data (estimated database size to hold
>>       data for our target scale of 1,000 hypervisors and 20,000
>>       logical ports).
>>
>>     - Scale: The northbound database has only a single-digit number of
>>       clients.  Each hypervisor is a client to the southbound
>>       database, so about 1,000 clients for our target scale of 1,000
>>       hypervisors.
>>
>>     - Performance: Hundreds of transactions per second.  (Because of
>>       the usage model described above, all transactions are write
>>       transactions; clients read from their local replicas.)
>>
>>     - Transactions: Clients expect atomic, consistent, isolated
>>       transactions.
>>
>>       Durability is not essential, because the clients will reissue
>>       lost transactions (up to and including completely refilling an
>>       empty database, although this can be slow).
>>
>>     - High availability: If the database server goes down, then this
>>       freezes the OVN configuration.  This is OK briefly for running
>>       clients--the existing configuration continues to work, it just
>>       can't be updated--but it prevents new clients or clients that
>>       restart from using OVN at all.
>>
>>       For the same reason that durability is not essential, it is
>>       acceptable if an occasional fail-over between database servers
>>       loses a few transactions, though of course it's best to minimize
>>       the probability and the amount of data lost.
>>
>>     - Open source.  Some "open source" databases only provide high
>>       availability and transactions as proprietary extensions; that's
>>       undesirable.
>>
>> Desirable features:
>>
>>     - C client, since OVN is written in C; otherwise, we'll likely
>>       have to write one.  (We've had suggestions that OVN should be
>>       written in another language, such as Java, but we have not
>>       decided to change the language yet.)
>>
>>     - Python client, since OVS includes tools written in Python.
>>
>>     - Table structured.  We could layer tables on top of a key-value
>>       store if necessary.
>>
>>     - Schema support, with referential integrity constraints.  We find
>>       this helpful for increasing our confidence in the system.  This
>>       is something that we could leave out or layer on top.
>>
>>     - Network protocol.  Some databases are just designed for local
>>       access.  If such a database were otherwise just right, we could
>>       wrap it for distributed use.  The analysis below mostly ignores
>>       databases that are local-only or in which remote access appears
>>       to be an afterthought.
>>
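
As a sketch of what "layering tables on top of a key-value store" could look
like, here is one possible key encoding in Python. A plain dict stands in for
the store, and the "table/row/column" key layout and the function names are
assumptions made for illustration. Table names and row ids must not contain
"/".

```python
def put_row(kv, table, row_id, columns):
    """Store one row as one key per column: "table/row_id/column"."""
    for column, value in columns.items():
        kv["%s/%s/%s" % (table, row_id, column)] = value


def get_row(kv, table, row_id):
    """Reassemble a single row from its per-column keys."""
    prefix = "%s/%s/" % (table, row_id)
    return {key[len(prefix):]: value
            for key, value in kv.items() if key.startswith(prefix)}


def scan_table(kv, table):
    """Reassemble every row of a table by scanning its key prefix."""
    prefix = "%s/" % table
    rows = {}
    for key, value in kv.items():
        if key.startswith(prefix):
            row_id, column = key[len(prefix):].split("/", 1)
            rows.setdefault(row_id, {})[column] = value
    return rows
```

Note that put_row writes several keys for one row, so a store whose
transactions cover only a single key could expose a half-written row. That is
why multi-key transactions matter for any such layering.
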
>>
>> Options
>> =======
>>
>> Each entry has the columns listed below.  In general, all-caps answers
>> are problematic for the OVN use case.
>>
>>     - Database: The database being evaluated.
>>
>>     - txn: "yes" if the database supports transactions across
>>       arbitrary data, "NO" if its transactions are limited to a single
>>       data item, such as a single key-value pair, or perhaps even more
>>       limited.
>>
>>     - ACID: The transactional properties that the database supports,
>>       within the transactions that the database supports.  (Thus, a
>>       database whose transactions cover only a single data item can be
>>       listed as ACID, but this is only for those limited
>>       transactions.)
>>
>>     - consist: The distributed consistency model that the database
>>       supports, one of "strong" for strong or linearizable
>>       consistency, "tunable" for consistency that can be tuned to be
>>       strong or linearizable or weaker, or "EVNTUAL" for eventual
>>       consistency.
>>
>>     - trk: "yes" if the database can automatically report data changes
>>       to clients, "NO" if the database requires clients to poll for
>>       changes.
>>
>>     - HA: "yes" if the database can be configured for high
>>       availability, so that loss of a single node does not stop
>>       database activity, "NO" otherwise.
>>
>>     - OS: "yes" if the database is open source or free (libre)
>>       software, "NO" if it is proprietary.  When a database has open
>>       source and proprietary editions, this is "yes" and only the
>>       features in the open source edition are credited in other
>>       columns.
>>
>>     - C: "yes" if the database has a C (not C++) client library, "NO"
>>       otherwise.
>>
>>     - Py: "yes" if the database has a Python client library, "NO"
>>       otherwise.
>>
>>     - format: The database's data model.  "sql", "db", "table",
>>       "multi" all indicate that OVN could directly use the data model,
>>       "KV" or "JSON" that OVN's data model would have to be overlaid
>>       on it.
>>
>> Database        txn  ACID  consist   trk   HA   OS    C   Py  format
>> --------------  ---  ----  -------  ----  ---  ---  ---  ---  ------
>> ActorDB         yes  ACID   strong    NO  yes  yes  yes  yes     sql
>> Aerospike       yes  ACID   strong    NO  yes  yes  yes  yes   db/KV
>> Cassandra        NO  -C-D  tunable    NO  yes  yes   NO  yes   table
>> Cockroach DB    yes  ACID   strong    NO  yes  yes    ?    ?     sql
>> Couchbase        NO  ????     ????    NO  yes  NO?  yes  yes    JSON
>> CrateIO          NO  ????  EVNTUAL    NO  yes  yes   NO  yes     sql
>> etcd             NO  ACID   strong  yes?  yes  yes  yes  yes      KV
>> Gigaspaces XAP  yes  ACID   strong   yes  yes   NO   NO   NO   multi
>> HBase            NO  ACID   strong    NO  yes  yes   NO  yes   table
>> Hyperdex        yes  ACID   strong    NO  yes   NO  yes  yes      KV
>> Hypertable       NO  ????     ????    NO  yes  yes   NO  yes   table
>> MongoDB          NO  ACID   strong    ??  yes  yes  yes  yes    JSON
>> RAMCloud        yes  ????   strong    NO  yes  yes   NO  yes      KV
>> Redis           yes  -C?D     ????    NO  yes  yes  yes  yes      KV
>> Riak             NO  ---D  EVNTUAL    NO  yes  yes  yes  yes      KV
>> Scalaris        yes  ACI-   strong    NO  yes  yes   NO  yes      KV
>> ScyllaDB         NO  -C-D  tunable    NO  yes  yes   NO  yes   table
>> Voldemort        NO  ????  EVNTUAL    NO  yes  yes   NO  yes      KV
>> Zookeeper       yes  AC-D   strong   yes  yes  yes  yes  yes      KV
>>
>> OVSDB           yes  ACID   strong   yes   NO  yes  yes  yes   table
>>
>>
>> Analysis
>> ========
>>
>> The most troublesome part of the OVN use case is the idiosyncratic use
>> of the database to maintain state, immediately distributing changes to
>> all of the clients.  As the "trk" column above shows, most databases
>> don't support this mode of operation.  Possibly this means that OVN is
>> misusing the concept of a database and should be redesigned not to use
>> a database; if so, that's a bigger discussion.
>>
>> Assuming that we wish to retain this requirement, then only the
>> following databases appear to support the feature to an acceptable
>> extent:
>>
>>     - etcd.  etcd appears to allow clients to receive a notification
>>       when keys change.  A client might be able to bootstrap
>>       monitoring of entire tables on top of this feature.  Perhaps
>>       this would require registering for notification separately on
>>       all of the keys that would be used to simulate a table on top of
>>       the etcd key-value store; if so, that would probably be
>>       unreasonable.  Assuming that is not a problem or can be
>>       overcome, it would also be necessary to make sure that the new
>>       values of all of the modified keys could be obtained in a way
>>       such that the client's view reflects a consistent snapshot of
>>       the database contents.
>>
>>     - Gigaspaces XAP.  Not open source.
>>
>>     - Zookeeper.  The issues here are similar to those for etcd.
>>       Also, Zookeeper transactions don't seem to be isolated.
>>
>>     - OVSDB.  If we choose to use OVSDB, we'll have to add
>>       high-availability support.  Also, the table doesn't mention
>>       scaling, since it's hard to compare objectively, but the OVSDB
>>       server currently doesn't scale well to the 1000 clients required
>>       for the southbound database, although Andy has started working
>>       on that.
>>
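
To illustrate the two concerns raised for etcd and Zookeeper (one
registration per key versus per table, and getting a consistent view of all
modified keys), here is a toy in-memory model in Python. It is not the etcd
or Zookeeper API; WatchableStore, watch and commit are hypothetical names, and
it assumes keys of the form "table/row/column" plus a store that can commit
several keys atomically.

```python
class WatchableStore:
    """Toy key-value store with prefix watches (hypothetical sketch)."""

    def __init__(self):
        self.data = {}
        self.revision = 0    # bumped once per committed transaction
        self.watchers = []   # list of (prefix, callback) pairs

    def watch(self, prefix, callback):
        """Register one watch covering every key under prefix."""
        self.watchers.append((prefix, callback))

    def commit(self, writes):
        """Atomically apply writes, a dict of key -> value (None deletes)."""
        self.revision += 1
        for key, value in writes.items():
            if value is None:
                self.data.pop(key, None)
            else:
                self.data[key] = value
        # Deliver all changes from this commit to a watcher in one batch,
        # tagged with the revision, so it never sees half a transaction.
        for prefix, callback in self.watchers:
            changed = {key: value for key, value in writes.items()
                       if key.startswith(prefix)}
            if changed:
                callback(self.revision, changed)
```

With this shape, a client monitoring a whole table registers once on the
table's prefix, and each callback carries every key that one transaction
modified under that prefix. Whether a real store can offer both properties is
exactly the open question above.
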
>>
>> Recommendation
>> ==============
>>
>> I'm intentionally not offering a recommendation, because I want to start
>> a discussion.
>>
>
>


