[ovs-dev] RFC: OVN database options
Dan Mihai Dumitriu
dmd17 at cornell.edu
Thu Mar 10 16:39:41 UTC 2016
On Fri, Mar 11, 2016 at 12:55 AM, Dan Mihai Dumitriu <dmd17 at cornell.edu>
wrote:
> Great writeup Ben.
>
> The NB DB does need HA and ACID transactions, but it has few clients, so
> it's probably not a very hard problem. It could even use BDB with log
> shipping:
> http://www.oracle.com/technetwork/database/database-technologies/berkeleydb/overview/index-085366.html
>
> However, one more potential requirement for the NB DB is secondary
> indices, because the NB clients may expect to query the NB models in
> various ways that weren't considered a priori. I bring this up because in
> the OpenStack context the NB DB could be used to store the Neutron data
> model entirely, thus obviating the need for the Neutron DB, and eliminating
> the "syncing problem" between Neutron and the NB DB. I could see the same
> applying in the context of containers.
>
My colleague Ivan pointed out that Zookeeper (ZK) could be used for the NB
DB. I think that could be a reasonable choice, actually.

> Regarding the SB DB, as Liran pointed out, it doesn't necessarily need
> durable persistence. It would be possible to make the whole thing work with
> an in-memory SB DB. (I am waiting for you to start shooting holes in this
> hypothesis, but I'm reasonably confident those holes can be filled.) That
> said, it does need to be replicated for HA. Luckily, the replication of an
> in-memory data structure is easier and more performant than that of a
> durably persistent data structure. In order to support efficient syncing
> with clients (ovn-controller agents), the in-memory replication should be a
> form of log shipping, so that clients that disconnect from one SB DB
> instance and reconnect to a different SB DB instance can do a resync without
> a full table download. Is this premature optimization?
>
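A minimal sketch of the resync idea above, under assumptions that are not taken from any existing SB DB implementation: every committed change gets a sequence number, each server instance holds the same bounded in-memory log (because the log itself is what gets shipped), and a reconnecting client reports the last sequence number it applied. LogServer, commit and resync are hypothetical names used only for illustration.

```python
class LogServer:
    """Toy SB DB instance: current state plus a bounded log of committed changes."""

    def __init__(self, max_log=1000):
        self.state = {}         # key -> value
        self.log = []           # (seqno, key, value); value None means delete
        self.seqno = 0          # sequence number of the latest commit
        self.max_log = max_log  # how many log entries to retain (>= 1)

    def commit(self, key, value):
        """Apply one change, record it in the log, and trim the oldest entries."""
        self.seqno += 1
        if value is None:
            self.state.pop(key, None)
        else:
            self.state[key] = value
        self.log.append((self.seqno, key, value))
        del self.log[:-self.max_log]

    def resync(self, last_seen):
        """Return ("delta", entries) if the log still covers everything after
        last_seen, otherwise ("full", snapshot) for a full download."""
        oldest = self.log[0][0] if self.log else self.seqno + 1
        if oldest - 1 <= last_seen <= self.seqno:
            return "delta", [e for e in self.log if e[0] > last_seen]
        return "full", dict(self.state)


server = LogServer(max_log=2)
server.commit("a", 1)
server.commit("b", 2)
server.commit("c", 3)
kind, payload = server.resync(last_seen=1)
```

Here a client that had applied sequence number 1 receives only the two later log entries, while a client whose position has already been trimmed from the log falls back to a full snapshot.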
> On Thu, Mar 10, 2016 at 4:11 PM, Ben Pfaff <blp at ovn.org> wrote:
>
>> Requirements
>> ============
>>
>> OVN uses two databases, the "northbound" and "southbound" databases,
>> in a somewhat idiosyncratic manner. Each client of one of these
>> databases maintains an in-memory replica of the database (or some
>> subset of it), and the server sends updates to this replica as they
>> are committed. Thus, at any given time, a client has a consistent
>> snapshot of the database, although it might be old if the database has
>> changed but the updates have not yet made it from the server to the
>> client.
>>
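As a rough illustration of the usage model described above, the sketch below keeps an in-memory replica and applies each committed transaction as a unit, so a reader always sees a consistent (if possibly stale) snapshot. The Replica class, the update tuple format, and the table and column names are hypothetical; this is not the OVSDB wire protocol.

```python
class Replica:
    """In-memory copy of (a subset of) a database, kept current by server updates."""

    def __init__(self):
        self.tables = {}   # table name -> {row id -> row dict}
        self.version = 0   # number of committed transactions applied so far

    def apply(self, txn):
        """Apply one committed transaction, given as a list of
        (op, table, row_id, row) tuples, and swap it in all at once."""
        staged = {name: dict(rows) for name, rows in self.tables.items()}
        for op, table, row_id, row in txn:
            rows = staged.setdefault(table, {})
            if op == "delete":
                rows.pop(row_id, None)
            else:  # "insert" or "modify": merge the new columns into the row
                rows[row_id] = {**rows.get(row_id, {}), **row}
        # Readers see either the old snapshot or the new one, never a mix.
        self.tables = staged
        self.version += 1


replica = Replica()
replica.apply([("insert", "Logical_Port", "lp1", {"name": "vm1", "up": False})])
replica.apply([("modify", "Logical_Port", "lp1", {"up": True})])
```

Reads go to replica.tables locally, which is why only writes reach the server as transactions.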
>> Beyond supporting this usage model, the basic requirements for the OVN
>> use case are:
>>
>>     - Size: 20 MB to 100 MB of data (estimated database size to hold
>>       data for our target scale of 1,000 hypervisors and 20,000
>>       logical ports).
>>
>>     - Scale: The northbound database has only a single-digit number of
>>       clients. Each hypervisor is a client to the southbound
>>       database, so about 1,000 clients for our target scale of 1,000
>>       hypervisors.
>>
>>     - Performance: Hundreds of transactions per second. (Because of
>>       the usage model described above, all transactions are write
>>       transactions; clients read from their local replicas.)
>>
>>     - Transactions: Clients expect atomic, consistent, isolated
>>       transactions.
>>
>>       Durability is not essential, because the clients will reissue
>>       lost transactions (up to and including completely refilling an
>>       empty database, although this can be slow).
>>
>>     - High availability: If the database server goes down, then this
>>       freezes the OVN configuration. This is OK briefly for running
>>       clients--the existing configuration continues to work, it just
>>       can't be updated--but it prevents new clients or clients that
>>       restart from using OVN at all.
>>
>>       For the same reason that durability is not essential, it is
>>       acceptable if an occasional fail-over between database servers
>>       loses a few transactions, though of course it's best to minimize
>>       the probability and the amount of data lost.
>>
>>     - Open source. Some "open source" databases only provide high
>>       availability and transactions as proprietary extensions; that's
>>       undesirable.
>>
>> Desirable features:
>>
>>     - C client, since OVN is written in C; otherwise, we'll likely
>>       have to write one. (We've had suggestions that OVN should be
>>       written in another language, such as Java, but we have not
>>       decided to change the language yet.)
>>
>>     - Python client, since OVS includes tools written in Python.
>>
>>     - Table structured. We could layer tables on top of a key-value
>>       store if necessary.
>>
>>     - Schema support, with referential integrity constraints. We find
>>       this helpful for increasing our confidence in the system. This
>>       is something that we could leave out or layer on top.
>>
>>     - Network protocol. Some databases are just designed for local
>>       access. If such a database were otherwise just right, we could
>>       wrap it for distributed use. The analysis below mostly ignores
>>       databases that are local-only or in which remote access appears
>>       to be an afterthought.
>>
>>
>> Options
>> =======
>>
>> Each entry has the columns listed below. In general, all-caps answers
>> are problematic for the OVN use case.
>>
>>     - Database: The database being evaluated.
>>
>>     - txn: "yes" if the database supports transactions across
>>       arbitrary data, "NO" if its transactions are limited to a single
>>       data item, such as a single key-value pair, or perhaps even more
>>       limited.
>>
>>     - ACID: The transactional properties that the database supports,
>>       within the transactions that the database supports. (Thus, a
>>       database whose transactions cover only a single data item can be
>>       listed as ACID, but this is only for those limited
>>       transactions.)
>>
>>     - consist: The distributed consistency model that the database
>>       supports, one of "strong" for strong or linearizable
>>       consistency, "tunable" for consistency that can be tuned to be
>>       strong or linearizable or weaker, or "EVNTUAL" for eventual
>>       consistency.
>>
>>     - trk: "yes" if the database can automatically report data changes
>>       to clients, "NO" if the database requires clients to poll for
>>       changes.
>>
>>     - HA: "yes" if the database can be configured for high
>>       availability, so that loss of a single node does not stop
>>       database activity, "NO" otherwise.
>>
>>     - OS: "yes" if the database is open source or free (libre)
>>       software, "NO" if it is proprietary. When a database has open
>>       source and proprietary editions, this is "yes" and only the
>>       features in the open source edition are credited in other
>>       columns.
>>
>>     - C: "yes" if the database has a C (not C++) client library, "NO"
>>       otherwise.
>>
>>     - Python: "yes" if the database has a Python client library, "NO"
>>       otherwise.
>>
>>     - format: The database's data model. "sql", "db", "table",
>>       "multi" all indicate that OVN could directly use the data model,
>>       "KV" or "JSON" that OVN's data model would have to be overlaid
>>       on it.
>>
>> Database        txn  ACID  consist  trk   HA   OS   C    Py   format
>> --------------  ---  ----  -------  ----  ---  ---  ---  ---  ------
>> ActorDB         yes  ACID  strong   NO    yes  yes  yes  yes  sql
>> Aerospike       yes  ACID  strong   NO    yes  yes  yes  yes  db/KV
>> Cassandra       NO   -C-D  tunable  NO    yes  yes  NO   yes  table
>> Cockroach DB    yes  ACID  strong   NO    yes  yes  ?    ?    sql
>> Couchbase       NO   ????  ????     NO    yes  NO?  yes  yes  JSON
>> CrateIO         NO   ????  EVNTUAL  NO    yes  yes  NO   yes  sql
>> etcd            NO   ACID  strong   yes?  yes  yes  yes  yes  KV
>> Gigaspaces XAP  yes  ACID  strong   yes   yes  NO   NO   NO   multi
>> HBase           NO   ACID  strong   NO    yes  yes  NO   yes  table
>> Hyperdex        yes  ACID  strong   NO    yes  NO   yes  yes  KV
>> Hypertable      NO   ????  ????     NO    yes  yes  NO   yes  table
>> MongoDB         NO   ACID  strong   ??    yes  yes  yes  yes  JSON
>> RAMCloud        yes  ????  strong   NO    yes  yes  NO   yes  KV
>> Redis           yes  -C?D  ????     NO    yes  yes  yes  yes  KV
>> Riak            NO   ---D  EVNTUAL  NO    yes  yes  yes  yes  KV
>> Scalaris        yes  ACI-  strong   NO    yes  yes  NO   yes  KV
>> ScyllaDB        NO   -C-D  tunable  NO    yes  yes  NO   yes  table
>> Voldemort       NO   ????  EVNTUAL  NO    yes  yes  NO   yes  KV
>> Zookeeper       yes  AC-D  strong   yes   yes  yes  yes  yes  KV
>>
>> OVSDB           yes  ACID  strong   yes   NO   yes  yes  yes  table
>>
>>
>> Analysis
>> ========
>>
>> The most troublesome part of the OVN use case is the idiosyncratic use
>> of the database to maintain state, immediately distributing changes to
>> all of the clients. As the "trk" column above shows, most databases
>> don't support this mode of operation. Possibly this means that OVN is
>> misusing the concept of a database and should be redesigned not to use
>> a database; if so, that's a bigger discussion.
>>
>> Assuming that we wish to retain this requirement, then only the
>> following databases appear to support the feature to an acceptable
>> extent:
>>
>>     - etcd. etcd appears to allow clients to receive a notification
>>       when keys change. A client might be able to bootstrap
>>       monitoring of entire tables on top of this feature. Perhaps
>>       this would require registering for notification separately on
>>       all of the keys that would be used to simulate a table on top of
>>       the etcd key-value store; if so, that would probably be
>>       unreasonable. Assuming that is not a problem or can be
>>       overcome, it would also be necessary to make sure that the new
>>       values of all of the modified keys could be obtained in a way
>>       such that the client's view reflects a consistent snapshot of
>>       the database contents.
>>
>>     - Gigaspaces XAP. Not open source.
>>
>>     - Zookeeper. The issues here are similar to those for etcd.
>>       Also, Zookeeper transactions don't seem to be isolated.
>>
>>     - OVSDB. If we choose to use OVSDB, we'll have to add
>>       high-availability support. Also, the table doesn't mention
>>       scaling, since it's hard to compare objectively, but the OVSDB
>>       server currently doesn't scale well to the 1000 clients required
>>       for the southbound database, although Andy has started working
>>       on that.
>>
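To make the two concerns raised for key-value stores concrete, here is a toy in-memory store, written only for illustration. It is not the etcd or Zookeeper API, and ToyKV, watch, txn and the key layout are all hypothetical. It assumes a table is simulated as keys under a common prefix, that one watch can cover a whole prefix (so no per-key registration is needed), and that each multi-key transaction is delivered to a watcher as a single batch tagged with a revision, so the client's view moves from one consistent snapshot to the next.

```python
class ToyKV:
    """Toy key-value store with prefix watches and per-transaction revisions."""

    def __init__(self):
        self.data = {}
        self.revision = 0
        self.watchers = []   # list of (prefix, callback)

    def watch(self, prefix, callback):
        """Register one callback for every key that starts with prefix."""
        self.watchers.append((prefix, callback))

    def txn(self, writes):
        """Atomically apply {key: value} (None deletes) and notify each
        interested watcher exactly once with the whole batch."""
        self.revision += 1
        for key, value in writes.items():
            if value is None:
                self.data.pop(key, None)
            else:
                self.data[key] = value
        for prefix, callback in self.watchers:
            batch = {k: v for k, v in writes.items() if k.startswith(prefix)}
            if batch:
                callback(self.revision, batch)


view = {}   # client-side replica of the simulated table
seen = []   # revisions the client has applied


def on_change(revision, batch):
    """Apply one transaction's worth of changes to the local view."""
    for key, value in batch.items():
        if value is None:
            view.pop(key, None)
        else:
            view[key] = value
    seen.append(revision)


kv = ToyKV()
kv.watch("/Port_Binding/", on_change)
kv.txn({"/Port_Binding/lp1/chassis": "hv1", "/Port_Binding/lp1/tunnel_key": "7"})
kv.txn({"/Chassis/hv1/name": "hv1"})   # different prefix: this watcher is not notified
```

Whether a real store offers both properties (prefix-level watches and per-transaction batching) is exactly what would need to be checked for each candidate.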
>>
>> Recommendation
>> ==============
>>
>> I'm intentionally not offering a recommendation, because I want to start
>> a discussion.