[ovs-dev] RFC: OVN database options

Ryan Moats rmoats at us.ibm.com
Thu Mar 10 19:14:41 UTC 2016


"dev" <dev-bounces at openvswitch.org> wrote on 03/10/2016 01:11:09 AM:

> From: Ben Pfaff <blp at ovn.org>
> To: dev at openvswitch.org
> Date: 03/10/2016 01:31 AM
> Subject: [ovs-dev] RFC: OVN database options
> Sent by: "dev" <dev-bounces at openvswitch.org>
>
> Requirements
> ============
>
> OVN uses two databases, the "northbound" and "southbound" databases,
> in a somewhat idiosyncratic manner.  Each client of one of these
> databases maintains an in-memory replica of the database (or some
> subset of it), and the server sends updates to this replica as they
> are committed.  Thus, at any given time, a client has a consistent
> snapshot of the database, although it might be old if the database has
> changed but the updates have not yet made it from the server to the
> client.
>
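For anyone less familiar with this usage model, here is a minimal Python sketch of it. Everything in it is hypothetical (the Server and Replica classes, the outbox queue, the lport keys); it is not OVSDB's wire protocol or code. It only shows why a replica is always a consistent snapshot, yet may be stale until pending updates are delivered.

```python
def apply_changes(rows, changes):
    """Apply one transaction's changes; a value of None deletes the key."""
    for key, value in changes.items():
        if value is None:
            rows.pop(key, None)
        else:
            rows[key] = value


class Replica:
    """Client-side in-memory copy of the database (hypothetical)."""

    def __init__(self):
        self.rows = {}
        self.version = 0  # number of the last transaction applied locally

    def apply(self, version, changes):
        # A whole transaction is applied in one step, so a reader of
        # self.rows never observes half of a commit.
        apply_changes(self.rows, changes)
        self.version = version


class Server:
    """Toy server that commits transactions and queues updates."""

    def __init__(self):
        self.rows = {}
        self.version = 0
        self.replicas = []
        self.outbox = []  # updates committed but not yet delivered

    def subscribe(self, replica):
        # A new client starts from a full snapshot of the current contents.
        replica.apply(self.version, dict(self.rows))
        self.replicas.append(replica)

    def commit(self, changes):
        self.version += 1
        apply_changes(self.rows, changes)
        for replica in self.replicas:
            self.outbox.append((replica, self.version, dict(changes)))
        return self.version

    def deliver(self):
        # Stands in for the network: push queued updates out in commit order.
        while self.outbox:
            replica, version, changes = self.outbox.pop(0)
            replica.apply(version, changes)
```

Between commit() and deliver() the replica still holds the previous version, which is the "consistent but possibly old" state described above.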
> Beyond supporting this usage model, the basic requirements for the OVN
> use case are
>
>     - Size: 20 MB to 100 MB of data (estimated database size to hold
>       data for our target scale of 1,000 hypervisors and 20,000
>       logical ports).
>
>     - Scale: The northbound database has only a single-digit number of
>       clients.  Each hypervisor is a client to the southbound
>       database, so about 1,000 clients for our target scale of 1,000
>       hypervisors.
>
>     - Performance: Hundreds of transactions per second.  (Because of
>       the usage model described above, all transactions are write
>       transactions; clients read from their local replicas.)
>
>     - Transactions: Clients expect atomic, consistent, isolated
>       transactions.
>
>       Durability is not essential, because the clients will reissue
>       lost transactions (up to and including completely refilling an
>       empty database, although this can be slow).
>
>     - High availability: If the database server goes down, then this
>       freezes the OVN configuration.  This is OK briefly for running
>       clients--the existing configuration continues to work, it just
>       can't be updated--but it prevents new clients or clients that
>       restart from using OVN at all.
>
>       For the same reason that durability is not essential, it is
>       acceptable if an occasional fail-over between database servers
>       loses a few transactions, though of course it's best to minimize
>       the probability and the amount of data lost.
>
>     - Open source.  Some "open source" databases only provide high
>       availability and transactions as proprietary extensions; that's
>       undesirable.
>
> Desirable features:
>
>     - C client, since OVN is written in C; otherwise, we'll likely
>       have to write one.  (We've had suggestions that OVN should be
>       written in another language, such as Java, but we have not
>       decided to change the language yet.)
>
>     - Python client, since OVS includes tools written in Python.
>
>     - Table structured.  We could layer tables on top of a key-value
>       store if necessary.
>
>     - Schema support, with referential integrity constraints.  We find
>       this helpful for increasing our confidence in the system.  This
>       is something that we could leave out or layer on top.
>
>     - Network protocol.  Some databases are just designed for local
>       access.  If such a database were otherwise just right, we could
>       wrap it for distributed use.  The analysis below mostly ignores
>       databases that are local-only or in which remote access appears
>       to be an afterthought.
>
>
> Options
> =======
>
> Each entry has the columns listed below.  In general, all-caps answers
> are problematic for the OVN use case.
>
>     - Database: The database being evaluated.
>
>     - txn: "yes" if the database supports transactions across
>       arbitrary data, "NO" if its transactions are limited to a single
>       data item, such as a single key-value pair, or perhaps even more
>       limited.
>
>     - ACID: The transactional properties that the database supports,
>       within the transactions that the database supports.  (Thus, a
>       database whose transactions cover only a single data item can be
>       listed as ACID, but this is only for those limited
>       transactions.)
>
>     - consist: The distributed consistency model that the database
>       supports, one of "strong" for strong or linearizable
>       consistency, "tunable" for consistency that can be tuned to be
>       strong or linearizable or weaker, or "EVNTUAL" for eventual
>       consistency.
>
>     - trk: "yes" if the database can automatically report data changes
>       to clients, "NO" if the database requires clients to poll for
>       changes.
>
>     - HA: "yes" if the database can be configured for high
>       availability, so that loss of a single node does not stop
>       database activity, "NO" otherwise.
>
>     - OS: "yes" if the database is open source or free (libre)
>       software, "NO" if it is proprietary.  When a database has open
>       source and proprietary editions, this is "yes" and only the
>       features in the open source edition are credited in other
>       columns.
>
>     - C: "yes" if the database has a C (not C++) client library, "NO"
>       otherwise.
>
>     - Py: "yes" if the database has a Python client library, "NO"
>       otherwise.
>
>     - format: The database's data model.  "sql", "db", "table",
>       "multi" all indicate that OVN could directly use the data model,
>       "KV" or "JSON" that OVN's data model would have to be overlaid
>       on it.
>
> Database        txn  ACID  consist   trk   HA   OS    C   Py  format
> --------------  ---  ----  -------  ----  ---  ---  ---  ---  ------
> ActorDB         yes  ACID   strong    NO  yes  yes  yes  yes     sql
> Aerospike       yes  ACID   strong    NO  yes  yes  yes  yes   db/KV
> Cassandra        NO  -C-D  tunable    NO  yes  yes   NO  yes   table
> Cockroach DB    yes  ACID   strong    NO  yes  yes    ?    ?     sql
> Couchbase        NO  ????     ????    NO  yes  NO?  yes  yes    JSON
> CrateIO          NO  ????  EVNTUAL    NO  yes  yes   NO  yes     sql
> etcd             NO  ACID   strong  yes?  yes  yes  yes  yes      KV
> Gigaspaces XAP  yes  ACID   strong   yes  yes   NO   NO   NO   multi
> HBase            NO  ACID   strong    NO  yes  yes   NO  yes   table
> Hyperdex        yes  ACID   strong    NO  yes   NO  yes  yes      KV
> Hypertable       NO  ????     ????    NO  yes  yes   NO  yes   table
> MongoDB          NO  ACID   strong    ??  yes  yes  yes  yes    JSON
> RAMCloud        yes  ????   strong    NO  yes  yes   NO  yes      KV
> Redis           yes  -C?D     ????    NO  yes  yes  yes  yes      KV
> Riak             NO  ---D  EVNTUAL    NO  yes  yes  yes  yes      KV
> Scalaris        yes  ACI-   strong    NO  yes  yes   NO  yes      KV
> ScyllaDB         NO  -C-D  tunable    NO  yes  yes   NO  yes   table
> Voldemort        NO  ????  EVNTUAL    NO  yes  yes   NO  yes      KV
> Zookeeper       yes  AC-D   strong   yes  yes  yes  yes  yes      KV
>
> OVSDB           yes  ACID   strong   yes   NO  yes  yes  yes   table
>
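To make it easier to slice the table above, here is a small Python sketch that copies just the "trk" and "OS" columns and filters on them. The values are taken verbatim from the table; the tuple layout and the function name are my own, and treating "yes?" as a yes and "??" as a no is an assumption.

```python
# (database, trk, OS) copied from the table above.
ROWS = [
    ("ActorDB", "NO", "yes"),
    ("Aerospike", "NO", "yes"),
    ("Cassandra", "NO", "yes"),
    ("Cockroach DB", "NO", "yes"),
    ("Couchbase", "NO", "NO?"),
    ("CrateIO", "NO", "yes"),
    ("etcd", "yes?", "yes"),
    ("Gigaspaces XAP", "yes", "NO"),
    ("HBase", "NO", "yes"),
    ("Hyperdex", "NO", "NO"),
    ("Hypertable", "NO", "yes"),
    ("MongoDB", "??", "yes"),
    ("RAMCloud", "NO", "yes"),
    ("Redis", "NO", "yes"),
    ("Riak", "NO", "yes"),
    ("Scalaris", "NO", "yes"),
    ("ScyllaDB", "NO", "yes"),
    ("Voldemort", "NO", "yes"),
    ("Zookeeper", "yes", "yes"),
    ("OVSDB", "yes", "yes"),
]


def candidates(rows, require_open_source=False):
    """Return names of databases that can report changes to clients."""
    names = []
    for name, trk, open_source in rows:
        if not trk.startswith("yes"):  # assumption: "yes?" counts, "??" does not
            continue
        if require_open_source and open_source != "yes":
            continue
        names.append(name)
    return names


tracking = candidates(ROWS)
open_tracking = candidates(ROWS, require_open_source=True)
```

Filtering on "trk" alone leaves etcd, Gigaspaces XAP, Zookeeper, and OVSDB; also requiring open source drops Gigaspaces XAP.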
>
> Analysis
> ========
>
> The most troublesome part of the OVN use case is the idiosyncratic use
> of the database to maintain state, immediately distributing changes to
> all of the clients.  As the "trk" column above shows, most databases
> don't support this mode of operation.  Possibly this means that OVN is
> misusing the concept of a database and should be redesigned not to use
> a database; if so, that's a bigger discussion.
>
> Assuming that we wish to retain this requirement, then only the
> following databases appear to support the feature to an acceptable
> extent:
>
>     - etcd.  etcd appears to allow clients to receive a notification
>       when keys change.  A client might be able to bootstrap
>       monitoring of entire tables on top of this feature.  Perhaps
>       this would require registering for notification separately on
>       all of the keys that would be used to simulate a table on top of
>       the etcd key-value store; if so, that would probably be
>       unreasonable.  Assuming that is not a problem or can be
>       overcome, it would also be necessary to make sure that the new
>       values of all of the modified keys could be obtained in a way
>       such that the client's view reflects a consistent snapshot of
>       the database contents.
>
>     - Gigaspaces XAP.  Not open source.
>
>     - Zookeeper.  The issues here are similar to those for etcd.
>       Also, Zookeeper transactions don't seem to be isolated.
>
>     - OVSDB.  If we choose to use OVSDB, we'll have to add
>       high-availability support.  Also, the table doesn't mention
>       scaling, since it's hard to compare objectively, but the OVSDB
>       server currently doesn't scale well to the 1,000 clients required
>       for the southbound database, although Andy has started working
>       on that.
>
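To make the etcd/Zookeeper concern concrete, here is a rough Python sketch of simulating a table on top of a key-value store. ToyKV is an in-memory stand-in that I made up; it is not the etcd or Zookeeper API, and whether either store really offers a prefix-level watch with per-transaction batching is exactly the open question. The key layout (/ovn/table/row/column) and the table names are also hypothetical.

```python
from collections import defaultdict


def row_key(table, row, column):
    """Hypothetical key layout: one key per table cell."""
    return f"/ovn/{table}/{row}/{column}"


class ToyKV:
    """In-memory stand-in for a key-value store; NOT a real client API."""

    def __init__(self):
        self.data = {}
        self.revision = 0
        self.watchers = []  # list of (prefix, callback)

    def watch_prefix(self, prefix, callback):
        # One registration covers every key under the prefix, instead of
        # one registration per key.
        self.watchers.append((prefix, callback))

    def txn_put(self, items):
        # All keys of a transaction share one revision and are reported
        # to a watcher together, in a single callback.
        self.revision += 1
        self.data.update(items)
        for prefix, callback in self.watchers:
            matched = {k: v for k, v in items.items() if k.startswith(prefix)}
            if matched:
                callback(self.revision, matched)


class TableView:
    """Client-side view of one simulated table."""

    def __init__(self, kv, table):
        self.prefix = f"/ovn/{table}/"
        self.rows = defaultdict(dict)  # row name -> {column: value}
        self.revision = 0
        kv.watch_prefix(self.prefix, self._on_change)

    def _on_change(self, revision, items):
        # Applying the whole batch before advancing the revision keeps the
        # view a consistent snapshot as of that revision.
        for key, value in items.items():
            row, column = key[len(self.prefix):].split("/", 1)
            self.rows[row][column] = value
        self.revision = revision
```

If the store can only watch individual keys, or delivers the keys of one transaction as separate events, the view above can no longer be kept consistent this simply, which is the problem described in the etcd entry.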
>
> Recommendation
> ==============
>
> I'm intentionally not offering a recommendation, because I want to start
> a discussion.

These are all great requirements, but I suspect (if I put my operator hat
on) that I'll have a few others, or will order the above differently than
another person reading this email would.  This brings me to the
meta-question:

Do we want to be *in* the DB business?  I think the answer is no, which
means we should be doing the work to *not* be in the DB business -
refactoring the IDL to allow different DBs to be attached while ensuring
that the requirements above are still met.  We can then have project code
that allows various DBs to be plugged in (ovsdb would be one of them
certainly) and then we've moved to a place where we can say "here's
different DBs that ovn works with, if you want a different one, here's how
to connect it..."
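
As a rough illustration of what "pluggable" could look like, here is a Python sketch of a backend interface with a registry. All of the names (DatabaseBackend, InMemoryBackend, BACKENDS, open_backend) are hypothetical, and this is not the existing IDL API; a real refactoring would be in C and would have to carry the transaction and change-tracking requirements above.

```python
from abc import ABC, abstractmethod


class DatabaseBackend(ABC):
    """Hypothetical interface each pluggable database would implement."""

    @abstractmethod
    def transact(self, changes):
        """Atomically apply a dict of key -> value changes."""

    @abstractmethod
    def monitor(self, callback):
        """Call callback with the current contents, then with each change."""


class InMemoryBackend(DatabaseBackend):
    """Trivial backend, standing in for a real database."""

    def __init__(self):
        self.rows = {}
        self.callbacks = []

    def transact(self, changes):
        self.rows.update(changes)
        for callback in self.callbacks:
            callback(dict(changes))

    def monitor(self, callback):
        callback(dict(self.rows))  # initial snapshot
        self.callbacks.append(callback)


# Registry of available backends; others would be added here.
BACKENDS = {"memory": InMemoryBackend}


def open_backend(name):
    """Instantiate the backend registered under name."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"no backend registered for {name!r}") from None
```

Client code would then call only transact() and monitor(), and connecting a different DB would mean writing one more DatabaseBackend subclass and registering it.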

Ryan (regXboi)




