[ovs-dev] [PATCH ovn 1/3] ovn: Add initial design documentation.

Justin Pettit jpettit at nicira.com
Thu Feb 26 22:22:20 UTC 2015


This looks like a great start. Let's get it in, and we can continue to refine it as necessary:

Acked-by: Justin Pettit <jpettit at nicira.com>

Would you please push this before the other patches in the series?

Thanks,

--Justin


> On Feb 25, 2015, at 9:13 PM, Ben Pfaff <blp at nicira.com> wrote:
> 
> This commit adds preliminary design documentation for Open Virtual Network,
> or OVN, a new OVS-based project to add support for virtual networking to
> OVS, initially with OpenStack integration.
> 
> This initial design has been influenced by many people, including (in
> alphabetical order) Aaron Rosen, Chris Wright, Gurucharan Shetty, Jeremy
> Stribling, Justin Pettit, Ken Duda, Kevin Benton, Kyle Mestery, Madhu
> Venugopal, Martin Casado, Natasha Gude, Pankaj Thakkar, Russell Bryant,
> Teemu Koponen, and Thomas Graf.  All blunders, however, are due to my own
> hubris.
> 
> Signed-off-by: Ben Pfaff <blp at nicira.com>
> ---
> v1->v2: Rebase.
> v2->v3:
>  - Multiple CMSes are possible.
>  - Whitespace and typo fixes.
>  - ovn.ovsschema: Gateway table is not a root table, other tables are.
>  - ovn.xml: Talk about deleting rows on HV shutdown.
>  - ovn-nb.xml: Clarify 'switch' column in ACL table.
>  - ovn-nb.ovsschema: A Logical_Router_Port is no longer a Logical_Port.
>  - ovn.xml: Add action for generating ARP.
>  - ovn-nb.xml: Add allow-related action for security group support.
> v3->v4:
>  - Add initial TODO list.
> v4->v5:
>  - TODO: Revise default tunnel encapsulation thoughts.
>  - TODO: Fill in a few details for Neutron plugin.
>  - ovn-architecture: Mention DHCP as desirable.
> ---
> Makefile.am                |   1 +
> configure.ac               |   3 +-
> ovn/TODO                   | 306 ++++++++++++++++++++++++++++
> ovn/automake.mk            |  77 +++++++
> ovn/ovn-architecture.7.xml | 339 +++++++++++++++++++++++++++++++
> ovn/ovn-controller.8.in    |  41 ++++
> ovn/ovn-nb.ovsschema       |  62 ++++++
> ovn/ovn-nb.xml             | 245 ++++++++++++++++++++++
> ovn/ovn.ovsschema          |  50 +++++
> ovn/ovn.xml                | 497 +++++++++++++++++++++++++++++++++++++++++++++
> 10 files changed, 1620 insertions(+), 1 deletion(-)
> create mode 100644 ovn/TODO
> create mode 100644 ovn/automake.mk
> create mode 100644 ovn/ovn-architecture.7.xml
> create mode 100644 ovn/ovn-controller.8.in
> create mode 100644 ovn/ovn-nb.ovsschema
> create mode 100644 ovn/ovn-nb.xml
> create mode 100644 ovn/ovn.ovsschema
> create mode 100644 ovn/ovn.xml
> 
> diff --git a/Makefile.am b/Makefile.am
> index 0480d20..699a580 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -370,3 +370,4 @@ include tutorial/automake.mk
> include vtep/automake.mk
> include datapath-windows/automake.mk
> include datapath-windows/include/automake.mk
> +include ovn/automake.mk
> diff --git a/configure.ac b/configure.ac
> index d2d02ca..795f876 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1,4 +1,4 @@
> -# Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2014 Nicira, Inc.
> +# Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 Nicira, Inc.
> #
> # Licensed under the Apache License, Version 2.0 (the "License");
> # you may not use this file except in compliance with the License.
> @@ -182,6 +182,7 @@ dnl This makes sure that include/openflow gets created in the build directory.
> AC_CONFIG_COMMANDS([include/openflow/openflow.h.stamp])
> 
> AC_CONFIG_COMMANDS([utilities/bugtool/dummy], [:])
> +AC_CONFIG_COMMANDS([ovn/dummy], [:])
> 
> m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES])
> 
> diff --git a/ovn/TODO b/ovn/TODO
> new file mode 100644
> index 0000000..e405c7c
> --- /dev/null
> +++ b/ovn/TODO
> @@ -0,0 +1,306 @@
> +* Flow match expression handling library.
> +
> +  ovn-controller is the primary user of flow match expressions, but
> +  the same syntax, and I imagine the same code, ought to be useful in
> +  ovn-nbd for ACL match expressions.  (A rough Python sketch of the
> +  whole pipeline appears at the end of this section.)
> +
> +** Definition of data structures to represent a match expression as a
> +   syntax tree.
> +
> +** Definition of data structures to represent variables (fields).
> +
> +   Fields need names and prerequisites.  Most fields are numeric and
> +   thus need widths.  We also need a way to represent nominal
> +   fields (currently just logical port names).  It might be
> +   appropriate to associate fields directly with OXM/NXM code points;
> +   we have to decide whether we want OVN to use the OVS flow structure
> +   or work with OXM more directly.
> +
> +   Probably should be defined so that the data structure is also
> +   useful for references to fields in action parsing.
> +
> +** Lexical analysis.
> +
> +   Probably should be defined so that the lexer can be reused for
> +   parsing actions.
> +
> +** Parsing into syntax tree.
> +
> +** Semantic checking against variable definitions.
> +
> +** Applying prerequisites.
> +
> +** Simplification into conjunction-of-disjunctions (CoD) form.
> +
> +** Transformation from CoD form into OXM matches.
> +
> +* ovn-controller
> +
> +** Flow table handling in ovn-controller.
> +
> +   ovn-controller has to transform logical datapath flows from the
> +   database into OpenFlow flows.
> +
> +*** Definition (or choice) of data structure for flows and flow table.
> +
> +    It would be natural enough to use "struct flow" and "struct
> +    classifier" for this.  Maybe that is what we should do.  However,
> +    "struct classifier" is optimized for searches based on packet
> +    headers, whereas all we care about here can be implemented with a
> +    hash table.  Also, we may want to make it easy to add and remove
> +    support for fields without recompiling, which is not possible with
> +    "struct flow" or "struct classifier".
> +
> +    On the other hand, we may find that it is difficult to decide that
> +    two OXM flow matches are identical (to normalize them) without a
> +    lot of domain-specific knowledge that is already embedded in struct
> +    flow.  It's also going to be a pain to come up with a way to make
> +    anything other than "struct flow" work with the ofputil_*()
> +    functions for encoding and decoding OpenFlow.
> +
> +    It's also possible we could use struct flow without struct
> +    classifier.
> +
> +*** Assembling conjunctive flows from flow match expressions.
> +
> +    This transformation explodes logical datapath flows into multiple
> +    OpenFlow flow table entries, since a flow match expression in CoD
> +    form requires several OpenFlow flow table entries.  It also
> +    requires merging together OpenFlow flow table entries that contain
> +    "conjunction" actions (really just concatenating their actions).
> +
> +*** Translating logical datapath port names into port numbers.
> +
> +    Logical ports are specified by name in logical datapath flows, but
> +    OpenFlow only works in terms of numbers.
> +
> +*** Translating logical datapath actions into OpenFlow actions.
> +
> +    Some of the logical datapath actions do not have natural
> +    representations as OpenFlow actions: they require
> +    packet-in/packet-out round trips through ovn-controller.  The
> +    trickiest part of that is going to be making sure that the
> +    packet-out resumes the control flow that was broken off by the
> +    packet-in.  We'll probably have to restrict control flow or add
> +    OVS features to make resuming in general possible.  Not sure which
> +    is better at this point.
> +
> +*** OpenFlow flow table synchronization.
> +
> +    The internal representation of the OpenFlow flow table has to be
> +    synced across the controller connection to OVS.  This probably
> +    boils down to the "flow monitoring" feature of OF1.4, which was then
> +    made available as a "standard extension" to OF1.3.  (OVS hasn't
> +    implemented this for OF1.4 yet, but the feature is based on an OVS
> +    extension to OF1.0, so it should be straightforward to add it.)
> +
> +    We probably need some way to catch cases where OVS and OVN don't
> +    see eye-to-eye on what exactly constitutes a flow, so that OVN
> +    doesn't waste a lot of CPU time hammering at OVS trying to install
> +    something that it's not going to do.
> +
> +*** Logical/physical translation.
> +
> +    When a packet comes into the integration bridge, the first stage of
> +    processing needs to translate it from a physical to a logical
> +    context.  When a packet leaves the integration bridge, the final
> +    stage of processing needs to translate it back into a physical
> +    context.  ovn-controller needs to populate the OpenFlow flow
> +    tables to do these translations.
> +
> +*** Determine how to split logical pipeline across physical nodes.
> +
> +    From the original OVN architecture document:
> +
> +    The pipeline processing is split between the ingress and egress
> +    transport nodes.  In particular, the logical egress processing may
> +    occur at either hypervisor.  Processing the logical egress on the
> +    ingress hypervisor requires more state about the egress vif's
> +    policies, but reduces traffic on the wire that would eventually be
> +    dropped.  Processing on the egress hypervisor, in contrast, can reduce
> +    broadcast traffic on the wire by doing local replication.  We
> +    initially plan to process logical egress on the egress hypervisor
> +    so that less state needs to be replicated.  However, we may change
> +    this behavior once we gain some experience writing the logical
> +    flows.
> +
> +    The pipeline processing split will influence how tunnel keys
> +    are encoded.
> +
> +** Interaction with Open_vSwitch and OVN databases:
> +
> +*** Monitor VIFs attached to the integration bridge in Open_vSwitch.
> +
> +    In response to changes, add or remove corresponding rows in the
> +    Bindings table in OVN.
> +
> +*** Populate Chassis row in OVN at startup.  Maintain Chassis row over time.
> +
> +    (Warn if any other Chassis claims the same IP address.)
> +
> +*** Remove Chassis and Bindings rows from OVN on exit.
> +
> +*** Monitor Chassis table in OVN.
> +
> +    Populate Port records for tunnels to other chassis into the
> +    Open_vSwitch database.  As a scale optimization later on, one can
> +    populate only records for tunnels to other chassis that have
> +    logical networks in common with this one.
> +
> +*** Monitor Pipeline table in OVN, trigger flow table recomputation on change.
> +
> +** ovn-controller parameters and configuration.
> +
> +*** Tunnel encapsulation to publish.
> +
> +    Default: VXLAN? Geneve?
> +
> +*** Location of Open_vSwitch database.
> +
> +    We can probably use the same default as ovs-vsctl.
> +
> +*** Location of OVN database.
> +
> +    Probably no useful default.
> +
> +*** SSL configuration.
> +
> +    Can probably get this from Open_vSwitch database.
> +
> +* ovn-nbd
> +
> +** Monitor OVN_Northbound database, trigger Pipeline recomputation on change.
> +
> +** Translate each OVN_Northbound entity into Pipeline logical datapath flows.
> +
> +   We have to first sit down and figure out what the general
> +   translation of each entity is.  The original OVN architecture
> +   description at
> +   http://openvswitch.org/pipermail/dev/2015-January/050380.html had
> +   some sketches of these, but they need to be completed and
> +   elaborated.
> +
> +   Initially, the simplest way to do this is probably to write
> +   straight C code to do a full translation of the entire
> +   OVN_Northbound database into the format for the Pipeline table in
> +   the OVN database.  As scale increases, this will probably be too
> +   inefficient since a small change in OVN_Northbound requires a full
> +   recomputation.  At that point, we probably want to adopt a more
> +   systematic approach, such as something akin to the "nlog" system
> +   used in NVP (see Koponen et al. "Network Virtualization in
> +   Multi-tenant Datacenters", NSDI 2014).
> +
> +** Push logical datapath flows to Pipeline table.
> +
> +** Monitor OVN database Bindings table.
> +
> +   Sync rows in the OVN Bindings table to the "up" column in the
> +   OVN_Northbound database.
> +
> +* ovsdb-server
> +
> +  ovsdb-server should have adequate features for OVN but it probably
> +  needs work for scale and possibly for availability as deployments
> +  grow.  Here are some thoughts.
> +
> +  Andy Zhou is looking at these issues.
> +
> +** Scaling number of connections.
> +
> +   In typical use today a given ovsdb-server has only a single-digit
> +   number of simultaneous connections.  The OVN database will have a
> +   connection from every hypervisor.  This use case needs testing and
> +   probably coding work.  Here are some possible improvements.
> +
> +*** Reducing amount of data sent to clients.
> +
> +    Currently, whenever a row monitored by a client changes,
> +    ovsdb-server sends the client every monitored column in the row,
> +    even if only one column changes.  It might be valuable to reduce
> +    this to only the columns that changed.
> +
> +    Also, whenever a column changes, ovsdb-server sends the entire
> +    contents of the column.  It might be valuable, for columns that
> +    are sets or maps, to send only added or removed values or
> +    key-value pairs.
> +
> +    Currently, clients monitor the entire contents of a table.  It
> +    might make sense to allow clients to monitor only rows that
> +    satisfy specific criteria, e.g. to allow an ovn-controller to
> +    receive only Pipeline rows for logical networks on its hypervisor.
> +
> +*** Reducing redundant data and code within ovsdb-server.
> +
> +    Currently, ovsdb-server separately composes database update
> +    information to send to each of its clients.  This is fine for a
> +    small number of clients, but it wastes time and memory when
> +    hundreds of clients all want the same updates (as will be the
> +    case in OVN).
> +
> +    (This is somewhat opposed to the idea of letting a client monitor
> +    only some rows in a table, since that would increase the diversity
> +    among clients.)
> +
> +*** Multithreading.
> +
> +    If it turns out that other changes don't let ovsdb-server scale
> +    adequately, we can multithread ovsdb-server.  Initially one might
> +    only break protocol handling into separate threads, leaving the
> +    actual database work serialized through a lock.
> +
> +** Increasing availability.
> +
> +   Database availability might become an issue.  The OVN system
> +   shouldn't grind to a halt if the database becomes unavailable, but
> +   it would become impossible to bring VIFs up or down, etc.
> +
> +   My current thought on how to increase availability is to add
> +   clustering to ovsdb-server, probably via the Raft consensus
> +   algorithm.  As an experiment, I wrote an implementation of Raft
> +   for Open vSwitch that you can clone from:
> +
> +       https://github.com/blp/ovs-reviews.git raft
> +
> +** Reducing startup time.
> +
> +   As-is, if ovsdb-server restarts, every client will fetch a fresh
> +   copy of the part of the database that it cares about.  With
> +   hundreds of clients, this could cause heavy CPU load on
> +   ovsdb-server and use excessive network bandwidth.  It would be
> +   better to allow incremental updates even across connection loss.
> +   One way might be to use "Difference Digests" as described in
> +   Epstein et al., "What's the Difference? Efficient Set
> +   Reconciliation Without Prior Context".  (I'm not yet aware of
> +   previous non-academic use of this technique.)
> +
> +* Miscellaneous:
> +
> +** Write ovn-nbctl utility.
> +
> +   The idea here is that we need a utility to act on the OVN_Northbound
> +   database in a way similar to a CMS, so that we can do some testing
> +   without an actual CMS in the picture.
> +
> +   No details yet.
> +
> +** Init scripts for ovn-controller (on HVs), ovn-nbd, OVN DB server.
> +
> +** Distribution packaging.
> +
> +* Not yet scoped:
> +
> +** Neutron plugin.
> +
> +*** Create stackforge/networking-ovn repository based on OpenStack's
> +    cookiecutter git repo generator.
> +
> +*** Document mappings between Neutron data model and the OVN northbound DB.
> +
> +*** Create a Neutron ML2 mechanism driver that implements the mappings
> +    on Neutron resource requests.
> +
> +*** Add synchronization for when we need to sanity check that the OVN
> +    northbound DB reflects the current state of the world as intended by
> +    Neutron (needed for various failure scenarios).
> +
> +** Gateways.
> diff --git a/ovn/automake.mk b/ovn/automake.mk
> new file mode 100644
> index 0000000..a4951dc
> --- /dev/null
> +++ b/ovn/automake.mk
> @@ -0,0 +1,77 @@
> +# OVN schema and IDL
> +EXTRA_DIST += ovn/ovn.ovsschema
> +pkgdata_DATA += ovn/ovn.ovsschema
> +
> +# OVN E-R diagram
> +#
> +# If "python" or "dot" is not available, then we do not add graphical diagram
> +# to the documentation.
> +if HAVE_PYTHON
> +if HAVE_DOT
> +ovn/ovn.gv: ovsdb/ovsdb-dot.in ovn/ovn.ovsschema
> +    $(AM_V_GEN)$(OVSDB_DOT) --no-arrows $(srcdir)/ovn/ovn.ovsschema > $@
> +ovn/ovn.pic: ovn/ovn.gv ovsdb/dot2pic
> +    $(AM_V_GEN)(dot -T plain < ovn/ovn.gv | $(PERL) $(srcdir)/ovsdb/dot2pic -f 3) > $@.tmp && \
> +    mv $@.tmp $@
> +OVN_PIC = ovn/ovn.pic
> +OVN_DOT_DIAGRAM_ARG = --er-diagram=$(OVN_PIC)
> +DISTCLEANFILES += ovn/ovn.gv ovn/ovn.pic
> +endif
> +endif
> +
> +# OVN schema documentation
> +EXTRA_DIST += ovn/ovn.xml
> +DISTCLEANFILES += ovn/ovn.5
> +man_MANS += ovn/ovn.5
> +ovn/ovn.5: \
> +    ovsdb/ovsdb-doc ovn/ovn.xml ovn/ovn.ovsschema $(OVN_PIC)
> +    $(AM_V_GEN)$(OVSDB_DOC) \
> +        $(OVN_DOT_DIAGRAM_ARG) \
> +        --version=$(VERSION) \
> +        $(srcdir)/ovn/ovn.ovsschema \
> +        $(srcdir)/ovn/ovn.xml > $@.tmp && \
> +    mv $@.tmp $@
> +
> +# OVN northbound schema and IDL
> +EXTRA_DIST += ovn/ovn-nb.ovsschema
> +pkgdata_DATA += ovn/ovn-nb.ovsschema
> +
> +# OVN northbound E-R diagram
> +#
> +# If "python" or "dot" is not available, then we do not add graphical diagram
> +# to the documentation.
> +if HAVE_PYTHON
> +if HAVE_DOT
> +ovn/ovn-nb.gv: ovsdb/ovsdb-dot.in ovn/ovn-nb.ovsschema
> +    $(AM_V_GEN)$(OVSDB_DOT) --no-arrows $(srcdir)/ovn/ovn-nb.ovsschema > $@
> +ovn/ovn-nb.pic: ovn/ovn-nb.gv ovsdb/dot2pic
> +    $(AM_V_GEN)(dot -T plain < ovn/ovn-nb.gv | $(PERL) $(srcdir)/ovsdb/dot2pic -f 3) > $@.tmp && \
> +    mv $@.tmp $@
> +OVN_NB_PIC = ovn/ovn-nb.pic
> +OVN_NB_DOT_DIAGRAM_ARG = --er-diagram=$(OVN_NB_PIC)
> +DISTCLEANFILES += ovn/ovn-nb.gv ovn/ovn-nb.pic
> +endif
> +endif
> +
> +# OVN northbound schema documentation
> +EXTRA_DIST += ovn/ovn-nb.xml
> +DISTCLEANFILES += ovn/ovn-nb.5
> +man_MANS += ovn/ovn-nb.5
> +ovn/ovn-nb.5: \
> +    ovsdb/ovsdb-doc ovn/ovn-nb.xml ovn/ovn-nb.ovsschema $(OVN_NB_PIC)
> +    $(AM_V_GEN)$(OVSDB_DOC) \
> +        $(OVN_NB_DOT_DIAGRAM_ARG) \
> +        --version=$(VERSION) \
> +        $(srcdir)/ovn/ovn-nb.ovsschema \
> +        $(srcdir)/ovn/ovn-nb.xml > $@.tmp && \
> +    mv $@.tmp $@
> +
> +man_MANS += ovn/ovn-controller.8 ovn/ovn-architecture.7
> +EXTRA_DIST += ovn/ovn-controller.8.in ovn/ovn-architecture.7.xml
> +
> +SUFFIXES += .xml
> +%: %.xml
> +    $(AM_V_GEN)$(run_python) $(srcdir)/build-aux/xml2nroff \
> +        --version=$(VERSION) $< > $@.tmp && mv $@.tmp $@
> +
> +EXTRA_DIST += ovn/TODO
> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
> new file mode 100644
> index 0000000..9ffa036
> --- /dev/null
> +++ b/ovn/ovn-architecture.7.xml
> @@ -0,0 +1,339 @@
> +<?xml version="1.0" encoding="utf-8"?>
> +<manpage program="ovn-architecture" section="7" title="OVN Architecture">
> +  <h1>Name</h1>
> +  <p>ovn-architecture -- Open Virtual Network architecture</p>
> +
> +  <h1>Description</h1>
> +
> +  <p>
> +    OVN, the Open Virtual Network, is a system to support virtual network
> +    abstraction.  OVN complements the existing capabilities of OVS to add
> +    native support for virtual network abstractions, such as virtual L2 and L3
> +    overlays and security groups.  Services such as DHCP are also desirable
> +    features.  As with OVS, OVN's design goal is to have a production-quality
> +    implementation that can operate at significant scale.
> +  </p>
> +
> +  <p>
> +    An OVN deployment consists of several components:
> +  </p>
> +
> +  <ul>
> +    <li>
> +      <p>
> +        A <dfn>Cloud Management System</dfn> (<dfn>CMS</dfn>), which is
> +        OVN's ultimate client (via its users and administrators).  OVN
> +        integration requires installing a CMS-specific plugin and
> +        related software (see below).  OVN initially targets OpenStack
> +        as its CMS.
> +      </p>
> +
> +      <p>
> +        We generally speak of ``the'' CMS, but one can imagine scenarios in
> +        which multiple CMSes manage different parts of an OVN deployment.
> +      </p>
> +    </li>
> +
> +    <li>
> +      A physical or virtual node (or, eventually, a cluster) installed in a
> +      central location that runs the OVN Database.
> +    </li>
> +
> +    <li>
> +      One or more (usually many) <dfn>hypervisors</dfn>.  Hypervisors must run
> +      Open vSwitch and implement the interface described in
> +      <code>IntegrationGuide.md</code> in the OVS source tree.  Any hypervisor
> +      platform supported by Open vSwitch is acceptable.
> +    </li>
> +
> +    <li>
> +      <p>
> +    Zero or more <dfn>gateways</dfn>.  A gateway extends a tunnel-based
> +    logical network into a physical network by bidirectionally forwarding
> +    packets between tunnels and a physical Ethernet port.  This allows
> +    non-virtualized machines to participate in logical networks.  A gateway
> +    may be a physical host, a virtual machine, or an ASIC-based hardware
> +    switch that supports the <code>vtep</code>(5) schema.  (Support for the
> +    latter will come later in the OVN implementation.)
> +      </p>
> +
> +      <p>
> +    Hypervisors and gateways are together called <dfn>transport
> +    nodes</dfn> or <dfn>chassis</dfn>.
> +      </p>
> +    </li>
> +  </ul>
> +
> +  <p>
> +    The diagram below shows how the major components of OVN and related
> +    software interact.  Starting at the top of the diagram, we have:
> +  </p>
> +
> +  <ul>
> +    <li>
> +      The Cloud Management System, as defined above.
> +    </li>
> +
> +    <li>
> +      <p>
> +    The <dfn>OVN/CMS Plugin</dfn> is the component of the CMS that
> +    interfaces to OVN.  In OpenStack, this is a Neutron plugin.
> +    The plugin's main purpose is to translate the CMS's notion of logical
> +    network configuration, stored in the CMS's configuration database in a
> +    CMS-specific format, into an intermediate representation understood by
> +    OVN.
> +      </p>
> +
> +      <p>
> +    This component is necessarily CMS-specific, so a new plugin needs to be
> +    developed for each CMS that is integrated with OVN.  All of the
> +    components below this one in the diagram are CMS-independent.
> +      </p>
> +    </li>
> +
> +    <li>
> +      <p>
> +    The <dfn>OVN Northbound Database</dfn> receives the intermediate
> +    representation of logical network configuration passed down by the
> +    OVN/CMS Plugin.  The database schema is meant to be ``impedance
> +    matched'' with the concepts used in a CMS, so that it directly supports
> +    notions of logical switches, routers, ACLs, and so on.  See
> +    <code>ovn-nb</code>(5) for details.
> +      </p>
> +
> +      <p>
> +    The OVN Northbound Database has only two clients: the OVN/CMS Plugin
> +    above it and <code>ovn-nbd</code> below it.
> +      </p>
> +    </li>
> +
> +    <li>
> +      <code>ovn-nbd</code>(8) connects to the OVN Northbound Database above it
> +      and the OVN Database below it.  It translates the logical network
> +      configuration in terms of conventional network concepts, taken from the
> +      OVN Northbound Database, into logical datapath flows in the OVN Database
> +      below it.
> +    </li>
> +
> +    <li>
> +      <p>
> +    The <dfn>OVN Database</dfn> is the center of the system.  Its clients
> +    are <code>ovn-nbd</code>(8) above it and <code>ovn-controller</code>(8)
> +    on every transport node below it.
> +      </p>
> +
> +      <p>
> +    The OVN Database contains three kinds of data: <dfn>Physical
> +    Network</dfn> (PN) tables that specify how to reach hypervisor and
> +    other nodes, <dfn>Logical Network</dfn> (LN) tables that describe the
> +    logical network in terms of ``logical datapath flows,'' and
> +    <dfn>Binding</dfn> tables that link logical network components'
> +    locations to the physical network.  The hypervisors populate the PN and
> +    Binding tables, whereas <code>ovn-nbd</code>(8) populates the LN
> +    tables.
> +      </p>
> +
> +      <p>
> +    OVN Database performance must scale with the number of transport nodes.
> +    This will likely require some work on <code>ovsdb-server</code>(1) as
> +    we encounter bottlenecks.  Clustering for availability may be needed.
> +      </p>
> +    </li>
> +  </ul>
> +
> +  <p>
> +    The remaining components are replicated onto each hypervisor:
> +  </p>
> +
> +  <ul>
> +    <li>
> +      <code>ovn-controller</code>(8) is OVN's agent on each hypervisor and
> +      software gateway.  Northbound, it connects to the OVN Database to learn
> +      about OVN configuration and status and to populate the PN and <code>Bindings</code>
> +      tables with the hypervisor's status.  Southbound, it connects to
> +      <code>ovs-vswitchd</code>(8) as an OpenFlow controller, for control over
> +      network traffic, and to the local <code>ovsdb-server</code>(1) to allow
> +      it to monitor and control Open vSwitch configuration.
> +    </li>
> +
> +    <li>
> +      <code>ovs-vswitchd</code>(8) and <code>ovsdb-server</code>(1) are
> +      conventional components of Open vSwitch.
> +    </li>
> +  </ul>
> +
> +  <pre fixed="yes">
> +                                  CMS
> +                                   |
> +                                   |
> +                       +-----------|-----------+
> +                       |           |           |
> +                       |     OVN/CMS Plugin    |
> +                       |           |           |
> +                       |           |           |
> +                       |   OVN Northbound DB   |
> +                       |           |           |
> +                       |           |           |
> +                       |        ovn-nbd        |
> +                       |           |           |
> +                       +-----------|-----------+
> +                                   |
> +                                   |
> +                                +------+
> +                                |OVN DB|
> +                                +------+
> +                                   |
> +                                   |
> +                +------------------+------------------+
> +                |                  |                  |
> + HV 1           |                  |    HV n          |
> ++---------------|---------------+  .  +---------------|---------------+
> +|               |               |  .  |               |               |
> +|        ovn-controller         |  .  |        ovn-controller         |
> +|         |          |          |  .  |         |          |          |
> +|         |          |          |     |         |          |          |
> +|  ovs-vswitchd   ovsdb-server  |     |  ovs-vswitchd   ovsdb-server  |
> +|                               |     |                               |
> ++-------------------------------+     +-------------------------------+
> +  </pre>
> +
> +  <h3>Life Cycle of a VIF</h3>
> +
> +  <p>
> +    Tables and their schemas presented in isolation are difficult to
> +    understand.  Here's an example.
> +  </p>
> +
> +  <p>
> +    The steps in this example refer often to details of the OVN and OVN
> +    Northbound database schemas.  Please see <code>ovn</code>(5) and
> +    <code>ovn-nb</code>(5), respectively, for the full story on these
> +    databases.
> +  </p>
> +
> +  <ol>
> +    <li>
> +      A VIF's life cycle begins when a CMS administrator creates a new VIF
> +      using the CMS user interface or API and adds it to a switch (one
> +      implemented by OVN as a logical switch).  The CMS updates its own
> +      configuration.  This includes associating a unique, persistent
> +      identifier <var>vif-id</var> and Ethernet address <var>mac</var> with
> +      the VIF.
> +    </li>
> +
> +    <li>
> +      The CMS plugin updates the OVN Northbound database to include the new
> +      VIF, by adding a row to the <code>Logical_Port</code> table.  In the new
> +      row, <code>name</code> is <var>vif-id</var>, <code>mac</code> is
> +      <var>mac</var>, <code>switch</code> points to the OVN logical switch's
> +      Logical_Switch record, and other columns are initialized appropriately.
> +    </li>
> +
> +    <li>
> +      <code>ovn-nbd</code> receives the OVN Northbound database update.  In
> +      turn, it makes the corresponding updates to the OVN database, by adding
> +      rows to the OVN database <code>Pipeline</code> table to reflect the new
> +      port, e.g. adding a flow to recognize that packets destined to the new
> +      port's MAC address should be delivered to it, and updating the flow that
> +      delivers broadcast and multicast packets to include the new port.
> +    </li>
> +
> +    <li>
> +      On every hypervisor, <code>ovn-controller</code> receives the
> +      <code>Pipeline</code> table updates that <code>ovn-nbd</code> made in the
> +      previous step.  As long as the VM that owns the VIF is powered off,
> +      <code>ovn-controller</code> cannot do much; it cannot, for example,
> +      arrange to send packets to or receive packets from the VIF, because the
> +      VIF does not actually exist anywhere.
> +    </li>
> +
> +    <li>
> +      Eventually, a user powers on the VM that owns the VIF.  On the hypervisor
> +      where the VM is powered on, the integration between the hypervisor and
> +      Open vSwitch (described in <code>IntegrationGuide.md</code>) adds the VIF
> +      to the OVN integration bridge and stores <var>vif-id</var> in
> +      <code>external-ids</code>:<code>iface-id</code> to indicate that the
> +      interface is an instantiation of the new VIF.  (None of this code is new
> +      in OVN; this is pre-existing integration work that has already been done
> +      on hypervisors that support OVS.)
> +    </li>
> +
> +    <li>
> +      On the hypervisor where the VM is powered on, <code>ovn-controller</code>
> +      notices <code>external-ids</code>:<code>iface-id</code> in the new
> +      Interface.  In response, it updates the local hypervisor's OpenFlow
> +      tables so that packets to and from the VIF are properly handled.
> +      Afterward, it updates the <code>Bindings</code> table in the OVN DB,
> +      adding a row that links the logical port from
> +      <code>external-ids</code>:<code>iface-id</code> to the hypervisor.
> +    </li>
> +
> +    <li>
> +      Some CMS systems, including OpenStack, fully start a VM only when its
> +      networking is ready.  To support this, <code>ovn-nbd</code> notices the
> +      new row in the <code>Bindings</code> table, and pushes this upward by
> +      updating the <ref column="up" table="Logical_Port" db="OVN_NB"/> column
> +      in the OVN Northbound database's <ref table="Logical_Port" db="OVN_NB"/>
> +      table to indicate that the VIF is now up.  The CMS, if it uses this
> +      feature, can then react by allowing the VM's execution to proceed.
> +    </li>
> +
> +    <li>
> +      On every hypervisor but the one where the VIF resides,
> +      <code>ovn-controller</code> notices the new row in the
> +      <code>Bindings</code> table.  This provides <code>ovn-controller</code>
> +      the physical location of the logical port, so each instance updates the
> +      OpenFlow tables of its switch (based on logical datapath flows in the OVN
> +      DB <code>Pipeline</code> table) so that packets to and from the VIF can
> +      be properly handled via tunnels.
> +    </li>
> +
> +    <li>
> +      Eventually, a user powers off the VM that owns the VIF.  On the
> +      hypervisor where the VM was powered on, the VIF is deleted from the OVN
> +      integration bridge.
> +    </li>
> +
> +    <li>
> +      On the hypervisor where the VM was powered on,
> +      <code>ovn-controller</code> notices that the VIF was deleted.  In
> +      response, it removes the logical port's row from the
> +      <code>Bindings</code> table.
> +    </li>
> +
> +    <li>
> +      On every hypervisor, <code>ovn-controller</code> notices the row removed
> +      from the <code>Bindings</code> table.  This means that
> +      <code>ovn-controller</code> no longer knows the physical location of the
> +      logical port, so each instance updates its OpenFlow table to reflect
> +      that.
> +    </li>
> +
> +    <li>
> +      Eventually, when the VIF (or its entire VM) is no longer needed by
> +      anyone, an administrator deletes the VIF using the CMS user interface or
> +      API.  The CMS updates its own configuration.
> +    </li>
> +
> +    <li>
> +      The CMS plugin removes the VIF from the OVN Northbound database,
> +      by deleting its row in the <code>Logical_Port</code> table.
> +    </li>
> +
> +    <li>
> +      <code>ovn-nbd</code> receives the OVN Northbound update and in turn
> +      updates the OVN database accordingly, by removing or updating the
> +      rows from the OVN database <code>Pipeline</code> table that were related
> +      to the now-destroyed VIF.
> +    </li>
> +
> +    <li>
> +      On every hypervisor, <code>ovn-controller</code> receives the
> +      <code>Pipeline</code> table updates that <code>ovn-nbd</code> made in the
> +      previous step.  <code>ovn-controller</code> updates OpenFlow tables to
> +      reflect the update, although there may not be much to do, since the VIF
> +      had already become unreachable when it was removed from the
> +      <code>Bindings</code> table in a previous step.
> +    </li>
> +  </ol>
> +
> +</manpage>
> diff --git a/ovn/ovn-controller.8.in b/ovn/ovn-controller.8.in
> new file mode 100644
> index 0000000..59fcb59
> --- /dev/null
> +++ b/ovn/ovn-controller.8.in
> @@ -0,0 +1,41 @@
> +.\" -*- nroff -*-
> +.de IQ
> +.  br
> +.  ns
> +.  IP "\\$1"
> +..
> +.TH ovn\-controller 8 "@VERSION@" "Open vSwitch" "Open vSwitch Manual"
> +.ds PN ovn\-controller
> +.
> +.SH NAME
> +ovn\-controller \- OVN local controller
> +.
> +.SH SYNOPSIS
> +\fBovn\-controller\fR [\fIoptions\fR]
> +.
> +.SH DESCRIPTION
> +\fBovn\-controller\fR is the local controller daemon for OVN, the Open
> +Virtual Network.  It connects northbound to the OVN database (see
> +\fBovn\fR(5)) over the OVSDB protocol, and southbound to the Open
> +vSwitch database (see \fBovs-vswitchd.conf.db\fR(5)) over the OVSDB
> +protocol and to \fBovs\-vswitchd\fR(8) via OpenFlow.  Each hypervisor
> +and software gateway in an OVN deployment runs its own independent
> +copy of \fBovn\-controller\fR; thus, \fBovn\-controller\fR's
> +southbound connections are machine-local and do not run over a
> +physical network.
> +.PP
> +XXX this is completely skeletal.
> +.
> +.SH OPTIONS
> +.SS "Public Key Infrastructure Options"
> +.so lib/ssl.man
> +.so lib/ssl-peer-ca-cert.man
> +.ds DD
> +.so lib/daemon.man
> +.so lib/vlog.man
> +.so lib/unixctl.man
> +.so lib/common.man
> +.
> +.SH "SEE ALSO"
> +.
> +\fBovn\-architecture\fR(7)
> diff --git a/ovn/ovn-nb.ovsschema b/ovn/ovn-nb.ovsschema
> new file mode 100644
> index 0000000..ad675ac
> --- /dev/null
> +++ b/ovn/ovn-nb.ovsschema
> @@ -0,0 +1,62 @@
> +{
> +    "name": "OVN_Northbound",
> +    "tables": {
> +        "Logical_Switch": {
> +            "columns": {
> +                "router_port": {"type": {"key": {"type": "uuid",
> +                                                 "refTable": "Logical_Router_Port",
> +                                                 "refType": "strong"},
> +                                         "min": 0, "max": 1}},
> +                "external_ids": {
> +                    "type": {"key": "string", "value": "string",
> +                             "min": 0, "max": "unlimited"}}}},
> +        "Logical_Port": {
> +            "columns": {
> +                "switch": {"type": {"key": {"type": "uuid",
> +                                            "refTable": "Logical_Switch",
> +                                            "refType": "strong"}}},
> +                "name": {"type": "string"},
> +                "macs": {"type": {"key": "string",
> +                                  "min": 0,
> +                                  "max": "unlimited"}},
> +                "port_security": {"type": {"key": "string",
> +                                           "min": 0,
> +                                           "max": "unlimited"}},
> +                "up": {"type": {"key": "boolean", "min": 0, "max": 1}},
> +                "external_ids": {
> +                    "type": {"key": "string", "value": "string",
> +                             "min": 0, "max": "unlimited"}}},
> +            "indexes": [["name"]]},
> +        "ACL": {
> +            "columns": {
> +                "switch": {"type": {"key": {"type": "uuid",
> +                                            "refTable": "Logical_Switch",
> +                                            "refType": "strong"}}},
> +                "priority": {"type": {"key": {"type": "integer",
> +                                              "minInteger": 0,
> +                                              "maxInteger": 65535}}},
> +                "match": {"type": "string"},
> +                "action": {"type": {"key": {"type": "string",
> +                                            "enum": ["set", ["allow", "allow-related", "drop", "reject"]]}}},
> +                "log": {"type": "boolean"},
> +                "external_ids": {
> +                    "type": {"key": "string", "value": "string",
> +                             "min": 0, "max": "unlimited"}}}},
> +        "Logical_Router": {
> +            "columns": {
> +                "ip": {"type": "string"},
> +                "default_gw": {"type": {"key": "string", "min": 0, "max": 1}},
> +                "external_ids": {
> +                    "type": {"key": "string", "value": "string",
> +                             "min": 0, "max": "unlimited"}}}},
> +        "Logical_Router_Port": {
> +            "columns": {
> +                "router": {"type": {"key": {"type": "uuid",
> +                                            "refTable": "Logical_Router",
> +                                            "refType": "strong"}}},
> +                "network": {"type": "string"},
> +                "mac": {"type": "string"},
> +                "external_ids": {
> +                    "type": {"key": "string", "value": "string",
> +                             "min": 0, "max": "unlimited"}}}}},
> +    "version": "1.0.0"}
> diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
> new file mode 100644
> index 0000000..80190ca
> --- /dev/null
> +++ b/ovn/ovn-nb.xml
> @@ -0,0 +1,245 @@
> +<?xml version="1.0" encoding="utf-8"?>
> +<database name="ovn-nb" title="OVN Northbound Database">
> +  <p>
> +    This database is the interface between OVN and the cloud management system
> +    (CMS), such as OpenStack, running above it.  The CMS produces almost all of
> +    the contents of the database.  The <code>ovn-nbd</code> program monitors
> +    the database contents, transforms them, and stores them into the <ref
> +    db="OVN"/> database.
> +  </p>
> +
> +  <p>
> +    We generally speak of ``the'' CMS, but one can imagine scenarios in
> +    which multiple CMSes manage different parts of an OVN deployment.
> +  </p>
> +
> +  <h2>External IDs</h2>
> +
> +  <p>
> +    Each of the tables in this database contains a special column, named
> +    <code>external_ids</code>.  This column has the same form and purpose each
> +    place it appears.
> +  </p>
> +
> +  <dl>
> +    <dt><code>external_ids</code>: map of string-string pairs</dt>
> +    <dd>
> +      Key-value pairs for use by the CMS.  The CMS might use certain pairs, for
> +      example, to identify entities in its own configuration that correspond to
> +      those in this database.
> +    </dd>
> +  </dl>
> +
> +  <table name="Logical_Switch" title="L2 logical switch">
> +    <p>
> +      Each row represents one L2 logical switch.  A given switch's ports are
> +      the <ref table="Logical_Port"/> rows whose <ref table="Logical_Port"
> +      column="switch"/> column points to its row.
> +    </p>
> +
> +    <column name="router_port">
> +      <p>
> +        The router port to which this logical switch is connected, or empty if
> +        this logical switch is not connected to any router.  A switch may be
> +        connected to at most one logical router, but this is not a significant
> +        restriction because logical routers may be connected into arbitrary
> +        topologies.
> +      </p>
> +    </column>
> +
> +    <group title="Common Columns">
> +      <column name="external_ids">
> +        See <em>External IDs</em> at the beginning of this document.
> +      </column>
> +    </group>
> +  </table>
> +
> +  <table name="Logical_Port" title="L2 logical switch port">
> +    <p>
> +      A port within an L2 logical switch.
> +    </p>
> +
> +    <column name="switch">
> +      The logical switch to which the logical port is connected.
> +    </column>
> +
> +    <column name="name">
> +      The logical port name.  The name used here must match those used in the
> +      <ref key="iface-id" table="Interface" column="external_ids"
> +      db="Open_vSwitch"/> in the <ref db="Open_vSwitch"/> database's <ref
> +      table="Interface" db="Open_vSwitch"/> table, because hypervisors use <ref
> +      key="iface-id" table="Interface" column="external_ids"
> +      db="Open_vSwitch"/> as a lookup key for logical ports.
> +    </column>
> +
> +    <column name="up">
> +      This column is populated by <code>ovn-nbd</code>, rather than by the CMS
> +      plugin as is most of this database.  When a logical port is bound to a
> +      physical location in the OVN database <ref db="OVN" table="Bindings"/>
> +      table, <code>ovn-nbd</code> sets this column to <code>true</code>;
> +      otherwise, or if the port becomes unbound later, it sets it to
> +      <code>false</code>.  This allows the CMS to wait for a VM's networking to
> +      become active before it allows the VM to start.
> +    </column>
> +
> +    <column name="macs">
> +      The logical port's own Ethernet address or addresses, each in the form
> +      <var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>.
> +      Like a physical Ethernet NIC, a logical port ordinarily has a single
> +      fixed Ethernet address.  The string <code>unknown</code> is also allowed
> +      to indicate that the logical port has an unknown set of (additional)
> +      source addresses.
> +    </column>
> +
> +    <column name="port_security">
> +      <p>
> +        A set of L2 (Ethernet) or L3 (IPv4 or IPv6) addresses or L2+L3 pairs
> +        from which the logical port is allowed to send packets and to which it
> +        is allowed to receive packets.  If this column is empty, all addresses
> +        are permitted.
> +      </p>
> +
> +      <p>
> +        Exact syntax is TBD.  One could simply use comma- or space-separated L2
> +        and L3 addresses in each set member, or replace this by a subset of the
> +        general-purpose expression language used for the <ref column="match"
> +        table="Pipeline" db="OVN"/> column in the OVN database's <ref
> +        table="Pipeline" db="OVN"/> table.
> +      </p>
> +    </column>
> +
> +    <group title="Common Columns">
> +      <column name="external_ids">
> +        See <em>External IDs</em> at the beginning of this document.
> +      </column>
> +    </group>
> +  </table>
> +
> +  <table name="ACL" title="Access Control List (ACL) rule">
> +    <p>
> +      Each row in this table represents one ACL rule for the logical switch in
> +      its <ref column="switch"/> column.  The <ref column="action"/> column for
> +      the highest-<ref column="priority"/> matching row in this table
> +      determines a packet's treatment.  If no row matches, packets are allowed
> +      by default.  (Default-deny treatment is possible: add a rule with <ref
> +      column="priority"/> 0, <code>true</code> as <ref column="match"/>, and
> +      <code>drop</code> as <ref column="action"/>.)
> +    </p>
> +
> +    <column name="switch">
> +      The switch to which the ACL rule applies.  The expression in the
> +      <ref column="match"/> column may match against logical ports
> +      within this switch.
> +    </column>
> +
> +    <column name="priority">
> +      The ACL rule's priority.  Rules with numerically higher priority take
> +      precedence over those with lower.  If two ACL rules with the same
> +      priority both match, then the one actually applied to a packet is
> +      undefined.
> +    </column>
> +
> +    <column name="match">
> +      The packets that the ACL should match, in the same expression language
> +      used for the <ref column="match" table="Pipeline" db="OVN"/> column in
> +      the OVN database's <ref table="Pipeline" db="OVN"/> table.  Match
> +      <code>inport</code> and <code>outport</code> against names of logical
> +      ports within <ref column="switch"/> to implement ingress and egress ACLs,
> +      respectively.  In logical switches connected to logical routers, the
> +      special port name <code>ROUTER</code> refers to the logical router port.
> +    </column>
> +
> +    <column name="action">
> +      <p>The action to take when the ACL rule matches:</p>
> +
> +      <ul>
> +    <li>
> +      <code>allow</code>: Forward the packet.
> +    </li>
> +
> +    <li>
> +      <code>allow-related</code>: Forward the packet and related traffic
> +      (e.g. inbound replies to an outbound connection).
> +    </li>
> +
> +    <li>
> +      <code>drop</code>: Silently drop the packet.
> +    </li>
> +
> +    <li>
> +      <code>reject</code>: Drop the packet, replying with a RST for TCP or
> +      ICMP unreachable message for other IP-based protocols.
> +    </li>
> +      </ul>
> +    </column>
> +
> +    <column name="log">
> +      If set to <code>true</code>, packets that match the ACL will trigger a
> +      log message on the transport node or nodes that perform ACL processing.
> +      Logging may be combined with any <ref column="action"/>.
> +    </column>
> +
> +    <group title="Common Columns">
> +      <column name="external_ids">
> +        See <em>External IDs</em> at the beginning of this document.
> +      </column>
> +    </group>
> +  </table>
> +
> +  <table name="Logical_Router" title="L3 logical router">
> +    <p>
> +      Each row represents one L3 logical router.  A given router's ports are
> +      the <ref table="Logical_Router_Port"/> rows whose <ref
> +      table="Logical_Router_Port" column="router"/> column points to its row.
> +    </p>
> +
> +    <column name="ip">
> +      The logical router's own IP address.  The logical router uses this
> +      address for ICMP replies (e.g. network unreachable messages) and other
> +      traffic that it originates, and it responds to traffic destined to this
> +      address (e.g. ICMP echo requests).
> +    </column>
> +
> +    <column name="default_gw">
> +      IP address to use as default gateway, if any.
> +    </column>
> +
> +    <group title="Common Columns">
> +      <column name="external_ids">
> +        See <em>External IDs</em> at the beginning of this document.
> +      </column>
> +    </group>
> +  </table>
> +
> +  <table name="Logical_Router_Port" title="L3 logical router port">
> +    <p>
> +      A port within an L3 logical router.
> +    </p>
> +
> +    <p>
> +      A router port is always attached to a switch port.  The connection can be
> +      identified by following the <ref column="router_port"
> +      table="Logical_Port"/> column from an appropriate <ref
> +      table="Logical_Port"/> row.
> +    </p>
> +
> +    <column name="router">
> +      The router to which the port belongs.
> +    </column>
> +
> +    <column name="network">
> +      The IP network and netmask of the network attached to the router port.
> +      Used for routing.
> +    </column>
> +
> +    <column name="mac">
> +      The Ethernet address that belongs to this router port.
> +    </column>
> +
> +    <group title="Common Columns">
> +      <column name="external_ids">
> +        See <em>External IDs</em> at the beginning of this document.
> +      </column>
> +    </group>
> +  </table>
> +</database>
> diff --git a/ovn/ovn.ovsschema b/ovn/ovn.ovsschema
> new file mode 100644
> index 0000000..5597df4
> --- /dev/null
> +++ b/ovn/ovn.ovsschema
> @@ -0,0 +1,50 @@
> +{
> +    "name": "OVN",
> +    "tables": {
> +        "Chassis": {
> +            "columns": {
> +                "name": {"type": "string"},
> +                "encap": {"type": {"key": {"type": "string",
> +                                           "enum": ["set", ["stt", "vxlan", "gre"]]}}},
> +                "encap_options": {"type": {"key": "string",
> +                                           "value": "string",
> +                                           "min": 0,
> +                                           "max": "unlimited"}},
> +                "ip": {"type": "string"},
> +                "gateway_ports": {"type": {"key": "string",
> +                                           "value": {"type": "uuid",
> +                                                     "refTable": "Gateway",
> +                                                     "refType": "strong"},
> +                                           "min": 0,
> +                                           "max": "unlimited"}}},
> +            "isRoot": true,
> +            "indexes": [["name"]]},
> +        "Gateway": {
> +            "columns": {"attached_port": {"type": "string"},
> +                        "vlan_map": {"type": {"key": {"type": "integer",
> +                                                      "minInteger": 0,
> +                                                      "maxInteger": 4095},
> +                                              "value": {"type": "string"},
> +                                              "min": 0,
> +                                              "max": "unlimited"}}}},
> +        "Pipeline": {
> +            "columns": {
> +                "table_id": {"type": {"key": {"type": "integer",
> +                                              "minInteger": 0,
> +                                              "maxInteger": 127}}},
> +                "priority": {"type": {"key": {"type": "integer",
> +                                              "minInteger": 0,
> +                                              "maxInteger": 65535}}},
> +                "match": {"type": "string"},
> +                "actions": {"type": "string"}},
> +            "isRoot": true},
> +        "Bindings": {
> +            "columns": {
> +                "logical_port": {"type": "string"},
> +                "chassis": {"type": "string"},
> +                "mac": {"type": {"key": "string",
> +                                 "min": 0,
> +                                 "max": "unlimited"}}},
> +            "indexes": [["logical_port"]],
> +            "isRoot": true}},
> +    "version": "1.0.0"}
> diff --git a/ovn/ovn.xml b/ovn/ovn.xml
> new file mode 100644
> index 0000000..a233112
> --- /dev/null
> +++ b/ovn/ovn.xml
> @@ -0,0 +1,497 @@
> +<?xml version="1.0" encoding="utf-8"?>
> +<database name="ovn" title="OVN Database">
> +  <p>
> +    This database holds logical and physical configuration and state for the
> +    Open Virtual Network (OVN) system to support virtual network abstraction.
> +    For an introduction to OVN, please see <code>ovn-architecture</code>(7).
> +  </p>
> +
> +  <p>
> +    The OVN database sits at the center of the OVN architecture.  It is the one
> +    component that speaks both southbound directly to all the hypervisors and
> +    gateways, via <code>ovn-controller</code>, and northbound to the Cloud
> +    Management System, via <code>ovn-nbd</code>:
> +  </p>
> +
> +  <h2>Database Structure</h2>
> +
> +  <p>
> +    The OVN database contains three classes of data with different properties,
> +    as described in the sections below.
> +  </p>
> +
> +  <h3>Physical Network (PN) data</h3>
> +
> +  <p>
> +    PN tables contain information about the chassis nodes in the system.  This
> +    contains all the information necessary to wire the overlay, such as IP
> +    addresses, supported tunnel types, and security keys.
> +  </p>
> +
> +  <p>
> +    The amount of PN data is small (O(n) in the number of chassis) and it
> +    changes infrequently, so it can be replicated to every chassis.
> +  </p>
> +
> +  <p>
> +    The <ref table="Chassis"/> and <ref table="Gateway"/> tables comprise the
> +    PN tables.
> +  </p>
> +
> +  <h3>Logical Network (LN) data</h3>
> +
> +  <p>
> +    LN tables contain the topology of logical switches and routers, ACLs,
> +    firewall rules, and everything needed to describe how packets traverse a
> +    logical network, represented as logical datapath flows (see Logical
> +    Datapath Flows, below).
> +  </p>
> +
> +  <p>
> +    LN data may be large (O(n) in the number of logical ports, ACL rules,
> +    etc.).  Thus, to improve scaling, each chassis should receive only data
> +    related to logical networks in which that chassis participates.  Past
> +    experience shows that in the presence of large logical networks, even
> +    finer-grained partitioning of data, e.g. designing logical flows so that
> +    only the chassis hosting a logical port needs related flows, pays off
> +    scale-wise.  (This is not necessary initially but it is worth bearing in
> +    mind in the design.)
> +  </p>
> +
> +  <p>
> +    The LN is a slave of the cloud management system running northbound of OVN.
> +    That CMS determines the entire OVN logical configuration and therefore the
> +    LN's content at any given time is a deterministic function of the CMS's
> +    configuration, although that happens indirectly via the OVN Northbound DB
> +    and <code>ovn-nbd</code>.
> +  </p>
> +
> +  <p>
> +    LN data is likely to change more quickly than PN data.  This is especially
> +    true in a container environment where VMs are created and destroyed (and
> +    therefore added to and deleted from logical switches) quickly.
> +  </p>
> +
> +  <p>
> +    The <ref table="Pipeline"/> table is currently the only LN table.
> +  </p>
> +
> +  <h3>Bindings data</h3>
> +
> +  <p>
> +    The Bindings tables contain the current placement of logical components
> +    (such as VMs and VIFs) onto chassis and the bindings between logical ports
> +    and MACs.
> +  </p>
> +
> +  <p>
> +    Bindings change frequently, at least every time a VM powers up or down
> +    or migrates, and especially quickly in a container environment.  The
> +    amount of data per VM (or VIF) is small.
> +  </p>
> +
> +  <p>
> +    Each chassis is authoritative about the VMs and VIFs that it hosts at any
> +    given time and can efficiently flood that state to a central location, so
> +    the consistency needs are minimal.
> +  </p>
> +
> +  <p>
> +    The <ref table="Bindings"/> table is currently the only Bindings table.
> +  </p>
> +
> +  <table name="Chassis" title="Physical Network Hypervisor and Gateway Information">
> +    <p>
> +      Each row in this table represents a hypervisor or gateway (a chassis) in
> +      the physical network (PN).  Each chassis, via
> +      <code>ovn-controller</code>, adds and updates its own row, and keeps a
> +      copy of the remaining rows to determine how to reach other hypervisors.
> +    </p>
> +
> +    <p>
> +      When a chassis shuts down gracefully, it should remove its own row.
> +      (This is not critical because resources hosted on the chassis are equally
> +      unreachable regardless of whether the row is present.)  If a chassis
> +      shuts down permanently without removing its row, some kind of manual or
> +      automatic cleanup is eventually needed; we can devise a process for that
> +      as necessary.
> +    </p>
> +
> +    <column name="name">
> +      A chassis name, taken from <ref key="system-id" table="Open_vSwitch"
> +      column="external_ids" db="Open_vSwitch"/> in the Open_vSwitch
> +      database's <ref table="Open_vSwitch" db="Open_vSwitch"/> table.  OVN does
> +      not prescribe a particular format for chassis names.
> +    </column>
> +
> +    <group title="Encapsulation">
> +      <p>
> +        These columns together identify how OVN may transmit logical dataplane
> +        packets to this chassis.
> +      </p>
> +
> +      <column name="encap">
> +        The encapsulation to use to transmit packets to this chassis.
> +      </column>
> +
> +      <column name="encap_options">
> +        Options for configuring the encapsulation, e.g. IPsec parameters when
> +        IPsec support is introduced.  No options are currently defined.
> +      </column>
> +
> +      <column name="ip">
> +        The IPv4 address of the encapsulation tunnel endpoint.
> +      </column>
> +    </group>
> +
> +    <group title="Gateway Configuration">
> +      <p>
> +        A <dfn>gateway</dfn> is a chassis that forwards traffic between a
> +        logical network and a physical VLAN.  Gateways are typically dedicated
> +        nodes that do not host VMs.
> +      </p>
> +
> +      <column name="gateway_ports">
> +        Maps from the name of a gateway port, which is typically a physical
> +        port (e.g. <code>eth1</code>) or an Open vSwitch patch port, to a <ref
> +        table="Gateway"/> record that describes the details of the gatewaying
> +        function.
> +      </column>
> +    </group>
> +  </table>
> +
> +  <table name="Gateway" title="Physical Network Gateway Ports">
> +    <p>
> +      The <ref column="gateway_ports" table="Chassis"/> column in the <ref
> +      table="Chassis"/> table refers to rows in this table to connect a chassis
> +      port to a gateway function.  Each row in this table describes the logical
> +      networks to which a gateway port is attached.  Each chassis, via
> +      <code>ovn-controller</code>(8), adds and updates its own rows, if any
> +      (since most chassis are not gateways), and keeps a copy of the remaining
> +      rows to determine how to reach other chassis.
> +    </p>
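> +
> +    <p>
> +      For example, a gateway chassis that bridges VLAN 101 on its physical
> +      port <code>eth1</code> into a logical network might (hypothetically)
> +      have a row with <ref column="attached_port"/> of <code>eth1</code> and
> +      a <ref column="vlan_map"/> that maps VLAN ID 101 to a logical port
> +      name such as <code>lport-vlan101</code>.  The port and logical port
> +      names here are illustrative only.
> +    </p>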
> +
> +    <column name="vlan_map">
> +      Maps from a VLAN ID to a logical port name.  Thus, each named logical
> +      port corresponds to one VLAN on the gateway port.
> +    </column>
> +
> +    <column name="attached_port">
> +      The name of the gateway port in the chassis's Open vSwitch integration
> +      bridge.
> +    </column>
> +  </table>
> +
> +  <table name="Pipeline" title="Logical Network Pipeline">
> +    <p>
> +      Each row in this table represents one logical flow.  The cloud management
> +      system, via its OVN integration, populates this table with logical flows
> +      that implement the L2 and L3 topology specified in the CMS configuration.
> +      Each hypervisor, via <code>ovn-controller</code>, translates the logical
> +      flows into OpenFlow flows specific to its hypervisor and installs them
> +      into Open vSwitch.
> +    </p>
> +
> +    <p>
> +      Logical flows are expressed in an OVN-specific format, described here.  A
> +      logical datapath flow is much like an OpenFlow flow, except that the
> +      flows are written in terms of logical ports and logical datapaths instead
> +      of physical ports and physical datapaths.  Translation between logical
> +      and physical flows helps to ensure isolation between logical datapaths.
> +      (The logical flow abstraction also allows the CMS to do less work, since
> +      it does not have to separately compute and push out physical flows to
> +      each chassis.)
> +    </p>
> +
> +    <p>
> +      The default action when no flow matches is to drop packets.
> +    </p>
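> +
> +    <p>
> +      As a purely illustrative example, a CMS implementing a logical L2
> +      switch might insert a row with <ref column="table_id"/> 0, <ref
> +      column="priority"/> 100, a <ref column="match"/> of <code>inport ==
> +      "lp1" &amp;&amp; eth.src == 00:00:00:00:00:01</code>, and <ref
> +      column="actions"/> of <code>resubmit;</code>, admitting traffic from a
> +      known port/MAC pair into the next pipeline stage.  (The logical port
> +      name and address are hypothetical.)
> +    </p>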
> +
> +    <column name="table_id">
> +      The stage in the logical pipeline, analogous to an OpenFlow table number.
> +    </column>
> +
> +    <column name="priority">
> +      The flow's priority.  Flows with numerically higher priority take
> +      precedence over those with lower.  If two logical datapath flows with the
> +      same priority both match, then the one actually applied to the packet is
> +      undefined.
> +    </column>
> +
> +    <column name="match">
> +      <p>
> +        A matching expression.  OVN provides a superset of OpenFlow matching
> +        capabilities, using a syntax similar to Boolean expressions in a
> +        programming language.
> +      </p>
> +
> +      <p>
> +        Matching expressions have two important kinds of primary expression:
> +        <dfn>fields</dfn> and <dfn>constants</dfn>.  A field names a piece of
> +        data or metadata.  The supported fields are:
> +      </p>
> +
> +      <ul>
> +        <li>
> +          <code>metadata</code> <code>reg0</code> ... <code>reg7</code>
> +          <code>xreg0</code> ... <code>xreg3</code>
> +        </li>
> +        <li><code>inport</code> <code>outport</code> <code>queue</code></li>
> +        <li><code>eth.src</code> <code>eth.dst</code> <code>eth.type</code></li>
> +        <li><code>vlan.tci</code> <code>vlan.vid</code> <code>vlan.pcp</code> <code>vlan.present</code></li>
> +        <li><code>ip.proto</code> <code>ip.dscp</code> <code>ip.ecn</code> <code>ip.ttl</code> <code>ip.frag</code></li>
> +        <li><code>ip4.src</code> <code>ip4.dst</code></li>
> +        <li><code>ip6.src</code> <code>ip6.dst</code> <code>ip6.label</code></li>
> +        <li><code>arp.op</code> <code>arp.spa</code> <code>arp.tpa</code> <code>arp.sha</code> <code>arp.tha</code></li>
> +        <li><code>tcp.src</code> <code>tcp.dst</code> <code>tcp.flags</code></li>
> +        <li><code>udp.src</code> <code>udp.dst</code></li>
> +        <li><code>sctp.src</code> <code>sctp.dst</code></li>
> +        <li><code>icmp4.type</code> <code>icmp4.code</code></li>
> +        <li><code>icmp6.type</code> <code>icmp6.code</code></li>
> +        <li><code>nd.target</code> <code>nd.sll</code> <code>nd.tll</code></li>
> +      </ul>
> +
> +      <p>
> +        Subfields may be addressed using a <code>[]</code> suffix,
> +        e.g. <code>tcp.src[0..7]</code> refers to the low 8 bits of the TCP
> +        source port.  A subfield may be used in any context where a field is
> +        allowed.
> +      </p>
> +
> +      <p>
> +        Some fields have prerequisites.  OVN implicitly adds clauses to satisfy
> +        these.  For example, <code>arp.op == 1</code> is equivalent to
> +        <code>eth.type == 0x0806 &amp;&amp; arp.op == 1</code>, and
> +        <code>tcp.src == 80</code> is equivalent to <code>(eth.type == 0x0800
> +        || eth.type == 0x86dd) &amp;&amp; ip.proto == 6 &amp;&amp; tcp.src ==
> +        80</code>.
> +      </p>
> +
> +      <p>
> +        Most fields have integer values.  Integer constants may be expressed in
> +        several forms: decimal integers, hexadecimal integers prefixed by
> +        <code>0x</code>, dotted-quad IPv4 addresses, IPv6 addresses in their
> +        standard forms, and Ethernet addresses as colon-separated hex
> +        digits.  A constant in any of these forms may be followed by a slash
> +        and a second constant (the mask) in the same form, to form a masked
> +        constant.  IPv4 and IPv6 masks may be given as integers, to express
> +        CIDR prefixes.
> +      </p>
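> +
> +      <p>
> +        For example, <code>ip4.src == 192.168.0.0/24</code> matches source
> +        addresses within a 24-bit CIDR prefix, and <code>eth.dst ==
> +        01:00:00:00:00:00/01:00:00:00:00:00</code> matches any multicast
> +        Ethernet destination by masking all bits except the multicast bit.
> +      </p>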
> +
> +      <p>
> +        The <code>inport</code> and <code>outport</code> fields have string
> +        values.  The useful values are <ref column="logical_port"
> +        table="Bindings"/> names from the <ref table="Bindings"/> and <ref
> +        table="Gateway"/> tables.
> +      </p>
> +
> +      <p>
> +        The available operators, from highest to lowest precedence, are:
> +      </p>
> +
> +      <ul>
> +        <li><code>()</code></li>
> +        <li><code>==   !=   &lt;   &lt;=   &gt;   &gt;=   in   not in</code></li>
> +        <li><code>!</code></li>
> +        <li><code>&amp;&amp;</code></li>
> +        <li><code>||</code></li>
> +      </ul>
> +
> +      <p>
> +        The <code>()</code> operator is used for grouping.
> +      </p>
> +
> +      <p>
> +        The equality operator <code>==</code> is the most important operator.
> +        Its operands must be a field and an optionally masked constant, in
> +        either order.  The <code>==</code> operator yields true when the
> +        field's value equals the constant's value for all the bits included in
> +        the mask.  The <code>==</code> operator translates simply and naturally
> +        to OpenFlow.
> +      </p>
> +
> +      <p>
> +        The inequality operator <code>!=</code> yields the inverse of
> +        <code>==</code>, but its syntax and use are the same.  Implementation
> +        of the inequality operator is expensive: a single OpenFlow flow
> +        cannot express inequality, so in general <code>!=</code> must expand
> +        into several flows, e.g. one per bit of the field's mask.
> +      </p>
> +
> +      <p>
> +        The relational operators are <code>&lt;</code>, <code>&lt;=</code>,
> +        <code>&gt;</code>, and <code>&gt;=</code>.  Their operands must be a
> +        field and a constant, in either order; the constant must not be
> +        masked.  These operators are most commonly useful for L4 ports,
> +        e.g. <code>tcp.src &lt; 1024</code>.  Implementation of the
> +        relational operators is expensive, since a range match must in
> +        general be expanded into multiple prefix-match flows.
> +      </p>
> +
> +      <p>
> +        The set membership operator <code>in</code>, with syntax
> +        ``<code><var>field</var> in { <var>constant1</var>,
> +        <var>constant2</var>,</code> ... <code>}</code>'', is syntactic sugar
> +        for ``<code>(<var>field</var> == <var>constant1</var> ||
> +        <var>field</var> == <var>constant2</var> || </code>...<code>)</code>''.
> +        Conversely, ``<code><var>field</var> not in { <var>constant1</var>,
> +        <var>constant2</var>, </code>...<code> }</code>'' is syntactic sugar
> +        for ``<code>(<var>field</var> != <var>constant1</var> &amp;&amp;
> +        <var>field</var> != <var>constant2</var> &amp;&amp;
> +        </code>...<code>)</code>''.
> +      </p>
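> +
> +      <p>
> +        For instance, <code>tcp.dst in { 80, 443 }</code> is shorthand for
> +        <code>(tcp.dst == 80 || tcp.dst == 443)</code>.
> +      </p>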
> +
> +      <p>
> +        The unary prefix operator <code>!</code> yields its operand's inverse.
> +      </p>
> +
> +      <p>
> +        The logical AND operator <code>&amp;&amp;</code> yields true only if
> +        both of its operands are true.
> +      </p>
> +
> +      <p>
> +        The logical OR operator <code>||</code> yields true if at least one of
> +        its operands is true.
> +      </p>
> +
> +      <p>
> +        Finally, the keywords <code>true</code> and <code>false</code> may also
> +        be used in matching expressions.  <code>true</code> is useful by itself
> +        as a catch-all expression that matches every packet.
> +      </p>
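> +
> +      <p>
> +        Putting the pieces together, a complete matching expression (with a
> +        hypothetical logical port name) might read <code>inport == "lp1"
> +        &amp;&amp; eth.dst == 00:00:00:00:00:02 &amp;&amp; tcp.dst &lt;
> +        1024</code>; the implicit prerequisites described above expand it to
> +        include the appropriate <code>eth.type</code> and
> +        <code>ip.proto</code> clauses.
> +      </p>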
> +
> +      <p>
> +        (The above is pretty ambitious.  It probably makes sense to initially
> +        implement only a subset of this specification.  The full specification
> +        is written out mainly to get an idea of what a fully general matching
> +        expression language could include.)
> +      </p>
> +    </column>
> +
> +    <column name="actions">
> +      <p>
> +        Below, a <var>value</var> is either a <var>constant</var> or a
> +        <var>field</var>.  The following actions seem most likely to be useful:
> +      </p>
> +
> +      <dl>
> +        <dt><code>drop;</code></dt>
> +        <dd>syntactic sugar for no actions</dd>
> +
> +        <dt><code>output(<var>value</var>);</code></dt>
> +        <dd>output to port</dd>
> +
> +        <dt><code>broadcast;</code></dt>
> +        <dd>output to every logical port except ingress port</dd>
> +
> +        <dt><code>resubmit;</code></dt>
> +        <dd>execute next logical datapath table as subroutine</dd>
> +
> +        <dt><code>set(<var>field</var>=<var>value</var>);</code></dt>
> +        <dd>set data or metadata field, or copy between fields</dd>
> +      </dl>
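> +
> +      <p>
> +        As a hypothetical illustration, <code>set(reg0 = 1);
> +        resubmit;</code> would mark a packet in a register and continue
> +        processing in the next logical table, while
> +        <code>output("lp2");</code> would send the packet to a logical port
> +        named <code>lp2</code> (the port name is illustrative only).
> +      </p>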
> +
> +      <p>
> +        The following actions are not yet well thought out:
> +      </p>
> +
> +      <dl>
> +        <dt><code>learn</code></dt>
> +
> +        <dt><code>conntrack</code></dt>
> +
> +        <dt><code>with(<var>field</var>=<var>value</var>) { <var>action</var>, </code>...<code> }</code></dt>
> +        <dd>execute <var>actions</var> with temporary changes to <var>fields</var></dd>
> +
> +        <dt><code>dec_ttl { <var>action</var>, </code>...<code> } { <var>action</var>; </code>...<code> }</code></dt>
> +        <dd>
> +          decrement TTL; execute first set of actions if successful, second
> +          set if TTL decrement fails
> +        </dd>
> +
> +        <dt><code>icmp_reply { <var>action</var>, </code>...<code> }</code></dt>
> +        <dd>generate ICMP reply from packet, execute <var>action</var>s</dd>
> +
> +        <dt><code>arp { <var>action</var>, </code>...<code> }</code></dt>
> +        <dd>generate ARP from packet, execute <var>action</var>s</dd>
> +      </dl>
> +
> +      <p>
> +        Other actions can be added as needed
> +        (e.g. <code>push_vlan</code>, <code>pop_vlan</code>,
> +        <code>push_mpls</code>, <code>pop_mpls</code>).
> +      </p>
> +
> +      <p>
> +        Some of the OVN actions do not map directly to OpenFlow actions, e.g.:
> +      </p>
> +
> +      <ul>
> +        <li>
> +          <code>with</code>: Implemented as <code>stack_push;
> +          set(</code>...<code>); <var>actions</var>; stack_pop</code>.  (See
> +          the sketch following this list.)
> +        </li>
> +
> +        <li>
> +          <code>dec_ttl</code>: Implemented as <code>dec_ttl</code> followed
> +          by the successful actions.  The failure case has to be implemented
> +          by <code>ovn-controller</code> interpreting packet-ins.  It might
> +          be difficult for <code>ovn-controller</code> to identify the
> +          particular place in the processing pipeline where the failure
> +          occurred; maybe some restrictions will be necessary.
> +        </li>
> +
> +        <li>
> +          <code>icmp_reply</code>: Implemented by sending the packet to
> +          <code>ovn-controller</code>, which generates the ICMP reply and sends
> +          the packet back to <code>ovs-vswitchd</code>.
> +        </li>
> +      </ul>
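> +
> +      <p>
> +        As a sketch of the <code>with</code> mapping (the register value,
> +        stack actions, and table number shown are assumptions, not settled
> +        design), <code>with(reg0 = 1) { resubmit; }</code> might compile to
> +        the Open vSwitch actions <code>push:NXM_NX_REG0[],
> +        load:1-&gt;NXM_NX_REG0[], resubmit(,17),
> +        pop:NXM_NX_REG0[]</code>.
> +      </p>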
> +    </column>
> +  </table>
> +
> +  <table name="Bindings" title="Physical-Logical Bindings">
> +    <p>
> +      Each row in this table identifies the physical location of a logical
> +      port.  Each hypervisor, via <code>ovn-controller</code>, populates this
> +      table with rows for the logical ports located on that hypervisor.
> +      <code>ovn-controller</code> discovers these ports by monitoring the
> +      local hypervisor's Open_vSwitch database, which identifies logical
> +      ports via the conventions described in <code>IntegrationGuide.md</code>.
> +    </p>
> +
> +    <p>
> +      When a chassis shuts down gracefully, it should remove its bindings.
> +      (This is not critical because resources hosted on the chassis are equally
> +      unreachable regardless of whether their rows are present.)  To handle the
> +      case where a VM is shut down abruptly on one chassis, then brought up
> +      again on a different one, <code>ovn-controller</code> must delete any
> +      existing <ref table="Binding"/> record for a logical port when it adds a
> +      new one.
> +    </p>
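> +
> +    <p>
> +      For example, when a VM whose interface carries the (hypothetical)
> +      <code>iface-id</code> of <code>lp1</code> starts up on a chassis named
> +      <code>hv1</code>, that chassis's <code>ovn-controller</code> would
> +      insert a row with <ref column="logical_port"/> of <code>lp1</code>,
> +      <ref column="chassis"/> of <code>hv1</code>, and the VM's Ethernet
> +      address in <ref column="mac"/>.
> +    </p>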
> +
> +    <column name="logical_port">
> +      A logical port, taken from <ref key="iface-id" table="Interface"
> +      column="external_ids" db="Open_vSwitch"/> in the Open_vSwitch database's
> +      <ref table="Interface" db="Open_vSwitch"/> table.  OVN does not prescribe
> +      a particular format for the logical port ID.
> +    </column>
> +
> +    <column name="chassis">
> +      The physical location of the logical port.  To successfully identify a
> +      chassis, this column must match the <ref table="Chassis" column="name"/>
> +      column in some row in the <ref table="Chassis"/> table.
> +    </column>
> +
> +    <column name="mac">
> +      <p>
> +        The Ethernet address or addresses used as a source address on the
> +        logical port, each in the form
> +        <var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>.
> +        The string <code>unknown</code> is also allowed to indicate that the
> +        logical port has an unknown set of (additional) source addresses.
> +      </p>
> +
> +      <p>
> +        A VM interface would ordinarily have a single Ethernet address.  A
> +        gateway port might initially only have <code>unknown</code>, and then
> +        add MAC addresses to the set as it learns new source addresses.
> +      </p>
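> +
> +      <p>
> +        For instance, a gateway port that has learned one source address
> +        might list <code>unknown</code> together with an address such as
> +        <code>00:11:22:33:44:55</code>.
> +      </p>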
> +    </column>
> +  </table>
> +</database>
> -- 
> 2.1.3
> 