[ovs-dev] [PATCH ovn 1/3] ovn: Add initial design documentation.
Justin Pettit
jpettit at nicira.com
Thu Feb 26 22:22:20 UTC 2015
This looks like a great start. Let's get it in, and we can continue to refine it as necessary:
Acked-by: Justin Pettit <jpettit at nicira.com>
Would you please push this before the other patches in the series?
Thanks,
--Justin
> On Feb 25, 2015, at 9:13 PM, Ben Pfaff <blp at nicira.com> wrote:
>
> This commit adds preliminary design documentation for Open Virtual Network,
> or OVN, a new OVS-based project to add support for virtual networking to
> OVS, initially with OpenStack integration.
>
> This initial design has been influenced by many people, including (in
> alphabetical order) Aaron Rosen, Chris Wright, Gurucharan Shetty, Jeremy
> Stribling, Justin Pettit, Ken Duda, Kevin Benton, Kyle Mestery, Madhu
> Venugopal, Martin Casado, Natasha Gude, Pankaj Thakkar, Russell Bryant,
> Teemu Koponen, and Thomas Graf. All blunders, however, are due to my own
> hubris.
>
> Signed-off-by: Ben Pfaff <blp at nicira.com>
> ---
> v1->v2: Rebase.
> v2->v3:
> - Multiple CMSes are possible.
> - Whitespace and typo fixes.
> - ovn.ovsschema: Gateway table is not a root table, other tables are.
> - ovn.xml: Talk about deleting rows on HV shutdown.
> - ovn-nb.xml: Clarify 'switch' column in ACL table.
> - ovn-nb.ovsschema: A Logical_Router_Port is no longer a Logical_Port.
> - ovn.xml: Add action for generating ARP.
> - ovn-nb.xml: Add allow-related action for security group support.
> v3->v4:
> - Add initial TODO list.
> v4->v5:
> - TODO: Revise default tunnel encapsulation thoughts.
> - TODO: Fill in a few details for Neutron plugin.
> - ovn-architecture: Mention DHCP as desirable.
> ---
> Makefile.am | 1 +
> configure.ac | 3 +-
> ovn/TODO | 306 ++++++++++++++++++++++++++++
> ovn/automake.mk | 77 +++++++
> ovn/ovn-architecture.7.xml | 339 +++++++++++++++++++++++++++++++
> ovn/ovn-controller.8.in | 41 ++++
> ovn/ovn-nb.ovsschema | 62 ++++++
> ovn/ovn-nb.xml | 245 ++++++++++++++++++++++
> ovn/ovn.ovsschema | 50 +++++
> ovn/ovn.xml | 497 +++++++++++++++++++++++++++++++++++++++++++++
> 10 files changed, 1620 insertions(+), 1 deletion(-)
> create mode 100644 ovn/TODO
> create mode 100644 ovn/automake.mk
> create mode 100644 ovn/ovn-architecture.7.xml
> create mode 100644 ovn/ovn-controller.8.in
> create mode 100644 ovn/ovn-nb.ovsschema
> create mode 100644 ovn/ovn-nb.xml
> create mode 100644 ovn/ovn.ovsschema
> create mode 100644 ovn/ovn.xml
>
> diff --git a/Makefile.am b/Makefile.am
> index 0480d20..699a580 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -370,3 +370,4 @@ include tutorial/automake.mk
> include vtep/automake.mk
> include datapath-windows/automake.mk
> include datapath-windows/include/automake.mk
> +include ovn/automake.mk
> diff --git a/configure.ac b/configure.ac
> index d2d02ca..795f876 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1,4 +1,4 @@
> -# Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2014 Nicira, Inc.
> +# Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 Nicira, Inc.
> #
> # Licensed under the Apache License, Version 2.0 (the "License");
> # you may not use this file except in compliance with the License.
> @@ -182,6 +182,7 @@ dnl This makes sure that include/openflow gets created in the build directory.
> AC_CONFIG_COMMANDS([include/openflow/openflow.h.stamp])
>
> AC_CONFIG_COMMANDS([utilities/bugtool/dummy], [:])
> +AC_CONFIG_COMMANDS([ovn/dummy], [:])
>
> m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES])
>
> diff --git a/ovn/TODO b/ovn/TODO
> new file mode 100644
> index 0000000..e405c7c
> --- /dev/null
> +++ b/ovn/TODO
> @@ -0,0 +1,306 @@
> +* Flow match expression handling library.
> +
> + ovn-controller is the primary user of flow match expressions, but
> +  the same syntax, and I imagine the same code, ought to be useful in
> + ovn-nbd for ACL match expressions.
> +
> +** Definition of data structures to represent a match expression as a
> + syntax tree.
> +
> +** Definition of data structures to represent variables (fields).
> +
> + Fields need names and prerequisites. Most fields are numeric and
> +  thus need widths.  We also need a way to represent nominal
> + fields (currently just logical port names). It might be
> + appropriate to associate fields directly with OXM/NXM code points;
> + we have to decide whether we want OVN to use the OVS flow structure
> + or work with OXM more directly.
> +
> + Probably should be defined so that the data structure is also
> + useful for references to fields in action parsing.
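For concreteness, here's the shape of field descriptor I'd imagine (Python sketch; every name here is hypothetical, not a proposal for the eventual C structures):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Field:
    """Hypothetical descriptor for a field usable in match expressions."""
    name: str                        # e.g. "ip.src"
    width: Optional[int]             # bit width; None for nominal fields
    prerequisite: Optional[str]      # match that must also hold, e.g. "ip"
    oxm_field: Optional[int] = None  # OXM/NXM code point, if we tie fields to one

# A numeric field with a prerequisite, and a nominal field (a logical port
# name, which has no width at all).
IP_SRC = Field("ip.src", width=32, prerequisite="ip")
INPORT = Field("inport", width=None, prerequisite=None)
```

The same descriptors could then drive both match parsing and action parsing, per the note above.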
> +
> +** Lexical analysis.
> +
> + Probably should be defined so that the lexer can be reused for
> + parsing actions.
> +
> +** Parsing into syntax tree.
> +
> +** Semantic checking against variable definitions.
> +
> +** Applying prerequisites.
> +
> +** Simplification into conjunction-of-disjunctions (CoD) form.
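A toy illustration of the CoD step (Python, illustrative only; atoms are opaque strings and negation is ignored):

```python
def to_cod(expr):
    """Convert an expression tree -- ("and", l, r), ("or", l, r), or an
    atom string -- into conjunction-of-disjunctions form: a list of
    clauses, each clause a list of atoms that are implicitly OR'd, with
    the clauses themselves implicitly AND'd."""
    if isinstance(expr, str):          # atom
        return [[expr]]
    op, lhs, rhs = expr
    l, r = to_cod(lhs), to_cod(rhs)
    if op == "and":                    # AND just concatenates clause lists
        return l + r
    assert op == "or"                  # OR distributes over the clauses
    return [lc + rc for lc in l for rc in r]

# (a AND b) OR c  becomes  (a OR c) AND (b OR c).
```

Note the OR case can blow up exponentially in the worst case, which is part of why encoding the result with "conjunction" actions on the ovn-controller side matters.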
> +
> +** Transformation from CoD form into OXM matches.
> +
> +* ovn-controller
> +
> +** Flow table handling in ovn-controller.
> +
> + ovn-controller has to transform logical datapath flows from the
> + database into OpenFlow flows.
> +
> +*** Definition (or choice) of data structure for flows and flow table.
> +
> + It would be natural enough to use "struct flow" and "struct
> + classifier" for this. Maybe that is what we should do. However,
> + "struct classifier" is optimized for searches based on packet
> + headers, whereas all we care about here can be implemented with a
> + hash table. Also, we may want to make it easy to add and remove
> + support for fields without recompiling, which is not possible with
> + "struct flow" or "struct classifier".
> +
> +    On the other hand, we may find that it is difficult to decide whether
> +    two OXM flow matches are identical (to normalize them) without a
> + lot of domain-specific knowledge that is already embedded in struct
> + flow. It's also going to be a pain to come up with a way to make
> + anything other than "struct flow" work with the ofputil_*()
> + functions for encoding and decoding OpenFlow.
> +
> + It's also possible we could use struct flow without struct
> + classifier.
> +
> +*** Assembling conjunctive flows from flow match expressions.
> +
> + This transformation explodes logical datapath flows into multiple
> + OpenFlow flow table entries, since a flow match expression in CoD
> + form requires several OpenFlow flow table entries. It also
> +    requires merging together OpenFlow flow table entries that contain
> + "conjunction" actions (really just concatenating their actions).
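Roughly, the explosion and merge steps look like this (Python sketch; matches and actions are plain strings, and the conjunction syntax only loosely mimics ovs-ofctl's):

```python
def explode(clauses, conj_id, actions):
    """Explode a match in CoD form (AND of OR-clauses) into flow entries:
    one entry per atom pointing at (conj_id, clause k of n), plus one
    entry on conj_id carrying the real actions.  A single clause needs
    no conjunction at all."""
    n = len(clauses)
    if n == 1:
        return [(atom, actions) for atom in clauses[0]]
    flows = [(atom, f"conjunction({conj_id},{k}/{n})")
             for k, clause in enumerate(clauses, 1) for atom in clause]
    flows.append((f"conj_id={conj_id}", actions))
    return flows

def merge(flows):
    """Merge entries with identical matches by concatenating actions."""
    out = {}
    for match, action in flows:
        out[match] = out[match] + "," + action if match in out else action
    return out
```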
> +
> +*** Translating logical datapath port names into port numbers.
> +
> + Logical ports are specified by name in logical datapath flows, but
> + OpenFlow only works in terms of numbers.
> +
> +*** Translating logical datapath actions into OpenFlow actions.
> +
> + Some of the logical datapath actions do not have natural
> + representations as OpenFlow actions: they require
> + packet-in/packet-out round trips through ovn-controller. The
> + trickiest part of that is going to be making sure that the
> + packet-out resumes the control flow that was broken off by the
> + packet-in. That's tricky; we'll probably have to restrict control
> + flow or add OVS features to make resuming in general possible. Not
> + sure which is better at this point.
> +
> +*** OpenFlow flow table synchronization.
> +
> + The internal representation of the OpenFlow flow table has to be
> + synced across the controller connection to OVS. This probably
> +    boils down to the "flow monitoring" feature of OF1.4, which was then
> + made available as a "standard extension" to OF1.3. (OVS hasn't
> +    implemented this for OF1.4 yet, but the feature is based on an OVS
> + extension to OF1.0, so it should be straightforward to add it.)
> +
> + We probably need some way to catch cases where OVS and OVN don't
> + see eye-to-eye on what exactly constitutes a flow, so that OVN
> + doesn't waste a lot of CPU time hammering at OVS trying to install
> + something that it's not going to do.
> +
> +*** Logical/physical translation.
> +
> + When a packet comes into the integration bridge, the first stage of
> + processing needs to translate it from a physical to a logical
> + context. When a packet leaves the integration bridge, the final
> + stage of processing needs to translate it back into a physical
> +    context.  ovn-controller needs to populate the OpenFlow flow
> + tables to do these translations.
> +
> +*** Determine how to split logical pipeline across physical nodes.
> +
> + From the original OVN architecture document:
> +
> + The pipeline processing is split between the ingress and egress
> + transport nodes. In particular, the logical egress processing may
> + occur at either hypervisor. Processing the logical egress on the
> + ingress hypervisor requires more state about the egress vif's
> + policies, but reduces traffic on the wire that would eventually be
> +    dropped.  In contrast, processing on the egress hypervisor can reduce
> + broadcast traffic on the wire by doing local replication. We
> + initially plan to process logical egress on the egress hypervisor
> + so that less state needs to be replicated. However, we may change
> + this behavior once we gain some experience writing the logical
> + flows.
> +
> +    The pipeline processing split will influence how tunnel keys
> + are encoded.
> +
> +** Interaction with Open_vSwitch and OVN databases:
> +
> +*** Monitor VIFs attached to the integration bridge in Open_vSwitch.
> +
> + In response to changes, add or remove corresponding rows in
> + Bindings table in OVN.
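The Bindings sync here is essentially set reconciliation, something like (sketch; names are hypothetical):

```python
def sync_bindings(local_vifs, bindings, chassis):
    """Given the iface-ids currently attached to this chassis's
    integration bridge and the current Bindings rows (a map of
    logical_port -> chassis), compute the rows this chassis should
    insert and delete."""
    mine = {lp for lp, ch in bindings.items() if ch == chassis}
    to_insert = local_vifs - mine      # VIFs attached but not yet bound
    to_delete = mine - local_vifs      # bindings for VIFs that went away
    return to_insert, to_delete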
> +
> +*** Populate Chassis row in OVN at startup. Maintain Chassis row over time.
> +
> + (Warn if any other Chassis claims the same IP address.)
> +
> +*** Remove Chassis and Bindings rows from OVN on exit.
> +
> +*** Monitor Chassis table in OVN.
> +
> + Populate Port records for tunnels to other chassis into
> + Open_vSwitch database. As a scale optimization later on, one can
> + populate only records for tunnels to other chassis that have
> + logical networks in common with this one.
> +
> +*** Monitor Pipeline table in OVN, trigger flow table recomputation on change.
> +
> +** ovn-controller parameters and configuration.
> +
> +*** Tunnel encapsulation to publish.
> +
> + Default: VXLAN? Geneve?
> +
> +*** Location of Open_vSwitch database.
> +
> + We can probably use the same default as ovs-vsctl.
> +
> +*** Location of OVN database.
> +
> + Probably no useful default.
> +
> +*** SSL configuration.
> +
> + Can probably get this from Open_vSwitch database.
> +
> +* ovn-nbd
> +
> +** Monitor OVN_Northbound database, trigger Pipeline recomputation on change.
> +
> +** Translate each OVN_Northbound entity into Pipeline logical datapath flows.
> +
> + We have to first sit down and figure out what the general
> + translation of each entity is. The original OVN architecture
> + description at
> + http://openvswitch.org/pipermail/dev/2015-January/050380.html had
> + some sketches of these, but they need to be completed and
> + elaborated.
> +
> + Initially, the simplest way to do this is probably to write
> + straight C code to do a full translation of the entire
> + OVN_Northbound database into the format for the Pipeline table in
> + the OVN database. As scale increases, this will probably be too
> + inefficient since a small change in OVN_Northbound requires a full
> + recomputation. At that point, we probably want to adopt a more
> + systematic approach, such as something akin to the "nlog" system
> + used in NVP (see Koponen et al. "Network Virtualization in
> + Multi-tenant Datacenters", NSDI 2014).
> +
> +** Push logical datapath flows to Pipeline table.
> +
> +** Monitor OVN database Bindings table.
> +
> + Sync rows in the OVN Bindings table to the "up" column in the
> + OVN_Northbound database.
> +
> +* ovsdb-server
> +
> + ovsdb-server should have adequate features for OVN but it probably
> + needs work for scale and possibly for availability as deployments
> + grow. Here are some thoughts.
> +
> + Andy Zhou is looking at these issues.
> +
> +** Scaling number of connections.
> +
> + In typical use today a given ovsdb-server has only a single-digit
> + number of simultaneous connections. The OVN database will have a
> + connection from every hypervisor. This use case needs testing and
> + probably coding work. Here are some possible improvements.
> +
> +*** Reducing amount of data sent to clients.
> +
> + Currently, whenever a row monitored by a client changes,
> + ovsdb-server sends the client every monitored column in the row,
> + even if only one column changes. It might be valuable to reduce
> +   this to only the columns that changed.
> +
> + Also, whenever a column changes, ovsdb-server sends the entire
> + contents of the column. It might be valuable, for columns that
> + are sets or maps, to send only added or removed values or
> +   key-value pairs.
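Computing such a delta is cheap; the savings are on the wire. Roughly (sketch):

```python
def map_delta(old, new):
    """Minimal update for a map column: keys removed, plus key/value
    pairs added or modified.  Applying (removed, changed) to `old`
    reproduces `new` without resending the whole column."""
    removed = sorted(k for k in old if k not in new)
    changed = {k: v for k, v in new.items() if old.get(k) != v}
    return removed, changed
```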
> +
> + Currently, clients monitor the entire contents of a table. It
> + might make sense to allow clients to monitor only rows that
> + satisfy specific criteria, e.g. to allow an ovn-controller to
> + receive only Pipeline rows for logical networks on its hypervisor.
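The row-filtering idea is conceptually just a per-client predicate over monitored rows (sketch; names hypothetical):

```python
def monitor_filter(rows, predicate):
    """Send a client only the monitored rows matching its registered
    predicate, e.g. only Pipeline rows whose logical datapath has a
    port on that client's hypervisor."""
    return {uuid: row for uuid, row in rows.items() if predicate(row)}
```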
> +
> +*** Reducing redundant data and code within ovsdb-server.
> +
> + Currently, ovsdb-server separately composes database update
> + information to send to each of its clients. This is fine for a
> + small number of clients, but it wastes time and memory when
> +   hundreds of clients all want the same updates (as will be the
> + case in OVN).
> +
> + (This is somewhat opposed to the idea of letting a client monitor
> + only some rows in a table, since that would increase the diversity
> + among clients.)
> +
> +*** Multithreading.
> +
> + If it turns out that other changes don't let ovsdb-server scale
> + adequately, we can multithread ovsdb-server. Initially one might
> + only break protocol handling into separate threads, leaving the
> + actual database work serialized through a lock.
> +
> +** Increasing availability.
> +
> + Database availability might become an issue. The OVN system
> + shouldn't grind to a halt if the database becomes unavailable, but
> + it would become impossible to bring VIFs up or down, etc.
> +
> + My current thought on how to increase availability is to add
> + clustering to ovsdb-server, probably via the Raft consensus
> + algorithm. As an experiment, I wrote an implementation of Raft
> + for Open vSwitch that you can clone from:
> +
> + https://github.com/blp/ovs-reviews.git raft
> +
> +** Reducing startup time.
> +
> + As-is, if ovsdb-server restarts, every client will fetch a fresh
> + copy of the part of the database that it cares about. With
> + hundreds of clients, this could cause heavy CPU load on
> + ovsdb-server and use excessive network bandwidth. It would be
> + better to allow incremental updates even across connection loss.
> + One way might be to use "Difference Digests" as described in
> + Epstein et al., "What's the Difference? Efficient Set
> + Reconciliation Without Prior Context". (I'm not yet aware of
> + previous non-academic use of this technique.)
> +
> +* Miscellaneous:
> +
> +** Write ovn-nbctl utility.
> +
> + The idea here is that we need a utility to act on the OVN_Northbound
> + database in a way similar to a CMS, so that we can do some testing
> + without an actual CMS in the picture.
> +
> + No details yet.
> +
> +** Init scripts for ovn-controller (on HVs), ovn-nbd, OVN DB server.
> +
> +** Distribution packaging.
> +
> +* Not yet scoped:
> +
> +** Neutron plugin.
> +
> +*** Create stackforge/networking-ovn repository based on OpenStack's
> +cookiecutter git repo generator
> +
> +*** Document mappings between Neutron data model and the OVN northbound DB
> +
> +*** Create a Neutron ML2 mechanism driver that implements the mappings
> +on Neutron resource requests
> +
> +*** Add synchronization for when we need to sanity check that the OVN
> +northbound DB reflects the current state of the world as intended by
> +Neutron (needed for various failure scenarios)
> +
> +** Gateways.
> diff --git a/ovn/automake.mk b/ovn/automake.mk
> new file mode 100644
> index 0000000..a4951dc
> --- /dev/null
> +++ b/ovn/automake.mk
> @@ -0,0 +1,77 @@
> +# OVN schema and IDL
> +EXTRA_DIST += ovn/ovn.ovsschema
> +pkgdata_DATA += ovn/ovn.ovsschema
> +
> +# OVN E-R diagram
> +#
> +# If "python" or "dot" is not available, then we do not add a graphical
> +# diagram to the documentation.
> +if HAVE_PYTHON
> +if HAVE_DOT
> +ovn/ovn.gv: ovsdb/ovsdb-dot.in ovn/ovn.ovsschema
> + $(AM_V_GEN)$(OVSDB_DOT) --no-arrows $(srcdir)/ovn/ovn.ovsschema > $@
> +ovn/ovn.pic: ovn/ovn.gv ovsdb/dot2pic
> + $(AM_V_GEN)(dot -T plain < ovn/ovn.gv | $(PERL) $(srcdir)/ovsdb/dot2pic -f 3) > $@.tmp && \
> + mv $@.tmp $@
> +OVN_PIC = ovn/ovn.pic
> +OVN_DOT_DIAGRAM_ARG = --er-diagram=$(OVN_PIC)
> +DISTCLEANFILES += ovn/ovn.gv ovn/ovn.pic
> +endif
> +endif
> +
> +# OVN schema documentation
> +EXTRA_DIST += ovn/ovn.xml
> +DISTCLEANFILES += ovn/ovn.5
> +man_MANS += ovn/ovn.5
> +ovn/ovn.5: \
> + ovsdb/ovsdb-doc ovn/ovn.xml ovn/ovn.ovsschema $(OVN_PIC)
> + $(AM_V_GEN)$(OVSDB_DOC) \
> + $(OVN_DOT_DIAGRAM_ARG) \
> + --version=$(VERSION) \
> + $(srcdir)/ovn/ovn.ovsschema \
> + $(srcdir)/ovn/ovn.xml > $@.tmp && \
> + mv $@.tmp $@
> +
> +# OVN northbound schema and IDL
> +EXTRA_DIST += ovn/ovn-nb.ovsschema
> +pkgdata_DATA += ovn/ovn-nb.ovsschema
> +
> +# OVN northbound E-R diagram
> +#
> +# If "python" or "dot" is not available, then we do not add a graphical
> +# diagram to the documentation.
> +if HAVE_PYTHON
> +if HAVE_DOT
> +ovn/ovn-nb.gv: ovsdb/ovsdb-dot.in ovn/ovn-nb.ovsschema
> + $(AM_V_GEN)$(OVSDB_DOT) --no-arrows $(srcdir)/ovn/ovn-nb.ovsschema > $@
> +ovn/ovn-nb.pic: ovn/ovn-nb.gv ovsdb/dot2pic
> + $(AM_V_GEN)(dot -T plain < ovn/ovn-nb.gv | $(PERL) $(srcdir)/ovsdb/dot2pic -f 3) > $@.tmp && \
> + mv $@.tmp $@
> +OVN_NB_PIC = ovn/ovn-nb.pic
> +OVN_NB_DOT_DIAGRAM_ARG = --er-diagram=$(OVN_NB_PIC)
> +DISTCLEANFILES += ovn/ovn-nb.gv ovn/ovn-nb.pic
> +endif
> +endif
> +
> +# OVN northbound schema documentation
> +EXTRA_DIST += ovn/ovn-nb.xml
> +DISTCLEANFILES += ovn/ovn-nb.5
> +man_MANS += ovn/ovn-nb.5
> +ovn/ovn-nb.5: \
> + ovsdb/ovsdb-doc ovn/ovn-nb.xml ovn/ovn-nb.ovsschema $(OVN_NB_PIC)
> + $(AM_V_GEN)$(OVSDB_DOC) \
> + $(OVN_NB_DOT_DIAGRAM_ARG) \
> + --version=$(VERSION) \
> + $(srcdir)/ovn/ovn-nb.ovsschema \
> + $(srcdir)/ovn/ovn-nb.xml > $@.tmp && \
> + mv $@.tmp $@
> +
> +man_MANS += ovn/ovn-controller.8 ovn/ovn-architecture.7
> +EXTRA_DIST += ovn/ovn-controller.8.in ovn/ovn-architecture.7.xml
> +
> +SUFFIXES += .xml
> +%: %.xml
> + $(AM_V_GEN)$(run_python) $(srcdir)/build-aux/xml2nroff \
> + --version=$(VERSION) $< > $@.tmp && mv $@.tmp $@
> +
> +EXTRA_DIST += ovn/TODO
> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
> new file mode 100644
> index 0000000..9ffa036
> --- /dev/null
> +++ b/ovn/ovn-architecture.7.xml
> @@ -0,0 +1,339 @@
> +<?xml version="1.0" encoding="utf-8"?>
> +<manpage program="ovn-architecture" section="7" title="OVN Architecture">
> + <h1>Name</h1>
> + <p>ovn-architecture -- Open Virtual Network architecture</p>
> +
> + <h1>Description</h1>
> +
> + <p>
> + OVN, the Open Virtual Network, is a system to support virtual network
> + abstraction. OVN complements the existing capabilities of OVS to add
> + native support for virtual network abstractions, such as virtual L2 and L3
> + overlays and security groups. Services such as DHCP are also desirable
> + features. Just like OVS, OVN's design goal is to have a production-quality
> + implementation that can operate at significant scale.
> + </p>
> +
> + <p>
> + An OVN deployment consists of several components:
> + </p>
> +
> + <ul>
> + <li>
> + <p>
> + A <dfn>Cloud Management System</dfn> (<dfn>CMS</dfn>), which is
> + OVN's ultimate client (via its users and administrators). OVN
> + integration requires installing a CMS-specific plugin and
> + related software (see below). OVN initially targets OpenStack
> + as CMS.
> + </p>
> +
> + <p>
> + We generally speak of ``the'' CMS, but one can imagine scenarios in
> + which multiple CMSes manage different parts of an OVN deployment.
> + </p>
> + </li>
> +
> + <li>
> + An OVN Database physical or virtual node (or, eventually, cluster)
> + installed in a central location.
> + </li>
> +
> + <li>
> + One or more (usually many) <dfn>hypervisors</dfn>. Hypervisors must run
> + Open vSwitch and implement the interface described in
> + <code>IntegrationGuide.md</code> in the OVS source tree. Any hypervisor
> + platform supported by Open vSwitch is acceptable.
> + </li>
> +
> + <li>
> + <p>
> + Zero or more <dfn>gateways</dfn>. A gateway extends a tunnel-based
> + logical network into a physical network by bidirectionally forwarding
> + packets between tunnels and a physical Ethernet port. This allows
> + non-virtualized machines to participate in logical networks. A gateway
> + may be a physical host, a virtual machine, or an ASIC-based hardware
> + switch that supports the <code>vtep</code>(5) schema. (Support for the
> +      latter will come later in the OVN implementation.)
> + </p>
> +
> + <p>
> +      Hypervisors and gateways are together called <dfn>transport
> +      nodes</dfn> or <dfn>chassis</dfn>.
> + </p>
> + </li>
> + </ul>
> +
> + <p>
> + The diagram below shows how the major components of OVN and related
> + software interact. Starting at the top of the diagram, we have:
> + </p>
> +
> + <ul>
> + <li>
> + The Cloud Management System, as defined above.
> + </li>
> +
> + <li>
> + <p>
> + The <dfn>OVN/CMS Plugin</dfn> is the component of the CMS that
> + interfaces to OVN. In OpenStack, this is a Neutron plugin.
> + The plugin's main purpose is to translate the CMS's notion of logical
> + network configuration, stored in the CMS's configuration database in a
> + CMS-specific format, into an intermediate representation understood by
> + OVN.
> + </p>
> +
> + <p>
> + This component is necessarily CMS-specific, so a new plugin needs to be
> + developed for each CMS that is integrated with OVN. All of the
> + components below this one in the diagram are CMS-independent.
> + </p>
> + </li>
> +
> + <li>
> + <p>
> + The <dfn>OVN Northbound Database</dfn> receives the intermediate
> + representation of logical network configuration passed down by the
> + OVN/CMS Plugin. The database schema is meant to be ``impedance
> + matched'' with the concepts used in a CMS, so that it directly supports
> + notions of logical switches, routers, ACLs, and so on. See
> +      <code>ovn-nb</code>(5) for details.
> + </p>
> +
> + <p>
> + The OVN Northbound Database has only two clients: the OVN/CMS Plugin
> + above it and <code>ovn-nbd</code> below it.
> + </p>
> + </li>
> +
> + <li>
> + <code>ovn-nbd</code>(8) connects to the OVN Northbound Database above it
> + and the OVN Database below it. It translates the logical network
> + configuration in terms of conventional network concepts, taken from the
> + OVN Northbound Database, into logical datapath flows in the OVN Database
> + below it.
> + </li>
> +
> + <li>
> + <p>
> + The <dfn>OVN Database</dfn> is the center of the system. Its clients
> + are <code>ovn-nbd</code>(8) above it and <code>ovn-controller</code>(8)
> + on every transport node below it.
> + </p>
> +
> + <p>
> + The OVN Database contains three kinds of data: <dfn>Physical
> + Network</dfn> (PN) tables that specify how to reach hypervisor and
> + other nodes, <dfn>Logical Network</dfn> (LN) tables that describe the
> + logical network in terms of ``logical datapath flows,'' and
> + <dfn>Binding</dfn> tables that link logical network components'
> + locations to the physical network. The hypervisors populate the PN and
> + Binding tables, whereas <code>ovn-nbd</code>(8) populates the LN
> + tables.
> + </p>
> +
> + <p>
> + OVN Database performance must scale with the number of transport nodes.
> + This will likely require some work on <code>ovsdb-server</code>(1) as
> + we encounter bottlenecks. Clustering for availability may be needed.
> + </p>
> + </li>
> + </ul>
> +
> + <p>
> + The remaining components are replicated onto each hypervisor:
> + </p>
> +
> + <ul>
> + <li>
> + <code>ovn-controller</code>(8) is OVN's agent on each hypervisor and
> + software gateway. Northbound, it connects to the OVN Database to learn
> + about OVN configuration and status and to populate the PN and <code>Bindings</code>
> + tables with the hypervisor's status. Southbound, it connects to
> + <code>ovs-vswitchd</code>(8) as an OpenFlow controller, for control over
> + network traffic, and to the local <code>ovsdb-server</code>(1) to allow
> + it to monitor and control Open vSwitch configuration.
> + </li>
> +
> + <li>
> + <code>ovs-vswitchd</code>(8) and <code>ovsdb-server</code>(1) are
> + conventional components of Open vSwitch.
> + </li>
> + </ul>
> +
> + <pre fixed="yes">
> + CMS
> + |
> + |
> + +-----------|-----------+
> + | | |
> + | OVN/CMS Plugin |
> + | | |
> + | | |
> + | OVN Northbound DB |
> + | | |
> + | | |
> + | ovn-nbd |
> + | | |
> + +-----------|-----------+
> + |
> + |
> + +------+
> + |OVN DB|
> + +------+
> + |
> + |
> + +------------------+------------------+
> + | | |
> + HV 1 | | HV n |
> ++---------------|---------------+ . +---------------|---------------+
> +| | | . | | |
> +| ovn-controller | . | ovn-controller |
> +| | | | . | | | |
> +| | | | | | | |
> +| ovs-vswitchd ovsdb-server | | ovs-vswitchd ovsdb-server |
> +| | | |
> ++-------------------------------+ +-------------------------------+
> + </pre>
> +
> + <h3>Life Cycle of a VIF</h3>
> +
> + <p>
> + Tables and their schemas presented in isolation are difficult to
> + understand. Here's an example.
> + </p>
> +
> + <p>
> + The steps in this example refer often to details of the OVN and OVN
> + Northbound database schemas. Please see <code>ovn</code>(5) and
> + <code>ovn-nb</code>(5), respectively, for the full story on these
> + databases.
> + </p>
> +
> + <ol>
> + <li>
> + A VIF's life cycle begins when a CMS administrator creates a new VIF
> + using the CMS user interface or API and adds it to a switch (one
> + implemented by OVN as a logical switch). The CMS updates its own
> +      configuration.  This includes associating a unique, persistent
> +      identifier <var>vif-id</var> and an Ethernet address <var>mac</var>
> +      with the VIF.
> + </li>
> +
> + <li>
> + The CMS plugin updates the OVN Northbound database to include the new
> + VIF, by adding a row to the <code>Logical_Port</code> table. In the new
> + row, <code>name</code> is <var>vif-id</var>, <code>mac</code> is
> + <var>mac</var>, <code>switch</code> points to the OVN logical switch's
> + Logical_Switch record, and other columns are initialized appropriately.
> + </li>
> +
> + <li>
> +      <code>ovn-nbd</code> receives the OVN Northbound database update.  In
> + turn, it makes the corresponding updates to the OVN database, by adding
> + rows to the OVN database <code>Pipeline</code> table to reflect the new
> +      port, e.g. adding a flow to recognize that packets destined to the new
> +      port's MAC address should be delivered to it, and updating the flow
> +      that delivers broadcast and multicast packets to include the new port.
> + </li>
> +
> + <li>
> + On every hypervisor, <code>ovn-controller</code> receives the
> +      <code>Pipeline</code> table updates that <code>ovn-nbd</code> made in the
> + previous step. As long as the VM that owns the VIF is powered off,
> + <code>ovn-controller</code> cannot do much; it cannot, for example,
> + arrange to send packets to or receive packets from the VIF, because the
> + VIF does not actually exist anywhere.
> + </li>
> +
> + <li>
> + Eventually, a user powers on the VM that owns the VIF. On the hypervisor
> + where the VM is powered on, the integration between the hypervisor and
> + Open vSwitch (described in <code>IntegrationGuide.md</code>) adds the VIF
> + to the OVN integration bridge and stores <var>vif-id</var> in
> + <code>external-ids</code>:<code>iface-id</code> to indicate that the
> + interface is an instantiation of the new VIF. (None of this code is new
> + in OVN; this is pre-existing integration work that has already been done
> + on hypervisors that support OVS.)
> + </li>
> +
> + <li>
> + On the hypervisor where the VM is powered on, <code>ovn-controller</code>
> + notices <code>external-ids</code>:<code>iface-id</code> in the new
> + Interface. In response, it updates the local hypervisor's OpenFlow
> + tables so that packets to and from the VIF are properly handled.
> + Afterward, it updates the <code>Bindings</code> table in the OVN DB,
> + adding a row that links the logical port from
> + <code>external-ids</code>:<code>iface-id</code> to the hypervisor.
> + </li>
> +
> + <li>
> + Some CMS systems, including OpenStack, fully start a VM only when its
> + networking is ready. To support this, <code>ovn-nbd</code> notices the
> + new row in the <code>Bindings</code> table, and pushes this upward by
> + updating the <ref column="up" table="Logical_Port" db="OVN_NB"/> column
> + in the OVN Northbound database's <ref table="Logical_Port" db="OVN_NB"/>
> + table to indicate that the VIF is now up. The CMS, if it uses this
> + feature, can then react by allowing the VM's execution to proceed.
> + </li>
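Steps 6 and 7 amount to a small state push, something like (sketch; table rows modeled as dicts, all names hypothetical):

```python
def bind_vif(bindings, logical_port, chassis):
    """Step 6: ovn-controller records the VIF's physical location by
    adding a Bindings row linking logical port -> hypervisor."""
    bindings[logical_port] = chassis

def propagate_up(bindings, nb_ports):
    """Step 7: ovn-nbd mirrors the Bindings table into the northbound
    Logical_Port "up" column, so the CMS can let the VM proceed."""
    for name, port in nb_ports.items():
        port["up"] = name in bindings
```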
> +
> + <li>
> + On every hypervisor but the one where the VIF resides,
> + <code>ovn-controller</code> notices the new row in the
> + <code>Bindings</code> table. This provides <code>ovn-controller</code>
> + the physical location of the logical port, so each instance updates the
> + OpenFlow tables of its switch (based on logical datapath flows in the OVN
> + DB <code>Pipeline</code> table) so that packets to and from the VIF can
> + be properly handled via tunnels.
> + </li>
> +
> + <li>
> + Eventually, a user powers off the VM that owns the VIF. On the
> + hypervisor where the VM was powered on, the VIF is deleted from the OVN
> + integration bridge.
> + </li>
> +
> + <li>
> + On the hypervisor where the VM was powered on,
> + <code>ovn-controller</code> notices that the VIF was deleted. In
> + response, it removes the logical port's row from the
> + <code>Bindings</code> table.
> + </li>
> +
> + <li>
> + On every hypervisor, <code>ovn-controller</code> notices the row removed
> + from the <code>Bindings</code> table. This means that
> + <code>ovn-controller</code> no longer knows the physical location of the
> + logical port, so each instance updates its OpenFlow table to reflect
> + that.
> + </li>
> +
> + <li>
> + Eventually, when the VIF (or its entire VM) is no longer needed by
> + anyone, an administrator deletes the VIF using the CMS user interface or
> + API. The CMS updates its own configuration.
> + </li>
> +
> + <li>
> + The CMS plugin removes the VIF from the OVN Northbound database,
> + by deleting its row in the <code>Logical_Port</code> table.
> + </li>
> +
> + <li>
> +      <code>ovn-nbd</code> receives the OVN Northbound update and in turn
> + updates the OVN database accordingly, by removing or updating the
> + rows from the OVN database <code>Pipeline</code> table that were related
> + to the now-destroyed VIF.
> + </li>
> +
> + <li>
> + On every hypervisor, <code>ovn-controller</code> receives the
> +      <code>Pipeline</code> table updates that <code>ovn-nbd</code> made in the
> + previous step. <code>ovn-controller</code> updates OpenFlow tables to
> + reflect the update, although there may not be much to do, since the VIF
> + had already become unreachable when it was removed from the
> + <code>Bindings</code> table in a previous step.
> + </li>
> + </ol>
> +
> +</manpage>
> diff --git a/ovn/ovn-controller.8.in b/ovn/ovn-controller.8.in
> new file mode 100644
> index 0000000..59fcb59
> --- /dev/null
> +++ b/ovn/ovn-controller.8.in
> @@ -0,0 +1,41 @@
> +.\" -*- nroff -*-
> +.de IQ
> +. br
> +. ns
> +. IP "\\$1"
> +..
> +.TH ovn\-controller 8 "@VERSION@" "Open vSwitch" "Open vSwitch Manual"
> +.ds PN ovn\-controller
> +.
> +.SH NAME
> +ovn\-controller \- OVN local controller
> +.
> +.SH SYNOPSIS
> +\fBovn\-controller\fR [\fIoptions\fR]
> +.
> +.SH DESCRIPTION
> +\fBovn\-controller\fR is the local controller daemon for OVN, the Open
> +Virtual Network. It connects northbound to the OVN database (see
> +\fBovn\fR(5)) over the OVSDB protocol, and southbound to the Open
> +vSwitch database (see \fBovs-vswitchd.conf.db\fR(5)) over the OVSDB
> +protocol and to \fBovs\-vswitchd\fR(8) via OpenFlow. Each hypervisor
> +and software gateway in an OVN deployment runs its own independent
> +copy of \fBovn\-controller\fR; thus, \fBovn\-controller\fR's
> +southbound connections are machine-local and do not run over a
> +physical network.
> +.PP
> +XXX this is completely skeletal.
> +.
> +.SH OPTIONS
> +.SS "Public Key Infrastructure Options"
> +.so lib/ssl.man
> +.so lib/ssl-peer-ca-cert.man
> +.ds DD
> +.so lib/daemon.man
> +.so lib/vlog.man
> +.so lib/unixctl.man
> +.so lib/common.man
> +.
> +.SH "SEE ALSO"
> +.
> +\fBovn\-architecture\fR(7)
> diff --git a/ovn/ovn-nb.ovsschema b/ovn/ovn-nb.ovsschema
> new file mode 100644
> index 0000000..ad675ac
> --- /dev/null
> +++ b/ovn/ovn-nb.ovsschema
> @@ -0,0 +1,62 @@
> +{
> + "name": "OVN_Northbound",
> + "tables": {
> + "Logical_Switch": {
> + "columns": {
> + "router_port": {"type": {"key": {"type": "uuid",
> + "refTable": "Logical_Router_Port",
> + "refType": "strong"},
> + "min": 0, "max": 1}},
> + "external_ids": {
> + "type": {"key": "string", "value": "string",
> + "min": 0, "max": "unlimited"}}}},
> + "Logical_Port": {
> + "columns": {
> + "switch": {"type": {"key": {"type": "uuid",
> + "refTable": "Logical_Switch",
> + "refType": "strong"}}},
> + "name": {"type": "string"},
> + "macs": {"type": {"key": "string",
> + "min": 0,
> + "max": "unlimited"}},
> + "port_security": {"type": {"key": "string",
> + "min": 0,
> + "max": "unlimited"}},
> + "up": {"type": {"key": "boolean", "min": 0, "max": 1}},
> + "external_ids": {
> + "type": {"key": "string", "value": "string",
> + "min": 0, "max": "unlimited"}}},
> + "indexes": [["name"]]},
> + "ACL": {
> + "columns": {
> + "switch": {"type": {"key": {"type": "uuid",
> + "refTable": "Logical_Switch",
> + "refType": "strong"}}},
> + "priority": {"type": {"key": {"type": "integer",
> + "minInteger": 0,
> + "maxInteger": 65535}}},
> + "match": {"type": "string"},
> + "action": {"type": {"key": {"type": "string",
> + "enum": ["set", ["allow", "allow-related", "drop", "reject"]]}}},
> + "log": {"type": "boolean"},
> + "external_ids": {
> + "type": {"key": "string", "value": "string",
> + "min": 0, "max": "unlimited"}}}},
> + "Logical_Router": {
> + "columns": {
> + "ip": {"type": "string"},
> + "default_gw": {"type": {"key": "string", "min": 0, "max": 1}},
> + "external_ids": {
> + "type": {"key": "string", "value": "string",
> + "min": 0, "max": "unlimited"}}}},
> + "Logical_Router_Port": {
> + "columns": {
> + "router": {"type": {"key": {"type": "uuid",
> + "refTable": "Logical_Router",
> + "refType": "strong"}}},
> + "network": {"type": "string"},
> + "mac": {"type": "string"},
> + "external_ids": {
> + "type": {"key": "string", "value": "string",
> + "min": 0, "max": "unlimited"}}}}},
> + "version": "1.0.0"}
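For concreteness, an OVSDB JSON-RPC "transact" request inserting one Logical_Port row per this schema might look like the sketch below; the switch UUID and port name are made-up example values:

```python
import json

# Sketch of an OVSDB JSON-RPC "transact" request inserting one Logical_Port
# row, using only columns defined in the schema above.  The switch UUID and
# the port name are invented example values.
txn = {
    "method": "transact",
    "id": 0,
    "params": [
        "OVN_Northbound",
        {
            "op": "insert",
            "table": "Logical_Port",
            "row": {
                # References are encoded as ["uuid", "..."] in OVSDB JSON.
                "switch": ["uuid", "6f2f0f12-3c15-4fd0-b1e7-0d5c3f7f0001"],
                "name": "vm1-vif0",
                # Sets are encoded as ["set", [...]].
                "macs": ["set", ["00:11:22:33:44:55"]],
            },
        },
    ],
}
wire_msg = json.dumps(txn)
```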
> diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
> new file mode 100644
> index 0000000..80190ca
> --- /dev/null
> +++ b/ovn/ovn-nb.xml
> @@ -0,0 +1,245 @@
> +<?xml version="1.0" encoding="utf-8"?>
> +<database name="ovn-nb" title="OVN Northbound Database">
> + <p>
> + This database is the interface between OVN and the cloud management system
> + (CMS), such as OpenStack, running above it. The CMS produces almost all of
> + the contents of the database. The <code>ovn-nbd</code> program monitors
> + the database contents, transforms it, and stores it into the <ref
> + db="OVN"/> database.
> + </p>
> +
> + <p>
> + We generally speak of ``the'' CMS, but one can imagine scenarios in
> + which multiple CMSes manage different parts of an OVN deployment.
> + </p>
> +
> + <h2>External IDs</h2>
> +
> + <p>
> + Each of the tables in this database contains a special column, named
> + <code>external_ids</code>. This column has the same form and purpose each
> + place it appears.
> + </p>
> +
> + <dl>
> + <dt><code>external_ids</code>: map of string-string pairs</dt>
> + <dd>
> + Key-value pairs for use by the CMS. The CMS might use certain pairs, for
> + example, to identify entities in its own configuration that correspond to
> + those in this database.
> + </dd>
> + </dl>
> +
> + <table name="Logical_Switch" title="L2 logical switch">
> + <p>
> + Each row represents one L2 logical switch. A given switch's ports are
> + the <ref table="Logical_Port"/> rows whose <ref table="Logical_Port"
> + column="switch"/> column points to its row.
> + </p>
> +
> + <column name="router_port">
> + <p>
> + The router port to which this logical switch is connected, or empty if
> + this logical switch is not connected to any router. A switch may be
> + connected to at most one logical router, but this is not a significant
> + restriction because logical routers may be connected into arbitrary
> + topologies.
> + </p>
> + </column>
> +
> + <group title="Common Columns">
> + <column name="external_ids">
> + See <em>External IDs</em> at the beginning of this document.
> + </column>
> + </group>
> + </table>
> +
> + <table name="Logical_Port" title="L2 logical switch port">
> + <p>
> + A port within an L2 logical switch.
> + </p>
> +
> + <column name="switch">
> + The logical switch to which the logical port is connected.
> + </column>
> +
> + <column name="name">
> + The logical port name. The name used here must match that used in the
> + <ref key="iface-id" table="Interface" column="external_ids"
> + db="Open_vSwitch"/> in the <ref db="Open_vSwitch"/> database's <ref
> + table="Interface" db="Open_vSwitch"/> table, because hypervisors use <ref
> + key="iface-id" table="Interface" column="external_ids"
> + db="Open_vSwitch"/> as a lookup key for logical ports.
> + </column>
> +
> + <column name="up">
> + This column is populated by <code>ovn-nbd</code>, rather than by the CMS
> + plugin as is most of this database. When a logical port is bound to a
> + physical location in the OVN database <ref db="OVN" table="Bindings"/>
> + table, <code>ovn-nbd</code> sets this column to <code>true</code>;
> + otherwise, or if the port becomes unbound later, it sets it to
> + <code>false</code>. This allows the CMS to wait for a VM's networking to
> + become active before it allows the VM to start.
> + </column>
> +
> + <column name="macs">
> + The logical port's own Ethernet address or addresses, each in the form
> + <var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>.
> + Like a physical Ethernet NIC, a logical port ordinarily has a single
> + fixed Ethernet address. The string <code>unknown</code> is also allowed
> + to indicate that the logical port has an unknown set of (additional)
> + source addresses.
> + </column>
> +
> + <column name="port_security">
> + <p>
> + A set of L2 (Ethernet) or L3 (IPv4 or IPv6) addresses or L2+L3 pairs
> + from which the logical port is allowed to send packets and to which it
> + is allowed to receive packets. If this column is empty, all addresses
> + are permitted.
> + </p>
> +
> + <p>
> + Exact syntax is TBD. One could simply use comma- or space-separated L2
> + and L3 addresses in each set member, or replace this by a subset of the
> + general-purpose expression language used for the <ref column="match"
> + table="Pipeline" db="OVN"/> column in the OVN database's <ref
> + table="Pipeline" db="OVN"/> table.
> + </p>
> + </column>
> +
> + <group title="Common Columns">
> + <column name="external_ids">
> + See <em>External IDs</em> at the beginning of this document.
> + </column>
> + </group>
> + </table>
> +
> + <table name="ACL" title="Access Control List (ACL) rule">
> + <p>
> + Each row in this table represents one ACL rule for the logical switch in
> + its <ref column="switch"/> column. The <ref column="action"/> column for
> + the highest-<ref column="priority"/> matching row in this table
> + determines a packet's treatment. If no row matches, packets are allowed
> + by default. (Default-deny treatment is possible: add a rule with <ref
> + column="priority"/> 0, <code>true</code> as <ref column="match"/>, and
> + <code>drop</code> as <ref column="action"/>.)
> + </p>
> +
> + <column name="switch">
> + The switch to which the ACL rule applies. The expression in the
> + <ref column="match"/> column may match against logical ports
> + within this switch.
> + </column>
> +
> + <column name="priority">
> + The ACL rule's priority. Rules with numerically higher priority take
> + precedence over those with lower. If two ACL rules with the same
> + priority both match, then the one actually applied to a packet is
> + undefined.
> + </column>
> +
> + <column name="match">
> + The packets that the ACL should match, in the same expression language
> + used for the <ref column="match" table="Pipeline" db="OVN"/> column in
> + the OVN database's <ref table="Pipeline" db="OVN"/> table. Match
> + <code>inport</code> and <code>outport</code> against names of logical
> + ports within <ref column="switch"/> to implement ingress and egress ACLs,
> + respectively. In logical switches connected to logical routers, the
> + special port name <code>ROUTER</code> refers to the logical router port.
> + </column>
> +
> + <column name="action">
> + <p>The action to take when the ACL rule matches:</p>
> +
> + <ul>
> + <li>
> + <code>allow</code>: Forward the packet.
> + </li>
> +
> + <li>
> + <code>allow-related</code>: Forward the packet and related traffic
> + (e.g. inbound replies to an outbound connection).
> + </li>
> +
> + <li>
> + <code>drop</code>: Silently drop the packet.
> + </li>
> +
> + <li>
> + <code>reject</code>: Drop the packet, replying with a RST for TCP or
> + ICMP unreachable message for other IP-based protocols.
> + </li>
> + </ul>
> + </column>
> +
> + <column name="log">
> + If set to <code>true</code>, packets that match the ACL will trigger a
> + log message on the transport node or nodes that perform ACL processing.
> + Logging may be combined with any <ref column="action"/>.
> + </column>
> +
> + <group title="Common Columns">
> + <column name="external_ids">
> + See <em>External IDs</em> at the beginning of this document.
> + </column>
> + </group>
> + </table>
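The lookup semantics described above (highest-priority match wins, allow when nothing matches) can be sketched as follows; the rule representation, with a match predicate over a packet dict, is illustrative only:

```python
# Sketch of the ACL lookup semantics: the action of the highest-priority
# matching rule applies; with no matching rule, the packet is allowed.
# The rule representation here is illustrative, not OVN's.

def acl_action(rules, packet):
    best = None
    for rule in rules:
        if rule["match"](packet):
            if best is None or rule["priority"] > best["priority"]:
                best = rule
    return best["action"] if best else "allow"   # default-allow

rules = [
    # Default-deny: priority 0, match everything, drop.
    {"priority": 0, "match": lambda p: True, "action": "drop"},
    # Permit inbound SSH and its related traffic.
    {"priority": 100, "match": lambda p: p.get("tcp.dst") == 22,
     "action": "allow-related"},
]
```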
> +
> + <table name="Logical_Router" title="L3 logical router">
> + <p>
> + Each row represents one L3 logical router. A given router's ports are
> + the <ref table="Logical_Router_Port"/> rows whose <ref
> + table="Logical_Router_Port" column="router"/> column points to its row.
> + </p>
> +
> + <column name="ip">
> + The logical router's own IP address. The logical router uses this
> + address as the source for ICMP messages (e.g. network unreachable
> + messages) and other traffic that it originates, and it responds to
> + traffic destined to this address (e.g. ICMP echo requests).
> + </column>
> +
> + <column name="default_gw">
> + IP address to use as default gateway, if any.
> + </column>
> +
> + <group title="Common Columns">
> + <column name="external_ids">
> + See <em>External IDs</em> at the beginning of this document.
> + </column>
> + </group>
> + </table>
> +
> + <table name="Logical_Router_Port" title="L3 logical router port">
> + <p>
> + A port within an L3 logical router.
> + </p>
> +
> + <p>
> + A router port is always attached to a switch port. The connection can be
> + identified by following the <ref column="router_port"
> + table="Logical_Port"/> column from an appropriate <ref
> + table="Logical_Port"/> row.
> + </p>
> +
> + <column name="router">
> + The router to which the port belongs.
> + </column>
> +
> + <column name="network">
> + The IP network and netmask of the network on the router port. Used for
> + routing.
> + </column>
> +
> + <column name="mac">
> + The Ethernet address that belongs to this router port.
> + </column>
> +
> + <group title="Common Columns">
> + <column name="external_ids">
> + See <em>External IDs</em> at the beginning of this document.
> + </column>
> + </group>
> + </table>
> +</database>
> diff --git a/ovn/ovn.ovsschema b/ovn/ovn.ovsschema
> new file mode 100644
> index 0000000..5597df4
> --- /dev/null
> +++ b/ovn/ovn.ovsschema
> @@ -0,0 +1,50 @@
> +{
> + "name": "OVN",
> + "tables": {
> + "Chassis": {
> + "columns": {
> + "name": {"type": "string"},
> + "encap": {"type": {"key": {"type": "string",
> + "enum": ["set", ["stt", "vxlan", "gre"]]}}},
> + "encap_options": {"type": {"key": "string",
> + "value": "string",
> + "min": 0,
> + "max": "unlimited"}},
> + "ip": {"type": "string"},
> + "gateway_ports": {"type": {"key": "string",
> + "value": {"type": "uuid",
> + "refTable": "Gateway",
> + "refType": "strong"},
> + "min": 0,
> + "max": "unlimited"}}},
> + "isRoot": true,
> + "indexes": [["name"]]},
> + "Gateway": {
> + "columns": {"attached_port": {"type": "string"},
> + "vlan_map": {"type": {"key": {"type": "integer",
> + "minInteger": 0,
> + "maxInteger": 4095},
> + "value": {"type": "string"},
> + "min": 0,
> + "max": "unlimited"}}}},
> + "Pipeline": {
> + "columns": {
> + "table_id": {"type": {"key": {"type": "integer",
> + "minInteger": 0,
> + "maxInteger": 127}}},
> + "priority": {"type": {"key": {"type": "integer",
> + "minInteger": 0,
> + "maxInteger": 65535}}},
> + "match": {"type": "string"},
> + "actions": {"type": "string"}},
> + "isRoot": true},
> + "Bindings": {
> + "columns": {
> + "logical_port": {"type": "string"},
> + "chassis": {"type": "string"},
> + "mac": {"type": {"key": "string",
> + "min": 0,
> + "max": "unlimited"}}},
> + "indexes": [["logical_port"]],
> + "isRoot": true}},
> + "version": "1.0.0"}
> diff --git a/ovn/ovn.xml b/ovn/ovn.xml
> new file mode 100644
> index 0000000..a233112
> --- /dev/null
> +++ b/ovn/ovn.xml
> @@ -0,0 +1,497 @@
> +<?xml version="1.0" encoding="utf-8"?>
> +<database name="ovn" title="OVN Database">
> + <p>
> + This database holds logical and physical configuration and state for the
> + Open Virtual Network (OVN) system to support virtual network abstraction.
> + For an introduction to OVN, please see <code>ovn-architecture</code>(7).
> + </p>
> +
> + <p>
> + The OVN database sits at the center of the OVN architecture. It is the one
> + component that speaks both southbound directly to all the hypervisors and
> + gateways, via <code>ovn-controller</code>, and northbound to the Cloud
> + Management System, via <code>ovn-nbd</code>.
> + </p>
> +
> + <h2>Database Structure</h2>
> +
> + <p>
> + The OVN database contains three classes of data with different properties,
> + as described in the sections below.
> + </p>
> +
> + <h3>Physical Network (PN) data</h3>
> +
> + <p>
> + PN tables contain information about the chassis nodes in the system. They
> + include all the information necessary to wire the overlay, such as IP
> + addresses, supported tunnel types, and security keys.
> + </p>
> +
> + <p>
> + The amount of PN data is small (O(n) in the number of chassis) and it
> + changes infrequently, so it can be replicated to every chassis.
> + </p>
> +
> + <p>
> + The <ref table="Chassis"/> and <ref table="Gateway"/> tables comprise the
> + PN tables.
> + </p>
> +
> + <h3>Logical Network (LN) data</h3>
> +
> + <p>
> + LN tables contain the topology of logical switches and routers, ACLs,
> + firewall rules, and everything needed to describe how packets traverse a
> + logical network, represented as logical datapath flows (see Logical
> + Datapath Flows, below).
> + </p>
> +
> + <p>
> + LN data may be large (O(n) in the number of logical ports, ACL rules,
> + etc.). Thus, to improve scaling, each chassis should receive only data
> + related to logical networks in which that chassis participates. Past
> + experience shows that in the presence of large logical networks, even
> + finer-grained partitioning of data, e.g. designing logical flows so that
> + only the chassis hosting a logical port needs related flows, pays off
> + scale-wise. (This is not necessary initially but it is worth bearing in
> + mind in the design.)
> + </p>
> +
> + <p>
> + The LN is a slave of the cloud management system running northbound of OVN.
> + That CMS determines the entire OVN logical configuration and therefore the
> + LN's content at any given time is a deterministic function of the CMS's
> + configuration, although that happens indirectly via the OVN Northbound DB
> + and <code>ovn-nbd</code>.
> + </p>
> +
> + <p>
> + LN data is likely to change more quickly than PN data. This is especially
> + true in a container environment where VMs are created and destroyed (and
> + therefore added to and deleted from logical switches) quickly.
> + </p>
> +
> + <p>
> + The <ref table="Pipeline"/> table is currently the only LN table.
> + </p>
> +
> + <h3>Bindings data</h3>
> +
> + <p>
> + The Bindings tables contain the current placement of logical components
> + (such as VMs and VIFs) onto chassis and the bindings between logical ports
> + and MACs.
> + </p>
> +
> + <p>
> + Bindings change frequently, at least every time a VM powers up or down
> + or migrates, and especially quickly in a container environment. The
> + amount of data per VM (or VIF) is small.
> + </p>
> +
> + <p>
> + Each chassis is authoritative about the VMs and VIFs that it hosts at any
> + given time and can efficiently flood that state to a central location, so
> + the consistency needs are minimal.
> + </p>
> +
> + <p>
> + The <ref table="Bindings"/> table is currently the only Bindings table.
> + </p>
> +
> + <table name="Chassis" title="Physical Network Hypervisor and Gateway Information">
> + <p>
> + Each row in this table represents a hypervisor or gateway (a chassis) in
> + the physical network (PN). Each chassis, via
> + <code>ovn-controller</code>, adds and updates its own row, and keeps a
> + copy of the remaining rows to determine how to reach other hypervisors.
> + </p>
> +
> + <p>
> + When a chassis shuts down gracefully, it should remove its own row.
> + (This is not critical because resources hosted on the chassis are equally
> + unreachable regardless of whether the row is present.) If a chassis
> + shuts down permanently without removing its row, some kind of manual or
> + automatic cleanup is eventually needed; we can devise a process for that
> + as necessary.
> + </p>
> +
> + <column name="name">
> + A chassis name, taken from <ref key="system-id" table="Open_vSwitch"
> + column="external_ids" db="Open_vSwitch"/> in the Open_vSwitch
> + database's <ref table="Open_vSwitch" db="Open_vSwitch"/> table. OVN does
> + not prescribe a particular format for chassis names.
> + </column>
> +
> + <group title="Encapsulation">
> + <p>
> + These columns together identify how OVN may transmit logical dataplane
> + packets to this chassis.
> + </p>
> +
> + <column name="encap">
> + The encapsulation to use to transmit packets to this chassis.
> + </column>
> +
> + <column name="encap_options">
> + Options for configuring the encapsulation, e.g. IPsec parameters when
> + IPsec support is introduced. No options are currently defined.
> + </column>
> +
> + <column name="ip">
> + The IPv4 address of the encapsulation tunnel endpoint.
> + </column>
> + </group>
> +
> + <group title="Gateway Configuration">
> + <p>
> + A <dfn>gateway</dfn> is a chassis that forwards traffic between a
> + logical network and a physical VLAN. Gateways are typically dedicated
> + nodes that do not host VMs.
> + </p>
> +
> + <column name="gateway_ports">
> + Maps from the name of a gateway port, which is typically a physical
> + port (e.g. <code>eth1</code>) or an Open vSwitch patch port, to a <ref
> + table="Gateway"/> record that describes the details of the gatewaying
> + function.
> + </column>
> + </group>
> + </table>
> +
> + <table name="Gateway" title="Physical Network Gateway Ports">
> + <p>
> + The <ref column="gateway_ports" table="Chassis"/> column in the <ref
> + table="Chassis"/> table refers to rows in this table to connect a chassis
> + port to a gateway function. Each row in this table describes the logical
> + networks to which a gateway port is attached. Each chassis, via
> + <code>ovn-controller</code>(8), adds and updates its own rows, if any
> + (since most chassis are not gateways), and keeps a copy of the remaining
> + rows to determine how to reach other chassis.
> + </p>
> +
> + <column name="vlan_map">
> + Maps from a VLAN ID to a logical port name. Thus, each named logical
> + port corresponds to one VLAN on the gateway port.
> + </column>
> +
> + <column name="attached_port">
> + The name of the gateway port in the chassis's Open vSwitch integration
> + bridge.
> + </column>
> + </table>
> +
> + <table name="Pipeline" title="Logical Network Pipeline">
> + <p>
> + Each row in this table represents one logical flow. The cloud management
> + system, via its OVN integration, populates this table with logical flows
> + that implement the L2 and L3 topology specified in the CMS configuration.
> + Each hypervisor, via <code>ovn-controller</code>, translates the logical
> + flows into OpenFlow flows specific to its hypervisor and installs them
> + into Open vSwitch.
> + </p>
> +
> + <p>
> + Logical flows are expressed in an OVN-specific format, described here. A
> + logical datapath flow is much like an OpenFlow flow, except that the
> + flows are written in terms of logical ports and logical datapaths instead
> + of physical ports and physical datapaths. Translation between logical
> + and physical flows helps to ensure isolation between logical datapaths.
> + (The logical flow abstraction also allows the CMS to do less work, since
> + it does not have to separately compute and push out physical
> + flows to each chassis.)
> + </p>
> +
> + <p>
> + The default action when no flow matches is to drop packets.
> + </p>
> +
> + <column name="table_id">
> + The stage in the logical pipeline, analogous to an OpenFlow table number.
> + </column>
> +
> + <column name="priority">
> + The flow's priority. Flows with numerically higher priority take
> + precedence over those with lower. If two logical datapath flows with the
> + same priority both match, then the one actually applied to the packet is
> + undefined.
> + </column>
> +
> + <column name="match">
> + <p>
> + A matching expression. OVN provides a superset of OpenFlow matching
> + capabilities, using a syntax similar to Boolean expressions in a
> + programming language.
> + </p>
> +
> + <p>
> + Matching expressions have two important kinds of primary expression:
> + <dfn>fields</dfn> and <dfn>constants</dfn>. A field names a piece of
> + data or metadata. The supported fields are:
> + </p>
> +
> + <ul>
> + <li>
> + <code>metadata</code> <code>reg0</code> ... <code>reg7</code>
> + <code>xreg0</code> ... <code>xreg3</code>
> + </li>
> + <li><code>inport</code> <code>outport</code> <code>queue</code></li>
> + <li><code>eth.src</code> <code>eth.dst</code> <code>eth.type</code></li>
> + <li><code>vlan.tci</code> <code>vlan.vid</code> <code>vlan.pcp</code> <code>vlan.present</code></li>
> + <li><code>ip.proto</code> <code>ip.dscp</code> <code>ip.ecn</code> <code>ip.ttl</code> <code>ip.frag</code></li>
> + <li><code>ip4.src</code> <code>ip4.dst</code></li>
> + <li><code>ip6.src</code> <code>ip6.dst</code> <code>ip6.label</code></li>
> + <li><code>arp.op</code> <code>arp.spa</code> <code>arp.tpa</code> <code>arp.sha</code> <code>arp.tha</code></li>
> + <li><code>tcp.src</code> <code>tcp.dst</code> <code>tcp.flags</code></li>
> + <li><code>udp.src</code> <code>udp.dst</code></li>
> + <li><code>sctp.src</code> <code>sctp.dst</code></li>
> + <li><code>icmp4.type</code> <code>icmp4.code</code></li>
> + <li><code>icmp6.type</code> <code>icmp6.code</code></li>
> + <li><code>nd.target</code> <code>nd.sll</code> <code>nd.tll</code></li>
> + </ul>
> +
> + <p>
> + Subfields may be addressed using a <code>[]</code> suffix,
> + e.g. <code>tcp.src[0..7]</code> refers to the low 8 bits of the TCP
> + source port. A subfield may be used in any context a field is allowed.
> + </p>
> +
> + <p>
> + Some fields have prerequisites. OVN implicitly adds clauses to satisfy
> + these. For example, <code>arp.op == 1</code> is equivalent to
> + <code>eth.type == 0x0806 && arp.op == 1</code>, and
> + <code>tcp.src == 80</code> is equivalent to <code>(eth.type == 0x0800
> + || eth.type == 0x86dd) && ip.proto == 6 && tcp.src ==
> + 80</code>.
> + </p>
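The implicit-prerequisite expansion could be modeled as below; the prerequisite table is assumed and incomplete, covering only the examples given in the text:

```python
# Sketch of implicit prerequisite expansion as described above.  The
# prerequisite strings come from the examples in the text; the table is
# illustrative and incomplete.
PREREQS = {
    "arp": "eth.type == 0x0806",
    "ip4": "eth.type == 0x0800",
    "tcp": "(eth.type == 0x0800 || eth.type == 0x86dd) && ip.proto == 6",
}

def add_prerequisites(expr):
    """Prepend the prerequisite clause implied by the field's prefix."""
    prefix = expr.split(".", 1)[0]
    prereq = PREREQS.get(prefix)
    return f"{prereq} && {expr}" if prereq else expr
```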
> +
> + <p>
> + Most fields have integer values. Integer constants may be expressed in
> + several forms: decimal integers, hexadecimal integers prefixed by
> + <code>0x</code>, dotted-quad IPv4 addresses, IPv6 addresses in their
> + standard forms, and as Ethernet addresses as colon-separated hex
> + digits. A constant in any of these forms may be followed by a slash
> + and a second constant (the mask) in the same form, to form a masked
> + constant. IPv4 and IPv6 masks may be given as integers, to express
> + CIDR prefixes.
> + </p>
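A sketch of parsing these constant forms into value/mask bit patterns, assuming 32-bit fields and omitting Ethernet and IPv6 addresses for brevity:

```python
import ipaddress

# Sketch of parsing the constant forms listed above into (value, mask)
# bit patterns.  Ethernet and IPv6 addresses would be handled analogously
# but are omitted here; a 32-bit field width is assumed.
def parse_masked_constant(text, width=32):
    value_s, slash, mask_s = text.partition("/")

    def parse_one(s):
        if "." in s:                        # dotted-quad IPv4 address
            return int(ipaddress.IPv4Address(s))
        return int(s, 0)                    # decimal, or hex with 0x prefix

    value = parse_one(value_s)
    if not slash:
        mask = (1 << width) - 1             # no mask: all bits significant
    elif "." in mask_s:
        mask = parse_one(mask_s)            # dotted-quad mask
    else:                                   # integer mask = CIDR prefix length
        n = int(mask_s)
        mask = ((1 << n) - 1) << (width - n)
    return value, mask
```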
> +
> + <p>
> + The <code>inport</code> and <code>outport</code> fields have string
> + values. The useful values are <ref column="logical_port"
> + table="Bindings"/> names from the <ref table="Bindings"/> and <ref
> + table="Gateway"/> tables.
> + </p>
> +
> + <p>
> + The available operators, from highest to lowest precedence, are:
> + </p>
> +
> + <ul>
> + <li><code>()</code></li>
> + <li><code>== != < <= > >= in not in</code></li>
> + <li><code>!</code></li>
> + <li><code>&&</code></li>
> + <li><code>||</code></li>
> + </ul>
> +
> + <p>
> + The <code>()</code> operator is used for grouping.
> + </p>
> +
> + <p>
> + The equality operator <code>==</code> is the most important operator.
> + Its operands must be a field and an optionally masked constant, in
> + either order. The <code>==</code> operator yields true when the
> + field's value equals the constant's value for all the bits included in
> + the mask. The <code>==</code> operator translates simply and naturally
> + to OpenFlow.
> + </p>
> +
> + <p>
> + The inequality operator <code>!=</code> yields the inverse of
> + <code>==</code> but its syntax and use are the same. Implementation of
> + the inequality operator is expensive.
> + </p>
> +
> + <p>
> + The relational operators are <code><</code>, <code><=</code>,
> + <code>></code>, and <code>>=</code>. Their
> + operands must be a field and a constant, in either order; the constant
> + must not be masked. These operators are most commonly useful for L4
> + ports, e.g. <code>tcp.src < 1024</code>. Implementation of the
> + relational operators is expensive.
> + </p>
> +
> + <p>
> + The set membership operator <code>in</code>, with syntax
> + ``<code><var>field</var> in { <var>constant1</var>,
> + <var>constant2</var>,</code> ... <code>}</code>'', is syntactic sugar
> + for ``<code>(<var>field</var> == <var>constant1</var> ||
> + <var>field</var> == <var>constant2</var> || </code>...<code>)</code>''.
> + Conversely, ``<code><var>field</var> not in { <var>constant1</var>,
> + <var>constant2</var>, </code>...<code> }</code>'' is syntactic sugar
> + for ``<code>(<var>field</var> != <var>constant1</var> &&
> + <var>field</var> != <var>constant2</var> &&
> + </code>...<code>)</code>''.
> + </p>
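The desugaring could be sketched as a simple textual rewrite; this handles only the literal forms shown above:

```python
import re

# Sketch of desugaring "field in { c1, c2, ... }" into a disjunction of
# equality tests, as described above; "not in" becomes a conjunction of
# inequalities.  Only simple literal constants are handled.
def desugar_in(expr):
    def expand(m):
        field, negate, body = m.group(1), m.group(2), m.group(3)
        consts = [c.strip() for c in body.split(",")]
        if negate:
            return "(" + " && ".join(f"{field} != {c}" for c in consts) + ")"
        return "(" + " || ".join(f"{field} == {c}" for c in consts) + ")"
    return re.sub(r"([\w.]+)\s+(not\s+)?in\s*\{([^}]*)\}", expand, expr)
```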
> +
> + <p>
> + The unary prefix operator <code>!</code> yields its operand's inverse.
> + </p>
> +
> + <p>
> + The logical AND operator <code>&&</code> yields true only if
> + both of its operands are true.
> + </p>
> +
> + <p>
> + The logical OR operator <code>||</code> yields true if at least one of
> + its operands is true.
> + </p>
> +
> + <p>
> + Finally, the keywords <code>true</code> and <code>false</code> may also
> + be used in matching expressions. <code>true</code> is useful by itself
> + as a catch-all expression that matches every packet.
> + </p>
> +
> + <p>
> + (The above is pretty ambitious. It probably makes sense to initially
> + implement only a subset of this specification. The full specification
> + is written out mainly to get an idea of what a fully general matching
> + expression language could include.)
> + </p>
> + </column>
> +
> + <column name="actions">
> + <p>
> + Below, a <var>value</var> is either a <var>constant</var> or a
> + <var>field</var>. The following actions seem most likely to be useful:
> + </p>
> +
> + <dl>
> + <dt><code>drop;</code></dt>
> + <dd>syntactic sugar for no actions</dd>
> +
> + <dt><code>output(<var>value</var>);</code></dt>
> + <dd>output to port</dd>
> +
> + <dt><code>broadcast;</code></dt>
> + <dd>output to every logical port except ingress port</dd>
> +
> + <dt><code>resubmit;</code></dt>
> + <dd>execute next logical datapath table as subroutine</dd>
> +
> + <dt><code>set(<var>field</var>=<var>value</var>);</code></dt>
> + <dd>set data or metadata field, or copy between fields</dd>
> + </dl>
> +
> + <p>
> + The following actions are not yet well thought out:
> + </p>
> +
> + <dl>
> + <dt><code>learn</code></dt>
> +
> + <dt><code>conntrack</code></dt>
> +
> + <dt><code>with(<var>field</var>=<var>value</var>) { <var>action</var>, </code>...<code> }</code></dt>
> + <dd>execute <var>actions</var> with temporary changes to <var>fields</var></dd>
> +
> + <dt><code>dec_ttl { <var>action</var>, </code>...<code> } { <var>action</var>; </code>...<code>}</code></dt>
> + <dd>
> + decrement TTL; execute first set of actions if
> + successful, second set if TTL decrement fails
> + </dd>
> +
> + <dt><code>icmp_reply { <var>action</var>, </code>...<code> }</code></dt>
> + <dd>generate ICMP reply from packet, execute <var>action</var>s</dd>
> +
> + <dt><code>arp { <var>action</var>, </code>...<code> }</code></dt>
> + <dd>generate ARP from packet, execute <var>action</var>s</dd>
> + </dl>
> +
> + <p>
> + Other actions can be added as needed
> + (e.g. <code>push_vlan</code>, <code>pop_vlan</code>,
> + <code>push_mpls</code>, <code>pop_mpls</code>).
> + </p>
> +
> + <p>
> + Some of the OVN actions do not map directly to OpenFlow actions, e.g.:
> + </p>
> +
> + <ul>
> + <li>
> + <code>with</code>: Implemented as <code>stack_push;
> + set(</code>...<code>); <var>actions</var>; stack_pop</code>.
> + </li>
> +
> + <li>
> + <code>dec_ttl</code>: Implemented as <code>dec_ttl</code> followed
> + by the successful actions. The failure case has to be implemented by
> + ovn-controller interpreting packet-ins. It might be difficult to
> + identify the particular place in the processing pipeline in
> + <code>ovn-controller</code>; maybe some restrictions will be
> + necessary.
> + </li>
> +
> + <li>
> + <code>icmp_reply</code>: Implemented by sending the packet to
> + <code>ovn-controller</code>, which generates the ICMP reply and sends
> + the packet back to <code>ovs-vswitchd</code>.
> + </li>
> + </ul>
> + </column>
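A minimal sketch of splitting an actions string into its statements, covering only the simple "name;" and "name(arg);" forms above (the block-structured actions like with and dec_ttl are not handled):

```python
import re

# Sketch of splitting an actions string such as "set(reg0=1); output(outport);"
# into (name, argument) pairs.  Only the simple "name;" and "name(arg);" forms
# above are handled; block-structured actions are not.
def parse_actions(text):
    actions = []
    for stmt in filter(None, (s.strip() for s in text.split(";"))):
        m = re.fullmatch(r"(\w+)(?:\((.*)\))?", stmt)
        if not m:
            raise ValueError(f"bad action: {stmt!r}")
        actions.append((m.group(1), m.group(2)))
    return actions
```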
> + </table>
> +
> + <table name="Bindings" title="Physical-Logical Bindings">
> + <p>
> + Each row in this table identifies the physical location of a logical
> + port. Each hypervisor, via <code>ovn-controller</code>, populates this
> + table with rows for the logical ports that are located on its hypervisor,
> + which <code>ovn-controller</code> in turn finds out by monitoring the
> + local hypervisor's Open_vSwitch database, which identifies logical ports
> + via the conventions described in <code>IntegrationGuide.md</code>.
> + </p>
> +
> + <p>
> + When a chassis shuts down gracefully, it should remove its bindings.
> + (This is not critical because resources hosted on the chassis are equally
> + unreachable regardless of whether their rows are present.) To handle the
> + case where a VM is shut down abruptly on one chassis, then brought up
> + again on a different one, <code>ovn-controller</code> must delete any
> + existing <ref table="Bindings"/> record for a logical port when it adds a
> + new one.
> + </p>
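The rebinding rule in the paragraph above can be sketched as follows; the class and method names are illustrative:

```python
# Sketch of the rebinding rule described above: inserting a binding for a
# logical port replaces any stale row for that port, so a VM that comes
# back up on another chassis leaves exactly one Bindings row.
class Bindings:
    def __init__(self):
        self.rows = {}                 # logical_port -> chassis

    def bind(self, logical_port, chassis):
        # Keying on logical_port makes delete-then-insert implicit here;
        # real ovn-controller must do the delete explicitly.
        self.rows[logical_port] = chassis

b = Bindings()
b.bind("vif1", "hv1")
b.bind("vif1", "hv2")                  # VM abruptly moved to another chassis
```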
> +
> + <column name="logical_port">
> + A logical port, taken from <ref key="iface-id" table="Interface"
> + column="external_ids" db="Open_vSwitch"/> in the Open_vSwitch database's
> + <ref table="Interface" db="Open_vSwitch"/> table. OVN does not prescribe
> + a particular format for the logical port ID.
> + </column>
> +
> + <column name="chassis">
> + The physical location of the logical port. To successfully identify a
> + chassis, this column must match the <ref table="Chassis" column="name"/>
> + column in some row in the <ref table="Chassis"/> table.
> + </column>
> +
> + <column name="mac">
> + <p>
> + The Ethernet address or addresses used as a source address on the
> + logical port, each in the form
> + <var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>:<var>xx</var>.
> + The string <code>unknown</code> is also allowed to indicate that the
> + logical port has an unknown set of (additional) source addresses.
> + </p>
> +
> + <p>
> + A VM interface would ordinarily have a single Ethernet address. A
> + gateway port might initially only have <code>unknown</code>, and then
> + add MAC addresses to the set as it learns new source addresses.
> + </p>
> + </column>
> + </table>
> +</database>
> --
> 2.1.3
>