[ovs-dev] [PATCH v4 07/14] Implement serializing the state of packet traversal in "continuations".

Jarno Rajahalme jarno at ovn.org
Fri Feb 19 23:43:35 UTC 2016


With small comments below:

Acked-by: Jarno Rajahalme <jarno at ovn.org>

> On Feb 19, 2016, at 12:34 AM, Ben Pfaff <blp at ovn.org> wrote:
> 
> One purpose of OpenFlow packet-in messages is to allow a controller to
> interpose on the path of a packet through the flow tables.  If, for
> example, the controller needs to modify a packet in some way that the
> switch doesn't directly support, the controller should be able to
> program the switch to send it the packet, then modify the packet and
> send it back to the switch to continue through the flow table.
> 
> That's the theory.  In practice, this doesn't work with any but the
> simplest flow tables.  Packet-in messages simply don't include enough
> context to allow the flow table traversal to continue.  For example:
> 
>    * Via "resubmit" actions, an Open vSwitch packet can have an
>      effective "call stack", but a packet-in can't describe it, and
>      so it would be lost.
> 
>    * Via "patch ports", an Open vSwitch packet can traverse multiple
>      OpenFlow logical switches.  A packet-in can't describe or resume
>      this context.
> 

Is there any context regarding this that needs to be described?

>    * A packet-in can't preserve the stack used by NXAST_STACK_PUSH and
>      NXAST_STACK_POP actions.
> 
>    * A packet-in can't preserve the OpenFlow 1.1+ action set.
> 
>    * A packet-in can't preserve the state of Open vSwitch mirroring
>      or connection tracking.
> 
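
The list above amounts to per-packet state that sits outside what a plain packet-in carries (packet, metadata, table ID, cookie). The sketch below is a minimal illustration, assuming a hypothetical struct traversal_state and helper packet_in_suffices(). The field names are invented and do not reflect Open vSwitch's internal translation structures. It shows that a plain packet-in loses nothing only when every piece of that extra state is empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the traversal state listed above.  Field names
 * are illustrative only; they are not Open vSwitch internals. */
struct traversal_state {
    size_t resubmit_depth;    /* Frames on the "resubmit" call stack. */
    size_t patch_port_hops;   /* Bridges entered through patch ports. */
    size_t stack_bytes;       /* Bytes pushed by NXAST_STACK_PUSH. */
    size_t action_set_len;    /* Actions in the OpenFlow 1.1+ action set. */
    bool mirroring_state;     /* Mirroring decisions already made. */
    bool conntrack_state;     /* Connection tracking state in effect. */
};

/* Returns true only in the simple case where none of the extra state is in
 * use, so that a plain packet-in would not drop any context. */
static bool
packet_in_suffices(const struct traversal_state *s)
{
    return s->resubmit_depth == 0
           && s->patch_port_hops == 0
           && s->stack_bytes == 0
           && s->action_set_len == 0
           && !s->mirroring_state
           && !s->conntrack_state;
}
```

Any single nonzero field, such as one level of "resubmit", is enough to make a plain packet-in lossy. That matches the claim that only the simplest flow tables work today.
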
> This commit introduces a solution called "continuations".  A continuation
> is the state of a packet's traversal through OpenFlow flow tables.  A
> "controller" action with the "pause" flag, which is newly implemented in
> this comit, generates a continuation and sends it to the OpenFlow

"commit"

> controller in a packet-in asynchronous message (only NXT_PACKET_IN2
> supports continuations, so the controller must configure them with
> NXT_SET_PACKET_IN_FORMAT).  The controller processes the packet-in,
> possibly modifying some of its data, and sends it back to the switch with
> an NXT_RESUME request, which causes flow table traversal to continue.  In
> principle, a single packet can be paused and resumed multiple times.
> 
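
The pause/resume cycle described above can be modeled with a toy pipeline. This is a minimal sketch, not Open vSwitch code. The action types, struct toy_continuation and toy_run() are all hypothetical, and the "continuation" here holds only a resume index and one register. A real continuation also has to capture the resubmit stack, action set and so on. A "pause" action snapshots that state. The controller edits the snapshot and then resumes from the saved index instead of restarting at the first table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy actions; not Open vSwitch action types. */
enum toy_action_type { TOY_ADD, TOY_PAUSE, TOY_OUTPUT };

struct toy_action {
    enum toy_action_type type;
    uint32_t arg;
};

/* Everything this toy needs to pick up where it left off. */
struct toy_continuation {
    size_t next_action;   /* Index of the action after the pause. */
    uint32_t reg;         /* Register value at the time of the pause. */
};

/* Runs 'acts' from index 'start' with register value 'reg'.  Returns true
 * and fills '*cont' if a pause action is reached.  Returns false when the
 * actions run out, having stored any output value in '*output'. */
static bool
toy_run(const struct toy_action *acts, size_t n, size_t start, uint32_t reg,
        struct toy_continuation *cont, uint32_t *output)
{
    for (size_t i = start; i < n; i++) {
        switch (acts[i].type) {
        case TOY_ADD:
            reg += acts[i].arg;
            break;
        case TOY_PAUSE:
            cont->next_action = i + 1;
            cont->reg = reg;
            return true;
        case TOY_OUTPUT:
            *output = reg;
            break;
        }
    }
    return false;
}
```

With actions {ADD 1, PAUSE, ADD 10, OUTPUT}, the first call pauses with reg == 1. If the "controller" adds 100 to the saved register and resumes at next_action, the output is 111. The ADD 1 that ran before the pause is not repeated.
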
> Another way to look at it is:
> 
>    - "pause" is an extension of the existing OFPAT_CONTROLLER
>      action.  It sends the packet to the controller, with full
>      pipeline context (some of which is switch implementation
>      dependent, and may thus vary from switch to switch).
> 
>    - A continuation is an extension of OFPT_PACKET_IN, allowing for
>      implementation dependent metadata.
> 
>    - NXT_RESUME is an extension of OFPT_PACKET_OUT, with the
>      semantics that the pipeline processing is continued with the
>      original translation context from where it was left at the time
>      it was paused.
> 
> Signed-off-by: Ben Pfaff <blp at ovn.org>
> Acked-by: Jarno Rajahalme <jarno at ovn.org>
> ---
> NEWS                          |   5 +-
> include/openflow/nicira-ext.h |  96 ++++++++-
> lib/learning-switch.c         |   3 +-
> lib/meta-flow.c               |   9 +-
> lib/meta-flow.h               |   3 +-
> lib/ofp-actions.c             |  28 ++-
> lib/ofp-actions.h             |   5 +
> lib/ofp-errors.h              |  16 +-
> lib/ofp-msgs.h                |   4 +
> lib/ofp-print.c               |  78 +++++--
> lib/ofp-util.c                | 470 ++++++++++++++++++++++++++++++++++++------
> lib/ofp-util.h                |  57 ++++-
> lib/rconn.c                   |   3 +-
> ofproto/connmgr.c             |  23 ++-
> ofproto/connmgr.h             |   2 +-
> ofproto/fail-open.c           |  16 +-
> ofproto/ofproto-dpif-xlate.c  | 199 ++++++++++++++----
> ofproto/ofproto-dpif-xlate.h  |   4 +
> ofproto/ofproto-dpif.c        |  34 +++
> ofproto/ofproto-provider.h    |   3 +
> ofproto/ofproto.c             |  24 +++
> ovn/TODO                      |  57 -----
> ovn/controller/pinctrl.c      |   4 +-
> tests/ofp-actions.at          |  13 +-
> tests/ofp-print.at            |  12 ++
> tests/ofproto-dpif.at         | 172 ++++++++++++++++
> tests/ofproto-macros.at       |  35 +++-
> utilities/ovs-ofctl.8.in      |  11 +-
> utilities/ovs-ofctl.c         | 109 +++++++---
> 29 files changed, 1239 insertions(+), 256 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 9ab6cae..ba4b7f7 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -6,7 +6,10 @@ Post-v2.5.0
>      * OpenFlow 1.1+ OFPT_QUEUE_GET_CONFIG_REQUEST now supports OFPP_ANY.
>      * OpenFlow 1.4+ OFPMP_QUEUE_DESC is now supported.
>      * New property-based packet-in message format NXT_PACKET_IN2 with support
> -       for arbitrary user-provided data.
> +       for arbitrary user-provided data and for serializing flow table
> +       traversal into a continuation for later resumption.
> +     * New extension message NXT_SET_ASYNC_CONFIG2 to allow OpenFlow 1.4-like
> +       control over asynchronous messages in earlier versions of OpenFlow.
>    - ovs-ofctl:
>      * queue-get-config command now allows a queue ID to be specified.
>    - DPDK:
> diff --git a/include/openflow/nicira-ext.h b/include/openflow/nicira-ext.h
> index 7e56066..77a735d 100644
> --- a/include/openflow/nicira-ext.h
> +++ b/include/openflow/nicira-ext.h
> @@ -260,12 +260,103 @@ struct nx_packet_in {
> };
> OFP_ASSERT(sizeof(struct nx_packet_in) == 24);
> 
> -/* NXT_PACKET_IN2.
> +/* NXT_PACKET_IN2
> + * ==============
>  *
>  * NXT_PACKET_IN2 is conceptually similar to OFPT_PACKET_IN but it is expressed
>  * as an extensible set of properties instead of using a fixed structure.
>  *
> - * Added in Open vSwitch 2.6. */
> + * Added in Open vSwitch 2.6
> + *
> + *
> + * Continuations
> + * -------------
> + *
> + * When a "controller" action specifies the "pause" flag, the controller action
> + * freezes the packet's trip through Open vSwitch flow tables and serializes
> + * that state into the packet-in message as a "continuation".  The controller
> + * can later send the continuation back to the switch, which will restart the
> + * packet's traversal from the point where it was interrupted.  This permits an
> + * OpenFlow controller to interpose on a packet midway through processing in
> + * Open vSwitch.
> + *
> + * Continuations fit into packet processing this way:
> + *
> + * 1. A packet ingresses into Open vSwitch, which runs it through the OpenFlow
> + *    tables.
> + *
> + * 2. An OpenFlow flow executes a "controller" action that includes the "pause"
> + *    flag.  Open vSwitch serializes the packet processing state and sends it,
> + *    as an NXT_PACKET_IN2 that includes an additional NXPINT_CONTINUATION
> + *    property (the continuation), to the OpenFlow controller.
> + *
> + *    (The controller must use NXAST_CONTROLLER2 to generate the packet-in,
> + *    because only this form of the "controller" action has a "pause" flag.
> + *    Similarly, the controller must use NXT_SET_PACKET_IN_FORMAT to select
> + *    NXT_PACKET_IN2 as the packet-in format, because this is the only format
> + *    that supports continuation passing.)
> + *
> + * 3. The controller receives the NXT_PACKET_IN2 and processes it.  The
> + *    controller can interpret and, if desired, modify some of the contents of
> + *    the packet-in, such as the packet and the metadata being processed.
> + *
> + * 4. The controller sends the continuation back to the switch, using an
> + *    NXT_RESUME message.  Packet processing resumes where it left off.
> + *
> + * The controller might change the pipeline configuration concurrently with
> + * steps 2 through 4.  For example, it might add or remove OpenFlow flows.  If
> + * that happens, then the packet will experience a mix of processing from the
> + * two configurations, that is, the initial processing (before
> + * NXAST_CONTROLLER2) uses the initial flow table, and the later processing
> + * (after NXT_RESUME) uses the later flow table.


Maybe it should be noted here that if the layout of the data pushed to and popped from the stack changes, resuming the continuation might produce unpredictable behavior. But maybe this is true for changes to the pipeline's “shape” in general.

> + *
> + * External side effects (e.g. "output") of OpenFlow actions processed before
> + * NXAST_CONTROLLER2 is encountered might be executed during step 2 or step 4,
> + * and the details may vary among Open vSwitch features and versions.  Thus, a
> + * controller that wants to make sure that side effects are executed must pass
> + * the continuation back to the switch, that is, must not skip step 4.
> + *
> + * Architecturally, continuations may be "stateful" or "stateless", that is,
> + * they may or may not refer to buffered state maintained in Open vSwitch.
> + * This means that a controller should not attempt to resume a given
> + * continuation more than once (because the switch might have discarded the
> + * buffered state after the first use).  For the same reason, continuations
> + * might become "stale" if the controller takes too long to resume them
> + * (because the switch might have discarded old buffered state).  Taken
> + * together with the previous note, this means that a controller should resume
> + * each continuation exactly once (and promptly).
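
On the controller side, "exactly once (and promptly)" suggests some bookkeeping. Below is a minimal sketch under stated assumptions. The table, pending_add() and pending_take() are hypothetical. 'id' stands for whatever key the controller uses to match a packet-in to its NXT_RESUME, because NXT_PACKET_IN2 defines no such field. max_age_ms is a controller-chosen guess, since the switch's retention time is not specified. A lookup consumes its entry, so a second resume of the same continuation is refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 16

enum take_result {
    TAKE_OK,        /* Known, not yet resumed, within the age budget. */
    TAKE_LATE,      /* Known, but older than the budget; may be stale. */
    TAKE_UNKNOWN    /* Never recorded, or already resumed once. */
};

/* Hypothetical record of one continuation awaiting NXT_RESUME. */
struct pending_cont {
    bool in_use;
    uint64_t id;              /* Controller-side correlation key. */
    uint64_t received_ms;     /* Monotonic time the packet-in arrived. */
};

struct pending_table {
    struct pending_cont slots[MAX_PENDING];
    uint64_t max_age_ms;      /* Guess; the switch's limit is unknown. */
};

/* Records a newly received continuation.  Returns false if full. */
static bool
pending_add(struct pending_table *t, uint64_t id, uint64_t now_ms)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i] = (struct pending_cont) { true, id, now_ms };
            return true;
        }
    }
    return false;
}

/* Consumes the record for 'id', so the same continuation cannot be taken
 * twice, and reports whether it is still within the age budget. */
static enum take_result
pending_take(struct pending_table *t, uint64_t id, uint64_t now_ms)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        struct pending_cont *p = &t->slots[i];
        if (p->in_use && p->id == id) {
            p->in_use = false;
            return now_ms - p->received_ms <= t->max_age_ms
                   ? TAKE_OK : TAKE_LATE;
        }
    }
    return TAKE_UNKNOWN;
}
```

A TAKE_LATE result is probably still worth resuming rather than dropping. Skipping step 4 can lose side effects, and the switch can presumably reject the NXT_RESUME if its buffered state is gone.
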
> + *
> + * Without the information in NXPINT_CONTINUATION, the controller can (with
> + * careful design, and help from the flow cookie) determine where the packet is
> + * in the pipeline, but in the general case it can't determine what nested
> + * "resubmit"s may be in progress, or what data is on the stack maintained
> + * by NXAST_STACK_PUSH and NXAST_STACK_POP actions, what is in the OpenFlow
> + * action set, etc.
> + *
> + * Continuations are expensive because they require a round trip between the
> + * switch and the controller.  Thus, they should not be used to implement
> + * processing that needs to happen at "line rate".
> + *
> + * The contents of NXPINT_CONTINUATION are private to the switch, may change
> + * unpredictably from one version of Open vSwitch to another, and are not
> + * documented here.  The contents are also tied to a given Open vSwitch process
> + * and bridge, so that restarting Open vSwitch or deleting and recreating a
> + * bridge will cause the corresponding NXT_RESUME to be rejected.
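
One way a switch could make NXT_RESUME fail after a restart or bridge re-creation is to stamp each continuation with the identity of the process and bridge that produced it. This is only a hypothetical sketch: the real NXPINT_CONTINUATION contents are private and undocumented. struct cont_binding, its fields and cont_binding_matches() are invented for illustration. It assumes a random nonce chosen at process start and the bridge's UUID, which changes when a bridge is deleted and recreated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical identity stamped into a continuation; not the actual
 * NXPINT_CONTINUATION format. */
struct cont_binding {
    uint64_t process_nonce;   /* Chosen randomly at switch startup. */
    uint8_t bridge_uuid[16];  /* Identity of the bridge that paused. */
};

/* Returns true if a continuation stamped with 'got' may be resumed by the
 * process and bridge identified by 'self'. */
static bool
cont_binding_matches(const struct cont_binding *got,
                     const struct cont_binding *self)
{
    return got->process_nonce == self->process_nonce
           && !memcmp(got->bridge_uuid, self->bridge_uuid,
                      sizeof got->bridge_uuid);
}
```

A mismatch in either field means the buffered or serialized state belongs to a different switch instance, so rejecting the resume is the safe choice.
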
> + *
> + * In the current implementation, Open vSwitch forks the packet processing
> + * pipeline across patch ports.  Suppose, for example, that the pipeline for
> + * br0 outputs to a patch port whose peer belongs to br1, and that the pipeline
> + * for br1 executes a controller action with the "pause" flag.  This only
> + * pauses processing within br1, and processing in br0 continues and possibly
> + * completes with visible side effects, such as outputting to ports, before
> + * br1's controller receives or processes the continuation.  This
> + * implementation maintains the independence of separate bridges and, since
> + * processing in br1 cannot affect the behavior of br0 anyway, should not cause
> + * visible behavioral changes.
> + *
> + * A packet-in that includes a continuation always includes the entire packet
> + * and is never buffered.

Does this need to be the case? Doesn't this contradict the stateful/stateless comment above?

> + */
> enum nx_packet_in2_prop_type {
>     /* Packet. */
>     NXPINT_PACKET,              /* Raw packet data. */
> @@ -280,6 +371,7 @@ enum nx_packet_in2_prop_type {
>     NXPINT_REASON,              /* uint8_t, one of OFPR_*. */
>     NXPINT_METADATA,            /* NXM or OXM for metadata fields. */
>     NXPINT_USERDATA,            /* From NXAST_CONTROLLER2 userdata. */
> +    NXPINT_CONTINUATION,        /* Private data for continuing processing. */
> };
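
A controller that treats NXPINT_CONTINUATION as opaque only has to find it in the packet-in and send its body back in NXT_RESUME. The sketch below makes one assumption: that NXT_PACKET_IN2 properties use the usual OpenFlow property layout, with a 16-bit type and 16-bit length in network byte order, the length covering the 4-byte header but not the padding, and each property padded to a multiple of 8 bytes. get_be16() and find_prop() are hypothetical helpers, not Open vSwitch's ofpprop API. The numeric value of NXPINT_CONTINUATION depends on enum entries elided from the hunk above, so the type is a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a big-endian 16-bit value. */
static uint16_t
get_be16(const uint8_t *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

/* Scans 'len' bytes of properties in 'buf' for the first one of type
 * 'want'.  On success stores its body in '*body' and '*body_len' and
 * returns 0.  Returns -1 if it is absent or the buffer is malformed. */
static int
find_prop(const uint8_t *buf, size_t len, uint16_t want,
          const uint8_t **body, size_t *body_len)
{
    size_t ofs = 0;

    while (len - ofs >= 4) {
        uint16_t type = get_be16(buf + ofs);
        uint16_t plen = get_be16(buf + ofs + 2);
        size_t padded = (plen + 7u) & ~(size_t) 7;  /* Round up to 8. */

        if (plen < 4 || padded > len - ofs) {
            return -1;          /* Length too short or overruns buffer. */
        }
        if (type == want) {
            *body = buf + ofs + 4;
            *body_len = plen - 4;
            return 0;
        }
        ofs += padded;
    }
    return -1;
}
```

Because the continuation's contents are private to the switch, the controller should copy the body byte for byte rather than parse it.
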
> 
> /* Configures the "role" of the sending controller.  The default role is:
> 

(snip)
