[ovs-dev] [PATCH 10/10] Implement "closures".

Wed Feb 10 06:02:18 UTC 2016

On Thu, Jan 28, 2016 at 02:15:29PM -0800, Jarno Rajahalme wrote:
> IMO you have included sufficient testing to warrant the code to be
> merged, maybe remove the “RFC” reference from the commit message?

Yes, that was a mistake, I forgot to remove it from the commit message
before posting.

> Some comments for future consideration below.
> 
> With this incremental:
> 
> diff --git a/lib/ofp-util.c b/lib/ofp-util.c
> index 42c8f1e..4a080cd 100644
> --- a/lib/ofp-util.c
> +++ b/lib/ofp-util.c
> @@ -3887,7 +3887,7 @@ parse_actions_property(struct ofpbuf *property, enum ofp_version version,
>  {
>      if (!ofpbuf_try_pull(property, ROUND_UP(ofpbuf_headersize(property), 8))) {
>          VLOG_WARN_RL(&bad_ofmsg_rl,
> -                     "actions property has bad length %"PRIuSIZE,
> +                     "actions property has bad length %"PRIu32,
>                       property->size);
>          return OFPERR_OFPBPC_BAD_LEN;
>      }
> 
> 
> Acked-by: Jarno Rajahalme <jarno at ovn.org>

Thanks, I folded that fix in.

> > On Jan 27, 2016, at 9:51 AM, Ben Pfaff <blp at ovn.org> wrote:
> > 
> > One purpose of OpenFlow packet-in messages is to allow a controller to
> > interpose on the path of a packet through the flow tables.  If, for
> > example, the controller needs to modify a packet in some way that the
> > switch doesn't directly support, the controller should be able to
> > program the switch to send it the packet, then modify the packet and
> > send it back to the switch to continue through the flow table.
> > 
> > That's the theory.  In practice, this doesn't work with any but the
> > simplest flow tables.  Packet-in messages simply don't include enough
> > context to allow the flow table traversal to continue.  For example:
> > 
> >    * Via "resubmit" actions, an Open vSwitch packet can have an
> >      effective "call stack", but a packet-in can't describe it, and
> >      so it would be lost.
> > 
> >    * Via "patch ports", an Open vSwitch packet can traverse multiple
> >      OpenFlow logical switches.  A packet-in can't describe or resume
> >      this context.
> > 
> >    * A packet-in can't preserve the stack used by NXAST_PUSH and
> >      NXAST_POP actions.
> > 
> >    * A packet-in can't preserve the OpenFlow 1.1+ action set.
> > 
> >    * A packet-in can't preserve the state of Open vSwitch mirroring
> >      or connection tracking.
> > 
> > This commit introduces a solution called "closures".  A closure is the
> > state of a packet's traversal through OpenFlow flow tables.  A new
> > NXAST_PAUSE action generates a closure and sends it to the OpenFlow
> > controller as an NXT_CLOSURE asynchronous message (which must be
> > enabled through an extension to OFPT_SET_ASYNC or NXT_SET_ASYNC2).
> 
> Should the latter be NXT_SET_ASYNC_CONFIG?

It was supposed to be NXT_SET_ASYNC_CONFIG2, which is like the OF1.4
OFPT_SET_ASYNC but backported to other OpenFlow versions so that it can
support closures.

> >    - OpenFlow:
> >      * OpenFlow 1.1+ OFPT_QUEUE_GET_CONFIG_REQUEST now supports OFPP_ANY.
> >      * OpenFlow 1.4+ OFPMP_QUEUE_DESC is now supported.
> > +     * New extension message NXT_SET_ASYNC_CONFIG2 to allow OpenFlow 1.4-like
> 
> Should this be just NXT_SET_ASYNC_CONFIG?

I got it right here.

> > +    /* Private properties.  These are subject to change and not architectural.
> > +     * Do not depend on them. */
> > +    NXCPT_BRIDGE = 0x8000,
> > +    NXCPT_STACK,
> > +    NXCPT_MIRRORS,
> > +    NXCPT_CONNTRACKED,
> > +    NXCPT_TABLE_ID,
> > +    NXCPT_COOKIE,
> > +    NXCPT_ACTIONS,
> > +    NXCPT_ACTION_SET,
> 
> When I wrote the following I did not yet see that there may be
> multiple table_id, cookie, and actions properties. I guess the comment
> below applies to the table_id if the table where the pause action was
> located, and the cookie to the cookie of the rule that has the pause
> action:
> 
> It seems to me that the controller parsing of the incoming closures
> might be eased by making table_id and cookie public properties. I do
> not have any specific example in mind, but controllers explicitly
> install rules with specific cookies to specific tables, hence the
> controller could use, e.g., a special cookie value (or bits) to
> separate different PAUSE reasons from each other. With that thought,
> maybe there could be a ‘reason’ parameter to the PAUSE action that
> would then be passed on in the CLOSURE message?

I think we could add a "reason" parameter if it becomes useful.  So far
I haven't come across it yet.

> > +static void
> > +ofputil_put_closure_private(const struct ofputil_closure_private *closure,
> > +                            enum ofputil_protocol protocol, struct ofpbuf *msg)
> > +{
> > +    enum ofp_version version = ofputil_protocol_to_ofp_version(protocol);
> > +
> > +    ofputil_put_closure(&closure->public, protocol, msg);
> > +
> > +    ofpprop_put_uuid(msg, NXCPT_BRIDGE, &closure->bridge);
> > +
> > +    for (size_t i = 0; i < closure->n_stack; i++) {
> > +        const union mf_subvalue *s = &closure->stack[i];
> > +        size_t ofs;
> > +        for (ofs = 0; ofs < sizeof *s; ofs++) {
> > +            if (s->u8[ofs]) {
> > +                break;
> > +            }
> > +        }
> > +
> > +        ofpprop_put(msg, NXCPT_STACK, &s->u8[ofs], sizeof *s - ofs);
> > +    }
> > +
> > +    if (closure->mirrors) {
> > +        ofpprop_put_u32(msg, NXCPT_MIRRORS, closure->mirrors);
> > +    }
> > +
> > +    if (closure->conntracked) {
> > +        ofpprop_put_flag(msg, NXCPT_CONNTRACKED);
> > +    }
> > +
> > +    if (closure->actions_len) {
> > +        const struct ofpact *const end = ofpact_end(closure->actions,
> > +                                                    closure->actions_len);
> > +        const struct ofpact_unroll_xlate *unroll = NULL;
> > +        uint8_t table_id = 0;
> > +        ovs_be64 cookie = 0;
> > +
> > +        const struct ofpact *a;
> > +        for (a = closure->actions; ; a = ofpact_next(a)) {
> > +            if (a == end || a->type == OFPACT_UNROLL_XLATE) {
> > +                if (unroll) {
> > +                    if (table_id != unroll->rule_table_id) {
> > +                        ofpprop_put_u8(msg, NXCPT_TABLE_ID,
> > +                                       unroll->rule_table_id);
> > +                        table_id = unroll->rule_table_id;
> > +                    }
> > +                    if (cookie != unroll->rule_cookie) {
> > +                        ofpprop_put_be64(msg, NXCPT_COOKIE,
> > +                                         unroll->rule_cookie);
> > +                        cookie = unroll->rule_cookie;
> > +                    }
> > +                }
> > +
> > +                const struct ofpact *start
> > +                    = unroll ? ofpact_next(&unroll->ofpact) : closure->actions;
> > +                put_actions_property(msg, NXCPT_ACTIONS, version,
> > +                                     start, (a - start) * sizeof *a);
> > +
> > +                if (a == end) {
> > +                    break;
> > +                }
> > +                unroll = ofpact_get_UNROLL_XLATE(a);
> > +            }
> > +        }
> 
> So the above emits sets of actions as separated by the unroll_xlate
> actions, and instead of encoding the separating unroll_xlate action, a
> pair of table_id and cookie properties is encoded (but only if they
> differ from the last ones). Is this to keep unroll_xlate private to
> OVS, or are there some other reasons as well? Maybe the loop could be
> preceded by a comment explaining all this?

I didn't really want to make unroll_xlate public, even if it was
restricted just to closures.

I added a comment:
        /* Divide 'closure->actions' into groups that begins with an
         * unroll_xlate action.  For each group, emit a NXCPT_TABLE_ID and
         * NXCPT_COOKIE property (if either has changed; each is initially
         * assumed 0), then a NXCPT_ACTIONS property with the grouped
         * actions.
         *
         * The alternative is to make OFPACT_UNROLL_XLATE public.  We can
         * always do that later, since this is a private property. */

> > +static enum ofperr
> > +parse_actions_property(struct ofpbuf *property, enum ofp_version version,
> > +                       struct ofpbuf *ofpacts)
> > +{
> > +    if (!ofpbuf_try_pull(property, ROUND_UP(ofpbuf_headersize(property), 8))) {
> > +        VLOG_WARN_RL(&bad_ofmsg_rl,
> > +                     "actions property has bad length %"PRIuSIZE,
> > +                     property->size);
> 
> ‘property->size’ is an uint32_t, not size_t, so the formatting is wrong.

Fixed, thanks.

> > +            msg = ofputil_encode_closure_private(&am->closure, protocol);
> > +            sched = SCHED_CLOSURE;
> > +            port = 0; /* XXX */
> 
> Maybe should just document that closures have port number 0?

This is just for fairness.

I changed the comment to make this clear:

            /* XXX This 'port' value is used only to provide fairness: if
             * pinsched has to drop messages, it drops those for the port with
             * the most messages first.  Perhaps we should figure out a good
             * way to use this. */
            port = 0;

> > +static void
> > compose_recirculate_action__(struct xlate_ctx *ctx, uint8_t table)
> > {
> >     struct recirc_metadata md;
> > @@ -3652,20 +3682,26 @@ compose_recirculate_action__(struct xlate_ctx *ctx, uint8_t table)
> >         .action_set_len = ctx->recirc_action_offset,
> >     };
> > 
> > -    /* Allocate a unique recirc id for the given metadata state in the
> > -     * flow.  An existing id, with a new reference to the corresponding
> > -     * recirculation context, will be returned if possible.
> > -     * The life-cycle of this recirc id is managed by associating it
> > -     * with the udpif key ('ukey') created for each new datapath flow. */
> > -    id = recirc_alloc_id_ctx(&state);
> > -    if (!id) {
> > -        XLATE_REPORT_ERROR(ctx, "Failed to allocate recirculation id");
> > -        ctx->error = XLATE_NO_RECIRCULATION_CONTEXT;
> > -        return;
> > -    }
> > -    recirc_refs_add(&ctx->xout->recircs, id);
> > +    if (ctx->pause) {
> > +        if (ctx->xin->packet) {
> > +            emit_closure(ctx, &state);
> > +        }
> > +    } else {
> > +        /* Allocate a unique recirc id for the given metadata state in the
> > +         * flow.  An existing id, with a new reference to the corresponding
> > +         * recirculation context, will be returned if possible.
> > +         * The life-cycle of this recirc id is managed by associating it
> > +         * with the udpif key ('ukey') created for each new datapath flow. */
> > +        id = recirc_alloc_id_ctx(&state);
> > +        if (!id) {
> > +            XLATE_REPORT_ERROR(ctx, "Failed to allocate recirculation id");
> > +            ctx->error = XLATE_NO_RECIRCULATION_CONTEXT;
> > +            return;
> > +        }
> > +        recirc_refs_add(&ctx->xout->recircs, id);
> > 
> > -    nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_RECIRC, id);
> > +        nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_RECIRC, id);
> > +    }
> 
> This function’s name could be changed to reflect the fact that it is
> used to either emit closure or to compose the recirculation action.

I changed it to finish_freezing__() in the new series.

> > 
> >     /* Undo changes done by recirculation. */
> >     ctx->action_set.size = ctx->recirc_action_offset;
> > @@ -4178,6 +4214,7 @@ recirc_unroll_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
> >         case OFPACT_GROUP:
> >         case OFPACT_OUTPUT:
> >         case OFPACT_CONTROLLER:
> > +        case OFPACT_PAUSE:
> >         case OFPACT_DEC_MPLS_TTL:
> >         case OFPACT_DEC_TTL:
> >             /* These actions may generate asynchronous messages, which include
> > @@ -4746,6 +4783,13 @@ do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
> >             break;
> >         }
> > 
> > +        case OFPACT_PAUSE:
> > +            ctx->pause = ofpact_get_PAUSE(a);
> > +            ctx->xout->slow |= SLOW_CONTROLLER;
> > +            ctx_trigger_recirculation(ctx);
> 
> ctx_trigger_recirculation() could be renamed as
> ctx_pause_translation(), which could then lead to either emitting the
> closure or recirculation.

I changed it to ctx_trigger_freeze() in the new series.

> > +            a = ofpact_next(a);
> > +            break;
> > +
> >         case OFPACT_EXIT:
> >             ctx->exit = true;
> >             break;
> > @@ -5132,6 +5176,7 @@ xlate_actions(struct xlate_in *xin, struct xlate_out *xout)
> > 
> >         .recirc_action_offset = -1,
> >         .last_unroll_offset = -1,
> > +        .pause = NULL,
> > 
> >         .was_mpls = false,
> >         .conntracked = false,
> > @@ -5434,6 +5479,67 @@ exit:
> >     return ctx.error;
> > }
> > 
> > +enum ofperr
> > +xlate_closure(struct ofproto_dpif *ofproto,
> > +              const struct ofputil_closure_private *closure,
> > +              struct ofpbuf *odp_actions,
> > +              enum slow_path_reason *slow)
> > +{
> > +    struct dp_packet packet;
> > +    dp_packet_use_const(&packet, closure->public.packet,
> > +                        closure->public.packet_len);
> > +
> > +    struct flow flow;
> > +    flow_extract(&packet, &flow);
> > +
> > +    struct xlate_in xin;
> > +    xlate_in_init(&xin, ofproto, &flow, 0, NULL, flow.tcp_flags, &packet,
> > +                  NULL, odp_actions);
> > +
> > +    struct ofpact_note noop;
> > +    ofpact_init_NOTE(&noop);
> > +    noop.length = 0;
> > +
> > +    bool any_actions = closure->actions_len > 0;
> > +    struct recirc_state recirc = {
> > +        .table_id = 0,     /* Not the table where NXAST_PAUSE was executed. */
> > +        .ofproto_uuid = closure->bridge,
> > +        .stack = closure->stack,
> > +        .n_stack = closure->n_stack,
> > +        .mirrors = closure->mirrors,
> > +        .conntracked = closure->conntracked,
> > +
> > +        /* When there are no actions, xlate_actions() will search the flow
> > +         * table.  We don't want it to do that (we want it to resume), so
> > +         * supply a no-op action if there aren't any.
> > +         *
> > +         * (We can't necessarily avoid translating actions entirely if there
> > +         * aren't any actions, because there might be some finishing-up to do
> > +         * at the end of the pipeline, and we don't check for those
> > +         * conditions.)) */
> 
> So this part in parenthesis is the explanation for the use of the
> ‘noop’. Would it be correct to state that even if the closure has no
> actions, it still may have an action set that is executed during the
> translation? What if also the action set is empty, is the translation
> still needed?

Maybe not.  But I think that actually skipping it would be a
micro-optimization that is just begging to miss some case, either now or
later.

> There are two closing parentheses above: ‘))’. 

Fixed.

Even though you acked it, I'm going to repost a new version.  It's again
the end of a longer series, since I found several ways to improve the
surrounding code.