[ovs-dev] [PATCH 10/10] Implement "closures".
Ben Pfaff
blp at ovn.org
Wed Feb 10 06:02:18 UTC 2016
On Thu, Jan 28, 2016 at 02:15:29PM -0800, Jarno Rajahalme wrote:
> IMO you have included sufficient testing to warrant the code to be
> merged, maybe remove the “RFC” reference from the commit message?
Yes, that was a mistake, I forgot to remove it from the commit message
before posting.
> Some comments for future consideration below.
>
> With this incremental:
>
> diff --git a/lib/ofp-util.c b/lib/ofp-util.c
> index 42c8f1e..4a080cd 100644
> --- a/lib/ofp-util.c
> +++ b/lib/ofp-util.c
> @@ -3887,7 +3887,7 @@ parse_actions_property(struct ofpbuf *property, enum ofp_version version,
> {
> if (!ofpbuf_try_pull(property, ROUND_UP(ofpbuf_headersize(property), 8))) {
> VLOG_WARN_RL(&bad_ofmsg_rl,
> - "actions property has bad length %"PRIuSIZE,
> + "actions property has bad length %"PRIu32,
> property->size);
> return OFPERR_OFPBPC_BAD_LEN;
> }
>
>
> Acked-by: Jarno Rajahalme <jarno at ovn.org>
Thanks, I folded that fix in.
> > On Jan 27, 2016, at 9:51 AM, Ben Pfaff <blp at ovn.org> wrote:
> >
> > One purpose of OpenFlow packet-in messages is to allow a controller to
> > interpose on the path of a packet through the flow tables. If, for
> > example, the controller needs to modify a packet in some way that the
> > switch doesn't directly support, the controller should be able to
> > program the switch to send it the packet, then modify the packet and
> > send it back to the switch to continue through the flow table.
> >
> > That's the theory. In practice, this doesn't work with any but the
> > simplest flow tables. Packet-in messages simply don't include enough
> > context to allow the flow table traversal to continue. For example:
> >
> > * Via "resubmit" actions, an Open vSwitch packet can have an
> > effective "call stack", but a packet-in can't describe it, and
> > so it would be lost.
> >
> > * Via "patch ports", an Open vSwitch packet can traverse multiple
> > OpenFlow logical switches. A packet-in can't describe or resume
> > this context.
> >
> > * A packet-in can't preserve the stack used by NXAST_PUSH and
> > NXAST_POP actions.
> >
> > * A packet-in can't preserve the OpenFlow 1.1+ action set.
> >
> > * A packet-in can't preserve the state of Open vSwitch mirroring
> > or connection tracking.
> >
> > This commit introduces a solution called "closures". A closure is the
> > state of a packet's traversal through OpenFlow flow tables. A new
> > NXAST_PAUSE action generates a closure and sends it to the OpenFlow
> > controller as an NXT_CLOSURE asynchronous message (which must be
> > enabled through an extension to OFPT_SET_ASYNC or NXT_SET_ASYNC2).
>
> Should the latter be NXT_SET_ASYNC_CONFIG?
It was supposed to be NXT_SET_ASYNC_CONFIG2, which is like the OF1.4
OFPT_SET_ASYNC but backported to other OpenFlow versions so that it can
support closures.
> > - OpenFlow:
> > * OpenFlow 1.1+ OFPT_QUEUE_GET_CONFIG_REQUEST now supports OFPP_ANY.
> > * OpenFlow 1.4+ OFPMP_QUEUE_DESC is now supported.
> > + * New extension message NXT_SET_ASYNC_CONFIG2 to allow OpenFlow 1.4-like
>
> Should this be just NXT_SET_ASYNC_CONFIG?
I got it right here.
> > + /* Private properties. These are subject to change and not architectural.
> > + * Do not depend on them. */
> > + NXCPT_BRIDGE = 0x8000,
> > + NXCPT_STACK,
> > + NXCPT_MIRRORS,
> > + NXCPT_CONNTRACKED,
> > + NXCPT_TABLE_ID,
> > + NXCPT_COOKIE,
> > + NXCPT_ACTIONS,
> > + NXCPT_ACTION_SET,
>
> When I wrote the following I did not yet see that there may be
> multiple table_id, cookie, and actions properties. I guess the comment
> below applies to the table_id if the table where the pause action was
> located, and the cookie to the cookie of the rule that has the pause
> action:
>
> It seems to me that the controller parsing of the incoming closures
> might be eased by making table_id and cookie public properties. I do
> not have any specific example in mind, but controllers explicitly
> install rules with specific cookies to specific tables, hence the
> controller could use, e.g., a special cookie value (or bits) to
> separate different PAUSE reasons from each other. With that thought,
> maybe there could be a ‘reason’ parameter to the PAUSE action that
> would then be passed on in the CLOSURE message?
I think we could add a "reason" parameter if it becomes useful. So far
I haven't come across it yet.
> > +static void
> > +ofputil_put_closure_private(const struct ofputil_closure_private *closure,
> > + enum ofputil_protocol protocol, struct ofpbuf *msg)
> > +{
> > + enum ofp_version version = ofputil_protocol_to_ofp_version(protocol);
> > +
> > + ofputil_put_closure(&closure->public, protocol, msg);
> > +
> > + ofpprop_put_uuid(msg, NXCPT_BRIDGE, &closure->bridge);
> > +
> > + for (size_t i = 0; i < closure->n_stack; i++) {
> > + const union mf_subvalue *s = &closure->stack[i];
> > + size_t ofs;
> > + for (ofs = 0; ofs < sizeof *s; ofs++) {
> > + if (s->u8[ofs]) {
> > + break;
> > + }
> > + }
> > +
> > + ofpprop_put(msg, NXCPT_STACK, &s->u8[ofs], sizeof *s - ofs);
> > + }
> > +
> > + if (closure->mirrors) {
> > + ofpprop_put_u32(msg, NXCPT_MIRRORS, closure->mirrors);
> > + }
> > +
> > + if (closure->conntracked) {
> > + ofpprop_put_flag(msg, NXCPT_CONNTRACKED);
> > + }
> > +
> > + if (closure->actions_len) {
> > + const struct ofpact *const end = ofpact_end(closure->actions,
> > + closure->actions_len);
> > + const struct ofpact_unroll_xlate *unroll = NULL;
> > + uint8_t table_id = 0;
> > + ovs_be64 cookie = 0;
> > +
> > + const struct ofpact *a;
> > + for (a = closure->actions; ; a = ofpact_next(a)) {
> > + if (a == end || a->type == OFPACT_UNROLL_XLATE) {
> > + if (unroll) {
> > + if (table_id != unroll->rule_table_id) {
> > + ofpprop_put_u8(msg, NXCPT_TABLE_ID,
> > + unroll->rule_table_id);
> > + table_id = unroll->rule_table_id;
> > + }
> > + if (cookie != unroll->rule_cookie) {
> > + ofpprop_put_be64(msg, NXCPT_COOKIE,
> > + unroll->rule_cookie);
> > + cookie = unroll->rule_cookie;
> > + }
> > + }
> > +
> > + const struct ofpact *start
> > + = unroll ? ofpact_next(&unroll->ofpact) : closure->actions;
> > + put_actions_property(msg, NXCPT_ACTIONS, version,
> > + start, (a - start) * sizeof *a);
> > +
> > + if (a == end) {
> > + break;
> > + }
> > + unroll = ofpact_get_UNROLL_XLATE(a);
> > + }
> > + }
>
> So the above emits sets of actions as separated by the unroll_xlate
> actions, and instead of encoding the separating unroll_xlate action, a
> pair of table_id and cookie properties is encoded (but only if they
> differ from the last ones). Is this to keep unroll_xlate private to
> OVS, or are there some other reasons as well? Maybe the loop could be
> preceded by a comment explaining all this?
I didn't really want to make unroll_xlate public, even if it was
restricted just to closures.
I added a comment:
/* Divide 'closure->actions' into groups that begins with an
* unroll_xlate action. For each group, emit a NXCPT_TABLE_ID and
* NXCPT_COOKIE property (if either has changed; each is initially
* assumed 0), then a NXCPT_ACTIONS property with the grouped
* actions.
*
* The alternative is to make OFPACT_UNROLL_XLATE public. We can
* always do that later, since this is a private property. */
> > +static enum ofperr
> > +parse_actions_property(struct ofpbuf *property, enum ofp_version version,
> > + struct ofpbuf *ofpacts)
> > +{
> > + if (!ofpbuf_try_pull(property, ROUND_UP(ofpbuf_headersize(property), 8))) {
> > + VLOG_WARN_RL(&bad_ofmsg_rl,
> > + "actions property has bad length %"PRIuSIZE,
> > + property->size);
>
> ‘property->size’ is an uint32_t, not size_t, so the formatting is wrong.
Fixed, thanks.
> > + msg = ofputil_encode_closure_private(&am->closure, protocol);
> > + sched = SCHED_CLOSURE;
> > + port = 0; /* XXX */
>
> Maybe should just document that closures have port number 0?
This is just for fairness.
I changed the comment to make this clear:
/* XXX This 'port' value is used only to provide fairness: if
* pinsched has to drop messages, it drops those for the port with
* the most messages first. Perhaps we should figure out a good
* way to use this. */
port = 0;
> > +static void
> > compose_recirculate_action__(struct xlate_ctx *ctx, uint8_t table)
> > {
> > struct recirc_metadata md;
> > @@ -3652,20 +3682,26 @@ compose_recirculate_action__(struct xlate_ctx *ctx, uint8_t table)
> > .action_set_len = ctx->recirc_action_offset,
> > };
> >
> > - /* Allocate a unique recirc id for the given metadata state in the
> > - * flow. An existing id, with a new reference to the corresponding
> > - * recirculation context, will be returned if possible.
> > - * The life-cycle of this recirc id is managed by associating it
> > - * with the udpif key ('ukey') created for each new datapath flow. */
> > - id = recirc_alloc_id_ctx(&state);
> > - if (!id) {
> > - XLATE_REPORT_ERROR(ctx, "Failed to allocate recirculation id");
> > - ctx->error = XLATE_NO_RECIRCULATION_CONTEXT;
> > - return;
> > - }
> > - recirc_refs_add(&ctx->xout->recircs, id);
> > + if (ctx->pause) {
> > + if (ctx->xin->packet) {
> > + emit_closure(ctx, &state);
> > + }
> > + } else {
> > + /* Allocate a unique recirc id for the given metadata state in the
> > + * flow. An existing id, with a new reference to the corresponding
> > + * recirculation context, will be returned if possible.
> > + * The life-cycle of this recirc id is managed by associating it
> > + * with the udpif key ('ukey') created for each new datapath flow. */
> > + id = recirc_alloc_id_ctx(&state);
> > + if (!id) {
> > + XLATE_REPORT_ERROR(ctx, "Failed to allocate recirculation id");
> > + ctx->error = XLATE_NO_RECIRCULATION_CONTEXT;
> > + return;
> > + }
> > + recirc_refs_add(&ctx->xout->recircs, id);
> >
> > - nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_RECIRC, id);
> > + nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_RECIRC, id);
> > + }
>
> This function’s name could be changed to reflect the fact that it is
> used to either emit closure or to compose the recirculation action.
I changed it to finish_freezing__() in the new series.
> >
> > /* Undo changes done by recirculation. */
> > ctx->action_set.size = ctx->recirc_action_offset;
> > @@ -4178,6 +4214,7 @@ recirc_unroll_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
> > case OFPACT_GROUP:
> > case OFPACT_OUTPUT:
> > case OFPACT_CONTROLLER:
> > + case OFPACT_PAUSE:
> > case OFPACT_DEC_MPLS_TTL:
> > case OFPACT_DEC_TTL:
> > /* These actions may generate asynchronous messages, which include
> > @@ -4746,6 +4783,13 @@ do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
> > break;
> > }
> >
> > + case OFPACT_PAUSE:
> > + ctx->pause = ofpact_get_PAUSE(a);
> > + ctx->xout->slow |= SLOW_CONTROLLER;
> > + ctx_trigger_recirculation(ctx);
>
> ctx_trigger_recirculation() could be renamed as
> ctx_pause_translation(), which could then lead to either emitting the
> closure or recirculation.
I changed it to ctx_trigger_freeze() in the new series.
> > + a = ofpact_next(a);
> > + break;
> > +
> > case OFPACT_EXIT:
> > ctx->exit = true;
> > break;
> > @@ -5132,6 +5176,7 @@ xlate_actions(struct xlate_in *xin, struct xlate_out *xout)
> >
> > .recirc_action_offset = -1,
> > .last_unroll_offset = -1,
> > + .pause = NULL,
> >
> > .was_mpls = false,
> > .conntracked = false,
> > @@ -5434,6 +5479,67 @@ exit:
> > return ctx.error;
> > }
> >
> > +enum ofperr
> > +xlate_closure(struct ofproto_dpif *ofproto,
> > + const struct ofputil_closure_private *closure,
> > + struct ofpbuf *odp_actions,
> > + enum slow_path_reason *slow)
> > +{
> > + struct dp_packet packet;
> > + dp_packet_use_const(&packet, closure->public.packet,
> > + closure->public.packet_len);
> > +
> > + struct flow flow;
> > + flow_extract(&packet, &flow);
> > +
> > + struct xlate_in xin;
> > + xlate_in_init(&xin, ofproto, &flow, 0, NULL, flow.tcp_flags, &packet,
> > + NULL, odp_actions);
> > +
> > + struct ofpact_note noop;
> > + ofpact_init_NOTE(&noop);
> > + noop.length = 0;
> > +
> > + bool any_actions = closure->actions_len > 0;
> > + struct recirc_state recirc = {
> > + .table_id = 0, /* Not the table where NXAST_PAUSE was executed. */
> > + .ofproto_uuid = closure->bridge,
> > + .stack = closure->stack,
> > + .n_stack = closure->n_stack,
> > + .mirrors = closure->mirrors,
> > + .conntracked = closure->conntracked,
> > +
> > + /* When there are no actions, xlate_actions() will search the flow
> > + * table. We don't want it to do that (we want it to resume), so
> > + * supply a no-op action if there aren't any.
> > + *
> > + * (We can't necessarily avoid translating actions entirely if there
> > + * aren't any actions, because there might be some finishing-up to do
> > + * at the end of the pipeline, and we don't check for those
> > + * conditions.)) */
>
> So this part in parenthesis is the explanation for the use of the
> ‘noop’. Would it be correct to state that even if the closure has no
> actions, it still may have an action set that is executed during the
> translation? What if also the action set is empty, is the translation
> still needed?
Maybe not. But I think that actually skipping it would be a
micro-optimization that is just begging to miss some case, either now or
later.
> There are two closing parentheses above: ‘))’.
Fixed.
Even though you acked it, I'm going to repost a new version. It's again
the end of a longer series, since I found several ways to improve the
surrounding code.
More information about the dev
mailing list