[ovs-dev] [PATCH v3 1/2] ofproto: Allow xlate_actions() to fail.

Joe Stringer joe at ovn.org
Wed Nov 25 22:58:01 UTC 2015


On 25 November 2015 at 11:23, Jarno Rajahalme <jarno at ovn.org> wrote:
>
> On Nov 25, 2015, at 11:11 AM, Jarno Rajahalme <jarno at ovn.org> wrote:
>
>
> On Nov 25, 2015, at 10:52 AM, Joe Stringer <joe at ovn.org> wrote:
>
> On 25 November 2015 at 10:31, Jarno Rajahalme <jarno at ovn.org> wrote:
>
>
> On Nov 24, 2015, at 5:02 PM, Joe Stringer <joe at ovn.org> wrote:
>
> On 24 November 2015 at 13:41, Jarno Rajahalme <jarno at ovn.org> wrote:
>
> Sometimes xlate_actions() fails due to too deep recursion, too many
> MPLS labels, or missing recirculation context.  Make xlate_actions()
> clear out the produced odp actions in these cases to make it easy for
> the caller to install a drop flow (instead or installing a flow with
> partially translated actions).  Also, return a specific error code, so
> that the error can be properly propagated where meaningful.
>
> Before this patch it was possible that the revalidation installed a
> flow with a recirculation ID with an invalid recirc ID (== 0), due to
> the introduction of in-place modification in commit 43b2f131a229
> (ofproto: Allow in-place modifications of datapath flows).
>
> Signed-off-by: Jarno Rajahalme <jarno at ovn.org>
>
>
> Should this also set the error when receiving packets on a mirror port
> in xlate_actions()? Or when receiving tagged VLAN traffic that doesn't
> correspond to the port's vlan tag? Or when a group has no live bucket?
> Are there any other cases that should also be covered? (I just scanned
> across ofproto/ofproto-dpif-xlate.c looking for cases where we're
> already logging that we drop the packet, but maybe there's a reasoning
> behind not including these - if so, please enlighten me)
>
>
> No reasoning for missing those, I just did not notice them. Thanks for
> pointing them out.
>
>
> OK, I thought it may have been something like "expected errors" vs.
> "unexpected errors".
>
>
> Looking into these I noticed this to be the case. Must discern whether to
> fail just the individual action v.s. the whole pipeline.
>
>
> How about this incremental to cover two cases here (rest are “expected
> errors” IMO):
>
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 36a6fbc..2908339 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -336,6 +336,10 @@ const char *xlate_strerror(enum xlate_error error)
>          return "Recirculation conflict";
>      case XLATE_TOO_MANY_MPLS_LABELS:
>          return "Too many MPLS labels";
> +    case XLATE_BUCKET_CHAINING_TOO_DEEP:
> +        return "Bucket chaining too deep";
> +    case XLATE_NO_INPUT_BUNDLE:
> +        return "No input bundle";
>      }
>      return "Unknown error";
>  }
> @@ -1444,10 +1448,9 @@ bucket_is_alive(const struct xlate_ctx *ctx,
>                  struct ofputil_bucket *bucket, int depth)
>  {
>      if (depth >= MAX_LIVENESS_RECURSION) {
> -        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
> -
> -        VLOG_WARN_RL(&rl, "bucket chaining exceeded %d links",
> -                     MAX_LIVENESS_RECURSION);
> +        XLATE_REPORT_ERROR(ctx, "bucket chaining exceeded %d links",
> +                           MAX_LIVENESS_RECURSION);
> +        ctx->error = XLATE_BUCKET_CHAINING_TOO_DEEP;
>          return false;
>      }
>
> @@ -2323,7 +2326,8 @@ xlate_normal(struct xlate_ctx *ctx)
>      in_xbundle = lookup_input_bundle(ctx->xbridge, flow->in_port.ofp_port,
>                                       ctx->xin->packet != NULL, &in_port);
>      if (!in_xbundle) {
> -        xlate_report(ctx, "no input bundle, dropping");
> +        XLATE_REPORT_ERROR(ctx, "no input bundle, dropping");
> +        ctx->error = XLATE_NO_INPUT_BUNDLE;
>          return;
>      }
>
>
> The last one is debatable, as setting the error fails the whole translation
> rather than just the normal action. But this is most likely an configuration
> error, so maybe failing the whole pipeline (and installing a drop flow) is
> the right thing to do here?

Jarno and I discussed this offline, and I'll try to summarise here.
Broadly speaking, we're talking about the decision between failing an
individual (piece of an) action or completely failing the action
processing for the flow. And I think arguably the approach should be
that if it is a serious error such as running out of resources or an
internal conflict of recirc IDs, then we should fail the entire action
processing. In this case it will have two user-visible effects:
1) ofproto/trace will tell the user which serious condition is being
triggered that causes dropping of the flow
2) OpenFlow controllers attempting packet_out could be notified that
the error occurred (rather than silently failing like currently)

However, in the two cases in the incremental patch here, the actions
inherently have some ambiguity as to whether they successfully execute
(eg output) or not. The more obvious case is in the bucket_is_alive()
logic, where recursion will cause a bucket to be not used. If a bucket
is not live in the spec, this doesn't mean that the entire flow should
stop processing. In the case of normal, I'd argue it's very similar in
that 'normal' doesn't specifically attempt to output to a particular
port; sending packets out to different ports may fail for different
reasons, but this shouldn't prevent later actions in the actions list
from being executed.

I think the latter cases should be reported for ofproto/trace, though.

Looking back across this thread, it looks not far off your reasoning
described earlier so I think we're converging on the same view. Does
this sound like a fair approach?

--

In the mirror case, the point is moot because do_xlate_actions() isn't
even called in that case, so it's purely a matter of whether we want
to return the error up the stack or not. Maybe that should be reported
for ofproto/trace as well.

I didn't see any other cases that might need handling through this.



More information about the dev mailing list