[ovs-dev] [RFC PATCH 4/4] Add packet recirculation

Jesse Gross jesse at nicira.com
Mon Mar 18 20:49:43 UTC 2013


On Fri, Mar 15, 2013 at 7:27 AM, Simon Horman <horms at verge.net.au> wrote:
> Recirculation is a technique to allow a frame to re-enter
> frame processing. This is intended to be used after actions
> have been applied to the frame with modify the frame in
> some way that makes it possible for richer processing to occur.
>
> An example is and indeed targeted use case is MPLS. If an MPLS frame has an
> mpls_pop action applied with the IPv4 ethernet type then it becomes
> possible to decode the IPv4 portion of the frame. This may be used to
> construct a facet that modifies the IPv4 portion of the frame. This is not
> possible prior to the mpls_pop action as the contents of the frame after
> the MPLS stack is not known to be IPv4.
>
> Design:
> * New recirculation action.
>
>   ovs-vswitchd adds a recirculation action to the end of a list of
>   datapath actions for a flow when the actions are truncated because
>   insufficient flow match information is available to add the next
>   OpenFlow action.  The recirculation action includes an id which can be
>   used to scope a facet lookup of a recirculated packet.
>
>   e.g.  pop_mpls(0x0800),dec_ttl becomes pop_mpls(0x800),resubmit(id=1)
>
>   ovs-vswtichd then restarts processing of the packet, with truncated actions.
>
>   When the actions do not contain a recirculate action processing stops
>   and the mangled packet is executed.
>
>   If the packet is being processed by ovs-vswitchd as a miss and the miss
>   is being handled with facets (the usual case) then facets for each of
>   recirculation steps are added to the datapath.
>
>   For a packet that is not recirculated this will be a single facet. For
>   a packet that is recirculated once this will be two facets. And so on,
>   with the number of facets increasing by one for each recirculation.
>
> * New recirculation_id match attribute
>
>   Facets of recirculated packets have a recirculation_id attribute added to
>   the match key, the id is that of the recirculation action that
>   caused the packet to be recirculated. This is used to scope the lookup
>   of facets.
>
> * Datapath behaviour
>
>   Then the datapath encounters a recirculate action it:
>   + Recalculates the flow key based on the packet
>     which will typically have been modified by previous actions
>   + Adds a recirculation_id attribute to the match key whose id
>     is that of the recirculate action
>   + Performs a lookup using the new match key
>   + Processes the packet if a facet matches the key or;
>   + Makes an upcall if necessary
>
> TODO:
>
> * More sensible handling of recirculation IDs [ovs-vswtichd]
> * More sensible lookup of facets based on recirculation IDs [ovs-vswtichd]
> * Look more closely at facet revalidation [ovs-vswtichd]
>
> Signed-off-by: Simon Horman <horms at verge.net.au>

I'm having trouble applying this (even after applying the earlier
patches in the series).  I'm not sure what changed but would you mind
sending an updated version?

This is a somewhat short review, since I'm just looking at the diff.

> diff --git a/datapath/actions.c b/datapath/actions.c
> index 60fa71a..b75f479 100644
> --- a/datapath/actions.c
> +++ b/datapath/actions.c
> @@ -634,6 +636,12 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
>                 case OVS_ACTION_ATTR_SAMPLE:
>                         err = sample(dp, skb, a, tun_key);
>                         break;
> +
> +               case OVS_ACTION_ATTR_RECIRCULATE:
> +                       if (recirculation_id) {
> +                               *recirculation_id = nla_get_u32(a);
> +                       }

OVS_CB seems like the perfect place to store recirculation_id, that
way we don't have to pass pointers around everywhere.

However, do we actually want to tie this directly to recirculation as
opposed to as a separate action?  I see possible benefits of
separating it out:
 * It makes it more generic - we could potentially use skb_mark
directly or worst case introduce a new metadata field if we are
worried about conflicting uses (although we could always set the real
skb mark on the last pass).  Either way there is a better potential
for reuse.
 * It seems likely that if you were doing more than two passes of
recirculation then you would still only want to use a single ID so
it's not really desirable to remark on every pass.

> @@ -686,23 +695,28 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb)
>         if (unlikely(++loop->count > MAX_LOOPS))
>                 loop->looping = true;
>         if (unlikely(loop->looping)) {
> -               error = loop_suppress(dp, acts);
> -               kfree_skb(skb);
> +               printk("%s: looping", __func__);

Isn't this logging already handled by loop_suppress()?

I think the memory management of the skb is somewhat inconsistent -
the caller retains ownership in the case of recirculation and some
errors but the callee takes it for normal output actions.  What if the
callee takes it in all cases (as today) and gives it back as a return
value for recirculation?

> diff --git a/datapath/datapath.c b/datapath/datapath.c
> index af0dba8..ac2fb0d 100644
> --- a/datapath/datapath.c
> +++ b/datapath/datapath.c
>  /* Must be called with rcu_read_lock. */
>  void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
[...]
> -out:
> -       /* Update datapath statistics. */
> -       u64_stats_update_begin(&stats->sync);
> -       (*stats_counter)++;
> -       u64_stats_update_end(&stats->sync);
> +               if (likely(!recirculate))
> +                       break;
> +               if (unlikely(!limit--)) {
> +                       kfree_skb(skb);
> +                       return;

We should probably log a rate-limited warning in this case.

> +               }
> +               OVS_CB(skb)->flow = NULL;

I think the check for having OVS_CB(skb)->flow already set is dead
code at this point and we don't need to special case it.

> @@ -905,10 +929,16 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
>                 goto err_unlock;
>
>         local_bh_disable();
> -       err = ovs_execute_actions(dp, packet);
> +       err = ovs_execute_actions(dp, packet, NULL);
>         local_bh_enable();
>         rcu_read_unlock();
>
> +       if (err > 0) {
> +               /* Recirculation is invalid on packet execute */
> +               err = -EINVAL;
> +               goto err_flow_free;
> +       }

Isn't this going to add a lot of complication to userspace?  It's
clearly possible for userspace to not use recirculation since it has
the full packet but it's basically going to require a separate
processing path just to handle this.

> diff --git a/datapath/flow.h b/datapath/flow.h
> index 27e394b..f665d3a 100644
> --- a/datapath/flow.h
> +++ b/datapath/flow.h
>  struct sw_flow_key {
>         struct ovs_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */
> +       __be32  recirculation_id;       /* Identification of parent facet used
> +                                        * progress of recirculation of a
> +                                        * packet.  Zero if the flow is not
> +                                        * the result of a recirculation
> +                                        * access. */

I don't think that there's any reason to make this networking byte
order, since it will never go out on the wire.

> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
> index f7b788c..24d1eb5 100644
> --- a/include/linux/openvswitch.h
> +++ b/include/linux/openvswitch.h
> @@ -280,6 +280,7 @@ enum ovs_key_attr {
>         OVS_KEY_ATTR_ND,        /* struct ovs_key_nd */
>         OVS_KEY_ATTR_SKB_MARK,  /* u32 skb mark */
>         OVS_KEY_ATTR_TUNNEL,    /* Nested set of ovs_tunnel attributes */
> +       OVS_KEY_ATTR_RECIRCULATION_ID,  /* u32 recirculation id */
>
>  #ifdef __KERNEL__
>         OVS_KEY_ATTR_IPV4_TUNNEL,  /* struct ovs_key_ipv4_tunnel */
> @@ -513,10 +514,13 @@ struct ovs_action_push_vlan {
>   * indicate the new packet contents This could potentially still be
>   * %ETH_P_MPLS_* if the resulting MPLS label stack is not empty.  If there
>   * is no MPLS label stack, as determined by ethertype, no action is taken.
> + * @OVS_ACTION_ATTR_RECIRCULATE: Restart processing of packet.
> + * The packet must have been modified by a previous action in such a way
> + * that it does not match its original facet again.

I think we should avoid use of the word 'facet' in the interface or in
kernel code since it is a specific userspace concept which could
change over time.

> diff --git a/lib/match.c b/lib/match.c
> index 2aa4e89..63626a3 100644
> --- a/lib/match.c
> +++ b/lib/match.c
> @@ -849,7 +849,7 @@ match_format(const struct match *match, struct ds *s, unsigned int priority)
>
>      int i;
>
> -    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 20);
> +    BUILD_ASSERT_DECL(FLOW_WC_SEQ == 21);
>
>      if (priority != OFP_DEFAULT_PRIORITY) {
>          ds_put_format(s, "priority=%u,", priority);

I think we need to be able to print out any new fields that we add to
struct flow for debugging purposes, even if they aren't exposed to the
user normally.

> diff --git a/lib/odp-util.c b/lib/odp-util.c
> index 0ded70d..7fd7a3f 100644
> --- a/lib/odp-util.c
> +++ b/lib/odp-util.c
> @@ -1546,6 +1558,11 @@ odp_flow_key_from_flow(struct ofpbuf *buf, const struct flow *flow,
>          nl_msg_put_u32(buf, OVS_KEY_ATTR_SKB_MARK, flow->skb_mark);
>      }
>
> +    if (flow->recirculation_id) {
> +        nl_msg_put_u32(buf, OVS_KEY_ATTR_RECIRCULATION_ID,
> +                       flow->recirculation_id);
> +    }

I think we should be careful about making zero a magic value.  It's
fine to do that in userspace as part of our implementation but I
wouldn't want to carry that assumption across interface boundaries.

>  commit_set_nw_action(const struct flow *flow, struct flow *base,
>                       struct ofpbuf *odp_actions)
>  {
> +    //VLOG_WARN("%s: %u 0x%04x", __func__, flow->nw_proto, ntohs(flow->dl_type));

Can we remove this debugging code?

> @@ -2419,7 +2450,8 @@ commit_set_skb_mark_action(const struct flow *flow, struct flow *base,
>   * 'base' and 'flow', appends ODP actions to 'odp_actions' that change the flow
>   * key from 'base' into 'flow', and then changes 'base' the same way.  Does not
>   * commit set_tunnel actions.  Users should call commit_odp_tunnel_action()
> - * in addition to this function if needed. */
> + * and commit_recirculate_action() in addition to those functions are needed.

I think the actual name of the function is
commit_odp_recirculate_action (otherwise it makes it difficult to
search for it).

> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index 9fe0aaa..0329e3c 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c

I stopped here since the code is complex enough that I need to be able
to see it in context.



More information about the dev mailing list