[ovs-dev] [PATCH] datapath: Add basic MPLS support to kernel

Jesse Gross jesse at nicira.com
Tue Jun 11 21:26:32 UTC 2013


On Mon, Jun 10, 2013 at 1:07 AM, Simon Horman <horms at verge.net.au> wrote:
> diff --git a/datapath/actions.c b/datapath/actions.c
> index 0dac658..197811a 100644
> --- a/datapath/actions.c
> +++ b/datapath/actions.c
> +static int push_mpls(struct sk_buff *skb,
> +                    const struct ovs_action_push_mpls *mpls)
> +{
> +       __be32 *new_mpls_lse;
> +       int err;
> +
> +       if (skb_cow_head(skb, MPLS_HLEN) < 0)
> +               return -ENOMEM;
> +
> +       err = make_writable(skb, skb->mac_len);
> +       if (unlikely(err))
> +               return err;
> +
> +       skb_push(skb, MPLS_HLEN);
> +       memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb),
> +               skb->mac_len);
> +       skb_reset_mac_header(skb);
> +
> +       new_mpls_lse = (__be32 *)mac_header_end(skb);
> +       *new_mpls_lse = mpls->mpls_lse;
> +
> +       if (get_ip_summed(skb) == OVS_CSUM_COMPLETE)
> +               skb->csum = csum_add(skb->csum, csum_partial(new_mpls_lse,
> +                                                            MPLS_HLEN, 0));
> +
> +       /* If the skb was non-MPLS then record the original protocol
> +        * in skb->inner_protocol. This is to allow GSO to perform
> +        * segmentation of the inner-packet.
> +        */
> +       if (!eth_p_mpls(skb->protocol))
> +               skb_set_inner_protocol(skb, skb->protocol);
> +
> +       set_ethertype(skb, mpls->mpls_ethertype);
> +       if (!vlan_tx_tag_present(skb) && skb->protocol != htons(ETH_P_8021Q)) {
> +               skb->protocol = mpls->mpls_ethertype;
> +       }

I don't think you need to check for vlan_tx_tag_present here - when
doing vlan offloading, skb->protocol reflects the packet as if there
was no vlan tag there at all. Also, you shouldn't need the braces
there.

When looking at this, I had an additional thought: different versions
of OpenFlow define the tag order differently and there is some limited
flexibility in how they get applied. Obviously, we're assuming that
vlans are applied before MPLS here. However, I'm not sure that is
actually required and may even simplify things (for example, your MPLS
GSO code doesn't do anything special with vlans and it should continue
to just work either way). I'm hesitant to suggest changes at this
point but it might actually make things cleaner and work with the
general model that we already have.

> diff --git a/datapath/datapath.c b/datapath/datapath.c
> index 42af315..7e0e7a1 100644
> --- a/datapath/datapath.c
> +++ b/datapath/datapath.c
> @@ -551,18 +551,132 @@ static inline void add_nested_action_end(struct sw_flow_actions *sfa, int st_off
>         a->nla_len = sfa->actions_len - st_offset;
>  }
>
> -static int validate_and_copy_actions(const struct nlattr *attr,
> +#define MAX_ETH_TYPES 16 /* Arbitrary Limit */
> +
> +/* struct eth_types - possible eth types
> + * @types: provides storage for the possible eth types.
> + * @start: is the index of the first entry of types which is possible.
> + * @end: is the index of the last entry of types which is possible.
> + * @cursor: is the index of the entry which should be updated if an action
> + * changes the eth type.
> + *
> + * Due to the sample action there may be multiple possible eth types.
> + * In order to correctly validate actions all possible types are tracked
> + * and verified. This is done using struct eth_types.
> + *
> + * Initially start, end and cursor should be 0, and the first element of
> + * types should be set to the eth type of the flow.
> + *
> + * When an action changes the eth type then the values of start and end are
> + * updated to the value of cursor. The new type is stored at types[cursor].
> + *
> + * When entering a sample action the start and cursor values are saved. The
> + * value of cursor is set to the value of end plus one.
> + *
> + * When leaving a sample action the start and cursor values are restored to
> + * their saved values.
> + *
> + * An example follows.
> + *
> + * actions: pop_mpls(A),sample(pop_mpls(B)),sample(pop_mpls(C)),pop_mpls(D)
> + *
> + * 0. Initial state:
> + *     types = { original_eth_type }
> + *     start = end = cursor = 0;
> + *
> + * 1. pop_mpls(A)
> + *    a. Check types from start (0) to end (0) inclusive
> + *       i.e. Check against original_eth_type
> + *    b. Set start = end = cursor
> + *    c. Set types[cursor] = A
> + *    New state:
> + *     types = { A }
> + *     start = end = cursor = 0;
> + *
> + * 2. Enter first sample()
> + *    a. Save start and cursor
> + *    b. Set cursor = end + 1
> + *    New state:
> + *     types = { A }
> + *     start = end = 0;
> + *     cursor = 1;
> + *
> + * 3. pop_mpls(B)
> + *    a. Check types from start (0) to end (0)
> + *       i.e: Check against A
> + *    b. Set start = end = cursor
> + *    c. Set types[cursor] = B
> + *    New state:
> + *     types = { A, B }
> + *     start = end = cursor = 1;
> + *
> + * 4. Leave first sample()
> + *    a. Restore start and cursor to the values when entering 2.
> + *    New state:
> + *     types = { A, B }
> + *     start = cursor = 0;
> + *     end = 1;
> + *
> + * 5. Enter second sample()
> + *    a. Save start and cursor
> + *    b. Set cursor = end + 1
> + *    New state:
> + *     types = { A, B }
> + *     start = 0;
> + *     end = 1;
> + *     cursor = 2;
> + *
> + * 6. pop_mpls(C)
> + *    a. Check types from start (0) to end (1) inclusive
> + *       i.e: Check against A and B
> + *    b. Set start = end = cursor
> + *    c. Set types[cursor] = C
> + *    New state:
> + *     types = { A, B, C }
> + *     start = end = cursor = 2;
> + *
> + * 7. Leave second sample()
> + *    a. Restore start and cursor to the values when entering 5.
> + *    New state:
> + *     types = { A, B, C }
> + *     start = cursor = 0;
> + *     end = 2;
> + *
> + * 8. pop_mpls(D)
> + *    a. Check types from start (0) to end (2) inclusive
> + *       i.e: Check against A, B and C
> + *    b. Set start = end = cursor
> + *    c. Set types[cursor] = D
> + *    New state:
> + *     types = { D } // Trailing entries of type are no longer used end = 0
> + *     start = end = cursor = 0;
> + */

The algorithm looks good to me now, thanks for working through it and
for documenting it.

> +static int validate_and_copy_actions__(const struct nlattr *attr,
> +                               const struct sw_flow_key *key, int depth,
> +                               struct sw_flow_actions **sfa,
> +                               struct eth_types *eth_types)
[...]
> +               case OVS_ACTION_ATTR_POP_MPLS: {
> +                       size_t i;

size_t seems a little bit odd to me here, maybe just use an int?

> +                       for (i = eth_types->start; i <= eth_types->end; i++)
> +                               if (!eth_p_mpls(eth_types->types[i]))
> +                                       return -EINVAL;
> +
> +                       /* Disallow subsequent L2.5+ set and mpls_pop actions
> +                        * as there is no check here to ensure that the new
> +                        * eth_type is valid and thus set actions could
> +                        * write off the end of the packet or otherwise
> +                        * corrupt it.
> +                        *
> +                        * Support for these actions that after mpls_pop
> +                        * using packet recirculation is planned.
> +                        * are planned to be supported using using packet
> +                        * recirculation.

I think there is an editing error in the above comment.

> diff --git a/datapath/datapath.h b/datapath/datapath.h
> index ad59a3a..e47493a 100644
> --- a/datapath/datapath.h
> +++ b/datapath/datapath.h
> +#ifdef HAVE_INNER_PROTOCOL
> +static inline void skb_set_inner_protocol(struct sk_buff *skb, __be16 ethertype)
> +{
> +       skb->inner_protocol = ethertype;
> +}
> +
> +static inline __be16 skb_get_inner_protocol(struct sk_buff *skb)
> +{
> +       return skb->inner_protocol;
> +}
> +
> +static inline bool kernel_supports_mpls_gso(void)
> +{
> +       return true;
> +}
> +#else
> +static inline void skb_set_inner_protocol(struct sk_buff *skb, __be16 ethertype) {
> +       OVS_CB(skb)->inner_protocol = ethertype;
> +}
> +
> +static inline __be16 skb_get_inner_protocol(struct sk_buff *skb)
> +{
> +       return OVS_CB(skb)->inner_protocol;
> +}
> +
> +static inline bool kernel_supports_mpls_gso(void)
> +{
> +       return false;
> +}
> +#endif

Can we move these functions compat.h, prefix them with ovs_, and make
it clear that they are part of OVS and not skbuff.h?

> diff --git a/datapath/tunnel.c b/datapath/tunnel.c
> index 8c93e18..2d870aa 100644
> --- a/datapath/tunnel.c
> +++ b/datapath/tunnel.c
> @@ -150,6 +150,16 @@ static struct sk_buff *handle_offloads(struct sk_buff *skb)
>
>         if (skb_is_gso(skb)) {
>                 struct sk_buff *nskb;
> +               __be16 mpls_protocol = htons(0);
> +
> +               /* Swap the protocol so we can reuse the existing
> +                * skb_gso_segment() function to handle L3 GSO. We will
> +                * restore this later. */
> +               if (!kernel_supports_mpls_gso() && eth_p_mpls(skb->protocol) &&
> +                   !eth_p_mpls(skb_get_inner_protocol(skb))) {
> +                       mpls_protocol = skb->protocol;
> +                       skb->protocol = skb_get_inner_protocol(skb);
> +               }
>
>                 nskb = __skb_gso_segment(skb, 0, false);
>                 if (IS_ERR(nskb)) {
> @@ -159,6 +169,13 @@ static struct sk_buff *handle_offloads(struct sk_buff *skb)
>
>                 consume_skb(skb);
>                 skb = nskb;
> +
> +               if (mpls_protocol != htons(0)) {
> +                       do {
> +                               nskb->protocol = mpls_protocol;
> +                               nskb = nskb->next;
> +                       } while (nskb);
> +               }
>         } else if (get_ip_summed(skb) == OVS_CSUM_PARTIAL) {
>                 /* Pages aren't locked and could change at any time.
>                  * If this happens after we compute the checksum, the

Can we push this completely into the skb_gso_segment() compatibility
code? It's both nicer and may make the interactions with the vlan code
less confusing.

> diff --git a/datapath/vport-netdev.c b/datapath/vport-netdev.c
> index 4e7342c..db3284f 100644
> --- a/datapath/vport-netdev.c
> +++ b/datapath/vport-netdev.c
> @@ -309,8 +310,31 @@ static int netdev_send(struct vport *vport, struct sk_buff *skb)
>         skb->dev = netdev_vport->dev;
>         forward_ip_summed(skb, true);
>
> +       vlan = mpls = false;
> +
> +       if (!kernel_supports_mpls_gso() && eth_p_mpls(skb->protocol) &&
> +           !eth_p_mpls(skb_get_inner_protocol(skb)))
> +               mpls = true;
> +
> +       /* If we are segmenting a VLAN packet, then we don't need to handle
> +        * MPLS segmentation, because MPLS is part of the extended L2 header
> +        * and the kernel already knows how to handle this. */
>         if (vlan_tx_tag_present(skb) && !dev_supports_vlan_tx(skb->dev)) {
> -               int features;
> +               mpls = false;
> +               vlan = true;
> +       }

Don't we still need to save and restore the MPLS skb->protocol field
even if we are also processing vlans?

> @@ -341,10 +376,14 @@ static int netdev_send(struct vport *vport, struct sk_buff *skb)
>                                 nskb = skb->next;
>                                 skb->next = NULL;
>
> -                               skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
> +                               if (vlan)
> +                                       skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
>                                 if (likely(skb)) {
>                                         len += skb->len;
> -                                       vlan_set_tci(skb, 0);
> +                                       if (mpls)
> +                                               skb->protocol = mpls_protocol;
> +                                       if (vlan)
> +                                               vlan_set_tci(skb, 0);
>                                         dev_queue_xmit(skb);
>                                 }

The order of setting the protocol field is different from the one
below and I think this one isn't right because it means that the MPLS
protocol can override the vlan.

Pravin, can you take a look at this as well since you are working in
the same area?



More information about the dev mailing list