[ovs-dev] [PATCH] datapath: Add basic MPLS support to kernel

Pravin Shelar pshelar at nicira.com
Thu Jun 13 01:14:07 UTC 2013


On Tue, Jun 11, 2013 at 2:26 PM, Jesse Gross <jesse at nicira.com> wrote:
> On Mon, Jun 10, 2013 at 1:07 AM, Simon Horman <horms at verge.net.au> wrote:
>> diff --git a/datapath/actions.c b/datapath/actions.c
>> index 0dac658..197811a 100644
>> --- a/datapath/actions.c
>> +++ b/datapath/actions.c
>> +static int push_mpls(struct sk_buff *skb,
>> +                    const struct ovs_action_push_mpls *mpls)
>> +{
>> +       __be32 *new_mpls_lse;
>> +       int err;
>> +
>> +       if (skb_cow_head(skb, MPLS_HLEN) < 0)
>> +               return -ENOMEM;
>> +
>> +       err = make_writable(skb, skb->mac_len);
>> +       if (unlikely(err))
>> +               return err;
>> +
>> +       skb_push(skb, MPLS_HLEN);
>> +       memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb),
>> +               skb->mac_len);
>> +       skb_reset_mac_header(skb);
>> +
>> +       new_mpls_lse = (__be32 *)mac_header_end(skb);
>> +       *new_mpls_lse = mpls->mpls_lse;
>> +
>> +       if (get_ip_summed(skb) == OVS_CSUM_COMPLETE)
>> +               skb->csum = csum_add(skb->csum, csum_partial(new_mpls_lse,
>> +                                                            MPLS_HLEN, 0));
>> +
>> +       /* If the skb was non-MPLS then record the original protocol
>> +        * in skb->inner_protocol. This is to allow GSO to perform
>> +        * segmentation of the inner-packet.
>> +        */
>> +       if (!eth_p_mpls(skb->protocol))
>> +               skb_set_inner_protocol(skb, skb->protocol);
>> +
>> +       set_ethertype(skb, mpls->mpls_ethertype);
>> +       if (!vlan_tx_tag_present(skb) && skb->protocol != htons(ETH_P_8021Q)) {
>> +               skb->protocol = mpls->mpls_ethertype;
>> +       }
>
> I don't think you need to check for vlan_tx_tag_present here - when
> doing vlan offloading, skb->protocol reflects the packet as if there
> was no vlan tag there at all. Also, you shouldn't need the braces
> there.
>
> When looking at this, I had an additional thought: different versions
> of OpenFlow define the tag order differently and there is some limited
> flexibility in how they get applied. Obviously, we're assuming that
> vlans are applied before MPLS here. However, I'm not sure that is
> actually required and may even simplify things (for example, your MPLS
> GSO code doesn't do anything special with vlans and it should continue
> to just work either way). I'm hesitant to suggest changes at this
> point but it might actually make things cleaner and work with the
> general model that we already have.
>
>> diff --git a/datapath/datapath.c b/datapath/datapath.c
>> index 42af315..7e0e7a1 100644
>> --- a/datapath/datapath.c
>> +++ b/datapath/datapath.c
>> @@ -551,18 +551,132 @@ static inline void add_nested_action_end(struct sw_flow_actions *sfa, int st_off
>>         a->nla_len = sfa->actions_len - st_offset;
>>  }
>>
>> -static int validate_and_copy_actions(const struct nlattr *attr,
>> +#define MAX_ETH_TYPES 16 /* Arbitrary Limit */
>> +
>> +/* struct eth_types - possible eth types
>> + * @types: provides storage for the possible eth types.
>> + * @start: is the index of the first entry of types which is possible.
>> + * @end: is the index of the last entry of types which is possible.
>> + * @cursor: is the index of the entry which should be updated if an action
>> + * changes the eth type.
>> + *
>> + * Due to the sample action there may be multiple possible eth types.
>> + * In order to correctly validate actions all possible types are tracked
>> + * and verified. This is done using struct eth_types.
>> + *
>> + * Initially start, end and cursor should be 0, and the first element of
>> + * types should be set to the eth type of the flow.
>> + *
>> + * When an action changes the eth type then the values of start and end are
>> + * updated to the value of cursor. The new type is stored at types[cursor].
>> + *
>> + * When entering a sample action the start and cursor values are saved. The
>> + * value of cursor is set to the value of end plus one.
>> + *
>> + * When leaving a sample action the start and cursor values are restored to
>> + * their saved values.
>> + *
>> + * An example follows.
>> + *
>> + * actions: pop_mpls(A),sample(pop_mpls(B)),sample(pop_mpls(C)),pop_mpls(D)
>> + *
>> + * 0. Initial state:
>> + *     types = { original_eth_type }
>> + *     start = end = cursor = 0;
>> + *
>> + * 1. pop_mpls(A)
>> + *    a. Check types from start (0) to end (0) inclusive
>> + *       i.e. Check against original_eth_type
>> + *    b. Set start = end = cursor
>> + *    c. Set types[cursor] = A
>> + *    New state:
>> + *     types = { A }
>> + *     start = end = cursor = 0;
>> + *
>> + * 2. Enter first sample()
>> + *    a. Save start and cursor
>> + *    b. Set cursor = end + 1
>> + *    New state:
>> + *     types = { A }
>> + *     start = end = 0;
>> + *     cursor = 1;
>> + *
>> + * 3. pop_mpls(B)
>> + *    a. Check types from start (0) to end (0)
>> + *       i.e: Check against A
>> + *    b. Set start = end = cursor
>> + *    c. Set types[cursor] = B
>> + *    New state:
>> + *     types = { A, B }
>> + *     start = end = cursor = 1;
>> + *
>> + * 4. Leave first sample()
>> + *    a. Restore start and cursor to the values when entering 2.
>> + *    New state:
>> + *     types = { A, B }
>> + *     start = cursor = 0;
>> + *     end = 1;
>> + *
>> + * 5. Enter second sample()
>> + *    a. Save start and cursor
>> + *    b. Set cursor = end + 1
>> + *    New state:
>> + *     types = { A, B }
>> + *     start = 0;
>> + *     end = 1;
>> + *     cursor = 2;
>> + *
>> + * 6. pop_mpls(C)
>> + *    a. Check types from start (0) to end (1) inclusive
>> + *       i.e: Check against A and B
>> + *    b. Set start = end = cursor
>> + *    c. Set types[cursor] = C
>> + *    New state:
>> + *     types = { A, B, C }
>> + *     start = end = cursor = 2;
>> + *
>> + * 7. Leave second sample()
>> + *    a. Restore start and cursor to the values when entering 5.
>> + *    New state:
>> + *     types = { A, B, C }
>> + *     start = cursor = 0;
>> + *     end = 2;
>> + *
>> + * 8. pop_mpls(D)
>> + *    a. Check types from start (0) to end (2) inclusive
>> + *       i.e: Check against A, B and C
>> + *    b. Set start = end = cursor
>> + *    c. Set types[cursor] = D
>> + *    New state:
>> + *     types = { D } // Trailing entries of type are no longer used end = 0
>> + *     start = end = cursor = 0;
>> + */
>
> The algorithm looks good to me now, thanks for working through it and
> for documenting it.
>
>> +static int validate_and_copy_actions__(const struct nlattr *attr,
>> +                               const struct sw_flow_key *key, int depth,
>> +                               struct sw_flow_actions **sfa,
>> +                               struct eth_types *eth_types)
> [...]
>> +               case OVS_ACTION_ATTR_POP_MPLS: {
>> +                       size_t i;
>
> size_t seems a little bit odd to me here, maybe just use an int?
>
>> +                       for (i = eth_types->start; i <= eth_types->end; i++)
>> +                               if (!eth_p_mpls(eth_types->types[i]))
>> +                                       return -EINVAL;
>> +
>> +                       /* Disallow subsequent L2.5+ set and mpls_pop actions
>> +                        * as there is no check here to ensure that the new
>> +                        * eth_type is valid and thus set actions could
>> +                        * write off the end of the packet or otherwise
>> +                        * corrupt it.
>> +                        *
>> +                        * Support for these actions that after mpls_pop
>> +                        * using packet recirculation is planned.
>> +                        * are planned to be supported using using packet
>> +                        * recirculation.
>
> I think there is an editing error in the above comment.
>
>> diff --git a/datapath/datapath.h b/datapath/datapath.h
>> index ad59a3a..e47493a 100644
>> --- a/datapath/datapath.h
>> +++ b/datapath/datapath.h
>> +#ifdef HAVE_INNER_PROTOCOL
>> +static inline void skb_set_inner_protocol(struct sk_buff *skb, __be16 ethertype)
>> +{
>> +       skb->inner_protocol = ethertype;
>> +}
>> +
>> +static inline __be16 skb_get_inner_protocol(struct sk_buff *skb)
>> +{
>> +       return skb->inner_protocol;
>> +}
>> +
>> +static inline bool kernel_supports_mpls_gso(void)
>> +{
>> +       return true;
>> +}
>> +#else
>> +static inline void skb_set_inner_protocol(struct sk_buff *skb, __be16 ethertype) {
>> +       OVS_CB(skb)->inner_protocol = ethertype;
>> +}
>> +
>> +static inline __be16 skb_get_inner_protocol(struct sk_buff *skb)
>> +{
>> +       return OVS_CB(skb)->inner_protocol;
>> +}
>> +
>> +static inline bool kernel_supports_mpls_gso(void)
>> +{
>> +       return false;
>> +}
>> +#endif
>
> Can we move these functions compat.h, prefix them with ovs_, and make
> it clear that they are part of OVS and not skbuff.h?
>
>> diff --git a/datapath/tunnel.c b/datapath/tunnel.c
>> index 8c93e18..2d870aa 100644
>> --- a/datapath/tunnel.c
>> +++ b/datapath/tunnel.c
>> @@ -150,6 +150,16 @@ static struct sk_buff *handle_offloads(struct sk_buff *skb)
>>
>>         if (skb_is_gso(skb)) {
>>                 struct sk_buff *nskb;
>> +               __be16 mpls_protocol = htons(0);
>> +
>> +               /* Swap the protocol so we can reuse the existing
>> +                * skb_gso_segment() function to handle L3 GSO. We will
>> +                * restore this later. */
>> +               if (!kernel_supports_mpls_gso() && eth_p_mpls(skb->protocol) &&
>> +                   !eth_p_mpls(skb_get_inner_protocol(skb))) {
>> +                       mpls_protocol = skb->protocol;
>> +                       skb->protocol = skb_get_inner_protocol(skb);
>> +               }
>>
>>                 nskb = __skb_gso_segment(skb, 0, false);
>>                 if (IS_ERR(nskb)) {
>> @@ -159,6 +169,13 @@ static struct sk_buff *handle_offloads(struct sk_buff *skb)
>>
>>                 consume_skb(skb);
>>                 skb = nskb;
>> +
>> +               if (mpls_protocol != htons(0)) {
>> +                       do {
>> +                               nskb->protocol = mpls_protocol;
>> +                               nskb = nskb->next;
>> +                       } while (nskb);
>> +               }
>>         } else if (get_ip_summed(skb) == OVS_CSUM_PARTIAL) {
>>                 /* Pages aren't locked and could change at any time.
>>                  * If this happens after we compute the checksum, the
>
> Can we push this completely into the skb_gso_segment() compatibility
> code? It's both nicer and may make the interactions with the vlan code
> less confusing.
>
>> diff --git a/datapath/vport-netdev.c b/datapath/vport-netdev.c
>> index 4e7342c..db3284f 100644
>> --- a/datapath/vport-netdev.c
>> +++ b/datapath/vport-netdev.c
>> @@ -309,8 +310,31 @@ static int netdev_send(struct vport *vport, struct sk_buff *skb)
>>         skb->dev = netdev_vport->dev;
>>         forward_ip_summed(skb, true);
>>
>> +       vlan = mpls = false;
>> +
>> +       if (!kernel_supports_mpls_gso() && eth_p_mpls(skb->protocol) &&
>> +           !eth_p_mpls(skb_get_inner_protocol(skb)))
>> +               mpls = true;
>> +
>> +       /* If we are segmenting a VLAN packet, then we don't need to handle
>> +        * MPLS segmentation, because MPLS is part of the extended L2 header
>> +        * and the kernel already knows how to handle this. */
>>         if (vlan_tx_tag_present(skb) && !dev_supports_vlan_tx(skb->dev)) {
>> -               int features;
>> +               mpls = false;
>> +               vlan = true;
>> +       }
>
> Don't we still need to save and restore the MPLS skb->protocol field
> even if we are also processing vlans?
>
>> @@ -341,10 +376,14 @@ static int netdev_send(struct vport *vport, struct sk_buff *skb)
>>                                 nskb = skb->next;
>>                                 skb->next = NULL;
>>
>> -                               skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
>> +                               if (vlan)
>> +                                       skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
>>                                 if (likely(skb)) {
>>                                         len += skb->len;
>> -                                       vlan_set_tci(skb, 0);
>> +                                       if (mpls)
>> +                                               skb->protocol = mpls_protocol;
>> +                                       if (vlan)
>> +                                               vlan_set_tci(skb, 0);
>>                                         dev_queue_xmit(skb);
>>                                 }
>
> The order of setting the protocol field is different from the one
> below and I think this one isn't right because it means that the MPLS
> protocol can override the vlan.
>
> Pravin, can you take a look at this as well since you are working in
> the same area?

I think this compatibility code can be moved to new compat code posted
on gre-restructuring code few changes.



More information about the dev mailing list