[ovs-dev] [PATCH 10/15] datapath: Add basic MPLS support to kernel

Jesse Gross jesse at nicira.com
Tue Feb 26 22:49:47 UTC 2013


On Fri, Feb 15, 2013 at 1:56 AM, Simon Horman <horms at verge.net.au> wrote:
> Allow datapath to recognize and extract MPLS labels into flow keys
> and execute actions which push, pop, and set labels on packets.
>
> Based heavily on work by Leo Alterman and Ravi K.
>
> Cc: Ravi K <rkerur at gmail.com>
> Cc: Leo Alterman <lalterman at nicira.com>
> Reviewed-by: Isaku Yamahata <yamahata at valinux.co.jp>
> Signed-off-by: Simon Horman <horms at verge.net.au>

I received some sparse errors:
  CHECK   /home/jgross/openvswitch/datapath/linux/datapath.c
/home/jgross/openvswitch/datapath/linux/datapath.c:74:5: warning:
symbol 'ovs_dp_ioctl_hook' was not declared. Should it be static?
/home/jgross/openvswitch/datapath/linux/datapath.c:87:13: warning:
restricted __be16 degrades to integer
/home/jgross/openvswitch/datapath/linux/datapath.c:87:13: warning:
restricted __be16 degrades to integer

The first one looks like it is due to a rebase error and the second
one seems like a real endian problem.

A couple high level comments:
 * I'd like to nail down the format of the userspace/kernel interface
for multiple levels of tags.  We don't need to implement multiple
levels but we should have a good plan on how we would cleanly extend
the interface to do that in the future if we want.  Ben and I had
talked about using an array of tags in a single Netlink attribute,
which seems fairly clean and avoids needing lots of encap attributes.
 * I'm not really excited about enabling userspace to work blind, such
as popping off multiple levels of tags or manipulating protocols
beyond what it can see.  This is currently a problem for vlans and the
combination of MPLS and vlans makes things worse.  Not to sound like a
broken record but recirculation would solve this problem once and for
all.  Also, in the case of vlans, even though OpenFlow and OVS do
support this behavior currently, I don't know of anyone using it since
they don't have any visibility into the packet either.

I'll make a few comments on the code below but the second one impacts
a number of places so it's important to get resolved first.

> diff --git a/datapath/actions.c b/datapath/actions.c
> index f638ffc..60522be 100644
> --- a/datapath/actions.c
> +++ b/datapath/actions.c
> +static int push_mpls(struct sk_buff *skb, const struct ovs_action_push_mpls *mpls)
> +{
> +       u32 l2_size;
> +       __be32 *new_mpls_lse;
> +
> +       if (skb_cow_head(skb, MPLS_HLEN) < 0) {
> +               kfree_skb(skb);
> +               return -ENOMEM;
> +       }

I don't think there is a reason to free the skb on error here since it
will get handled later on like other actions.  vlans require a special
case but that doesn't mean that we have to propagate it.

> +       l2_size = skb_cb_l2_size(skb);
> +       skb_push(skb, MPLS_HLEN);
> +       memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb), l2_size);
> +       skb_reset_mac_header(skb);

Should we set the mac len as well?

> +       new_mpls_lse = (__be32 *)(skb_mac_header(skb) + l2_size);
> +       *new_mpls_lse = mpls->mpls_lse;
> +
> +       set_ethertype(skb, mpls->mpls_ethertype);
> +       return 0;
> +}

There are potentially a number of issues with offloading here: if we
have a packet that requires some form of offloading (say TSO) and we
push an MPLS header on the front, then it's quite likely that the NIC
can no longer handle the packet and we must emulate in software first.
 That would ideally mean GSO support for MPLS (as was done for vlan).

In addition, as MPLS labels are modified we need to update skb->csum
in the CHECKSUM_COMPLETE case (it looks like we have a similar issue
for some of the other modifications as well).

> +static int set_mpls(struct sk_buff *skb, const __be32 *mpls_lse)
> +{
> +       __be16 current_ethertype = get_ethertype(skb);
> +       if (eth_p_mpls(current_ethertype))
> +               memcpy(skb_cb_mpls_bos(skb), mpls_lse, sizeof(__be32));

You could use MPLS_HLEN here.

> diff --git a/datapath/datapath.c b/datapath/datapath.c
> index 87c96ae..1386917 100644
> --- a/datapath/datapath.c
> +++ b/datapath/datapath.c
> +void skb_cb_set_l2_size(struct sk_buff *skb)
> +{
> +       struct ethhdr *eth;
> +       int nh_ofs;
> +       __be16 dl_type = 0;
> +
> +       skb_reset_mac_header(skb);
> +
> +       eth = eth_hdr(skb);
> +       nh_ofs = sizeof(struct ethhdr);
> +       if (likely(eth->h_proto >= htons(ETH_TYPE_MIN))) {
> +               dl_type = eth->h_proto;
> +
> +               while (dl_type == htons(ETH_P_8021Q) &&
> +                               skb->len >= nh_ofs + sizeof(struct vlan_hdr)) {
> +                       struct vlan_hdr *vh = (struct vlan_hdr*)(skb->data + nh_ofs);
> +                       dl_type = vh->h_vlan_encapsulated_proto;
> +                       nh_ofs += sizeof(struct vlan_hdr);
> +               }
> +
> +               OVS_CB(skb)->l2_size = nh_ofs;
> +       } else {
> +               OVS_CB(skb)->l2_size = 0;
> +       }
> +}

It really seems like this should happen as part of flow extraction,
similar to the initialization of the other layer pointers, rather than
separately in every place that we get a packet.  However, I don't
think that we should be skipping over more vlan headers than we can
parse anyways so if we follow the model that I mentioned above then
the initialization should just happen naturally.

> +unsigned char *skb_cb_mpls_bos(const struct sk_buff *skb)
> +{
> +       return skb_mac_header(skb) + OVS_CB(skb)->l2_size;
> +}

Isn't this actually supposed to point to the top of the stack? (I
think it does but the name seems wrong.)

> +ptrdiff_t skb_cb_l2_size(const struct sk_buff *skb)
> +{
> +       return OVS_CB(skb)->l2_size;
> +}

It seems like in some places we use this accessor function and in
other places just retrieve the size directly.  Does it provide much
value?

> @@ -667,6 +706,11 @@ static int validate_set(const struct nlattr *a,
>
>                 return validate_tp_port(flow_key);
>
> +       case OVS_KEY_ATTR_MPLS:
> +               if (!eth_p_mpls(flow_key->eth.type))
> +                       return -EINVAL;

I think this will fail if you try to first push and then set an MPLS header.

> diff --git a/datapath/flow.c b/datapath/flow.c
> index fad9e19..27e1920 100644
> --- a/datapath/flow.c
> +++ b/datapath/flow.c
> @@ -728,6 +728,17 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key,
>                         memcpy(key->ipv4.arp.tha, arp->ar_tha, ETH_ALEN);
>                         key_len = SW_FLOW_KEY_OFFSET(ipv4.arp);
>                 }
> +       } else if (eth_p_mpls(key->eth.type)) {
> +               error = check_header(skb, MPLS_HLEN);
> +               if (unlikely(error))
> +                       goto out;
> +
> +               key_len = SW_FLOW_KEY_OFFSET(mpls.top_lse);
> +               memcpy(&key->mpls.top_lse, skb_network_header(skb), MPLS_HLEN);
> +
> +               /* Update network header */
> +               skb_set_network_header(skb, skb_network_header(skb) -
> +                                      skb->data + MPLS_HLEN);

Is it possible to actually use this network header pointer?

> @@ -838,6 +849,7 @@ const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
>         [OVS_KEY_ATTR_ETHERNET] = sizeof(struct ovs_key_ethernet),
>         [OVS_KEY_ATTR_VLAN] = sizeof(__be16),
>         [OVS_KEY_ATTR_ETHERTYPE] = sizeof(__be16),
> +       [OVS_KEY_ATTR_MPLS] = sizeof(struct ovs_key_mpls),

Can you put this under the non-upstream group?

> @@ -1473,6 +1495,15 @@ int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
>                 arp_key->arp_op = htons(swkey->ip.proto);
>                 memcpy(arp_key->arp_sha, swkey->ipv4.arp.sha, ETH_ALEN);
>                 memcpy(arp_key->arp_tha, swkey->ipv4.arp.tha, ETH_ALEN);
> +       } else if (eth_p_mpls(swkey->eth.type)) {
> +               struct ovs_key_mpls *mpls_key;
> +
> +               nla = nla_reserve(skb, OVS_KEY_ATTR_MPLS, sizeof(*mpls_key));
> +               if (!nla)
> +                       goto nla_put_failure;
> +               mpls_key = nla_data(nla);
> +               memset(mpls_key, 0, sizeof(struct ovs_key_mpls));

Is there any reason for this memset here?

> diff --git a/datapath/flow.h b/datapath/flow.h
> index 6949640..d8e350c 100644
> --- a/datapath/flow.h
> +++ b/datapath/flow.h
> @@ -73,6 +73,9 @@ struct sw_flow_key {
>                 __be16 type;            /* Ethernet frame type. */
>         } eth;
>         struct {
> +               __be32 top_lse;         /* top label stack entry */
> +       } mpls;

We should be able to make this a union with the IP headers: since we
stop parsing after the MPLS header we can never have both at the same
time so we can avoid expanding the size of the struct in the non-MPLS
case.

> +#define ETH_TYPE_MIN 0x600

This really seems like it belongs in the upstream kernel someplace
since there's a million uses of this constant.



More information about the dev mailing list