[ovs-dev] [PATCH 10/15] datapath: Add basic MPLS support to kernel

Simon Horman horms at verge.net.au
Thu Feb 28 08:57:08 UTC 2013


On Wed, Feb 27, 2013 at 07:39:02AM -0800, Jesse Gross wrote:
> On Tue, Feb 26, 2013 at 11:10 PM, Simon Horman <horms at verge.net.au> wrote:
> > On Tue, Feb 26, 2013 at 02:49:47PM -0800, Jesse Gross wrote:
> >> A couple high level comments:
> >>  * I'd like to nail down the format of the userspace/kernel interface
> >> for multiple levels of tags.  We don't need to implement multiple
> >> levels but we should have a good plan on how we would cleanly extend
> >> the interface to do that in the future if we want.  Ben and I had
> >> talked about using an array of tags in a single Netlink attribute,
> >> which seems fairly clean and avoids needing lots of encap attributes.
> >
> > Is the implication here that the size of the array and thus depth of the
> > tags is fixed?
> 
> I think we could use a variable sized array and then use the size from
> the Netlink attribute to get the number of elements.  That way we
> could easily expand it over time.

Thanks, I think that has worked out pretty cleanly.

What I have done is to allow the attribute to be an array of one or more
struct ovs_key_mpls. This doesn't actually require any code changes because
its a superset of the previous arrangement of an element with
exactly one struct ovs_key_mpls. However, I added a little bit of code
to the datapath to allow verification parsing of the array. This is shown
in the incremental patch at the bottom of this email which I will roll
into the next revision of the patch to add MPLS support to the datapath
(the patch this thread is about).

I also added a bit of code to ovs-vswitchd to exercise this lightly
which seems to work out well enough.

> 
> >> > diff --git a/datapath/actions.c b/datapath/actions.c
> >> > index f638ffc..60522be 100644
> >> > --- a/datapath/actions.c
> >> > +++ b/datapath/actions.c
> >> > +       new_mpls_lse = (__be32 *)(skb_mac_header(skb) + l2_size);
> >> > +       *new_mpls_lse = mpls->mpls_lse;
> >> > +
> >> > +       set_ethertype(skb, mpls->mpls_ethertype);
> >> > +       return 0;
> >> > +}
> >>
> >> There are potentially a number of issues with offloading here: if we
> >> have a packet that requires some form of offloading (say TSO) and we
> >> push an MPLS header on the front, then it's quite likely that the NIC
> >> can no longer handle the packet and we must emulate in software first.
> >>  That would ideally mean GSO support for MPLS (as was done for vlan).
> >
> > I take it that would be work in the core kernel code.
> > Could you give me a few pointers as to what I should be looking at?
> 
> It would mostly be the code around skb_gso_segment() that would need
> to skip over and replicate the MPLS tags.  dev_hard_start_xmit() also
> likely needs some work to make sure that we take the software path
> instead of using hardware that can't offload.

Thanks. I have added it to the TODO list and will look into it.
No doubt I will have more questions at that time.

> >> > diff --git a/datapath/datapath.c b/datapath/datapath.c
> >> > index 87c96ae..1386917 100644
> >> > --- a/datapath/datapath.c
> >> > +++ b/datapath/datapath.c
> >> > +void skb_cb_set_l2_size(struct sk_buff *skb)
> >> > +{
> >> > +       struct ethhdr *eth;
> >> > +       int nh_ofs;
> >> > +       __be16 dl_type = 0;
> >> > +
> >> > +       skb_reset_mac_header(skb);
> >> > +
> >> > +       eth = eth_hdr(skb);
> >> > +       nh_ofs = sizeof(struct ethhdr);
> >> > +       if (likely(eth->h_proto >= htons(ETH_TYPE_MIN))) {
> >> > +               dl_type = eth->h_proto;
> >> > +
> >> > +               while (dl_type == htons(ETH_P_8021Q) &&
> >> > +                               skb->len >= nh_ofs + sizeof(struct vlan_hdr)) {
> >> > +                       struct vlan_hdr *vh = (struct vlan_hdr*)(skb->data + nh_ofs);
> >> > +                       dl_type = vh->h_vlan_encapsulated_proto;
> >> > +                       nh_ofs += sizeof(struct vlan_hdr);
> >> > +               }
> >> > +
> >> > +               OVS_CB(skb)->l2_size = nh_ofs;
> >> > +       } else {
> >> > +               OVS_CB(skb)->l2_size = 0;
> >> > +       }
> >> > +}
> >>
> >> It really seems like this should happen as part of flow extraction,
> >> similar to the initialization of the other layer pointers, rather than
> >> separately in every place that we get a packet.
> >
> > Yes, I think that should work.
> >
> > Thinking out aloud:
> >
> > I think that it should be possible to implement your idea by
> > having ovs_flow_extract() set OVS_CB(skb)->l2_size.
> >
> > Currently skb_cb_set_l2_size() is called in two locations,
> > ovs_packet_cmd_execute() and ovs_vport_receive().
> >
> > In the case of ovs_packet_cmd_execute() there is a previously a
> > call to ovs_flow_extract().
> >
> > And in the case of ovs_vport_receive() there is subsequently a
> > call to ovs_dp_process_received_packet() which in turn calls
> > ovs_flow_extract() if OVS_CB(skb)->flow is not set.
> 
> All of that sounds right to me.
> 
> >> > diff --git a/datapath/flow.c b/datapath/flow.c
> >> > index fad9e19..27e1920 100644
> >> > --- a/datapath/flow.c
> >> > +++ b/datapath/flow.c
> >> > @@ -728,6 +728,17 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key,
> >> >                         memcpy(key->ipv4.arp.tha, arp->ar_tha, ETH_ALEN);
> >> >                         key_len = SW_FLOW_KEY_OFFSET(ipv4.arp);
> >> >                 }
> >> > +       } else if (eth_p_mpls(key->eth.type)) {
> >> > +               error = check_header(skb, MPLS_HLEN);
> >> > +               if (unlikely(error))
> >> > +                       goto out;
> >> > +
> >> > +               key_len = SW_FLOW_KEY_OFFSET(mpls.top_lse);
> >> > +               memcpy(&key->mpls.top_lse, skb_network_header(skb), MPLS_HLEN);
> >> > +
> >> > +               /* Update network header */
> >> > +               skb_set_network_header(skb, skb_network_header(skb) -
> >> > +                                      skb->data + MPLS_HLEN);
> >>
> >> Is it possible to actually use this network header pointer?
> >
> > It is possible to use it in ovs_flow_extract_l3_onwards() in conjunction
> > with the inner/outer flow changes I have in other patches (and you have
> > asked me to drop).
> >
> > It does seem to me that it may be incorrect if there is more than one MPLS
> > LSE present in the frame.
> >
> > But I'm not sure if either of those answers relate to the point you are
> > making. Are there use-cases which concern you?
> 
> I was just unsure of when it would be useful.  It seems like if we're
> not going to try to act beyond the MPLS labels that we can parse then
> it makes the most sense to just leave the network layer unset (maybe
> we can use it to point to the start of the MPLS stack instead of
> needing a special l2 length then?).

I think that removing the call to skb_set_network_header() you have
isolated above is harmless enough as I don't think it is used because
if frame is MPLS then no l3+ processing is done.

I looked at making use of the network header to mark the top of
the MPLS stack and not adding the special l2 length, however, I believe
it breaks down in the situation I will now describe.

Suppose an IP frame has two actions applied, dec_ttl and push_mpls.
This is valid because an IP match may be made and ovs-vswitchd
can install set(ip) and push_mpls actions in the datapath.

Now, for some reason that I must confess that I don't fully understand
at this moment the actions are installed as push_mpls and then set(ip).

I believe that for the scheme you suggest above to work push_mpls (and
pop_mpls) need to update the network_header such that it always points to
the to of the MPLS stack. This is to allow multiple mpls actions to behave
correctly, e.g. mpls_push then mpls_set.

The problem being that set(ip) uses the network_header and if it has been
set to the top of the MPLS stack by push_mpls then the set(ip) action will
corrupt the packet.



------ incremental patch to allow OVS_KEY_ATTR_MPLS to be an array -------



diff --git a/datapath/datapath.c b/datapath/datapath.c
index eb2e91b..af0dba8 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -599,8 +599,7 @@ static int validate_set(const struct nlattr *a,
 		return -EINVAL;
 
 	if (key_type > OVS_KEY_ATTR_MAX ||
-	    (ovs_key_lens[key_type] != nla_len(ovs_key) &&
-	     ovs_key_lens[key_type] != -1))
+	    ovs_flow_verify_key_len(key_type, ovs_key))
 		return -EINVAL;
 
 	switch (key_type) {
diff --git a/datapath/flow.c b/datapath/flow.c
index c99a6f9..7528eb4 100644
--- a/datapath/flow.c
+++ b/datapath/flow.c
@@ -858,7 +858,7 @@ void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow)
 }
 
 /* The size of the argument for each %OVS_KEY_ATTR_* Netlink attribute.  */
-const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
+static const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
 	[OVS_KEY_ATTR_ENCAP] = -1,
 	[OVS_KEY_ATTR_PRIORITY] = sizeof(u32),
 	[OVS_KEY_ATTR_IN_PORT] = sizeof(u32),
@@ -881,6 +881,11 @@ const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
 	[OVS_KEY_ATTR_TUN_ID] = sizeof(__be64),
 };
 
+/* Set the bit of a type in ovs_key_arrays if it is
+ * an array of one or more elements of size ovs_key_lens[type].
+ */
+static const u64 ovs_array_keys = (1ULL << OVS_KEY_ATTR_MPLS);
+
 static int ipv4_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_len,
 				  const struct nlattr *a[], u64 *attrs)
 {
@@ -987,6 +992,27 @@ static int ipv6_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_len,
 	return 0;
 }
 
+int ovs_flow_verify_key_len(u16 type, const struct nlattr *nla)
+{
+	int expected_len = ovs_key_lens[type];
+	int len = nla_len(nla);
+
+	if (len == -1)
+		return 0;
+
+	if (ovs_array_keys & (1ULL << type)) {
+		/* The type is an array.
+		 * Check if len is a non-zero, non-negative multiple
+		 * of expected_len.
+		 */
+		if (len < expected_len || len % expected_len)
+			return -EINVAL;
+	} else if  (len != expected_len)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int parse_flow_nlattrs(const struct nlattr *attr,
 			      const struct nlattr *a[], u64 *attrsp)
 {
@@ -997,14 +1023,12 @@ static int parse_flow_nlattrs(const struct nlattr *attr,
 	attrs = 0;
 	nla_for_each_nested(nla, attr, rem) {
 		u16 type = nla_type(nla);
-		int expected_len;
 
 		if (type > OVS_KEY_ATTR_MAX || attrs & (1ULL << type))
 			return -EINVAL;
 
-		expected_len = ovs_key_lens[type];
-		if (nla_len(nla) != expected_len && expected_len != -1)
-			return -EINVAL;
+		if (ovs_flow_verify_key_len(type, nla))
+		    return -EINVAL;
 
 		attrs |= 1ULL << type;
 		a[type] = nla;
@@ -1305,7 +1329,6 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
 		memcpy(swkey->ipv4.arp.tha, arp_key->arp_tha, ETH_ALEN);
 	} else if (eth_p_mpls(swkey->eth.type)) {
 		const struct ovs_key_mpls *mpls_key;
-
 		if (!(attrs & (1ULL << OVS_KEY_ATTR_MPLS)))
 			return -EINVAL;
 		attrs &= ~(1ULL << OVS_KEY_ATTR_MPLS);
@@ -1353,7 +1376,7 @@ int ovs_flow_metadata_from_nlattrs(struct sw_flow *flow, int key_len, const stru
 		if (type <= OVS_KEY_ATTR_MAX && ovs_key_lens[type] > 0) {
 			int err;
 
-			if (nla_len(nla) != ovs_key_lens[type])
+			if (ovs_flow_verify_key_len(type, nla))
 				return -EINVAL;
 
 			switch (type) {
diff --git a/datapath/flow.h b/datapath/flow.h
index 99429d2..27e394b 100644
--- a/datapath/flow.h
+++ b/datapath/flow.h
@@ -240,7 +240,7 @@ void ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow,
 void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow);
 
 struct sw_flow *ovs_flow_tbl_next(struct flow_table *table, u32 *bucket, u32 *idx);
-extern const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1];
+int ovs_flow_verify_key_len(u16 type, const struct nlattr *nla);
 int ipv4_tun_from_nlattr(const struct nlattr *attr,
 			 struct ovs_key_ipv4_tunnel *tun_key);
 int ipv4_tun_to_nlattr(struct sk_buff *skb,
diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
index 63d1cac..f7b788c 100644
--- a/include/linux/openvswitch.h
+++ b/include/linux/openvswitch.h
@@ -285,7 +285,7 @@ enum ovs_key_attr {
 	OVS_KEY_ATTR_IPV4_TUNNEL,  /* struct ovs_key_ipv4_tunnel */
 #endif
 
-	OVS_KEY_ATTR_MPLS = 62, /* struct ovs_key_mpls */
+	OVS_KEY_ATTR_MPLS = 62, /* Array of one or more struct ovs_key_mpls */
 	OVS_KEY_ATTR_TUN_ID = 63,  /* be64 tunnel ID */
 	__OVS_KEY_ATTR_MAX
 };



More information about the dev mailing list