[ovs-dev] [PATCH 1/1 V4] Add support for tun_key to OVS datapath
Kyle Mestery (kmestery)
kmestery at cisco.com
Fri Sep 7 19:56:45 UTC 2012
On Sep 6, 2012, at 10:24 PM, Jesse Gross wrote:
> On Wed, Sep 5, 2012 at 2:58 PM, Kyle Mestery <kmestery at cisco.com> wrote:
>> This is a first pass at providing a tun_key which can be
>> used as the basis for flow-based tunnelling. The tun_key
>> includes and replaces the tun_id in both struct ovs_skb_cb
>> and struct sw_tun_key.
>>
>> This patch allows all existing tun_id behaviour to still work. Existing
>> users of tun_id are redirected to tun_key->tun_id to retain compatibility.
>> However, when the userspace code is updated to make use of the new tun_key,
>> the old behaviour will be deprecated and removed.
>>
>> NOTE: With these changes, the tunneling code no longer assumes input and
>> output keys are symmetric. If they are not, PMTUD needs to be disabled
>> for tunneling to work.
>>
>> Signed-off-by: Kyle Mestery <kmestery at cisco.com>
>> Cc: Simon Horman <horms at verge.net.au>
>> Cc: Jesse Gross <jesse at nicira.com>
>
> A couple high level Linux kernel style comments:
> * I noticed a couple places that use spaces instead of tabs.
> * The multiline comment style used in networking (although not
> currently respected everywhere) is:
> /* Comment
> * comment
> */
OK, cleaned those up.
>
>> diff --git a/NEWS b/NEWS
>> index cbc5c58..0abf2fa 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -1,5 +1,7 @@
>> post-v1.8.0
>> ------------------------
>> + - The tunneling code no longer assumes input and output keys are symmetric.
>> + If they are not, PMTUD needs to be disabled for tunneling to work.
>
> I would probably note that this only applies in the case of flow-based keys.
>
Fixed.
>> diff --git a/datapath/actions.c b/datapath/actions.c
>> index ec9b595..31f6bd6 100644
>> --- a/datapath/actions.c
>> +++ b/datapath/actions.c
>> @@ -377,6 +381,16 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
>> int prev_port = -1;
>> const struct nlattr *a;
>> int rem;
>> + struct ovs_key_ipv4_tunnel tun_key;
>> +
>> + /*
>> + * If tun_key is NULL for this skb, point it at one on the stack for
>> + * action processing and output. This can disappear once we drop support
>> + * for setting tun_id outside of tun_key.
>> + */
>> + memset(&tun_key, 0, sizeof(struct ovs_key_ipv4_tunnel));
>> + if (OVS_CB(skb)->tun_key == NULL)
>> + OVS_CB(skb)->tun_key = &tun_key;
>
> I think ideally we want to push this down as far as possible. struct
> ovs_key_ipv4_tunnel needs to be on the stack in do_execute_actions()
> so that it is in scope when we output but it's best to do the
> memsetting and assignment only when (and if) we actually set the
> tun_id. We can just pass a pointer into execute_set_action().
>
OK, fixed.
>> diff --git a/datapath/flow.c b/datapath/flow.c
>> index d07337c..3b85c62 100644
>> --- a/datapath/flow.c
>> +++ b/datapath/flow.c
>> @@ -1023,10 +1026,28 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
>> }
>>
>> if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID)) {
>> - swkey->phy.tun_id = nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]);
>> + tun_id = nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]);
>> attrs &= ~(1ULL << OVS_KEY_ATTR_TUN_ID);
>> }
>>
>> + if (attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL)) {
>> + struct ovs_key_ipv4_tunnel *tun_key;
>> + tun_key = nla_data(a[OVS_KEY_ATTR_IPV4_TUNNEL]);
>> +
>> + /*
>> + * If both OVS_KEY_ATTR_TUN_ID and OVS_KEY_ATTR_IPV4_TUNNEL are
>> + * present, ensure they are consistent or error out.
>> + */
>> + if (tun_id != 0) {
>> + /* Verify they match */
>> + if (tun_id != tun_key->tun_id)
>> + return -EINVAL;
>> + }
>> +
>> + swkey->tun.tun_key = *tun_key;
>> + attrs &= ~(1ULL << OVS_KEY_ATTR_IPV4_TUNNEL);
>> + }
>
> I think if only OVS_KEY_ATTR_TUN_ID is sent then this will result in
> tun_key never getting set at all. This shouldn't happen in practice
> because on flow setup we always will send both together and they
> should be echoed back together. It doesn't seem right to silently
> ignore one though.
>
> Here's a different option - we could enforce that both must come
> together and where there is overlap the information is consistent.
> This should be easier to check and we can avoid ambiguities around the
> value zero. It also treats them as one, which they effectively are.
> On both the sent and receive sides I would combine the if blocks to
> really make it explicit that they go together.
>
Done. I checked for both being set, and compared the values if so. I
then checked for only one being set, and returned an error if so.
>> +int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
>> + struct ovs_key_ipv4_tunnel *tun_key,
>> const struct nlattr *attr)
>> {
>> const struct nlattr *nla;
>> int rem;
>>
>> *in_port = DP_MAX_PORTS;
>> - *tun_id = 0;
>> + memset(tun_key, 0, sizeof(struct ovs_key_ipv4_tunnel));
>
> Can you use sizeof(*tun_key) here?
>
Fixed.
>> @@ -1185,7 +1207,12 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
>> break;
>>
>> case OVS_KEY_ATTR_TUN_ID:
>> - *tun_id = nla_get_be64(nla);
>> + tun_key->tun_id = nla_get_be64(nla);
>> + break;
>> +
>> + case OVS_KEY_ATTR_IPV4_TUNNEL:
>> + memcpy(tun_key, nla_data(nla),
>> + sizeof(*tun_key));
>
> We should enforce consistency here as well (this one is probably more
> likely to be broken anyways because flow setups just echo back what
> the kernel supplied but this is generated "by hand"). In this case we
> can't enforce that both come together.
>
I don't think there is an implicit order to the netlink attributes here, so what I did
was something like this:
diff --git a/datapath/flow.c b/datapath/flow.c
index 1c4eb99..5be492e 100644
--- a/datapath/flow.c
+++ b/datapath/flow.c
@@ -1187,9 +1187,10 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
{
const struct nlattr *nla;
int rem;
+ __be64 tun_id = 0;
*in_port = DP_MAX_PORTS;
- memset(tun_key, 0, sizeof(struct ovs_key_ipv4_tunnel));
+ memset(tun_key, 0, sizeof(*tun_key));
*priority = 0;
nla_for_each_nested(nla, attr, rem) {
@@ -1205,12 +1206,19 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
break;
case OVS_KEY_ATTR_TUN_ID:
- tun_key->tun_id = nla_get_be64(nla);
+ tun_id = nla_get_be64(nla);
+ if (tun_key->tun_id != 0 &&
+ tun_key->tun_id != tun_id)
+ return -EINVAL;
break;
case OVS_KEY_ATTR_IPV4_TUNNEL:
+ if (tun_key->tun_id != 0)
+ tun_id = tun_key->tun_id;
memcpy(tun_key, nla_data(nla),
sizeof(*tun_key));
+ if (tun_id != 0 && tun_id != tun_key->tun_id)
+ return -EINVAL;
break;
>> diff --git a/datapath/flow.h b/datapath/flow.h
>> index 02c563a..5d4fcde 100644
>> --- a/datapath/flow.h
>> +++ b/datapath/flow.h
>> @@ -42,7 +42,6 @@ struct sw_flow_actions {
>>
>> struct sw_flow_key {
>> struct {
>> - __be64 tun_id; /* Encapsulating tunnel ID. */
>> u32 priority; /* Packet QoS priority. */
>> u16 in_port; /* Input switch port (or DP_MAX_PORTS). */
>> } phy;
>> @@ -92,6 +91,9 @@ struct sw_flow_key {
>> } nd;
>> } ipv6;
>> };
>> + struct {
>> + struct ovs_key_ipv4_tunnel tun_key; /* Encapsulating tunnel key. */
>> + } tun;
>> };
>
> The solution for this is a little more complicated because for
> efficiency we only match on as much of the flow struct as is
> necessary. Since we currently calculate how far we need to search
> (i.e. TCP/IPv4) without taking into account a tunnel header at the end
> this will never match on the tunnel. The other less serious problem
> is that this reduces the effectiveness of the partial length match
> when tunneling because IPv4 packets will have to match over a series
> of zeros where the IPv6 header exists. Basically we want to stick
> tunnel headers at the end of the current match, where ever that might
> be.
>
> One way of doing this is to add a copy of the tun_key to each union
> that could possibly be the end of the struct. We essentially do that
> with the TCP/UDP ports which could follow either a IPv4 or IPv6
> header. However in the case of tunnels there are many more possible
> end locations (I count 7) and whereas L4 ports logically follows the
> L3 header that's not really the case with tunnels, we're just moving
> them around because we can and the benefit is large.
>
> I think what I would do is to make struct tun completely independent
> of struct sw_flow_key and then have it "float" over where ever we
> decide is the end of the flow struct. It's a little messy if you want
> random access to it because you have to compute the size of the
> earlier members but in practice don't think we ever do so it shouldn't
> be too bad.
>
I think I see what you're looking for here. I'm going to make this a separate patch
built on top of the other stuff, since it seems this may be a little more complicated
as you indicate. I'll send it out with the other patch, though.
>> diff --git a/datapath/tunnel.c b/datapath/tunnel.c
>> index d651c11..0687f80 100644
>> --- a/datapath/tunnel.c
>> +++ b/datapath/tunnel.c
>> @@ -799,10 +786,10 @@ static void create_tunnel_header(const struct vport *vport,
>> iph->ihl = sizeof(struct iphdr) >> 2;
>> iph->frag_off = htons(IP_DF);
>> iph->protocol = tnl_vport->tnl_ops->ipproto;
>> - iph->tos = mutable->tos;
>> + iph->tos = mutable->tos;
>> iph->daddr = rt->rt_dst;
>> iph->saddr = rt->rt_src;
>> - iph->ttl = mutable->ttl;
>> + iph->ttl = mutable->ttl;
>
> It looks like these might have been converted from tabs to spaces.
>
Fixed.
>> @@ -1029,7 +1017,8 @@ static struct rtable *__find_route(const struct tnl_mutable_config *mutable,
>>
>> static struct rtable *find_route(struct vport *vport,
>> const struct tnl_mutable_config *mutable,
>> - u8 tos, struct tnl_cache **cache)
>> + u8 tos, __be32 daddr, __be32 saddr,
>> + struct tnl_cache **cache)
>
> Can we make the argument list here match __find_route (where
> possible)? In particular, it's confusing that daddr and saddr are in
> opposite order and no static checker will be able to catch those
> mistakes.
>
Corrected.
>> @@ -1219,11 +1213,23 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
>>
>> if (mutable->flags & TNL_F_TOS_INHERIT)
>> tos = inner_tos;
>> + else if (OVS_CB(skb)->tun_key && OVS_CB(skb)->tun_key->ipv4_tos != 0)
>> + tos = OVS_CB(skb)->tun_key->ipv4_tos;
>> else
>> tos = mutable->tos;
>>
>> + if (OVS_CB(skb)->tun_key && OVS_CB(skb)->tun_key->ipv4_dst != 0)
>> + daddr = OVS_CB(skb)->tun_key->ipv4_dst;
>> + else
>> + daddr = mutable->key.daddr;
>> +
>> + if (OVS_CB(skb)->tun_key && OVS_CB(skb)->tun_key->ipv4_src != 0)
>> + saddr = OVS_CB(skb)->tun_key->ipv4_src;
>> + else
>> + saddr = mutable->key.saddr;
>
> For these checks I think we actually want to always check for
> tun_key->ipv4_dst != 0 as indicator for which type of tunnel key we
> are using rather than to check for the actual value. For example, in
> the case of ipv4_tos it's quite possible that 0 is the actual value
> that is intended to be set. If it is set then I would use it,
> overriding any other settings.
>
Corrected.
>> diff --git a/datapath/tunnel.h b/datapath/tunnel.h
>> index 1924017..78a4d14 100644
>> --- a/datapath/tunnel.h
>> +++ b/datapath/tunnel.h
>> @@ -286,4 +286,14 @@ static inline struct tnl_vport *tnl_vport_priv(const struct vport *vport)
>> return vport_priv(vport);
>> }
>>
>> +static inline void tun_key_init(struct ovs_key_ipv4_tunnel *tun_key,
>> + const struct iphdr *iph, __be64 tun_id)
>
> Can you use tnl_ as the prefix here like the other functions in this file?
>
Fixed.
>> diff --git a/datapath/vport-capwap.c b/datapath/vport-capwap.c
>> index 05a099d..1e08d5a 100644
>> --- a/datapath/vport-capwap.c
>> +++ b/datapath/vport-capwap.c
>> @@ -220,7 +220,7 @@ static struct sk_buff *capwap_update_header(const struct vport *vport,
>> struct capwaphdr_wsi *wsi = (struct capwaphdr_wsi *)(cwh + 1);
>> struct capwaphdr_wsi_key *opt = (struct capwaphdr_wsi_key *)(wsi + 1);
>>
>> - opt->key = OVS_CB(skb)->tun_id;
>> + opt->key = OVS_CB(skb)->tun_key->tun_id;
>
> In theory it's possible that tun_key is NULL (at the moment we always
> initialize to something but that's not the long term goal), which
> means that this will crash. While we could add a check here I think
> it's probably better to do it in the generic tunneling code since once
> we switch over to flow based tunneling it will be required that you
> set the tun_key (for the IP addresses if for no other reason). What I
> would do is add a check for tun_key == NULL in tnl_send(). If it is
> NULL we can add a zeroed out dummy since there isn't a requirement
> that the user call set tun_id yet.
>
Checked added in ovs_tnl_send().
>> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
>> index f5c9cca..f9f0c66 100644
>> --- a/include/linux/openvswitch.h
>> +++ b/include/linux/openvswitch.h
>> @@ -278,7 +278,8 @@ enum ovs_key_attr {
>> OVS_KEY_ATTR_ICMPV6, /* struct ovs_key_icmpv6 */
>> OVS_KEY_ATTR_ARP, /* struct ovs_key_arp */
>> OVS_KEY_ATTR_ND, /* struct ovs_key_nd */
>> - OVS_KEY_ATTR_TUN_ID = 63, /* be64 tunnel ID */
>> + OVS_KEY_ATTR_IPV4_TUNNEL, /* struct ovs_key_ipv4_tunnel */
>> + OVS_KEY_ATTR_TUN_ID = 63, /* be64 tunnel ID */
>
> I would assign OVS_KEY_ATTR_IPV4_TUNNEL the value of 62 for the time
> being until we are ready to lock things down.
>
Got it, done.
>> diff --git a/lib/odp-util.c b/lib/odp-util.c
>> index 901dac3..5a17e11 100644
>> --- a/lib/odp-util.c
>> +++ b/lib/odp-util.c
>> @@ -99,13 +99,14 @@ ovs_key_attr_to_string(enum ovs_key_attr attr)
>> case OVS_KEY_ATTR_ETHERTYPE: return "eth_type";
>> case OVS_KEY_ATTR_IPV4: return "ipv4";
>> case OVS_KEY_ATTR_IPV6: return "ipv6";
>> + case OVS_KEY_ATTR_TUN_ID: return "tun_id";
>> + case OVS_KEY_ATTR_IPV4_TUNNEL: return "ipv4_tunnel";
>> case OVS_KEY_ATTR_TCP: return "tcp";
>> case OVS_KEY_ATTR_UDP: return "udp";
>> case OVS_KEY_ATTR_ICMP: return "icmp";
>> case OVS_KEY_ATTR_ICMPV6: return "icmpv6";
>> case OVS_KEY_ATTR_ARP: return "arp";
>> case OVS_KEY_ATTR_ND: return "nd";
>> - case OVS_KEY_ATTR_TUN_ID: return "tun_id";
>>
>> case __OVS_KEY_ATTR_MAX:
>> default:
>> @@ -601,13 +602,14 @@ odp_flow_key_attr_len(uint16_t type)
>> switch ((enum ovs_key_attr) type) {
>> case OVS_KEY_ATTR_ENCAP: return -2;
>> case OVS_KEY_ATTR_PRIORITY: return 4;
>> - case OVS_KEY_ATTR_TUN_ID: return 8;
>> case OVS_KEY_ATTR_IN_PORT: return 4;
>> case OVS_KEY_ATTR_ETHERNET: return sizeof(struct ovs_key_ethernet);
>> case OVS_KEY_ATTR_VLAN: return sizeof(ovs_be16);
>> case OVS_KEY_ATTR_ETHERTYPE: return 2;
>> case OVS_KEY_ATTR_IPV4: return sizeof(struct ovs_key_ipv4);
>> case OVS_KEY_ATTR_IPV6: return sizeof(struct ovs_key_ipv6);
>> + case OVS_KEY_ATTR_TUN_ID: return 8;
>> + case OVS_KEY_ATTR_IPV4_TUNNEL: return sizeof(struct ovs_key_ipv4_tunnel);
>
> Since the IP keys actually refer to the inner header I would logically
> group the tunnel attributes together where they are in the second
> block - after priority.
>
OK, makes sense. Corrected.
>> diff --git a/lib/odp-util.h b/lib/odp-util.h
>> index 16f2b15..250302e 100644
>> --- a/lib/odp-util.h
>> +++ b/lib/odp-util.h
>> @@ -90,12 +91,12 @@ int odp_actions_from_string(const char *, const struct simap *port_names,
>> * OVS_KEY_ATTR_ICMPV6 2 2 4 8
>> * OVS_KEY_ATTR_ND 28 -- 4 32
>> * -------------------------------------------------
>> - * total 156
>> + * total 184
>> *
>> * We include some slack space in case the calculation isn't quite right or we
>> * add another field and forget to adjust this value.
>> */
>> -#define ODPUTIL_FLOW_KEY_BYTES 200
>> +#define ODPUTIL_FLOW_KEY_BYTES 204
>
> Since this is just an arbitrary amount of padding we can probably keep
> it at 200 for the time being as a nice round number.
Fixed.
More information about the dev
mailing list