[ovs-dev] [PATCH 1/2] Add support for tun_key to OVS datapath
Kyle Mestery (kmestery)
kmestery at cisco.com
Wed Oct 3 19:01:04 UTC 2012
On Oct 3, 2012, at 11:38 AM, Jesse Gross wrote:
> On Mon, Oct 1, 2012 at 8:52 AM, Kyle Mestery <kmestery at cisco.com> wrote:
>> This is a first pass at providing a tun_key which can be
>> used as the basis for flow-based tunnelling. The
>> tun_key includes and replaces the tun_id in both struct
>> ovs_skb_cb and struct sw_tun_key.
>>
>> This patch allows all existing tun_id behaviour to still work. Existing
>> users of tun_id are redirected to tun_key->tun_id to retain compatibility.
>> However, when the userspace code is updated to make use of the new tun_key,
>> the old behaviour will be deprecated and removed.
>>
>> NOTE: With these changes, the tunneling code no longer assumes input and
>> output keys are symmetric. If they are not, PMTUD needs to be disabled
>> for tunneling to work.
>>
>> Signed-off-by: Kyle Mestery <kmestery at cisco.com>
>> Cc: Simon Horman <horms at verge.net.au>
>> Cc: Jesse Gross <jesse at nicira.com>
>
> I see some new compiler warnings with these changes applied:
>
> CC [M] /home/jgross/openvswitch/datapath/linux/flow.o
> /home/jgross/openvswitch/datapath/linux/flow.c: In function
> ‘ovs_flow_from_nlattrs’:
> /home/jgross/openvswitch/datapath/linux/flow.c:1044:6: warning:
> suggest parentheses around operand of ‘!’ or change ‘&’ to ‘&&’ or ‘!’
> to ‘~’ [-Wparentheses]
> /home/jgross/openvswitch/datapath/linux/flow.c:1046:6: warning:
> suggest parentheses around operand of ‘!’ or change ‘&’ to ‘&&’ or ‘!’
> to ‘~’ [-Wparentheses]
>
Thanks, I'll clean these up.
> Also, when I was working on the userspace portions I defined three
> flags to be used instead of the current tunnel port configuration:
>
> Flow lib/flow.h:
> #define FLOW_TNL_F_DONT_FRAGMENT (1 << 0)
> #define FLOW_TNL_F_CSUM (1 << 1)
> #define FLOW_TNL_F_KEY (1 << 2)
>
> Can you add these to the kernel header and allow them to control
> encapsulation/decapsulation? This was one of the main blockers that I
> ran into when doing the userspace work.
>
Sure, I can add this.
> Finally, the remaining piece in order for this to be useful in the new
> model is to allow packets to be send and received on tunnel ports
> without a dest IP. Otherwise, it's not possible to create the tunnel
> ports that can handle traffic on a flow basis. That could possibly be
> a follow up patch though.
>
I'll look into this, maybe make it a subsequent patch.
>> diff --git a/datapath/actions.c b/datapath/actions.c
>> index ec9b595..a70cde8 100644
>> --- a/datapath/actions.c
>> +++ b/datapath/actions.c
>> @@ -377,6 +392,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
>> int prev_port = -1;
>> const struct nlattr *a;
>> int rem;
>> + struct ovs_key_ipv4_tunnel tun_key;
>
> Unfortunately, I think this needs to be on the stack in the top level
> ovs_execute_actions(). Imagine this set of actions:
> sample(0.5, set(tun_id=10)), output: tunnel
>
> When we assign tun_key the stack frame will look like this:
> #1 ovs_execute_actions()
> #2 do_execute_actions()
> #3 sample()
> #4 do_execute_actions()
> #5 execute_set_action()
>
> tun_key will be from stack frame #4.
>
> However, when we actually use it the stack will look like:
> #1 ovs_execute_actions()
> #2 do_execute_actions()
> #3 do_output()
>
OK, thanks for the explanation here.
>> diff --git a/datapath/flow.c b/datapath/flow.c
>> index d07337c..5be492e 100644
>> --- a/datapath/flow.c
>> +++ b/datapath/flow.c
>> static int ipv4_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_len,
>> @@ -996,6 +998,7 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
>> {
>> const struct nlattr *a[OVS_KEY_ATTR_MAX + 1];
>> const struct ovs_key_ethernet *eth_key;
>> + __be64 tun_id = 0;
>
> I think this is only used in the block where we parse
> TUN_ID/IPV4_TUNNEL, so we can just declare it there.
>
Sure, will move it.
>> @@ -1022,10 +1025,26 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
>> swkey->phy.in_port = DP_MAX_PORTS;
>> }
>>
>> - if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID)) {
>> - swkey->phy.tun_id = nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]);
>> + /* OVS_KEY_ATTR_TUN_ID and OVS_KEY_ATTR_IPV4_TUNEL must both arrive
>> + * together.
>> + */
>> + if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID) &&
>> + attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL)) {
>
> I know I said that we should combine these checks together but I
> realized that there is a problem. In order to maintain compatibility
> in the future, we want to enforce sanity all that we want to make now.
> It's fine to make this that were previously rejected OK but we can't
> do the reverse. Specifically, we should enforce that whenever we have
> an OVS_KEY_ATTR_IPV4_TUNNEL, it should be valid (i.e. have a
> ipv4_dst). In the case of flow misses, this isn't a problem because
> all of our tunnels will produce a valid IPV4_TUNNEL if they have a
> TUN_ID, so we could mandate both.
>
> However, in the case of flow metadata, like a packet out from a
> controller, we won't have IPV4_TUNNEL information until we actually
> have the full userspace infrastructure. It's possible these packets
> end up going back to userspace again and when we serialize the flow,
> we'll break the rule that both IPV4_TUNNEL and TUN_ID come together.
>
> This is a long way of saying that we do have to allow these to be
> separate but if both come together then we should check that the
> tun_id matches and should enforce that ipv4_dst != 0 everywhere.
>
OK, I think I may have had it this way at one point, I'll go ahead and reformulate
this check as above.
>> + swkey->tun.tun_key = *tun_key;
>> attrs &= ~(1ULL << OVS_KEY_ATTR_TUN_ID);
>> - }
>> + attrs &= ~(1ULL << OVS_KEY_ATTR_IPV4_TUNNEL);
>> + } else if ((attrs & (1ULL << OVS_KEY_ATTR_TUN_ID) &&
>> + !attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL)) ||
>> + (attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL) &&
>> + !attrs & (1ULL << OVS_KEY_ATTR_TUN_ID)))
>> + return -EINVAL;
>
> I don't believe that this else if block is necessary. If we don't get
> the correct tunnel information then we won't zero out the appropriate
> attributes and we'll return an error at the end for having unparsed
> information.
>
OK, will remove this.
>> @@ -1185,7 +1206,19 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
>> break;
>>
>> case OVS_KEY_ATTR_TUN_ID:
>> - *tun_id = nla_get_be64(nla);
>> + tun_id = nla_get_be64(nla);
>> + if (tun_key->tun_id != 0 &&
>> + tun_key->tun_id != tun_id)
>> + return -EINVAL;
>> + break;
>> +
>> + case OVS_KEY_ATTR_IPV4_TUNNEL:
>> + if (tun_key->tun_id != 0)
>> + tun_id = tun_key->tun_id;
>> + memcpy(tun_key, nla_data(nla),
>> + sizeof(*tun_key));
>> + if (tun_id != 0 && tun_id != tun_key->tun_id)
>> + return -EINVAL;
>> break;
>>
>> case OVS_KEY_ATTR_IN_PORT:
>
> Similar to my comments above, we should check that ipv4_dst != 0. I
> think we should also use a flag to distinguish the case of receiving a
> tun_id from the value 0.
>
Yes, I noticed in a few spots a check for tun_id == 0, and I believe it was
causing issues while I was unit testing this. I'll make the change.
>> diff --git a/datapath/flow.h b/datapath/flow.h
>> index 02c563a..5d4fcde 100644
>> --- a/datapath/flow.h
>> +++ b/datapath/flow.h
>> @@ -42,7 +42,6 @@ struct sw_flow_actions {
>>
>> struct sw_flow_key {
>> struct {
>> - __be64 tun_id; /* Encapsulating tunnel ID. */
>> u32 priority; /* Packet QoS priority. */
>> u16 in_port; /* Input switch port (or DP_MAX_PORTS). */
>> } phy;
>> @@ -92,6 +91,9 @@ struct sw_flow_key {
>> } nd;
>> } ipv6;
>> };
>> + struct {
>> + struct ovs_key_ipv4_tunnel tun_key; /* Encapsulating tunnel key. */
>> + } tun;
>> };
>
> For this patch we should keep struct tun at the beginning of the flow.
> Otherwise, the variable length flow matching won't take it into
> account and it won't get matched at all. In the next patch we can
> optimize it.
>
OK, got it, will migrate this.
>> diff --git a/datapath/tunnel.c b/datapath/tunnel.c
>> index d651c11..b62c469 100644
>> --- a/datapath/tunnel.c
>> +++ b/datapath/tunnel.c
>> @@ -1217,13 +1212,32 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
>> else
>> inner_tos = 0;
>>
>> + /* If OVS_CB(skb)->tun_key is NULL, point it at the local tun_key here,
>> + * and zero it out.
>> + */
>> + if (OVS_CB(skb)->tun_key == NULL) {
>> + memset(&tun_key, 0, sizeof(tun_key));
>> + OVS_CB(skb)->tun_key = &tun_key;
>> + }
>
> Can you stick this above the inner_tos block? Currently, it's right
> in the middle of the ToS code. I'd be inclined to put the address
> computation before ToS as because it feels more natural but that's
> less significant.
>
Will change this.
>> if (mutable->flags & TNL_F_TOS_INHERIT)
>> tos = inner_tos;
>> + else if (OVS_CB(skb)->tun_key && OVS_CB(skb)->tun_key->ipv4_dst != 0)
>> + tos = OVS_CB(skb)->tun_key->ipv4_tos;
>> else
>> tos = mutable->tos;
>
> We can drop all the checks for !OVS_CB(skb)->tun_key at this point
> since we just initialized it.
>
OK.
> However, if we do have OVS_CB(skb)->tun_key->ipv4_dst != 0, then it
> should completely override all other ToS processing, including both
> inheritance and ECN encapsulation.
>
Got it.
>> @@ -1260,7 +1274,10 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
>> }
>>
>> /* TTL */
>> - ttl = mutable->ttl;
>> + if (OVS_CB(skb)->tun_key && OVS_CB(skb)->tun_key->ipv4_dst != 0)
>> + ttl = OVS_CB(skb)->tun_key->ipv4_ttl;
>> + else
>> + ttl = mutable->ttl;
>> if (!ttl)
>> ttl = ip4_dst_hoplimit(&rt_dst(rt));
>
> I think the check for !ttl belongs in the else block. I can't think
> of a reason why userspace would use a TTL of zero but I'd prefer to
> keep these types of checks out of OVS once we move completely over to
> the new model.
>
Will make this change.
>> diff --git a/datapath/tunnel.h b/datapath/tunnel.h
>> index 1924017..9ff6e61 100644
>> --- a/datapath/tunnel.h
>> +++ b/datapath/tunnel.h
>> +static inline void tnl_tun_key_init(struct ovs_key_ipv4_tunnel *tun_key,
>> + const struct iphdr *iph, __be64 tun_id)
>> +{
>> + tun_key->tun_id = tun_id;
>> + tun_key->ipv4_src = iph->saddr;
>> + tun_key->ipv4_dst = iph->daddr;
>> + tun_key->ipv4_tos = iph->tos;
>> + tun_key->ipv4_ttl = iph->ttl;
>
> I noticed that this leaves flags uninitialized and is currently
> reporting garbage to userspace on receive. However, when we add the
> new flags to the kernel interface we'll actually need to populate it
> anyways since CSUM and KEY are relevant on receive.
OK, will update this as well.
More information about the dev
mailing list