[ovs-dev] [PATCH 1/2] Add support for tun_key to OVS datapath

Kyle Mestery (kmestery) kmestery at cisco.com
Wed Oct 3 19:01:04 UTC 2012


On Oct 3, 2012, at 11:38 AM, Jesse Gross wrote:
> On Mon, Oct 1, 2012 at 8:52 AM, Kyle Mestery <kmestery at cisco.com> wrote:
>> This is a first pass at providing a tun_key which can be
>> used as the basis for flow-based tunnelling. The
>> tun_key includes and replaces the tun_id in both struct
>> ovs_skb_cb and struct sw_tun_key.
>> 
>> This patch allows all existing tun_id behaviour to still work. Existing
>> users of tun_id are redirected to tun_key->tun_id to retain compatibility.
>> However, when the userspace code is updated to make use of the new tun_key,
>> the old behaviour will be deprecated and removed.
>> 
>> NOTE: With these changes, the tunneling code no longer assumes input and
>> output keys are symmetric.  If they are not, PMTUD needs to be disabled
>> for tunneling to work.
>> 
>> Signed-off-by: Kyle Mestery <kmestery at cisco.com>
>> Cc: Simon Horman <horms at verge.net.au>
>> Cc: Jesse Gross <jesse at nicira.com>
> 
> I see some new compiler warnings with these changes applied:
> 
>  CC [M]  /home/jgross/openvswitch/datapath/linux/flow.o
> /home/jgross/openvswitch/datapath/linux/flow.c: In function
> ‘ovs_flow_from_nlattrs’:
> /home/jgross/openvswitch/datapath/linux/flow.c:1044:6: warning:
> suggest parentheses around operand of ‘!’ or change ‘&’ to ‘&&’ or ‘!’
> to ‘~’ [-Wparentheses]
> /home/jgross/openvswitch/datapath/linux/flow.c:1046:6: warning:
> suggest parentheses around operand of ‘!’ or change ‘&’ to ‘&&’ or ‘!’
> to ‘~’ [-Wparentheses]
> 
Thanks, I'll clean these up.

> Also, when I was working on the userspace portions I defined three
> flags to be used instead of the current tunnel port configuration:
> 
> Flow lib/flow.h:
> #define FLOW_TNL_F_DONT_FRAGMENT (1 << 0)
> #define FLOW_TNL_F_CSUM (1 << 1)
> #define FLOW_TNL_F_KEY (1 << 2)
> 
> Can you add these to the kernel header and allow them to control
> encapsulation/decapsulation?  This was one of the main blockers that I
> ran into when doing the userspace work.
> 
Sure, I can add this.

> Finally, the remaining piece in order for this to be useful in the new
> model is to allow packets to be send and received on tunnel ports
> without a dest IP.  Otherwise, it's not possible to create the tunnel
> ports that can handle traffic on a flow basis.  That could possibly be
> a follow up patch though.
> 
I'll look into this, maybe make it a subsequent patch.

>> diff --git a/datapath/actions.c b/datapath/actions.c
>> index ec9b595..a70cde8 100644
>> --- a/datapath/actions.c
>> +++ b/datapath/actions.c
>> @@ -377,6 +392,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
>>        int prev_port = -1;
>>        const struct nlattr *a;
>>        int rem;
>> +       struct ovs_key_ipv4_tunnel tun_key;
> 
> Unfortunately, I think this needs to be on the stack in the top level
> ovs_execute_actions().  Imagine this set of actions:
> sample(0.5, set(tun_id=10)), output: tunnel
> 
> When we assign tun_key the stack frame will look like this:
> #1 ovs_execute_actions()
> #2 do_execute_actions()
> #3 sample()
> #4 do_execute_actions()
> #5 execute_set_action()
> 
> tun_key will be from stack frame #4.
> 
> However, when we actually use it the stack will look like:
> #1 ovs_execute_actions()
> #2 do_execute_actions()
> #3 do_output()
> 
OK, thanks for the explanation here.

>> diff --git a/datapath/flow.c b/datapath/flow.c
>> index d07337c..5be492e 100644
>> --- a/datapath/flow.c
>> +++ b/datapath/flow.c
>> static int ipv4_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_len,
>> @@ -996,6 +998,7 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
>> {
>>        const struct nlattr *a[OVS_KEY_ATTR_MAX + 1];
>>        const struct ovs_key_ethernet *eth_key;
>> +       __be64 tun_id = 0;
> 
> I think this is only used in the block where we parse
> TUN_ID/IPV4_TUNNEL, so we can just declare it there.
> 
Sure, will move it.

>> @@ -1022,10 +1025,26 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
>>                swkey->phy.in_port = DP_MAX_PORTS;
>>        }
>> 
>> -       if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID)) {
>> -               swkey->phy.tun_id = nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]);
>> +       /* OVS_KEY_ATTR_TUN_ID and OVS_KEY_ATTR_IPV4_TUNEL must both arrive
>> +        * together.
>> +        */
>> +       if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID) &&
>> +           attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL)) {
> 
> I know I said that we should combine these checks together but I
> realized that there is a problem.  In order to maintain compatibility
> in the future, we want to enforce sanity all that we want to make now.
> It's fine to make this that were previously rejected OK but we can't
> do the reverse.  Specifically, we should enforce that whenever we have
> an OVS_KEY_ATTR_IPV4_TUNNEL, it should be valid (i.e. have a
> ipv4_dst).  In the case of flow misses, this isn't a problem because
> all of our tunnels will produce a valid IPV4_TUNNEL if they have a
> TUN_ID, so we could mandate both.
> 
> However, in the case of flow metadata, like a packet out from a
> controller, we won't have IPV4_TUNNEL information until we actually
> have the full userspace infrastructure.  It's possible these packets
> end up going back to userspace again and when we serialize the flow,
> we'll break the rule that both IPV4_TUNNEL and TUN_ID come together.
> 
> This is a long way of saying that we do have to allow these to be
> separate but if both come together then we should check that the
> tun_id matches and should enforce that ipv4_dst != 0 everywhere.
> 
OK, I think I may have had it this way at one point, I'll go ahead and reformulate
this check as above.

>> +               swkey->tun.tun_key = *tun_key;
>>                attrs &= ~(1ULL << OVS_KEY_ATTR_TUN_ID);
>> -       }
>> +               attrs &= ~(1ULL << OVS_KEY_ATTR_IPV4_TUNNEL);
>> +       } else if ((attrs & (1ULL << OVS_KEY_ATTR_TUN_ID) &&
>> +                  !attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL)) ||
>> +                  (attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL) &&
>> +                  !attrs & (1ULL << OVS_KEY_ATTR_TUN_ID)))
>> +               return -EINVAL;
> 
> I don't believe that this else if block is necessary.  If we don't get
> the correct tunnel information then we won't zero out the appropriate
> attributes and we'll return an error at the end for having unparsed
> information.
> 
OK, will remove this.

>> @@ -1185,7 +1206,19 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
>>                                break;
>> 
>>                        case OVS_KEY_ATTR_TUN_ID:
>> -                               *tun_id = nla_get_be64(nla);
>> +                               tun_id = nla_get_be64(nla);
>> +                               if (tun_key->tun_id != 0 &&
>> +                                   tun_key->tun_id != tun_id)
>> +                                       return -EINVAL;
>> +                               break;
>> +
>> +                       case OVS_KEY_ATTR_IPV4_TUNNEL:
>> +                               if (tun_key->tun_id != 0)
>> +                                       tun_id = tun_key->tun_id;
>> +                               memcpy(tun_key, nla_data(nla),
>> +                                      sizeof(*tun_key));
>> +                               if (tun_id != 0 && tun_id != tun_key->tun_id)
>> +                                       return -EINVAL;
>>                                break;
>> 
>>                        case OVS_KEY_ATTR_IN_PORT:
> 
> Similar to my comments above, we should check that ipv4_dst != 0.  I
> think we should also use a flag to distinguish the case of receiving a
> tun_id from the value 0.
> 
Yes, I noticed in a few spots a check for tun_id == 0, and I believe it was
causing issues while I was unit testing this. I'll make the change.

>> diff --git a/datapath/flow.h b/datapath/flow.h
>> index 02c563a..5d4fcde 100644
>> --- a/datapath/flow.h
>> +++ b/datapath/flow.h
>> @@ -42,7 +42,6 @@ struct sw_flow_actions {
>> 
>> struct sw_flow_key {
>>        struct {
>> -               __be64  tun_id;         /* Encapsulating tunnel ID. */
>>                u32     priority;       /* Packet QoS priority. */
>>                u16     in_port;        /* Input switch port (or DP_MAX_PORTS). */
>>        } phy;
>> @@ -92,6 +91,9 @@ struct sw_flow_key {
>>                        } nd;
>>                } ipv6;
>>        };
>> +        struct {
>> +               struct ovs_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */
>> +       } tun;
>> };
> 
> For this patch we should keep struct tun at the beginning of the flow.
> Otherwise, the variable length flow matching won't take it into
> account and it won't get matched at all.  In the next patch we can
> optimize it.
> 
OK, got it, will migrate this.

>> diff --git a/datapath/tunnel.c b/datapath/tunnel.c
>> index d651c11..b62c469 100644
>> --- a/datapath/tunnel.c
>> +++ b/datapath/tunnel.c
>> @@ -1217,13 +1212,32 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
>>        else
>>                inner_tos = 0;
>> 
>> +       /* If OVS_CB(skb)->tun_key is NULL, point it at the local tun_key here,
>> +        * and zero it out.
>> +        */
>> +       if (OVS_CB(skb)->tun_key == NULL) {
>> +               memset(&tun_key, 0, sizeof(tun_key));
>> +               OVS_CB(skb)->tun_key = &tun_key;
>> +       }
> 
> Can you stick this above the inner_tos block?  Currently, it's right
> in the middle of the ToS code.  I'd be inclined to put the address
> computation before ToS as because it feels more natural but that's
> less significant.
> 
Will change this.

>>        if (mutable->flags & TNL_F_TOS_INHERIT)
>>                tos = inner_tos;
>> +       else if (OVS_CB(skb)->tun_key && OVS_CB(skb)->tun_key->ipv4_dst != 0)
>> +               tos = OVS_CB(skb)->tun_key->ipv4_tos;
>>        else
>>                tos = mutable->tos;
> 
> We can drop all the checks for !OVS_CB(skb)->tun_key at this point
> since we just initialized it.
> 
OK.

> However, if we do have OVS_CB(skb)->tun_key->ipv4_dst != 0, then it
> should completely override all other ToS processing, including both
> inheritance and ECN encapsulation.
> 
Got it.

>> @@ -1260,7 +1274,10 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
>>        }
>> 
>>        /* TTL */
>> -       ttl = mutable->ttl;
>> +       if (OVS_CB(skb)->tun_key && OVS_CB(skb)->tun_key->ipv4_dst != 0)
>> +               ttl = OVS_CB(skb)->tun_key->ipv4_ttl;
>> +       else
>> +               ttl = mutable->ttl;
>>        if (!ttl)
>>                ttl = ip4_dst_hoplimit(&rt_dst(rt));
> 
> I think the check for !ttl belongs in the else block.  I can't think
> of a reason why userspace would use a TTL of zero but I'd prefer to
> keep these types of checks out of OVS once we move completely over to
> the new model.
> 
Will make this change.

>> diff --git a/datapath/tunnel.h b/datapath/tunnel.h
>> index 1924017..9ff6e61 100644
>> --- a/datapath/tunnel.h
>> +++ b/datapath/tunnel.h
>> +static inline void tnl_tun_key_init(struct ovs_key_ipv4_tunnel *tun_key,
>> +                                   const struct iphdr *iph, __be64 tun_id)
>> +{
>> +       tun_key->tun_id = tun_id;
>> +       tun_key->ipv4_src = iph->saddr;
>> +       tun_key->ipv4_dst = iph->daddr;
>> +       tun_key->ipv4_tos = iph->tos;
>> +       tun_key->ipv4_ttl = iph->ttl;
> 
> I noticed that this leaves flags uninitialized and is currently
> reporting garbage to userspace on receive.  However, when we add the
> new flags to the kernel interface we'll actually need to populate it
> anyways since CSUM and KEY are relevant on receive.


OK, will update this as well.




More information about the dev mailing list