[ovs-dev] [PATCH 1/2] Add support for tun_key to OVS datapath

Kyle Mestery (kmestery) kmestery at cisco.com
Thu Oct 4 20:12:23 UTC 2012


On Oct 3, 2012, at 11:38 AM, Jesse Gross wrote:
> On Mon, Oct 1, 2012 at 8:52 AM, Kyle Mestery <kmestery at cisco.com> wrote:
>> This is a first pass at providing a tun_key which can be
>> used as the basis for flow-based tunnelling. The
>> tun_key includes and replaces the tun_id in both struct
>> ovs_skb_cb and struct sw_tun_key.
>> 
>> This patch allows all existing tun_id behaviour to still work. Existing
>> users of tun_id are redirected to tun_key->tun_id to retain compatibility.
>> However, when the userspace code is updated to make use of the new tun_key,
>> the old behaviour will be deprecated and removed.
>> 
>> NOTE: With these changes, the tunneling code no longer assumes input and
>> output keys are symmetric.  If they are not, PMTUD needs to be disabled
>> for tunneling to work.
>> 
>> Signed-off-by: Kyle Mestery <kmestery at cisco.com>
>> Cc: Simon Horman <horms at verge.net.au>
>> Cc: Jesse Gross <jesse at nicira.com>
> 
> I see some new compiler warnings with these changes applied:
> 
>  CC [M]  /home/jgross/openvswitch/datapath/linux/flow.o
> /home/jgross/openvswitch/datapath/linux/flow.c: In function
> ‘ovs_flow_from_nlattrs’:
> /home/jgross/openvswitch/datapath/linux/flow.c:1044:6: warning:
> suggest parentheses around operand of ‘!’ or change ‘&’ to ‘&&’ or ‘!’
> to ‘~’ [-Wparentheses]
> /home/jgross/openvswitch/datapath/linux/flow.c:1046:6: warning:
> suggest parentheses around operand of ‘!’ or change ‘&’ to ‘&&’ or ‘!’
> to ‘~’ [-Wparentheses]
> 
> Also, when I was working on the userspace portions I defined three
> flags to be used instead of the current tunnel port configuration:
> 
> Flow lib/flow.h:
> #define FLOW_TNL_F_DONT_FRAGMENT (1 << 0)
> #define FLOW_TNL_F_CSUM (1 << 1)
> #define FLOW_TNL_F_KEY (1 << 2)
> 
> Can you add these to the kernel header and allow them to control
> encapsulation/decapsulation?  This was one of the main blockers that I
> ran into when doing the userspace work.
> 
Jesse, just to clarify, for this you mean modify the pieces in tunnel.c where
it's using this configuration (similar flags) from the tunnel itself, and instead
make the code use the values from the tun_key, right? If we do this, won't
this be a tipping point which will break backwards compatibility? How should
we proceed here, just use the values from tun_key only, or do you want me
to provide compatibility with both until your changes come in?

Thanks,
Kyle

> Finally, the remaining piece in order for this to be useful in the new
> model is to allow packets to be send and received on tunnel ports
> without a dest IP.  Otherwise, it's not possible to create the tunnel
> ports that can handle traffic on a flow basis.  That could possibly be
> a follow up patch though.
> 
>> diff --git a/datapath/actions.c b/datapath/actions.c
>> index ec9b595..a70cde8 100644
>> --- a/datapath/actions.c
>> +++ b/datapath/actions.c
>> @@ -377,6 +392,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
>>        int prev_port = -1;
>>        const struct nlattr *a;
>>        int rem;
>> +       struct ovs_key_ipv4_tunnel tun_key;
> 
> Unfortunately, I think this needs to be on the stack in the top level
> ovs_execute_actions().  Imagine this set of actions:
> sample(0.5, set(tun_id=10)), output: tunnel
> 
> When we assign tun_key the stack frame will look like this:
> #1 ovs_execute_actions()
> #2 do_execute_actions()
> #3 sample()
> #4 do_execute_actions()
> #5 execute_set_action()
> 
> tun_key will be from stack frame #4.
> 
> However, when we actually use it the stack will look like:
> #1 ovs_execute_actions()
> #2 do_execute_actions()
> #3 do_output()
> 
>> diff --git a/datapath/flow.c b/datapath/flow.c
>> index d07337c..5be492e 100644
>> --- a/datapath/flow.c
>> +++ b/datapath/flow.c
>> static int ipv4_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_len,
>> @@ -996,6 +998,7 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
>> {
>>        const struct nlattr *a[OVS_KEY_ATTR_MAX + 1];
>>        const struct ovs_key_ethernet *eth_key;
>> +       __be64 tun_id = 0;
> 
> I think this is only used in the block where we parse
> TUN_ID/IPV4_TUNNEL, so we can just declare it there.
> 
>> @@ -1022,10 +1025,26 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
>>                swkey->phy.in_port = DP_MAX_PORTS;
>>        }
>> 
>> -       if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID)) {
>> -               swkey->phy.tun_id = nla_get_be64(a[OVS_KEY_ATTR_TUN_ID]);
>> +       /* OVS_KEY_ATTR_TUN_ID and OVS_KEY_ATTR_IPV4_TUNEL must both arrive
>> +        * together.
>> +        */
>> +       if (attrs & (1ULL << OVS_KEY_ATTR_TUN_ID) &&
>> +           attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL)) {
> 
> I know I said that we should combine these checks together but I
> realized that there is a problem.  In order to maintain compatibility
> in the future, we want to enforce sanity all that we want to make now.
> It's fine to make this that were previously rejected OK but we can't
> do the reverse.  Specifically, we should enforce that whenever we have
> an OVS_KEY_ATTR_IPV4_TUNNEL, it should be valid (i.e. have a
> ipv4_dst).  In the case of flow misses, this isn't a problem because
> all of our tunnels will produce a valid IPV4_TUNNEL if they have a
> TUN_ID, so we could mandate both.
> 
> However, in the case of flow metadata, like a packet out from a
> controller, we won't have IPV4_TUNNEL information until we actually
> have the full userspace infrastructure.  It's possible these packets
> end up going back to userspace again and when we serialize the flow,
> we'll break the rule that both IPV4_TUNNEL and TUN_ID come together.
> 
> This is a long way of saying that we do have to allow these to be
> separate but if both come together then we should check that the
> tun_id matches and should enforce that ipv4_dst != 0 everywhere.
> 
>> +               swkey->tun.tun_key = *tun_key;
>>                attrs &= ~(1ULL << OVS_KEY_ATTR_TUN_ID);
>> -       }
>> +               attrs &= ~(1ULL << OVS_KEY_ATTR_IPV4_TUNNEL);
>> +       } else if ((attrs & (1ULL << OVS_KEY_ATTR_TUN_ID) &&
>> +                  !attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL)) ||
>> +                  (attrs & (1ULL << OVS_KEY_ATTR_IPV4_TUNNEL) &&
>> +                  !attrs & (1ULL << OVS_KEY_ATTR_TUN_ID)))
>> +               return -EINVAL;
> 
> I don't believe that this else if block is necessary.  If we don't get
> the correct tunnel information then we won't zero out the appropriate
> attributes and we'll return an error at the end for having unparsed
> information.
> 
>> @@ -1185,7 +1206,19 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
>>                                break;
>> 
>>                        case OVS_KEY_ATTR_TUN_ID:
>> -                               *tun_id = nla_get_be64(nla);
>> +                               tun_id = nla_get_be64(nla);
>> +                               if (tun_key->tun_id != 0 &&
>> +                                   tun_key->tun_id != tun_id)
>> +                                       return -EINVAL;
>> +                               break;
>> +
>> +                       case OVS_KEY_ATTR_IPV4_TUNNEL:
>> +                               if (tun_key->tun_id != 0)
>> +                                       tun_id = tun_key->tun_id;
>> +                               memcpy(tun_key, nla_data(nla),
>> +                                      sizeof(*tun_key));
>> +                               if (tun_id != 0 && tun_id != tun_key->tun_id)
>> +                                       return -EINVAL;
>>                                break;
>> 
>>                        case OVS_KEY_ATTR_IN_PORT:
> 
> Similar to my comments above, we should check that ipv4_dst != 0.  I
> think we should also use a flag to distinguish the case of receiving a
> tun_id from the value 0.
> 
>> diff --git a/datapath/flow.h b/datapath/flow.h
>> index 02c563a..5d4fcde 100644
>> --- a/datapath/flow.h
>> +++ b/datapath/flow.h
>> @@ -42,7 +42,6 @@ struct sw_flow_actions {
>> 
>> struct sw_flow_key {
>>        struct {
>> -               __be64  tun_id;         /* Encapsulating tunnel ID. */
>>                u32     priority;       /* Packet QoS priority. */
>>                u16     in_port;        /* Input switch port (or DP_MAX_PORTS). */
>>        } phy;
>> @@ -92,6 +91,9 @@ struct sw_flow_key {
>>                        } nd;
>>                } ipv6;
>>        };
>> +        struct {
>> +               struct ovs_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */
>> +       } tun;
>> };
> 
> For this patch we should keep struct tun at the beginning of the flow.
> Otherwise, the variable length flow matching won't take it into
> account and it won't get matched at all.  In the next patch we can
> optimize it.
> 
>> diff --git a/datapath/tunnel.c b/datapath/tunnel.c
>> index d651c11..b62c469 100644
>> --- a/datapath/tunnel.c
>> +++ b/datapath/tunnel.c
>> @@ -1217,13 +1212,32 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
>>        else
>>                inner_tos = 0;
>> 
>> +       /* If OVS_CB(skb)->tun_key is NULL, point it at the local tun_key here,
>> +        * and zero it out.
>> +        */
>> +       if (OVS_CB(skb)->tun_key == NULL) {
>> +               memset(&tun_key, 0, sizeof(tun_key));
>> +               OVS_CB(skb)->tun_key = &tun_key;
>> +       }
> 
> Can you stick this above the inner_tos block?  Currently, it's right
> in the middle of the ToS code.  I'd be inclined to put the address
> computation before ToS as because it feels more natural but that's
> less significant.
> 
>>        if (mutable->flags & TNL_F_TOS_INHERIT)
>>                tos = inner_tos;
>> +       else if (OVS_CB(skb)->tun_key && OVS_CB(skb)->tun_key->ipv4_dst != 0)
>> +               tos = OVS_CB(skb)->tun_key->ipv4_tos;
>>        else
>>                tos = mutable->tos;
> 
> We can drop all the checks for !OVS_CB(skb)->tun_key at this point
> since we just initialized it.
> 
> However, if we do have OVS_CB(skb)->tun_key->ipv4_dst != 0, then it
> should completely override all other ToS processing, including both
> inheritance and ECN encapsulation.
> 
>> @@ -1260,7 +1274,10 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
>>        }
>> 
>>        /* TTL */
>> -       ttl = mutable->ttl;
>> +       if (OVS_CB(skb)->tun_key && OVS_CB(skb)->tun_key->ipv4_dst != 0)
>> +               ttl = OVS_CB(skb)->tun_key->ipv4_ttl;
>> +       else
>> +               ttl = mutable->ttl;
>>        if (!ttl)
>>                ttl = ip4_dst_hoplimit(&rt_dst(rt));
> 
> I think the check for !ttl belongs in the else block.  I can't think
> of a reason why userspace would use a TTL of zero but I'd prefer to
> keep these types of checks out of OVS once we move completely over to
> the new model.
> 
>> diff --git a/datapath/tunnel.h b/datapath/tunnel.h
>> index 1924017..9ff6e61 100644
>> --- a/datapath/tunnel.h
>> +++ b/datapath/tunnel.h
>> +static inline void tnl_tun_key_init(struct ovs_key_ipv4_tunnel *tun_key,
>> +                                   const struct iphdr *iph, __be64 tun_id)
>> +{
>> +       tun_key->tun_id = tun_id;
>> +       tun_key->ipv4_src = iph->saddr;
>> +       tun_key->ipv4_dst = iph->daddr;
>> +       tun_key->ipv4_tos = iph->tos;
>> +       tun_key->ipv4_ttl = iph->ttl;
> 
> I noticed that this leaves flags uninitialized and is currently
> reporting garbage to userspace on receive.  However, when we add the
> new flags to the kernel interface we'll actually need to populate it
> anyways since CSUM and KEY are relevant on receive.





More information about the dev mailing list