[ovs-dev] [PATCH 01/21] datapath: tunnelling: Replace tun_id with tun_key

Simon Horman horms at verge.net.au
Mon Jun 4 22:34:13 UTC 2012


On Sun, Jun 03, 2012 at 06:15:04PM +0900, Jesse Gross wrote:
> On Thu, May 24, 2012 at 6:08 PM, Simon Horman <horms at verge.net.au> wrote:
> > this is a first pass at providing a tun_key which can be used
> > as the basis for flow-based tunnelling. The tun_key includes and
> > replaces the tun_id in both struct ovs_skb_cb and struct sw_tun_key.
> >
> > In ovs_skb_cb tun_key is a pointer as it is envisaged that it will grow
> > when support for IPv6 to an extent that inlining the structure will result
> > in ovs_skb_cb being larger than the 48 bytes available in skb->cb.
> >
> > As OVS does not support IPv6 as the outer transport protocol for tunnels
> > the IPv6 portions of this change, which appeared in the previous revision,
> > have been dropped in order to limit the scope and size of this patch.
> >
> > This patch does not make any effort to retain the existing tun_id behaviour
> > nor does it fully implement flow-based tunnels. As such it it is incomplete
> > and can't be used in its current form (other than to break OVS tunnelling).
> >
> > ** Please do not apply **
> >
> > Cc: Kyle Mestery <kmestery at cisco.com>
> > Signed-off-by: Simon Horman <horms at verge.net.au>
> 
> Thanks and sorry again about being so slow to look at this.
> 
> Overall, this looks pretty good to me.  The main difficulty that I had
> was in figuring out what should go with the old behavior and what
> should go with the new since it's at an intermediate point between the
> two but I understand that it's difficult to break it up in a way that
> both encapsulates a particular set of functionality and isn't too
> large.  Otherwise, I noticed a few specific things that I noted below.
> 
> > diff --git a/datapath/flow.c b/datapath/flow.c
> > index d07337c..49c0dd8 100644
> > --- a/datapath/flow.c
> > +++ b/datapath/flow.c
> > @@ -1162,14 +1166,15 @@ int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
> >  * get the metadata, that is, the parts of the flow key that cannot be
> >  * extracted from the packet itself.
> >  */
> > -int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
> > +int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
> > +                                  struct ovs_key_ipv4_tunnel *tun_key,
> >                                   const struct nlattr *attr)
> >  {
> >        const struct nlattr *nla;
> >        int rem;
> >
> >        *in_port = DP_MAX_PORTS;
> > -       *tun_id = 0;
> > +       tun_key->tun_id = 0;
> 
> I think we probably want to memset the entire tun_key to zero to avoid
> having potentially uninitialized data in the flow.

Sure, that is fine by me.

> > @@ -1204,15 +1210,21 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
> >  int ovs_flow_to_nlattrs(const struct sw_flow_key *swkey, struct sk_buff *skb)
> >  {
> >        struct ovs_key_ethernet *eth_key;
> > +       struct ovs_key_ipv4_tunnel *tun_key;
> >        struct nlattr *nla, *encap;
> >
> >        if (swkey->phy.priority &&
> >            nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, swkey->phy.priority))
> >                goto nla_put_failure;
> >
> > -       if (swkey->phy.tun_id != cpu_to_be64(0) &&
> > -           nla_put_be64(skb, OVS_KEY_ATTR_TUN_ID, swkey->phy.tun_id))
> > -               goto nla_put_failure;
> > +       if (swkey->phy.tun_key.ipv4_dst) {
> 
> It's probably OK to use DIP equal to zero as a not present marker but
> we need to enforce that it's always true - for example we shouldn't
> allow somebody to setup a flow that way or receive packets with a zero
> address.  Alternately, we may be able to find a spare bit to indicate
> this, like is done with vlans.

When I originally wrote this there didn't seem to be any obvious
place in ovs_key_ipv4_tunnel to have an active/inactive bit, which
is in part why the code relies on checking DIP.

However, more recent versions of ovs_key_ipv4_tunnel have a flags field of
which only one bit is currently used. We could use one of the unused flag
bits.

> In any case, I think we need to do some additional validation when
> setting up flows to check reserved space, for example, as otherwise
> that will never match.

Sure. My testing seems to indicate that matching does occur,
though I am quite happy to tighten things up.

> 
> > diff --git a/datapath/flow.h b/datapath/flow.h
> > index 5be481e..bab5363 100644
> > --- a/datapath/flow.h
> > +++ b/datapath/flow.h
> > @@ -42,7 +42,7 @@ struct sw_flow_actions {
> >
> >  struct sw_flow_key {
> >        struct {
> > -               __be64  tun_id;         /* Encapsulating tunnel ID. */
> > +               struct ovs_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */
> 
> This is an optimization but as we get closer I'd like to put the
> tun_key at the end of struct sw_flow_key so that packets that didn't
> come from a tunnel don't have to pay the cost during the lookup (this
> is especially true as we add support for IPv6 tunnels).

Sure.

> In a similar vein, struct ovs_key_ipv4_tunnel contains some fields
> that I think can never apply for lookup such as the flags so it would
> be nice if we could remove that for lookup.

I think they need to be there to be passed around, so I'm not
sure if they can easily be removed from ovs_key_ipv4_tunnel if that
is what you are asking.

> > @@ -150,6 +150,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies);
> >  *                         ------  ---  ------  -----
> >  *  OVS_KEY_ATTR_PRIORITY      4    --     4      8
> >  *  OVS_KEY_ATTR_TUN_ID        8    --     4     12
> > + *  OVS_KEY_ATTR_IPV4_TUNNEL  18     2     4     24
> 
> If my math is correct, I think the size of the base struct
> ova_key_ipv4_tunnel is 24 bytes.

Sorry, the size changed a few times and I seem to have forgotten to
update the above table accordingly.

> > +static inline void tun_key_swap_addr(struct ovs_key_ipv4_tunnel *tun_key)
> > +{
> > +       __be32 ndst = tun_key->ipv4_src;
> > +       tun_key->ipv4_src = tun_key->ipv4_dst;
> > +       tun_key->ipv4_dst = ndst;
> > +}
> 
> I'm not quite sure when we would need to swap the addresses in a
> tunnel and I didn't see any uses of this function.

This should no longer be needed - and was always broken - I'll remove it.

> > +static inline void tun_key_init(struct ovs_key_ipv4_tunnel *tun_key,
> > +                               const struct iphdr *iph, __be64 tun_id)
> > +{
> > +       tun_key->tun_id = tun_id;
> > +       tun_key->ipv4_src = iph->saddr;
> > +       tun_key->ipv4_dst = iph->daddr;
> > +       tun_key->ipv4_tos = iph->tos;
> > +       tun_key->ipv4_ttl = iph->ttl;
> > +}
> 
> Aren't there some fields that we need to zero out to avoid problems in
> the lookup?

Thanks, I will check.

> > diff --git a/datapath/tunnel.c b/datapath/tunnel.c
> > index d651c11..010e513 100644
> > --- a/datapath/tunnel.c
> > +++ b/datapath/tunnel.c
> > @@ -367,9 +367,9 @@ struct vport *ovs_tnl_find_port(struct net *net, __be32 saddr, __be32 daddr,
> >        return NULL;
> >  }
> >
> > -static void ecn_decapsulate(struct sk_buff *skb, u8 tos)
> > +static void ecn_decapsulate(struct sk_buff *skb)
> >  {
> > -       if (unlikely(INET_ECN_is_ce(tos))) {
> > +       if (unlikely(INET_ECN_is_ce(OVS_CB(skb)->tun_key->ipv4_tos))) {
> 
> This might come in a later patch, although I didn't see it in a quick
> scan, but it should be possible to implement all the ECN encapsulation
> and decapsulation in userspace, just like we can do with the rest of
> the ToS and TTL.

I hadn't considered that, I will add it to my TODO list.

> >  bool ovs_tnl_frag_needed(struct vport *vport,
> >                         const struct tnl_mutable_config *mutable,
> > -                        struct sk_buff *skb, unsigned int mtu, __be64 flow_key)
> > +                        struct sk_buff *skb, unsigned int mtu,
> > +                        struct ovs_key_ipv4_tunnel *tun_key)
> >  {
> >        unsigned int eth_hdr_len = ETH_HLEN;
> >        unsigned int total_length = 0, header_length = 0, payload_length;
> >        struct ethhdr *eh, *old_eh = eth_hdr(skb);
> >        struct sk_buff *nskb;
> > +       struct ovs_key_ipv4_tunnel ntun_key;
> >
> >        /* Sanity check */
> >        if (skb->protocol == htons(ETH_P_IP)) {
> > @@ -705,8 +707,10 @@ bool ovs_tnl_frag_needed(struct vport *vport,
> >         * any way of synthesizing packets.
> >         */
> >        if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) ==
> > -           (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION))
> > -               OVS_CB(nskb)->tun_id = flow_key;
> > +           (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) {
> > +               ntun_key = *tun_key;
> > +               OVS_CB(nskb)->tun_key = &ntun_key;
> > +       }
> 
> I guess this is probably where you were going to use the function to
> reverse IP addresses.  The logic doesn't really work but it's moot
> since this is going away anyways.

My latest series includes a clean up to ovs_tnl_frag_needed() to allow
it to work in some circumstances - i.e. those found in my test environment.
That series removes knowledge of tun_key from ovs_tnl_frag_needed().

I am however happy to remove ovs_tnl_frag_needed() completely if you think
that is appropriate.

> > @@ -799,10 +803,8 @@ static void create_tunnel_header(const struct vport *vport,
> >        iph->ihl        = sizeof(struct iphdr) >> 2;
> >        iph->frag_off   = htons(IP_DF);
> >        iph->protocol   = tnl_vport->tnl_ops->ipproto;
> > -       iph->tos        = mutable->tos;
> >        iph->daddr      = rt->rt_dst;
> >        iph->saddr      = rt->rt_src;
> > -       iph->ttl        = mutable->ttl;
> >        if (!iph->ttl)
> >                iph->ttl = ip4_dst_hoplimit(&rt_dst(rt));
> 
> I'm not sure that these changes quite belong in this patch (not that
> it shouldn't be done but it seems like the supporting code isn't there
> yet).

Sorry, I did some merging of my patches and this seems to have been
a case where I merged incorrectly. I think that change belongs in:

[PATCH 18/21] dataptah: remove ttl and tos from tnl_mutable_config

(I need to fix the typo in the title of that patch!)

> > diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c
> > index ab89c5b..fd2b038 100644
> > --- a/datapath/vport-gre.c
> > +++ b/datapath/vport-gre.c
> > @@ -101,10 +101,6 @@ static struct sk_buff *gre_update_header(const struct vport *vport,
> >        __be32 *options = (__be32 *)(skb_network_header(skb) + mutable->tunnel_hlen
> >                                               - GRE_HEADER_SECTION);
> >
> > -       /* Work backwards over the options so the checksum is last. */
> > -       if (mutable->flags & TNL_F_OUT_KEY_ACTION)
> > -               *options = be64_get_low32(OVS_CB(skb)->tun_id);
> 
> Why does this go away?

I agree that looks wrong and my only explanation is that it is remnants
of some hack.

I have done some, hopefully more sensible, re-working of gre_update_header in:

[PATCH 20/21] datapath: Use tun_key flags for id and csum settings on transmit

> > diff --git a/datapath/vport.c b/datapath/vport.c
> > index 172261a..0c77a1b 100644
> > --- a/datapath/vport.c
> > +++ b/datapath/vport.c
> > @@ -462,7 +462,7 @@ void ovs_vport_receive(struct vport *vport, struct sk_buff *skb)
> >                OVS_CB(skb)->flow = NULL;
> >
> >        if (!(vport->ops->flags & VPORT_F_TUN_ID))
> > -               OVS_CB(skb)->tun_id = 0;
> > +               OVS_CB(skb)->tun_key = NULL;
> 
> We probably should rename this flag now.

Yes, I think there are several flags that may now be removed.
I'll add that to my TODO list.

> > diff --git a/lib/odp-util.h b/lib/odp-util.h
> > index d53f083..4e5a8a1 100644
> > --- a/lib/odp-util.h
> > +++ b/lib/odp-util.h
> > @@ -72,6 +72,7 @@ int odp_actions_from_string(const char *, const struct simap *port_names,
> >  *                         ------  ---  ------  -----
> >  *  OVS_KEY_ATTR_PRIORITY      4    --     4      8
> >  *  OVS_KEY_ATTR_TUN_ID        8    --     4     12
> > + *  OVS_KEY_ATTR_IPV4_TUNNEL  18     2     4     24
> 
> Same thing about the size here as well.

Thanks



More information about the dev mailing list