[ovs-dev] [PATCH] [RFC] datapath: tunnelling: Replace tun_id with tun_key

Jesse Gross jesse at nicira.com
Tue May 1 23:50:49 UTC 2012


On Mon, Apr 30, 2012 at 5:13 PM, Simon Horman <horms at verge.net.au> wrote:
> this is a first pass at providing a tun_key which can be used
> as the basis for flow-based tunnelling. The tun_key includes and
> replaces the tun_id in both struct ovs_skb_cb and struct sw_tun_key.

Thanks, this looks like a good start to me.

> A significant problem with the current code, which causes it to fail to
> compile, is that struct ovs_key_ipv6_tunnel is itself 48 bytes and that
> as a result ovs_skb_cb is larger than 48 bytes. This means that it will
> not fit inside skb->cb which is also 48 bytes.

I think it's possible to avoid this problem by just storing a pointer
to the appropriate tunnel data since it exists in memory already.  On
the output side, rather than copying the data from the flow you could
just have a pointer into the flow data directly.  The flow is
guaranteed to stick around for the lifetime of the skb due to RCU.  On
the receive side, it could just be a pointer to some data on the stack
that contains the appropriate struct filled with information.

> At this point it may be just as well just to drop the IPv6 portions of this
> change, as far as I know OVS has not supported IPv6 as the outer transport
> protocol for tunnelled frames. However it does seem to be a problem that
> will arise at some point.

It's good to think about IPv6 but since OVS doesn't currently support
tunnels over IPv6 I wouldn't worry about it much.  I think the overall
change will be big enough that we don't want to add extra work if we
can avoid it.

> One idea that I had was to make the tun_key element of struct ovs_skb_cb
> a pointer and set skb->destructor() to free it as needed. I am, however,
> concerned about the complexity and performance penalty this may introduce.
> Moreover, I'd like some review on the merit of stuffing all the tun_key
> information into skb->cb.

I agree that allocating and freeing the metadata for each packet is
something that we probably want to avoid.

> The patch does make some effort to retain the existing tun_id behaviour.
> I imagine this is required for compatibility reasons.
>
> The patch makes no attempt to use tun_key other than compatibility
> with the tun_id behaviour.

I actually wouldn't worry about kernel side compatibility here too
much.  My general thought is that people should either use the OVS
kernel module with the userspace of the same version or the upstream
module with userspace >= 1.4.  Since none of the tunneling code is
upstream yet we don't have to worry about compatibility and we'll have
to break it anyways if we want to eventually get everything upstream
and have it be clean.

The one place where we will need to maintain compatibility is from OVS
userspace to the outside world.  However, I think that everything we
currently do today can be implemented in terms of flow based tunneling
in userspace.  Therefore, I would go ahead and start making the full
changes around how tunnel lookup, for example, is done.

> diff --git a/datapath/flow.h b/datapath/flow.h
> index 5261fa8..c39023a 100644
> --- a/datapath/flow.h
> +++ b/datapath/flow.h
> +struct sw_tun_key {
> +       __kernel_sa_family_t    tun_af;
> +       union {
> +               __be64                          tun_id;
> +               struct ovs_key_ipv4_tunnel      tun_ipv4;
> +               struct ovs_key_ipv6_tunnel      tun_ipv6;
> +       };
> +};

I think we probably want to pack this struct a little better at some
point since tun_af actually only needs to be a single bit and will
also have a fair amount of padding after it.

>  struct sw_flow_key {
>        struct {
> -               __be64  tun_id;         /* Encapsulating tunnel ID. */
> -               u32     priority;       /* Packet QoS priority. */
> -               u16     in_port;        /* Input switch port (or DP_MAX_PORTS). */
> +               struct sw_tun_key       tun_key;        /* Encapsulating tunnel key. */

Eventually, at least, I'd like to move this to the end of the struct
so that it doesn't affect performance for non-tunneled traffic.

> diff --git a/datapath/tunnel.c b/datapath/tunnel.c
> index d406dbc..0305fbd 100644
> --- a/datapath/tunnel.c
> +++ b/datapath/tunnel.c
> @@ -613,7 +613,8 @@ static void ipv6_build_icmp(struct sk_buff *skb, struct sk_buff *nskb,
>
>  bool ovs_tnl_frag_needed(struct vport *vport,
>                         const struct tnl_mutable_config *mutable,
> -                        struct sk_buff *skb, unsigned int mtu, __be64 flow_key)
> +                        struct sk_buff *skb, unsigned int mtu,
> +                        struct sw_tun_key *tun_key)
>  {
>        unsigned int eth_hdr_len = ETH_HLEN;
>        unsigned int total_length = 0, header_length = 0, payload_length;
> @@ -706,7 +707,7 @@ bool ovs_tnl_frag_needed(struct vport *vport,
>         */
>        if ((mutable->flags & (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION)) ==
>            (TNL_F_IN_KEY_MATCH | TNL_F_OUT_KEY_ACTION))
> -               OVS_CB(nskb)->tun_id = flow_key;
> +               OVS_CB(nskb)->tun_key = *tun_key;

I think the paradigm of echoing back the tunnel information on the
PMTUD packet breaks down here.  At least in simple cases (but even
here not all) keys are often symmetric between receive and transmit so
it's possible to use the same key.  However, once you start including
addresses, it's obviously no longer symmetric.  In retrospect, I think
this implementation of PMTUD was a mistake since it's a hack that
works in many circumstances but not all.  Eventually, it would
probably be better to replace it with an implementation of MSS
clamping in userspace (which also has the advantage of being
implementable completely in userspace).

> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
> index 0578b5f..da32f7f 100644
> --- a/include/linux/openvswitch.h
> +++ b/include/linux/openvswitch.h
> @@ -278,7 +278,9 @@ enum ovs_key_attr {
>        OVS_KEY_ATTR_ICMPV6,    /* struct ovs_key_icmpv6 */
>        OVS_KEY_ATTR_ARP,       /* struct ovs_key_arp */
>        OVS_KEY_ATTR_ND,        /* struct ovs_key_nd */
> -       OVS_KEY_ATTR_TUN_ID = 63, /* be64 tunnel ID */
> +       OVS_KEY_ATTR_TUN_ID,    /* be64 tunnel ID */
> +       OVS_KEY_ATTR_IPV4_TUNNEL,      /* struct ovs_key_ipv4_tunnel */
> +       OVS_KEY_ATTR_IPV6_TUNNEL = 63, /* struct ovs_key_ipv6_tunnel */

The high valued enum value is just to separate the upstream types from
non-upstream.  Since we plan to push this upstream, you can drop it
(also shrink some data types).

> +struct ovs_key_ipv4_tunnel {
> +       __be64 tun_id;
> +       __be32 ipv4_src;
> +       __be32 ipv4_dst;
> +       __u8   ipv4_tos;
> +       __u8   ipv4_ttl;
> +       __u8   tun_proto;       /* One of TNL_T_PROTO_* */
> +       __u8   reserved;
> +};
> +
> +struct ovs_key_ipv6_tunnel {
> +       __be64 tun_id;
> +       __be32 ipv6_src[4];
> +       __be32 ipv6_dst[4];
> +       __be32 ipv6_label;      /* 20-bits in least-significant bits. */
> +       __u8   ipv6_tclass;
> +       __u8   ipv6_hlimit;
> +       __u8   tun_proto;       /* One of TNL_T_PROTO_* */
> +       __u8   reserved;
> +};

I'm not sure that tun_proto is necessary here - it's implied by the
port that is output to/received from.



More information about the dev mailing list