[ovs-dev] [PATCH] datapath: Integration with upstream kernel tunneling.

Jesse Gross jesse at nicira.com
Tue Apr 2 22:26:50 UTC 2013


On Sat, Mar 30, 2013 at 9:20 AM, Pravin B Shelar <pshelar at nicira.com> wrote:
> Following patch restructure ovs tunneling to make use of kernel
> api. Doing this tunneling code is simplified as most of protocol
> processing on send and recv is pushed to kernel tunneling. This
> way we can share most protocol related code between openvswitch
> tunneling and linux tunnel devices.
>
> Signed-off-by: Pravin B Shelar <pshelar at nicira.com>

I'm going to wait to look at the upstream patches for now since they
are mostly just exporting existing code.  I hope that we can keep more
of the dataplane where it is and not have to export those functions.

I got some sparse warnings:
  CHECK   /home/jgross/openvswitch/datapath/linux/vport-lisp.c
/home/jgross/openvswitch/datapath/linux/vport-lisp.c:331:22: warning:
incorrect type in assignment (different base types)
/home/jgross/openvswitch/datapath/linux/vport-lisp.c:331:22:
expected restricted __be16 [assigned] [usertype] sin_port
/home/jgross/openvswitch/datapath/linux/vport-lisp.c:331:22:    got
restricted __be32 [usertype] dst_port
/home/jgross/openvswitch/datapath/linux/vport-lisp.c:386:60: warning:
incorrect type in argument 2 (different base types)
/home/jgross/openvswitch/datapath/linux/vport-lisp.c:386:60:
expected restricted __be32 [usertype] dst_port
/home/jgross/openvswitch/datapath/linux/vport-lisp.c:386:60:    got
restricted __be16 [assigned] [usertype] dst_port
  CHECK   /home/jgross/openvswitch/datapath/linux/vport-netdev.c
  CHECK   /home/jgross/openvswitch/datapath/linux/vport-vxlan.c
/home/jgross/openvswitch/datapath/linux/vport-vxlan.c:166:5: warning:
symbol 'vxlan_tnl_send' was not declared. Should it be static?

> diff --git a/acinclude.m4 b/acinclude.m4
> index 19a47dd..c30855f 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -254,6 +254,8 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
>    OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [consume_skb])
>    OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [skb_frag_page])
>    OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [skb_reset_mac_len])
> +  OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [SKB_GSO_GRE])
> +  OVS_GREP_IFELSE([$KSRC/include/linux/skbuff.h], [SKB_GSO_UDP_TUNNEL])
>
>    OVS_GREP_IFELSE([$KSRC/include/linux/string.h], [kmemdup], [],
>                    [OVS_GREP_IFELSE([$KSRC/include/linux/slab.h], [kmemdup])])
> @@ -262,6 +264,8 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
>                    [OVS_DEFINE([HAVE_BOOL_TYPE])])
>    OVS_GREP_IFELSE([$KSRC/include/linux/types.h], [__wsum],
>                    [OVS_DEFINE([HAVE_CSUM_TYPES])])
> +  OVS_GREP_IFELSE([$KSRC/include/uapi/linux/types.h], [__wsum],
> +                  [OVS_DEFINE([HAVE_CSUM_TYPES])])

Is there a reason to think these have been backported?  They are used
mixed with version checks so it seems to make sense to just use
versions in the absence of another reason.

> diff --git a/datapath/datapath.c b/datapath/datapath.c
> index 9cd4b0e..4d9a3b0 100644
> --- a/datapath/datapath.c
> +++ b/datapath/datapath.c
> @@ -49,8 +49,10 @@
>  #include <linux/rculist.h>
>  #include <linux/dmi.h>
>  #include <net/genetlink.h>
> +#include <net/gre.h>
>  #include <net/net_namespace.h>
>  #include <net/netns/generic.h>
> +#include <net/vxlan.h>
>
>  #include "checksum.h"
>  #include "datapath.h"
> @@ -61,7 +63,7 @@
>  #include "vport-internal_dev.h"
>
>  #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,18) || \
> -    LINUX_VERSION_CODE >= KERNEL_VERSION(3,9,0)
> +    LINUX_VERSION_CODE >= KERNEL_VERSION(3,10,0)
>  #error Kernels before 2.6.18 or after 3.8 are not supported by this version of Open vSwitch.
>  #endif

Can we separate out support for 3.9 into a separate change?  They seem
like they should be independent.

> @@ -2349,10 +2351,18 @@ static int __init dp_init(void)
>         if (err)
>                 goto error_genl_exec;
>
> -       err = ovs_flow_init();
> +       err = gre_compat_init();
>         if (err)
>                 goto error_wq;
>
> +       err = vxlan_compat_init();
> +       if (err)
> +               goto error_gre_compat;

Can we do this initialization when we add the first port?  It will
help isolate the compatibility code and not use resources that we
don't need as the number of protocols grows.

> diff --git a/datapath/tunnel.c b/datapath/tunnel.c
> index 057aaed..bd0ee7e 100644
> --- a/datapath/tunnel.c
> +++ b/datapath/tunnel.c
> +static int ovs_skb_checksum(struct sk_buff *skb)
>  {
[...]
> +static int tnl_send_out(struct vport *vport, struct sk_buff *skb,

These seem very fragile to me and it would be best to not have them in
this file at all.  Ideally we would have something closer to a real
GSO backport.  It's going to be a pain but this is the third place
this has come up (the first two were vlan and MPLS) so it might be
time.

> -int ovs_tnl_send(struct vport *vport, struct sk_buff *skb)
> +int ovs_tnl_send(struct vport *vport, struct sk_buff *skb, u8 ipproto,
> +                int tunnel_hlen,
> +                struct sk_buff *(*build_header)(const struct vport *,
> +                                                struct sk_buff *,
> +                                                int tunnel_hlen))
>  {
> -       struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
> +       struct iphdr *inner_iph;
> +       struct iphdr *iph;
>         struct rtable *rt;
>         __be32 saddr;
>         int sent_len = 0;
> -       int tunnel_hlen;
> +       int min_headroom;
>
>         if (unlikely(!OVS_CB(skb)->tun_key))
>                 goto error_free;
>
> +       if (unlikely(vlan_deaccel_tag(skb)))
> +               goto error;
> +
>         /* Route lookup */
>         saddr = OVS_CB(skb)->tun_key->ipv4_src;
>         rt = find_route(ovs_dp_get_net(vport->dp),
>                         &saddr,
>                         OVS_CB(skb)->tun_key->ipv4_dst,
> -                       tnl_vport->tnl_ops->ipproto,
> +                       ipproto,
>                         OVS_CB(skb)->tun_key->ipv4_tos,
>                         skb_get_mark(skb));
>         if (IS_ERR(rt))
>                 goto error_free;
>
> +       skb_dst_drop(skb);
> +       skb_dst_set(skb, &rt_dst(rt));

I think everything below this point are dataplane components that
should be handled outside of OVS.  Why do we need to handle offloads,
unclone, reserve header space, reset the skb, select identities, etc.?

> diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c
> index 40b96cf..3a722a1 100644
> --- a/datapath/vport-gre.c
> +++ b/datapath/vport-gre.c
> +static struct sk_buff *__gre_build_header(struct sk_buff *skb,
>                                int tunnel_hlen,
>                                bool is_gre64)
> +       skb = gre_handle_offloads(skb, (tun_key->tun_flags & OVS_TNL_F_CSUM));
> +       if (IS_ERR(skb))
> +               return NULL;

Can't we do this in the GRE code?

> @@ -267,27 +188,23 @@ static int gre_rcv(struct sk_buff *skb)
>         if (unlikely(!pskb_may_pull(skb, hdr_len + ETH_HLEN)))
>                 goto error;
>
> -       iph = ip_hdr(skb);
> -       tnl_flags = gre_flags_to_tunnel_flags(gre_flags, is_gre64);
> -       tnl_tun_key_init(&tun_key, iph, key, tnl_flags);
> +       tnl_tun_key_init(&tun_key, ip_hdr(skb), key, tnl_flags);
>         OVS_CB(skb)->tun_key = &tun_key;
>
>         __skb_pull(skb, hdr_len);
>         skb_postpull_rcsum(skb, skb_transport_header(skb), hdr_len + ETH_HLEN);

Things like pulling the headers off should happen in shared code.

> @@ -334,8 +245,11 @@ static struct vport *gre_create(const struct vport_parms *parms)
>         if (rtnl_dereference(ovs_net->vport_net.gre_vport))
>                 return ERR_PTR(-EEXIST);
>
> -       vport = ovs_tnl_create(parms, &ovs_gre_vport_ops, &gre_tnl_ops);
> +       vport = ovs_vport_alloc(0, &ovs_gre_vport_ops, parms);
> +       if (IS_ERR(vport))
> +               return vport;
>
> +       strncpy(ovs_net->vport_net.gre_name, parms->name, IFNAMSIZ);
>         rcu_assign_pointer(ovs_net->vport_net.gre_vport, vport);
>         return vport;

I think we need to move protocol registration in here so that it
happens on demand.  Ideally we should also figure out the right way to
make the modules load dynamically and not enforce either compile or
load time dependencies on the tunnel protocols.

> diff --git a/datapath/vport-vxlan.c b/datapath/vport-vxlan.c
> index 1850fc2..cf3064b 100644
> --- a/datapath/vport-vxlan.c
> +++ b/datapath/vport-vxlan.c
> +static struct vport *vxlan_tnl_create(const struct vport_parms *parms)
>  {
> -       err = vxlan_socket_init(vxlan_port, net);
> +       net = ovs_dp_get_net(parms->dp);
> +       ovs_vxlan_port = vxlan_vport_priv(vport);
> +       strncpy(ovs_vxlan_port->name, parms->name, IFNAMSIZ);
> +       vxlan_port = &ovs_vxlan_port->port;
> +       vxlan_port->portno = htons(dst_port);
> +       vxlan_port->vx_rcv = vxlan_rcv;
> +       vxlan_port->user_data = vport;
> +       err = vxlan_add_handler(net, vxlan_port);

Shouldn't we make this handler more closely resemble GRE in terms of
having priorities, etc.?  Having a limit of 8 ports also seems
somewhat arbitrary and could possibly lead to confusion.

> diff --git a/datapath/vport.c b/datapath/vport.c
> index d458a95..bfaf410 100644
> --- a/datapath/vport.c
> +++ b/datapath/vport.c
> @@ -208,6 +208,21 @@ void ovs_vport_free(struct vport *vport)
>         kfree(vport);
>  }
>
> +static void free_vport_rcu(struct rcu_head *rcu)
> +{
> +       struct vport *vport = container_of(rcu, struct vport, rcu);
> +
> +       ovs_vport_free(vport);
> +}
> +
> +void ovs_vport_deferred_destroy(struct vport *vport)
> +{
> +       if (!vport)
> +               return;
> +
> +       call_rcu(&vport->rcu, free_vport_rcu);
> +}

It seems like these changes could be broken out separately.



More information about the dev mailing list