[ovs-dev] [PATCHv2] Add support for LISP tunneling

Thu Feb 14 23:32:26 UTC 2013

On Wed, Feb 13, 2013 at 6:44 AM, Lorand Jakab <lojakab at cisco.com> wrote:
> LISP is an experimental layer 3 tunneling protocol, described in RFC
> 6830.  This patch adds support for LISP tunneling.  Since LISP
> encapsulated packets do not carry an Ethernet header, it is removed
> before encapsulation, and added with hardcoded source and destination
> MAC addresses after decapsulation.  The harcoded MAC chosen for this
> purpose is the locally administered address 02:00:00:00:00:00.  Flow
> actions can be used to rewrite this MAC for correct reception.  As such,
> this patch is intended to be used for static network configurations, or
> with a LISP capable controller.
>
> Signed-off-by: Lorand Jakab <lojakab at cisco.com>
> Signed-off-by: Kyle Mestery <kmestery at cisco.com>

Sorry about the delay and thanks for the new version.

I received a few sparse errors after I applied this:

  CHECK   /home/jgross/openvswitch/datapath/linux/vport-lisp.c
/home/jgross/openvswitch/datapath/linux/vport-lisp.c:169:18: warning:
restricted __be64 degrades to integer
/home/jgross/openvswitch/datapath/linux/vport-lisp.c:169:51: warning:
restricted __be64 degrades to integer
/home/jgross/openvswitch/datapath/linux/vport-lisp.c:170:18: warning:
restricted __be64 degrades to integer
/home/jgross/openvswitch/datapath/linux/vport-lisp.c:169:80: warning:
incorrect type in return expression (different base types)
/home/jgross/openvswitch/datapath/linux/vport-lisp.c:169:80:
expected restricted __be64
/home/jgross/openvswitch/datapath/linux/vport-lisp.c:169:80:    got
unsigned long long

I think these are just issues with casts though and not real problems.

> diff --git a/NEWS b/NEWS
> index 3a2b5fe..1828294 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -28,6 +28,7 @@ v1.10.0 - xx xxx xxxx
>      - Tunneling:
>        - New support for the VXLAN tunnel protocol (see the IETF draft here:
>          http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02).
> +      - New support for the LISP tunnel protocol (RFC 6830).

At this point, we've already branched for 1.10, so we'll want to put
this item in the post-1.10 category.  We may also want to note that
this is support for the wire format or otherwise indicate that some
pieces still need to be filled in.

> diff --git a/datapath/vport-lisp.c b/datapath/vport-lisp.c
> new file mode 100644
> index 0000000..9ca851c
> --- /dev/null
> +++ b/datapath/vport-lisp.c
[...]
> +#define LISP_DST_PORT 4341  /* Well known UDP port for LISP data packets. */

Does it make sense to make this configurable as it is for VXLAN?  I
realize that unlike VXLAN, IANA has allocated a port for LISP.
However, perhaps an analogous situation is whether one would want to
bake the port number into an ASIC.

Incidentally, with flow based tunneling, it may be possible to
simplify the port number refcounting.  Since it is now only necessary
to have a single tunnel port for each UDP port, we could just enforce
that in the kernel and not have to deal with counting the number of
users.

> +/* Compute source port for outgoing packet.
> + * Currently we use the flow hash.
> + */
> +static u16 get_src_port(struct sk_buff *skb)
> +{
> +       int low;
> +       int high;
> +       unsigned int range;
> +       u32 hash = OVS_CB(skb)->flow->hash;
> +
> +       inet_get_local_port_range(&low, &high);
> +       range = (high - low) + 1;
> +       return (((u64) hash * range) >> 32) + low;
> +}

Since this is the same as VXLAN, it might be worth factoring it out to
some common location.

> +static int lisp_tnl_send(struct vport *vport, struct sk_buff *skb)
> +{
> +       int network_offset = skb_network_offset(skb);
> +
> +       /* We only encapsulate IPv4 and IPv6 packets */
> +       switch (ntohs(skb->protocol)) {
> +       case ETH_P_IP:
> +       case ETH_P_IPV6:

It's marginally more efficient if you put the byte swap on the case
labels rather than on the switch since they can be computed at compile
time rather than runtime.

> +/* Convert 64 bit tunnel ID to 24 bit Instance ID. */
> +static void tunnel_id_to_instance_id(__be64 tun_id, __u8 *iid)
> +{
> +
> +#ifdef __BIG_ENDIAN
> +       iid[0] = (__force __u8)(tun_id >> 16);
> +       iid[1] = (__force __u8)(tun_id >> 8);
> +       iid[2] = (__force __u8)tun_id;
> +#else
> +       iid[0] = (__force __u8)((__force u64)tun_id >> 40);
> +       iid[1] = (__force __u8)((__force u64)tun_id >> 48);
> +       iid[2] = (__force __u8)((__force u64)tun_id >> 56);
> +#endif
> +}
> +
> +/* Convert 24 bit Instance ID to 64 bit tunnel ID. */
> +static __be64 instance_id_to_tunnel_id(__u8 *iid)
> +{
> +#ifdef __BIG_ENDIAN
> +       return (iid[0] << 16) | (iid[1] << 8) | iid[2];
> +#else
> +       return ((__force __be64)iid[0] << 40) | ((__force __be64)iid[1] << 48) |
> +              ((__force __be64)iid[2] << 56);
> +#endif
> +}

Were you able to test that these conversions are correct by
communicating with an existing implementation? (I'm not saying that
they aren't, it's just good to have a sanity check.)

> +/* Called with rcu_read_lock and BH disabled. */
> +static int lisp_rcv(struct sock *sk, struct sk_buff *skb)
> +{
> +       struct vport *vport;
> +       struct lisphdr *lisph;
> +       const struct tnl_mutable_config *mutable;
> +       struct iphdr *iph, *inner_iph;
> +       struct ovs_key_ipv4_tunnel tun_key;
> +       __be64 key;
> +       u32 tunnel_flags = 0;
> +       struct ethhdr *ethh;
> +       __be16 protocol;
> +
> +       if (unlikely(!pskb_may_pull(skb, LISP_HLEN)))
> +               goto error;
> +
> +       lisph = lisp_hdr(skb);
> +
> +       __skb_pull(skb, LISP_HLEN);
> +       skb_postpull_rcsum(skb, skb_transport_header(skb), LISP_HLEN);

Since the length for __skb_pull() and skb_postpull_rcsum() is the
same, we might as well use skb_pull_rcsum(), which is both easier and
safer.

I think using the transport header as the starting point is not right
since that's actually the UDP header.  Using the combined function
will avoid the problem here but I think it's an issue with all of our
UDP-based protocols.

> +       iph = ip_hdr(skb);
> +       vport = ovs_tnl_find_port(dev_net(skb->dev), iph->daddr, iph->saddr,
> +                                 key, TNL_T_PROTO_LISP, &mutable);
> +       if (unlikely(!vport)) {
> +               icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);

I recently removed sending ICMP packets when tunnels are not found
since it caused problems in a few situations and shouldn't be
necessary any longer.  Although I don't think that it will cause
problems here, we should probably not include it for consistency.

> +       /* Add Ethernet header */
> +       skb_push(skb, ETH_HLEN);
> +
> +       ethh = (struct ethhdr *)skb->data;

skb_push() returns a pointer to the new data, so it would be a little
cleaner if you combined them into a single statement.

> +/* Arbitrary value.  Irrelevant as long as it's not 0 since we set the handler. */
> +#define UDP_ENCAP_LISP 7

I would probably just use 1 here, 7 makes it seem like there is
something else going on.

> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
> index 7c6e3ab..edccf11 100644
> --- a/include/linux/openvswitch.h
> +++ b/include/linux/openvswitch.h
> @@ -184,6 +184,7 @@ enum ovs_vport_type {
>         OVS_VPORT_TYPE_INTERNAL, /* network device implemented by datapath */
>         OVS_VPORT_TYPE_GRE,      /* GRE tunnel. */
>         OVS_VPORT_TYPE_VXLAN,    /* VXLAN tunnel */
> +       OVS_VPORT_TYPE_LISP,     /* LISP tunnel */

Assuming that this continues to evolve to move natively support L3
tunnels, I think it's going to be necessary to make some incompatible
changes in the future.  Given that, I would probably put the
identifier at the top of the non-upstream range (i.e. 105).