[ovs-dev] [PATCHv2] Add support for LISP tunneling

Lori Jakab lojakab at cisco.com
Fri Feb 15 18:34:38 UTC 2013


On 02/15/13 01:32, Jesse Gross wrote:
> On Wed, Feb 13, 2013 at 6:44 AM, Lorand Jakab <lojakab at cisco.com> wrote:
>> LISP is an experimental layer 3 tunneling protocol, described in RFC
>> 6830.  This patch adds support for LISP tunneling.  Since LISP
>> encapsulated packets do not carry an Ethernet header, it is removed
>> before encapsulation, and added with hardcoded source and destination
>> MAC addresses after decapsulation.  The harcoded MAC chosen for this
>> purpose is the locally administered address 02:00:00:00:00:00.  Flow
>> actions can be used to rewrite this MAC for correct reception.  As such,
>> this patch is intended to be used for static network configurations, or
>> with a LISP capable controller.
>>
>> Signed-off-by: Lorand Jakab <lojakab at cisco.com>
>> Signed-off-by: Kyle Mestery <kmestery at cisco.com>
> 
> Sorry about the delay and thanks for the new version.

Thanks for reviewing!

> 
> I received a few sparse errors after I applied this:
> 
>   CHECK   /home/jgross/openvswitch/datapath/linux/vport-lisp.c
> /home/jgross/openvswitch/datapath/linux/vport-lisp.c:169:18: warning:
> restricted __be64 degrades to integer
> /home/jgross/openvswitch/datapath/linux/vport-lisp.c:169:51: warning:
> restricted __be64 degrades to integer
> /home/jgross/openvswitch/datapath/linux/vport-lisp.c:170:18: warning:
> restricted __be64 degrades to integer
> /home/jgross/openvswitch/datapath/linux/vport-lisp.c:169:80: warning:
> incorrect type in return expression (different base types)
> /home/jgross/openvswitch/datapath/linux/vport-lisp.c:169:80:
> expected restricted __be64
> /home/jgross/openvswitch/datapath/linux/vport-lisp.c:169:80:    got
> unsigned long long
> 
> I think these are just issues with casts though and not real problems.
> 
>> diff --git a/NEWS b/NEWS
>> index 3a2b5fe..1828294 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -28,6 +28,7 @@ v1.10.0 - xx xxx xxxx
>>      - Tunneling:
>>        - New support for the VXLAN tunnel protocol (see the IETF draft here:
>>          http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02).
>> +      - New support for the LISP tunnel protocol (RFC 6830).
> 
> At this point, we've already branched for 1.10, so we'll want to put
> this item in the post-1.10 category.  We may also want to note that
> this is support for the wire format or otherwise indicate that some
> pieces still need to be filled in.

OK, will move up and clarify that this implements only the LISP data
plane, and a LISP capable controller is necessary to take full advantage
of this tunneling code.

> 
>> diff --git a/datapath/vport-lisp.c b/datapath/vport-lisp.c
>> new file mode 100644
>> index 0000000..9ca851c
>> --- /dev/null
>> +++ b/datapath/vport-lisp.c
> [...]
>> +#define LISP_DST_PORT 4341  /* Well known UDP port for LISP data packets. */
> 
> Does it make sense to make this configurable as it is for VXLAN?  I
> realize that unlike VXLAN, IANA has allocated a port for LISP.
> However, perhaps an analogous situation is whether one would want to
> bake the port number into an ASIC.

Configurability would only be useful between OVS implementations, since
all other implementations to date that I know of have this IANA
allocated port number hardcoded.  It helps network monitoring as well,
if LISP traffic is using the well known port only.  The only benefit of
configurable UDP ports I can think of would be to avoid restrictive
firewalls (set it to port 53?).  So personally I don't see value in
making this configurable, but I'm open to be convinced. ;)

> 
> Incidentally, with flow based tunneling, it may be possible to
> simplify the port number refcounting.  Since it is now only necessary
> to have a single tunnel port for each UDP port, we could just enforce
> that in the kernel and not have to deal with counting the number of
> users.
> 
>> +/* Compute source port for outgoing packet.
>> + * Currently we use the flow hash.
>> + */
>> +static u16 get_src_port(struct sk_buff *skb)
>> +{
>> +       int low;
>> +       int high;
>> +       unsigned int range;
>> +       u32 hash = OVS_CB(skb)->flow->hash;
>> +
>> +       inet_get_local_port_range(&low, &high);
>> +       range = (high - low) + 1;
>> +       return (((u64) hash * range) >> 32) + low;
>> +}
> 
> Since this is the same as VXLAN, it might be worth factoring it out to
> some common location.

What would be the correct place for this?  datapath/tunnel.c ?  Or maybe
as a 'static inline' function in datapath/tunnel.h ?  I will move it
there and remove it from the VXLAN vport code as well.

> 
>> +static int lisp_tnl_send(struct vport *vport, struct sk_buff *skb)
>> +{
>> +       int network_offset = skb_network_offset(skb);
>> +
>> +       /* We only encapsulate IPv4 and IPv6 packets */
>> +       switch (ntohs(skb->protocol)) {
>> +       case ETH_P_IP:
>> +       case ETH_P_IPV6:
> 
> It's marginally more efficient if you put the byte swap on the case
> labels rather than on the switch since they can be computed at compile
> time rather than runtime.

Indeed, I will update.  I like this kind of attention to detail ;)

> 
>> +/* Convert 64 bit tunnel ID to 24 bit Instance ID. */
>> +static void tunnel_id_to_instance_id(__be64 tun_id, __u8 *iid)
>> +{
>> +
>> +#ifdef __BIG_ENDIAN
>> +       iid[0] = (__force __u8)(tun_id >> 16);
>> +       iid[1] = (__force __u8)(tun_id >> 8);
>> +       iid[2] = (__force __u8)tun_id;
>> +#else
>> +       iid[0] = (__force __u8)((__force u64)tun_id >> 40);
>> +       iid[1] = (__force __u8)((__force u64)tun_id >> 48);
>> +       iid[2] = (__force __u8)((__force u64)tun_id >> 56);
>> +#endif
>> +}
>> +
>> +/* Convert 24 bit Instance ID to 64 bit tunnel ID. */
>> +static __be64 instance_id_to_tunnel_id(__u8 *iid)
>> +{
>> +#ifdef __BIG_ENDIAN
>> +       return (iid[0] << 16) | (iid[1] << 8) | iid[2];
>> +#else
>> +       return ((__force __be64)iid[0] << 40) | ((__force __be64)iid[1] << 48) |
>> +              ((__force __be64)iid[2] << 56);
>> +#endif
>> +}
> 
> Were you able to test that these conversions are correct by
> communicating with an existing implementation? (I'm not saying that
> they aren't, it's just good to have a sanity check.)

I only tested communications between two OVS instances so far, but I did
do a sanity check using Wireshark, with IID values such as 0x010203, to
make sure the bytes are there in the correct order, from the correct
positions.

> 
>> +/* Called with rcu_read_lock and BH disabled. */
>> +static int lisp_rcv(struct sock *sk, struct sk_buff *skb)
>> +{
>> +       struct vport *vport;
>> +       struct lisphdr *lisph;
>> +       const struct tnl_mutable_config *mutable;
>> +       struct iphdr *iph, *inner_iph;
>> +       struct ovs_key_ipv4_tunnel tun_key;
>> +       __be64 key;
>> +       u32 tunnel_flags = 0;
>> +       struct ethhdr *ethh;
>> +       __be16 protocol;
>> +
>> +       if (unlikely(!pskb_may_pull(skb, LISP_HLEN)))
>> +               goto error;
>> +
>> +       lisph = lisp_hdr(skb);
>> +
>> +       __skb_pull(skb, LISP_HLEN);
>> +       skb_postpull_rcsum(skb, skb_transport_header(skb), LISP_HLEN);
> 
> Since the length for __skb_pull() and skb_postpull_rcsum() is the
> same, we might as well use skb_pull_rcsum(), which is both easier and
> safer.
> 
> I think using the transport header as the starting point is not right
> since that's actually the UDP header.  Using the combined function
> will avoid the problem here but I think it's an issue with all of our
> UDP-based protocols.

OK, I will pull them into a single call to skb_pull_rcsum().

> 
>> +       iph = ip_hdr(skb);
>> +       vport = ovs_tnl_find_port(dev_net(skb->dev), iph->daddr, iph->saddr,
>> +                                 key, TNL_T_PROTO_LISP, &mutable);
>> +       if (unlikely(!vport)) {
>> +               icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
> 
> I recently removed sending ICMP packets when tunnels are not found
> since it caused problems in a few situations and shouldn't be
> necessary any longer.  Although I don't think that it will cause
> problems here, we should probably not include it for consistency.

Sorry, I missed that when rebasing, I will remove this to have it
consistent with the rest of the code.

> 
>> +       /* Add Ethernet header */
>> +       skb_push(skb, ETH_HLEN);
>> +
>> +       ethh = (struct ethhdr *)skb->data;
> 
> skb_push() returns a pointer to the new data, so it would be a little
> cleaner if you combined them into a single statement.

Will fix, thank you.

> 
>> +/* Arbitrary value.  Irrelevant as long as it's not 0 since we set the handler. */
>> +#define UDP_ENCAP_LISP 7
> 
> I would probably just use 1 here, 7 makes it seem like there is
> something else going on.

OK.

> 
>> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
>> index 7c6e3ab..edccf11 100644
>> --- a/include/linux/openvswitch.h
>> +++ b/include/linux/openvswitch.h
>> @@ -184,6 +184,7 @@ enum ovs_vport_type {
>>         OVS_VPORT_TYPE_INTERNAL, /* network device implemented by datapath */
>>         OVS_VPORT_TYPE_GRE,      /* GRE tunnel. */
>>         OVS_VPORT_TYPE_VXLAN,    /* VXLAN tunnel */
>> +       OVS_VPORT_TYPE_LISP,     /* LISP tunnel */
> 
> Assuming that this continues to evolve to move natively support L3
> tunnels, I think it's going to be necessary to make some incompatible
> changes in the future.  Given that, I would probably put the
> identifier at the top of the non-upstream range (i.e. 105).

When you say incompatible changes, do you mean breaking existing user
scripts for setting up flows, etc.?  I can only think of the MAC address
rewrite that we have to do now as hack, since we don't do ARP
resolution.  That rule however would not break a future implementation,
and is really simple to remove by users.  Are there other things you
foresee that would change?

If possible, we would like to see LISP as a candidate for upstreaming,
and will try our best to avoid backwards compatible changes.

BTW, how does upstreaming work?  Is there a branch, which gets pulled by
the netdev maintainers, when the kernel merge window opens?  Or do you
sned a pull request from latest master during the window?  Or they pull
point releases?  How often do they pull changes?  Every merge window?
Or upstreaming is not tied to the kernel merge window, because it goes
to the network subsystem tree first?  I'm asking this to see if we can
at least do the design of the L3 work before the next upstream code
pull, and see how backward compatibility would be affected by the
planned changes.

Thanks again for considering,
-Lori



More information about the dev mailing list