[ovs-dev] [PATCH v2 2/2] datapath: Add Stateless TCP Tunneling protocol.
Jesse Gross
jesse at nicira.com
Wed Apr 1 00:07:25 UTC 2015
On Thu, Mar 26, 2015 at 4:41 PM, Pravin B Shelar <pshelar at nicira.com> wrote:
> diff --git a/datapath/datapath.c b/datapath/datapath.c
> index 7f431ed..ea9c6ae 100644
> --- a/datapath/datapath.c
> +++ b/datapath/datapath.c
> @@ -2192,6 +2192,7 @@ static int __net_init ovs_init_net(struct net *net)
>
> INIT_LIST_HEAD(&ovs_net->dps);
> INIT_WORK(&ovs_net->dp_notify_work, ovs_dp_notify_wq);
> + INIT_LIST_HEAD(&ovs_net->vport_net.stt_sock_list);
> return 0;
> }
>
In this previous version, this was a little bit more self contained.
Did you run into a problem with that? If not, the other version seems
a little cleaner to me, especially since this won't be upstream and so
the less invasive it is, the better.
> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
> new file mode 100644
> index 0000000..df643de
> --- /dev/null
> +++ b/datapath/linux/compat/stt.c
> +static void copy_skb_metadata(struct sk_buff *to, struct sk_buff *from)
> +{
> + to->tstamp = from->tstamp;
> + to->priority = from->priority;
> + to->mark = from->mark;
> + to->vlan_tci = from->vlan_tci;
> + skb_copy_secmark(to, from);
> +}
Is there any other metadata that we might need? What about vlan_proto?
(I think we also need to check/set this when encapsulating and
decapsulating.)
> +static bool __linearize(struct sk_buff *head, bool ipv4, bool tcp, bool csum_partial)
I might call this need_linearize() or similar, since it doesn't
actually do any linearization.
> +static int skb_list_linearize(struct sk_buff *head, gfp_t gfp_mask)
> +{
> + struct sk_buff *skb;
> + int tlen = 0;
> + int err;
> +
> + skb = head;
> + while (skb) {
> + tlen += skb->len;
> + skb = skb->next;
> + }
> +
> + err = pskb_expand_head(head, 0, tlen, gfp_mask);
> + if (err)
> + return err;
I think this includes the length of head in tlen, which is probably
not necessary.
> +static int __segment_skb(struct sk_buff **headp, bool ipv4, bool tcp, bool csum_partial, int l4_offset)
> +{
> + int err;
> +
> + err = straighten_frag_list(headp);
> + if (unlikely(err))
> + return err;
> +
> + if (__linearize(*headp, ipv4, tcp, csum_partial))
> + return skb_list_linearize(*headp, GFP_ATOMIC);
I don't think this check is in the right place. The conditions for
linearizing only come into play if we can't fully coalesce things and
would otherwise generate a frag_list. But we don't know that before
trying to coalesce.
> +static struct sk_buff *push_stt_header(struct sk_buff *head, __be64 tun_id,
> + __be16 s_port, __be16 d_port,
> + __be32 saddr, __be32 dst,
> + __be16 h_proto, u8 nw_proto,
> + int dst_mtu)
> +{
> + struct sk_buff *skb;
> +
> + if (skb_shinfo(head)->frag_list) {
> + bool ipv4 = (h_proto == htons(ETH_P_IP));
> + bool tcp = (nw_proto == IPPROTO_TCP);
> + bool csum_partial = (head->ip_summed == CHECKSUM_PARTIAL);
> + int l4_offset = skb_transport_offset(head);
> +
> + if (unlikely(__segment_skb(&head, ipv4, tcp,
> + csum_partial, l4_offset))) {
> + goto error;
My guess is that not all of the conditions in __linearize apply to the
transmit case (such as the one about offloading). It's worth checking.
At a higher level, is there a reason why STT is special in this regard
as far as inserting a header into a packet that has a frag_list? Don't
other tunnels need to deal with it?
> +static bool stt_can_offload(struct sk_buff *skb, __be16 h_proto, u8 nw_proto)
> +{
We probably should have a more direct check for GSO types in here. We
specifically check for SKB_GSO_TCP_ECN but a whitelist is better than
a blacklist. I think the tunnel offloads are already a problem is
somebody tries to do double encapsulation.
> +int stt_xmit_skb(struct sk_buff *skb, struct rtable *rt,
> + __be32 src, __be32 dst, __u8 tos,
> + __u8 ttl, __be16 df, __be16 src_port, __be16 dst_port,
> + __be64 tun_id)
> +{
[...]
> + if (!stt_can_offload(skb, inner_h_proto, inner_nw_proto)) {
> + struct sk_buff *nskb;
> +
> + nskb = handle_offloads(skb);
> + if (IS_ERR(nskb)) {
> + ret = PTR_ERR(nskb);
> + goto err_free_rt;
> + }
> + skb = nskb;
We might have an issue with MPLS here - it wasn't available yet in 3.5
so skb_gso_segment() will choke on it (we have backports but they
don't run in direct calls to skb_gso_segment()).
vlans should be OK by this kernel version, although we'll need to push
vlans with non-standard EtherTypes into the packet.
> +static void stt_rcv(struct stt_sock *stt_sock, struct sk_buff *skb)
> +{
[...]
> + if (unlikely(stt_hdr(skb)->version != 0))
> + goto drop;
Does this need to do a pskb_may_pull() first?
> + err = iptunnel_pull_header(skb,
> + sizeof(struct stthdr) + STT_ETH_PAD,
> + htons(ETH_P_TEB));
> + if (unlikely(err))
> + goto drop;
> +
> + if (unlikely(!set_offloads(skb)))
> + goto drop;
>
I don't think the above operations are legal to do on an skb that
hasn't been merged yet. If the header spans STT segments, they will
fail even if it shouldn't.
> diff --git a/datapath/vport.c b/datapath/vport.c
> index 5a7067b..697ee6a 100644
> --- a/datapath/vport.c
> +++ b/datapath/vport.c
> @@ -30,6 +30,7 @@
> #include <linux/compat.h>
> #include <linux/version.h>
> #include <net/net_namespace.h>
> +#include <net/stt.h>
I think STT internals probably don't need to leak into vport through
the header file.
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 81e8b3f..d6cb443 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -1863,6 +1863,17 @@
> </p>
> </dd>
>
> + <dt><code>stt</code></dt>
> + <dd>
> + The Stateless TCP Tunnel (STT) protocol encapsulates traffic in
> + IPv4/TCP packets. All traffic uses a default destination port of 7471.
> + The STT protocol does not engage in the usual TCP 3-way handshake,
> + so it will have difficulty traversing stateful firewalls. Therefore
> + it is better suited for hypervisor to hypervisor tunneling within
> + data center.
> + STT is only available in kernel datapath on kernel 3.5 or newer.
> + </dd>
Can you make the documentation a little more descriptive of the protocol?
More information about the dev
mailing list