[ovs-dev] [PATCH 2/2] datapath: Add Stateless TCP Tunneling protocol.
Jesse Gross
jesse at nicira.com
Mon Mar 23 19:23:43 UTC 2015
On Mon, Mar 9, 2015 at 3:12 PM, Pravin B Shelar <pshelar at nicira.com> wrote:
> openvswitch_headers = \
> diff --git a/datapath/linux/.gitignore b/datapath/linux/.gitignore
> index 69d6658..1d1aae7 100644
> --- a/datapath/linux/.gitignore
> +++ b/datapath/linux/.gitignore
> @@ -50,6 +50,7 @@
> /vport-lisp.c
> /vport-netdev.c
> /vport-patch.c
> +/vport-stt.c
> /vport-vxlan.c
> /vport.c
> /vxlan.c
Should we add stt.c as well?
> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
> new file mode 100644
> index 0000000..4cb4ea6
> --- /dev/null
> +++ b/datapath/linux/compat/stt.c
> +static int skb_list_linearize(struct sk_buff *head, gfp_t gfp_mask)
> +{
> + struct sk_buff *skb;
> + int tlen = 0;
> + int err;
> +
> + err = skb_linearize(head);
> + if (err)
> + return err;
> +
> + skb = head->next;
> + while (skb) {
> + tlen += skb->len;
> + skb = skb->next;
> + }
> + err = pskb_expand_head(head, 0, tlen, gfp_mask);
> + if (err)
> + return err;
Were you going to try to avoid linearizing and then expanding as Alex
had mentioned in his comments?
> +static int __build_segments(struct sk_buff **headp, bool ipv4, int l4_offset)
> +{
> + struct sk_buff *head = *headp;
> + struct sk_buff *nskb = NULL;
> + struct sk_buff *rskb, *skb;
> + struct tcphdr *tcph;
> + int seg_len = 0;
> + int hdr_len;
> + int tcp_len;
> + u32 seq;
> +
> + /* GRO can produce skbs with only the headers, which we've
> + * already pulled off. We can just dump them.
> + */
> + while (head->len == 0) {
> + nskb = head->next;
> + copy_skb_metadata(nskb, head);
> + consume_skb(head);
> + head = nskb;
> + }
> + *headp = head;
> + tcph = (struct tcphdr *)(head->data + l4_offset);
> + tcp_len = tcph->doff * 4;
> + hdr_len = l4_offset + tcp_len;
I think there is something wrong with the ordering of the calls in
stt_rcv(). reassemble() will just produce a linked list of skbs that
are expected to be joined here but we've already called functions that
do pskb_may_pull(), which won't traverse that list. In particular, it
doesn't make sense to both remove zero length skbs above and expect
that there is a TCP header available below.
> +
> + if (unlikely((tcp_len < sizeof(struct tcphdr)) ||
> + (head->len < hdr_len)))
> + return -EINVAL;
> +
> + if (unlikely(!pskb_may_pull(head, hdr_len)))
> + return -ENOMEM;
It seems risky to me to combine skb coalescing with TCP header
updating as we would need to very carefully check boundary conditions.
I feel like there are two halves of the loop anyways, so it seems
better to just break them apart.
> +static int __segment_skb(struct sk_buff **headp, bool ipv4, bool tcp, bool csum_partial, int l4_offset)
> +{
> + int err;
> +
> + err = straighten_frag_list(headp);
> + if (unlikely(err))
> + return err;
> +
> + if (__linearize(*headp, ipv4, tcp, csum_partial)) {
> + return skb_list_linearize(*headp, GFP_ATOMIC);
> + }
> +
> + if ((*headp)->next) {
I think we can push this check up above linearization. If there is no
list of skbs, we don't need to linearize either.
> +static int segment_skb(struct sk_buff **headp)
> +{
Isn't there a check for frag_list for the receive path missing
somewhere? It seems like linearization happens somewhat
unconditionally.
> +static int skb_list_xmit(struct rtable *rt, struct sk_buff *skb, __be32 src,
> + __be32 dst, __u8 tos, __u8 ttl, __be16 df)
> +{
> + int len = 0;
> +
> + while (skb) {
> + struct sk_buff *next = skb->next;
> +
> + if (next)
> + dst_clone(&rt->dst);
> +
> + skb->next = NULL;
> + len += iptunnel_xmit(NULL, rt, skb, src, dst, IPPROTO_TCP,
> + tos, ttl, df, false);
I think there may be a problem in our version of ip_local_out() if it
thinks that fix_segment is set in the CB, so we need to find a way to
make sure that it is cleared.
> +int stt_xmit_skb(struct sk_buff *skb, struct rtable *rt,
> + __be32 src, __be32 dst, __u8 tos,
> + __u8 ttl, __be16 df, __be16 src_port, __be16 dst_port,
> + __be64 tun_id)
[...]
> + while (skb) {
> + struct sk_buff *next_skb = skb->next;
> +
> + skb->next = NULL;
> +
> + if (next_skb)
> + dst_clone(&rt->dst);
> +
> + /* Push STT and TCP header. */
> + skb = push_stt_header(skb, tun_id, src_port, dst_port, src,
> + dst, inner_h_proto, inner_nw_proto,
> + dst_mtu(&rt->dst));
> + if (unlikely(!skb))
> + goto next;
I think it's somewhat surprising for a function called
push_stt_header() to free the skb in the event of an error. I actually
think that there is a double free already between __push_stt_header()
and push_stt_header() so it seems like it is better to just propagate
the error back here and then free.
> +static bool set_offloads(struct sk_buff *skb)
> +{
[...]
> + if (proto_type == (STT_PROTO_IPV4 | STT_PROTO_TCP)) {
Should we convert this to a switch statement? It seems mildly cleaner.
> +void stt_sock_release(struct net *net, __be16 port)
Is there a reason to not just pass in stt_sock here? It seems cleaner
and more consistent.
> + struct stt_sock *stt_sock;
> +
> + mutex_lock(&stt_mutex);
> + rcu_read_lock();
> + stt_sock = stt_find_sock(net, port);
> + rcu_read_unlock();
If we can't pass in stt_sock then I don't think we need rcu_read_lock
since we're holding the mutex.
> diff --git a/datapath/vport-stt.c b/datapath/vport-stt.c
> new file mode 100644
> index 0000000..ccd4c71
> --- /dev/null
> +++ b/datapath/vport-stt.c
> +#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,5,0)
I think we need a check for CONFIG_NETFILTER someplace, otherwise the
registration mechanism won't compile. I wonder if both of these checks
should be pushed down to stt.c though since that's where they actually
matter.
> diff --git a/datapath/vport.h b/datapath/vport.h
> index e256fc0..50e6289 100644
> --- a/datapath/vport.h
> +++ b/datapath/vport.h
> @@ -35,6 +35,7 @@ struct vport_parms;
> struct vport_net {
> struct vport __rcu *gre_vport;
> struct vport __rcu *gre64_vport;
> + struct vport __rcu *stt_vport;
I think this is no longer used now that the port number is configurable.
> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
> index 8e1b542..bd7f146 100644
> --- a/lib/netdev-vport.c
> +++ b/lib/netdev-vport.c
> @@ -153,7 +154,7 @@ netdev_vport_needs_dst_port(const struct netdev *dev)
>
> return (class->get_config == get_tunnel_config &&
> (!strcmp("geneve", type) || !strcmp("vxlan", type) ||
> - !strcmp("lisp", type)));
> + !strcmp("lisp", type) || !strcmp("stt", type)) );
> }
I think we need some additional support for configurable STT ports in
netdev_vport_construct() and get_tunnel_config().
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index dac3756..1b9da1d 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> + <dt><code>stt</code></dt>
> + <dd>
> + The Stateless TCP Tunnel (STT) protocol encapsulates traffic in
> + IPv4/TCP packets. All traffic uses a destination port of 7471.
> + The STT protocol does not engage in the usual TCP 3-way handshake,
> + so it will have difficulty traversing stateful firewalls.
> + STT is only available in kernel datapath on kernel 3.5 or newer.
> + </dd>
Can you make this a little more descriptive? I think right now it is
hard to understand when this should or should not be used.
More information about the dev
mailing list