[ovs-dev] [PATCH] [RFC] [linux kernel patch] Add TCP encap_rcv hook
Simon Horman
horms at verge.net.au
Fri Apr 6 00:35:04 UTC 2012
On Thu, Apr 05, 2012 at 05:00:12PM -0700, Jesse Gross wrote:
> On Wed, Apr 4, 2012 at 6:08 PM, Simon Horman <horms at verge.net.au> wrote:
> > diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> > index 3c7ffdb..36e794b 100644
> > --- a/include/linux/tcp.h
> > +++ b/include/linux/tcp.h
> > @@ -472,6 +472,11 @@ struct tcp_sock {
> > * contains related tcp_cookie_transactions fields.
> > */
> > struct tcp_cookie_values *cookie_values;
> > +
> > + /* For encapsulation sockets. */
> > + __u16 encap_type;
>
> Do we actually get any value out of encap_type? It seems equivalent
> to encap_rcv != NULL.
I agree, but this is in keeping with the UDP implementation.
> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index fd54c5f..ce56c6c 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -1714,6 +1715,32 @@ process:
> >
> > if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
> > goto discard_and_relse;
> > +
> > + tp = tcp_sk(sk);
>
> I wonder if this is the right place for this hook. It seems odd to be
> after the TIME_WAIT check, for example.
True. I pushed it down here because I felt that the xfrm4_policy_check and
ttl checks made sense and the TIME_WAIT would always fail. If not, perhaps
immediately after the socket lookup would be a better location.
>
> > + if (tp->encap_type) {
> > + int (*encap_rcv)(struct sock *sk, struct sk_buff *skb);
> > +
> > + /*
> > + * This is an encapsulation socket so pass the skb to
> > + * the socket's udp_encap_rcv() hook. Otherwise, just
> > + * fall through and pass this up the TCP socket.
> > + * up->encap_rcv() returns the following value:
> > + * =0 if skb was successfully passed to the encap
> > + * handler or was discarded by it.
> > + * >0 if skb should be passed on to TCP.
> > + * <0 if skb should be resubmitted as proto -N
> > + */
> > +
> > + /* if we're overly short, let UDP handle it */
>
> There are still a number of references to UDP here.
Oops.
> > + encap_rcv = ACCESS_ONCE(tp->encap_rcv);
> > + if (encap_rcv) {
> > + int ret = encap_rcv(sk, skb);
> > + if (ret <= 0)
> > + return -ret;
>
> It seems odd to resubmit as a different protocol here. With UDP it's
> common to have something that runs over either bare IP or encapsulated
> in UDP but I'm not sure that the same really applies here.
>
> > + }
Ok, so perhaps it would be best to just have
return encap_rcv(sk, skb);
This was the behaviour of my original version of this patch but
I thought it may make sense to have behaviour closer to UDP's encap_rcv.
>
> Doesn't this leak a reference to the socket if the encap_rcv handler
> takes the packet?
I will look into that.
More information about the dev
mailing list