[ovs-dev] [PATCH] [RFC] [linux kernel patch] Add TCP encap_rcv hook

Simon Horman horms at verge.net.au
Fri Apr 6 00:35:04 UTC 2012


On Thu, Apr 05, 2012 at 05:00:12PM -0700, Jesse Gross wrote:
> On Wed, Apr 4, 2012 at 6:08 PM, Simon Horman <horms at verge.net.au> wrote:
> > diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> > index 3c7ffdb..36e794b 100644
> > --- a/include/linux/tcp.h
> > +++ b/include/linux/tcp.h
> > @@ -472,6 +472,11 @@ struct tcp_sock {
> >         * contains related tcp_cookie_transactions fields.
> >         */
> >        struct tcp_cookie_values  *cookie_values;
> > +
> > +       /* For encapsulation sockets. */
> > +       __u16            encap_type;
> 
> Do we actually get any value out of encap_type?  It seems equivalent
> to encap_rcv != NULL.

I agree, but this is in keeping with the UDP implementation.

> > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> > index fd54c5f..ce56c6c 100644
> > --- a/net/ipv4/tcp_ipv4.c
> > +++ b/net/ipv4/tcp_ipv4.c
> > @@ -1714,6 +1715,32 @@ process:
> >
> >        if (!xfrm4_policy_check(sk, XFRM_POLICY_IN, skb))
> >                goto discard_and_relse;
> > +
> > +       tp = tcp_sk(sk);
> 
> I wonder if this is the right place for this hook.  It seems odd to be
> after the TIME_WAIT check, for example.

True. I pushed it down here because I felt that the xfrm4_policy_check and
ttl checks made sense and the TIME_WAIT would always fail.  If not, perhaps
immediately after the socket lookup would be a better location.

> 
> > +       if (tp->encap_type) {
> > +               int (*encap_rcv)(struct sock *sk, struct sk_buff *skb);
> > +
> > +               /*
> > +                * This is an encapsulation socket so pass the skb to
> > +                * the socket's udp_encap_rcv() hook. Otherwise, just
> > +                * fall through and pass this up the TCP socket.
> > +                * up->encap_rcv() returns the following value:
> > +                * =0 if skb was successfully passed to the encap
> > +                *    handler or was discarded by it.
> > +                * >0 if skb should be passed on to TCP.
> > +                * <0 if skb should be resubmitted as proto -N
> > +                */
> > +
> > +               /* if we're overly short, let UDP handle it */
> 
> There are still a number of references to UDP here.

Oops.

> > +               encap_rcv = ACCESS_ONCE(tp->encap_rcv);
> > +               if (encap_rcv) {
> > +                       int ret = encap_rcv(sk, skb);
> > +                       if (ret <= 0)
> > +                               return -ret;
> 
> It seems odd to resubmit as a different protocol here.  With UDP it's
> common to have something that runs over either bare IP or encapsulated
> in UDP but I'm not sure that the same really applies here.
> 
> > +               }

Ok, so perhaps it would be best to just have

	return encap_rcv(sk, skb);

This was the behaviour of my original version of this patch but
I thought it may make sense to have behaviour closer to UDP's encap_rcv.

> 
> Doesn't this leak a reference to the socket if the encap_rcv handler
> takes the packet?

I will look into that.



More information about the dev mailing list