[ovs-dev] [PATCH] datapath: Avoid null deref when GSO is for verifying header integrity only.

Ben Pfaff blp at nicira.com
Tue Jan 22 00:09:31 UTC 2013


On Mon, Jan 21, 2013 at 04:04:22PM -0800, Jesse Gross wrote:
> On Mon, Jan 21, 2013 at 2:25 PM, Jesse Gross <jesse at nicira.com> wrote:
> > On Mon, Jan 21, 2013 at 10:11 AM, Ben Pfaff <blp at nicira.com> wrote:
> >> skb_gso_segment() has the following comment:
> >>
> >>  *    It may return NULL if the skb requires no segmentation.  This is
> >>  *    only possible when GSO is used for verifying header integrity.
> >>
> >> Somehow queue_gso_packets() has never hit this case before, but some
> >> failures have suddenly been reported.  This commit should fix the problem.
> >>
> >> Bug #14772.
> >> Reported-by: Deepesh Govindan <dgovindan at vmware.com>.
> >> Signed-off-by: Ben Pfaff <blp at nicira.com>
> >
> > Acked-by: Jesse Gross <jesse at nicira.com>
> >
> > However, I think we also potentially have a similar problem in
> > tunnel.c:handle_offloads().
> >
> >> ---
> >> I don't know how to trigger this condition, so I haven't tested this
> >> patch beyond verifying that it compiles against the Linux 3.2 kernel
> >> on which the problem was reported.  The patch was generated against
> >> branch-1.7 because that's where the problem occurred; it looks like
> >> it should apply cleanly against master also.
> >
> > We shouldn't normally be hitting this case because we're actually
> > trying to do GSO, not header validation.  However, I guess the
> > guest/backend must be generating a packet with an MSS, which tricks us
> > into thinking that it's GSO, but no GSO is actually requested.  In the
> > case of the bridge, header validation does take place so the situation
> > is handled already.  It seems not ideal that the network backend
> > doesn't sanitize these packets but it's probably good that we handle
> > it in any case.
> 
> I kept on looking into this and realized that my explanation is
> completely wrong.  Both KVM and Xen do correctly sanitize the GSO
> metadata.  The problem is actually on the receive side - LRO was
> somehow enabled and that creates packets of the form that are causing
> problems here.  We disable LRO when interfaces are added to OVS but
> it's possible that someone turned it on by hand later on.  The real
> source of the problem is that we're not checking and dropping LRO
> packets on receive.  We're supposed to be doing this but when code was
> being restructured for upstreaming the check was accidentally moved
> from the receive to the transmit function where it obviously doesn't
> do anything.  This is clearly a bug and moving it back should solve
> the problem (the logs clearly confirm LRO as the problem).
> 
> As far as the current patch, it should also mostly avoid the problem
> but can potentially cause problems later on in the system.  With this
> patch, we will send large packets up to userspace and they will come
> back down to be forwarded but without any indication that they were
> LRO packets originally.  This will probably just cause them to get
> dropped later on but could cause issues with some subsystems (like
> GRO, which tries to infer MSS based on packet size).  None of the
> other places in the kernel besides dev.c that use skb_gso_segment()
> have this check, so I think we probably should revert it.

Ugh.

Presumably you have a real fix coming up, I hope you're willing to pair
a revert and a fix patch as you commit the fix?



More information about the dev mailing list