[ovs-dev] [PATCH] datapath: Avoid null deref when GSO is for verifying header integrity only.

Jesse Gross jesse at nicira.com
Tue Jan 22 00:11:51 UTC 2013


On Mon, Jan 21, 2013 at 4:09 PM, Ben Pfaff <blp at nicira.com> wrote:
> On Mon, Jan 21, 2013 at 04:04:22PM -0800, Jesse Gross wrote:
>> On Mon, Jan 21, 2013 at 2:25 PM, Jesse Gross <jesse at nicira.com> wrote:
>> > On Mon, Jan 21, 2013 at 10:11 AM, Ben Pfaff <blp at nicira.com> wrote:
>> >> skb_gso_segment() has the following comment:
>> >>
>> >>  *    It may return NULL if the skb requires no segmentation.  This is
>> >>  *    only possible when GSO is used for verifying header integrity.
>> >>
>> >> Somehow queue_gso_packets() has never hit this case before, but some
>> >> failures have suddenly been reported.  This commit should fix the problem.
>> >>
>> >> Bug #14772.
>> >> Reported-by: Deepesh Govindan <dgovindan at vmware.com>.
>> >> Signed-off-by: Ben Pfaff <blp at nicira.com>
>> >
>> > Acked-by: Jesse Gross <jesse at nicira.com>
>> >
>> > However, I think we also potentially have a similar problem in
>> > tunnel.c:handle_offloads().
>> >
>> >> ---
>> >> I don't know how to trigger this condition, so I haven't tested this
>> >> patch beyond verifying that it compiles against the Linux 3.2 kernel
>> >> on which the problem was reported.  The patch was generated against
>> >> branch-1.7 because that's where the problem occurred; it looks like
>> >> it should apply cleanly against master also.
>> >
>> > We shouldn't normally be hitting this case because we're actually
>> > trying to do GSO, not header validation.  However, I guess the
>> > guest/backend must be generating a packet with an MSS, which tricks us
>> > into thinking that it's GSO, but no GSO is actually requested.  In the
>> > case of the bridge, header validation does take place so the situation
>> > is handled already.  It seems not ideal that the network backend
>> > doesn't sanitize these packets but it's probably good that we handle
>> > it in any case.
>>
>> I kept on looking into this and realized that my explanation is
>> completely wrong.  Both KVM and Xen do correctly sanitize the GSO
>> metadata.  The problem is actually on the receive side - LRO was
>> somehow enabled and that creates packets of the form that are causing
>> problems here.  We disable LRO when interfaces are added to OVS but
>> it's possible that someone turned it on by hand later on.  The real
>> source of the problem is that we're not checking and dropping LRO
>> packets on receive.  We're supposed to be doing this but when code was
>> being restructured for upstreaming the check was accidentally moved
>> from the receive to the transmit function where it obviously doesn't
>> do anything.  This is clearly a bug and moving it back should solve
>> the problem (the logs clearly confirm LRO as the problem).
>>
>> As far as the current patch, it should also mostly avoid the problem
>> but can potentially cause problems later on in the system.  With this
>> patch, we will send large packets up to userspace and they will come
>> back down to be forwarded but without any indication that they were
>> LRO packets originally.  This will probably just cause them to get
>> dropped later on but could cause issues with some subsystems (like
>> GRO, which tries to infer MSS based on packet size).  None of the
>> other places in the kernel besides dev.c that use skb_gso_segment()
>> have this check, so I think we probably should revert it.
>
> Ugh.
>
> Presumably you have a real fix coming up, I hope you're willing to pair
> a revert and a fix patch as you commit the fix?

Yeah, I'm working on it.



More information about the dev mailing list