[ovs-dev] [PATCH] datapath: Avoid null deref when GSO is for verifying header integrity only.
jesse at nicira.com
Tue Jan 22 00:04:22 UTC 2013
On Mon, Jan 21, 2013 at 2:25 PM, Jesse Gross <jesse at nicira.com> wrote:
> On Mon, Jan 21, 2013 at 10:11 AM, Ben Pfaff <blp at nicira.com> wrote:
>> skb_gso_segment() has the following comment:
>> * It may return NULL if the skb requires no segmentation. This is
>> * only possible when GSO is used for verifying header integrity.
>> Somehow queue_gso_packets() has never hit this case before, but some
>> failures have suddenly been reported. This commit should fix the problem.
>> Bug #14772.
>> Reported-by: Deepesh Govindan <dgovindan at vmware.com>.
>> Signed-off-by: Ben Pfaff <blp at nicira.com>
> Acked-by: Jesse Gross <jesse at nicira.com>
> However, I think we also potentially have a similar problem in
>> I don't know how to trigger this condition, so I haven't tested this
>> patch beyond verifying that it compiles against the Linux 3.2 kernel
>> on which the problem was reported. The patch was generated against
>> branch-1.7 because that's where the problem occurred; it looks like
>> it should apply cleanly against master also.
> We shouldn't normally be hitting this case because we're actually
> trying to do GSO, not header validation. However, I guess the
> guest/backend must be generating a packet with an MSS, which tricks us
> into thinking that it's GSO, but no GSO is actually requested. In the
> case of the bridge, header validation does take place so the situation
> is handled already. It seems not ideal that the network backend
> doesn't sanitize these packets but it's probably good that we handle
> it in any case.
I kept on looking into this and realized that my explanation is
completely wrong. Both KVM and Xen do correctly sanitize the GSO
metadata. The problem is actually on the receive side - LRO was
somehow enabled and that creates packets of the form that are causing
problems here. We disable LRO when interfaces are added to OVS but
it's possible that someone turned it on by hand later on. The real
source of the problem is that we're not checking and dropping LRO
packets on receive. We're supposed to be doing this but when code was
being restructured for upstreaming the check was accidentally moved
from the receive to the transmit function where it obviously doesn't
do anything. This is clearly a bug and moving it back should solve
the problem (the logs clearly confirm LRO as the problem).
As far as the current patch, it should also mostly avoid the problem
but can potentially cause problems later on in the system. With this
patch, we will send large packets up to userspace and they will come
back down to be forwarded but without any indication that they were
LRO packets originally. This will probably just cause them to get
dropped later on but could cause issues with some subsystems (like
GRO, which tries to infer MSS based on packet size). None of the
other places in the kernel besides dev.c that use skb_gso_segment()
have this check, so I think we probably should revert it.
More information about the dev