[ovs-discuss] null ptr exception in ovs_vport_get_stats+0x6a/0x130 [openvswitch]
ffernand at redhat.com
Tue Jan 5 04:30:01 UTC 2016
On Mon, Jan 4, 2016 at 8:47 PM, Jesse Gross <jesse at kernel.org> wrote:
> On Mon, Jan 4, 2016 at 1:41 PM, Flavio Fernandes <ffernand at redhat.com> wrote:
> > So, I'm a happy camper, but can't help but worry a little about the
> > fragility of the
> > system when one attempts to use a port type internal 'directly' as
> > The fix
> > I have in mind is relatively simple: add a check in
> > to gracefully handle cases when ovs_internal_dev_get_vport returns null.
> > simple?
> I don't think that the problem is simply that we are returning NULL
> from ovs_internal_dev_get_vport(). ovs_internal_dev_get_vport() should
> never return NULL to internal_dev_get_stats() because it is checking
> whether the device has an ops structure that is equal to the one that
> leads to internal_dev_get_stats(). And in fact, if you look at the
> full stack trace, the address being dereferenced is 0x0000000000000060
> rather than 0x0 from a real NULL.
Ack. If ovs_internal_dev_get_vport()
<http://lxr.oss.org.cn/ident?i=ovs_internal_dev_get_vport> is not
returning NULL, then this is
not as simple as I had assumed. My thinking was that 0x60 is the
from line 306 in
but you may be right in that if vport was not NULL, then this is an issue in
what ovs_internal_dev_get_vport() is returning.
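
For reference, one way a fault at a small nonzero address such as 0x60 can
arise is a member read through a NULL base pointer: the faulting address is
then the member's offset within the structure. The sketch below only
illustrates that arithmetic. The struct layout_demo type and its members are
hypothetical and are not the real struct vport; the padding is chosen purely
so that one member sits at offset 0x60. It does not settle which pointer was
actually bad in the trace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout, NOT the real struct vport: the padding array is
 * sized so that the next member lands at offset 0x60. */
struct layout_demo {
    unsigned char earlier_members[0x60]; /* stand-in for preceding fields */
    void *member_at_0x60;                /* hypothetical member */
};

/* Compile-time check that the illustrative layout is what we claim. */
static_assert(offsetof(struct layout_demo, member_at_0x60) == 0x60,
              "illustrative member must sit at offset 0x60");

/* Address that a read of member_at_0x60 would touch for a given base
 * pointer value. Pure arithmetic; nothing is dereferenced. */
static uintptr_t member_address(uintptr_t base)
{
    return base + offsetof(struct layout_demo, member_at_0x60);
}

/* Print the address that would be accessed through a NULL base. */
static void print_null_base_access(void)
{
    printf("NULL base -> access at 0x%zx\n", (size_t)member_address(0));
}
```

Comparing the faulting address against the member offsets of the structure
in the exact kernel source being run is what would confirm or rule out this
reading.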
> This looks like something is overwriting the vport pointer in the
> device structure. If you follow where this is coming from you'll wind
> up at ovs_netdev_get_vport() which is a maze of twisty conditions that
> depend on what kernel version you are using. Particularly on the RHEL
> kernels (which based on your email address I'm guessing you're using),
> the pointer is stashed in a variety of places. My guess is that these
> are not entirely safe in some conditions - likely related to tap
> devices based on your other description. I think the best path forward
> is to try to see which of the conditions your kernel version falls
> into and try to see what might be stomping on the pointer.
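
The two points above (an ops-identity check guarding the lookup, and a
pointer stashed somewhere in the device) can be modelled in userspace
roughly as follows. This is a sketch under stated assumptions, not the
openvswitch code: every type and function name here is hypothetical. It
shows that the ops check only proves the device type; it says nothing about
whether the stashed pointer is still intact.

```c
#include <assert.h>
#include <stddef.h>

/* All types and names below are hypothetical userspace stand-ins. */
struct sketch_ops { int id; };
struct sketch_vport { int port_no; };

struct sketch_dev {
    const struct sketch_ops *ops; /* identifies the device type */
    void *stash;                  /* where the vport pointer is kept */
};

static const struct sketch_ops internal_ops = { 1 };
static const struct sketch_ops tap_ops = { 2 };

/* Return the stashed vport only if the device's ops table is the
 * internal one; any other device type yields NULL. The stash itself
 * is returned unvalidated. */
static struct sketch_vport *sketch_get_vport(const struct sketch_dev *dev)
{
    if (dev->ops != &internal_ops)
        return NULL;
    return dev->stash;
}
```

In this model, a caller that is only reachable through internal_ops never
sees NULL from the ops check, but if anything else writes to the stash
field, the getter hands back whatever value is there. That is why finding
where the running kernel keeps the pointer matters more than adding a NULL
check.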
I see. So it could be that I'm looking at the wrong source code. I am
using the CentOS 7.2 kernel (3.10.0-327.3.1.el7.x86_64 x86_64); I will
find out more about how it differs from the upstream kernel.