[ovs-discuss] null ptr exception in ovs_vport_get_stats+0x6a/0x130 [openvswitch]

Jesse Gross jesse at kernel.org
Tue Jan 5 01:47:54 UTC 2016


On Mon, Jan 4, 2016 at 1:41 PM, Flavio Fernandes <ffernand at redhat.com> wrote:
> So, I'm a happy camper, but can't help but worry a little about the
> fragility of the system when one attempts to use a port type internal
> 'directly' as bridged. The fix I have in mind is relatively simple: add a
> check in internal_dev_get_stats to gracefully handle cases when
> ovs_internal_dev_get_vport returns null. Too simple?

I don't think that the problem is simply that we are returning NULL
from ovs_internal_dev_get_vport(). ovs_internal_dev_get_vport() should
never return NULL to internal_dev_get_stats() because it is checking
whether the device has an ops structure that is equal to the one that
leads to internal_dev_get_stats(). And in fact, if you look at the
full stack trace, the address being dereferenced is 0x0000000000000060
rather than 0x0 from a real NULL.
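
To make the ops-structure argument concrete, here is a minimal sketch of
the pattern. The struct layouts, the get_stats member, and the
dev_get_stats() helper are simplified stand-ins made up for illustration,
not the real kernel or OVS definitions:

```c
#include <stddef.h>

/* Simplified stand-ins for illustration only; these are NOT the real
 * kernel or OVS definitions. */
struct net_device;

struct net_device_ops {
    int (*get_stats)(struct net_device *dev);
};

struct vport {
    int port_no;
};

struct net_device {
    const struct net_device_ops *netdev_ops;
    struct vport *vport;    /* assumed place where the pointer is stashed */
};

static int internal_dev_get_stats(struct net_device *dev);

/* The ops table whose identity marks a device as "internal". */
static const struct net_device_ops internal_dev_netdev_ops = {
    .get_stats = internal_dev_get_stats,
};

/* The ops check fails (returning NULL) only for devices that do not use
 * the internal ops table; otherwise whatever is stored in dev->vport is
 * returned as-is, valid or not. */
static struct vport *internal_dev_get_vport(struct net_device *dev)
{
    if (dev->netdev_ops != &internal_dev_netdev_ops)
        return NULL;
    return dev->vport;
}

/* Reachable only through internal_dev_netdev_ops, so the ops check in
 * internal_dev_get_vport() always passes when called from here. */
static int internal_dev_get_stats(struct net_device *dev)
{
    struct vport *vport = internal_dev_get_vport(dev);

    return vport->port_no;
}

/* Dispatches through the ops table the way a caller would. */
static int dev_get_stats(struct net_device *dev)
{
    return dev->netdev_ops->get_stats(dev);
}
```

In this sketch the only way into internal_dev_get_stats() is through the
same ops table that the lookup compares against, so the comparison cannot
be what fails on that path. Any bad value has to come from the stored
pointer itself.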

This looks like something is overwriting the vport pointer in the
device structure. If you follow where this is coming from you'll wind
up at ovs_netdev_get_vport() which is a maze of twisty conditions that
depend on what kernel version you are using. Particularly on the RHEL
kernels (which, based on your email address, I'm guessing you're using),
the pointer is stashed in a variety of places. My guess is that these
are not entirely safe in some conditions - likely related to tap
devices based on your other description. I think the best path forward
is to work out which of the conditions your kernel version falls into,
and then look for what might be stomping on the pointer.
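
As a rough illustration of why a plain NULL check would not be enough if
the pointer really is being overwritten: a stomped pointer is usually not
NULL, so the guard passes and the crash just moves. The struct and both
helper names below are hypothetical, and 0x60 is used only because it is
the fault address from the trace; this assumes the stored pointer itself
holds that value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in; not the real OVS vport. */
struct vport {
    int port_no;
};

/* The kind of guard proposed in the quoted message. */
static int vport_looks_usable(const struct vport *vport)
{
    return vport != NULL;
}

/* Builds a pointer from a raw address without dereferencing it. */
static const struct vport *vport_from_address(uintptr_t addr)
{
    return (const struct vport *)addr;
}
```

Here vport_looks_usable(NULL) is 0, but
vport_looks_usable(vport_from_address(0x60)) is 1, so the check would let
the corrupted value through to the dereference.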


