[ovs-dev] ovs-vswitch kernel panic randomly started after 400+ days uptime

Pravin Shelar pshelar at ovn.org
Sat Jan 7 18:01:56 UTC 2017


Thanks for all the investigation.

On Sat, Jan 7, 2017 at 12:57 AM, Joe Stringer <joe at ovn.org> wrote:
>
>
> On 5 January 2017 at 19:24, Uri Foox <uri at zoey.com> wrote:
>>
>> Hey Joe,
>>
>> Thank you so much for responding! After 10 days of trying to figure this
>> out I'm at a loss.
>>
>> root@node-8:~# modinfo openvswitch
>> filename:       /lib/modules/3.13.0-106-generic/kernel/net/openvswitch/openvswitch.ko
>> license:        GPL
>> description:    Open vSwitch switching datapath
>> srcversion:     94294A72258BA583D666607
>> depends:        libcrc32c,vxlan,gre
>> intree:         Y
>
>
> ^ intree - that is, the version that comes with this kernel.
>
>>
>> vermagic:       3.13.0-106-generic SMP mod_unload modversions
>>
>>
>> Everything you've mentioned is what I've understood so far, including the
>> line of code that's triggered. That is what led me to upgrade the kernel to
>> 3.13.0-106, because it claims that the CHECKSUM problems are fixed, which I
>> thought might be related; guess not.
>
>
> I forgot to actually look through those before, but the call chain looks a
> bit different there so I thought it may be a different issue altogether.
>
>>
>> You're saying that skb_headlen is too short for the ethernet header. Do
>> you know what would cause this? This hardware configuration has been running
>> for 400+ days of uptime with no errors or problems and this suddenly started
>> to happen and no matter how many times we reboot things it doesn't go away.
>> I assume given your interpretation we should try to restart the switches
>> connected to the servers. Is there any way to log what packet is causing
>> this issue? Perhaps that would provide more insight?
>
>
> One thing is that it depends on the packets and how they arrive. I'm not too
> familiar with this code, but I could imagine a situation where the IP+GRE
> packet gets fragmented, causing a single inner frame to be split across
> multiple GRE packets. Then, when Linux receives the two separate packets,
> there would be some point in the stack responsible for stitching these
> packets back together; but it may not put them into a single contiguous
> buffer. If this is subsequently decapped for local delivery of the inner
> frame, then perhaps there is less than an ethernet header's worth of packet
> in the first of these buffers. It seems unlikely that packets would be
> deliberately fragmented like this, but if anyone had access to your
> underlying network then they could throw any kind of packet they want to
> your server.
>
> There may be another, more likely, explanation - CC Pravin in case he has
> any ideas.
>
>>
>> As far as 4.4/newer kernel - I wish. I tried to go that far up but Ubuntu
>> wouldn't even boot. The best I could do is 3.13.0-106. I'll try to report it
>> over there as well.
>
>
> That's too bad.
>
> FWIW, I see a check for pskb_may_pull() in the outer gre_rcv function, which
> would check on the whole GRE packet.. this is then passed to gre_cisco_rcv()
> which does the decap and calls through to the OVS gre_rcv() function. At a
> glance, following the OVS' gre_rcv() I didn't see another pskb_may_pull()
> check for the inner packet. By the time it gets to ovs_flow_extract(),
> there's an expectation that this call was made but I'm really not sure who
> was supposed to make that check. Also, it should be ETH_HLEN, which is 14,
> not 12..
>
Right. OVS does expect the Ethernet header to already be in the skb
linear data. That is done in iptunnel_pull_header() for tunnel packets.
This function is called for all packets received in the GRE module.

http://lxr.free-electrons.com/source/net/ipv4/ip_tunnel_core.c?v=3.13#L96

But the skb Ethernet header is only pulled for GRE-TAP packets, not for
IP-GRE. A change in the network could have introduced these IP-GRE
packets that caused the crash.

This bug does not exist in the out-of-tree kernel module that comes with
OVS 2.5 and newer, so upgrading the OVS kernel module to 2.5 should solve
the problem.

I will send out a patch for the older OVS kernel module.
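
To illustrate the condition being discussed, here is a rough userspace
sketch. It is not the kernel code and not the patch; struct model_skb and
the model_* functions are hypothetical stand-ins, and the logic only
loosely follows what pskb_may_pull() does: succeed if enough bytes are
already linear, fail if the packet is shorter than requested, otherwise
grow the linear area.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ETH_HLEN 14 /* same value as the kernel's ETH_HLEN */

/* Hypothetical stand-in for an skb: only the two lengths matter here. */
struct model_skb {
    size_t len;     /* total bytes: linear area plus any fragments */
    size_t headlen; /* bytes available in the contiguous linear area */
};

/*
 * Loosely models pskb_may_pull(): make sure at least `need` bytes are
 * in the linear area. Returns false when the packet is too short.
 */
static bool model_may_pull(struct model_skb *skb, size_t need)
{
    assert(skb != NULL);
    assert(skb->headlen <= skb->len);

    if (need <= skb->headlen)
        return true;         /* already linear, nothing to do */
    if (need > skb->len)
        return false;        /* packet simply does not have the bytes */

    /* The real kernel copies bytes from fragments into the head here. */
    skb->headlen = need;
    return true;
}

/* Guard that a receive path would run before parsing an inner Ethernet header. */
static bool model_can_parse_eth(struct model_skb *skb)
{
    return model_may_pull(skb, MODEL_ETH_HLEN);
}
```

In this model, a packet with len = 100 and headlen = 4 passes the guard
and ends up with 14 linear bytes, while a packet with len = 10 is
rejected instead of being parsed.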

> Outer gre_rcv():
> http://lxr.free-electrons.com/source/net/ipv4/gre_demux.c?v=3.13#L270
>
> Inner gre_rcv():
> http://lxr.free-electrons.com/source/net/openvswitch/vport-gre.c?v=3.13#L92

