[ovs-dev] ovs-vswitch kernel panic randomly started after 400+ days uptime

Uri Foox uri at zoey.com
Fri Jan 6 19:47:45 UTC 2017


Hey Joe,

I do agree that the Linux kernel patches were not a 1:1 match for what our
stack trace showed, but they were the only thing we found that even
remotely explained our issue. Granted, after upgrading the kernel it was
clear that it fixed nothing - so, back to the drawing board...

Given your initial comment that something above the stack is most likely
causing the issue, we went through our network switches and disconnected
one of the network interfaces on each of the compute nodes that connect to
our Juniper switch, which routes internet traffic. Looking at the Juniper
switch we see a lot of errors about interfaces flapping on/off. Their
timing does not correlate exactly with the timing of the crashes (they are
plus/minus a few minutes before/after each crash), but the errors appear
to begin at the same day/time as our first kernel panic and have continued
since. As soon as we disconnected the network interface the Juniper
stopped logging any error messages, and we have not experienced a kernel
panic in nearly six hours, whereas before it was happening as frequently
as every two hours. I won't declare victory yet, but it's the first time
in a couple of weeks that we've had stability.

Here is a sample of the error messages in the Juniper log, in case they
tell you anything:

Jan  2 00:34:23  pod2-core dfwd[1114]: CH_NET_SERV_KNOB_STATE read failed
(rtslib err 2 - No such file or directory). Setting chassis state to NORMAL
(All FPC) and retry in the idle phase (59 retries)
Jan  2 00:34:48  pod2-core mib2d[1101]: SNMP_TRAP_LINK_DOWN: ifIndex 567,
ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/30
Jan  2 00:35:06  pod2-core mib2d[1101]: SNMP_TRAP_LINK_DOWN: ifIndex 569,
ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/31
Jan  2 00:36:06  pod2-core mib2d[1101]: SNMP_TRAP_LINK_DOWN: ifIndex 569,
ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/31
Jan  2 00:39:06  pod2-core mib2d[1101]: SNMP_TRAP_LINK_DOWN: ifIndex 569,
ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/31
Jan  2 00:44:33  pod2-core mib2d[1101]: SNMP_TRAP_LINK_DOWN: ifIndex 567,
ifAdminStatus up(1), ifOperStatus down(2), ifName ge-0/0/30

The interface ge-0/0/30 above could be any of the interfaces plugged into
our compute nodes - they all showed these errors.

I suspect your analysis is close to the mark: the switch suffered some
sort of failure that has manifested itself in an extremely odd way,
sending rogue packets that either the kernel or the version of OVS we are
running cannot recover from.

root at node-2:~# ovs-vswitchd -V
ovs-vswitchd (Open vSwitch) 2.0.2
Compiled Nov 28 2014 21:37:19
OpenFlow versions 0x1:0x1

I figured I would follow up with what we did to "solve" the issue. We're
not really sure whether we should reboot or RMA the switch. For now, if
the above gives you or Pravin any more insight, please do share.

As a side note, I am extremely thankful for the replies to this thread. I
figured posting something would have a low chance of getting any
attention, but your confirmation of what I was able to piece together gave
us the confidence to move in a direction that hopefully brings back
stability.

Thanks,
Uri



On Fri, Jan 6, 2017 at 2:27 PM, Joe Stringer <joe at ovn.org> wrote:

>
>
> On 5 January 2017 at 19:24, Uri Foox <uri at zoey.com> wrote:
>
>> Hey Joe,
>>
>> Thank you so much for responding! After 10 days of trying to figure this
>> out I'm at a loss.
>>
>> root at node-8:~# modinfo openvswitch
>> filename:       /lib/modules/3.13.0-106-generic/kernel/net/openvswitch/
>> openvswitch.ko
>> license:        GPL
>> description:    Open vSwitch switching datapath
>> srcversion:     94294A72258BA583D666607
>> depends:        libcrc32c,vxlan,gre
>> intree:         Y
>>
>
> ^ intree - that is, the version that comes with this kernel.
>
>
>> vermagic:       3.13.0-106-generic SMP mod_unload modversions
>>
>>
>> Everything you've mentioned matches what I've understood so far, including
>> the line of code that's triggered. That is what led me to upgrade the
>> kernel to 3.13.0-106, because it claims the CHECKSUM problems are fixed,
>> which I thought might be related - guess not.
>>
>
> I forgot to actually look through those before, but the call chain looks a
> bit different there so I thought it may be a different issue altogether.
>
>
>> You're saying that skb_headlen is too short for the Ethernet header. Do
>> you know what would cause this? This hardware configuration had been
>> running for 400+ days of uptime with no errors or problems, then this
>> suddenly started to happen, and no matter how many times we reboot things
>> it doesn't go away. Given your interpretation, I assume we should try to
>> restart the switches connected to the servers. Is there any way to log
>> which packet is causing this issue? Perhaps that would provide more insight?
>>
>
> One thing is that it depends on the packets and how they arrive. I'm not
> too familiar with this code, but I could imagine a situation where the
> IP+GRE packet gets fragmented, causing a single inner frame to be split
> across multiple GRE packets. Then, when Linux receives the two separate
> packets, there would be some point in the stack responsible for stitching
> them back together, but it may not put them into a single contiguous
> buffer. If this is subsequently decapped for local delivery of the inner
> frame, then perhaps there is less than an Ethernet header's worth of
> packet in the first of these buffers. It seems unlikely that packets would
> be deliberately fragmented like this, but if anyone had access to your
> underlying network then they could throw any kind of packet they want at
> your server.
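
For anyone who hits this later, here is a rough illustration of the
condition being described: after decap, the inner frame can have fewer
than ETH_HLEN (14) bytes in the skb's linear area even though the full
frame is present across fragments. This is an untested sketch against the
3.13-era tree, not code from the actual sources, and the helper name is
made up:

/*
 * Debugging sketch only: log any decapsulated skb whose linear area
 * holds less than an Ethernet header.  skb_headlen() is the linear
 * part, skb->len the total length, skb->data_len the bytes that sit
 * in frags/frag_list.
 */
#include <linux/if_ether.h>
#include <linux/printk.h>
#include <linux/skbuff.h>

static void log_short_inner_frame(const struct sk_buff *skb)
{
	if (skb_headlen(skb) >= ETH_HLEN)
		return;

	pr_warn_ratelimited("gre decap: inner frame not linear: headlen=%u len=%u data_len=%u\n",
			    skb_headlen(skb), skb->len, skb->data_len);
}
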
>
> There may be another, more likely, explanation - CC Pravin in case he has
> any ideas.
>
>
>> As for a 4.4 or newer kernel - I wish. I tried to go that far up, but
>> Ubuntu wouldn't even boot. The best I could do was 3.13.0-106. I'll try
>> to report it over there as well.
>>
>
> That's too bad.
>
> FWIW, I see a check for pskb_may_pull() in the outer gre_rcv() function,
> which would check the whole GRE packet. This is then passed to
> gre_cisco_rcv(), which does the decap and calls through to the OVS
> gre_rcv() function. At a glance, following OVS's gre_rcv() I didn't see
> another pskb_may_pull() check for the inner packet. By the time it gets to
> ovs_flow_extract(), there's an expectation that this call was made, but
> I'm really not sure who was supposed to make that check. Also, it should
> be ETH_HLEN, which is 14, not 12.
>
> Outer gre_rcv():
> http://lxr.free-electrons.com/source/net/ipv4/gre_demux.c?v=3.13#L270
>
> Inner gre_rcv():
> http://lxr.free-electrons.com/source/net/openvswitch/vport-gre.c?v=3.13#L92
>
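
To make the above concrete for anyone reading this later: my rough
understanding is that such a check would belong at the top of the OVS
gre_rcv() in vport-gre.c (second lxr link above), before the skb reaches
ovs_flow_extract(). The sketch below is only an illustration of that idea,
untested and not taken from the tree; the helper name is made up, and the
caller would be expected to drop the packet (e.g. return PACKET_REJECT)
when it fails:

/*
 * Sketch of the missing guard Joe describes: nothing on the OVS side
 * appears to re-check that the inner Ethernet header is in the linear
 * area after decap, so pull it before the flow key is extracted.
 */
#include <linux/if_ether.h>
#include <linux/skbuff.h>

static inline bool ovs_gre_inner_eth_pullable(struct sk_buff *skb)
{
	/* pskb_may_pull() linearizes up to ETH_HLEN bytes if they are
	 * spread across fragments, and fails if the inner frame really
	 * is shorter than an Ethernet header. */
	return pskb_may_pull(skb, ETH_HLEN) != 0;
}
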



-- 
Uri Foox | Zoey | Founder
http://www.zoey.com

