[ovs-discuss] Crash in openvswitch 2.0.2

Joe Stringer joestringer at nicira.com
Fri Apr 3 20:47:10 UTC 2015


I guess James missed the email :-) In January he posted some builds for
further investigation:

https://www.mail-archive.com/discuss@openvswitch.org/msg12049.html

It sounds like it's the exact same code, but built with fewer compiler
optimizations, to see whether that fixes the issue. Trying those builds might
provide a bit more information.

On 31 March 2015 at 14:43, Marco Kuendig <marco at nuvula.ch> wrote:

> OK, I have tested it and I can reproduce it.
>
> To reproduce the core dump:
>
> I have a very small VM on KVM, and I start and shut down that VM every 30
> seconds. With that I get:
>
> Mar 31 16:10:13 nuv-vir-kvm-server-1 ovs-vswitchd:
> ovs|00010|daemon(monitor)|ERR|9 crashes: pid 10918 died, killed
> (Segmentation fault), core dumped, restarting
> Mar 31 16:29:20 nuv-vir-kvm-server-1 ovs-vswitchd:
> ovs|00011|daemon(monitor)|ERR|10 crashes: pid 22038 died, killed
> (Segmentation fault), core dumped, restarting
> Mar 31 23:33:04 nuv-vir-kvm-server-1 ovs-vswitchd:
> ovs|00012|daemon(monitor)|ERR|11 crashes: pid 25067 died, killed
> (Segmentation fault), core dumped, restarting
> Mar 31 23:34:43 nuv-vir-kvm-server-1 ovs-vswitchd:
> ovs|00013|daemon(monitor)|ERR|12 crashes: pid 30254 died, killed
> (Segmentation fault), core dumped, restarting
> Mar 31 23:38:49 nuv-vir-kvm-server-1 ovs-vswitchd:
> ovs|00014|daemon(monitor)|ERR|13 crashes: pid 31612 died, killed
> (Segmentation fault), core dumped, restarting
>
> The ovs-vswitchd does not restart every cycle but it cores quite often as
> you can see in the log above.
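
A minimal sketch of that reproduction loop, in case it helps others trigger
the crash. It assumes the guest is managed through libvirt's virsh; the guest
name test-vm, the function name cycle_vm and the VIRSH override are
hypothetical placeholders, not details from Marco's setup, and the 30-second
pause is only one reading of "every 30 seconds".

```shell
#!/bin/sh
# Sketch of the reproduction loop: start a small KVM guest, wait, shut it
# down, wait, repeat. The guest name and the VIRSH override are assumptions
# made for illustration only.

VIRSH="${VIRSH:-virsh}"      # point VIRSH at a stub for a dry run

# cycle_vm DOMAIN CYCLES INTERVAL_SECONDS
cycle_vm() {
    domain=$1
    cycles=$2
    interval=$3
    i=0
    while [ "$i" -lt "$cycles" ]; do
        "$VIRSH" start "$domain"
        sleep "$interval"
        "$VIRSH" shutdown "$domain"
        sleep "$interval"
        i=$((i + 1))
    done
}

# Example (hypothetical guest name): cycle_vm test-vm 100 30
```

Note that virsh shutdown only asks the guest to power off and returns
immediately, so a guest that is slow to stop may still be running when the
next start is issued.
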
>
>
>
> [image: Nuvula AG] <http://www.nuvula.ch/>
>
> Marco Kuendig / CEO / Founder
> marco at nuvula.ch / +41 78 751 99 71
>
> Marco's Google Hangout
> <https://plus.google.com/hangouts/_/nuvula.ch/marco>
>
> Nuvula AG - Hybrid Clouds
> Weierbachstrasse 7b 8193 Eglisau Switzerland
> http://www.nuvula.ch
>
> On 31 Mar 2015, at 23:15, Marco Kuendig <marco at nuvula.ch> wrote:
>
> I think I can reproduce that bug. Not 100% sure but today I think I had it
> several times.
>
> Interestingly, it happens when a VM on KVM boots. What is certainly strange
> is that several hosts crash simultaneously.
>
> My setup is a lab setup and can be accessed if it helps in troubleshooting.
>
>
> On 31 Mar 2015, at 23:12, Joe Stringer <joestringer at nicira.com> wrote:
>
> James, I believe you were involved last time this bug came up, I wonder if
> you ever got to the bottom of this?
>
> ---
>
> This looks the same as a bug reported in October:
>
> http://openvswitch.org/pipermail/discuss/2014-October/015429.html
>
> Ben's assessment was that there is no logical issue in the code, so
> perhaps there was weird code generation caused by GCC.
>
>
> On 31 March 2015 at 13:05, Marco Kuendig <marco at nuvula.ch> wrote:
>
>> Reading symbols from /usr/sbin/ovs-vswitchd...Reading symbols from
>> /usr/lib/debug//usr/sbin/ovs-vswitchd...done.
>> done.
>> [New LWP 32725]
>> [New LWP 32732]
>> [New LWP 32726]
>> [New LWP 32730]
>> [New LWP 32727]
>> [New LWP 32728]
>> [New LWP 32729]
>> [New LWP 32731]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>> Core was generated by `ovs-vswitchd unix:/var/run/openvswitch/db.sock
>> -vconsole:emer -vsyslog:err -vfi'.
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0  nl_attr_get_size (nla=nla@entry=0x0) at ../lib/netlink.c:506
>> 506 ../lib/netlink.c: No such file or directory.
>> (gdb) bt
>> #0  nl_attr_get_size (nla=nla@entry=0x0) at ../lib/netlink.c:506
>> #1  0x0000000000460473 in format_generic_odp_key (a=a@entry=0x0,
>>     ds=ds@entry=0x7fff0408f3b0) at ../lib/odp-util.c:767
>> #2  0x0000000000460cd2 in format_odp_key_attr (a=a@entry=0xc485a4,
>>     ma=ma@entry=0x0, ds=ds@entry=0x7fff0408f3b0, verbose=verbose@entry=true)
>>     at ../lib/odp-util.c:1332
>> #3  0x00000000004609d7 in odp_flow_format (key=<optimized out>,
>>     key_len=40, mask=0x0, mask_len=0, ds=0x7fff0408f3b0, verbose=true)
>>     at ../lib/odp-util.c:1402
>> #4  0x0000000000460fc4 in format_odp_key_attr (a=a@entry=0xc48580,
>>     ma=ma@entry=0x0, ds=ds@entry=0x7fff0408f3b0, verbose=verbose@entry=true)
>>     at ../lib/odp-util.c:987
>> #5  0x00000000004609d7 in odp_flow_format (key=key@entry=0xc48520,
>>     key_len=key_len@entry=140, mask=mask@entry=0x0, mask_len=mask_len@entry=0,
>>     ds=ds@entry=0x7fff0408f3b0, verbose=verbose@entry=true)
>>     at ../lib/odp-util.c:1402
>> #6  0x00000000004450f3 in log_flow_message (error=error@entry=2,
>>     operation=operation@entry=0x4d0e73 "flow_del", key=0xc48520,
>>     key_len=140, mask=mask@entry=0x0, mask_len=mask_len@entry=0, stats=0x0,
>>     actions=actions@entry=0x0, actions_len=actions_len@entry=0,
>>     dpif=<optimized out>) at ../lib/dpif.c:1354
>> #7  0x00000000004453c9 in log_flow_del_message (dpif=dpif@entry=0xc489c0,
>>     del=del@entry=0x7fff0408f460, error=error@entry=2) at ../lib/dpif.c:1397
>> #8  0x0000000000445433 in log_flow_del_message (error=2,
>>     del=0x7fff0408f460, dpif=0xc489c0) at ../lib/dpif.c:1396
>> #9  dpif_flow_del__ (dpif=0xc489c0, del=del@entry=0x7fff0408f460)
>>     at ../lib/dpif.c:945
>> #10 0x00000000004455ca in dpif_flow_del (dpif=<optimized out>,
>>     key=<optimized out>, key_len=<optimized out>,
>>     stats=stats@entry=0x7fff0408f490) at ../lib/dpif.c:965
>> #11 0x000000000041b423 in subfacet_uninstall (subfacet=0xbd76a0)
>>     at ../ofproto/ofproto-dpif.c:4686
>> #12 0x0000000000420f18 in facet_remove (facet=facet@entry=0xbd72a0)
>>     at ../ofproto/ofproto-dpif.c:4014
>> #13 0x0000000000422f52 in facet_revalidate (facet=facet@entry=0xbd72a0)
>>     at ../ofproto/ofproto-dpif.c:4321
>> #14 0x0000000000424b5a in facet_lookup_valid (flow=0x7f3e700020a8,
>>     ofproto=0xc52600) at ../ofproto/ofproto-dpif.c:4203
>> #15 handle_flow_miss (n_ops=<synthetic pointer>, ops=0x7fff0408fb60,
>>     miss=0x7f3e70002090) at ../ofproto/ofproto-dpif.c:3339
>> #16 handle_flow_misses (fmb=fmb@entry=0x7f3e700008e0,
>>     backer=<optimized out>) at ../ofproto/ofproto-dpif.c:3410
>> #17 0x0000000000425196 in handle_upcalls (backer=<optimized out>)
>>     at ../ofproto/ofproto-dpif.c:3565
>> #18 dpif_backer_run_fast (backer=<optimized out>)
>>     at ../ofproto/ofproto-dpif.c:1007
>> #19 type_run_fast (type=<optimized out>) at ../ofproto/ofproto-dpif.c:1024
>> #20 0x00000000004122cf in ofproto_type_run_fast (datapath_type=<optimized out>,
>>     datapath_type@entry=0xc4ef20 "system") at ../ofproto/ofproto.c:1326
>> #21 0x00000000004081a5 in bridge_run_fast () at ../vswitchd/bridge.c:2318
>> #22 0x00000000004059c5 in main (argc=<optimized out>, argv=<optimized out>)
>>     at ../vswitchd/ovs-vswitchd.c:119
>> (gdb)
>>
>>
>> On 31 Mar 2015, at 22:04, Joe Stringer <joestringer at nicira.com> wrote:
>>
>> Great, we're moving. Looks like the gdb version of this is working below.
>> Do you get the gdb prompt from there? The command 'bt' should provide the
>> backtrace we're after.
>>
>> On 31 March 2015 at 12:52, Marco Kuendig <marco at nuvula.ch> wrote:
>>
>>> that brought us a step forward. thanks Sab.
>>>
>>> Important to know is:
>>>
>>> I got 4 kvm servers, meshed with openvswitch. I use vxlan for tunnelling.
>>>
>>> Sometimes when I restart a domain in kvm, 3 or 4 hosts crash at the same
>>> time.
>>>
>>> I have STP enabled to avoid loops.
>>>
>>>
>>> this is the output now:
>>>
>>> root@nuv-vir-kvm-server-1 ~ # gdb /usr/sbin/ovs-vswitchd
>>> /var/crash/ovs/CoreDump
>>> GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
>>> Copyright (C) 2014 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later <
>>> http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law.  Type "show
>>> copying"
>>> and "show warranty" for details.
>>> This GDB was configured as "x86_64-linux-gnu".
>>> Type "show configuration" for configuration details.
>>> For bug reporting instructions, please see:
>>> <http://www.gnu.org/software/gdb/bugs/>.
>>> Find the GDB manual and other documentation resources online at:
>>> <http://www.gnu.org/software/gdb/documentation/>.
>>> For help, type "help".
>>> Type "apropos word" to search for commands related to "word"...
>>> Reading symbols from /usr/sbin/ovs-vswitchd...Reading symbols from
>>> /usr/lib/debug//usr/sbin/ovs-vswitchd...done.
>>> done.
>>> [New LWP 32725]
>>> [New LWP 32732]
>>> [New LWP 32726]
>>> [New LWP 32730]
>>> [New LWP 32727]
>>> [New LWP 32728]
>>> [New LWP 32729]
>>> [New LWP 32731]
>>> [Thread debugging using libthread_db enabled]
>>> Using host libthread_db library
>>> "/lib/x86_64-linux-gnu/libthread_db.so.1".
>>> Core was generated by `ovs-vswitchd unix:/var/run/openvswitch/db.sock
>>> -vconsole:emer -vsyslog:err -vfi'.
>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>> #0  nl_attr_get_size (nla=nla@entry=0x0) at ../lib/netlink.c:506
>>> 506 ../lib/netlink.c: No such file or directory.
>>>
>>>
>>> root@nuv-vir-kvm-server-1 ~ # crash /usr/sbin/ovs-vswitchd
>>> /var/crash/ovs/CoreDump
>>>
>>> crash 7.0.3
>>> Copyright (C) 2002-2013  Red Hat, Inc.
>>> Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
>>> Copyright (C) 1999-2006  Hewlett-Packard Co
>>> Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
>>> Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
>>> Copyright (C) 2005, 2011  NEC Corporation
>>> Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
>>> Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
>>> This program is free software, covered by the GNU General Public License,
>>> and you are welcome to change it and/or distribute copies of it under
>>> certain conditions.  Enter "help copying" to see the conditions.
>>> This program has absolutely no warranty.  Enter "help warranty" for
>>> details.
>>>
>>>
>>> crash: /usr/sbin/ovs-vswitchd: no debugging data available
>>>
>>> root@nuv-vir-kvm-server-1 ~ # ll /var/crash/ovs/
>>> Architecture  Date            ExecutableTimestamp  ProcCwd      ProcStatus  UserGroups
>>> CoreDump      DistroRelease   ProblemType          ProcEnviron  Signal
>>> CrashCounter  ExecutablePath  ProcCmdline          ProcMaps     Uname
>>>
>>>
>>> On 31 Mar 2015, at 21:45, Sabyasachi Sengupta <
>>> Sabyasachi.Sengupta at alcatel-lucent.com> wrote:
>>>
>>>
>>> Typically Ubuntu does not unpack the crashes. Can you try apport-unpack?
>>> # apport-unpack /var/crash/<name> <crash-dir>
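
The two steps that eventually produced the backtrace in this thread
(apport-unpack, then gdb on the resulting CoreDump file) can be sketched as a
small shell function. This is only an illustration: the function name
crash_backtrace and the APPORT_UNPACK/GDB overrides are hypothetical, and the
paths in the example comment are the ones quoted elsewhere in this thread.

```shell
#!/bin/sh
# Sketch: unpack an apport crash report and print a backtrace from the core.
# APPORT_UNPACK and GDB can be pointed at stubs for a dry run; that override
# is an addition for illustration, not part of either tool.

APPORT_UNPACK="${APPORT_UNPACK:-apport-unpack}"
GDB="${GDB:-gdb}"

# crash_backtrace REPORT OUTDIR BINARY
crash_backtrace() {
    report=$1
    outdir=$2
    binary=$3
    # apport-unpack splits the report into one file per field; the core
    # itself ends up in OUTDIR/CoreDump.
    "$APPORT_UNPACK" "$report" "$outdir" || return 1
    # -batch makes gdb exit once the commands given with -ex have run.
    "$GDB" -batch -ex bt "$binary" "$outdir/CoreDump"
}

# Example, using the paths from this thread:
# crash_backtrace /var/crash/_usr_sbin_ovs-vswitchd.0.crash \
#     /var/crash/ovs /usr/sbin/ovs-vswitchd
```

Installing the debug symbols for openvswitch first, as Marco did, is what
lets gdb resolve function names and source lines in the output.
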
>>>
>>> On Tue, 31 Mar 2015, Marco Kuendig wrote:
>>>
>>> thanks Joe and Ben
>>> have done:
>>> 1. installed debug symbols for the kernel... doesn't help
>>> 2. installed debug symbols for openvswitch
>>> no change, gdb and crash still don't work for me. I'm not a dev, need
>>> more help to get that backtrace done.
>>> here some output:
>>> root@nuv-vir-kvm-server-1 ~ # crash
>>>  /usr/lib/debug/boot/vmlinux-3.13.0-48-generic
>>> /var/crash/_usr_sbin_ovs-vswitchd.0.crash
>>> crash 7.0.3
>>> Copyright (C) 2002-2013  Red Hat, Inc.
>>> Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
>>> Copyright (C) 1999-2006  Hewlett-Packard Co
>>> Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
>>> Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
>>> Copyright (C) 2005, 2011  NEC Corporation
>>> Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
>>> Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
>>> This program is free software, covered by the GNU General Public License,
>>> and you are welcome to change it and/or distribute copies of it under
>>> certain conditions.  Enter "help copying" to see the conditions.
>>> This program has absolutely no warranty.  Enter "help warranty" for
>>> details.
>>> crash: /var/crash/_usr_sbin_ovs-vswitchd.0.crash: not a supported file
>>> format
>>> Usage:
>>>
>>>   crash [OPTION]... NAMELIST MEMORY-IMAGE  (dumpfile form)
>>>   crash [OPTION]... [NAMELIST]             (live system form)
>>> Enter "crash -h" for details.
>>> root@nuv-vir-kvm-server-1 ~ # gdb /usr/sbin/ovs-vswitchd
>>> /var/crash/_usr_sbin_ovs-vswitchd.0.crash
>>> GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
>>> Copyright (C) 2014 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later
>>> <http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law.  Type "show
>>> copying"
>>> and "show warranty" for details.
>>> This GDB was configured as "x86_64-linux-gnu".
>>> Type "show configuration" for configuration details.
>>> For bug reporting instructions, please see:
>>> <http://www.gnu.org/software/gdb/bugs/>.
>>> Find the GDB manual and other documentation resources online at:
>>> <http://www.gnu.org/software/gdb/documentation/>.
>>> For help, type "help".
>>> Type "apropos word" to search for commands related to "word"...
>>> Reading symbols from /usr/sbin/ovs-vswitchd...Reading symbols from
>>> /usr/lib/debug//usr/sbin/ovs-vswitchd...done.
>>> done.
>>> "/var/crash/_usr_sbin_ovs-vswitchd.0.crash" is not a core dump: File
>>> format not recognized
>>> (gdb) q
>>> root@nuv-vir-kvm-server-1 ~ #
>>>
>>>      On 31 Mar 2015, at 19:00, Joe Stringer
>>>      <joestringer at nicira.com> wrote:
>>> For the 'File format not recognized' problem, you might have better
>>> luck with the 'crash' utility.
>>> $ crash <binary> <crashdump>
>>> On 31 March 2015 at 08:16, Marco Kuendig <marco at nuvula.ch> wrote:
>>>      Have tried this:
>>> http://openvswitch.org/pipermail/discuss/2015-February/016582.html
>>> this is the output, so doesn't seem to be correct:
>>> root@nuv-vir-kvm-server-2 ~ # gdb /usr/sbin/ovs-vswitchd
>>> /var/crash/_usr_sbin_ovs-vswitchd.0.crash
>>> GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
>>> Copyright (C) 2014 Free Software Foundation, Inc.
>>> License GPLv3+: GNU GPL version 3 or later
>>> <http://gnu.org/licenses/gpl.html>
>>> This is free software: you are free to change and
>>> redistribute it.
>>> There is NO WARRANTY, to the extent permitted by law.  Type
>>> "show copying"
>>> and "show warranty" for details.
>>> This GDB was configured as "x86_64-linux-gnu".
>>> Type "show configuration" for configuration details.
>>> For bug reporting instructions, please see:
>>> <http://www.gnu.org/software/gdb/bugs/>.
>>> Find the GDB manual and other documentation resources online
>>> at:
>>> <http://www.gnu.org/software/gdb/documentation/>.
>>> For help, type "help".
>>> Type "apropos word" to search for commands related to
>>> "word"...
>>> Reading symbols from /usr/sbin/ovs-vswitchd...(no debugging
>>> symbols found)...done.
>>> "/var/crash/_usr_sbin_ovs-vswitchd.0.crash" is not a core
>>> dump: File format not recognized
>>> (gdb) bt
>>> No stack.
>>> (gdb) quit
>>> any more hints please ?
>>> thanks
>>> marco
>>>
>>>      On 31 Mar 2015, at 17:00, Ben Pfaff
>>>      <blp at nicira.com> wrote:
>>> Can you get a backtrace for these?
>>> On Tue, Mar 31, 2015 at 7:09 AM, Marco Kuendig
>>> <marco at nuvula.ch> wrote:
>>>      Folks,
>>> any chance of having somebody look at these crash
>>> files ?
>>> I have several servers that are losing network
>>> connectivity because of this.
>>> Downloads:
>>>
>>> https://drive.google.com/file/d/0Bx_w1Tf2B5VSRU9yUmRpTDJLVEU/view?usp=sharing
>>> Thanks for any hint or fix
>>> marco
>>> _______________________________________________
>>> discuss mailing list
>>> discuss at openvswitch.org
>>> http://openvswitch.org/mailman/listinfo/discuss
>>> --
>>> "I don't normally do acked-by's.  I think it's my way
>>> of avoiding
>>> getting blamed when it all blows up." Andrew Morton
>>>
>>>
>>>
>>
>>
>
>
>