[ovs-discuss] Restarting network kills ovs-vswitchd (and network)... ?

Flavio Leitner fbl at sysclose.org
Fri May 17 10:13:12 UTC 2019


On Fri, May 17, 2019 at 09:45:36AM +0000, SCHAER Frederic wrote:
> Hi
> 
> Thank you for your answer.
> I actually forgot to say that I had already checked the syslogs and the OVS and network journals/logs: there is no core dump reference anywhere.
> 
> To me, a core dump or a crash would not return an exit code of 0, which seems to be what systemd saw :/
> I even ran strace -f on the ovs-vswitchd process and made it stop/crash with an ifdown/ifup, but it looks to me like this is an exit ...
> 
> (I can retry and save the strace output if necessary or useful)
> The end of the strace output was (I see "brflat" in the long strings, which is the bridge hosting em1):

Is this an strace of ovs-vswitchd or of ovs-vsctl? I ask because SIGABRT
happens when ovs-vsctl is stuck and the alarm fires. In that case the
trace would only indicate that ovs-vswitchd is not running.

If ovs-vswitchd is not crashing, something is stopping the service,
and maybe running sh -x /sbin/ifdown <iface> would help shed some light?

Or add 'set -x' to /etc/sysconfig/network-scripts/if*-ovs scripts.
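
A minimal sketch of the tracing suggested above, assuming a POSIX shell.
The helper name trace_script and the log path are hypothetical, chosen
only for illustration. It runs a script under sh -x, saves the trace,
and records the exit status, so the command that takes the service down
can be searched for afterwards.

```shell
#!/bin/sh
# Sketch only: trace_script is a hypothetical helper, not part of OVS or
# the RHEL network scripts.
# Usage: trace_script <logfile> <script> [args...]
trace_script() {
    log="$1"; shift
    # sh -x prints each command to stderr before executing it;
    # stdout and stderr are both captured in the log file.
    sh -x "$@" >"$log" 2>&1
    status=$?
    echo "exit status: $status" >>"$log"
    return "$status"
}

# Example, as root on the affected host (brflat is the bridge from this thread):
#   trace_script /tmp/ifdown-brflat.trace /sbin/ifdown brflat
#   grep -n -E 'ovs-vsctl|systemctl|service' /tmp/ifdown-brflat.trace
```

Note that -x is normally not inherited by scripts that the traced script
runs as separate processes, which may be why adding 'set -x' to the
if*-ovs scripts directly is also worth trying.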

fbl

> 
> [pid 175068] sendmsg(18, {msg_name(0)=NULL, msg_iov(1)=[{",\0\0\0\22\0\1\0\223\6\0\0!\353\377\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\v\0\3\0brflat\0\0", 44}], msg_controllen=0, msg_flags=0}, 0 <unfinished ...>
> [pid 175233] <... futex resumed> )      = 0
> [pid 175068] <... sendmsg resumed> )    = 44
> [pid 175068] recvmsg(18,  <unfinished ...>
> [pid 175234] futex(0x55b8aaa19128, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> ...skipping...
> [pid 175233] futex(0x7f7f226b9140, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 175068] <... sendmsg resumed> )    = 44
> [pid 175234] <... futex resumed> )      = 0
> [pid 175233] <... futex resumed> )      = -1 EAGAIN (Resource temporarily unavailable)
> [pid 175068] recvmsg(18,  <unfinished ...>
> [pid 175234] futex(0x7f7f226b9140, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 175233] futex(0x7f7f226b9140, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> [pid 175068] <... recvmsg resumed> {msg_name(0)=NULL, msg_iov(2)=[{"\360\4\0\0\20\0\0\0\224\6\0\0!\353\377\377\0\0\1\0\36\0\0\0C\20\1\0\0\0\0\0\v\0\3\0brflat\0\0\10\0\r\0\350\3\0\0\5\0\20\0\0\0\0\0\5\0\21\0\0\0\0\0\10\0\4\0\334\5\0\0\10\0\33\0\0\0\0\0\10\0\36\0\1\0\0\0\10\0\37\0\1\0\0\0\10\0(\0\377\377\0\0\10\0)\0\0\0\1\0\10\0 \0\1\0\0\0\5\0!\0\1\0\0\0\f\0\6\0noqueue\0\10\0#\0\0\0\0\0\5\0'\0\0\0\0\0$\0\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0H\234\377\377\n\0\1\0\276J\307\307\207I\0\0\n\0\2\0\377\377\377\377\377\377\0\0\304\0\27\0Y\22\5\0\0\0\0\0Uf\0\0\0\0\0\0^0j\1\0\0\0\0\372\371k\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0d\0\7\0Y\22\5\0Uf\0\0^0j\1\372\371k\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0y0\2\0\0\0\0\0\256\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\214\254\235\0\0\0\0\0 z\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\211\223\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0004\0\6\0\6\0\0\0\0\0\0\0r\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0A\7\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\24\0\7\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\10\0\0\0\0\0", 65536}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT) = 1264
> [pid 175234] <... futex resumed> )      = -1 EAGAIN (Resource temporarily unavailable)
> [pid 175233] <... futex resumed> )      = 0
> [pid 175234] futex(0x7f7f226b9140, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> [pid 175068] rt_sigprocmask(SIG_UNBLOCK, [ABRT],  <unfinished ...>
> [pid 175234] <... futex resumed> )      = 0
> [pid 175068] <... rt_sigprocmask resumed> NULL, 8) = 0
> [pid 175068] tgkill(175068, 175068, SIGABRT <unfinished ...>
> [pid 175233] futex(0x7f7f226b9140, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> [pid 175068] <... tgkill resumed> )     = 0
> [pid 175233] <... futex resumed> )      = 0
> [pid 175068] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=175068, si_uid=393} ---
> [pid 189862] +++ killed by SIGABRT +++
> [pid 175237] +++ killed by SIGABRT +++
> [pid 175236] +++ killed by SIGABRT +++
> [pid 175235] +++ killed by SIGABRT +++
> [pid 175234] +++ killed by SIGABRT +++
> [pid 175233] +++ killed by SIGABRT +++
> [pid 175232] +++ killed by SIGABRT +++
> [pid 175231] +++ killed by SIGABRT +++
> [pid 175230] +++ killed by SIGABRT +++
> [pid 175229] +++ killed by SIGABRT +++
> [pid 175228] +++ killed by SIGABRT +++
> [pid 175227] +++ killed by SIGABRT +++
> [pid 175226] +++ killed by SIGABRT +++
> [pid 175225] +++ killed by SIGABRT +++
> [pid 175224] +++ killed by SIGABRT +++
> [pid 175223] +++ killed by SIGABRT +++
> [pid 175222] +++ killed by SIGABRT +++
> [pid 175085] +++ killed by SIGABRT +++
> +++ killed by SIGABRT +++
> 
> 
> Regards
> 
> > -----Original Message-----
> > From: Flavio Leitner <fbl at sysclose.org>
> > Sent: Friday, May 17, 2019 10:29
> > To: SCHAER Frederic <frederic.schaer at cea.fr>
> > Cc: bugs at openvswitch.org
> > Subject: Re: [ovs-discuss] Restarting network kills ovs-vswitchd (and network)... ?
> > 
> > On Thu, May 16, 2019 at 09:34:28AM +0000, SCHAER Frederic wrote:
> > > Hi,
> > > I'm facing an issue with openvswitch, which I think is new (I'm not even sure).
> > > Here is the description:
> > >
> > > * What you did that make the problem appear.
> > >
> > > I am configuring openstack (compute, network) nodes using OVS networks for main interfaces and RHEL network scripts, basically using openvswitch to create bridges, set the bridges' IPs, and include the real Ethernet devices in the bridges.
> > > On a compute machine (not in production, so not using 3 or more interfaces), I have for instance brflat -> em1.
> > > Brflat has multiple IPs defined using IPADDR1, IPADDR2, etc.
> > > Now: at boot, the machine has network. But if I ever change anything in the network scripts and issue either a network restart, an ifup or an ifdown, the network breaks and connectivity is lost.
> > >
> > > Also, on network restarts, I'm getting these logs in the network journal:
> > > May 16 10:26:41 cloud1 ovs-vsctl[1766678]: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --may-exist add-br brflat
> > > May 16 10:26:51 cloud1 ovs-vsctl[1766678]: ovs|00002|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
> > > May 16 10:26:51 cloud1 network[1766482]: Bringing up interface brflat: 2019-05-16T08:26:51Z|00002|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
> > >
> > > * What you expected to happen.
> > >
> > > On network restart, to get a working network back, and not to be forced to log in through the IPMI console and fix the network manually.
> > >
> > > * What actually happened.
> > >
> > > What actually happens is that on ifup/ifdown/network restart, the ovs-vswitchd daemon stops working. According to systemctl, it is actually exiting with code 0.
> > > If I do an ifdown on one interface, then ovs-vswitchd is down.
> > > After an ovs-vswitchd restart, I can then ifup that interface: the network is still down (no ping, nothing).
> > > ovs-vswitchd is again dead/stopped/exited 0.
> > > Then: manually starting ovs-vswitchd restores connectivity.
> > >
> > > Please also include the following information:
> > > * The Open vSwitch version number (as output by ovs-vswitchd --version).
> > > ovs-vswitchd (Open vSwitch) 2.10.1
> > 
> > Sounds like OVS is crashing. Please check 'dmesg' for segmentation
> > fault messages, or check the journal logs or the systemd service
> > status.
> > 
> > If it is crashing, the next step is to enable core dumps and capture
> > one. Then install the openvswitch-debuginfo package to see the stack trace.
> > 
> > You're right that ifdown should not put the service down.
> > 
> > fbl
> _______________________________________________
> discuss mailing list
> discuss at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

