[ovs-discuss] systemd ovs-vswitchd starts too early

Flavio Leitner fbl at redhat.com
Mon Feb 1 18:56:31 UTC 2016


On Mon, 1 Feb 2016 07:54:46 -0800
Guru Shetty <guru at ovn.org> wrote:

> On 31 January 2016 at 14:47, Mark Mielke <mark.mielke at gmail.com>
> wrote:
> 
> > I joined this list recently, and encountered something very similar
> > to this user:
> >
> > On 8 January 2016 at 04:52, Benoît <benoitne at gmail.com> wrote:
> > > I have an issue where ovs-vswitchd is starting too early.
> > > I got a persistent name for an interface (pnic_wwan) but it is
> > > happening after ovs-vswitchd starts so it makes an error as it
> > > doesn't find the interface name!
> > >
> > >     Bridge vswitch_wwan
> > >         Port pnic_wwan
> > >             Interface pnic_wwan
> > >                 error: "could not open network device pnic_wwan (No such device)"
> >
> >
> >  
> I think you are describing multiple issues and I will try to pick
> only the first one to make it easy to carry on the discussion.
> 
> 
> > I am testing with Fedora 23. It seems that with openvswitch.service
> > enabled, openvswitch-nonetwork.service starts too early, before any
> > of the physical network interfaces have been detected.
> >  
> 
> I haven't kept myself up to date with recent changes in Fedora startup
> (so cc'ing the Fedora maintainer).
> When you say openvswitch starts before any physical network
> interfaces are detected, which of the following do you mean to say?
> 1. openvswitch starts even before kernel detects the interface (maybe
> via a kernel module)?
> 2. openvswitch starts before fedora renames and configures the
> physical interface (via udev or something else)?
> 
> If 1. is true, that is a big problem. There has always been an
> implicit assumption that openvswitch starts after physical network
> interfaces are detected.
> 
> If 2. is true, it is a little perplexing to know that openvswitch can
> start before udev has worked on the interfaces.

Both situations are possible, but in the most common case, where people
configure OVS bridges and ports with ifcfg- files, the service is
started as soon as the first OVS ifcfg file is used to bring up a
network device.


> > During a "clean" shutdown process, and if the OVS bridge is
> > configured using /etc/sysconfig/network/* with TYPE=OVSBridge, the
> > bridge is normally removed on shutdown, which leaves the system in
> > an acceptable state as when openvswitch-nonetwork.service starts
> > early, there is no bridge in existence, so there is no problem.

That is correct.

> > However, if shutdown is unclean for any reason - if ifdown-ovs was
> > not executed properly for any reason - then the system comes up
> > with the physical network interface ports already pre-associated
> > with the bridge, and because the bridge is started before
> > networking exists, it leads to "could not open network device
> > ens2f0 (No such device)" (in my case, the persistence naming is the
> > default as selected by udev configuration). 

Yup, known issue fixed by the commit below:

> Where do you see the above error? In ovs-vswitchd.log? If so, I think
> it is okay to ignore as long as port is re-added later. See:
> https://github.com/openvswitch/ovs/commit/24496b4ac2dda14f99fc64e7f68c19b7af27a4c1


> > This error persists, in that the physical ports are unusable in this
> > state. Now, in some cases, the ifup-ovs will delete and re-add the
> > port, so other than errors during startup, the bridge becomes
> > healthy when the port is re-added. In the fail cases, "ovs-vsctl
> > show" will show the physical interfaces with the "No such device"
> > error, even though the interfaces clearly do exist by this point.

See this commit:
https://github.com/openvswitch/ovs/commit/e21c6643a02c6b446d2fbdfde366ea303b4c2730
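
If you want to confirm the kernel side independently of what
"ovs-vsctl show" reports, one option is to look for the device under
/sys/class/net. A minimal sketch in Python; the helper names, the
timeout and the polling interval are mine for illustration, not
anything OVS or the initscripts ship:

```python
import os
import time


def netdev_exists(name, sysfs_root="/sys/class/net"):
    """Return True if the kernel currently exposes a device called `name`."""
    return os.path.exists(os.path.join(sysfs_root, name))


def wait_for_netdev(name, timeout=10.0, interval=0.2,
                    sysfs_root="/sys/class/net"):
    """Poll until `name` shows up or `timeout` seconds pass.

    Returns True if the device appeared, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if netdev_exists(name, sysfs_root):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, wait_for_netdev("ens2f0") returning True while OVS still
reports "No such device" would point at OVS not having retried the
port rather than at the device being absent.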


> > In my case, I am trying to use TYPE=OVSBond. I have dual 10 GbE and
> > I wanted to use an OVS bridge instead of a Linux bridge for my host
> > networking, with several VLAN configured as TYPE=OVSIntPort on the
> > bridge. If I configured the physical interfaces as TYPE=OVSPort,
> > and I have TYPE=OVSBond list them with BOND_IFACES, then I get a
> > different problem at startup...  Where the TYPE=OVSPort
> > initialization tries to re-add the port with:
> >
> > ovs-vsctl -t 10 -- --if-exists del-port ens2f0 -- add-port ens2f0
> >
> > But this fails with "cannot create a port named ens2f0 because an
> > interface named ens2f0 already exists on bridge br-ext". In this
> > case, the port is part of the bond, not directly part of the
> > bridge, and the re-add code isn't able to work around this problem.

What's the OVS version?
Are you using openvswitch rpm package or compiled yourself?
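
For reference, the delete-and-re-add can be written as one ovs-vsctl
transaction. Note that add-port takes the bridge as its first
argument, so the command as quoted above looks like it lost the bridge
name somewhere. A rough sketch in Python that only builds the argument
list; the helper name is made up, and br-ext is simply the bridge from
your error message:

```python
def readd_port_argv(bridge, port, timeout=10):
    """Build a single ovs-vsctl transaction that deletes `port` from
    `bridge` if it is there and then adds it back."""
    return [
        "ovs-vsctl", "-t", str(timeout),
        "--", "--if-exists", "del-port", bridge, port,
        "--", "add-port", bridge, port,
    ]


if __name__ == "__main__":
    # Print the command instead of running it.
    print(" ".join(readd_port_argv("br-ext", "ens2f0")))
```

This only illustrates the shape of the command. It does not help when
the interface is a member of a bond, which is the case you describe.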


> > During further investigation, I found that after the system is up
> > (and particularly after network.service has been run), I could
> > "systemctl restart openvswitch" and "ovs-vsctl show" would no
> > longer list "No such device" for the physical interface ports.
> >
> > After trying to understand and dis-entangle all the cause and
> > effect, I finally realized that ifup-ovs will start OVS on demand,
> > after the physical interfaces have been detected and assigned names
> > (including possible renames ... eth0 => ens2f0, ...), and that I
> > could avoid starting OVS too early, simply by *not* enabling the
> > openvswitch.service.

Yes, openvswitch and openvswitch-nonetwork are started on demand
when the 'network' script notices an OVS port/bridge.

But simply enabling openvswitch shouldn't cause any issues with
sources that contain the above patches.


> > This is now working... By *not* enabling openvswitch.service, and
> > letting ifup-ovs start up openvswitch on demand, the system is
> > coming up reliably whether clean shutdown or force reset (I want
> > the server to be crash-safe, so I explicitly test this case)....
> > But, I'm now concerned about the direction of Fedora and
> > openvswitch-nonetwork.service, and I am wondering if my work-around
> > of not enabling openvswitch.service makes sense, and is part of the
> > design of ifup-ovs that will be supported going forwards, or is
> > just lucky that it works, and this could break with a future
> > openvswitch update, or a future version of Fedora?

It has been designed to start the service on demand since the
migration to systemd, and that can't change anymore.

You are supposed to be able to enable the 'openvswitch' service, and
that should make no difference in your setup, since the interfaces are
brought up by 'network.target', which 'openvswitch' runs after.  By
that time, all interfaces are up, including the OVS ones which started
the service on demand. In summary, enabling openvswitch should be a
no-op.
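
One way to check that ordering on a given install is to ask systemd
which units openvswitch.service is ordered after. A minimal sketch in
Python, assuming "systemctl show -p After <unit>" prints a single
"After=..." line with space-separated unit names; the helper names are
mine:

```python
import subprocess


def parse_after(show_output):
    """Extract the unit names from an 'After=...' line."""
    for line in show_output.splitlines():
        if line.startswith("After="):
            return line[len("After="):].split()
    return []


def runs_after(unit, dependency):
    """Return True if systemd orders `unit` after `dependency`."""
    out = subprocess.run(
        ["systemctl", "show", "-p", "After", unit],
        capture_output=True, text=True, check=True,
    ).stdout
    return dependency in parse_after(out)
```

On a system where the ordering is in place,
runs_after("openvswitch.service", "network.target") should return
True.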


--
fbl


> >
> > I think the openvswitch-nonetwork.service starting early, and
> > presuming that physical interfaces can actually be used that early,
> > is a defect in openvswitch. I think the intent is to make OVS
> > bridges and internal ports available for use with the rest of the
> > networking support, but this only currently works properly for
> > virtual bridges that are not connected to physical interfaces. By
> > "works properly", I mean that it comes up clean whether shutdown
> > was "clean" or "dirty", and doesn't have errors about "No such
> > device", and does not need the port to be re-added to clear this
> > error state.
> >
> > Without any real understanding of the complexity here, I am
> > thinking that when OpenVSwitch starts early, before the physical
> > network interfaces exist according to the kernel, OpenVSwitch
> > should delay initialization of those ports or bonds until the
> > physical network interfaces actually do exist. The "No such device"
> > issue should automatically clear as soon as the device actually
> > does come into existence. In my case, I would like the
> > "bond0" (TYPE=OVSBond) to be re-initialized as soon as one or both
> > of "ens2f0" (TYPE=OVSPort) or "ens2f1" (TYPE=OVSPort) become real,
> > similar to what would happen when the link state for the real
> > interfaces goes up or down. I think this should also apply to
> > regular ports on the bridge. There should be no need for ifup-ovs
> > to re-create the port if it already exists, and just needs to be
> > properly initialized *after* the physical interface comes into
> > existence in the kernel. Is this something that is already
> > understood, or already being worked on? I found very little
> > information on this with Google searching, which is how I stumbled
> > upon this original thread...
> >
> > Other work-arounds that I tried that may be of interest to people to
> > understand exactly how it fails, and how it behaves:
> >
> > 1) I tried to use regular TYPE=Ethernet (instead of TYPE=OVSPort)
> > network interfaces, and "ifup" the physical interfaces as a "Pre"
> > command to the openvswitch-nonetwork.service. This gave a warning
> > about "Delaying initialization" from "ifup". I believe it *did* fix
> > the problem, but only because the "ifup" failed, so the
> > openvswitch-nonetwork.service startup was aborted early, and it
> > happened later due to ifup-ovs. As even "/bin/false" would have had
> > the same effect here, I considered this an invalid work-around and
> > this helped lead me to the conclusion of disabling
> > openvswitch.service altogether as the more sensible work-around.
> >
> > 2) I tried to "modprobe ixgbe" (the network driver for the Intel
> > cards I have) as a "Pre" command to the
> > openvswitch-nonetwork.service. This had similar behaviour to the
> > "ifup" above. Also not a very good solution.
> >
> > --
> > Mark Mielke <mark.mielke at gmail.com>
> >
> >
> > _______________________________________________
> > discuss mailing list
> > discuss at openvswitch.org
> > http://openvswitch.org/mailman/listinfo/discuss
> >
> >  


