[ovs-discuss] systemd ovs-vswitchd starts too early

Guru Shetty guru at ovn.org
Mon Feb 1 15:54:46 UTC 2016


On 31 January 2016 at 14:47, Mark Mielke <mark.mielke at gmail.com> wrote:

> I joined this list recently, and encountered something very similar to
> this user:
>
> On 8 January 2016 at 04:52, Benoît <benoitne at gmail.com> wrote:
> > I have an issue where ovs-vswitchd is starting too early.
> > I got a persistent name for an interface (pnic_wwan) but it is happening
> > after ovs-vswitchd starts so it makes an error as it doesn't find the
> > interface name!
> >
> >     Bridge vswitch_wwan
> >         Port pnic_wwan
> >             Interface pnic_wwan
> >                 error: "could not open network device pnic_wwan (No such device)"
>
>
>
I think you are describing multiple issues and I will try to pick only the
first one to make it easy to carry on the discussion.


> I am testing with Fedora 23. It seems that with openvswitch.service
> enabled, openvswitch-nonetwork.service starts too early, before any of the
> physical network interfaces have been detected.
>

I haven't kept up to date with recent changes in Fedora startup (so I am
CCing the Fedora maintainer).
When you say openvswitch starts before any physical network interfaces are
detected, which of the following do you mean?
1. openvswitch starts before the kernel even detects the interface (maybe
via a kernel module)?
2. openvswitch starts before Fedora renames and configures the physical
interface (via udev or something else)?

If 1. is true, that is a big problem. There has always been an implicit
assumption that openvswitch starts after physical network interfaces are
detected.

If 2. is true, it is a little perplexing to know that openvswitch can start
before udev has worked on the interfaces.
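
One way to tell the two cases apart might be to record which interface names
the kernel exposes right before ovs-vswitchd starts. This is only a rough
sketch: it assumes sysfs is mounted at /sys, and the function names and the
optional directory argument are made up for illustration; they are not part
of any OVS or Fedora script.

```shell
#!/bin/sh
# Succeed if the kernel currently exposes a network device with this name.
# $1 = interface name
# $2 = sysfs net directory (hypothetical override, defaults to /sys/class/net)
iface_present() {
    [ -d "${2:-/sys/class/net}/$1" ]
}

# Print every interface name the kernel currently knows about, one per line.
# $1 = sysfs net directory (hypothetical override, defaults to /sys/class/net)
list_ifaces() {
    for path in "${1:-/sys/class/net}"/*; do
        if [ -d "$path" ]; then
            basename "$path"
        fi
    done
}
```

If ens2f0 is missing but a kernel-assigned name such as eth0 shows up in the
list, that would point to case 2. If no physical interface is listed at all,
that would point to case 1.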


>
> During a "clean" shutdown process, and if the OVS bridge is configured
> using /etc/sysconfig/network/* with TYPE=OVSBridge, the bridge is normally
> removed on shutdown, which leaves the system in an acceptable state as when
> openvswitch-nonetwork.service starts early, there is no bridge in
> existence, so there is no problem.
>
> However, if shutdown is unclean for any reason - if ifdown-ovs was not
> executed properly for any reason - then the system comes up with the
> physical network interface ports already pre-associated with the bridge,
> and because the bridge is started before networking exists, it leads to
> "could not open network device ens2f0 (No such device)" (in my case, the
> persistence naming is the default as selected by udev configuration).
>

Where do you see the above error? In ovs-vswitchd.log? If so, I think it is
okay to ignore as long as the port is re-added later. See:
https://github.com/openvswitch/ovs/commit/24496b4ac2dda14f99fc64e7f68c19b7af27a4c1
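
To check whether the error is still there once networking is fully up, one
could filter the "ovs-vsctl show" output for interfaces that report it. A
small sketch, assuming the output layout quoted earlier in this thread
(an "Interface" line followed by an "error:" line); the function name
failed_ifaces is made up here:

```shell
#!/bin/sh
# Read "ovs-vsctl show" output on stdin and print the name of every
# interface whose error line mentions "No such device".
failed_ifaces() {
    awk '
        # Remember the most recent interface name, dropping optional quotes.
        $1 == "Interface" { name = $2; gsub(/"/, "", name) }
        # An error line belongs to the interface seen just before it.
        $1 == "error:" && /No such device/ { print name }
    '
}
```

It would be used as "ovs-vsctl show | failed_ifaces". No output would mean
no interface is currently stuck in the "No such device" state.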


>
> This error persists, in that the physical ports are unusable in this
> state. Now, in some cases, the ifup-ovs will delete and re-add the port, so
> other than errors during startup, the bridge becomes healthy when the port
> is re-added. In the failure cases, "ovs-vsctl show" will show the physical
> interfaces with the "No such device" error, even though the interfaces
> clearly do exist by this point.
>
> In my case, I am trying to use TYPE=OVSBond. I have dual 10 GbE and I
> wanted to use an OVS bridge instead of a Linux bridge for my host
> networking, with several VLANs configured as TYPE=OVSIntPort on the bridge.
> If I configured the physical interfaces as TYPE=OVSPort, and I have
> TYPE=OVSBond list them with BOND_IFACES, then I get a different problem at
> startup...  Where the TYPE=OVSPort initialization tries to re-add the port
> with:
>
> ovs-vsctl -t 10 -- --if-exists del-port ens2f0 -- add-port ens2f0
>
> But this fails with "cannot create a port named ens2f0 because an
> interface named ens2f0 already exists on bridge br-ext". In this case, the
> port is part of the bond, not directly part of the bridge, and the re-add
> code isn't able to work around this problem.
>
> During further investigation, I found that after the system is up (and
> particularly after network.service has been run), I could "systemctl
> restart openvswitch" and "ovs-vsctl show" would no longer list "No such
> device" for the physical interface ports.
>
> After trying to understand and disentangle all the cause and effect, I
> finally realized that ifup-ovs will start OVS on demand, after the physical
> interfaces have been detected and assigned names (including possible
> renames ... eth0 => ens2f0, ...), and that I could avoid starting OVS too
> early, simply by *not* enabling the openvswitch.service.
>
> This is now working... By *not* enabling openvswitch.service, and letting
> ifup-ovs start up openvswitch on demand, the system is coming up reliably
> whether clean shutdown or force reset (I want the server to be crash-safe,
> so I explicitly test this case).... But, I'm now concerned about the
> direction of Fedora and openvswitch-nonetwork.service, and I am wondering
> if my work-around of not enabling openvswitch.service makes sense, and is
> part of the design of ifup-ovs that will be supported going forwards, or is
> just lucky that it works, and this could break with a future openvswitch
> update, or a future version of Fedora?
>
> I think the openvswitch-nonetwork.service starting early, and presuming
> that physical interfaces can actually be used that early, is a defect in
> openvswitch. I think the intent is to make OVS bridges and internal ports
> available for use with the rest of the networking support, but this only
> currently works properly for virtual bridges that are not connected to
> physical interfaces. By "works properly", I mean that it comes up clean
> whether shutdown was "clean" or "dirty", and doesn't have errors about "No
> such device", and does not need the port to be re-added to clear this error
> state.
>
> Without any real understanding of the complexity here, I am thinking that
> when OpenVSwitch starts early, before the physical network interfaces exist
> according to the kernel, OpenVSwitch should delay initialization of those
> ports or bonds until the physical network interfaces actually do exist. The
> "No such device" issue should automatically clear as soon as the device
> actually does come into existence. In my case, I would like the "bond0"
> (TYPE=OVSBond) to be re-initialized as soon as one or both of "ens2f0"
> (TYPE=OVSPort) or "ens2f1" (TYPE=OVSPort) become real, similar to what
> would happen when the link state for the real interfaces goes up or down. I
> think this should also apply to regular ports on the bridge. There should
> be no need for ifup-ovs to re-create the port if it already exists, and
> just needs to be properly initialized *after* the physical interface comes
> into existence in the kernel. Is this something that is already understood,
> or already being worked on? I found very little information on this with
> Google searching, which is how I stumbled upon this original thread...
>
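
As an illustration of the "delay until the device exists" idea, done outside
of OVS rather than inside it, a wrapper could poll sysfs until the interface
appears or a timeout expires. This is a sketch under assumptions only: the
function name, the default timeout of 30 seconds, and the optional directory
argument are invented for the example and do not describe how ovs-vswitchd
or ifup-ovs behave.

```shell
#!/bin/sh
# Wait until the kernel exposes the named interface, or give up.
# $1 = interface name
# $2 = timeout in seconds (hypothetical default: 30)
# $3 = sysfs net directory (hypothetical override, defaults to /sys/class/net)
# Returns 0 once the interface exists, 1 if the timeout expires first.
wait_for_iface() {
    name=$1
    timeout=${2:-30}
    root=${3:-/sys/class/net}
    waited=0
    while [ ! -e "$root/$name" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, "wait_for_iface ens2f0 60" would block for up to a minute
before the caller goes on to add the port or the bond.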
> Other work-arounds that I tried that may be of interest to people to
> understand exactly how it fails, and how it behaves:
>
> 1) I tried to use regular TYPE=Ethernet (instead of TYPE=OVSPort) network
> interfaces, and "ifup" the physical interfaces as a "Pre" command to the
> openvswitch-nonetwork.service. This gave a warning about "Delaying
> initialization" from "ifup". I believe it *did* fix the problem, but only
> because the "ifup" failed, so the openvswitch-nonetwork.service startup was
> aborted early, and it happened later due to ifup-ovs. As even "/bin/false"
> would have had the same effect here, I considered this an invalid
> work-around and this helped lead me to the conclusion of disabling
> openvswitch.service altogether as the more sensible work-around.
>
> 2) I tried to "modprobe ixgbe" (the network driver for the Intel cards I
> have) as a "Pre" command to the openvswitch-nonetwork.service. This had
> similar behaviour to the "ifup" above. Also not a very good solution.
>
> --
> Mark Mielke <mark.mielke at gmail.com>
>