[ovs-discuss] systemd ovs-vswitchd starts too early

Mark Mielke mark.mielke at gmail.com
Tue Feb 2 07:06:43 UTC 2016


On Mon, Feb 1, 2016 at 1:56 PM, Flavio Leitner <fbl at redhat.com> wrote:

> On Mon, 1 Feb 2016 07:54:46 -0800
> Guru Shetty <guru at ovn.org> wrote:
> > On 31 January 2016 at 14:47, Mark Mielke <mark.mielke at gmail.com>
> > wrote:
> > > I am testing with Fedora 23. It seems that with openvswitch.service
> > > enabled, openvswitch-nonetwork.service starts too early, before any
> > > of the physical network interfaces have been detected.
> >
> > I haven't kept myself up to date with recent changes in Fedora startup
> > (so ccing fedora maintainer).
> > When you say openvswitch starts before any physical network
> > interfaces are detected, which of the following do you mean to say?
> > 1. openvswitch starts even before kernel detects the interface (maybe
> > via a kernel module)?
> > 2. openvswitch starts before fedora renames and configures the
> > physical interface (via udev or something else)?
> >
> > If 1. is true, that is a big problem. There has always been an
> > implicit assumption that openvswitch starts after physical network
> > interfaces are detected.
> >
> > If 2. is true, it is a little perplexing to know that openvswitch can
> > start before udev has worked on the interfaces.
>
> You have two possible situations, but in most cases, that is, when people
> configure OVS bridges and ports with ifcfg- files, the service is
> started as soon as the first OVS ifcfg file is used to bring up the
> network device.
>

If I enable openvswitch.service, then I reproducibly get the behaviour I
describe.

The "ovs-vsctl show" displays:

704958c4-4abc-4fa8-8a1b-5a976537e92d
    Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        ...
        Port "ovsbond0"
            Interface "ens2f0"
                error: "could not open network device ens2f0 (No such device)"
            Interface "ens2f1"
                error: "could not open network device ens2f1 (No such device)"
        ...
    ovs_version: "2.4.0"


This state persists indefinitely on its own. If I "systemctl restart
openvswitch", the problem clears.
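For anyone wanting to check for this state from a script, here is a minimal sketch. It assumes the "ovs-vsctl show" layout shown above (an "Interface" line followed by an optional "error:" line); the function name ovs_failed_ifaces is hypothetical and not part of OVS.

```shell
# Hypothetical helper (not part of OVS): read "ovs-vsctl show" output on
# stdin and print the name of every interface that carries an error line.
ovs_failed_ifaces() {
    awk '
        # Remember the most recent interface name, minus its quotes.
        $1 == "Interface" { name = $2; gsub(/"/, "", name); next }
        # An error line belongs to the interface seen just before it.
        $1 == "error:" && name != "" { print name; name = "" }
    '
}
```

It would be used as "ovs-vsctl show | ovs_failed_ifaces"; empty output means no interface is currently reporting an error.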

The startup log demonstrates that openvswitch is starting *before* the
Intel ixgbe driver is loaded, which is before "ens2f0" and "ens2f1" exist,
and before they are renamed by udev:



Feb  2 01:35:49 onxstack002 systemd: Reached target Remote File Systems
(Pre).
Feb  2 01:35:49 onxstack002 systemd: Starting Remote File Systems (Pre).
Feb  2 01:35:49 onxstack002 systemd: Started System Logging Service.
Feb  2 01:35:49 onxstack002 audit: SERVICE_START pid=1 uid=0
auid=4294967295 ses=4294967295 msg='unit=rsyslog comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb  2 01:35:49 onxstack002 ovs-ctl: Starting ovsdb-server [  OK  ]
Feb  2 01:35:49 onxstack002 ovs-vsctl: ovs|00001|vsctl|INFO|Called as
ovs-vsctl --no-wait -- init -- set Open_vSwitch . db-version=7.12.1
Feb  2 01:35:49 onxstack002 kernel: igb 0000:05:00.1: added PHC on eth1
Feb  2 01:35:49 onxstack002 kernel: igb 0000:05:00.1: Intel(R) Gigabit
Ethernet Network Connection
Feb  2 01:35:49 onxstack002 kernel: igb 0000:05:00.1: eth1:
(PCIe:5.0Gb/s:Width x4) 00:25:90:5c:f5:37
Feb  2 01:35:49 onxstack002 kernel: igb 0000:05:00.1: eth1: PBA No:
010A00-000
Feb  2 01:35:49 onxstack002 kernel: igb 0000:05:00.1: Using MSI-X
interrupts. 8 rx queue(s), 8 tx queue(s)
Feb  2 01:35:49 onxstack002 kernel: igb 0000:05:00.0 eno1: renamed from eth0
Feb  2 01:35:49 onxstack002 ovs-vsctl: ovs|00001|vsctl|INFO|Called as
ovs-vsctl --no-wait set Open_vSwitch . ovs-version=2.4.0
"external-ids:system-id=\"70dd7b2c-71a9-4b03-b7b9-4447375292f2\""
"system-type=\"unknown\"" "system-version=\"unknown\""
Feb  2 01:35:49 onxstack002 ovs-ctl: Configuring Open vSwitch system IDs [
 OK  ]
Feb  2 01:35:49 onxstack002 kernel: nf_conntrack version 0.5.0 (65536
buckets, 262144 max)
Feb  2 01:35:49 onxstack002 ovs-ctl: Inserting openvswitch module [  OK  ]
Feb  2 01:35:49 onxstack002 kernel: openvswitch: Open vSwitch switching
datapath
Feb  2 01:35:49 onxstack002 audit: ANOM_PROMISCUOUS dev=ovs-system prom=256
old_prom=0 auid=4294967295 uid=0 gid=0 ses=4294967295
Feb  2 01:35:49 onxstack002 systemd-udevd: Could not generate persistent
MAC address for ovs-system: No such file or directory
Feb  2 01:35:49 onxstack002 kernel: device ovs-system entered promiscuous
mode
...
Feb  2 01:35:49 onxstack002 audit: ANOM_PROMISCUOUS dev=ovsbr0 prom=256
old_prom=0 auid=4294967295 uid=0 gid=0 ses=4294967295
Feb  2 01:35:49 onxstack002 systemd-udevd: Could not generate persistent
MAC address for ovsbr0: No such file or directory
Feb  2 01:35:49 onxstack002 kernel: device ovsbr0 entered promiscuous mode
...

Feb  2 01:35:49 onxstack002 ovs-ctl: Starting ovs-vswitchd [  OK  ]
Feb  2 01:35:49 onxstack002 ovs-ctl: Enabling remote OVSDB managers [  OK  ]
Feb  2 01:35:49 onxstack002 systemd: Started Open vSwitch Internal Unit.
Feb  2 01:35:49 onxstack002 audit: SERVICE_START pid=1 uid=0
auid=4294967295 ses=4294967295 msg='unit=openvswitch-nonetwork
comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=?
res=success'
Feb  2 01:35:49 onxstack002 network: Bringing up loopback interface:  [  OK
 ]
Feb  2 01:35:49 onxstack002 kernel: IPv6: ADDRCONF(NETDEV_UP): eno1: link
is not ready
Feb  2 01:35:50 onxstack002 kernel: ACPI Warning: \_SB_.PCI0.BR2C._PRT:
Return Package has no elements (empty) (20150818/nsprepkg-126)
Feb  2 01:35:50 onxstack002 kernel: ixgbe 0000:03:00.0: Multiqueue Enabled:
Rx Queue count = 16, Tx Queue count = 16
Feb  2 01:35:50 onxstack002 kernel: ixgbe 0000:03:00.0: MAC: 5, PHY: 6, PBA
No: 010A00-000
Feb  2 01:35:50 onxstack002 kernel: ixgbe 0000:03:00.0: 00:25:90:5c:f6:f8
Feb  2 01:35:50 onxstack002 kernel: ixgbe 0000:03:00.0: Intel(R) 10 Gigabit
Network Connection
Feb  2 01:35:50 onxstack002 kernel: ACPI Warning: \_SB_.PCI0.BR2C._PRT:
Return Package has no elements (empty) (20150818/nsprepkg-126)
Feb  2 01:35:52 onxstack002 kernel: ACPI Warning: \_SB_.PCI0.BR2C._PRT:
Return Package has no elements (empty) (20150818/nsprepkg-126)
Feb  2 01:35:52 onxstack002 kernel: ixgbe 0000:03:00.1: Multiqueue Enabled:
Rx Queue count = 16, Tx Queue count = 16
Feb  2 01:35:52 onxstack002 kernel: ixgbe 0000:03:00.1: MAC: 5, PHY: 6, PBA
No: 010A00-000
Feb  2 01:35:52 onxstack002 kernel: ixgbe 0000:03:00.1: 00:25:90:5c:f6:f9
Feb  2 01:35:52 onxstack002 kernel: ixgbe 0000:03:00.1: Intel(R) 10 Gigabit
Network Connection
Feb  2 01:35:52 onxstack002 network: Bringing up interface eno1:  [  OK  ]
Feb  2 01:35:52 onxstack002 kernel: ixgbe 0000:04:00.0: Multiqueue Enabled:
Rx Queue count = 16, Tx Queue count = 16
Feb  2 01:35:52 onxstack002 kernel: ixgbe 0000:04:00.0: PCI Express
bandwidth of 32GT/s available
Feb  2 01:35:52 onxstack002 kernel: ixgbe 0000:04:00.0: (Speed:5.0GT/s,
Width: x8, Encoding Loss:20%)
Feb  2 01:35:52 onxstack002 kernel: ixgbe 0000:04:00.0: MAC: 2, PHY: 18,
SFP+: 5, PBA No: E68785-007
Feb  2 01:35:52 onxstack002 kernel: ixgbe 0000:04:00.0: 90:e2:ba:82:94:8c
Feb  2 01:35:52 onxstack002 kernel: ixgbe 0000:04:00.0: Intel(R) 10 Gigabit
Network Connection
Feb  2 01:35:52 onxstack002 kernel: igb 0000:05:00.0 eno1: igb: eno1 NIC
Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Feb  2 01:35:52 onxstack002 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eno1:
link becomes ready
Feb  2 01:35:53 onxstack002 network: Bringing up interface ens2f0:  Usage:
ifup <configuration>
Feb  2 01:35:53 onxstack002 kernel: ixgbe 0000:04:00.1: Multiqueue Enabled:
Rx Queue count = 16, Tx Queue count = 16
Feb  2 01:35:53 onxstack002 kernel: ixgbe 0000:04:00.1: PCI Express
bandwidth of 32GT/s available
Feb  2 01:35:53 onxstack002 kernel: ixgbe 0000:04:00.1: (Speed:5.0GT/s,
Width: x8, Encoding Loss:20%)
Feb  2 01:35:53 onxstack002 kernel: ixgbe 0000:04:00.1: MAC: 2, PHY: 18,
SFP+: 6, PBA No: E68785-007
Feb  2 01:35:53 onxstack002 kernel: ixgbe 0000:04:00.1: 90:e2:ba:82:94:8d
Feb  2 01:35:53 onxstack002 kernel: ixgbe 0000:04:00.1: Intel(R) 10 Gigabit
Network Connection
Feb  2 01:35:53 onxstack002 kernel: ixgbe 0000:04:00.0 ens2f0: renamed from
eth2
...
Feb  2 01:35:53 onxstack002 kernel: ixgbe 0000:04:00.0 ens2f0: changing MTU
from 1500 to 9000
Feb  2 01:35:53 onxstack002 kernel: ixgbe 0000:04:00.0: registered PHC
device on ens2f0
Feb  2 01:35:53 onxstack002 kernel: IPv6: ADDRCONF(NETDEV_UP): ens2f0: link
is not ready
Feb  2 01:35:53 onxstack002 kernel: ixgbe 0000:04:00.1 ens2f1: renamed from
eth3
Feb  2 01:35:53 onxstack002 kernel: ixgbe 0000:04:00.0 ens2f0: detected
SFP+: 5
Feb  2 01:35:53 onxstack002 kernel: ixgbe 0000:04:00.0 ens2f0: NIC Link is
Up 10 Gbps, Flow Control: RX/TX
Feb  2 01:35:53 onxstack002 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ens2f0:
link becomes ready
Feb  2 01:35:53 onxstack002 kernel: random: nonblocking pool is initialized
Feb  2 01:35:54 onxstack002 ovs-vsctl: ovs|00001|vsctl|INFO|Called as
ovs-vsctl -t 10 -- --if-exists del-port  ens2f0 -- add-port  ens2f0
Feb  2 01:35:54 onxstack002 ovs-vsctl: ovs|00002|vsctl|ERR|cannot create a
port named ens2f0 because an interface named ens2f0 already exists on
bridge ovsbr0
Feb  2 01:35:54 onxstack002 network: ovs-vsctl: cannot create a port named
ens2f0 because an interface named ens2f0 already exists on bridge ovsbr0
Feb  2 01:35:54 onxstack002 network: Usage: ifup <configuration>
Feb  2 01:35:54 onxstack002 network: [FAILED]
Feb  2 01:35:54 onxstack002 network: Bringing up interface ens2f1:  Usage:
ifup <configuration>
Feb  2 01:35:54 onxstack002 kernel: ixgbe 0000:04:00.1 ens2f1: changing MTU
from 1500 to 9000
Feb  2 01:35:54 onxstack002 kernel: ixgbe 0000:04:00.1: registered PHC
device on ens2f1
Feb  2 01:35:54 onxstack002 kernel: IPv6: ADDRCONF(NETDEV_UP): ens2f1: link
is not ready
Feb  2 01:35:54 onxstack002 kernel: ixgbe 0000:04:00.1 ens2f1: detected
SFP+: 6
Feb  2 01:35:54 onxstack002 kernel: ixgbe 0000:04:00.1 ens2f1: NIC Link is
Up 10 Gbps, Flow Control: RX/TX
Feb  2 01:35:54 onxstack002 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ens2f1:
link becomes ready
Feb  2 01:35:55 onxstack002 ovs-vsctl: ovs|00001|vsctl|INFO|Called as
ovs-vsctl -t 10 -- --if-exists del-port  ens2f1 -- add-port  ens2f1
Feb  2 01:35:55 onxstack002 ovs-vsctl: ovs|00002|vsctl|ERR|cannot create a
port named ens2f1 because an interface named ens2f1 already exists on
bridge ovsbr0
Feb  2 01:35:55 onxstack002 network: ovs-vsctl: cannot create a port named
ens2f1 because an interface named ens2f1 already exists on bridge ovsbr0
Feb  2 01:35:55 onxstack002 network: Usage: ifup <configuration>
Feb  2 01:35:55 onxstack002 network: [FAILED]
Feb  2 01:35:55 onxstack002 network: Bringing up interface ovsbond0:
 Usage: ifup <configuration>
Feb  2 01:35:55 onxstack002 network: RTNETLINK answers: File exists
Feb  2 01:35:56 onxstack002 ovs-vsctl: ovs|00001|vsctl|INFO|Called as
ovs-vsctl -t 10 -- --if-exists del-port  ens2f0 -- add-port  ens2f0
Feb  2 01:35:56 onxstack002 ovs-vsctl: ovs|00002|vsctl|ERR|cannot create a
port named ens2f0 because an interface named ens2f0 already exists on
bridge ovsbr0
Feb  2 01:35:56 onxstack002 network: ovs-vsctl: cannot create a port named
ens2f0 because an interface named ens2f0 already exists on bridge ovsbr0
Feb  2 01:35:56 onxstack002 network: Usage: ifup <configuration>
Feb  2 01:35:56 onxstack002 network: Usage: ifup <configuration>
Feb  2 01:35:56 onxstack002 network: RTNETLINK answers: File exists
...
Feb  2 01:35:57 onxstack002 ovs-vsctl: ovs|00001|vsctl|INFO|Called as
ovs-vsctl -t 10 -- --if-exists del-port  ens2f1 -- add-port  ens2f1
Feb  2 01:35:57 onxstack002 ovs-vsctl: ovs|00002|vsctl|ERR|cannot create a
port named ens2f1 because an interface named ens2f1 already exists on
bridge ovsbr0
Feb  2 01:35:57 onxstack002 network: ovs-vsctl: cannot create a port named
ens2f1 because an interface named ens2f1 already exists on bridge ovsbr0
Feb  2 01:35:57 onxstack002 network: Usage: ifup <configuration>
Feb  2 01:35:57 onxstack002 ovs-vsctl: ovs|00001|vsctl|INFO|Called as
ovs-vsctl -t 10 -- --may-exist add-bond ovsbr0 ovsbond0 ens2f0 ens2f1
bond_mode=balance-tcp lacp=active
Feb  2 01:35:57 onxstack002 ovs-vsctl: ovs|00001|vsctl|INFO|Called as
ovs-vsctl -t 10 -- --may-exist add-br ovsbr0
Feb  2 01:35:57 onxstack002 network: [  OK  ]
Feb  2 01:35:57 onxstack002 ovs-vsctl: ovs|00001|vsctl|INFO|Called as
ovs-vsctl -t 10 -- --may-exist add-br ovsbr0
Feb  2 01:35:58 onxstack002 network: Bringing up interface ovsbr0:  [  OK  ]
Feb  2 01:35:58 onxstack002 ovs-vsctl: ovs|00001|vsctl|INFO|Called as
ovs-vsctl -t 10 -- --may-exist add-port ovsbr0 vlan1003 tag=1003 -- set
Interface vlan1003 type=internal
Feb  2 01:36:01 onxstack002 network: Bringing up interface vlan1003:  [  OK
 ]
Feb  2 01:36:02 onxstack002 ovs-vsctl: ovs|00001|vsctl|INFO|Called as
ovs-vsctl -t 10 -- --may-exist add-port ovsbr0 vlan1004 tag=1004 -- set
Interface vlan1004 type=internal
Feb  2 01:36:04 onxstack002 network: Bringing up interface vlan1004:  [  OK
 ]
Feb  2 01:36:04 onxstack002 ovs-vsctl: ovs|00001|vsctl|INFO|Called as
ovs-vsctl -t 10 -- --may-exist add-port ovsbr0 vlan1010 tag=1010 -- set
Interface vlan1010 type=internal
Feb  2 01:36:07 onxstack002 network: Bringing up interface vlan1010:  [  OK
 ]
Feb  2 01:36:07 onxstack002 ovs-vsctl: ovs|00001|vsctl|INFO|Called as
ovs-vsctl -t 10 -- --may-exist add-port ovsbr0 vlan1020 tag=1020 -- set
Interface vlan1020 type=internal
Feb  2 01:36:10 onxstack002 network: Bringing up interface vlan1020:  [  OK
 ]
Feb  2 01:36:11 onxstack002 ovs-vsctl: ovs|00001|vsctl|INFO|Called as
ovs-vsctl -t 10 -- --may-exist add-port ovsbr0 vlan1021 tag=1021 -- set
Interface vlan1021 type=internal
Feb  2 01:36:13 onxstack002 network: Bringing up interface vlan1021:  [  OK
 ]
Feb  2 01:36:14 onxstack002 ovs-vsctl: ovs|00001|vsctl|INFO|Called as
ovs-vsctl -t 10 -- --may-exist add-port ovsbr0 vlan1022 tag=1022 -- set
Interface vlan1022 type=internal
Feb  2 01:36:17 onxstack002 network: Bringing up interface vlan1022:  [  OK
 ]
Feb  2 01:36:17 onxstack002 systemd: network.service: Control process
exited, code=exited status=1
Feb  2 01:36:17 onxstack002 systemd: Failed to start LSB: Bring up/down
networking.
Feb  2 01:36:17 onxstack002 systemd: network.service: Unit entered failed
state.
Feb  2 01:36:17 onxstack002 audit: SERVICE_START pid=1 uid=0
auid=4294967295 ses=4294967295 msg='unit=network comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Feb  2 01:36:17 onxstack002 systemd: network.service: Failed with result
'exit-code'.
Feb  2 01:36:17 onxstack002 systemd: Reached target Network.
Feb  2 01:36:17 onxstack002 systemd: Starting Network.
Feb  2 01:36:17 onxstack002 systemd: Starting Notify NFS peers of a
restart...
Feb  2 01:36:17 onxstack002 systemd: Starting Open vSwitch...
Feb  2 01:36:17 onxstack002 systemd: Started OpenSSH server daemon.
Feb  2 01:36:17 onxstack002 audit: SERVICE_START pid=1 uid=0
auid=4294967295 ses=4294967295 msg='unit=sshd comm="systemd"
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Feb  2 01:36:17 onxstack002 systemd: Starting OpenSSH server daemon...
Feb  2 01:36:17 onxstack002 systemd: Reached target Network is Online.
Feb  2 01:36:17 onxstack002 systemd: Starting Network is Online.
Feb  2 01:36:17 onxstack002 systemd: Mounting /mnt/openstack_data...
Feb  2 01:36:17 onxstack002 systemd: Started Open vSwitch.



You can see how it tries to re-add the ports, but fails due to "cannot
create a port named ens2f1 because an interface named ens2f1 already exists
on bridge ovsbr0".
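The ordering visible in the log can also be checked mechanically. The following is only a sketch: it assumes the two message formats seen above ("ovs-ctl: Starting ovs-vswitchd" and "<iface>: renamed from"), and the function name ovs_started_before_iface is made up for illustration.

```shell
# Hypothetical helper: read a boot log on stdin and succeed (exit 0) if
# ovs-vswitchd was started before the kernel reported the rename of the
# given interface, or if no rename was logged at all.
ovs_started_before_iface() {
    awk -v ifc="$1" '
        # First line announcing the ovs-vswitchd start.
        /ovs-ctl: Starting ovs-vswitchd/ && !ovs { ovs = NR }
        # First line where udev renamed a device to the requested name.
        index($0, " " ifc ": renamed from") && !ren { ren = NR }
        END { exit !(ovs && (!ren || ovs < ren)) }
    '
}
```

For example, "journalctl -b | ovs_started_before_iface ens2f0 && echo early" would print "early" on a boot like the one above.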



> > > During a "clean" shutdown process, and if the OVS bridge is
> > > configured using /etc/sysconfig/network/* with TYPE=OVSBridge, the
> > > bridge is normally removed on shutdown, which leaves the system in
> > > an acceptable state as when openvswitch-nonetwork.service starts
> > > early, there is no bridge in existence, so there is no problem.
>
> That is correct.
>
> > > However, if shutdown is unclean for any reason - if ifdown-ovs was
> > > not executed properly for any reason - then the system comes up
> > > with the physical network interface ports already pre-associated
> > > with the bridge, and because the bridge is started before
> > > networking exists, it leads to "could not open network device
> > > ens2f0 (No such device)" (in my case, the persistence naming is the
> > > default as selected by udev configuration).
>
> Yup, known issue fixed by the commit below:
>
> > Where do you see the above error? In ovs-vswitchd.log? If so, I think
> > it is okay to ignore as long as port is re-added later. See:
> >
> https://github.com/openvswitch/ovs/commit/24496b4ac2dda14f99fc64e7f68c19b7af27a4c1
>


Fedora 23 has openvswitch-2.4.0-1.fc23.x86_64, which appears to contain the
above commit. I checked ifup-ovs and the new code is there exactly as the
diff describes. However, it is not sufficient, as the log output above
shows. I think it may work fine for a regular TYPE=OVSPort, but it is not
working in my case, because I want to use TYPE=OVSBond with
BOND_IFACES="ens2f0 ens2f1", and the "del-port"/"add-port" re-add attempt
fails with "cannot create a port named ens2f1 because an interface named
ens2f1 already exists on bridge ovsbr0". See the "ovs-vsctl show" output
from earlier, which shows that ens2f1 is part of ovsbond0.
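Since ens2f0 and ens2f1 are interfaces of the port ovsbond0 rather than ports themselves, one possible manual recovery is to delete and re-add the bond port as a whole. This is an untested sketch, not what ifup-ovs does: the helper name readd_bond and the OVS_VSCTL override (there only so the command line can be inspected without a running OVS) are assumptions, while "del-port" and "add-bond" are the regular ovs-vsctl commands.

```shell
# Hypothetical helper: drop the bond port (if present) and recreate it in
# one ovs-vsctl transaction. Usage: readd_bond BRIDGE BOND IFACE... [SETTING...]
readd_bond() {
    bridge=$1
    bond=$2
    shift 2
    # OVS_VSCTL may be set to "echo" to print the command instead of running it.
    ${OVS_VSCTL:-ovs-vsctl} -t 10 -- --if-exists del-port "$bridge" "$bond" \
        -- add-bond "$bridge" "$bond" "$@"
}
```

With the settings from the log above, the call would be "readd_bond ovsbr0 ovsbond0 ens2f0 ens2f1 bond_mode=balance-tcp lacp=active".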



> > > This error persists, in that the physical ports are unusable in this
> > > state. Now, in some cases, the ifup-ovs will delete and re-add the
> > > port, so other than errors during startup, the bridge becomes
> > > healthy when the port is re-added. In the failure cases, "ovs-vsctl
> > > show" will show the physical interfaces with the "No such device"
> > > error, even though the interfaces clearly do exist by this point.
>
> See this commit:
>
> https://github.com/openvswitch/ovs/commit/e21c6643a02c6b446d2fbdfde366ea303b4c2730
>

It looks like this is in "master", but not "v2.4.0", correct?

This commit looks promising. I guess I would have to build from source to
pick it up though? I will see if I can run a test with "master" and see if
this problem is automatically corrected at runtime even if it initially
fails.



> > > In my case, I am trying to use TYPE=OVSBond. I have dual 10 GbE and
> > > I wanted to use an OVS bridge instead of a Linux bridge for my host
> > > networking, with several VLANs configured as TYPE=OVSIntPort on the
> > > bridge. If I configured the physical interfaces as TYPE=OVSPort,
> > > and I have TYPE=OVSBond list them with BOND_IFACES, then I get a
> > > different problem at startup...  Where the TYPE=OVSPort
> > > initialization tries to re-add the port with:
> > >
> > > ovs-vsctl -t 10 -- --if-exists del-port ens2f0 -- add-port ens2f0
> > >
> > > But this fails with "cannot create a port named ens2f0 because an
> > > interface named ens2f0 already exists on bridge br-ext". In this
> > > case, the port is part of the bond, not directly part of the
> > > bridge, and the re-add code isn't able to work around this problem.
>
> What's the OVS version?
> Are you using openvswitch rpm package or compiled yourself?
>


For this particular test, it is Fedora 23 with latest updates, which
includes v2.4.0 with whatever Fedora has added on top to make it -1.



> > > During further investigation, I found that after the system is up
> > > (and particularly after network.service has been run), I could
> > > "systemctl restart openvswitch" and "ovs-vsctl show" would no
> > > longer list "No such device" for the physical interface ports.
> > >
> > > After trying to understand and disentangle all the cause and
> > > effect, I finally realized that ifup-ovs will start OVS on demand,
> > > after the physical interfaces have been detected and assigned names
> > > (including possible renames ... eth0 => ens2f0, ...), and that I
> > > could avoid starting OVS too early, simply by *not* enabling the
> > > openvswitch.service.
>
> Yes, openvswitch and openvswitch-nonetwork are started on demand
> when the 'network' script notices an OVS port/bridge.
>
> But enabling openvswitch only shouldn't cause any issues with the
> sources containing the above patches.
>


The first patch is already part of the system, and it does result in the
errors I described. I'll have to test if the second patch works around this
problem...



>
> > > This is now working... By *not* enabling openvswitch.service, and
> > > letting ifup-ovs start up openvswitch on demand, the system is
> > > coming up reliably whether clean shutdown or force reset (I want
> > > the server to be crash-safe, so I explicitly test this case)....
> > > But, I'm now concerned about the direction of Fedora and
> > > openvswitch-nonetwork.service, and I am wondering if my work-around
> > > of not enabling openvswitch.service makes sense, and is part of the
> > > design of ifup-ovs that will be supported going forwards, or is
> > > just lucky that it works, and this could break with a future
> > > openvswitch update, or a future version of Fedora?
>
> It's designed since migration to systemd to start the service on
> demand and that can't change anymore.
>
> You are supposed to be able to enable 'openvswitch' service and that
> should make no differences in your setup since the interfaces are
> brought up by 'network.target' which 'openvswitch' runs after.  By
> that time, all interfaces are up including the OVS ones which started
> the service on demand. In summary, enabling openvswitch should be a
> no-op.
>


I think it is not a no-op, at least on Fedora. Now that you have explained
the expectation a little more, I think what I am seeing is probably a race
condition. On my hardware, with "openvswitch.service" enabled, systemd sees
that "openvswitch-nonetwork.service" and "network.service" should both be
started, and starts them at the same time. "openvswitch" is simply *faster*
than "network.service" to try to access the network interfaces: because
"network.service" takes a while to start up and begin the modprobe and
other activities that activate the network interfaces, "openvswitch" gets
ahead of it and encounters several failures.

With the second commit you referred to, it may be able to work around this
problem, but I think there is still a race problem here that should be
discussed further?
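To make the race concrete, one conceivable stopgap would be to hold OVS back until the physical interfaces exist. The sketch below is an assumption on my part, not something Fedora or OVS ships: the function name wait_for_ifaces and the NET_SYSFS override (present only so it can be tried against a scratch directory) are hypothetical. It polls /sys/class/net, where the kernel exposes one entry per network device.

```shell
# Hypothetical helper: wait until every named interface has an entry
# under /sys/class/net, or give up after TIMEOUT seconds in total.
# Usage: wait_for_ifaces TIMEOUT IFACE...
wait_for_ifaces() {
    timeout_s=$1
    shift
    sysfs=${NET_SYSFS:-/sys/class/net}
    elapsed=0
    for ifc in "$@"; do
        while [ ! -e "$sysfs/$ifc" ]; do
            # Stop waiting once the overall time budget is used up.
            [ "$elapsed" -ge "$timeout_s" ] && return 1
            sleep 1
            elapsed=$((elapsed + 1))
        done
    done
    return 0
}
```

Something like "wait_for_ifaces 30 ens2f0 ens2f1" run before ovs-vswitchd starts would sidestep the failures above, but it only hides the ordering problem rather than fixing it.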

Thanks for considering this.


-- 
Mark Mielke <mark.mielke at gmail.com>