[ovs-discuss] BUG? hot upgrade with primary controllers
johannes.naab at hetzner-cloud.de
Wed Dec 16 18:34:17 UTC 2020
I am trying to perform a hot upgrade
The upgrade/restart works as expected only if no primary controller is
configured. If a primary controller is configured, the flows are not
properly restored (more specifically: they are restored but later seem to be flushed).
My current understanding on what happens:
- `ovs-ctl` dumps the flows somewhere in /tmp/
- `ovsdb-server` is restarted, `flow-restore-wait` is set
- `ovs-vswitchd` is restarted
- `bridge_configure_remotes()` in `vswitchd/bridge.c` checks for
`flow-restore-wait` (currently set) and the configured primary
controllers are skipped for now.
This is expected and intentional as per
7ed73428a675a174d629d694e483f81358dc907e (bridge.c: prevent
controller connects while flow-restore-wait) in 2.11.
- the flows are restored via the management socket from /tmp/.
- `flow-restore-wait` is set to false/removed, signaling that the work is done.
- `bridge_configure_remotes()` is triggered, and the configured primary
controllers are now considered for connection.
- in `connmgr_set_controllers()` in `ofproto/connmgr.c`
`had_controllers` is false, since previously no (primary) controller was configured.
But the controllers are now being configured.
Thus, the branch guarded by `had_controllers !=
connmgr_has_controllers(mgr)` is later taken, which flushes the
(previously reinstalled) flows. That flush implements the
state transition between a standalone and a managed switch.
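
The sequence above can be reproduced in a toy model. This is only a sketch of my understanding, written in Python for brevity, and not the actual OVS code: the class `Bridge`, the function `hot_upgrade`, the `skip_during_restore` switch and the flow names are all made up for illustration. Only the two decisions (skip controllers while `flow-restore-wait` is set, flush when `had_controllers` differs from the new state) are taken from the description.

```python
class Bridge:
    """Toy model of one bridge; not the actual OVS data structures."""

    def __init__(self, configured_controllers):
        self.configured_controllers = list(configured_controllers)
        self.active_controllers = []  # what the connection manager knows about
        self.flows = []
        self.flow_restore_wait = True

    def configure_remotes(self, skip_during_restore=True):
        # Models the check described for bridge_configure_remotes():
        # while flow-restore-wait is set, primary controllers are skipped.
        if self.flow_restore_wait and skip_during_restore:
            wanted = []
        else:
            wanted = self.configured_controllers
        self.set_controllers(wanted)

    def set_controllers(self, wanted):
        # Models the comparison described for connmgr_set_controllers().
        had_controllers = bool(self.active_controllers)
        self.active_controllers = list(wanted)
        if had_controllers != bool(self.active_controllers):
            # Standalone <-> managed transition: flush the flow table.
            self.flows.clear()


def hot_upgrade(controllers, skip_during_restore=True):
    """Walk through the restart steps and return the surviving flows."""
    br = Bridge(controllers)
    br.configure_remotes(skip_during_restore)  # ovs-vswitchd restarted
    br.flows = ["flow-1", "flow-2"]            # flows restored from the dump
    br.flow_restore_wait = False               # flow-restore-wait cleared
    br.configure_remotes(skip_during_restore)  # reconfiguration triggered
    return br.flows
```

In this model, `hot_upgrade([])` keeps both flows, while `hot_upgrade(["tcp:127.0.0.1:6653"])` ends with an empty flow table. Passing `skip_during_restore=False` models a build without the controller skip, and the flows survive there as well.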
The combination of not directly connecting to the primary controllers
during flow restore, together with the standalone/managed state
transition seems to currently break any attempts for a consistent flow restore.
Running ovs-vswitchd with 7ed73428a675a174d629d694e483f81358dc907e
(bridge.c: prevent controller connects while flow-restore-wait)
reverted, I am able to restore flows even if a primary controller is configured.
Are there any obvious ways to get flow restore with primary controllers working?
So far I have only come up with the approach (not implemented) of
explicitly detecting the falling edge for `flow-restore-wait`, and
passing this information all the way through to
`connmgr_set_controllers`. But that seems a bit invasive...
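
To make the idea concrete, here is a rough sketch of the falling-edge approach, again as a Python toy model rather than a patch against OVS. Everything in it is hypothetical: the class `EdgeAwareBridge`, the `keep_flows` parameter and the helper `hot_upgrade_edge_aware` do not exist anywhere, they only illustrate how the edge information could be passed down so that the flush is suppressed exactly once.

```python
class EdgeAwareBridge:
    """Toy model with falling-edge detection; all names are hypothetical."""

    def __init__(self, configured_controllers):
        self.configured_controllers = list(configured_controllers)
        self.active_controllers = []
        self.flows = []
        self.flow_restore_wait = True
        self.prev_flow_restore_wait = True  # value seen on the previous run

    def configure_remotes(self):
        # Falling edge: flow-restore-wait was set last time and is clear now.
        falling_edge = self.prev_flow_restore_wait and not self.flow_restore_wait
        self.prev_flow_restore_wait = self.flow_restore_wait

        wanted = [] if self.flow_restore_wait else self.configured_controllers
        self.set_controllers(wanted, keep_flows=falling_edge)

    def set_controllers(self, wanted, keep_flows=False):
        had_controllers = bool(self.active_controllers)
        self.active_controllers = list(wanted)
        if had_controllers != bool(self.active_controllers) and not keep_flows:
            # Regular standalone <-> managed transition: flush the flows.
            self.flows.clear()


def hot_upgrade_edge_aware(controllers):
    """Same restart steps as before; returns the bridge for inspection."""
    br = EdgeAwareBridge(controllers)
    br.configure_remotes()             # ovs-vswitchd restarted
    br.flows = ["flow-1", "flow-2"]    # flows restored from the dump
    br.flow_restore_wait = False       # flow-restore-wait cleared
    br.configure_remotes()             # controllers connected, flush skipped
    return br
```

With this, the restored flows survive the first reconfiguration after the restore, while a later change (for example, emptying `configured_controllers` and calling `configure_remotes()` again) still flushes as before, since the edge is only seen once.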