[ovs-discuss] BUG? hot upgrade with primary controllers
Johannes Naab
johannes.naab at hetzner-cloud.de
Wed Dec 16 18:34:17 UTC 2020
Hi,
I am trying to perform a hot upgrade
(https://docs.openvswitch.org/en/latest/intro/install/general/#hot-upgrading,
via `scripts/ovs-ctl`).
The upgrade/restart works as expected only if no primary controller is
configured. If a primary controller is configured, the flows are not
properly restored (more specifically: they are restored but later seem to be
flushed again).
My current understanding of what happens:
- `ovs-ctl` dumps the flows somewhere in /tmp/
- `ovsdb-server` is restarted, `flow-restore-wait` is set
- `ovs-vswitchd` is restarted
- `bridge_configure_remotes()` in `vswitchd/bridge.c` checks for
  `flow-restore-wait` (currently set) and the configured primary
  controllers are skipped for now.
  This is expected and intentional as per
  7ed73428a675a174d629d694e483f81358dc907e (bridge.c: prevent
  controller connects while flow-restore-wait) in 2.11.
- the flows are restored via the management socket from /tmp/.
- `flow-restore-wait` is set to false/removed, signaling that the work
  is done
- `bridge_configure_remotes()` is triggered, and the configured primary
  controllers are now considered for connection.
- in `connmgr_set_controllers()` in `ofproto/connmgr.c`,
  `had_controllers` is false, since previously no (primary) controller
  was configured, but the controllers are now being configured.
  Thus `had_controllers != connmgr_has_controllers(mgr)` evaluates to
  true and the corresponding branch is executed, which flushes the
  (previously reinstalled) flows. This branch implements the state
  transition between a standalone and a managed switch.
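To make the sequence above concrete, here is a toy model of it in C. This
is not the actual OVS code: `struct toy_bridge` and all `toy_*` functions
are hypothetical names made up for illustration, and the real
`bridge_configure_remotes()` and `connmgr_set_controllers()` do a lot more.
It only models the two behaviours described above (controllers are skipped
while `flow-restore-wait` is set; flows are flushed when the
"has controllers" state changes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a bridge; NOT the real OVS data structures. */
struct toy_bridge {
    size_t n_flows;          /* flows currently installed */
    size_t n_controllers;    /* primary controllers known to the connmgr */
    bool flow_restore_wait;  /* models other_config:flow-restore-wait */
};

/* Models the standalone/managed transition: flush flows whenever the
 * "has controllers" state changes. */
static void toy_set_controllers(struct toy_bridge *br, size_t n_configured)
{
    assert(br != NULL);
    bool had_controllers = br->n_controllers > 0;
    br->n_controllers = n_configured;
    bool has_controllers = br->n_controllers > 0;

    if (had_controllers != has_controllers) {
        br->n_flows = 0;     /* flush, including any restored flows */
    }
}

/* Models the skip introduced in 2.11: while flow-restore-wait is set,
 * behave as if no primary controller were configured. */
static void toy_configure_remotes(struct toy_bridge *br, size_t n_configured)
{
    toy_set_controllers(br, br->flow_restore_wait ? 0 : n_configured);
}

/* Replays the hot-upgrade sequence and returns how many of the restored
 * flows are still installed at the end. */
size_t toy_hot_upgrade(size_t n_configured, size_t n_saved_flows)
{
    struct toy_bridge br = { 0, 0, true };   /* fresh vswitchd, wait set */

    toy_configure_remotes(&br, n_configured); /* controllers skipped */
    br.n_flows = n_saved_flows;               /* flows restored from dump */
    br.flow_restore_wait = false;             /* restore finished */
    toy_configure_remotes(&br, n_configured); /* controllers now applied */
    return br.n_flows;
}
```

In this model, `toy_hot_upgrade(0, 5)` returns 5 (no controller, flows
survive), while `toy_hot_upgrade(1, 5)` returns 0 (the restored flows are
flushed again), which matches what I observe.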
The combination of not directly connecting to the primary controllers
during flow restore, together with the standalone/managed state
transition, currently seems to break any attempt at a consistent flow
restore.
Running ovs-vswitchd with 7ed73428a675a174d629d694e483f81358dc907e
(bridge.c: prevent controller connects while flow-restore-wait)
reverted, I am able to restore flows even if a primary controller is
configured.
Are there any obvious ways to get flow restore with primary controllers
working (again)?
So far I have only come up with the approach (not implemented) of
explicitly detecting the falling edge for `flow-restore-wait`, and
passing this information all the way through to
`connmgr_set_controllers`. But that seems a bit invasive...
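A rough sketch of what I mean, again as a self-contained toy model in C
rather than a patch against OVS. All names (`struct edge_bridge`, the
`edge_*` functions, the `restore_just_finished` parameter) are hypothetical,
and I am assuming that suppressing the flush exactly once, on the falling
edge, is the desired behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a bridge; NOT the real OVS data structures. */
struct edge_bridge {
    size_t n_flows;               /* flows currently installed */
    size_t n_controllers;         /* primary controllers in the connmgr */
    bool flow_restore_wait;       /* current value of flow-restore-wait */
    bool prev_flow_restore_wait;  /* value seen on the previous run */
};

/* Like the standalone/managed transition, but the flush is suppressed
 * when the caller reports that a flow restore has just finished. */
void edge_set_controllers(struct edge_bridge *br, size_t n_configured,
                          bool restore_just_finished)
{
    assert(br != NULL);
    bool had_controllers = br->n_controllers > 0;
    br->n_controllers = n_configured;
    bool has_controllers = br->n_controllers > 0;

    if (had_controllers != has_controllers && !restore_just_finished) {
        br->n_flows = 0;          /* genuine state transition: flush */
    }
}

/* Detects the falling edge (true -> false) of flow-restore-wait and
 * passes it down to the controller configuration. */
void edge_configure_remotes(struct edge_bridge *br, size_t n_configured)
{
    bool falling_edge = br->prev_flow_restore_wait && !br->flow_restore_wait;
    br->prev_flow_restore_wait = br->flow_restore_wait;

    edge_set_controllers(br, br->flow_restore_wait ? 0 : n_configured,
                         falling_edge);
}

/* Replays the hot-upgrade sequence and returns how many of the restored
 * flows are still installed at the end. */
size_t edge_hot_upgrade(size_t n_configured, size_t n_saved_flows)
{
    struct edge_bridge br = { 0, 0, true, true };

    edge_configure_remotes(&br, n_configured); /* controllers skipped */
    br.n_flows = n_saved_flows;                /* flows restored */
    br.flow_restore_wait = false;              /* restore finished */
    edge_configure_remotes(&br, n_configured); /* falling edge: no flush */
    return br.n_flows;
}
```

With this, `edge_hot_upgrade(1, 5)` returns 5, while a later transition that
does not coincide with the falling edge (e.g. adding a controller to a
bridge that was never in restore mode) still flushes as before.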
Best Regards,
Johannes