[ovs-discuss] BUG? hot upgrade with primary controllers

Johannes Naab johannes.naab at hetzner-cloud.de
Wed Dec 16 18:34:17 UTC 2020


Hi,

I am trying to perform a hot upgrade
(https://docs.openvswitch.org/en/latest/intro/install/general/#hot-upgrading,
via `scripts/ovs-ctl`).
The upgrade/restart works as expected only if no primary controller is
configured. If a primary controller is configured, the flows are not
properly restored (more specifically, they are restored but appear to be
flushed again later).

My current understanding of what happens:
- `ovs-ctl` dumps the flows somewhere in /tmp/
- `ovsdb-server` is restarted and `flow-restore-wait` is set
- `ovs-vswitchd` is restarted
  - `bridge_configure_remotes()` in `vswitchd/bridge.c` checks
    `flow-restore-wait` (currently set) and skips the configured primary
    controllers for now.
    This is expected and intentional as per commit
    7ed73428a675a174d629d694e483f81358dc907e (bridge.c: prevent
    controller connects while flow-restore-wait) in 2.11.
- the flows are restored via the management socket from /tmp/.
- `flow-restore-wait` is set to false/removed, signaling that the work
  is done
  - `bridge_configure_remotes()` is triggered, and the configured primary
    controllers are now considered for connection.
  - in `connmgr_set_controllers()` in `ofproto/connmgr.c`,
    `had_controllers` is false, since previously no (primary) controller
    was configured.
    But the controllers are now being configured.
    Thus, the conditional `had_controllers !=
    connmgr_has_controllers(mgr)` will later evaluate to true, which
    flushes the (previously reinstalled) flows. This logic implements the
    state transition between a standalone and a managed switch.
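To make the sequence above concrete, here is a deliberately simplified C
model of it. All names (`struct model_switch`, `model_set_controllers()`,
`model_configure_remotes()`, `model_hot_upgrade()`) are hypothetical
stand-ins, not the real Open vSwitch structures or functions; the model
only captures the two conditions described: primary controllers are
skipped while `flow-restore-wait` is set, and a change in whether any
controllers are configured flushes the flow table.

```c
/* Hypothetical, simplified model of the hot-upgrade sequence.
 * These names are illustrative stand-ins, NOT Open vSwitch APIs. */
#include <assert.h>
#include <stdbool.h>

struct model_switch {
    bool flow_restore_wait; /* models other_config:flow-restore-wait */
    int n_configured;       /* primary controllers in the database */
    int n_active;           /* controllers handed to the connection manager */
    int n_flows;            /* flows currently installed */
};

/* Stand-in for connmgr_set_controllers(): flush the flow table whenever
 * the switch changes between standalone (no controllers) and managed. */
static void model_set_controllers(struct model_switch *sw, int n)
{
    bool had_controllers;

    assert(n >= 0);
    had_controllers = sw->n_active > 0;
    sw->n_active = n;
    if (had_controllers != (sw->n_active > 0)) {
        sw->n_flows = 0;
    }
}

/* Stand-in for bridge_configure_remotes(): primary controllers are
 * skipped while flow-restore-wait is set. */
static void model_configure_remotes(struct model_switch *sw)
{
    model_set_controllers(sw, sw->flow_restore_wait ? 0 : sw->n_configured);
}

/* Replays the steps listed above; returns the flows left at the end. */
static int model_hot_upgrade(int n_controllers, int n_saved_flows)
{
    struct model_switch sw = { .flow_restore_wait = true,
                               .n_configured = n_controllers };

    model_configure_remotes(&sw); /* ovs-vswitchd restarted */
    sw.n_flows = n_saved_flows;   /* flows restored from /tmp/ */
    sw.flow_restore_wait = false; /* restore signaled as done */
    model_configure_remotes(&sw); /* controllers now considered */
    return sw.n_flows;
}
```

In this model, `model_hot_upgrade(0, 100)` keeps all 100 restored flows,
while `model_hot_upgrade(1, 100)` ends with 0, matching the behavior
observed when a primary controller is configured.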

The combination of not connecting to the primary controllers directly
during the flow restore and the standalone/managed state transition
currently seems to break any attempt at a consistent flow restore.

Running ovs-vswitchd with 7ed73428a675a174d629d694e483f81358dc907e
(bridge.c: prevent controller connects while flow-restore-wait)
reverted, I am able to restore flows even if a primary controller is
configured.

Are there any obvious ways to get flow restore with primary controllers
working (again)?
So far I have only come up with the approach (not implemented) of
explicitly detecting the falling edge of `flow-restore-wait` and
passing this information all the way through to
`connmgr_set_controllers()`. But that seems a bit invasive...
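A rough sketch of that falling-edge idea, again as a hypothetical
simplified model rather than actual OVS code: the bridge-side stand-in
remembers the previous value of `flow-restore-wait`, and when it sees the
set-to-cleared transition it passes a flag telling the
connection-manager stand-in to skip the standalone/managed flush. The
struct, the functions, and the flag name `restore_just_finished` are all
invented for illustration.

```c
/* Hypothetical sketch of the falling-edge approach.
 * These names are illustrative stand-ins, NOT Open vSwitch APIs. */
#include <assert.h>
#include <stdbool.h>

struct model_switch {
    bool flow_restore_wait;      /* current flow-restore-wait value */
    bool prev_flow_restore_wait; /* value seen on the previous pass */
    int n_configured;            /* primary controllers in the database */
    int n_active;                /* controllers handed to the conn. manager */
    int n_flows;                 /* flows currently installed */
};

/* Stand-in for connmgr_set_controllers() with an extra (hypothetical)
 * argument that suppresses the flush right after a flow restore. */
static void model_set_controllers(struct model_switch *sw, int n,
                                  bool restore_just_finished)
{
    bool had_controllers;

    assert(n >= 0);
    had_controllers = sw->n_active > 0;
    sw->n_active = n;
    if (had_controllers != (n > 0) && !restore_just_finished) {
        sw->n_flows = 0; /* genuine standalone <-> managed transition */
    }
}

/* Stand-in for bridge_configure_remotes(): detects the falling edge of
 * flow-restore-wait and passes it down. */
static void model_configure_remotes(struct model_switch *sw)
{
    bool falling_edge = sw->prev_flow_restore_wait && !sw->flow_restore_wait;

    sw->prev_flow_restore_wait = sw->flow_restore_wait;
    model_set_controllers(sw, sw->flow_restore_wait ? 0 : sw->n_configured,
                          falling_edge);
}

/* Replays the hot-upgrade steps; returns the flows left at the end. */
static int model_hot_upgrade(int n_controllers, int n_saved_flows)
{
    struct model_switch sw = { .flow_restore_wait = true,
                               .n_configured = n_controllers };

    model_configure_remotes(&sw); /* ovs-vswitchd restarted */
    sw.n_flows = n_saved_flows;   /* flows restored from /tmp/ */
    sw.flow_restore_wait = false; /* restore signaled as done */
    model_configure_remotes(&sw); /* falling edge: no flush */
    return sw.n_flows;
}
```

In this sketch an ordinary standalone-to-managed transition (a controller
configured while `flow-restore-wait` was never set) still flushes; only
the deferred controller setup right after a restore is exempted.
Threading such a flag through the real call chain is the invasive part
mentioned above.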


Best Regards,
Johannes

