[ovs-dev] Proposed OVN live migration workflow
Ben Pfaff
blp at ovn.org
Thu Mar 16 17:39:35 UTC 2017
On Mon, Feb 27, 2017 at 11:12:07AM -0500, Russell Bryant wrote:
> This is a proposed update to the VM live migration workflow with OVN.
>
> Currently, when doing live migration, you must not add the iface-id to the
> port for the destination VM until migration is complete. Otherwise, while
> migration is in progress, ovn-controller on two different chassis will
> fight over the port binding.
>
> This workflow is problematic for libvirt-based live migration (at least),
> since libvirt creates an identical VM on the destination host, including
> all configuration such as the OVS port iface-id. As a result,
> ovn-controller on two hosts fights over the port binding for the duration
> of the migration.
>
>
> Proposed new workflow for a migration from host A to host B:
>
> 1) The CMS sets a new option on Logical_Switch_Port called
> "migration-destination". The value would be the chassis name of the
> destination chassis of the upcoming live migration (host B in this case).
>
> 2) While this option is set, if host B claims the port binding, host A will
> not try to re-claim it.
>
> 3) While this option is set, if host B sees the new port appear, it will
> not immediately update the port binding. Instead, it will set up flows
> watching for a GARP from the VM. GARP packets would be forwarded to
> ovn-controller. All other packets would be dropped. If a GARP is seen,
> then host B will update the port binding to reflect that the port is now
> active on host B.
>
> At least for KVM VMs, qemu is already generating a GARP when migration is
> complete. I'm not familiar with Xen or other virtualization technologies,
> but it seems like this would be a common requirement for VM migration.
>
> 4) When the migration is either completed or aborted, the CMS will remove
> the "migration-destination" option from the Logical_Switch_Port in
> OVN_Northbound. At this point, ovn-controller will resume normal
> behavior. If for some reason a GARP was not seen, host B would update the
> port binding at this point.
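The per-chassis decision described in steps 1-4 above can be sketched as
follows. This is illustrative logic only, not actual ovn-controller code;
the function and parameter names are invented for the sketch.

```python
def should_claim(local_chassis, bound_chassis, migration_destination,
                 garp_seen):
    """Decide whether this chassis should (re)claim a port binding.

    migration_destination is the value of the proposed
    "migration-destination" option, or None when it is unset.
    """
    if migration_destination is None:
        # Step 4: with the option removed, normal claiming resumes.
        return True
    if local_chassis == migration_destination:
        # Step 3: the destination claims only once a GARP is seen.
        return garp_seen
    # Step 2: the source must not re-claim a binding that the
    # destination has taken while the option is set.
    return bound_chassis != migration_destination


# Host A keeps the binding until host B sees a GARP and claims it.
assert should_claim("host-a", "host-a", "host-b", False)
assert not should_claim("host-b", "host-a", "host-b", False)
assert should_claim("host-b", "host-a", "host-b", True)
assert not should_claim("host-a", "host-b", "host-b", False)
```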
This seems like a reasonable approach to me. I've spent a few minutes
trying to think of problems with it. It adds a little bit of
complexity, but not enough to worry me. It doesn't seem to add races
that cause a problem. It still works even if the GARPs are lost or
never sent.
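For reference, the GARP that step 3 watches for is just an ARP packet whose
sender and target protocol addresses are equal. A minimal recognizer over a
raw Ethernet frame might look like the sketch below (assuming an untagged
Ethernet II frame carrying ARP for IPv4; ovn-controller would of course
match this with flows rather than parsing frames in userspace like this):

```python
import struct

ETH_TYPE_ARP = 0x0806

def is_garp(frame: bytes) -> bool:
    """Return True if frame is a gratuitous ARP: an ARP packet whose
    sender and target protocol addresses are equal."""
    if len(frame) < 42:          # 14-byte Ethernet header + 28-byte ARP
        return False
    (eth_type,) = struct.unpack_from("!H", frame, 12)
    if eth_type != ETH_TYPE_ARP:
        return False
    spa = frame[28:32]           # sender protocol address
    tpa = frame[38:42]           # target protocol address
    return spa == tpa
```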
However, I find myself wondering how aware the hypervisors are that a
migration is taking place. If the source and destination hypervisors
could note locally that a migration is going on for a given VM, then
they could handle this situation without needing anything to happen in
the OVN databases. For example, consider the Interface table
integration documented in Documentation/topics/integration.rst. Suppose
we added external_ids:migration-status, documented as follows:
    This field is omitted except for a VM that is currently migrating.
    For a VM that is migrating away from this chassis, this field is set
    to "outgoing". For a VM that is migrating to this chassis, this
    field is set to "incoming".
If we had this, then we could use the following workflow for a
migration from A to B:
1) CMS integration sets migration-status appropriately on A ("outgoing")
and B ("incoming").
2) While migration-status is "outgoing", A will not try to reclaim a
port claimed by a different chassis.
3) While migration-status is "incoming", B will not grab the port
binding unless and until it sees a GARP.
4) When the migration is completed successfully, A's port gets destroyed
and B's migration-status gets removed, so at this point B claims it in
case it didn't see a GARP. If the migration fails, B's port gets
destroyed and A's migration-status gets removed, so at this point A
reclaims it if necessary.
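The same decision under this Interface-table variant can be sketched as
below. Again the names are illustrative, and "bound_elsewhere" stands for
"the southbound port binding currently points at another chassis":

```python
def should_claim(migration_status, bound_elsewhere, garp_seen):
    """migration_status is this chassis's local
    external_ids:migration-status value: "outgoing", "incoming", or
    None when the field is absent."""
    if migration_status == "outgoing":
        # Step 2: the source does not reclaim a binding that
        # another chassis has taken.
        return not bound_elsewhere
    if migration_status == "incoming":
        # Step 3: the destination waits for a GARP.
        return garp_seen
    # Step 4: with the field removed, normal claiming resumes, so the
    # surviving chassis claims even if it never saw a GARP.
    return True


assert should_claim("outgoing", False, False)      # A, before B claims
assert not should_claim("outgoing", True, False)   # A, after B claims
assert not should_claim("incoming", False, False)  # B, no GARP yet
assert should_claim("incoming", False, True)       # B, GARP seen
assert should_claim(None, False, False)            # after cleanup
```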
As I wrote up the above, though, I found myself thinking about how hard
it is to update all the hypervisor integrations and make them correct in
the corner cases. I know that's tough from experience. (Also, this
would be the first change to the integration spec in years.) I don't
know whether updating the CMSes is harder or easier. But maybe you are
planning to do it yourself (at least for Neutron?) in which case I think
that the CMS-based approach is probably the right one.