[ovs-dev] Proposed OVN live migration workflow
Ben Pfaff
blp at ovn.org
Thu Mar 16 17:39:35 UTC 2017
On Mon, Feb 27, 2017 at 11:12:07AM -0500, Russell Bryant wrote:
> This is a proposed update to the VM live migration workflow with OVN.
>
> Currently, when doing live migration, you must not add the iface-id to the
> port for the destination VM until migration is complete. Otherwise, while
> migration is in progress, ovn-controller on two different chassis will
> fight over the port binding.
>
> This workflow is problematic for libvirt-based live migration (at least),
> since libvirt creates an identical VM on the destination host, including
> all configuration such as the OVS port iface-id. As a result,
> ovn-controller on two hosts fights over the port binding for the duration
> of the migration.
>
>
> Proposed new workflow for a migration from host A to host B:
>
> 1) The CMS sets a new option on Logical_Switch_Port called
> "migration-destination". The value would be the chassis name of the
> destination chassis of the upcoming live migration (host B in this case).
>
> 2) While this option is set, if host B claims the port binding, host A will
> not try to re-claim it.
>
> 3) While this option is set, if host B sees the new port appear, it will
> not immediately update the port binding. Instead, it will set up flows
> watching for a GARP from the VM. GARP packets would be forwarded to
> ovn-controller. All other packets would be dropped. If a GARP is seen,
> then host B will update the port binding to reflect that the port is now
> active on host B.
>
> At least for KVM VMs, qemu is already generating a GARP when migration is
> complete. I'm not familiar with Xen or other virtualization technologies,
> but it seems like this would be a common requirement for VM migration.
>
> 4) When the migration is either completed or aborted, the CMS will remove
> the "migration-destination" option from the Logical_Switch_Port in
> OVN_Northbound. At this point, ovn-controller will resume normal
> behavior. If for some reason a GARP was not seen, host B would update the
> port binding at this point.
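The per-chassis decision described in steps 1-4 above can be sketched as
follows. This is illustrative logic only, not actual ovn-controller code;
the function and parameter names are invented for the sketch.

```python
def should_claim(local_chassis, bound_chassis, migration_destination,
                 garp_seen):
    """Decide whether this chassis should (re)claim a port binding.

    migration_destination is the value of the proposed
    "migration-destination" option, or None when it is unset.
    """
    if migration_destination is None:
        # Step 4: with the option removed, normal claiming resumes.
        return True
    if local_chassis == migration_destination:
        # Step 3: the destination claims only once a GARP is seen.
        return garp_seen
    # Step 2: the source must not re-claim a binding that the
    # destination has taken while the option is set.
    return bound_chassis != migration_destination


# Host A keeps the binding until host B sees a GARP and claims it.
assert should_claim("host-a", "host-a", "host-b", False)
assert not should_claim("host-b", "host-a", "host-b", False)
assert should_claim("host-b", "host-a", "host-b", True)
assert not should_claim("host-a", "host-b", "host-b", False)
```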
This seems like a reasonable approach to me. I've spent a few minutes
trying to think of problems with it. It adds a little bit of
complexity, but not enough to worry me. It doesn't seem to add races
that cause a problem. It still works even if the GARPs are lost or
never sent.
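For reference, the GARP that step 3 watches for is just an ARP packet whose
sender and target protocol addresses are equal. A minimal recognizer over a
raw Ethernet frame might look like the sketch below (assuming an untagged
Ethernet II frame carrying ARP for IPv4; ovn-controller would of course
match this with flows rather than parsing frames in userspace like this):

```python
import struct

ETH_TYPE_ARP = 0x0806

def is_garp(frame: bytes) -> bool:
    """Return True if frame is a gratuitous ARP: an ARP packet whose
    sender and target protocol addresses are equal."""
    if len(frame) < 42:          # 14-byte Ethernet header + 28-byte ARP
        return False
    (eth_type,) = struct.unpack_from("!H", frame, 12)
    if eth_type != ETH_TYPE_ARP:
        return False
    spa = frame[28:32]           # sender protocol address
    tpa = frame[38:42]           # target protocol address
    return spa == tpa
```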
However, I find myself wondering how aware the hypervisors are that a
migration is taking place. If the source and destination hypervisors
could note locally that a migration is going on for a given VM, then
they could handle this situation without needing anything to happen in
the OVN databases. For example, consider the Interface table
integration documented in Documentation/topics/integration.rst. Suppose
we added external_ids:migration-status, documented as follows:
    This field is omitted except for a VM that is currently migrating.
    For a VM that is migrating away from this chassis, this field is set
    to "outgoing". For a VM that is migrating to this chassis, this
    field is set to "incoming".
If we had this, then we could use the following workflow for a
migration from A to B:
1) CMS integration sets migration-status appropriately on A ("outgoing")
and B ("incoming").
2) While migration-status is "outgoing", A will not try to reclaim a
port claimed by a different chassis.
3) While migration-status is "incoming", B will not grab the port
binding unless and until it sees a GARP.
4) When the migration is completed successfully, A's port gets destroyed
and B's migration-status gets removed, so at this point B claims it in
case it didn't see a GARP. If the migration fails, B's port gets
destroyed and A's migration-status gets removed, so at this point A
reclaims it if necessary.
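The same decision under this Interface-table variant can be sketched as
below. Again the names are illustrative, and "bound_elsewhere" stands for
"the southbound port binding currently points at another chassis":

```python
def should_claim(migration_status, bound_elsewhere, garp_seen):
    """migration_status is this chassis's local
    external_ids:migration-status value: "outgoing", "incoming", or
    None when the field is absent."""
    if migration_status == "outgoing":
        # Step 2: the source does not reclaim a binding that
        # another chassis has taken.
        return not bound_elsewhere
    if migration_status == "incoming":
        # Step 3: the destination waits for a GARP.
        return garp_seen
    # Step 4: with the field removed, normal claiming resumes, so the
    # surviving chassis claims even if it never saw a GARP.
    return True


assert should_claim("outgoing", False, False)      # A, before B claims
assert not should_claim("outgoing", True, False)   # A, after B claims
assert not should_claim("incoming", False, False)  # B, no GARP yet
assert should_claim("incoming", False, True)       # B, GARP seen
assert should_claim(None, False, False)            # after cleanup
```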
As I wrote up the above, though, I found myself thinking about how hard
it is to update all the hypervisor integrations and make them correct in
the corner cases. I know that's tough from experience. (Also, this
would be the first change to the integration spec in years.) I don't
know whether updating the CMSes is harder or easier. But maybe you are
planning to do it yourself (at least for Neutron?) in which case I think
that the CMS-based approach is probably the right one.