[ovs-dev] locks for clustered OVSDB

Ben Pfaff blp at ovn.org
Fri Sep 22 16:58:19 UTC 2017


We've had a couple of brief discussions during the OVN meeting about
locks in OVSDB.  As I understand it, a few services use OVSDB locks to
avoid duplicating work.  The question is whether and how to extend OVSDB
locks to a distributed context.

First, I think it's worth reviewing how OVSDB locks work, filling in
some of the implications that aren't covered by RFC 7047.  OVSDB locks
are server-level (not database-level) objects that can be owned by at
most one client at a time.  Clients can obtain them either through a
"lock" operation, in which case they get queued to obtain the lock when
it's no longer owned by anyone else, or through a "steal" operation that
always succeeds immediately, kicking out whoever (if anyone) previously
owned the lock.  A client loses a lock whenever it releases the lock with
an "unlock" operation or whenever its connection to the server drops.
The server notifies a client whenever that client acquires a lock or
whenever a lock it holds is stolen by another client.
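
To make the rules above concrete, here is a toy in-memory model of a
single server's lock table.  It is only a sketch, not OVSDB code: the
class and method names are made up for illustration, clients are plain
strings, and notifications are just appended to a list.  What happens
to the previous owner's place in the queue after a "steal" is not
modeled; this sketch simply drops it.

```python
from collections import deque


class LockServer:
    """Toy model of one server's lock table (hypothetical, not OVSDB code)."""

    def __init__(self):
        self.owner = {}     # lock name -> client that currently owns it
        self.waiters = {}   # lock name -> deque of clients queued for it
        self.notices = []   # (client, notification, lock name), in send order

    def lock(self, client, name):
        """Take the lock if it is free, else queue.  Returns True if owned."""
        if self.owner.get(name) in (None, client):
            self.owner[name] = client
            return True
        queue = self.waiters.setdefault(name, deque())
        if client not in queue:
            queue.append(client)
        return False

    def steal(self, client, name):
        """Always succeeds at once; the previous owner is told "stolen"."""
        victim = self.owner.get(name)
        if victim not in (None, client):
            self.notices.append((victim, "stolen", name))
        self._dequeue(client, name)
        self.owner[name] = client
        return True

    def unlock(self, client, name):
        """Give up the lock (or a queued request); the next waiter gets it."""
        self._dequeue(client, name)
        if self.owner.get(name) != client:
            return
        del self.owner[name]
        queue = self.waiters.get(name)
        if queue:
            successor = queue.popleft()
            self.owner[name] = successor
            self.notices.append((successor, "locked", name))

    def disconnect(self, client):
        """A dropped connection behaves like unlocking everything."""
        for name in set(self.owner) | set(self.waiters):
            self.unlock(client, name)

    def _dequeue(self, client, name):
        queue = self.waiters.get(name)
        if queue and client in queue:
            queue.remove(client)
```

With two clients "a" and "b", "a" locking first leaves "b" queued, "a"
disconnecting hands the lock to "b" with a "locked" notice, and a later
steal by "a" takes it back immediately and sends "b" a "stolen" notice.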

This scheme works perfectly for one particular scenario: where the
resource protected by the lock is an OVSDB database (or part of one) on
the same server as the lock.  This is because OVSDB transactions include
an "assert" operation that names a lock and aborts the transaction if
the client does not hold the lock.  Since the server is both the lock
manager and the implementer of the transaction, it can always make the
correct decision.  This scenario could be extended to distributed locks
with the same guarantee.
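
As a rough illustration of the "assert" pattern, the sketch below
builds the JSON-RPC messages a client might send.  The message shapes
follow RFC 7047, but the helper names are made up, and the lock name,
database name, table, and column are only examples chosen for
illustration.  Nothing here talks to a real server.

```python
import json

LOCK_NAME = "ovn_northd"      # illustrative lock name (an assumption)
DATABASE = "OVN_Southbound"   # illustrative database name (an assumption)


def lock_request(request_id, lock=LOCK_NAME):
    """JSON-RPC "lock" request: queue for (or obtain) the named lock."""
    return {"method": "lock", "params": [lock], "id": request_id}


def guarded_transaction(request_id, operations, lock=LOCK_NAME, db=DATABASE):
    """JSON-RPC "transact" request whose first operation is "assert".

    If the client does not hold the lock when the server executes the
    transaction, the assert fails and the transaction is aborted.
    """
    guard = {"op": "assert", "lock": lock}
    return {
        "method": "transact",
        "params": [db, guard, *operations],
        "id": request_id,
    }


def committed(reply):
    """True if every operation result in a "transact" reply succeeded."""
    results = reply.get("result") or []
    return (
        reply.get("error") is None
        and len(results) > 0
        and all(isinstance(r, dict) and "error" not in r for r in results)
    )


if __name__ == "__main__":
    # Example update; the table and column are placeholders for whatever
    # the client actually wants to change under the lock.
    update = {"op": "update", "table": "SB_Global",
              "where": [], "row": {"nb_cfg": 42}}
    print(json.dumps(lock_request(1)))
    print(json.dumps(guarded_transaction(2, [update])))
```

Putting the assert first means the ownership check and the changes are
evaluated by the server as part of one transaction, so a client that
has lost the lock without noticing yet cannot commit.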

Another scenario that could work acceptably with distributed OVSDB locks
is one where the lock guards against duplicated work.  For example,
suppose a couple of ovn-northd instances both try to grab a lock, with
only the winner actually running, to avoid having both of them spend a
lot of CPU time recomputing the southbound flow table.  A distributed
version of OVSDB locks would probably work fine in practice for this,
although occasionally due to network propagation delays, "steal"
operations, or different ideas between client and server of when a
session has dropped, both ovn-northd instances might think they have the
lock.  (If, however, they combined this with "assert" when they actually
committed their changes to the southbound database, then they would
never actually interfere with each other in database commits.)

A scenario that would not work acceptably with distributed OVSDB locks,
without a change to the model, is where the lock ensures correctness,
that is, if two clients both think they have the lock then bad things
happen.  I believe that this requires clients to understand a concept of
leases, which OVSDB doesn't currently have.  The "steal" operation is
also problematic in this model since it would require canceling a
lease.  (This scenario also does not work acceptably with single-server
OVSDB locks.)
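
For anyone unfamiliar with leases, here is a minimal sketch of the
idea, along the lines of the fencing-token approach described in the
article linked below.  None of this exists in OVSDB; the class names,
the explicit "now" argument, and the token scheme are all assumptions
made for illustration.  A lease expires on its own after a fixed
duration, and every grant carries a larger token so that the protected
resource can reject a client whose lease has lapsed without that client
noticing.

```python
class LeaseManager:
    """Hypothetical lease-granting lock manager (not part of OVSDB)."""

    def __init__(self, duration):
        self.duration = duration   # lease lifetime, in the caller's time units
        self.holder = None         # client holding the current lease
        self.expires = 0.0         # time at which the current lease lapses
        self.token = 0             # increases with every grant or renewal

    def acquire(self, client, now):
        """Grant or renew a lease; return its token, or None if refused."""
        held_by_other = self.holder is not None and self.holder != client
        if held_by_other and now < self.expires:
            return None
        self.holder = client
        self.expires = now + self.duration
        self.token += 1
        return self.token


class FencedResource:
    """Resource that refuses writes carrying an older token than one seen."""

    def __init__(self):
        self.highest = 0
        self.value = None

    def write(self, token, value):
        if token < self.highest:
            return False           # stale lease holder, reject the write
        self.highest = token
        self.value = value
        return True


if __name__ == "__main__":
    manager = LeaseManager(duration=10)
    resource = FencedResource()
    old = manager.acquire("a", now=0)    # "a" gets the first lease
    new = manager.acquire("b", now=11)   # lease lapsed, "b" takes over
    print(resource.write(new, "from b"))
    print(resource.write(old, "from a"))  # "a" still believes it holds the lock
```

The point of the token is that correctness no longer depends on both
clients agreeing about who holds the lock; the resource itself makes
the final decision, much as the server does for "assert".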

I'd appreciate anyone's thoughts on the topic.

This webpage is good reading:
https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html

Thanks,

Ben.

