[ovs-dev] locks for clustered OVSDB

Ben Pfaff blp at ovn.org
Mon Oct 9 17:45:23 UTC 2017


On Mon, Oct 09, 2017 at 01:13:56PM -0400, Russell Bryant wrote:
> On Mon, Sep 25, 2017 at 2:29 PM, Ben Pfaff <blp at ovn.org> wrote:
> > On Mon, Sep 25, 2017 at 11:09:49AM -0700, Han Zhou wrote:
> >> On Mon, Sep 25, 2017 at 2:36 AM, Miguel Angel Ajo Pelayo
> >> <majopela at redhat.com> wrote:
> >> >
> >> > I believe Lucas Alvares could give you valuable feedback on this, as
> >> > he was planning to use it as a synchronization mechanism on the
> >> > networking-ovn side (if I understood correctly).
> >> >
> >> > I believe he's back by October.
> >> >
> >> > Best regards.
> >> > Miguel Ángel.
> >> >
> >> > On Fri, Sep 22, 2017 at 6:58 PM, Ben Pfaff <blp at ovn.org> wrote:
> >> >
> >> > > We've had a couple of brief discussions during the OVN meeting about
> >> > > locks in OVSDB.  As I understand it, a few services use OVSDB locks to
> >> > > avoid duplicating work.  The question is whether and how to extend OVSDB
> >> > > locks to a distributed context.
> >> > >
> >> > > First, I think it's worth reviewing how OVSDB locks work, filling in
> >> > > some of the implications that aren't covered by RFC 7047.  OVSDB locks
> >> > > are server-level (not database-level) objects that can be owned by at
> >> > > most one client at a time.  Clients can obtain them either through a
> >> > > "lock" operation, in which case they get queued to obtain the lock when
> >> > > it's no longer owned by anyone else, or through a "steal" operation that
> >> > > always succeeds immediately, kicking out whoever (if anyone) previously
> >> > > owned the lock.  A client loses a lock whenever it releases it with an
> >> > > "unlock" operation or whenever its connection to the server drops.  The
> >> > > server notifies a client whenever it acquires a lock or whenever it is
> >> > > stolen by another client.
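> >> > >
> >> > > As a concrete illustration, here is a minimal Python sketch of the
> >> > > wire exchange that RFC 7047 specifies for these operations.  The
> >> > > socket path and lock name are placeholders, and a real client needs
> >> > > proper JSON framing rather than a single recv():
> >> > >
> >> > >     import json
> >> > >     import socket
> >> > >
> >> > >     sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
> >> > >     sock.connect("/path/to/db.sock")        # placeholder path
> >> > >
> >> > >     # "lock" queues us for ownership; "steal" and "unlock" take
> >> > >     # the same single-parameter form.
> >> > >     sock.sendall(json.dumps({"method": "lock",
> >> > >                              "params": ["example_lock"],
> >> > >                              "id": 0}).encode())
> >> > >
> >> > >     # The result's "locked" member says whether we own the lock
> >> > >     # now; otherwise the server sends a "locked" notification when
> >> > >     # we get it, and a "stolen" notification if another client
> >> > >     # steals it from us.
> >> > >     print(json.loads(sock.recv(4096))["result"])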
> >> > >
> >> > > This scheme works perfectly for one particular scenario: where the
> >> > > resource protected by the lock is an OVSDB database (or part of one) on
> >> > > the same server as the lock.  This is because OVSDB transactions include
> >> > > an "assert" operation that names a lock and aborts the transaction if
> >> > > the client does not hold the lock.  Since the server is both the lock
> >> > > manager and the implementer of the transaction, it can always make the
> >> > > correct decision.  This scenario could be extended to distributed locks
> >> > > with the same guarantee.
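> >> > >
> >> > > Concretely (a sketch; the database, table, and column names are
> >> > > illustrative), a transaction that must not take effect unless the
> >> > > client still holds "example_lock" looks like:
> >> > >
> >> > >     # "assert" makes the server abort the whole transaction with a
> >> > >     # "not owner" error if the sender does not hold the named lock.
> >> > >     txn = {"method": "transact",
> >> > >            "id": 1,
> >> > >            "params": ["OVN_Southbound",
> >> > >                       {"op": "assert", "lock": "example_lock"},
> >> > >                       {"op": "update",
> >> > >                        "table": "SB_Global",
> >> > >                        "where": [],
> >> > >                        "row": {"nb_cfg": 1}}]}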
> >> > >
> >> > > Another scenario that could work acceptably with distributed OVSDB locks
> >> > > is one where the lock guards against duplicated work.  For example,
> >> > > suppose a couple of ovn-northd instances both try to grab a lock, with
> >> > > only the winner actually running, to avoid having both of them spend a
> >> > > lot of CPU time recomputing the southbound flow table.  A distributed
> >> > > version of OVSDB locks would probably work fine in practice for this,
> >> > > although occasionally, due to network propagation delays, "steal"
> >> > > operations, or client and server disagreeing about when a session has
> >> > > dropped, both ovn-northd instances might think they have the lock.
> >> > > (If, however, they combined this with "assert" when they actually
> >> > > committed their changes to the southbound database, then they would
> >> > > never actually interfere with each other in database commits.)
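> >> > >
> >> > > In Python, that pattern might look like the sketch below, assuming
> >> > > the ovs.db.idl bindings (set_lock() and has_lock mirror the C idl's
> >> > > ovsdb_idl_set_lock()); the schema path, remote, lock name, and
> >> > > recompute_flows() are all placeholders:
> >> > >
> >> > >     import time
> >> > >     import ovs.db.idl
> >> > >
> >> > >     def recompute_flows():
> >> > >         pass                  # stand-in for the expensive work
> >> > >
> >> > >     helper = ovs.db.idl.SchemaHelper("/path/to/ovn-sb.ovsschema")
> >> > >     helper.register_all()
> >> > >     idl = ovs.db.idl.Idl("unix:/path/to/ovnsb_db.sock", helper)
> >> > >     idl.set_lock("example_northd_lock")  # queue for the lock
> >> > >
> >> > >     while True:
> >> > >         idl.run()             # process updates and lock notifications
> >> > >         if idl.has_lock:
> >> > >             recompute_flows() # we believe we are the active instance
> >> > >         time.sleep(1)         # otherwise stay on standby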
> >> > >
> >> > > A scenario that would not work acceptably with distributed OVSDB locks,
> >> > > without a change to the model, is where the lock ensures correctness,
> >> > > that is, if two clients both think they have the lock then bad things
> >> > > happen.  I believe that this requires clients to understand a concept of
> >> > > leases, which OVSDB doesn't currently have.  The "steal" operation is
> >> > > also problematic in this model since it would require canceling a
> >> > > lease.  (This scenario also does not work acceptably with single-server
> >> > > OVSDB locks.)
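> >> > >
> >> > > The article below describes the missing piece in terms of fencing
> >> > > tokens: the lock service hands out a number that grows with every
> >> > > grant, and the protected resource rejects writes that carry a stale
> >> > > number.  A toy sketch of the resource side (not an OVSDB feature
> >> > > today):
> >> > >
> >> > >     class FencedStore(object):
> >> > >         """Rejects writes whose lock token is older than one seen."""
> >> > >         def __init__(self):
> >> > >             self.data = {}
> >> > >             self.highest_token = 0
> >> > >
> >> > >         def write(self, token, key, value):
> >> > >             if token < self.highest_token:
> >> > >                 raise Exception("stale lock token: write rejected")
> >> > >             self.highest_token = token
> >> > >             self.data[key] = value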
> >> > >
> >> > > I'd appreciate anyone's thoughts on the topic.
> >> > >
> >> > > This webpage is good reading:
> >> > >
> >> > > https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Ben.
> >>
> >> Hi Ben,
> >>
> >> If I understand correctly, you are saying that clustering wouldn't
> >> introduce any new restrictions on the locking mechanism compared with
> >> the current single-node implementation.  Both the new and the old
> >> approach support avoiding redundant work, but neither ensures
> >> correctness (unless "assert" or some other "fence" is used).  Is this
> >> correct?
> >
> > It's accurate that clustering would not technically introduce new
> > restrictions.  It will increase race windows, especially compared with
> > connections over Unix sockets, so anyone who is currently (incorrectly)
> > relying on OVSDB locking for correctness will probably start seeing
> > failures that they did not see before.  I'd be pleased to hear that no
> > one is doing this.
> 
> You discussed the ovn-northd use case in your original post (thanks!).
> 
> The existing Neutron integration use case should be fine.  In that
> case, it's not committing any transactions.  The lock is only used to
> ensure that only one server is processing logical switch port "up"
> state.  If more than one thinks it has the lock, the worst that can
> happen is we send the same port event through OpenStack more than
> once.  That's mostly harmless, aside from a log message.
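>
> A sketch of that pattern, reusing the has_lock idea above
> (notify_openstack() is a stand-in, and it only needs to be idempotent
> for the duplication to stay harmless):
>
>     def notify_openstack(port):
>         pass                      # stand-in for the OpenStack call
>
>     def maybe_report_port_up(idl, port):
>         # Only the instance that believes it owns the lock forwards
>         # the event; if two briefly both believe it, the event is
>         # just sent twice.
>         if idl.has_lock:
>             notify_openstack(port)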
> 
> Miguel mentioned that it might be used for an additional use case that
> Lucas is working on, but OVSDB locks are not used there.

OK, thanks.

My current patch series does not implement distributed locks, but now I
can start designing the feature.

