Clustered Locking

Document created by resplin Employee on Jun 6, 2015
Version 1

This page is obsolete.

The official documentation is at: http://docs.alfresco.com




DRAFT


Why is clustered locking required?


Currently there is no way to easily address some issues with clustered deployments:


  • accidental simultaneous bootstrap (install or update); and
  • identical configuration of the same Quartz jobs on every node, for example, two LDAP imports running at the same time.



These can be addressed, respectively, by:


  • starting only one server to bootstrap or apply patches; and
  • using the Quartz clustering options, or configuring schedules differently on each member of the cluster.



The Quartz support for clustering could be used, but it has several drawbacks:


  • it does not support connection pooling;
  • it has at least one known Spring issue (see AR-1054);
  • it makes configuration hard and uses its own sub-configuration file; and
  • it makes job definitions persistent, so they require more active management (e.g. job removal).

An alternative to clustered locking is to share job definitions via a Quartz JobStore implemented against Hibernate.
However, the Quartz JobStore interface is a large API to implement and understand in comparison to this approach.


Overall Design


A lock is held or shared by a lock owner, identified by a GUID. The GUID is assigned to a thread local if one is not already present. Application locks should not use this mechanism; they should lock as appropriate within the JVM.
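As a minimal sketch of the owner identity described above, a thread local can lazily assign a GUID the first time a thread asks for one. The class name `LockOwner` and the method `currentId` are hypothetical and are not part of the original design.

```java
import java.util.UUID;

/** Hypothetical holder for the per-thread lock owner GUID. */
final class LockOwner {
    // The GUID is created the first time a thread asks for its owner id.
    private static final ThreadLocal<String> OWNER_ID =
            ThreadLocal.withInitial(() -> UUID.randomUUID().toString());

    private LockOwner() {}

    /** Returns the GUID for the calling thread, assigning one if not already present. */
    static String currentId() {
        return OWNER_ID.get();
    }
}
```

Repeated calls on the same thread return the same identifier, so a thread that acquired a lock can later be recognised as its owner.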

The time at which a lock is acquired is recorded, and the lock has a fixed time-to-live. The time-to-live can be indefinite (TTL = 0), but this must be used with caution.

If an operation is not completed within the time-to-live period, it is possible but not guaranteed that the lock is still held. Another thread could have taken the lock, in which case the first thread would not be able to complete the work that was meant to be done while the lock was held.
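The time-to-live rule can be sketched as a small check. This is illustrative only: the class `LockExpiry`, the method `isExpired`, and the choice of milliseconds as the TTL unit are assumptions, since the original does not specify units.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: decides whether a recorded lock has outlived its time-to-live. */
final class LockExpiry {
    private LockExpiry() {}

    /**
     * @param acquiredAt when the lock was acquired
     * @param ttlMillis  time-to-live in milliseconds (assumed unit); 0 means indefinite
     * @param now        the current time
     */
    static boolean isExpired(Instant acquiredAt, long ttlMillis, Instant now) {
        if (ttlMillis == 0) {
            return false; // TTL = 0: held until explicitly released
        }
        return Duration.between(acquiredAt, now).toMillis() >= ttlMillis;
    }
}
```

Once `isExpired` returns true, another owner may take the lock, so the original holder can no longer assume it is protected.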

Each lock has a status. Requests for locks are also persisted for deadlock detection, to order locks, and for more complex lock sequencing.

The status can be one of:


  • SHARED
  • EXCLUSIVE
  • SHARED_REQUEST
  • EXCLUSIVE_REQUEST
  • FREE

The request status can be one of:


  • SHARED
  • EXCLUSIVE



A lock specifies the resource to lock as a dotted path of the form 'foo.bar.woof'. A request for this lock could be blocked by an existing lock on the same resource or on a higher-level resource (e.g. 'foo.bar' or 'foo'). Similarly, a request for 'foo' would have to wait for all subordinate locks to clear. A pending request could also prevent subordinate locks from being taken.
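The hierarchy above can be illustrated with two small helpers: one lists a resource and its ancestors from the highest level down, and one tests whether two resources lie on the same path. The class and method names are hypothetical. Whether an overlap actually blocks a request would also depend on the shared or exclusive status of the locks involved, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers for dotted resource names such as 'foo.bar.woof'. */
final class ResourcePaths {
    private ResourcePaths() {}

    /** Returns the ancestors and the resource itself, highest level first. */
    static List<String> lockOrder(String resource) {
        List<String> order = new ArrayList<>();
        int dot = resource.indexOf('.');
        while (dot >= 0) {
            order.add(resource.substring(0, dot)); // prefix up to this dot
            dot = resource.indexOf('.', dot + 1);
        }
        order.add(resource);
        return order;
    }

    /** True if the two resources are the same, or one is an ancestor of the other. */
    static boolean overlaps(String held, String requested) {
        return held.equals(requested)
                || requested.startsWith(held + ".")
                || held.startsWith(requested + ".");
    }
}
```

For 'foo.bar.woof', `lockOrder` yields foo, foo.bar, foo.bar.woof, so every caller touches the rows in the same sequence.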


Service


boolean getLock(String resource, long TTL)
Get the named lock, or return immediately if it cannot be acquired.

boolean getLock(String resource, long TTL, long lockWaitTime)
Get the named lock, waiting for the lockWaitTime if it is already held.

releaseLock(String resource)
Release the named lock.

commitLock(String resource)
Release the lock in the same TX.

rollbackLock(String resource)
Release the lock in another TX.

Implementation


Lock use must follow a pattern where exceptions are caught and the lock is released.
If the lock is not released, others will have to wait until the TTL expires. Utility and method interceptor support is intended to enforce this pattern.
That support code should be the only code to use the ClusteredLockService directly.
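A minimal sketch of that pattern follows. The interface restates the service methods listed earlier; the `void` return types, the utility class `ClusteredLocks`, the method `doWithLock`, and the use of `IllegalStateException` when the lock cannot be acquired are all assumptions made for illustration.

```java
import java.util.function.Supplier;

/** Service methods as listed in this page; the void return types are assumed. */
interface ClusteredLockService {
    boolean getLock(String resource, long ttl);
    boolean getLock(String resource, long ttl, long lockWaitTime);
    void releaseLock(String resource);
    void commitLock(String resource);
    void rollbackLock(String resource);
}

/** Hypothetical utility that runs work while holding a clustered lock. */
final class ClusteredLocks {
    private ClusteredLocks() {}

    static <T> T doWithLock(ClusteredLockService service, String resource,
                            long ttl, long lockWaitTime, Supplier<T> work) {
        if (!service.getLock(resource, ttl, lockWaitTime)) {
            throw new IllegalStateException("Could not acquire lock: " + resource);
        }
        boolean committed = false;
        try {
            T result = work.get();
            service.commitLock(resource);       // release in the same TX as the work
            committed = true;
            return result;
        } finally {
            if (!committed) {
                service.rollbackLock(resource); // work or commit failed: release in another TX
            }
        }
    }
}
```

Because the release happens in the `finally` path, a failure in the work does not leave other cluster members waiting for the TTL to expire.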


Initialisation


Make sure a row exists for each lock that can be acquired.


Lock Acquisition


In a new transaction


  1. Select the lock row by key.
  2. Check the current lock status.
    1. Fail if locked.
    2. Otherwise, select for update (in the order foo, foo.bar, foo.bar.woof). This may block on another lock set, but the fixed order should prevent deadlock.
    3. Recheck the status. This may block or cause other locks to be blocked.
      1. Fail if the lock is now held.
      2. Otherwise, update the row.
        1. Commit.

The caller may pause and retry (ending the TX first to release any database locks).

Return lock status


Commit


Attempt to release the lock in the same TX as the work that needs committing.
If the TX fails then release the lock in another transaction.


Rollback


Release the lock in a new transaction.


Release


  1. Select the lock row.
  2. Check the status (is the lock still held?).
  3. Select for update, as before.
  4. Recheck the status.
  5. Update: set the status to FREE.
  6. Commit.
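The acquisition and release sequences above can be modelled in memory to show the check, recheck, and update steps. This is only a single-JVM stand-in, not the database implementation: a `synchronized` block takes the place of select for update, and leaving the block takes the place of the commit. The class `InMemoryLockTable`, the `owner` field, millisecond timestamps, and the absence of a lock hierarchy or shared locks are all assumptions of the sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory model of the lock table (exclusive locks only, no hierarchy). */
final class InMemoryLockTable {
    enum Status { FREE, EXCLUSIVE }

    /** One row per lockable resource; fields are guarded by the row's monitor. */
    private static final class Row {
        Status status = Status.FREE;
        String owner;      // GUID of the lock owner (assumed column)
        long acquiredAt;   // epoch milliseconds (assumed unit)
        long ttlMillis;    // 0 = indefinite
    }

    private final Map<String, Row> rows = new ConcurrentHashMap<>();

    /** Initialisation: make sure a row exists for each lock that can be acquired. */
    void register(String resource) {
        rows.putIfAbsent(resource, new Row());
    }

    private Row select(String resource) {
        Row row = rows.get(resource);
        if (row == null) {
            throw new IllegalArgumentException("Unknown lock: " + resource);
        }
        return row;
    }

    /** A lock is held if it is not FREE and its time-to-live has not run out. */
    private static boolean isHeld(Row row, long now) {
        synchronized (row) {
            return row.status != Status.FREE
                    && (row.ttlMillis == 0 || now - row.acquiredAt < row.ttlMillis);
        }
    }

    private static boolean isHeldBy(Row row, String owner, long now) {
        synchronized (row) {
            return isHeld(row, now) && owner.equals(row.owner);
        }
    }

    boolean tryAcquire(String resource, String owner, long ttlMillis, long now) {
        Row row = select(resource);                 // select by key
        if (isHeld(row, now)) return false;         // check status: fail if locked
        synchronized (row) {                        // stand-in for select for update
            if (isHeld(row, now)) return false;     // recheck: fail
            row.status = Status.EXCLUSIVE;          // update
            row.owner = owner;
            row.acquiredAt = now;
            row.ttlMillis = ttlMillis;
            return true;                            // leaving the block models the commit
        }
    }

    boolean release(String resource, String owner, long now) {
        Row row = select(resource);                         // select
        if (!isHeldBy(row, owner, now)) return false;       // check: is the lock still held?
        synchronized (row) {                                // select for update as before
            if (!isHeldBy(row, owner, now)) return false;   // recheck
            row.status = Status.FREE;                       // update: set to FREE
            row.owner = null;
            return true;                                    // commit
        }
    }
}
```

The first, unlocked check lets a caller fail fast without contending for the row; only the recheck under the row lock is authoritative.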

Deadlock detection


Requires request tracking.


Schema


LockTable


  • resource
  • crc (for resource)
  • timestamp acquired
  • time-to-live
  • status

RequestTable
- as above


Method Interceptors


Three general interceptors


  • acquire an application lock when executing all methods
  • acquire a service lock when executing all methods
  • acquire a method lock when executing all methods

Three regular-expression interceptors: as above, but the specified locking is only required when the method name matches a regular expression.
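A regular-expression interceptor could look roughly like the following sketch, which uses a JDK dynamic proxy rather than any particular AOP framework. Everything here is hypothetical: the `ResourceLock` abstraction, the example `ImportService` interface, the `LockingInterceptor` class, and the choice to throw `IllegalStateException` when the lock is unavailable.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.regex.Pattern;

/** Hypothetical minimal lock abstraction used by the interceptor. */
interface ResourceLock {
    boolean acquire(String resource);
    void release(String resource);
}

/** Hypothetical service used only to demonstrate the interceptor. */
interface ImportService {
    String importUsers();
    String status();
}

/** Acquires a lock around calls whose method name matches a regular expression. */
final class LockingInterceptor implements InvocationHandler {
    private final Object target;
    private final ResourceLock lock;
    private final String resource;
    private final Pattern methodNames;

    private LockingInterceptor(Object target, ResourceLock lock, String resource, Pattern methodNames) {
        this.target = target;
        this.lock = lock;
        this.resource = resource;
        this.methodNames = methodNames;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (!methodNames.matcher(method.getName()).matches()) {
            return call(method, args); // name does not match: no locking required
        }
        if (!lock.acquire(resource)) {
            throw new IllegalStateException("Could not acquire lock: " + resource);
        }
        try {
            return call(method, args);
        } finally {
            lock.release(resource); // always release, even if the method throws
        }
    }

    private Object call(Method method, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the target's own exception
        }
    }

    /** Wraps the target in a proxy that locks the resource for matching method names. */
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target, ResourceLock lock, String resource, String regex) {
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface},
                new LockingInterceptor(target, lock, resource, Pattern.compile(regex)));
    }
}
```

For example, `LockingInterceptor.wrap(ImportService.class, service, lock, "ldap.import", "import.*")` would lock around `importUsers()` but leave `status()` unlocked. A general interceptor is the same thing with a pattern such as `.*`.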




Util Support


Utility methods to support executing code while holding a given lock.


V1


  • No lock hierarchy
  • Simple deadlock detection - one thread can hold one lock
  • Add machine IP address
  • Report blocking on Bootstrap lock during bootstrap
  • No need to register intent
