AnsweredAssumed Answered

Deadlock in cluster during AcquireJobsCmd

Question asked by tomasz.lewandowski.lit on Dec 18, 2013
Latest reply on Dec 19, 2013 by jbarrez
I have a cluster of 2 Activiti 5.14 instances, sharing the same PostgreSQL 9.1 database. I have a process with timers.
From time to time I experience a deadlock during AcquireJobsCmd execution:


2013-12-17 12:00:01,083 Thread-96            ERROR [                   impl.interceptor.CommandContext] [                         close] [140] : Error while closing command context: org.apache.ibatis.exceptions.PersistenceException:
### Error updating database.  Cause: org.postgresql.util.PSQLException: BŁĄD: deadlock detected
  Szczegóły: Proces 17382 oczekuje na ShareLock na transakcja 1398139; zablokowany przez 21060.
Proces 21060 oczekuje na ShareLock na transakcja 1398140; zablokowany przez 17382.
  Wskazówka: Przejrzyj dziennik serwera by znaleźć szczegóły zapytania.
### The error may involve org.activiti.engine.impl.persistence.entity.JobEntity.updateTimer-Inline
### The error occurred while setting parameters
### SQL: update ACT_RU_JOB      SET REV_ =  ?,        LOCK_EXP_TIME_ = ?,        LOCK_OWNER_ = ?,        RETRIES_ = ?,        EXCEPTION_STACK_ID_ = ?,        EXCEPTION_MSG_ = ?,        DUEDATE_ = ?      where ID_= ?       and REV_ = ?
### Cause: org.postgresql.util.PSQLException: BŁĄD: wykryto zakleszczenie
  Szczegóły: Proces 17382 oczekuje na ShareLock na transakcja 1398139; zablokowany przez 21060.
Proces 21060 oczekuje na ShareLock na transakcja 1398140; zablokowany przez 17382.
  Wskazówka: Przejrzyj dziennik serwera by znaleźć szczegóły zapytania.
   at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:23)
   at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:147)
   at org.activiti.engine.impl.db.DbSqlSession.flushUpdates(DbSqlSession.java:560)
   at org.activiti.engine.impl.db.DbSqlSession.flush(DbSqlSession.java:444)
   at org.activiti.engine.impl.interceptor.CommandContext.flushSessions(CommandContext.java:170)
   at org.activiti.engine.impl.interceptor.CommandContext.close(CommandContext.java:117)
   at org.activiti.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:66)
   at org.activiti.spring.SpringTransactionInterceptor$1.doInTransaction(SpringTransactionInterceptor.java:47)
   at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130)
   at org.activiti.spring.SpringTransactionInterceptor.execute(SpringTransactionInterceptor.java:45)
   at org.activiti.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:31)
   at org.activiti.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:40)
   at org.activiti.engine.impl.cfg.CommandExecutorImpl.execute(CommandExecutorImpl.java:35)
   at org.activiti.engine.impl.jobexecutor.AcquireJobsRunnable.run(AcquireJobsRunnable.java:59)
   at java.lang.Thread.run(Thread.java:662)
Caused by: org.postgresql.util.PSQLException: BŁĄD: wykryto zakleszczenie
  Szczegóły: Proces 17382 oczekuje na ShareLock na transakcja 1398139; zablokowany przez 21060.
Proces 21060 oczekuje na ShareLock na transakcja 1398140; zablokowany przez 17382.
  Wskazówka: Przejrzyj dziennik serwera by znaleźć szczegóły zapytania.
   at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2103)
   at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1836)
   at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
   at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:512)
   at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
   at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:381)
   at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.postgresql.ds.jdbc23.AbstractJdbc23PooledConnection$StatementHandler.invoke(AbstractJdbc23PooledConnection.java:455)
   at com.sun.proxy.$Proxy57.execute(Unknown Source)
   at org.jboss.jca.adapters.jdbc.CachedPreparedStatement.execute(CachedPreparedStatement.java:297)
   at org.jboss.jca.adapters.jdbc.WrappedPreparedStatement.execute(WrappedPreparedStatement.java:404)
   at org.apache.ibatis.executor.statement.PreparedStatementHandler.update(PreparedStatementHandler.java:41)
   at org.apache.ibatis.executor.statement.RoutingStatementHandler.update(RoutingStatementHandler.java:66)
   at org.apache.ibatis.executor.SimpleExecutor.doUpdate(SimpleExecutor.java:45)
   at org.apache.ibatis.executor.BaseExecutor.update(BaseExecutor.java:108)
   at org.apache.ibatis.executor.CachingExecutor.update(CachingExecutor.java:75)
   at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:145)
   … 13 more


From PostgreSQL log I can see statements  that caused this deadlock:

        Proces 17382: update ACT_RU_JOB
             SET REV_ =  $1,
               LOCK_EXP_TIME_ = $2,
               LOCK_OWNER_ = $3,
               RETRIES_ = $4,
               EXCEPTION_STACK_ID_ = $5,
               EXCEPTION_MSG_ = $6,
               DUEDATE_ = $7
            where ID_= $8
              and REV_ = $9
        Proces 21060: update ACT_RU_JOB
             SET REV_ =  $1,
               LOCK_EXP_TIME_ = $2,
               LOCK_OWNER_ = $3,
               RETRIES_ = $4,
               EXCEPTION_STACK_ID_ = $5,
               EXCEPTION_MSG_ = $6,
               DUEDATE_ = $7
            where ID_= $8
              and REV_ = $9

(though I cant see the history of these transactions)

I suspect that there is a flaw in AcquireJobsCmd class. It selects a set of jobs ready to run and then updates one by one these jobs. But the select doesnt specify "order by" so the order of rows and the order of updates is not guarantied, so i.e. if we have 2 jobs: A and B, then first instance of activiti may get rows in order A,B and the second instance B,A. First instance updates (locks) row A, second instance updates row B. Then first instance tries to update row A and the second instance tries to update row B. And there is a deadlock.

Is this a bug or I am missing something?

Outcomes