
Workflow execution is looping infinitely at a wait timer

Question asked by karanbir1994 on Nov 22, 2018

We have a workflow that runs a task to check for a record in the DB. When the record is not found, it waits for one minute and runs the task again. Below is an excerpt from our workflow.

<intermediateCatchEvent id="BHTimer" name="Wait 1 Minute">
  <incoming>BHNotActive</incoming>
  <outgoing>IsTickOpen</outgoing>
  <timerEventDefinition>
    <timeDuration xsi:type="tFormalExpression">PT1M</timeDuration>
  </timerEventDefinition>
</intermediateCatchEvent>
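For reference, PT1M in the timeDuration element above is an ISO 8601 duration meaning one minute. A minimal sketch to confirm the value using only the JDK (the class and method names here are hypothetical and not part of Activiti):

```java
import java.time.Duration;

// Hypothetical helper, not part of Activiti: parses the ISO 8601 duration
// string used in the timeDuration element and returns it in seconds.
class TimerDurationCheck {
    static long toSeconds(String isoDuration) {
        return Duration.parse(isoDuration).getSeconds();
    }

    public static void main(String[] args) {
        // The timer in the excerpt is configured with PT1M.
        System.out.println(toSeconds("PT1M")); // prints 60
    }
}
```

So the configured wait itself is one minute; the ~200 ms interval we observe does not come from this value.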

We noticed that this task keeps looping indefinitely even after the DB record is added. Strangely, the interval at which the task is executed drops from 1 minute to ~200 ms, causing millions of records to accumulate in the ACT_HI_ACTINST table. Below are the table statistics for one of many such processes in our system.

Within a few seconds, the event is executed thousands of times, and it continues indefinitely, creating millions of entries for the same job in the database tables “ACT_HI_ACTINST” & “ACT_RU_EXECUTION”.

Running the queries below returns millions of records:
1. select * from ACT_HI_ACTINST where PROC_INST_ID_ = 'f33c539a-dfe2-11e8-9d30-0050569941b2';
2. select * from ACT_RU_EXECUTION where PROC_INST_ID_ = 'f33c539a-dfe2-11e8-9d30-0050569941b2';

The following are the statistics of the Activiti tables at the time we hit the performance issues.

Table name | Number of records
Some of these processes become orphaned (the processes had not ended when close was issued). We also noticed an exception message in the act_ru_job table for such processes. The exception message column contains: "JobEntity [id=2786e249-dff6-11e8-a9c8-005056990bf2] was updated by another transaction concurrently".
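Our understanding is that this message comes from optimistic locking: each row carries a revision (the REV_ column in the Activiti tables), and an update only succeeds if the revision is still the one that was read. The sketch below is a simplified, in-memory model of that general technique, not Activiti's actual code; the class and method names are hypothetical, and whether two nodes really race like this in our cluster is an assumption we have not confirmed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of revision-based optimistic locking.
// All names are hypothetical; this is not Activiti's implementation.
class OptimisticLockSketch {

    // Stands in for a single job row with its revision column.
    static final class JobRow {
        private final AtomicInteger revision = new AtomicInteger(1);

        int revision() {
            return revision.get();
        }

        // Models "UPDATE ... SET REV_ = expected + 1 WHERE ID_ = ? AND REV_ = expected":
        // the update succeeds only if nobody changed the row since it was read.
        boolean update(int expectedRevision) {
            return revision.compareAndSet(expectedRevision, expectedRevision + 1);
        }
    }

    public static void main(String[] args) {
        JobRow job = new JobRow();

        // Both nodes read the same job at the same revision.
        int seenByNodeA = job.revision();
        int seenByNodeB = job.revision();

        // Node A commits first; node B's update is then stale and is rejected.
        System.out.println("node A update succeeded: " + job.update(seenByNodeA)); // true
        System.out.println("node B update succeeded: " + job.update(seenByNodeB)); // false
    }
}
```

In this model, the rejected second update corresponds to the "was updated by another transaction concurrently" error: the second writer is working from a stale copy of the row.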

We have a purge job that removes data for completed processes (processes that have end_time_ populated in the act_hi_procinst table), but these processes are never deleted because they never end and keep looping indefinitely.

We have examined our workflow and do not see any parallel execution paths, so we are not sure why this error occurs. One thing to note is that this is deployed in a 2-node cluster environment. Could both nodes be picking up the process for execution at the same time?

Our questions are:

1. How does Activiti make process execution cluster-safe? Is there any cluster-specific configuration?

2. Are the workflows that we generated using the designer flawed? Please have a look at the attached workflow XML and diagram, and advise.

This is causing serious performance issues in our production environment. Any help in resolving this is much appreciated.

Workflows are generated using the BPMN Activiti Designer.

Activiti version: 5.17.0

Database: Oracle

Web server: Tomcat