AnsweredAssumed Answered

Query Scalability

Question asked by brianshowers on May 31, 2012
Latest reply on Jun 6, 2012 by jbarrez
I'm currently in evaluation mode to decide if Activiti is the right BPM solution for our project.  From an API perspective, everything looks great so far.  The APIs are intuitive, and the Spring integration is simple.  However, I've noticed a problem with respect to scalability that is concerning.  I'm hoping that I just have something configured poorly.  I've performed the test with two different database servers and I've received similar results with both.

Tomcat, MySQL 5.5
Tomcat, PostgreSQL 9.1

Test Setup
1. Create 100,000 process instances for a process definition that contains a user task
2. Repeatedly query for the first 10 unassigned tasks ordered by priority and due date
3. Create 900,000 additional process instances (1MM total)
4. Repeatedly query (50 times) for the first 10 unassigned tasks ordered by priority and due date

The problem is that the task queries appear to load the full task list into memory before limiting the set down to the first 10.  This manifests itself in two ways.  First, the query time jumps from ~1 sec to ~11 sec when the number of rows increases.  Second, memory is severly impacted.  With 100,000 tasks, it take ~120MB of heap space to find the 10 tasks.  While this might be acceptable for a system with a single user, it won't scale at all to a system that has hundreds of concurrent users.  When the number of tasks grows to 1MM, it takes ~800MB of ram to complete a query.  This is unacceptable even for a system with a single user.

In my tests, the application server and the database server are on the same machine.  I suspect that if I moved to a more production equivalent setup, the query times would get far worse due to the network latency of moving so much data across the wire.

For reference, I'm querying for the tasks with the following code:
taskService.createTaskQuery().taskUnnassigned().orderByTaskPriority().desc().orderByDueDate().asc().listPage(0, num);

I've done some tracing through the activiti library, and it looks like it is using the Mybatis RowBounds construct to manage pagination at the JDBC driver level rather than in the SQL via LIMIT/OFFSET.  Is there something specific that I need to add into the DataSource configuration that will make these queries more performant?  Or is there someway to configure the underlying Mybatis library so that it will translate a RowBounds object into actual SQL fragments?