C.A.R. = Capacity of an Alfresco Repository server
If you think about it, an ECM repository is very similar to a database in terms of the events it manages and executes, especially if we treat a transaction as a basic repository operation such as create, browse, download, update or delete. To measure the capacity of an Alfresco repository server, we can introduce the concept of “transactions per second”, where a “transaction” is one of those basic repository operations.
The 'C.A.R.' methodology
The aim is to define a common, standard figure that empirically describes the capacity of a repository instance. The C.A.R. methodology is based on the following statement, and its result is expressed in transactions per second (TPS).
The capacity of an Alfresco repository server is determined by the number of transactions it can handle in a single second before performance degrades below what is expected.
To create a formula that reflects that statement we need to introduce three important figures:
EC = Expected concurrency, expressed as a number of users.
TT = User think time, expressed in seconds. It means that, on average, over a period of that length (the think time) the system receives requests from N different users.
ERT = Expected response times object.
This is a complex object represented as key-value pairs: the types of response time being considered, the weight of each type and the corresponding value in seconds. It takes the expected user behavior as arguments.
Decreasing the ERT argument values generally means increasing the capacity of the Alfresco repository server, normally by scaling (up or out) the Alfresco and database servers.
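As a rough illustration, an ERT object of this kind could be modelled as a small collection of entries, each holding a response-time type, a weight and an expected value in seconds. This is only a sketch: the names ErtEntry and weighted_ert, the operation types and every number below are hypothetical placeholders, not the sample values used in this post, and the weighted average is just one possible way to condense the entries into a single figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErtEntry:
    """One expected-response-time entry (hypothetical structure)."""
    operation: str   # type of response time, e.g. "create" or "download"
    weight: float    # relative share of this operation in the expected usage
    seconds: float   # expected response time, in seconds


def weighted_ert(entries):
    """Return the weight-averaged expected response time in seconds."""
    total_weight = sum(e.weight for e in entries)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(e.weight * e.seconds for e in entries) / total_weight


# Placeholder values for illustration only.
sample_ert = [
    ErtEntry("browse", weight=0.5, seconds=1.0),
    ErtEntry("download", weight=0.3, seconds=2.0),
    ErtEntry("create", weight=0.2, seconds=3.0),
]
```

Lowering any of the seconds values tightens the target, which is where the need for extra capacity comes from.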
With those three attributes (EC, TT and ERT) we can say that the C.A.R. of an Alfresco repository server is:
The number of transactions that the server can handle in one second under the expected concurrency (EC), with the agreed think time (TT), while ensuring the expected response times (ERT).
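To make the relationship between EC, TT and a measured C.A.R. more concrete, here is a minimal sketch. It assumes that every user issues one request per think-time interval and that each request maps to a fixed number of repository transactions; that tx_per_request parameter and both function names are hypothetical and are not part of the definition above.

```python
def offered_load_tps(expected_concurrency, think_time_s, tx_per_request=1.0):
    """Estimate the transactions per second generated by the user population.

    Assumes one request per user per think-time interval, and that each
    request maps to `tx_per_request` repository transactions (a hypothetical
    parameter introduced for this sketch).
    """
    if think_time_s <= 0:
        raise ValueError("think time must be positive")
    return expected_concurrency * tx_per_request / think_time_s


def has_capacity(car_tps, expected_concurrency, think_time_s, tx_per_request=1.0):
    """True when the measured C.A.R. covers the estimated offered load."""
    load = offered_load_tps(expected_concurrency, think_time_s, tx_per_request)
    return car_tps >= load
```

For instance, offered_load_tps(200, 20) evaluates to 10.0 under these assumptions, so a server would need a C.A.R. of at least 10 TPS, while still meeting its ERT, to serve that population.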
Shape shifting - a flexible formula approach
The C.A.R. formula is not fixed, as a single formula cannot reflect the variables of different use cases. It is dynamic and specific to each use case, and it is built on a system of attributes, values, weights and affected areas.
To arrive at a formula that really represents the capacity of the Alfresco servers on your infrastructure, you need to consider one or more ERT (expected response times) objects, each representing the expected response times for operations specific to your use case. Those objects act as increasers (+) and reducers (-) on the server throughput.
The formula can shift and be adapted with more ERT objects that define the use case, for fine-tuned predictions.
The Heartbeat of the Alfresco server
The easiest way to measure this is to enable the audit trail and parse that trail, counting the transactions that occur on the repository. With the new reporting and analysis features coming in Alfresco One 5.0, it will be even simpler to access this information.
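A minimal sketch of that computation follows. It assumes the audit entries have already been exported and reduced to one timestamp per repository operation; the ISO 8601 input format and the function names are assumptions made for illustration, since the real audit trail layout depends on how auditing is configured.

```python
from collections import Counter
from datetime import datetime


def transactions_per_second(timestamps):
    """Count audited operations per one-second bucket.

    `timestamps` is an iterable of ISO 8601 strings, one per operation
    (an assumed export format, not Alfresco's native audit layout).
    """
    buckets = Counter()
    for raw in timestamps:
        moment = datetime.fromisoformat(raw)
        # Truncate to whole seconds so operations in the same second share a key.
        buckets[moment.replace(microsecond=0)] += 1
    return buckets


def peak_and_average_tps(timestamps):
    """Return (peak TPS, average TPS over the observed time span)."""
    buckets = transactions_per_second(timestamps)
    if not buckets:
        return 0, 0.0
    span_s = (max(buckets) - min(buckets)).total_seconds() + 1
    return max(buckets.values()), sum(buckets.values()) / span_s
```

The peak value is the one to compare against the C.A.R., since the definition is about what the server handles in a single second; the average is mostly useful for spotting how bursty the load is.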
Some initial lab tests
We ran some simple lab tests with one server running Alfresco and another running the database. The Alfresco server and the test parameters were as follows.
Alfresco Server Details
- Processor: 64-bit Intel Xeon 3.3 GHz (quad-core)
- Memory: 8 GB RAM
- JVM: 64-bit Sun Java 7 (JDK 1.7)
- Operating System: 64-bit Red Hat Linux
- ERT = The sample ERT values shown in this post
- Think Time = 30 seconds.
- EC = 150 users
The C.A.R. of the server was between 10 and 15 TPS during usage peaks. With JVM tuning along with network and database optimizations, this number can rise above 30 TPS.
I think this is a very reliable way of defining the capacity of an Alfresco repository, and it can be used to support a sizing exercise. What do you think? Opinions are welcome and highly appreciated; use the comments area on the blog to add yours!
OpenSource - Together we are Stronger! One Love