Skip navigation
All Places > Alfresco Premier Services > 2017 > May
2017

Troubleshooting application performance can be tricky, especially when the application in question has dependencies on other systems.  Alfresco's Content Services (ACS) platform is exactly that type of application.  ACS relies on a relational database, an index, a content store, and potentially several other supporting applications to provide a broad set of services related to content management.  What do you do when you have a call that is slow to respond, or a page that seems to take forever to load?  After several years of helping customers diagnose and fix these kinds of issues, my advice is to start at the bottom and work your way up (mostly).

 

When faced with a performance issue in Alfresco Content Services, the first step is to identify exactly what calls are responding slowly.  If you are using CMIS or the ACS REST API, this is simple enough, you'll know which call is running slowly and exactly how you are calling it.  It's your code making the call, after all.  If you are using an ADF application, Alfresco Share or a custom UI it can become a bit more involved.  Identifying the exact call is straightforward and you can approach this the same way you would approach it for any web application.  I usually use Chrome's built in dev tools for this purpose.  Take as an example the screenshot below, which shows some of the requests captured when loading the Share Document Library on a test site:

 

 

In this panel we can see the individual XHR requests that Share uses to populate the document library view.  This is the first place to look if we have a page loading slowly.  Is it the document list that is taking too long to load?  Is it the tag list?  Is it a custom component?  Once we know exactly what call is responding slowly, we can begin to get to the root of our performance issue.

 

When you start troubleshooting ACS performance, it pays to start at the bottom and work your way up.  This usually means starting at the JVM.  Take a look at your JVM stats with your profiler of choice.  Do you see excessive CPU utilization?  How often is garbage collection running?  Is the system constantly running at or close to your maximum memory allocation?  Is there enough system memory available to the operating system to support the amount that has been allocated to the JVM without swapping?  It is difficult to provide "one size fits all" guidance for JVM tuning as the requirements will vary based on the the type of workload Alfresco is handling. Luis Cabaceira has provided some excellent guidance on this subject in his blog.  I highly recommend his series of articles on performance, tuning and scale.  When troubleshooting ACS performance, start by ensuring you see healthy JVM behavior across all of your application tiers.  Avoid the temptation to just throw more memory at the problem, as this can sometimes make things worse.

 

Unpacking Search Performance

 

Assuming that the JVM behavior looks normal, the next step is to look at the other components on which ACS depends.  There are three main subsystems that ACS uses to read / write information:  The database, the index, and the content store.  Before we can start troubleshooting, we need to know which one(s) are being used in the use case that is experiencing a performance problem.  In order to do this, you will need to know a bit about how search is configured on your Alfresco installation.  Depending on the version you have installed, Alfresco Content Services (5.2+) / Alfresco One (4.x / 5.0.x / 5.1.x) supports multiple search subsystem options and configurations.  It could be Solr 1.4, Solr 4, Solr 6 or your queries could be going directly against the database.  If you are on on Alfresco 4.x, your system could also be configured to use the legacy Lucene index subsystem, but that is out of scope for this guide.  The easiest way to find out which index subsystem is in use is to look at the admin console.  Here's a screenshot from my test 5.2 installation that shows the options:

 

 

Now that we know for sure which search subsystem is configured, we need to know a little bit more about search configuration.  Alfresco Content Services supports something known as Transactional Metadata Queries.  This feature was added to supplement Solr for certain use cases.  The way Solr and ACS are integrated is "eventually consistent".  That is to say that content added to the repository is not indexed in-transaction.  Instead, Solr queries the repository for change sets, and then indexes those changes.  This makes the whole system more scalable and performant when compared with the older Lucene implementation, especially where large documents are concerned.  The drawback to this is that content is not immediately queryable when it is added.  Transactional Metadata Queries work around this by using the metadata in the database to perform certain types of queries, allowing for immediate results.  When troubleshooting performance, it is important to know exactly what type of query is executed, and whether it runs against the database or the index.  Transactional metadata queries can be independently turned on or off to various degrees for both Alfresco Full Text Search and CMIS.  To find out how your system is configured, we can again rely on the ACS admin console:

 

The full scope of Transactional Metadata Queries is too broad for this guide, but everything you need to know is in the Alfresco documentation on the topic.  Armed with knowledge of our search subsystem and Transactional Metadata Query configurations, we can get down to the business of troubleshooting our queries.  Given a particular CMIS or AFTS query, how do we know if it is being executed against the DB or the index?  If this is a component running a query you wrote, then you can look at the Transactional Metadata Query documentation to see if Alfresco would try to run it against the database.  If you are troubleshooting a query baked into the product, or you want to see for sure how your own query is being executed, turn on debug logging for class DbOrIndexSwitchingQueryLanguage.  This will tell you for sure exactly where the query in question is being run.

 

The Database

 

If you suspect that the cause may be a slow DB query, there are several ways to investigate.  Every DB platform that Alfresco supports for production use has tools to identify slow queries.  That's a good place to start, but sometimes it isn't possible to do because you, as a developer or ACS admin, don't have the right access to the DB to use those tools.  If that's the case you can contact your DBA or you can look at it from the application server side.  To get the app server view of your query performance you again have a few options.  You could use a JDBC proxy driver like Log4JDBC or JDBCSpy that can output query timing to the logs.  It seems that Log4JDBC has seen more recent development so that might be the better choice if you go the proxy driver route.  Another option is to attach a profiler. JProfiler and YourKit both support probing JDBC performance.  YourKit is what we use most often at Alfresco, and here's a small example of what it can show us about our database connections:

 

 

With this view it is straightforward to see what queries are taking the most time.  We can also profile DB connection open / close and several other database related bits that may be of interest.  The ACS schema is battle tested and performant at this point in the product lifecycle, but it is fairly common to see slow queries as a result of a database configuration problem, an overloaded shared database server, poor network connection to the database, out of date index statistics or a number of other causes.  If you see a slow query show up during your analysis, you should first check the database server configuration and tuning.  If you suspect a poorly optimized query (which is rare) contact Alfresco support.

 

One other common source of database related performance woes is the database connection pool.  Alfresco recommends setting the maximum database connection pool size on each cluster node to the number of concurrent application server worker threads + 75 to cover overhead from scheduled jobs, etc.  If you have Tomcat configured to allow for 200 worker threads (200 concurrent HTTP connections) then you'll need to set the database pool maximum size to 275.  Note that this may also require you to increase the limit on the database side as well.  If you have a lot of requests waiting on a connection from the pool that is not going to do good things for performance.

 

The Index

 

The other place where a query can run is against the ACS index.  As stated earlier, the index may be one of several types and versions, depending on exactly what version of Alfresco Content Services / Alfresco One you are using and how it is configured.  The good news is that you can get total query execution time the same way no matter which version of Solr your Alfresco installation is using.  To see the exact query that is being run, how long it takes to execute and how many results are being returned, just turn on debug logging for class SolrQueryHttpClient.  This will output debug information to the log that will tell you exactly what queries are being executed and how long each execution takes.  Note that this is the query time as returned in the result set, and should just be the Solr execution time without including the round trip time to / from the server.  This is an important distinction, especially where large result sets are concerned.  If the connection between ACS and the search service is slow then a query may complete very quickly but the results could take a while to arrive back at the application server.  In this case the index performance may be just fine, but the network performance is the bottleneck.

 

If the queries are running slowly, there are several things to check.  Good Solr performance depends heavily on good underlying disk I/O performance.  Alfresco has some specific recommendations for minimum disk performance.  A fast connection between the index and repository tiers is essential, so make sure that any load balancers or other network hardware that sit between the tiers are providing good performance.  Another thing to check is the Solr cache configuration.  Alfresco's search subsystem provides a number of caches that improve search performance at the cost of additional memory.  Make sure your system is sized appropriately using the guidance Alfresco provides on index memory requirements and cache sizes.  Alfresco's index services and Solr can show you detailed cache statistics that you can use to better understand critical performance factors like hit rates, evictions, warm up times, etc as shown in this screenshot from Alfresco 5.2 with Solr 4:

 

 

In the case of a large repository, it might also help to take a deeper look at how sharding is configured including the number of shards and hosts and whether or not the configuration is appropriate.  For example, if you are sharding by ACL and most of your documents have the same permissions, then it's possible the shards are a bit unbalanced and the majority of requests are hitting a single shard.  For this case, sharding by DBID (which ensures an even distribution) might be more appropriate and yield better performance.

 

It is also possible that a slow running query against the index might need some tuning itself.  The queries that Alfresco uses are well optimized, but if you are developing an extension and want to time your own queries I recommend looking at the Alfresco Javascript Console.  This is one of the best community developed extensions out there, and it can show you execution time for a chunk of Alfresco server-side Javascript.  If all that Javascript does is execute a query, you can get a good idea of your query performance and tweak / tune it accordingly.

 

The Content Store

 

Of all of the subsystems used for storing data in Alfresco, the content store is the one that has (typically) the least impact on overall system performance.  The content store is only used when reading / writing content streams.  This may be when a new document is uploaded, when a document is previewed, or when Solr requests a text version of a document for indexing.  Poor content store performance can show itself as long upload times under load, long preview times, or long delays when Solr is indexing content for the first time.  Troubleshooting this means looking at disk utilization or (if the content store resides on a remote filesystem) network utilization.

 

Into The Code

 

A full discussion of profiling running code would turn this from an article into a book, but any good systems person should know how to hook up a profiler or APM tool and look for long running calls.  Many Alfresco customers use things like Appdynamics or New Relic to do just that.  Splunk is also a common choice, as is the open source ELK stack.  All of these suites can provide a lot more than just what a profiler can do and can save your team a ton of time and money.  Alfresco's support team also finds JStack thread dumps useful.  If we see a lot of blocked threads that can help narrow down the source of a problem.  Regardless of the tools you choose, setting up good monitoring can help you find emerging performance problems before they become user problems.

 

Conclusion

 

This guide is nowhere near comprehensive, but it does cover off some of the most common causes for Alfresco performance issues we have seen in the support world.  In the future we'll do a deeper dive into the index, repository and index tier caching, content transformer selection and execution, permission checking, governance services and other advanced topics in performance and scalability.

If you are serious about Alfresco in your IT infrastructure you most certainly have a "User Acceptance Tests" environment. If you don't... you really should consider setting one up (don't make Common mistakes)!

When you initially set up your environments everything is quiet, users are not using the system yet and you just don't care about data freshness. However, soon after you go live, the production system will start being fed with data. Basically it is this data, stored in Alfresco Content Service, that make your system or application valuable.

When the moment comes to upgrade or deploy a new customization/application, you will obviously test it first on your UAT (or pre-production or test, whatever you call it) environment. Yes you will!

When you do so, having a UAT environment that doesn't have an up-to-date set of data can make the tests pointless or more difficult to interpret. This is also true if you plan to do kick off performance tests. If the tests are done on a data set that is only one third the size of the production data, it's pointless.

Basically that's why you need to refresh your UAT data with production data every now and then or at least when you know it's going to be needed.

The scope of this document is not to provide you with a step by step guide on how to refresh your repository. Alfresco Content Services being a platform, this highly depends on what you actually do with your repository, the kind of customization you are using and 3rd parties apps that could be link to Alfresco anyhow. This document will mainly highlight things you should thoroughly check when refreshing your dataset.

 

Prerequisites

Some thing must be validated before going further:

  • Production & UAT environments should have the same architecture (same number of servers, same components installed, and so on and so forth...)
  • Production & UAT environments should have the same sizing (while you can forget this for functional tests only, this is a true requirement for performance tests of course)
  • Production & UAT environments should be hosted on different, clearly separated networks (mostly if using cluster)

 

What is it about?

In order to refresh your UAT repository with data from production you will simply go through the normal restore process of an Alfresco repository.

Here I consider backup strategy is not a topic... If you don't have proper backups already set up, that's where you should start: Performing a hot backup | Alfresco Documentation

      

The required assets to restore are:

  • Alfresco's database
  • Filesystem repository
  • Indexes

Before you start your fresh UAT

There a re a number of things you should check before starting your refreshed environment.

 

Reconfigure cluster

Although the recommendation is to isolate environments it is better to specify different cluster configuration for both environments. That will allow for a less confusing administration and log analysis and also prevent information leaking from one network to another in case isolation is not that good.

When starting a refreshed UAT cluster, you should always make sure you are setting a cluster password or a cluster name that is different from production cluster. Doing so you prevent yourself from cluster communication to happen between nodes that are actually not part of the same cluster:

alfresco.hazelcast.password=someotherpassword

Alfresco 4.2 onward

   
alfresco.cluster.name=uatCluster

Alfresco pre-4.2

   

On the Share side, it is possible to change more parameters in order to isolate clusters but we will still apply the same logic for the sake of simplicity. Here you would change the Hazelcast password in the custom-slingshot-application-context.xml configuration file inside the {web-extension} directory.

<hz:topic id="topic" instance-ref="webframework.cluster.slingshot" name="slingshot-topic"/>
   <hz:hazelcast id="webframework.cluster.slingshot">
     <hz:config>
       <hz:group name="slingshot" password="notthesamepsecret"/>
       <hz:network port="5801" port-auto-increment="true">
         <hz:join>
           <hz:multicast enabled="true" multicast-group="224.2.2.5" multicast-port="54327"/>
           <hz:tcp-ip enabled="false">
             <hz:members></hz:members>
           </hz:tcp-ip>
        </hz:join>
...

Email notifications

It's very unlikely that your UAT environment needs to send emails or notifications to real users. Your production system is already sending digest and other emails to users and you don't want them to get confused because they received similar emails from other systems. So you have to make sure emails are either:

  • Sent to a black hole destination
  • Sent to some other place where users can't see them

If you really don't care about emails generated by Alfresco, then you can choose the "black hole" option. There are many different ways to do that, among which configuring your local MTA to send all emails to a single local user and optionally link his mailbox to /dev/null (with postfix you could use canonical_maps directive and mbox storage). Another way  to do that would be to use the java DevNull SMTP server. It is very simple to use as it is just a jar file you can launch

java -jar -console -p 10025 DevNull.jar

On the other hand, as part of your users tests, you may be interested in knowing and analyzing what emails generated by your Alfresco instance. In this case you could still use previous options. Both are indeed able to store emails instead of swallowing them, postfix by not linking the mbox storage to /dev/null, and DevNull SMTP server by using the "-s /some/path/" option. However storing emails on the filesystem is not really handy if you want to check their content and the way it renders (for instance).

If emails is a matter of interest then you can use other products like mailhog or mailtrap.io. Both offer an SMTP server that stores emails for you instead of sending it to the outside world, but they also offer a neat way to visualize them, just like a webmail would do.

 

Mailhog WebUI

Mailtrap.io is a service that also offer advanced features like POP3 (so you can see emails in "real-life" clients), SPAM score testing, content analysis and for subscription based users, collaboration features.

 

Whatever option is yours, and based on the chosen configuration you'll have to switch the following properties for you UAT Alfresco nodes:

mail.host
mail.port
mail.smtp.auth
mail.smtps.auth
mail.username
mail.password
mail.smtp.starttls.enable
mail.protocol

Jobs & FSTR synchronisation

Alfresco allows an administrator to schedule jobs and setup replication to another remote repository.

Scheduled jobs are carried over from production environments to UAT if you cloned environments or proceeded to a backup/restore of production data. However you most certainly don't want the same job to run twice from two different environments.

Defining whether or not a job should run in UAT depends on a lot of factor and is very much related to what the job actually does. Here we cannot give a list of precise actions to take in order to avoid problem. It is the administrator's call to review Scheduled jobs and decide whether or not he should/can disable them.

Jobs to review can be found a in spring bean definitions file like ${extensionRoot}/extension/my-scheduler-context.xml.

One easy way to disable jobs can be to set a cron expression to a far future (or past)

 

<property name="cronExpression">
    <value>0 50 0 * * 1970</value>
</property>

The repository can also hold synchronization jobs. Mainly those jobs that are used in File Transfer Receiver setups.

In that case the administrator surely have to disable such jobs (or at least reconfigure them) as you do not want UAT frozen data to be synced to a remote location where live production data is expected!

Disabling this kind of jobs is pretty simple. You can do it using the Share web UI by going to the "Repository \ Data Dictionary \ Transfers \ Default Target group \ Group1" and edit properties of the "Group1" folder. In the property editor form, just untick the "Activated" checkbox.

 

Repository ID & Cloud synchronization

Alfresco repository IDs must be universally unique. And of course if you clone environments,  you create duplicated repository IDs. One of the well known issue that can be triggered by duplicate IDs is for hybrid cloud setups where synchronization is enabled between the production environment and the Cloud my.alfresco.com. If your UAT servers connect to the cloud with the production's ID you can be sure synchronization will fail at some point and could even trigger data loss on your production system. You really want to avoid that from happening!

One very easy way to prevent this from happening is to simply disable clouds sync on the UAT environment.

system.serverMode=UAT

Any string other than "PRODUCTION" can be used here. Also be aware that this property can only be set in alfresco-global.properties file

 

Also if you are using APIs that need to specify the repository ID in order to request Alfresco (like old CMIS endpoint used to) then such API calls may stop working in UAT as the repo ID is now the one from production (in the case the calls where initially written with a previous ID, and it is not gathered previously - which would be a poor  approach in most cases).

Starting with Alfresco 4.2, CMIS now returns the string "-default-" as a  repository ID, for all new API endpoints (e.g. atompub /alfresco/api/-default-/public/cmis/versions/1.1/atom), while previous endpoint (e.g. atompub /alfresco/cmisatom) returns a Universally Unique IDentifier.

 

If you think you need to change the repository ID, please contact Alfresco support. It makes the procedure heavier (a re-index is expected) and should be thoroughly planned.

 

Carrying unwanted configuration

If you stick to the best practices for production, you probably try to have all your configuration in properties or xml files in the {extensionRoot} directory.

But on the other hand, you may sometimes use the great facilities offered by Alfresco enterprise, such you as JMX interface or the admin console. You must then remember those tools will persist configuration information to the database. This means that, when restoring a database from one environment to another one, you may end up starting an Alfresco instance with wrong parameters. 

Here is a quite handy SQL query you can use *before* starting your new Alfresco UAT. It will report all the properties that are stored in the database. You can then make sure none of them is harmful or points to a production system.

SELECT APSVk.string_value AS property, APSVv.string_value AS value
  FROM alf_prop_link APL
    JOIN alf_prop_value APVv ON APL.value_prop_id=APVv.id
    JOIN alf_prop_value APVk ON APL.key_prop_id=APVk.id
    JOIN alf_prop_string_value APSVk ON APVk.long_value=APSVk.id
    JOIN alf_prop_string_value APSVv ON APVv.long_value=APSVv.id
WHERE APL.key_prop_id <> APL.value_prop_id
    AND APL.root_prop_id IN (SELECT prop1_id FROM alf_prop_unique_ctx);

                 property                 |                value
------------------------------------------+--------------------------------------
alfresco.port                            | 8084

Do not try to delete those entries from the database straight away this is likely to brake things!

   

If any property is conflicting with the new environment, it should be removed.

Do it wisely! An administrator should ALWAYS prefer using the "Revert" operations available through the JMX interface!

The "revert()" method is available using jconsole, in the Mbean tab, within the appropriate section:

revert property using jconsole

"revert()" may revert more properties than just the one you target. If unsure how to get rid of a single property, please contact alfresco support.

   

Other typical properties to change:

When moving from environment the properties bellow are likely to be different in UAT environment (that may not be the case for you or you may have others). As said earlier they should be set in the ${extensionroot} folder to a value that is specific to UAT (and they should not be present in database):

ldap.authentication.java.naming.provider.url
ldap.synchronization.java.naming.security.principal
ldap.synchronization.java.naming.security.credentials
solr.host
solr.port