
Alfresco Premier Services




Project Objective

In this blog we cover the deployment of ACS (Alfresco Content Services) 6.1 Enterprise on AWS. The deployment consists of two major steps:

  1. Building AWS AMIs (Amazon Machine Images) containing a base installation of ACS 6.1 Enterprise.
  2. Building AWS infrastructure (VPC, ELB, RDS, etc) and deploying ACS 6.1 Enterprise to it.


Please make sure your Alfresco license subscription entitles you to install and run ACS 6.1 Enterprise and Alfresco Search Services with Insight Engine.


The final ACS architecture looks like this:



The tools and code used in this blog to deploy ACS 6.1 Enterprise are not officially supported by Alfresco. They are used as a POC to show you how you can use open source tools to deploy and configure resources and applications to AWS in an automated way.


The software used to build and deploy ACS 6.1 Enterprise is available in a public repository in GitHub.


Software requirements

The following software is required to create and deploy resources in AWS:

  • Packer - used to automate the creation of AMIs.
  • Terraform - used to create and update infrastructure resources.
  • Ansible - an IT automation tool used to deploy and configure systems.
  • AWS CLI - The AWS Command Line Interface (CLI) is a unified tool to manage your AWS services.


Make sure these tools have been installed on the computer you are using to build and deploy ACS 6.1 Enterprise.


Authenticating to AWS

Both Packer and Terraform need to authenticate to AWS in order to create resources. Since we are creating a large number of infrastructure resources, the user we authenticate with needs to have administrator access.

There are multiple ways to configure AWS credentials for Packer and Terraform; see each tool's documentation for the details.
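One common approach, sketched here with placeholder values, is the AWS shared credentials file, which both Packer and Terraform read by default:

```ini
# ~/.aws/credentials -- placeholder values only, never commit real keys
[default]
aws_access_key_id     = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```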


Note: whatever method you use, make sure you keep your AWS credentials private and secure at all times.

Authenticating to Nexus

Nexus is a repository manager used by Alfresco to publish software artifacts. Packer will connect to the Nexus repository to download the software necessary to install ACS 6.1 Enterprise.

Alfresco Enterprise customers have Nexus credentials as part of their license subscription. Please contact your CRM if you don't know or don't have your Nexus credentials.


Building AWS AMIs

The first step to deploy ACS 6.1 Enterprise is to build two types of AMIs:

  1. Repository AMI - containing Alfresco Repository, Share and ADW (Alfresco Digital Workspace)
  2. Search Services AMI - containing Alfresco Repository and Search Services with Insight Engine.


Repository AMI

For this process we use Packer and Ansible. We first export the "nexus_user" and "nexus_password" environment variables containing the credentials to access the Nexus repository. These are stored in the ~/.nexus-cfg file, which contains the following:

export nexus_user=xxxxxxxxx
export nexus_password=xxxxxxxxx


Note that the .nexus-cfg file is in the user home folder; keep this file and its contents private and secure at all times.


If you want to include custom AMPs, add them to the amps and amps_share folders and they will be deployed to the AMI.

For custom jar files add them to the modules/platform and modules/share folders.


If you want to deploy ADW (Alfresco Digital Workspace) place the digital-workspace.war file in the acs-61-files/downloaded folder.


We can now execute Packer by calling the script.

cd acs-61-repo-aws-packer

This shell script will load the Nexus environment variables and call packer build, using a template file for the provisioning of the AMI and a variables file containing deployment-specific information such as your default VPC ID, the AWS region, etc.

Make sure you change the value of the vpc_id variable to use your default VPC ID.
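As an illustration, the variables file passed to packer build could look like the sketch below. Only vpc_id is named in the text; the file name and the aws_region key are assumptions, and the values are placeholders:

```json
{
  "vpc_id": "vpc-0123456789abcdef0",
  "aws_region": "eu-west-2"
}
```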


Search Services AMI

As in the previous section, we use Packer and Ansible to create a Search Services AMI.


Make sure you change the value of the vpc_id variable to your default VPC ID before running the script.

cd acs-61-repo-aws-packer


As the script runs you can see what it is doing during its execution...

▶ ./
amazon-ebs output will be in this color.

==> amazon-ebs: Prevalidating AMI Name: acs-61-repo-1557828971
amazon-ebs: Found Image ID: ami-00846a67
==> amazon-ebs: Creating temporary keypair: packer_5cda956b-bd62-1d09-cef2-639152741025
==> amazon-ebs: Creating temporary security group for this instance: packer_5cda956b-345b-2321-afd5-40b1b06a6bc1
==> amazon-ebs: Authorizing access to port 22 from in the temporary security group...
==> amazon-ebs: Launching a source AWS instance...
==> amazon-ebs: Adding tags to source instance
amazon-ebs: Adding tag: "Name": "Packer Builder"
amazon-ebs: Instance ID: i-0f80505eb56dccbb7
==> amazon-ebs: Waiting for instance (i-0f80505eb56dccbb7) to become ready...


On completion the script will output the ID of the newly created AMI. Keep track of both AMI IDs, as we will need them in the Terraform script next.

Build 'amazon-ebs' finished.

==> Builds finished. The artifacts of successful builds are:
--> amazon-ebs: AMIs were created:
eu-west-2: ami-08fd6196500dbcb01


Building the AWS Infrastructure and Deploying ACS 6.1 Enterprise

Now that we have created both the Repository and the Search Services AMIs we can start building the AWS infrastructure and deploy ACS 6.1 Enterprise.

In the acs-61-aws-terraform folder we have the terraform.tfvars file containing configuration specific to the AWS and ACS deployments.


Some of the variables that will need to be updated are:

  • resource-prefix - a prefix used in the names of all created resources so you can identify the resources belonging to this deployment.
  • aws-region
  • aws-availability-zones
  • vpc-cidr
  • autoscaling-group-key-name
  • s3-bucket-location


And of course we need to set the auto scaling image IDs using the newly generated AMIs:

  • autoscaling-repo-group-image-id
  • autoscaling-solr-group-image-id
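Putting the variables above together, a terraform.tfvars sketch might look like this. The variable names come from the list above; every value is a placeholder to adapt to your own AWS account (the repo AMI ID here reuses the example from the Packer output):

```hcl
# Hypothetical terraform.tfvars -- adjust all values for your environment
resource-prefix                 = "acs61"
aws-region                      = "eu-west-2"
aws-availability-zones          = "eu-west-2a,eu-west-2b"
vpc-cidr                        = "10.0.0.0/16"
autoscaling-group-key-name      = "my-key-pair"
s3-bucket-location              = "eu-west-2"
autoscaling-repo-group-image-id = "ami-08fd6196500dbcb01"
autoscaling-solr-group-image-id = "ami-0123456789abcdef0"
```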


Once the configuration has been set we are ready to start building the solution. We first need to initialize Terraform with the "terraform init" command:


▶ terraform init
Initializing modules...
- module.vpc
- module.rds
- module.alfresco-repo
- module.alfresco-solr
- module.bastion
- module.alb
- module.internal-nlb
- module.activemq

Initializing provider plugins...

The following providers do not have any version constraints in configuration,
so the latest version was installed.

To prevent automatic upgrades to new major versions that may contain breaking
changes, it is recommended to add version = "..." constraints to the
corresponding provider blocks in configuration, with the constraint strings
suggested below.

* version = "~> 2.10"

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.


We can now issue the apply command to start the build. Upon completion (it will take around 15 minutes) we will get a notification of the URLs available to connect to Alfresco.

Apply complete! Resources: 51 added, 0 changed, 0 destroyed.


Alfresco Digital Workspace =
Alfresco Share =
Alfresco Solr =
Alfresco Zeppelin =
RDS Endpoint =
VPC ID = vpc-006f0c6354656e96d5c


To destroy the resources issue a "terraform destroy" command.


Terraform will perform the following actions:

Plan: 0 to add, 0 to change, 51 to destroy.

Do you really want to destroy all resources?
Terraform will destroy all your managed infrastructure, as shown above.
There is no undo. Only 'yes' will be accepted to confirm.


To do list

There are a couple of things to add to this project:

  • CI/CD scripts - I have already implemented this and will publish it on a different blog.
  • On the Search Services instances we should download a backup of the Solr indexes when starting a new instance instead of building the indexes from scratch.

Being able to have a file edited concurrently by multiple users is a need we're coming across more and more when discussing with customers in the field.
At the time of writing, the out-of-the-box solution to deliver this kind of feature within Alfresco is to use the GoogleDocs module.
This module allows content stored in Alfresco to be collaboratively edited using Google's online application suite (text editor, spreadsheet editor, presentations) and saved back in the repository.
However, some customers may not want to use Google services for different reasons (e.g. cost or data sensitivity), in which case there are far fewer options.
If you’re concerned about your data being sent to a public cloud and prefer having them securely stored on-premise in Alfresco instead, there may be a solution to help you.

LibreOffice OnLine (LOOL)

Alfresco has used LibreOffice (and formerly OpenOffice) for a very long time. It is one of the components providing our out-of-the-box transformation service (either through OODirect or jodconverter).
After delivering an open source productivity suite on the desktop, the LibreOffice team has started working on a similar feature set with a more SaaS-like approach: LibreOffice OnLine (LOOL).
Don't get me wrong here: LOOL is not a SaaS solution you have to subscribe to, and that's the interesting thing about it. LOOL is a service you can install on-premise in order to provide editing tools for office documents. And of course, as that's our main interest here, it provides collaborative editing capabilities, while keeping your data in compliance with your company's security policy!
Here I'll detail how to integrate LOOL with your preferred on-premise content management system, thus bringing the collaborative editing feature inside Alfresco!

Alfresco integration

LibreOffice OnLine is actually a WOPI client and needs to talk to a WOPI server.

If you want to know more about the WOPI protocol you can check its definition here.

The WOPI server role will be played by Alfresco itself using 2 modules (one for the repo and one for Share). Those AMPs have been created by Magenta. All credit goes to them; here I'm just giving guidance on how to install and configure them for Alfresco Content Services.

In terms of network flows, the following diagram shows which connections are used:

network flows Alfresco Libreofice Online

In this document we'll use Alfresco Content Services 5.2.4.


LOOL installation

Fortunately it is now very simple to install LOOL (using the CODE distribution). The simple commands below should work for a Debian-based Linux distribution.
Throughout this document we'll use Debian 9.

$ echo 'deb ./' | sudo tee /etc/apt/sources.list.d/code.list
$ sudo apt-key adv --keyserver --recv-keys 0C54D189F4BA284D
$ sudo apt update
$ sudo apt install loolwsd code-brand

If you're just testing, you'll probably be interested in the Docker image available on Docker Hub.

LOOL configuration

By default the office online suite is configured to use SSL but the certificates are not provided. We then have to create those (or disable SSL if not targeting production).

$ sudo mkdir /etc/loolwsd/ssl

Copy to this newly created folder:

  • the certificate private key: /etc/loolwsd/ssl/loolwsd.key (make sure it's only readable by the user running LOOL: lool)
  • the certificate itself: /etc/loolwsd/ssl/loolwsd.crt
  • the public CA certificate: /etc/loolwsd/ssl/cacert.pem

If you want to use self-signed certificates, this is a bit trickier but doable. Start with the commands below to generate the self-signed certificate:

$ sudo openssl genrsa -out /etc/loolwsd/ssl/loolwsd.key
$ sudo chown lool /etc/loolwsd/ssl/loolwsd.key
$ sudo chmod 400 /etc/loolwsd/ssl/loolwsd.key
$ cp /etc/ssl/openssl.cnf /tmp/loolwsd_ssl.cnf
$ cat >> /tmp/loolwsd_ssl.cnf <<EOT
[ san ]
subjectAltName = @alt_names
[ alt_names ]
IP.1 =
EOT
$ sudo openssl req -new -x509 -sha256 -nodes -key /etc/loolwsd/ssl/loolwsd.key -days 9999 -out /etc/loolwsd/ssl/loolwsd.crt -config /tmp/loolwsd_ssl.cnf -extensions san

IP.1 must match the IP address where the LibreOffice Online service is available. Change it to match your needs.

Additionally, if you're using a self-signed certificate, the Alfresco JVM must trust this certificate.

$ keytool -importcert -alias lool -file /etc/loolwsd/ssl/loolwsd.crt -keystore $JAVA_HOME/lib/security/cacerts -storepass changeit -storetype JKS

Of course, the keystore path, type and password must match your environment.

The client browser must also trust this certificate!

If LOOL traffic is wrapped in SSL, you'll also need to have Alfresco protected by SSL. This is because most browsers today will prevent pages with mixed content (http & https) from being displayed.
It means you have to configure Alfresco for SSL. Please refer to the official documentation in order to do that.
Again, if you use a self-signed certificate (or a certificate from a private PKI) for Alfresco, you need to let LOOL trust that certificate. The way to do it depends on the distribution the service is running on. On Debian-like systems you can do:

$ keytool -exportcert -alias -keystore alf_data/keystore/ssl.keystore -storetype JCEKS -storepass kT9X6oe68t | openssl x509 -inform DER -outform PEM -in - | sudo tee /usr/local/share/ca-certificates/alfresco.crt
$ sudo update-ca-certificates

The example command above uses the default Alfresco keystore path and password, which you should have changed. Make sure to use the correct ones for your environment.

Now open the service configuration file /etc/loolwsd/loolwsd.xml and edit the ssl section as shown below:

<ssl desc="SSL settings">
 <enable type="bool" desc="Controls whether SSL encryption is enable (do not disable for production deployment). If default is false, must first be compiled with SSL support to enable." default="true">true</enable>
 <termination desc="Connection via proxy where loolwsd acts as working via https, but actually uses http." type="bool" default="true">false</termination>
 <cert_file_path desc="Path to the cert file" relative="false">/etc/loolwsd/ssl/loolwsd.crt</cert_file_path>
 <key_file_path desc="Path to the key file" relative="false">/etc/loolwsd/ssl/loolwsd.key</key_file_path>
 <ca_file_path desc="Path to the ca file" relative="false">/etc/loolwsd/ca-chain.cert.pem</ca_file_path>
 <cipher_list desc="List of OpenSSL ciphers to accept" default="ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH"></cipher_list>
 <hpkp desc="Enable HTTP Public key pinning" enable="false" report_only="false">
  <max_age desc="HPKP's max-age directive - time in seconds browser should remember the pins" enable="true">1000</max_age>
<report_uri desc="HPKP's report-uri directive - pin validation failure are reported at this URL" enable="false"></report_uri>
 <pins desc="Base64 encoded SPKI fingerprints of keys to be pinned">

Also edit the net section to match your needs:

<net desc="Network settings">
 <proto type="string" default="all" desc="Protocol to use IPv4, IPv6 or all for both">all</proto>
 <listen type="string" default="any" desc="Listen address that loolwsd binds to. Can be 'any' or 'loopback'.">any</listen>
 <service_root type="path" default="" desc="Prefix all the pages, websockets, etc. with this path."></service_root>
 <post_allow desc="Allow/deny client IP address for POST(REST)." allow="true">
 <host desc="The IPv4 private 192.168 block as plain IPv4 dotted decimal addresses.">192\.168\.[0-9]{1,3}\.[0-9]{1,3}</host>
 <host desc="Ditto, but as IPv4-mapped IPv6 addresses">::ffff:192\.168\.[0-9]{1,3}\.[0-9]{1,3}</host>
 <host desc="The IPv4 loopback (localhost) address.">127\.0\.0\.1</host>
 <host desc="Ditto, but as IPv4-mapped IPv6 address">::ffff:127\.0\.0\.1</host>
 <host desc="The IPv6 loopback (localhost) address.">::1</host>
 <frame_ancestors desc="Specify who is allowed to embed the LO Online iframe (loolwsd and WOPI host are always allowed). Separate multiple hosts by space."></frame_ancestors>

Pay attention to the post_allow element: its value has to match the IP addresses of your clients (all the browsers which may request to edit files). The default configuration is to allow local access and access from a 192.168 network.

The frame_ancestors element can usually be left alone, as the WOPI host (Alfresco) is part of it by default. However, there may be cases where setting it is required (e.g. if running on a non-default network port).

Now make sure to have your Alfresco host declared as an authorized WOPI host in the storage section:

<storage desc="Backend storage">
    <filesystem allow="false" />
    <wopi desc="Allow/deny wopi storage. Mutually exclusive with webdav." allow="true">
       <host desc="Allow from Alfresco" allow="true"></host>
        <max_file_size desc="Maximum document size in bytes to load. 0 for unlimited." type="uint">0</max_file_size>                                                                      
    <webdav desc="Allow/deny webdav storage. Mutually exclusive with wopi." allow="false">
        <host desc="Hostname to allow" allow="false">localhost</host>

The WOPI host specified must match your Alfresco hostname as used by web browsers.

On the Alfresco side add the following properties to the file:


Where loolhost is the server name where you installed LOOL, and alfrescohost the local server where Alfresco is running.
It is possible to install both on the same server, of course.

Only use FQDN names (matching the certificate's CN if using SSL); do not use localhost.

Use appropriate ports

Applying AMPs

There are 2 AMPs available. In order to turn Alfresco into a WOPI host you'll need the repo AMP; to add the necessary Share pages and buttons for UI integration, the Share AMP is needed.
We’ll first need to get the sources and build them:

$ git clone
$ cd alfresco-repo-libreoffice-online-module
$ vim pom.xml
$ mvn package

When editing the pom.xml make sure to:

  • set alfresco.platform.version & alfresco.share.version to your version in the repo pom.xml
  • set the Surf version property to the one matching your Alfresco version in the share pom.xml
  • set maven.alfresco.edition to enterprise

Copy the resulting .amp files located in target/ to the amps and amps_share folders of your alfresco installation and run:

$ ./bin/

You can now restart the services:

$ sudo systemctl restart loolwsd
$ ./ restart tomcat

You can now test editing Office documents simultaneously with different users and see how convenient LibreOffice OnLine makes it.

Below are examples of spreadsheet and presentation concurrent editing by the "Administrator" and "Alex" users:


Each user can see what the others are doing and who's editing.


calc collab

impress collab

As you can see in the screenshot above, the Share module needs some tweaking if you're not using an English locale. But that should really just be a matter of adding the right message bundle to the share AMP.



Posted by fmalagrino Employee Aug 20, 2018

Welcome to WCMQS Part II! If you didn't read part I so far, please do so for some more background information.


WCMQS is based on Spring Surf, and not Aikau as a few people thought.


What does it mean that it is based on Spring Surf?


To render a new page, Spring Surf needs a page.xml and page.ftl, a template.xml and template.ftl, and one or more web scripts.
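For illustration, a minimal Surf page descriptor pairing a page with a template instance might look like this (the title and instance id are made up, not taken from WCMQS):

```xml
<!-- page.xml: declares the page and points at a template instance -->
<page>
   <title>Sample Page</title>
   <template-instance>sample-template</template-instance>
   <authentication>none</authentication>
</page>
```

The matching template.xml then maps that instance to its FreeMarker view (template.ftl), and the web scripts fill the template's regions.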


When you install WCMQS the Alfresco data dictionary is extended by a new content model named ws:webSiteModel.

The model is designed to be as generic as possible.

The model has the following new types:

  • ws:website type
  • ws:section type
  • ws:webroot type
  • ws:webassetCollectionFolder type
  • ws:webassetCollection type
  • ws:indexPage type
  • ws:article type
  • ws:image type
  • ws:visitorFeedback type
  • ws:visitorFeedbackSummary type
  • ws:publishqueueentry type
  • ws:webasset aspect


Let's look at these new types in detail:


The ws:website type:

Is derived from the cm:folder type and represents a website.

This type has properties that apply to an entire website, such as host name, port and context of the web application that delivers the website.

If you read part I, you should already know where we modified these properties: in the Quick Start data in Alfresco Share. The folder at Documents/Alfresco Quick Start/Quick Start Editorial/Live is of this type. You can see the host name, port and context in the folder's metadata.



Click "Edit Properties":


The ws:section type:

Is also derived from the cm:folder type and represents a section of a website.

The website is modeled as a tree of sections. 

A section defines an element in the website navigation and can contain child sections and web assets, such as the section's landing page, articles, collections of articles and images.

In the Web Quick Start data in Alfresco Share, the Documents/Alfresco Quick Start/Quick Start Editorial/root/blog folder is of this type.


The ws:webroot type:

Is derived from the ws:section type and represents the root of a website's tree of sections.

This type extends the section type, so a webroot folder *is* also a section.

In the Web Quick Start data in Alfresco Share, the Documents/Alfresco Quick Start/Quick Start Editorial/root folder is of this type.



Edit Properties:


The ws:webassetCollectionFolder type:

Is derived from the cm:folder type and is used to hold asset collections.

Each section folder has a webassetCollectionFolder below it named 'collections' in which all of that section's asset collections are placed.

In the Web Quick Start data in Alfresco Share, the Documents/Alfresco Quick Start/Quick Start Editorial/root/blog/collections folder is of this type.


The ws:webassetCollection type:

Is derived from the cm:folder type and represents an asset collection.

In the Web Quick Start data in Alfresco Share, the Documents/Alfresco Quick Start/Quick Start Editorial/root/blog/collections/latest.articles folder is of this type.



Edit Properties:


The ws:indexPage type:

Is derived from the cm:content type and represents the index page of a section (also known as the section's landing page).

In the Web Quick Start data in Alfresco Share, the Documents/Alfresco Quick Start/Quick Start Editorial/root/blog/index.html asset is of this type.



More --> Edit Properties --> All Properties:

The ws:article type:

Is derived from the cm:content type and represents any piece of text-based content such as a news article or a blog post.

The article type defines a few associations that allow an article to be linked with related articles and a couple of images.

In the Web Quick Start data in Alfresco Share, the Documents/Alfresco Quick Start/Quick Start Editorial/root/blog/blog1.html file is of this type.



More --> Edit Properties --> All Properties:


The ws:image type:

Is derived from the cm:content type and is used for general image assets.

In the Web Quick Start data in Alfresco Share, the Documents/Alfresco Quick Start/Quick Start Editorial/root/publications/Alfresco-Office.jpg file is of this type.




The ws:visitorFeedback type:

Is derived from the dl:dataListItem type and represents feedback that has been submitted by visitors to a website.

The intention is for it to be sufficiently generic to be useful for a number of different types of feedback, including comments, reviews, ratings, and questions.

Each website has a Share data list created for it into which items of visitor feedback are placed.

When first installed, Quick Start does not have any visitor feedback.

Submitting a comment on a blog post or a 'contact us' request from the website will create a node of this type in the repository.

To add a data list to your website you need to customize the site and drag and drop the Data Lists page onto the current site pages.




After you click "OK" you should see a new option in the menu called Data Lists.



If you use a blog or contact us



The ws:visitorFeedbackSummary type:

Is derived from the cm:object type and is used to record summary information about the visitor feedback received for a given asset, such as the number of comments received and the average rating given.


The ws:publishqueueentry type:

Is derived from the cm:object type and is used to record nodes that have been queued up for publishing.


In Share there are two workflows:

  • Review and Publish Section Structure 
  • Review and Publish

The first enables you to review and publish the structure of a section of the website. To publish a particular section, start a workflow on that section's index.html. It will publish that section folder, its collections folder, and its subsection folders as well.


The second enables you to review and publish web content. To publish the content you start the Review and Publish workflow.


The ws:webasset type:

Is derived from the cm:titled aspect and is used to mark any piece of content that is addressable through a website.

Among other things, a web asset (a node with the ws:webasset aspect) has two multi-valued NodeRef properties (ws:parentSection and ws:ancestorSections) that contain the identifier(s) of the section(s) in which  the asset is placed and its ancestor sections.

When an asset is created in, moved to, or removed from a section, these properties are updated to reflect the asset's new location.

This is done to make certain kinds of common queries very fast.
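For illustration only (the node reference is a made-up placeholder), a Lucene-style query for every web asset anywhere below a given section can then hit the aspect property directly instead of walking paths:

```
+ASPECT:"ws:webasset" +@ws\:ancestorSections:"workspace://SpacesStore/<section-node-uuid>"
```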


Other useful properties:

Order index:

Configures the order in which navigation links display.

The link with the lowest order index value appears right after the Home link, and the remaining links are added in ascending numerical order.

NB: this property comes from a mandatory aspect applied to the folder, called ws:ordered.



Exclude From Navigation:

It allows you to show or hide that section from the menu.

Hi everybody,


To make a label bold or give it a different color in APS, you need to create a new CSS class and add the new CSS properties inside it.


The Process in my example will have only one human task with four check-boxes.



Of the four check-boxes in the example, two should be bold and two normal but with blue as the color.


Let's first create the four check-boxes:


After this, click the Style tab and create a class name.


After you create the Classname you can create the style.


In the Style section let's create two classes: one to make the text bold, and one to make the rest of the checkboxes blue but not bold.
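As a sketch, assuming you named the classes bold-label and blue-label (both names and the label selector are my assumptions — APS may wrap the field markup differently, so adjust the selector to the rendered HTML):

```css
/* Hypothetical class names -- use whatever you entered as Classname in APS */
.bold-label label {
    font-weight: bold;
}
.blue-label label {
    color: blue;
}
```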


Now that you have created these, the next step is to add the classes to the fields that need them.


In my example, test and test3 will be bold, and test2 and test4 blue.


Edit your first checkbox, click style and add the class to make it bold (this is the class from the style definition).




Now let's do the same for the second checkbox and let's make it blue and not bold.



Now edit the third and fourth checkbox similar to number one and two to apply the style on the check-boxes.


Now it's time to run the app and test the configuration. If all was done correctly you should see two bold check-boxes and two blue check-boxes as a result, as shown in the next screenshot:



NB: This works only in Activiti-app at the moment. Workspace doesn't read any inline CSS or JS; if you want to use Workspace you'll need to create a custom control to do validation, hook into an event for that specific component, or treat it as a custom stencil.


If you want to download it, the app is available here.

Having a broad and efficient monitoring system is key! It allows corrective maintenance to happen ASAP and, if properly set up, it also allows for pro-active maintenance.

Miguel Rodriguez already touched on that topic in a very good post addressing it with ELK.

Here I'll focus on a different solution and scope.


Within Alfresco Content Services, one of the components which is often the poor cousin of monitoring is Solr. In the best case, Solr's HTTP interface is monitored along with its heap usage. Yet it needs a lot of care in order to make sure it works efficiently. Enhancing the monitoring of this component will let you detect problems sooner and fix them before they impact users. It will also help you draw a picture of how your search service evolves: Is it becoming slower than before? Is the number of segments increasing to a critical point? Are the indexes getting bigger for some reason? And last but not least, having deeper monitoring helps you with capacity planning tasks!


Here I'll explain why I wrote this little check for Solr and how to use it.


Choosing the monitoring system

As explained earlier there is already some material regarding monitoring Alfresco with ELK, so there was no use re-inventing the wheel. However, some of the things monitored here are not in the ELK setup described above, so I could have enriched the ELK project...

But at the root of this plugin is a customer request. That customer wanted to know when Solr is missing some content while indexing. He also wanted to receive alerts for that kind of event (which is not the primary role of an ELK stack). And more importantly, that customer was using an open source monitoring system called Centreon. This solution is compatible with (and originally derived from) the Nagios plugin system. That plugin system is a kind of de-facto standard with some specifications (Development Guidelines · Nagios Plugins) and many monitoring solutions support Nagios plugins. As a consequence it made sense to work on such a plugin, as I hope it can benefit some other people.

Nagios plugins offer a wide range of probes (JMX, HTTP, disk space, heap space, ...), some of which can be used out-of-the-box to monitor Alfresco Content Services and even provide basic checks for Solr. I won't focus on those plugins, as the goal here is to bring more monitoring capabilities than what already exists.

These plugins provide two features:

  • Alerts: events triggered based on defined thresholds
  • Performance data: metrics used to generate graphs


In this blog post I'll be showing how it all works with Centreon, as it supports both features, but any other system supporting Nagios checks should work in a similar way.


Plugin Installation


I assume the monitoring system is already up and running and won't present how to set it up (remember it should work with any system supporting Nagios checks).




The system must have Python 2.7 or higher (it should work with Python 3) and the appropriate Python libraries.
This plugin uses a library for Nagios plugins called nagiosplugin, as well as urllib3, both of which need to be installed.


On Debian-like systems the following should work:


$ sudo apt install python-nagiosplugin python-urllib3


Use python3-urllib3 & python3-nagiosplugin for systems using Python 3 by default.

Otherwise simply install them with pip:


$ sudo pip install nagiosplugin
$ sudo pip install urllib3


Plugin deployment


The plugin is available here on GitHub. You can clone the repo or just download the Python file. Then simply copy the file to the Nagios plugin directory, usually something like /usr/lib/nagios/plugins, and set the execution rights.


$ sudo cp /usr/lib/nagios/plugins && sudo chmod +x /usr/lib/nagios/plugins/


Setting up Monitoring

First of all I'd like to explain in a little more depth what the plugin does, so we better understand what kind of metrics we're tracking here.


Quick description

The monitoring plugin uses information gathered from the Solr status page and the Solr summary page. Both are relatively lightweight, and querying them on a regular basis should not impact performance to a noticeable extent.


Among the information we gather and monitor, some is covered in the existing Alfresco documentation. You can find out more by reading Unindexed Solr Transactions | Alfresco Documentation.

In addition to that, the plugin returns the total size of the index core, its number of documents and its number of segments.

Any of those metrics can trigger an alert, and all of them produce performance data.


Some others are not that well documented but are still important.

In the summary page Solr exposes data regarding caches. We know caches are very important for Solr to perform well. Undersize your caches and your search will be slow like an old dog. Oversize them and you may end up consuming way more expensive memory than you actually need.

Cache sizes are used in the overall memory requirement calculation for Solr. See the Alfresco documentation below:

Calculate the memory needed for Solr nodes | Alfresco Documentation

The plugin will report data on cache usage such as:

  • number of lookups (an incremental counter, reset periodically)
  • cache size (number of items in the cache)
  • evictions (accumulated evictions from the cache since server startup)
  • hitratio (accumulated ratio of hits vs misses)

The plugin will return warning or critical alerts based on thresholds passed as arguments. Those thresholds are applied to the hitratio only. This means you can track and graph all those metrics, but only the hitratio can trigger alerts (e.g. an email or text message, depending on the monitoring system configuration).
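To make the threshold behaviour concrete, here is a minimal sketch in Python of how a hitratio can be classified against warning and critical thresholds. The function name and default values are illustrative only, not the plugin's actual code:

```python
def cache_status(hitratio, warning=0.4, critical=0.2):
    """Classify a cache hitratio against thresholds (lower is worse).

    Mirrors the idea of the plugin's -w/-c options applied to the
    hitratio; names and defaults here are purely illustrative.
    """
    if hitratio < critical:
        return "CRITICAL"
    if hitratio < warning:
        return "WARNING"
    return "OK"

status = cache_status(0.35)  # a 35% hitratio -> "WARNING" with the defaults above
```
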

There are a number of caches included in the Solr summary pages:

  • /alfrescoPathCache: a cache used to speed up path queries (Alfresco-specific cache)
  • /alfrescoAuthorityCache: a cache used to compute permissions on search results (Alfresco-specific)
  • /queryResultCache: a generic Solr cache to store ordered sets of document IDs
  • /filterCache: a generic Solr cache to store unordered sets of document IDs


Handler data is also provided by the Solr summary page. Handlers are the HTTP endpoints Solr uses to handle requests... hence the name. The data provided for the handlers and used by the plugin is:

  • errors: number of queries which caused an error (500 HTTP status code)
  • timeouts: number of queries which could not be fulfilled before timeout
  • requests: overall number of requests
  • avgTimePerRequest: average time to fulfill a request
  • 75thPcRequestTime: maximum time it takes to reply to the 75% fastest requests
  • 99thPcRequestTime: maximum time it takes to reply to the 99% fastest requests

Request time data is typically useful to track how the search service evolves and to anticipate when scaling is needed, or to spot that something is running abnormally (e.g. slower disk access, or network latency). avgTimePerRequest is used to trigger alerts, while the percentile request times are only used for performance data.

A handler returning an error count higher than zero will always trigger a critical alert, and one returning a timeout will always trigger a warning alert.

While they are not really a problem (from an operations point of view), syntactically incorrect searches will increment the error counter and thus generate critical alerts. It can be useful to have such alerts to track improper use of the API or suspicious behaviours. However, alerts for this kind of event may not be appropriate in some environments, so this can be changed using a command line option (`--relaxed`), in which case only time-based alerts will be triggered, using the thresholds provided as parameters.
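The handler alerting rules described above can be sketched as follows. Again, this is a Python illustration with hypothetical names, not the plugin's actual code:

```python
def handler_status(errors, timeouts, avg_ms, warn_ms, crit_ms, relaxed=False):
    """Evaluate a handler's health following the rules described above:
    any error is critical and any timeout is a warning, unless a
    --relaxed-style flag is set, in which case only the
    avgTimePerRequest thresholds apply."""
    if not relaxed:
        if errors > 0:
            return "CRITICAL"
        if timeouts > 0:
            return "WARNING"
    if avg_ms >= crit_ms:
        return "CRITICAL"
    if avg_ms >= warn_ms:
        return "WARNING"
    return "OK"
```
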

There are three handlers exposed by Solr in the summary page, all three can be monitored:

  • /alfresco
  • /afts
  • /cmis


Plugin configuration

Now that we understand what the plugin can monitor, and it is deployed on the Centreon (Nagios or similar) server, let's take a look at what we need to do to set monitoring up.

In Centreon most configuration is done through a web interface, so first of all we'll log in to the web UI as admin. In Nagios the same configuration applies, but editing configuration files has to be done using a good old text editor (all configuration files should be located in /etc/nagios).


Check commands

With a Nagios-like system it all begins with adding a new command to the system. When deploying the plugin we copied it to the Nagios plugin directory (by default /usr/lib/nagios/plugins); in Centreon this directory is referred to as $CENTREONPLUGINS$.

So in the "Configuration\Commands\Checks" menu, click the "Add" button to add a new command and set it as shown below:



The command can be explained as follows:

  • --relaxed: do not trigger alerts if handlers report errors or timeouts
  • --host $HOSTADDRESS$: specify the Solr hostname (will be expanded from the Solr server configuration)
  • --port $_HOSTSOLRPORT$: specify the Solr port (will be expanded from the service template)
  • --scheme $_HOSTSOLRSCHEME$: specify the Solr HTTP scheme (will be expanded from the service template)
  • --admin $_HOSTSOLRADMINURL$: specify the Solr admin URL (will be expanded from the service template)
  • --monitor $_SERVICESOLRMONITOR$: specify the kind of element we want to monitor (handler, cache or index values)
  • --item "$_SERVICESOLRMONITORITEM$": specify the name of the item we monitor
  • -w $ARG1$: the warning threshold triggering warning alerts (value depends on what we want to monitor)
  • -c $ARG2$: the critical threshold triggering critical alerts (value depends on what we want to monitor)
  • $ARG3$: the name of the Solr core to monitor (will be expanded from the service template)
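For reference, the equivalent raw Nagios command definition would look something like the snippet below. The command name and the plugin filename are placeholders to adapt to your deployment:

```
define command {
    command_name    check_alfresco_solr
    command_line    $CENTREONPLUGINS$/<plugin>.py --relaxed --host $HOSTADDRESS$ --port $_HOSTSOLRPORT$ --scheme $_HOSTSOLRSCHEME$ --admin $_HOSTSOLRADMINURL$ --monitor $_SERVICESOLRMONITOR$ --item "$_SERVICESOLRMONITORITEM$" -w $ARG1$ -c $ARG2$ $ARG3$
}
```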


Service templates

Each item we want to monitor will be defined as a service attached to a host. In the end, we will need to attach one service per Solr host, per core, and per cache, handler or index metric we want to monitor.

This can lead to numerous services and can take long to configure. For this reason it is not desirable to trigger alerts for every single metric we have access to. For example, triggering alerts for "Alfresco Nodes in Index" doesn't make much sense, while "Alfresco Error Nodes in Index" does.

Nagios provides inheritance and template features, which are very handy to avoid duplicating configuration for each service. Concretely, it means we can define a service using templates which inherit values from each other. For instance, in order to define a service monitoring the "/queryResultCache" cache, we can define a general template for caches (e.g. setting $SOLRMONITOR$ and the Solr core name, together with the warning and critical thresholds), and a more specific template for the "/queryResultCache" (setting $SOLRMONITORITEM$) which inherits from the general one.


Below I explain how to set up monitoring for the Solr "/queryResultCache". For broader monitoring you should also create more services and service templates for the items you need. See the help message of the plugin (or the documentation on the git repo):


$ python &lt;plugin&gt;.py --help


So in the admin, let's go to the "Configuration\Services\Templates" section and create a first template called "alfresco-solr-caches":


  • Warning threshold is set so that a hitratio lower than 40% will trigger a warning alert
  • Critical threshold is set so that a hitratio lower than 20% will trigger a critical alert


Then we create a second, more specific, template called "alfresco-solr-queryResultCache" which inherits from the general one:


Here we just set the specific cache we want to monitor. Also note the "Template" field, which references the general template. Thanks to template inheritance we can leave the other fields blank and inherit their values.


Host Template

Just like we created service templates, we will create a host template. This is to avoid manually attaching the service templates we created to each individual Solr server we want to monitor.

In the Centreon admin UI navigate to "Configuration\Host\Template" and create a new template called "alfresco-solr" (for instance). Then go to the "Relations" tab of this host template and add the service templates we defined earlier that are relevant for you:


Click Save.


At this point we can define the host and monitoring will be ready. You can either create a new host (if it doesn't exist already) or just apply the newly defined host template to an existing host.


Doing so will add Solr-specific macros you will need to fill in to validate the host:



As an example, below is the monitoring service page of a Solr host:



And here are the graphs you can expect:

For index:


For FTS:


For a cache:


For a handler:


Of course monitoring doesn't fix anything by itself. It requires thoughtful configuration (to match your instance workload) and to be watched carefully by admins who know what to do in case of alerts.

For example, if your monitoring system reports "Alfresco Error Nodes in Index", your first action will probably be to trigger a "FIX" action in the Solr admin console.

For general Solr troubleshooting please refer to the Alfresco documentation here.

This article provides information on how to migrate a project using the Alfresco Maven SDK version 2.2.0 to the new version 3.0.


Before starting, I recommend reading the official SDK 3.0 documentation; the Alfresco Beta User guide is also a good reference.


Why upgrade to the new SDK


SDK version 3 gets rid of the compatibility matrix and supports the following Alfresco versions:

  • 4.2.X
  • 5.0.X
  • 5.1.X
  • 5.2.X


The SDK also supports the community release.


The targeted version is set by properties (alfresco.platform.version and alfresco.share.version), so it's easy to switch target versions.


Upgrading to the newest version of the SDK is easier than before, since the SDK version is set using the property alfresco.sdk.version.


SDK 3 now produces JAR files by default, but can also produce AMPs if you need them. All the archetypes have been revamped to follow the standard folder structure.


Hot reloading is available using HotSwapAgent or JRebel (you need a license for the latter). Both allow reloading classes and web resources, but JRebel is more powerful and also allows reloading Spring context files.

For more information about hot reloading and how to set it up, please refer to the official documentation.


The new version of the SDK is highly configurable. The Alfresco Maven plugin that handles running and testing your module offers 40 configurable parameters. The execution is controlled by the Maven command: "mvn install alfresco:run"


Migrating your project


The SDK 3.0 revamped project structure introduces a major change since version 2.2.0, so it's no longer compatible. The recommended approach is to create a new project based on version 3.0 and migrate your project files to this new project.


Generate and configure your new project


The Maven archetypes are up to date and can be used to generate a new project compatible with the new version with the command "mvn archetype:generate -Dfilter=org.alfresco:"

Since we are talking about an old All-In-One project, we will use the all-in-one archetype (org.alfresco.maven.archetype:alfresco-allinone-archetype) in version 3.0.1.

Maven will take care of generating the folder structure and the default configuration.


By default, generated projects are configured to use the following Community versions:

  • Platform : 5.2.e
  • Share: 5.2.f


You can adjust the versions by updating the properties:

  • alfresco.platform.version
  • alfresco.share.version


If you want to use the Enterprise version you will need to update the property "maven.alfresco.edition", replacing "community" with "enterprise". You also need valid credentials for the alfresco-private repository to retrieve the artifacts.
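Putting these properties together, the relevant section of the generated pom.xml would look something like the snippet below. The version numbers are just examples:

```xml
<properties>
    <!-- SDK version used to build the project -->
    <alfresco.sdk.version>3.0.1</alfresco.sdk.version>
    <!-- "community" or "enterprise" -->
    <maven.alfresco.edition>community</maven.alfresco.edition>
    <!-- Target platform and Share versions -->
    <alfresco.platform.version>5.2.e</alfresco.platform.version>
    <alfresco.share.version>5.2.f</alfresco.share.version>
</properties>
```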


If you need to test your extension with external dependencies (such as RM), you can configure the Alfresco Maven plugin to install those modules on your platform or your Share when running locally.


In order to achieve this you will need to add the modules to the "platformModules" and "shareModules" lists in the configuration section of the "alfresco-maven-plugin".


For example, if you want to add Records Management Community version 2.6.0 to your platform and Share you need to add:

  • In the platformModules section:









  • In the shareModules section:









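As an illustration, the two sections could look like the snippet below. The RM artifact IDs shown are assumptions on my part; check the exact coordinates on the Alfresco Nexus repository:

```xml
<plugin>
    <groupId>org.alfresco.maven.plugin</groupId>
    <artifactId>alfresco-maven-plugin</artifactId>
    <configuration>
        <platformModules>
            <moduleDependency>
                <groupId>org.alfresco</groupId>
                <artifactId>alfresco-rm-community-repo</artifactId>
                <version>2.6.0</version>
                <type>amp</type>
            </moduleDependency>
        </platformModules>
        <shareModules>
            <moduleDependency>
                <groupId>org.alfresco</groupId>
                <artifactId>alfresco-rm-community-share</artifactId>
                <version>2.6.0</version>
                <type>amp</type>
            </moduleDependency>
        </shareModules>
    </configuration>
</plugin>
```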
You can find this information on the Alfresco Nexus repository.

You can start Alfresco locally using the Alfresco Maven plugin to test your configuration.


Move your project to SDK 3.0 project


In the following section I will use an OOTB SDK 2.2.0 project generated with Maven:


$ mvn archetype:generate -Dfilter=org.alfresco.maven.archetype:


With the following parameters:


Archetype version: 2.2.0

Define value for property 'groupId': com.example

Define value for property 'artifactId': aio

[INFO] Using property: version = 1.0-SNAPSHOT

Define value for property 'package' com.example: :


The target project will be generated with Maven too, using the same command and the following parameters:

Archetype version: 3.0.1

Define value for property 'groupId': com.example

Define value for property 'artifactId': aio30

[INFO] Using property: version = 1.0-SNAPSHOT

Define value for property 'package' com.example: :


So our project to migrate holds two extension sub-modules:

  • aio-repo-amp: I will refer to the root of this sub-module as AIO_REPO_ROOT
  • aio-share-amp: I will refer to the root of this sub-module as AIO_SHARE_ROOT


And the new project also holds two extensions:

  • aio30-platform-jar: I will refer to the root of this sub-module as AIO30_PLATFORM_ROOT
  • aio30-share-jar: I will refer to the root of this sub-module as AIO30_SHARE_ROOT


Java Code


Java code is the easy part, since the location of Java classes is standard in Maven projects:

  • Copy AIO_REPO_ROOT/src/main/java/* to AIO30_PLATFORM_ROOT/src/main/java
  • Copy AIO_SHARE_ROOT/src/main/java/* to AIO30_SHARE_ROOT/src/main/java


Config folder (PROJECT_ROOT/src/main/amps/config)


When installing an AMP into an alfresco.war (or share.war), the content of this folder is expanded into <ROOT>/WEB-INF/classes. In the new SDK, such content goes into the resources folder, since the resources folder is equivalent to the classpath root.

  • Copy the content of AIO_REPO_ROOT/src/main/amps/config/* to AIO30_PLATFORM_ROOT/src/main/resources
  • Copy the content of AIO_SHARE_ROOT/src/main/amps/config/* to AIO30_SHARE_ROOT/src/main/resources


Web folder (PROJECT_ROOT/src/main/amps/web)


This folder is used to hold web resources (CSS, JS, ...). In the new SDK those kinds of files can go into two locations:

  • PROJECT_ROOT/src/main/resources/META-INF/resources
  • PROJECT_ROOT/src/main/assembly/web


The first option is compatible with delivering a JAR file or an AMP file. The second can only be used if you plan to deliver an AMP file. With the first option, your files will be compressed into the JAR file and cannot be overridden by another module or by someone with access to the filesystem. The second option is the opposite: the files will be expanded in your WAR file and can be overridden by another module or someone with access to the filesystem.


Since only option one is compatible in all cases, I highly recommend using that solution:

  • Copy the content of AIO_REPO_ROOT/src/main/amps/web/* to AIO30_PLATFORM_ROOT/src/main/resources/META-INF/resources
  • Copy the content of AIO_SHARE_ROOT/src/main/amps/web/* to AIO30_SHARE_ROOT/src/main/resources/META-INF/resources


In SDK 2.2.0 the file is located in the "amps" folder (PROJECT_ROOT/src/main/amps). In SDK 3.0 this file goes into the module/<module_name> folder.

  • Copy the file AIO_REPO_ROOT/src/main/amps/config/ to AIO30_PLATFORM_ROOT/src/main/resources/alfresco/module/<module_name>


Fixing path


In Spring context files some paths might depend on the artifactId. You need to fix those paths before testing your migration.

There are two ways to fix this:

  • Change the artifactId to reflect your old artifactId
  • Rename the folder with the new artifactId




In order to migrate our project from SDK 2.2.0 to SDK 3.0 we have to:

  • Copy AIO_REPO_ROOT/src/main/java/* to AIO30_PLATFORM_ROOT/src/main/java
  • Copy AIO_SHARE_ROOT/src/main/java/* to AIO30_SHARE_ROOT/src/main/java
  • Copy the content of AIO_REPO_ROOT/src/main/amps/config/* to AIO30_PLATFORM_ROOT/src/main/resources
  • Copy the content of AIO_SHARE_ROOT/src/main/amps/config/* to AIO30_SHARE_ROOT/src/main/resources
  • Copy the content of AIO_REPO_ROOT/src/main/amps/web/* to AIO30_PLATFORM_ROOT/src/main/resources/META-INF/resources
  • Copy the content of AIO_SHARE_ROOT/src/main/amps/web/* to AIO30_SHARE_ROOT/src/main/resources/META-INF/resources
  • Fix paths depending on the artifactId

It has now been a few days since DevCon 2018 wrapped up, and after digesting everything that happened I feel like it is a good time to write down a few thoughts.  In short, DevCon was AWESOME!  It's no secret that DevCon was always a favorite of our community, our partners and of course, the extended team at Alfresco.  This year's event felt like we never stopped doing it.  The attendees were engaged, the energy was high, and the sessions were informative and fun.  For that, we owe Kristen Gastaldo, Francesco Corti, Richard Esplin and their collaborators at the Order of the Bee a huge amount of gratitude.  Even though it was a reboot of a crowd favorite event, a lot has changed.


The Alfresco community has always been a driving force at DevCon, as it should be.  We are an open source company, after all, and when you are an open source company your community plays a pivotal role in your success.  This year that was on full display.  Not only were a huge number of our community superstars there in person, but even the talk selection was a joint effort between the community and the company.  The talks themselves were also a great mix of Alfrescians and our extended community, partners and customers.  If I recall correctly it was almost a neat split right down the middle.  Collaboration with the community was a running theme throughout the conference, and was the focus of one of the talks that wrapped up the first day.  Suffice to say, there are a TON of opportunities to get involved.  Whether you blog, write tutorials, create and contribute your own community extensions or work on an existing one, or submit pull requests to one of our numerous open source projects, it isn't hard to find a way to contribute.


As for the talks themselves, the content was of the quality we expect from our team and community.  Several people from the Alfresco Strategic Services team were among those selected to present, but unfortunately I didn't get to see all of them due to conflicts with my own speaking slots and other conference duties.  Jose Portillo spoke on Solr Sharding, Luis Cabaceira shared some great work he has been doing with Piergiorgio Lucidi on ManifoldCF, Richard McKnight covered some best practices for building Alfresco extensions, and Mohammed Gazal took his first turn on the stage to present some work on PDF templating.  Luckily the talks were all recorded, and will be posted in the coming weeks.  If you missed something the first time around, you'll have the whole event at your fingertips. 


One thing in particular that I liked about this year's DevCon was the mix between what is best practice today, what is coming from Alfresco, and a third category that you might call "the art of the possible".  Of course we had some great sessions on securing content with our governance tools, exporting content, building ACS extensions, tuning Solr and building resilient systems, but we also had a lot of forward looking sessions focused on AWS, ADF and other cutting edge stuff coming out of Alfresco.  I'd like to see this continue in the future.  It's important for us to share the best way to do things today as well as preparing ourselves and our community for the future.


With so many submissions not everybody was able to be included in the official conference agenda, but we are working to find ways to get that knowledge out into the world.  Perhaps as blog posts, or maybe some community webcasts?  Not sure yet but I'd love to hear how you would like us to continue the momentum that DevCon started.  I'm coming off this conference with a huge amount of energy, and can't wait to see what the rest of 2018 has in store! 

One of the recurring issues we see raised by customers is slow SQL queries.

Most of the time those are first witnessed through Share UI page load times or long-running CMIS queries.

Pinpointing those queries can be a pain, and this blog post aims at providing some help in the troubleshooting process.

We will cover:

  • Things to check in the first place
  • Different ways of getting information on query execution times
  • Isolating where the RDBMS is spending more time
  • Some tools and configuration to help proactively track this kind of issue

We hope this content will be useful in real life, but really, having a DBA who takes care of the Alfresco database is the best you can offer to your Alfresco application!

Preliminary checks

Before blaming the database, it's always a good thing to check that the database engine has appropriate resources in order to deliver good performance. You cannot expect any DB engine to perform well with Alfresco on limited resources. For PostgreSQL there are plenty of resources on the web about sizing a database cluster (here I use "cluster" in the PostgreSQL meaning, which is different from what we call a cluster in Alfresco).



Network latency can be a performance killer. Opening connections is quite an intensive process and if your network is lame, it will impact the application. Simple network tests can make sure the network is delivering a good enough transport layer. The ping utility is really the first thing to look at. A ping test between Alfresco and its DB server must show a latency under 1 ms on a directly connected network (Gb Ethernet), or between 1 and 5 ms if your DB server and the Alfresco server are connected through routed networks. A value around or above 10 ms is definitely not what Alfresco expects from a DB server.

alxgomz@alfresco:~$ ping -c 10 -s 500
PING ( 500(528) bytes of data.
508 bytes from icmp_seq=1 ttl=64 time=0.436 ms
508 bytes from icmp_seq=2 ttl=64 time=0.364 ms
508 bytes from icmp_seq=3 ttl=64 time=0.342 ms
508 bytes from icmp_seq=9 ttl=64 time=0.273 ms
508 bytes from icmp_seq=10 ttl=64 time=0.232 ms

--- ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 8997ms
rtt min/avg/max/mdev = 0.232/0.329/0.436/0.066 ms


Some more advanced utilities allow sending TCP packets, which are more representative of the actual time spent opening a TCP session (again, you should not have values above 10 ms):

alxgomz@alfresco:~$ sudo hping3 -s 1025 -p 80 -S -c 5
HPING (eth0 S set, 40 headers + 0 data bytes
len=44 ip= ttl=64 DF id=0 sport=80 flags=SA seq=0 win=29200 rtt=3.5 ms
len=44 ip= ttl=64 DF id=0 sport=80 flags=SA seq=1 win=29200 rtt=3.4 ms
len=44 ip= ttl=64 DF id=0 sport=80 flags=SA seq=2 win=29200 rtt=3.4 ms
len=44 ip= ttl=64 DF id=0 sport=80 flags=SA seq=3 win=29200 rtt=3.6 ms
len=44 ip= ttl=64 DF id=0 sport=80 flags=SA seq=4 win=29200 rtt=3.5 ms

--- hping statistic ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 3.4/3.5/3.6 ms


Overall network quality is also something to check. If your network devices spend their time reassembling out-of-order packets or retransmitting them, the application performance will suffer. This can be checked by taking a network dump during query execution and opening it with Wireshark.

In order to take the dump you can use the following tcpdump command:

alxgomz@alfresco:~$ tcpdump -ni any port 5432 and host -w /tmp/pgsql.pcap

Opening it in Wireshark should give you an idea very quickly. If you see a dump with a lot of red/black lines then it might be an issue and needs further investigation (those lines are colored this way if Wireshark has the right coloring rules applied).


RDBMS configuration

PostgreSQL comes with a relatively low-end configuration. This is intended to allow it to run on a wide range of hardware (or VM) configurations. However, if you are running your Alfresco (and its database) on high-end systems, you most certainly want to tune the configuration in order to get the best out of your resources.

This wiki page presents in detail many parameters you may want to tweak: Tuning Your PostgreSQL Server - PostgreSQL wiki

The first one to look at is the shared_buffers size. It sets the size of the area PostgreSQL uses to cache data, thus improving performance. Among all those parameters, some should be "in sync" with your Alfresco configuration. For example, by default Alfresco allows 275 Tomcat threads at peak time. Each of these threads should be able to open a database connection. As a consequence PostgreSQL (when installed using the installer) sets the max_connections parameter to 300. However, we need to understand that each connection will consume resources, and in the first place: memory. The amount of memory dedicated to a PostgreSQL process (which handles a SQL query) is controlled by the work_mem parameter. By default it has a value of 4MB, meaning we can calculate the amount of physical RAM needed by the database server in order to handle peak load:

work_mem * max_connections =  4MB * 300 = 1.2GB

Add the size of the shared_buffers to this and you'll have a good estimate of the amount of RAM Postgres needs to handle peak loads with default configuration. There are some other important values to fiddle with (like effective_cache_size,  checkpoint_completion_target, ...) but making sure those above are aligned with both your alfresco configuration and the hardware resources of your database host is really where to start (refer to the link above).
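As an illustration only (the values must be adapted to your actual hardware and workload), a postgresql.conf for a dedicated DB host with, say, 8 GB of RAM could contain:

```
max_connections = 300                 # matches Alfresco's 275 peak threads plus headroom
work_mem = 4MB                        # per-operation memory; 300 x 4MB = 1.2GB at peak
shared_buffers = 2GB                  # commonly around 25% of RAM on a dedicated host
effective_cache_size = 6GB            # estimate of memory available for disk caching
checkpoint_completion_target = 0.9
```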

A qualified DBA should configure and maintain Alfresco's database to ensure continuous performance and stability of the database. If you don't have a DBA internally there are also dozens of companies offering good services around postgreSQL configuration and tuning.


Monitoring

Monitoring is key in the troubleshooting process. Although monitoring will not give you the solution to a performance issue, it will help you get on the right track, so having monitoring in place on the DB server is important. If you can correlate an increasingly slow application with a global load increase on the database server, then you've got a good suspect. There are different things to monitor on the DB server, but you should at least have the bare minimum:

  • CPU
  • RAM usage
  • disk IO & disk space

Spikes in CPU and disk IO usage can be the sign of a table that grew large without appropriate indexes.

Spikes in used disk space can be explained by the RDBMS creating temporary work files due to a lack of physical memory.

Monitoring the RAM can help you anticipate disk cache memory starvation (PostgreSQL relies heavily on this kind of memory).

Alfresco has some tables that are known to potentially grow very large. A DBA should monitor their size, both in terms of number of rows and disk size. The query below is an example of how to do it:

SELECT table_name as tableName,
  (total_bytes / 1024 / 1024) AS total,
  row_estimate as rowEstimate,
  (index_bytes / 1024 / 1024) AS INDEX,
  (table_bytes / 1024 / 1024) AS TABLE
  FROM (
    SELECT *,
      total_bytes - index_bytes - COALESCE(toast_bytes, 0) AS table_bytes
    FROM (
      SELECT c.oid,
        nspname AS table_schema,
        relname AS TABLE_NAME,
        c.reltuples AS row_estimate,
        pg_total_relation_size(c.oid) AS total_bytes,
        pg_indexes_size(c.oid) AS index_bytes,
        pg_total_relation_size(reltoastrelid) AS toast_bytes
      FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
      WHERE relkind = 'r') a) a
  WHERE table_schema = 'public' ORDER BY total_bytes DESC;

Usually alf_prop_*, alf_audit_entry and possibly alf_node_properties are the tables that may appear in the resultset. There is no rule of thumb which dictates when a table is too large. It is more a matter of monitoring how the table grows over time.

Another useful thing to monitor is the creation/usage of temporary files. When your DB is not correctly tuned, or doesn't have enough memory allocated to it, it may create temporary files on disk if it needs to further work on a big resultset. This is obviously not as fast as working in-memory and should be avoided. If you don't know how to monitor that with your usual monitoring system, there are some good tools that help a DBA be aware of such things happening.

pgbadger is an open source tool which does that, among many other things. If you don't already use it, I strongly encourage you to deploy it!


Debugging: what takes long, and how long does it take?

Monitoring should have helped you pinpoint that your DB server is overloaded, making your application slow. This could be because it is undersized for its workload (in which case you would probably see a steady high resource usage), or it could be that some specific operations are too expensive for some reason. In the former case, there is not much you can do apart from upgrading either the resources or the DB architecture (but those are not topics I want to cover here). In the latter case, getting to know how long a query takes is really what people usually want to know. We accomplish that by using one of the debug options below.


At the PostgreSQL level

In my opinion, the best place to get this information is really the DB server. That's where we get to know the actual execution time, without accounting for network round-trip time and other delays. Moreover, PostgreSQL makes it very easy to enable this debug mode, and you don't even need to restart the server. There are many ways to enable debugging in PostgreSQL, but the one that's most interesting to us is log_min_duration_statement. By default it has a value of "-1", which means nothing will be logged based on its execution time. But for example, if we set in the postgresql.conf file:

log_min_duration_statement = 250

log_line_prefix = '%t [%p-%l] %q%u@%d '

Any query that takes more than 250 milliseconds to execute will be logged.

Setting the log_min_duration_statement value to zero will cause the system to log every single query. While this can be useful for debugging or for a temporary audit, it will not be very helpful here, as we really want to target slow queries only.

      If interested in profiling your DB have a look at the great pgbadger tool from Dalibo
The Alfresco installer sets the log_min_messages parameter to fatal by default. This prevents log_min_duration_statement from working. Make sure it is set back to its default value or to a value higher than LOG.

Then without interrupting the service, PostgreSQL can be reloaded in order for changes to take effect:

$ pg_ctl -D /data/postgres/9.4/main reload

adapt the command above to your needs with appropriate paths



This will produce output such as below:

2017-12-12 19:54:55 CET [5305-1] LOG: duration: 323 ms execution : select as prop_id,
      pv.actual_type_id as prop_actual_type_id,
      pv.persisted_type as prop_persisted_type,
      pv.long_value as prop_long_value,
      sv.string_value as prop_string_value
   from
      alf_prop_value pv
      join alf_prop_string_value sv on ( = pv.long_value and pv.persisted_type = $1)
   where
      pv.actual_type_id = $2 and
      sv.string_end_lower = $3 and
      sv.string_crc = $4

DETAIL: parameters: $1 = '3', $2 = '1', $3 = '4ed-f2f1d29e8ee7', $4 = '593150149'

Here we gather important information:

  1. The date and time the query was executed at the beginning of the first line.
  2. The process ID and session line number. As PostgreSQL forks a new process for each connection, we can map process IDs to pooled connections. A single connection may contain different transactions, which in turn will contain several statements. Each new statement processed increments the session line number
  3. The execution time on the first line
  4. The execution stage. A query is executed in several steps. With Alfresco making heavy usage of bind parameters, we will often see several lines for the same query (one for each step):
    1. prepare (when the query is parsed),
    2. bind (when parameters are replaced by their values and execution is planned),
    3. execution (when the query is actually executed).
  5. The query itself starting at the end of the first line and continuing on subsequent lines. Here it contains parameters and can't be executed as is.
  6. The bind parameters on the last line.

In order to get the overall execution time, we have to sum up the durations of the different steps. This may seem painful, but it delivers a fine-grained breakdown of the query execution. Most of the time, however, the majority of the time is spent in the execute stage; to better understand what's going on there, we need to dive deeper into the RDBMS (see the next chapter about explain plans).
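Summing the per-step durations can be scripted; below is a small Python sketch that aggregates the durations reported for each backend process. The sample lines are hypothetical, and the regular expression assumes the log_line_prefix configured earlier:

```python
import re

# matches the "[pid-line]" marker and the reported duration
LOG_RE = re.compile(r"\[(\d+)-\d+\].*duration: ([\d.]+) ms")

def total_duration_ms(lines):
    """Sum the parse/bind/execute durations per backend process id."""
    totals = {}
    for line in lines:
        m = LOG_RE.search(line)
        if m:
            pid, ms = m.group(1), float(m.group(2))
            totals[pid] = totals.get(pid, 0.0) + ms
    return totals

sample = [
    "2017-12-12 19:54:55 CET [5305-1] LOG: duration: 12 ms  parse <unnamed>: select ...",
    "2017-12-12 19:54:55 CET [5305-2] LOG: duration: 7 ms  bind <unnamed>: select ...",
    "2017-12-12 19:54:56 CET [5305-3] LOG: duration: 323 ms  execute <unnamed>: select ...",
]
print(total_duration_ms(sample))  # {'5305': 342.0}
```

In practice pgBadger does this kind of aggregation for you; the sketch only illustrates the arithmetic.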


At the application level

It is also possible to debug SQL queries in a very granular manner at the Alfresco level. However, it is important to note that this method is far more intrusive, as it requires adding extra jar files, modifying the configuration of the application and restarting the application server. It may not be well suited to production environments where any downtime is a problem.

Also note that execution times reported with this method include network round-trip time. In normal circumstances this should add only a few milliseconds, but it could be much more on a poor network.

To allow debugging at the application level we will use a JDBC proxy: p6spy.

The impact on application performance largely depends on the number of queries that end up being logged.


First of all, get the latest p6spy jar file from the GitHub repository.

Copy this file to the Tomcat lib/ directory and add p6spy's configuration file ( in the same location, containing the lines below:
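The listing itself did not survive publication here. Based on p6spy's documented options, a minimal that only logs statements slower than 250 ms could look like this (a sketch, not necessarily the author's exact file):

```properties
# write the SQL log to a flat file called spy.log
appender=com.p6spy.engine.spy.appender.FileLogger
logfile=spy.log
# only log statements taking longer than 250 ms, mirroring the
# log_min_duration_statement threshold used on the PostgreSQL side
executionThreshold=250
```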



This will mimic the behaviour we had previously when debugging with PostgreSQL, meaning only queries that take more than 250 milliseconds will be logged.

We then need to tweak the JDBC settings so that Alfresco uses the p6spy driver instead of the actual driver:
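The elided snippet presumably swaps the JDBC driver class and URL prefix in; with PostgreSQL this typically looks like the following (host, port and database name are illustrative):

```properties
# route all JDBC traffic through the p6spy proxy driver
db.driver=com.p6spy.engine.spy.P6SpyDriver
# p6spy wraps the real driver, selected from the URL after the p6spy: prefix
db.url=jdbc:p6spy:postgresql://localhost:5432/alfresco
```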




Alfresco must now be restarted. A new file called spy.log should then be created, containing lines like the one shown below:

1513100550017|410|statement|connection 14|update alf_lock set version = version + 1, lock_token = ?, start_time = ?, expiry_time = ? where excl_resource_id = ? and lock_token = ?|update alf_lock set version = version + 1, lock_token = 'not-locked', start_time = 0, expiry_time = 0 where excl_resource_id = 9 and lock_token = 'f7a21222-64f9-40ea-a00a-ef95052dafe9'

We find here values similar to what we had with PostgreSQL:

  1. The timestamp at which the query was executed on the application server
  2. The execution time in milliseconds
  3. The connection ID
  4. The query string without bind parameters
  5. The query string with the bind parameters evaluated
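These fields are pipe-separated, so a log line is easy to dissect programmatically; here is a Python sketch (the field order is assumed from the sample line above):

```python
def parse_spy_line(line):
    """Split a p6spy log line into its pipe-separated fields."""
    ts, elapsed, category, connection, prepared, sql = line.split("|", 5)
    return {
        "timestamp_ms": int(ts),
        "elapsed_ms": int(elapsed),
        "category": category,
        "connection": connection,
        "prepared": prepared,   # statement with ? placeholders
        "sql": sql,             # statement with values substituted
    }

line = ("1513100550017|410|statement|connection 14|"
        "update alf_lock set version = version + 1 where excl_resource_id = ?|"
        "update alf_lock set version = version + 1 where excl_resource_id = 9")
entry = parse_spy_line(line)
print(entry["elapsed_ms"])  # 410
```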


Understanding the execution plan

Now that we have pinpointed the problematic query (or queries), we can dive into PostgreSQL's logic and understand why it is slow. An RDBMS relies on its query planner to decide how to deal with a query.

The query planner makes decisions based on the structure of the query, the structure of the database (e.g. presence and types of indexes), and statistics the system maintains during its execution. The more accurate those statistics are, the more efficient the query planner will be.


Explain plans

In order to know what the query planner will do for a specific query, it is possible to run the query prefixed with the "EXPLAIN ANALYZE" statement.
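For example, with a hypothetical query against alf_node (the filter value is illustrative):

```sql
EXPLAIN ANALYZE
SELECT id, transaction_id
FROM alf_node
WHERE type_qname_id = 51;
```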

To make this chapter more hands-on, we'll proceed with an example. Let's consider a query issued while browsing the repository (getting node information based on the parent node). Using one of the methods above, we have identified this query, and running it prefixed with "EXPLAIN ANALYZE" returns the following:

                                                                                 QUERY PLAN
Nested Loop  (cost=6614.54..13906.94 rows=441 width=676) (actual time=1268.230..1728.432 rows=421 loops=1)
   ->  Hash Left Join  (cost=6614.26..13770.40 rows=441 width=639) (actual time=1260.966..1714.577 rows=421 loops=1)
         Hash Cond: (childnode.store_id =
         ->  Hash Right Join  (cost=6599.76..13749.84 rows=441 width=303) (actual time=1251.427..1704.866 rows=421 loops=1)
                Hash Cond: (prop1.node_id =
               ->  Bitmap Heap Scan on alf_node_properties prop1  (cost=2576.73..9277.09 rows=118749 width=71) (actual time=938.409..1595.742 rows=119062 loops=1)
                     Recheck Cond: (qname_id = 26)
                     Heap Blocks: exact=5205
                     ->  Bitmap Index Scan on fk_alf_nprop_qn  (cost=0.00..2547.04 rows=118749 width=0) (actual time=934.132..934.132 rows=119178 loops=1)
                           Index Cond: (qname_id = 26)
               ->  Hash  (cost=4017.52..4017.52 rows=441 width=232) (actual time=90.488..90.488 rows=421 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 83kB
                     ->  Nested Loop  (cost=0.83..4017.52 rows=441 width=232) (actual time=8.228..90.239 rows=421 loops=1)
                           ->  Index Scan using idx_alf_cass_pri on alf_child_assoc assoc  (cost=0.42..736.55 rows=442 width=8) (actual time=2.633..58.377 rows=421 loops=1)
                                 Index Cond: (parent_node_id = 31890)
                                 Filter: (type_qname_id = 33)
                           ->  Index Scan using alf_node_pkey on alf_node childnode  (cost=0.42..7.41 rows=1 width=232) (actual time=0.075..0.075 rows=1 loops=421)
                                 Index Cond: (id = assoc.child_node_id)
                                 Filter: (type_qname_id = ANY ('{142,24,51,200,204,206,81,213,97,103,231,104,107}'::bigint[]))
         ->  Hash  (cost=12.00..12.00 rows=200 width=344) (actual time=9.523..9.523 rows=6 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 1kB
               ->  Seq Scan on alf_store childstore  (cost=0.00..12.00 rows=200 width=344) (actual time=9.517..9.518 rows=6 loops=1)
   ->  Index Scan using alf_transaction_pkey on alf_transaction childtxn  (cost=0.28..0.30 rows=1 width=45) (actual time=0.032..0.032 rows=1 loops=421)
         Index Cond: (id = childnode.transaction_id)
Planning time: 220.119 ms
Execution time: 1728.608 ms

Although I have never faced it with PostgreSQL (more with Oracle), there are cases where the explain plan differs depending on whether you pass the query as a complete string or use bind parameters.

In that case, a parameterized query that shows up as slow in the SQL debug logs might appear fast when executed manually.

To get the explain plan of such queries, PostgreSQL has a loadable module which can log explain plans the same way we logged durations with log_min_duration_statement. See the auto_explain documentation for more details.
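A typical postgresql.conf snippet for auto_explain looks like this (the threshold is chosen to match the 250 ms used earlier):

```ini
session_preload_libraries = 'auto_explain'   # load the module for new sessions
auto_explain.log_min_duration = '250ms'      # log plans of statements slower than this
auto_explain.log_analyze = true              # include actual run times in the plan
```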


We can indeed see that the query takes rather long just by looking at the planning and execution times at the end of the output. However, reading the full explain plan, and more importantly understanding it, can be challenging.

In the explain plan, PostgreSQL breaks the query down into "nodes". Those nodes represent the actions the RDBMS has to run through in order to execute the query. For example, at the bottom of the plan we find the "scan nodes", the statements that actually return rows from tables, while higher up we have nodes corresponding to aggregations or ordering. In the end we get an indented, hierarchical tree of the query from which we can detail each step. Each node (line starting with "->") is shown with:

  • its type, e.g. whether the system uses an index (index scan) or none at all (sequential scan)
  • its estimated cost, an arbitrary representation of how costly an operation is
  • the estimated number of rows the operation would return

and more details that make it somewhat hard to read.

To make it more confusing, some figures, like "cost" or "actual time", come as a pair: the first number is the startup cost or time, the second the total. To keep it short, you should mainly consider the second one.


The purpose of this article is not to teach how to fully understand a query plan, so instead we will use a very handy online tool, "New explain", which will parse the output for us and point out the problems we may have.

Simply by pasting the same output and submitting the form, we get a much better view of what's going on and what we need to look at.

The "exclusive" color mode gives the best representation of how efficient each individual node is.

"Inclusive" mode is cumulative (so the top row will always be dark red as it's equal to the total execution time).

"rows x" shows how accurately the query planner is able to guess the number of rows.

If using "mixed" color mode, each cell in each mode's column will have its own color (which can be a little bit harder to read).

Here we can see straight away that nodes #4 and #5 are where most of the time is spent. Those nodes return a large number of rows (more than 100,000, while there are only 421 of them in the final result set), meaning that the available indexes and statistics are not good enough.

Alfresco normally provides all the indexes necessary for the database to deliver good performance in most cases, so a query under-performing because of indexes is very likely hitting a missing index. Fortunately, Alfresco also delivers a convenient way to check the database schema for any inconsistency.


Alfresco schema validation

When connected to the JMX interface, in the MBeans tab, it is possible to trigger a schema validation while Alfresco is running (go to "Alfresco \ DatabaseInformation \ SchemaValidator \ Operations" and launch "validateSchema()").

This will produce the output below in the Alfresco log file:

2017-12-18 15:21:32,194 WARN [domain.schema.SchemaBootstrap] [RMI TCP Connection(6)-] Schema validation found 1 potential problems, results written to: /opt/alfresco/bench/tomcat/temp/Alfresco/Alfresco-PostgreSQLDialect-Validation-alf_-4191170423343040157.txt
2017-12-18 15:21:32,645 INFO [domain.schema.SchemaBootstrap] [RMI TCP Connection(6)-] Compared database schema with reference schema (all OK): class path resource [alfresco/dbscripts/create/org.hibernate.dialect.PostgreSQLDialect/Schema-Reference-ACT.xml]

The log file points us to another file containing the details of the validation process. Here, for example, we can see that an index is indeed missing:

alfresco@alfresco:/opt/alfresco/bench$ cat /opt/alfresco/bench/tomcat/temp/Alfresco/Alfresco-PostgreSQLDialect-Validation-alf_-4191170423343040157.txt
Difference: missing index from database, expected at path: .alf_node_properties.alf_node_properties_pkey

The index can now be re-created, either by taking a fresh install as a model or by getting in touch with Alfresco Support to find out how to create it.

The resulting query plan is much more efficient.


Database statistics

Statistics are critical to PostgreSQL performance, as they are what mainly drives the query planner's efficiency. With accurate statistics, PostgreSQL makes good decisions when planning a query; inaccurate statistics lead to bad decisions and thus bad performance.

PostgreSQL has an internal process in charge of keeping statistics up to date (in addition to other house keeping tasks): the autovacuum process.

All versions of PostgreSQL that Alfresco supports have this capability, and it should always be active! By default this process updates statistics according to configuration options set in postgresql.conf. The options below can be useful to fine-tune the autovacuum behaviour (these are the defaults):

autovacuum=true #Enable the autovacuum daemon

autovacuum_analyze_threshold=50 #Number of tuples modifications that trigger ANALYZE

autovacuum_analyze_scale_factor = 0.1 #Fraction of table modified to trigger ANALYZE

default_statistics_target = 100 # Amount of information to store in the statistics

autovacuum_analyze_threshold and autovacuum_analyze_scale_factor are combined (threshold + scale factor × number of tuples) to determine when an ANALYZE is triggered.
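As a quick sanity check, the trigger point can be computed from those two settings; a small sketch using the default values above:

```python
def analyze_trigger(reltuples, threshold=50, scale_factor=0.1):
    """Modified-tuple count after which autovacuum triggers an ANALYZE."""
    return threshold + scale_factor * reltuples

# a 1M-row table is only re-analyzed after ~100k rows have changed
print(analyze_trigger(1_000_000))  # 100050.0
```

With the defaults, a large table must see roughly 10% of its rows change before fresh statistics are gathered, which is why lowering the scale factor helps on big tables.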


If you have seen a slow but constant degradation in query performance, it may be that some tables grew large enough to make the default parameters less efficient than they used to be. As tables grow, the scale factor can make statistics updates very infrequent; lowering autovacuum_analyze_scale_factor will make updates more frequent, ensuring the statistics stay up to date.

The data distribution within a database table can also change during its lifetime, either because of a data model change or simply because of new use cases. Raising default_statistics_target will make the daemon collect and process more data from the tables when generating or updating statistics, thus making them more accurate.

Of course, asking for more frequent updates and more accurate statistics has an impact on the resources needed by the autovacuum process. Such tweaking should be done carefully by your DBA.

It is also important to note that the options above apply to every table. You may want to do something more targeted for known big tables. This is doable by changing the storage options of the specific tables:

=# ALTER TABLE alf_node_prop_string_value

-#     ALTER COLUMN string_value SET STATISTICS 1000;

The SQL statement above is really just an example and must not be used without prior investigation.
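Per-table autovacuum thresholds are set the same way, through storage parameters; again, this is purely illustrative:

```sql
-- re-analyze this table after 2% of its rows change instead of the default 10%
ALTER TABLE alf_node_properties SET (autovacuum_analyze_scale_factor = 0.02);
```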


Alfresco's Customer Success team is always looking for ways to better serve our customers.  Sometimes this takes the form of improved processes in support, or implementing 360 degree views of customers for our CSMs, or improving the way we capture and manage best practices for our knowledgebase.  Sometimes this means taking entire teams and realigning them to better match what our customers are telling us that they need. 


Within Alfresco's Customer Success group, we have two teams that provide support and services that go beyond what comes with our primary support:  Alfresco Premier Services and Alfresco Consulting.  Not surprisingly, these two teams often work hand in hand.  Premier Services may work with a customer on an issue until it becomes clear that the business solution will require an extension, and then help the customer to engage Alfresco consulting to get it done.  Conversely, our consulting team provides solutions that affect the recommendations and troubleshooting steps that our support teams will take when working with a customer on subsequent questions.  When we look at the way these two teams work and what they do for our customers, it makes a lot of sense to pull them under the same roof.


Alfresco's Strategic Services organization was created for just that purpose, marrying the best of our elevated operational support offerings with our skilled and seasoned consulting team.  We have already started reimagining parts of our delivery and establishing new processes and procedures for internal knowledge sharing and capture.  In the coming weeks we will be focused on improving our ability to seamlessly move work across the team to make sure the right people have eyes on the right things at the right time.  It's an exciting time for services at Alfresco!

  • Purpose

The purpose of this blog is to show how to scan images containing text so that the text is indexed and searchable by Alfresco. The following file types are supported: PNG, BMP, JPEG, GIF, TIFF and PDF (containing images).


For this exercise we are going to use a Linux OS, but this solution should work equally well on Windows.


To scan images we are going to use Tesseract OCR (tesseract). This package contains an OCR engine (libtesseract) and a command line program (tesseract).

Tesseract has Unicode (UTF-8) support and can recognize more than 100 languages "out of the box".


  • Tesseract

Since we are using tesseract-ocr, we need to install the tesseract software for our Linux distribution (version 3 or greater).

Please follow the instructions explained here: Installing Tesseract


  • Transformation context file

Create a file named transformer-context.xml in Alfresco's extension folder, i.e. tomcat/shared/classes/alfresco/extension, with the following content:


<?xml version='1.0' encoding='UTF-8'?>
<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor
     license agreements. See the NOTICE file distributed with this work for additional
     information regarding copyright ownership. The ASF licenses this file to
     You under the Apache License, Version 2.0 (the "License"); you may not use
     this file except in compliance with the License. You may obtain a copy of
     the License at Unless required
     by applicable law or agreed to in writing, software distributed under the
     License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS
     OF ANY KIND, either express or implied. See the License for the specific
     language governing permissions and limitations under the License. -->


     <!-- Transforms from TIFF to plain text using Tesseract
           and a custom script -->

     <bean id="transformer.worker.ocr.tiff"

          <property name="mimetypeService">
               <ref bean="mimetypeService" />
          <property name="checkCommand">
               <bean class="org.alfresco.util.exec.RuntimeExec">
                    <property name="commandsAndArguments">
                              <entry key=".*">
                    <property name="errorCodes">

          <property name="transformCommand">
               <bean class="org.alfresco.util.exec.RuntimeExec">
                    <property name="commandsAndArguments">
                              <entry key=".*">
                    <property name="errorCodes">
                    <property name="waitForCompletion">
          <property name="transformerConfig">
               <ref bean="transformerConfig" />

     <bean id="transformer.ocr.tiff"

          <property name="worker">
               <ref bean="transformer.worker.ocr.tiff" />

     <!-- Transforms from PDF to TIFF using Ghostscript -->
     <bean id="transformer.worker.pdf.tiff"

          <property name="mimetypeService">
               <ref bean="mimetypeService" />
          <property name="checkCommand">
               <bean name="transformer.ImageMagick.CheckCommand" class="org.alfresco.util.exec.RuntimeExec">
                    <property name="commandsAndArguments">
                              <entry key=".*">

          <property name="transformCommand">
               <bean class="org.alfresco.util.exec.RuntimeExec">
                    <property name="commandsAndArguments">
                              <entry key=".*">
                    <property name="errorCodes">
                    <property name="waitForCompletion">
          <property name="transformerConfig">
               <ref bean="transformerConfig" />

     <bean id="transformer.pdf.tiff"

          <property name="worker">
               <ref bean="transformer.worker.pdf.tiff" />



We can see we are using a few variables here:

  • tesseract.exe: the tesseract binary file, normally installed as /usr/bin/tesseract
  • ocr.script: the script we call to transform images to text, installed in the Alfresco home folder
  • ghostscript.exe: the Ghostscript binary file, usually the gs binary
  • source: the source image file
  • target: the resulting text file


  • OCR Script

The next step is to create the script. The location of the script will also be referenced, via the ocr.script property, in the properties file shown later in this blog.


Assuming Alfresco is installed in /opt/alfresco, create a script file in /opt/alfresco/ with the following content:

#!/bin/sh
# save arguments to variables
# (the original assignments were lost in publication; the lines below are an
# assumption based on how Alfresco invokes the script: <script> source target)
SOURCE=$1
TARGET=$2
TMPDIR=/tmp/ocr
OCRFILE=$(basename "$SOURCE")

# use the OS library path so tesseract finds its own libraries
export LD_LIBRARY_PATH=/usr/lib:/usr/local/lib

# Create temp directory if it doesn't exist
mkdir -p $TMPDIR

# work on a copy of the source image in the temp directory
cp "$SOURCE" "$TMPDIR/$OCRFILE"

# to see what happens
# echo "from $SOURCE to $TARGET" >>/tmp/ocrtransform.log

# call tesseract and redirect output to $TARGET
# (tesseract appends .txt to the output base name, hence the stripped extension)
/usr/bin/tesseract $TMPDIR/$OCRFILE ${TARGET%\.*} -l eng

A couple of points to consider here:

  • We are using LD_LIBRARY_PATH to point to the OS library path so that tesseract finds the libraries it requires. If we don't, it will use the library path defined by Alfresco, pointing to the commons/lib folder, and the library versions there may not be the ones tesseract requires.
  • We are defining the location of the tesseract binary file as /usr/bin/tesseract. If installed on a different location then adjust the path to tesseract accordingly.


Finally, make sure the script has executable permissions. You can set them with the following command: chmod 755 /opt/alfresco/


  • Tesseract properties

The next step is to define a set of properties for tesseract in

# OCR Script

#GS executable

#Tesseract executable

# Define a default priority for this transformer

# List the transformations that are supported

# Define a default priority for this transformer
# List the transformations that are supported

# Commented to be compatible with Alfresco 5.x
# content.transformer.complex.Pdf2OCR.failover=ocr.pdf

# Disable the OOTB transformers


The main property to consider is ocr.script, pointing to the location of the OCR script file; adjust it accordingly. All other properties can be left as they are.
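The property listing above was largely lost in publication. Based on the variables described earlier and the legacy transformer configuration scheme, it might look roughly like the sketch below; every name and path here is an assumption to be checked against the referenced GitHub repository:

```properties
# OCR script (path and file name are hypothetical; must match the script above)
# Ghostscript and Tesseract executables

# legacy transformer framework: priority and supported transformations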


  • Debugging

There are two areas we can debug:

  1. The Alfresco transformation service
  2. Tesseract execution


Alfresco Transformation Service

To debug the transformation service, edit the file tomcat/shared/classes/alfresco/extension/ and add the following line at the bottom:
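The log4j line itself is missing from this copy of the post; judging from the logger categories visible in the log output further below, it is presumably something like:

```properties
# enable debug logging for the content transformation framework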


Alfresco needs restarting to pick up this debug entry.


Tesseract execution

To get some execution information from tesseract, edit the script /opt/alfresco/ and uncomment the following entry by removing the '#' from the beginning of the line:


# echo "from $SOURCE to $TARGET" >>/tmp/ocrtransform.log


Now, when an image file containing text is loaded into Alfresco, we can see entries like these in the alfresco.log file, showing the script being called:

2017-10-10 15:20:17,182  DEBUG [content.transform.RuntimeExecutableContentTransformerWorker] [http-bio-8443-exec-6] Transformation completed: 
   source: ContentAccessor[ contentUrl=store:///opt/alfresco/tomcat/temp/Alfresco/ComplextTransformer_intermediate_pdf_9017478201188837562.tiff, mimetype=image/tiff, size=24925880, encoding=UTF-8, locale=en_GB]
   target: ContentAccessor[ contentUrl=store://2017/10/10/15/20/d3b4b9aa-ad28-4c8c-ae86-f99938bf4125.bin, mimetype=text/plain, size=1173, encoding=UTF-8, locale=en_GB]
   options: {maxSourceSizeKBytes=-1, pageLimit=-1, use=index, timeoutMs=120000, maxPages=-1, contentReaderNodeRef=null, sourceContentProperty=null, readLimitKBytes=-1, contentWriterNodeRef=null, targetContentProperty=null, includeEmbedded=null, readLimitTimeMs=-1}
Execution result: 
   os:         Linux
   command:    /opt/alfresco/ /opt/alfresco/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_source_5734790636289670188.tiff /opt/alfresco/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_target_1506982845420553983.txt
   succeeded:  true
   exit code:  0
   err:        Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Page 1
 2017-10-10 15:20:17,183  TRACE [content.transform.TransformerLog] [http-bio-8443-exec-6] 4.1.2         tiff txt  INFO <<TemporaryFile>> 23.7 MB 1,950 ms ocr.tiff<<Runtime>>
 2017-10-10 15:20:17,183  TRACE [content.transform.TransformerDebug] [http-bio-8443-exec-6] 4.1.2         Finished in 1,950 ms


We can also take a look at the /tmp/ocrtransform.log file to see what files have been processed.

from /opt/alfresco/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_source_5734790636289670188.tiff to /opt/alfresco/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_target_1506982845420553983.txt


That's it, you should now be able to search for the text contained in the image files.


  • References

Most of the information in this blog comes from this GitHub repository, with some additional adjustments and inclusions.


Alfresco recently released a new module that allows modern SSO to be set up using the SAML protocol. SAML is a standard with a set of specifications defined by the OASIS consortium.

Like Kerberos, SAML is considered a secure approach to SSO: it involves signing messages and possibly encrypting them. But unlike Kerberos, which is more targeted at local networks or VPN-extended networks, SAML is a really good fit for internet and SaaS services. SAML mainly requires an Identity Provider (often referred to as IdP) and a Service Provider (SP) to communicate with each other. Many cloud services offer SAML Service Provider features, and sometimes even the IdP feature (for example Google: Set up your own custom SAML application - G Suite Administrator Help).

LemonLDAP::NG is open-source software that acts as a handler for the Apache httpd web server. LemonLDAP::NG supports a wide variety of authentication protocols (HTTP header based, CAS, OpenID Connect, OAuth, Kerberos, ...) and backends (MySQL, LDAP, flat files).


Pre-requisites & Context

LemonLDAP must be installed and configured with an LDAP backend.
Doing so is out of the scope of this document. Please refer to:

LemonLDAP::NG Download
LemonLDAP::NG DEB install page
LemonLDAP::NG LDAP backend configuration

If you just want to test SAML with LemonLDAP::NG and you don't want the burden of setting up LDAP and configuring LemonLDAP::NG accordingly, you can use the "demo" backend which is enabled by default, out of the box.
In this case you can use the demo user "dwho" (password "dwho").

At the moment the Alfresco SAML module doesn't handle the user registry part of the repository. This means that users have to exist prior to logging in using SAML.
As a consequence, either Alfresco must be set up with LDAP synchronisation enabled (synchronisation should be done against the same directory LemonLDAP::NG uses as its LDAP authentication backend), or users must have been created by administrators (e.g. using the Share admin console, CSV import, the People API, ...).

Both the SAML Identity Provider and the Service Provider must be time synchronized using NTP.

In the document below we assume that the ACME company has set up their SSO system using LemonLDAP::NG on their domain.

the authentication portal (where users are redirected in order to log in)

the manager (for administration purposes, used further in this document)

On the other side, their ECM system is hosted on a Debian-like system (possibly at a SaaS provider or on their own AWS instance of Alfresco). ACME wants to integrate the Share UI with their SSO system.

The Identity Provider

SAMLv2 required libraries

While Alfresco uses the OpenSAML Java bindings for its SAML module, LemonLDAP::NG uses the Perl bindings of the LASSO library. Even though LemonLDAP::NG is installed and running, the required libraries may not be installed, as they are not direct dependencies.
LASSO is a pretty active project and bugs are fixed regularly. I would therefore advise using the latest version available on their website instead of the one provided by your distribution.
For example, on a Debian-based distribution:

$ cat <<EOT | sudo tee /etc/apt/sources.list.d/lasso.list
deb jessie main
deb-src jessie main
EOT
$ sudo wget -O - | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install liblasso-perl

Make sure you are using the latest version of the LASSO library and its Perl bindings (2.5.1-2 fixes some important issues with SHA2).

LemonLDAP::NG SAMLv2 Identity Provider

As you may know, SAML extensively uses XML DSig, a specification that provides guidance on how to hash, sign and encrypt XML content.
In SAML, signing and encrypting rely on asymmetric cryptographic keys.
We therefore need to generate such keys (here RSA) to sign, and possibly encrypt, SAML messages. LemonLDAP::NG offers the possibility of using different keys for signing and for encrypting SAML messages. If you plan to use both signing and encryption, please use the same key for both (follow the procedure below only once, for signing; encryption will use the same key).

Log in to the LemonLDAP::NG manager, go to the menu "SAML 2 Service \ Security Parameters \ Signature" and click "New keys".
Type in a password that will be used to encrypt the private key, and remember it!

LemonLDAP::NG signing keys

You'll need this password in order to generate certificates later on!

We now need to setup the SAML metadata that every Service Provider will use (among which Alfresco Share, and possibly AOS and Alfresco REST APIs).
In the LemonLDAP::NG manager, inside the menu “SAML 2 Service \ Organization”, fill the form with:

Display Name: the ACME company
Name: acme

Of course you will use values that match your environment

Next, in the "General Parameters \ Issuer modules \ SAML" menu, make sure the SAML issuer module is configured as follows:

Activation: On
Path: ^/saml/
Use rule: On

Note that it is possible to allow SAML connections only under certain conditions, by using the "Special rule" option.
You then need to define a Perl expression that returns either true or false (more information here).

And that's it: LemonLDAP::NG is now a SAML Identity Provider!

In order to configure Alfresco Service providers we need to export the signing key as a certificate. To do so, copy the private key that was generated in the LemonLDAP::NG manager to a file (e.g. saml.key) and generate a self-signed cert using this private key.

$ openssl req -new -days 365 -nodes -x509 -key saml.key -out saml.crt

use something like CN=LemonLDAP, OU=Premier Services, O=Alfresco, L=Maidenhead, ST=Berkshire, C=UK as a subject

Keep the saml.crt file somewhere you can find it for later use.

SAMLv2 Service Provider

Install SAML Alfresco module package

The Alfresco SAML module can be downloaded from the Alfresco support portal. Only enterprise customers are entitled to this module.
So, we download and unzip it somewhere. Then, after stopping Alfresco, we copy the AMP files to their respective amps directories within the Alfresco install directory and deploy them.

$ cp alfresco-saml-repo-1.0.1.amp <ALFRESCO_HOME>/amps
$ cp alfresco-saml-share-1.0.1.amp <ALFRESCO_HOME>/amps_share
$ ./bin/

We now have to generate the certificate we will be using on the SP side:

$ keytool -genkeypair -alias my-saml-key -keypass change-me -storepass change-me -keystore my-saml.keystore -storetype JCEKS

You can use something like CN=Share, OU=Premier Services, O=Alfresco, L=Maidenhead, ST=Berkshire, C=UK as a subject

You can of course choose to use a different password and alias, just remember them for later use.

The keystore must be copied somewhere and Alfresco configured to retrieve it.

$ mv my-saml.keystore alf_data/keystore
$ cat <<EOT > alf_data/keystore/
$ cat <<EOT >> tomcat/shared/classes/
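The contents of the two heredocs did not survive publication. Based on Alfresco's keystore-metadata file format, they plausibly contain something like the following; every property name here is an assumption, so check the SAML module documentation before using it:

```properties
# alf_data/keystore metadata file: aliases and passwords for the keystore
aliases=my-saml-key
keystore.password=change-me
my-saml-key.password=change-me

# properties pointing Alfresco at the keystore (names assumed)
saml.keystore.location=${dir.keystore}/my-saml.keystore
```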


Make sure that:

  • the keystore file is readable by Alfresco (and only by Alfresco).
  • the alias and passwords match the ones you used when generating the keystore with the keytool command

The next step is to merge the whole <filter/> element provided in the SAML distribution (in the share-config-custom.xml.sample file) into your own share-config-custom.xml (which should be located in your {extensionroot} directory).
Below is an example section of the CSRF policy:

    <config evaluator="string-compare" condition="CSRFPolicy" replace="true">


        If using https make a CSRFPolicy with replace="true" and override the properties section.
        Note, localhost is there to allow local checks to succeed.






            <!-- SAML SPECIFIC CONFIG -  START -->


             Since we have added the CSRF filter with filter-mapping of "/*" we will catch all public GET's to avoid them
             having to pass through the remaining rules.


            <!-- Incoming posts from IDPs do not require a token -->


            <!-- SAML SPECIFIC CONFIG -  STOP -->


            <!-- EVERYTHING BELOW FROM HERE IS COPIED FROM share-security-config.xml -->


             Certain webscripts shall not be allowed to be accessed directly form the browser.
             Make sure to throw an error if they are used.
                <action name="throwError">
                    <param name="message">It is not allowed to access this url from your browser</param>


             Certain Repo webscripts should be allowed to pass without a token since they have no Share knowledge.
             TODO: Refactor the publishing code so that form that is posted to this URL is a Share webscript with the right tokens.
                <action name="assertReferer">
                    <param name="referer">{referer}</param>
                <action name="assertOrigin">
                    <param name="origin">{origin}</param>


             Certain Surf POST requests from the WebScript console must be allowed to pass without a token since
             the Surf WebScript console code can't be dependent on a Share specific filter.
                <action name="assertReferer">
                    <param name="referer">{referer}</param>
                <action name="assertOrigin">
                    <param name="origin">{origin}</param>


            <!-- Certain Share POST requests does NOT require a token -->
                <action name="assertReferer">
                    <param name="referer">{referer}</param>
                <action name="assertOrigin">
                    <param name="origin">{origin}</param>


            <!-- Assert logout is done from a valid domain, if so clear the token when logging out -->
                <action name="assertReferer">
                    <param name="referer">{referer}</param>
                <action name="assertOrigin">
                    <param name="origin">{origin}</param>
                <action name="clearToken">
                    <param name="session">{token}</param>
                    <param name="cookie">{token}</param>


            <!-- Make sure the first token is generated -->
                        <attribute name="_alf_USER_ID">.+</attribute>
                        <attribute name="{token}"/>
                        <!-- empty attribute element indicates null, meaning the token has not yet been set -->
                <action name="generateToken">
                    <param name="session">{token}</param>
                    <param name="cookie">{token}</param>


            <!-- Refresh token on new "page" visit when a user is logged in -->
                        <attribute name="_alf_USER_ID">.+</attribute>
                        <attribute name="{token}">.+</attribute>
                <action name="generateToken">
                    <param name="session">{token}</param>
                    <param name="cookie">{token}</param>


             Verify multipart requests from logged in users contain the token as a parameter
             and also correct referer & origin header if available
                    <header name="Content-Type">multipart/.+</header>
                        <attribute name="_alf_USER_ID">.+</attribute>
                <action name="assertToken">
                    <param name="session">{token}</param>
                    <param name="parameter">{token}</param>
                <action name="assertReferer">
                    <param name="referer">{referer}</param>
                <action name="assertOrigin">
                    <param name="origin">{origin}</param>


             Verify that all remaining state changing requests from logged in users' requests contains a token in the
             header and correct referer & origin headers if available. We "catch" all content types since just setting it to
             "application/json.*" since a webscript that doesn't require a json request body otherwise would be
             successfully executed using i.e."text/plain".
                        <attribute name="_alf_USER_ID">.+</attribute>
                <action name="assertToken">
                    <param name="session">{token}</param>
                    <param name="header">{token}</param>
                <action name="assertReferer">
                    <param name="referer">{referer}</param>
                <action name="assertOrigin">
                    <param name="origin">{origin}</param>

Configure SAML Alfresco module

We can now configure the SAML service providers we need. Alfresco offers 3 different service providers that can be configured/enabled separately:

  • Share (the Alfresco collaborative UI)
  • AOS (the new Sharepoint protocol interface)
  • REST api (the Alfresco RESTful api)

Configuration can be done in several ways.

Configuring SAML SP using subsystem files:

The Alfresco SAML distribution comes with examples of the SAML configuration files. Reusing them is very convenient and allows a quick setup.
We’ll copy the files for the required SPs and configure each SP as needed.

$ cp -a ~/saml/alfresco/extension/subsystems tomcat/shared/classes/alfresco/extension

Then, to configure the Share SP for example, rename the sample files and make sure they contain the needed properties:

$ mv tomcat/shared/classes/alfresco/extension/subsystems/SAML/share/share/ tomcat/shared/classes/alfresco/extension/subsystems/SAML/share/share/


Of course you should use URLs matching your domain name!

As the configuration points to the IdP certificate, we also need to copy it to the Alfresco server (we generated this certificate earlier), into the alf_data/keystore folder (or any other folder you may have used as dir.keystore).

$ cp saml.crt alf_data/keystore

Configuring SAML SP using the Alfresco admin console:

Configure SAML service provider using the Alfresco admin console (/alfresco/s/enterprise/admin/admin-saml).
Set the following parameters:

Of course you should use URLs matching your domain name!

Below is a screenshot of what it would look like:

Leaving “Force SAML connection” unset lets the user log in either as a SAML-authenticated user or as another user, using a different subsystem.

Download the metadata and certificates from the bottom of the page, and then import the certificate you generated earlier using openssl in the Alfresco admin console.
To finish with Alfresco configuration, tick the “Enable SAML authentication (SSO)” box.

Create the SAML Service provider on the Identity Provider

The identity provider must be aware of the SP and its configuration. Using the LemonLDAP manager go to the “SAML Service Provider” section and add a new service provider.
Give it a name like “Alfresco-Share”.

Upload the metadata file exported from Alfresco admin console.

Under the newly created SP, in the “Options \ Authentication response” menu set the following parameters:

Default NameID Format: Unspecified
Force NameID session key: uid

Note that you can use any session key that is available and fits your needs. Here, uid makes sense for use with Alfresco logins and works for the “Demo” authentication backend in LemonLDAP::NG. If a real LDAP backend is available and Alfresco is syncing users from that same LDAP directory, then the session key used as the NameID value should match the ldap.synchronization.userIdAttributeName defined in Alfresco’s LDAP authentication subsystem.

Optionally, you can also send additional information in the authentication response to the Share SP. To do so, under the newly created SP there is a section called “Exported attributes”. Configure it as follows:

This requires that the appropriate keys are exported as variables whose names are used as "Key name".

So here we would have the following LemonLDAP::NG exported variables:

  • firstname
  • lastname
  • mail

Hacks ‘n tweaks

At this point, provided you met the prerequisites, you should be able to log in without any problems. However, there may still be some issues with SP-initiated logout (initiating logout from the IdP should work, though), depending on the versions of SP and IdP you use. Logouts rely on the SAML SLO profile, and the way it is currently implemented in both Alfresco and LemonLDAP still has some interoperability issues.

On the Alfresco side, SAML module version 1.0.1 is affected by MNT-18064, which prevents SLO from working properly with LemonLDAP::NG. There is a small patch attached to the JIRA that can be used and adapted to match the NameID format used by your IdP (for the configuration described here, that would be "UNSPECIFIED").

This JIRA is expected to be fixed in the next alfresco-saml module release (probably 1.0.2). In the meantime you can use the patch attached to the JIRA.

The LemonLDAP::NG project crew kindly agreed to rewrite their sessionIndex generation algorithm in order to avoid interoperability problems and security issues. This change is needed in order to work with Alfresco and should be included in 1.9.11; previous versions won’t work.

In the meantime you can use the patch attached to LEMONLDAP-1261.



One of the most critical and time-consuming processes related to the Alfresco Content Connector (ACC) for Salesforce module is migrating all existing Notes & Attachments content from the Salesforce instance to the on-premise Alfresco Content Services (ACS) instance. This requires a lot of planning and thought as to how this use case can be implemented and how the content migration process can be completed successfully. When end users start using the connector widget in Salesforce they will be able to upload new documents, make changes to existing documents, etc. within Salesforce, but the actual content is stored in an ACS instance. Some of the core components/steps involved in this process are:

  • Number of documents/attachments that need to be migrated to Alfresco
  • Salesforce API limits available per 24-hour period - use SOQL queries or ACC API functions to retrieve the content, and make sure the available API calls are not exhausted by the migration activity.
  • Naming conventions - some of the special characters used in SFDC's filename may not be supported in Alfresco, so some consideration must be given to manipulate the filenames in Alfresco when the documents are saved.
  • Perform a full content migration activity in a Salesforce DEV/UAT environment that has a complete copy of production data. This will assist in determining issues such as naming conventions, unsupported documents/document types etc.
  • Custom Alfresco WebScripts/Scheduled Actions to import all content from SFDC to Alfresco
  • Make sure there are enough system resources in the Alfresco server to hold all the content in the repository. This includes disk capacity, heap/memory allocated to JVM, CPU etc. 
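On the naming-conventions bullet above: Alfresco rejects node names containing characters such as " * \ > < ? / : |, so it helps to normalise SFDC filenames before import. Below is a minimal, self-contained sketch; the class name, the underscore replacement policy and the trailing-dot trim are illustrative choices, not part of the connector itself.

```java
// Hypothetical helper: strips characters that Alfresco's cm:name constraint rejects.
public class FilenameSanitizer {

    // Characters not allowed in Alfresco node names: " * \ > < ? / : |
    private static final String ILLEGAL_CHARS = "[\"\\*\\\\><\\?/:\\|]";

    public static String sanitize(String sfdcFileName) {
        // Replace each illegal character with an underscore,
        // then trim trailing dots/spaces, which Alfresco also rejects.
        return sfdcFileName.replaceAll(ILLEGAL_CHARS, "_")
                           .replaceAll("[. ]+$", "");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("Q1|contract*draft?.pdf")); // Q1_contract_draft_.pdf
    }
}
```

Run this over every filename returned by the SOQL queries before creating the node in ACS, and log any renames so users can trace a document back to its Salesforce original.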

We will look in to all these core components in more detail later in this blog post.


Why Content Migration?


Why would we want to migrate all the existing content from SFDC to Alfresco? There are many reasons why this migration activity is required for the businesses and end users who will use this module.

  • End users need to be able to access/find the legacy or existing documents and contracts in the same way as they normally do.
  • Moving content over to an ACS instance can save businesses significant money, since Salesforce storage costs are expensive.
  • Easier access to all documents/notes using the ACC search feature available in the widget.
  • Content can be accessed in both Salesforce and Alfresco depending on the end user's preference. Custom Smart Folders can be configured in ACS to get a holistic view of the content the end user requires. For example, end users may want to see only the content related to their region, in which case a Smart Folder can be configured to query for a Salesforce object based on a particular metadata property value, such as sobjectAccountRegion, sobjectAccount:Regional_Team, etc.


Steps Involved


  1. Determine the total number of Notes & Attachments that must be migrated/imported to ACS - This can be determined using the Salesforce Developer Workbench. Use a SOQL query to get a list of all Notes/Attachments of a specific Salesforce object, for example Accounts, Opportunities, etc.

SOQL Query

         You may also use the REST Explorer feature in the Workbench to execute this query.

Some sample SOQL queries are:

To retrieve attachments of all Opportunity objects in SFDC - Select Name, Id, ParentId from Attachment where ParentId IN (Select Id from Opportunity)
To retrieve Notes of all Account objects in SFDC - Select Id,(Select Title, Id, ParentId, Body from Notes) from Account

REST Explorer View


   2.   Develop Alfresco webscripts/asynchronous actions to retrieve and migrate all content from SFDC to Alfresco - It is probably a good idea to develop an Alfresco asynchronous action (Java based), as opposed to a webscript, to perform the migration. This ensures the actual migration runs in the background and avoids the timeouts we may see with normal webscripts. Depending on the amount of content, it can take a few hours for the action to complete. We will use the ACC module's available APIs to make a connection to Salesforce and then use the relevant functions to retrieve the content. The code snippets to make a connection and execute a SOQL query are below. To get the content of an Attachment object in SFDC use the getBlob method; it should look something like this:

apiResult = connection.getApi().sObjectsOperations(getInstanceUrl()).getBlob("Attachment", documentId, "Body");

Open a connection to SFDC


SOQL Query


Once the connection and query are established, you can then save the file in Alfresco based on the end-user/business needs. The code snippet to save the document in Alfresco is below.




   3.   “Warm up” Alfresco folder structures without human intervention - One of the prerequisites to the content migration process is pre-creating the folder structure for all the SFDC objects within ACS and making sure each folder is mapped to the appropriate SFDC object. This can be achieved by exporting all the SFDC object details to a CSV file. The CSV file must contain information such as the SFDC Record Id, Account Id, Opportunity Id, Region, etc. The SFDC Record Id is unique for every Salesforce object, and the Content Connector widget identifies the mapping between ACS and SFDC using this Id. Before executing the content migration code, we need to make sure all the objects exist in ACS first. Once the CSV file is generated, develop a custom ACS web script to process the CSV file line by line, create the appropriate folder structures and assign metadata values accordingly. Once ready, execute the web script to auto-create all folder structures in ACS. A simple code snippet is below.


Warmup Code
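The warmup code itself is only shown as a screenshot in the original post. As a rough, self-contained sketch of the first half of that web script, here is the CSV-row parsing it would start from; the column order and names are assumptions, and a real implementation would go on to create the folder via the Alfresco API and set the Salesforce metadata (e.g. the SFDC Record Id) on it.

```java
// Hypothetical sketch: parse one row of the exported SFDC CSV into the fields
// the folder-warmup web script needs. Column order is an assumption.
public class SfdcCsvRow {

    public final String recordId;   // unique SFDC Record Id, used for the ACS mapping
    public final String accountId;
    public final String region;

    public SfdcCsvRow(String csvLine) {
        // Simple split; assumes no quoted commas in the export.
        String[] cols = csvLine.split(",", -1);
        this.recordId = cols[0].trim();
        this.accountId = cols[1].trim();
        this.region = cols[2].trim();
    }

    public static void main(String[] args) {
        SfdcCsvRow row = new SfdcCsvRow("0065800000abcd, 0015800000wxyz, EMEA");
        // A real web script would now create a folder named after recordId and
        // set properties such as sobjectAccountRegion from region.
        System.out.println(row.recordId + " -> " + row.region);
    }
}
```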


   4.   Trigger the migration code to import all content from SFDC to ACS - Once the folder hierarchy for the required SFDC objects is set up in ACS, you may execute the asynchronous action developed in step 2. To execute the Java asynchronous action you may create a simple web script which triggers it. A simple code snippet is below.

var document = companyhome.childByNamePath("trigger.txt");
var sfdcAction = actions.create("get-sfdcopps-attach");
sfdcAction.executeAsynchronously(document);
model.result = document;

You may choose to execute this in multiple transactions instead of a single transaction. If there are tens of thousands of documents to import, the best approach is to use multiple transactions at the migration-code level.

Once the migration process is complete, the documents should appear in the relevant folder hierarchy within ACS and on the associated object pages in SFDC.


ACC Documents


   5.   Validate the content - Once the document import process is complete, test and validate that all the documents were imported into the appropriate folder hierarchy in ACS and that they are accessible within the Salesforce Content Connector widget. Key things to check are thumbnails, previews, file names, file content, etc.


Hope all this information helps in your next Alfresco Content Services content migration activity.

1.     Project Objective

The aim of this blog is to show you how to create and run a Docker container with a full ELK (Elasticsearch, Logstash and Kibana) environment containing the necessary configuration and scripts to collect and present data to monitor your Alfresco application.


Elastic tools can ease the processing and manipulation of large amounts of data collected from logs, operating system, network, etc.


Elastic tools can be used to search for data such as errors, exceptions and debug entries and to present statistical information such as throughput and response times in a meaningful way. This information is very useful when monitoring and troubleshooting Alfresco systems.


2.     Install Docker on Host machine

Install Docker on your host machine (server) as described on the Docker website. Note that the Docker Community Edition is sufficient to run this project.


3.     Virtual Memory

Elasticsearch uses a hybrid mmapfs / niofs directory by default to store its indices. The default operating system limit on mmap counts is likely to be too low, which may result in out-of-memory exceptions.


On Linux, you can increase the limits by running the following command as root on the host machine:


# sysctl -w vm.max_map_count=262144


To set this value permanently, update the vm.max_map_count setting in /etc/sysctl.conf. To verify the value has been applied run:


# sysctl vm.max_map_count
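The corresponding permanent entry in /etc/sysctl.conf is a single line:

```properties
vm.max_map_count=262144
```

Run `sysctl -p` (or reboot) for the file to take effect.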


4.     Download “Docker-ELK-Alfresco-Monitoring” container software

Download the software to create the Docker container from GitHub and extract the files to the file system.


5.     Creating the Docker Image

Before creating the Docker image we need to configure access to Alfresco’s database from the Docker container. Assuming the files have been extracted to /opt/docker-projects/Docker-ELK-Alfresco-Monitoring-master, edit the configuration files and set the access to the DB server as appropriate, for example:


#postgresql settings
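The individual settings were lost in this copy, and the exact property names depend on the scripts in the GitHub repository, but the PostgreSQL section boils down to standard JDBC connection details. All names and values below are illustrative placeholders:

```properties
# placeholder names -- check the repository's sample files for the real keys
db_type=postgresql
db_host=alfresco-db.example.com
db_port=5432
db_name=alfresco
db_user=alfresco
db_password=admin
```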






Please make sure the database server allows remote connections to Alfresco’s database. A couple of examples of how to configure the database are shown here:

  • For MySQL

Access your database server as an administrator and grant the correct permissions i.e.


# mysql -u root -p

grant all privileges on alfresco.* to alfresco@'%' identified by 'admin';


The grant command is granting access to all tables in ‘alfresco’ database to ‘alfresco’ user from any host using ‘admin’ password.

Also make sure the bind-address parameter in my.cnf allows for external binding i.e. bind-address =


  • For PostgreSQL

Change the file ‘postgresql.conf’ to listen on all interfaces


listen_addresses = '*'


then add an entry in file ‘pg_hba.conf’ to allow connections from any host


host all all trust


Restart PostgreSQL database server to pick up the changes.

We have installed a small java application inside the container in /opt/activities folder that executes calls against the database configured in /opt/activities/ file.

For example to connect to PostgreSQL we have the following settings:







We also need to set the timezone in the container; this can be done by editing the following entry in the script:


export TZ=GB


From the command line execute the following command to create the Docker image:


# docker build --tag=alfresco-elk /opt/docker-projects/Docker-ELK-Alfresco-Monitoring-master/

Sending build context to Docker daemon  188.9MB

Step 1/33 : FROM sebp/elk:530

530: Pulling from sebp/elk



6.     Creating the Docker Container

Once the Docker image has been created we can create the container from it by executing the following command:


# docker create -it -p 5601:5601 --name alfresco-elk alfresco-elk:latest


7.     Starting the Docker Container

Once the Docker container has been created it can be started with the following command:


# docker start alfresco-elk


Verify the ELK stack is running by accessing Kibana on http://localhost:5601 on the host machine.

At this point Elasticsearch and Kibana do not have any data…so we need to get Alfresco’s logstash agent up and running to feed some data to Elasticsearch.


8.     Starting logstash-agent

The logstash agent consists of logstash and some other scripts to capture entries from Alfresco log files, JVM stats using jstatbeat, entries from Alfresco audit tables, DB slow queries, etc.


Copy the logstash-agent folder to a directory on all the servers running Alfresco or Solr applications.

Assuming you have copied the logstash-agent folder to /opt/logstash-agent, edit the configuration file under /opt/logstash-agent/ and set the following properties according to your own settings:


export tomcatLogs=/opt/alfresco/tomcat/logs

export logstashAgentDir=/opt/logstash-agent

export logstashAgentLogs=${logstashAgentDir}/logs

export alfrescoELKServer=

 9.    Configuring Alfresco to generate data for monitoring

Alfresco needs some additional configuration to produce data to be sent to the monitoring Docker container.


9.1   Alfresco Logs

Alfresco logs i.e. alfresco.log, share.log, solr.log or the equivalent catalina.out can be parsed to provide information such as number of errors or exceptions over a period of time. We can also search these logs for specific data.


The first thing is to make sure the logs display the full date-time format at the beginning of each line. This is important so we can display the entries in the correct order.

Make sure that in your log4j properties files (there is more than one) the file layout pattern is as follows:


log4j.appender.File.layout.ConversionPattern=%d{yyyy-MM-dd} %d{ABSOLUTE} %-5p [%c] [%t] %m%n


This will produce log entries with the date at the beginning of the line as this one:

2016-09-12 12:16:28,460 INFO  [org.alfresco.repo.admin] [localhost-startStop-1] Connected to database PostgreSQL version 9.3.6

Important Note: If you upload catalina files, don’t upload the alfresco, share or solr log files for the same time period, since they contain the same entries and you will end up with duplicate entries in the Log Analyser tool.


Once the logs are processed the resulting data is shown:

  • Number of errors, warnings, debug, fatal messages, etc. over time
  • Total number of errors, warnings, debug, fatal messages, etc.
  • Common messages that may reflect issues with the application
  • Number of entries grouped by java class
  • Number of exceptions logged
  • All log files are searchable using ES (Elasticsearch) search syntax



9.2    Document Transformations

Alfresco performs document transformations for document previews, thumbnails, content indexing, etc. To monitor document transformations, enable logging for the “TransformerLog” class by adding the following line to tomcat/shared/classes/alfresco/extension/ on all Alfresco nodes:
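The log4j line itself did not survive this copy; the usual entry for this logger (class name per the Alfresco repository codebase — verify against your version's is:

```properties
log4j.logger.org.alfresco.repo.content.transform.TransformerLog=debug
```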


The following is a sample output from alfresco.log file showing document transformation times, document extensions, transformer used, etc.


2016-07-14 18:24:56,003  DEBUG [content.transform.TransformerLog] [pool-14-thread-1] 0 xlsx png  INFO Calculate_Memory_Solr Beta 0.2.xlsx 200.6 KB 897 ms complex.JodConverter.Image<<Complex>>


Once Alfresco logs are processed the following data is shown for transformations:

  • Response time of transformation requests over time
  • Transformation throughput
  • Total count of transformations grouped by file type
  • Document size, transformation time, transformer used, etc



9.3    Tomcat Access Logs

Tomcat access logs can be used to monitor HTTP requests, throughput and response times. In order to get the right data format in the logs we need to add or replace the “Valve” entry in the tomcat/conf/server.xml file, normally located at the end of the file, with the one below.





<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
  prefix="access-" suffix=".log"
  pattern='%a %l %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i" %D "%I"' />





For further clarification on the log pattern refer to the Tomcat Access Log Valve documentation.

Sample output from the Tomcat access log under the tomcat/logs directory. The important fields here are the HTTP request, the HTTP response status (200) and the time taken to process the request (33 milliseconds):

CN=Alfresco Repository Client, OU=Unknown, O=Alfresco Software Ltd., L=Maidenhead, ST=UK, C=GB [14/Jul/2016:18:49:45 +0100] "POST /alfresco/service/api/solr/modelsdiff HTTP/1.1" 200 37 "-" "Spring Surf via Apache HttpClient/3.1" 33 "http-bio-8443-exec-10"


Once the Tomcat access logs are processed the following data is shown: 

  • Response time of HTTP requests over time
  • HTTP traffic throughput
  • Total count of responses grouped by HTTP response code
  • Tomcat access logs files are searchable using ES (Elasticsearch) search syntax



9.4    Solr Searches

We can monitor Solr queries and response times by enabling debug logging for the SolrQueryHTTPClient class, adding the following entry to tomcat/shared/classes/alfresco/extension/ on all Alfresco (front-end) nodes:
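The property line is missing from this copy; based on the class name shown in the log sample below, the debug entry would be:

```properties
log4j.logger.org.alfresco.repo.search.impl.solr.SolrQueryHTTPClient=debug
```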


Sample output from alfresco.log file showing Solr searches response times:


DEBUG [impl.solr.SolrQueryHTTPClient] [http-apr-8080-exec-6]    with: {"queryConsistency":"DEFAULT","textAttributes":[],"allAttributes":[],"templates":[{"template":"%(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT TAG)","name":"keywords"}],"authorities":["GROUP_EVERYONE","ROLE_ADMINISTRATOR","ROLE_AUTHENTICATED","admin"],"tenants":[""],"query":"((test.txt  AND (+TYPE:\"cm:content\" +TYPE:\"cm:folder\")) AND -TYPE:\"cm:thumbnail\" AND -TYPE:\"cm:failedThumbnail\" AND -TYPE:\"cm:rating\") AND NOT ASPECT:\"sys:hidden\"","locales":["en"],"defaultNamespace":"","defaultFTSFieldOperator":"OR","defaultFTSOperator":"OR"}

 2016-03-19 19:55:54,106 


DEBUG [impl.solr.SolrQueryHTTPClient] [http-apr-8080-exec-6] Got: 1 in 21 ms


Note: There is no specific transaction id to correlate the Solr search to the corresponding response. The best way to do this is to look at the time when the search and response were logged together with the java thread name, this should give you a match for the query and its response. 

Once Alfresco logs are processed the following data is shown for Solr searches:

  • Response time for Solr searches over time
  • Solr searches throughput
  • Solr queries, number of results found and individual response times



9.5    Database Monitoring

Database performance can be monitored with two different tools: p6spy and Packetbeat. The main difference between them is that p6spy acts as a proxy JDBC driver while Packetbeat is a network traffic sniffer. Also, Packetbeat can only sniff traffic for MySQL and PostgreSQL databases, while p6spy also supports Oracle, among others.



P6spy is delivered as a jar file that needs to be placed in the application class path, i.e. the tomcat/lib/ folder. There are three steps to get p6spy configured and running.


  • Place p6spy jar file in tomcat/lib/ folder
  • Create the configuration file, also in the tomcat/lib/ folder, with the following configuration





dateformat=MM-dd-yy HH:mm:ss:SS






# Update driver list with the correct driver, e.g.

# driverlist=oracle.jdbc.OracleDriver

# driverlist=org.mariadb.jdbc.Driver

# driverlist=org.postgresql.Driver



# Location where spy.log file will be created



# Set the execution threshold to log queries taking longer than 1000 milliseconds (slow queries only)
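Pulling the surviving fragments together, a minimal p6spy configuration file in tomcat/lib/ for this setup would look roughly like the following. The option names (modulelist, appender, driverlist, logfile, executionThreshold) are standard p6spy settings; the logfile path assumes the /opt/logstash-agent layout described earlier:

```properties
modulelist=com.p6spy.engine.spy.P6SpyFactory,com.p6spy.engine.logging.P6LogFactory
appender=com.p6spy.engine.spy.appender.FileLogger
dateformat=MM-dd-yy HH:mm:ss:SS

# pick the driver matching your database
driverlist=org.postgresql.Driver

# location where the spy.log file will be created
logfile=/opt/logstash-agent/logs/spy.log

# only log statements slower than 1000 ms
executionThreshold=1000
```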



Note: if there are no queries taking longer than the value of executionThreshold (in milliseconds) then the file will not be created.

Note: set the “logfile” variable to the logs folder inside the logstash-agent path as shown above.


  • Add entry to tomcat/conf/Catalina/localhost/alfresco.xml file


Example for PostgreSQL:
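The Resource element itself was lost in this copy. A reconstructed sketch for PostgreSQL follows: the key points are the p6spy proxy driver class and the jdbc:p6spy: prefix on the URL, while the resource name, credentials and pool attributes are placeholders that should match your existing alfresco.xml:

```xml
<Resource name="jdbc/dataSource" auth="Container" type="javax.sql.DataSource"
          driverClassName="com.p6spy.engine.spy.P6SpyDriver"
          url="jdbc:p6spy:postgresql://localhost:5432/alfresco"
          username="alfresco" password="alfresco"
          initialSize="10" maxActive="100" defaultAutoCommit="false" />
```

The Oracle and MariaDB examples use the same element, changing only the url (e.g. jdbc:p6spy:oracle:thin:@host:1521:SID or jdbc:p6spy:mariadb://host:3306/alfresco) and the driverlist entry in the p6spy configuration file.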
















Example for Oracle:
















 Example for MariaDB:
















Once the spy.log file has been processed the following information is shown:

  • DB Statements execution time over time
  • DB Statements throughput over time
  • Table showing individual DB statements and execution times
  • DB execution times by connection id 



9.6    Alfresco Auditing

If you want to audit Alfresco access you can enable auditing by adding the following entries to file:


# Enable auditing
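The property values were lost in this copy; the standard entries to turn on auditing and the alfresco-access audit application in are:

```properties
audit.enabled=true
audit.alfresco-access.enabled=true
```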


Now you can monitor all the events generated by alfresco-access audit group.



Note: Only one of the logstash agents should collect Alfresco's audit data, since the script gathers data for the whole cluster/solution. So edit the file logstash_agent/ on just one of the Alfresco nodes and set the variable collectAuditData to "yes".




Note: Also make sure you update the login credentials for Alfresco in the audit*.sh files. They default to admin/admin.


10.    Starting and Stopping the logstash agent

The logstash agent script can be started from the command line with "./ start" as shown below:


./ start
Starting logstash
Starting jstatbeat
Starting dstat
Staring audit access script


and can be stopped with the command "./ stop" as shown below:


./ stop
Stopping logstash
Stopping jstatbeat
Stopping dstat
Stopping audit access script


11. Accessing the Dashboard

Finally access the dashboard by going to this URL http://<docker host IP>:5601 (use the IP of the server where you installed the Docker container) and clicking on the “Dashboard” link on the left panel and then click on the “Activities” link.



The data should be available for the selected time period.



Navigate to the other dashboards by clicking on the appropriate link.



12.    Accessing the container

To enter the running container use the following command:


# docker exec -i -t alfresco-elk bash


And to exit the container just type “exit” and you will find yourself back on the host machine.


13.    Stopping the container

To stop the container from running type the following command on the host machine:



# docker stop alfresco-elk


14.    Removing the Docker Container

To delete the container you first need to stop the container and then run the following command:


# docker rm alfresco-elk


15.    Removing the Docker Image

To delete the image you first need to remove the container and then run the following command:


# docker rmi alfresco-elk:latest


16.   Firewall ports


If you have a firewall, make sure the following ports are open:


Redis: 6379

Kibana: 5601

Database server: this depends on the DB server being used i.e. PostgreSQL is 5432, MySQL 3306, etc



Happy Monitoring!!!

Troubleshooting application performance can be tricky, especially when the application in question has dependencies on other systems.  Alfresco's Content Services (ACS) platform is exactly that type of application.  ACS relies on a relational database, an index, a content store, and potentially several other supporting applications to provide a broad set of services related to content management.  What do you do when you have a call that is slow to respond, or a page that seems to take forever to load?  After several years of helping customers diagnose and fix these kinds of issues, my advice is to start at the bottom and work your way up (mostly).


When faced with a performance issue in Alfresco Content Services, the first step is to identify exactly which calls are responding slowly.  If you are using CMIS or the ACS REST API, this is simple enough: you'll know which call is running slowly and exactly how you are calling it.  It's your code making the call, after all.  If you are using an ADF application, Alfresco Share or a custom UI, it can become a bit more involved, but you can approach it the same way you would approach any web application.  I usually use Chrome's built-in dev tools for this purpose.  Take as an example the screenshot below, which shows some of the requests captured when loading the Share Document Library on a test site:



In this panel we can see the individual XHR requests that Share uses to populate the document library view.  This is the first place to look if we have a page loading slowly.  Is it the document list that is taking too long to load?  Is it the tag list?  Is it a custom component?  Once we know exactly what call is responding slowly, we can begin to get to the root of our performance issue.


When you start troubleshooting ACS performance, it pays to start at the bottom and work your way up.  This usually means starting at the JVM.  Take a look at your JVM stats with your profiler of choice.  Do you see excessive CPU utilization?  How often is garbage collection running?  Is the system constantly running at or close to your maximum memory allocation?  Is there enough system memory available to the operating system to support the amount that has been allocated to the JVM without swapping?  It is difficult to provide "one size fits all" guidance for JVM tuning as the requirements will vary based on the type of workload Alfresco is handling. Luis Cabaceira has provided some excellent guidance on this subject in his blog.  I highly recommend his series of articles on performance, tuning and scale.  When troubleshooting ACS performance, start by ensuring you see healthy JVM behavior across all of your application tiers.  Avoid the temptation to just throw more memory at the problem, as this can sometimes make things worse.


Unpacking Search Performance


Assuming that the JVM behavior looks normal, the next step is to look at the other components on which ACS depends.  There are three main subsystems that ACS uses to read / write information:  The database, the index, and the content store.  Before we can start troubleshooting, we need to know which one(s) are being used in the use case that is experiencing a performance problem.  In order to do this, you will need to know a bit about how search is configured on your Alfresco installation.  Depending on the version you have installed, Alfresco Content Services (5.2+) / Alfresco One (4.x / 5.0.x / 5.1.x) supports multiple search subsystem options and configurations.  It could be Solr 1.4, Solr 4, Solr 6, or your queries could be going directly against the database.  If you are on Alfresco 4.x, your system could also be configured to use the legacy Lucene index subsystem, but that is out of scope for this guide.  The easiest way to find out which index subsystem is in use is to look at the admin console.  Here's a screenshot from my test 5.2 installation that shows the options:



Now that we know for sure which search subsystem is configured, we need to know a little bit more about search configuration.  Alfresco Content Services supports something known as Transactional Metadata Queries.  This feature was added to supplement Solr for certain use cases.  The way Solr and ACS are integrated is "eventually consistent".  That is to say that content added to the repository is not indexed in-transaction.  Instead, Solr queries the repository for change sets, and then indexes those changes.  This makes the whole system more scalable and performant when compared with the older Lucene implementation, especially where large documents are concerned.  The drawback to this is that content is not immediately queryable when it is added.  Transactional Metadata Queries work around this by using the metadata in the database to perform certain types of queries, allowing for immediate results.  When troubleshooting performance, it is important to know exactly what type of query is executed, and whether it runs against the database or the index.  Transactional metadata queries can be independently turned on or off to various degrees for both Alfresco Full Text Search and CMIS.  To find out how your system is configured, we can again rely on the ACS admin console:


The full scope of Transactional Metadata Queries is too broad for this guide, but everything you need to know is in the Alfresco documentation on the topic.  Armed with knowledge of our search subsystem and Transactional Metadata Query configurations, we can get down to the business of troubleshooting our queries.  Given a particular CMIS or AFTS query, how do we know if it is being executed against the DB or the index?  If this is a component running a query you wrote, then you can look at the Transactional Metadata Query documentation to see if Alfresco would try to run it against the database.  If you are troubleshooting a query baked into the product, or you want to see for sure how your own query is being executed, turn on debug logging for class DbOrIndexSwitchingQueryLanguage.  This will tell you for sure exactly where the query in question is being run.
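As a sketch, with the default log4j-based logging this is a one-line change in your custom-log4j.properties. The fully qualified class name below is an assumption taken from the 5.x codebase; verify the package against your installed version before relying on it:

```
# Assumed package - check your version's source/javadoc before use
log4j.logger.org.alfresco.repo.search.impl.solr.DbOrIndexSwitchingQueryLanguage=debug
```

Once enabled, the log will state for each query whether it was routed to the database or to the index.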


The Database


If you suspect that the cause may be a slow DB query, there are several ways to investigate.  Every DB platform that Alfresco supports for production use has tools to identify slow queries.  That's a good place to start, but sometimes it isn't possible because you, as a developer or ACS admin, don't have the right access to the DB to use those tools.  If that's the case you can contact your DBA or you can look at it from the application server side.  To get the app server view of your query performance you again have a few options.  You could use a JDBC proxy driver like Log4JDBC or JDBCSpy that can output query timing to the logs.  It seems that Log4JDBC has seen more recent development, so that might be the better choice if you go the proxy driver route.  Another option is to attach a profiler. JProfiler and YourKit both support probing JDBC performance.  YourKit is what we use most often at Alfresco, and here's a small example of what it can show us about our database connections:



With this view it is straightforward to see what queries are taking the most time.  We can also profile DB connection open / close and several other database related bits that may be of interest.  The ACS schema is battle tested and performant at this point in the product lifecycle, but it is fairly common to see slow queries as a result of a database configuration problem, an overloaded shared database server, poor network connection to the database, out of date index statistics or a number of other causes.  If you see a slow query show up during your analysis, you should first check the database server configuration and tuning.  If you suspect a poorly optimized query (which is rare) contact Alfresco support.


One other common source of database related performance woes is the database connection pool.  Alfresco recommends setting the maximum database connection pool size on each cluster node to the number of concurrent application server worker threads + 75 to cover overhead from scheduled jobs, etc.  If you have Tomcat configured to allow for 200 worker threads (200 concurrent HTTP connections) then you'll need to set the database pool maximum size to 275.  Note that this may also require you to increase the limit on the database side as well.  If you have a lot of requests waiting on a connection from the pool that is not going to do good things for performance.
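To make the arithmetic concrete, here is what the two settings might look like for the 200-thread example. File locations are typical but should be checked against your installation; db.pool.max is the standard Alfresco property and maxThreads the standard Tomcat connector attribute:

```
# tomcat/conf/server.xml -> <Connector ... maxThreads="200" ... />

# alfresco-global.properties: worker threads (200) + 75 overhead
db.pool.max=275
```

Remember to raise the database server's own connection limit (e.g. max_connections in PostgreSQL) to at least the sum of the pool maximums across all cluster nodes.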


The Index


The other place where a query can run is against the ACS index.  As stated earlier, the index may be one of several types and versions, depending on exactly what version of Alfresco Content Services / Alfresco One you are using and how it is configured.  The good news is that you can get total query execution time the same way no matter which version of Solr your Alfresco installation is using.  To see the exact query that is being run, how long it takes to execute and how many results are being returned, just turn on debug logging for class SolrQueryHttpClient.  This will output debug information to the log that will tell you exactly what queries are being executed and how long each execution takes.  Note that this is the query time as returned in the result set, and should just be the Solr execution time without including the round trip time to / from the server.  This is an important distinction, especially where large result sets are concerned.  If the connection between ACS and the search service is slow then a query may complete very quickly but the results could take a while to arrive back at the application server.  In this case the index performance may be just fine, but the network performance is the bottleneck.
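As with the database/index switching logger, this is a one-line log4j change. The package below is an assumption, and the class may be spelled SolrQueryHTTPClient in your version, so verify the exact name first:

```
# Assumed package and spelling - check your version's source/javadoc
log4j.logger.org.alfresco.repo.search.impl.solr.SolrQueryHTTPClient=debug
```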


If the queries are running slowly, there are several things to check.  Good Solr performance depends heavily on good underlying disk I/O performance.  Alfresco has some specific recommendations for minimum disk performance.  A fast connection between the index and repository tiers is essential, so make sure that any load balancers or other network hardware that sit between the tiers are providing good performance.  Another thing to check is the Solr cache configuration.  Alfresco's search subsystem provides a number of caches that improve search performance at the cost of additional memory.  Make sure your system is sized appropriately using the guidance Alfresco provides on index memory requirements and cache sizes.  Alfresco's index services and Solr can show you detailed cache statistics that you can use to better understand critical performance factors like hit rates, evictions, warm up times, etc as shown in this screenshot from Alfresco 5.2 with Solr 4:



In the case of a large repository, it might also help to take a deeper look at how sharding is configured including the number of shards and hosts and whether or not the configuration is appropriate.  For example, if you are sharding by ACL and most of your documents have the same permissions, then it's possible the shards are a bit unbalanced and the majority of requests are hitting a single shard.  For this case, sharding by DBID (which ensures an even distribution) might be more appropriate and yield better performance.


It is also possible that a slow running query against the index might need some tuning itself.  The queries that Alfresco uses are well optimized, but if you are developing an extension and want to time your own queries I recommend looking at the Alfresco Javascript Console.  This is one of the best community developed extensions out there, and it can show you execution time for a chunk of Alfresco server-side Javascript.  If all that Javascript does is execute a query, you can get a good idea of your query performance and tweak / tune it accordingly.


The Content Store


Of all of the subsystems used for storing data in Alfresco, the content store is the one that has (typically) the least impact on overall system performance.  The content store is only used when reading / writing content streams.  This may be when a new document is uploaded, when a document is previewed, or when Solr requests a text version of a document for indexing.  Poor content store performance can show itself as long upload times under load, long preview times, or long delays when Solr is indexing content for the first time.  Troubleshooting this means looking at disk utilization or (if the content store resides on a remote filesystem) network utilization.
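A crude first check for content store I/O is to time a sequential write on the content store volume. A minimal sketch, assuming Python is available on the server; note this measures only raw sequential throughput, not Alfresco's actual I/O pattern:

```python
import os
import tempfile
import time

def write_throughput_mb_s(directory, total_mb=64, chunk_mb=1):
    """Write total_mb of random data to a temp file in `directory`, fsync, return MB/s."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(total_mb // chunk_mb):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # force data to disk so we measure the device
        return total_mb / (time.perf_counter() - start)
    finally:
        os.unlink(path)

# Point `directory` at the content store mount, e.g. /opt/alfresco/alf_data
print(f"{write_throughput_mb_s(tempfile.gettempdir(), total_mb=16):.1f} MB/s")
```

If the number is far below what the storage tier should deliver, involve your storage or network team before digging further into Alfresco itself.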


Into The Code


A full discussion of profiling running code would turn this from an article into a book, but any good systems person should know how to hook up a profiler or APM tool and look for long running calls.  Many Alfresco customers use things like Appdynamics or New Relic to do just that.  Splunk is also a common choice, as is the open source ELK stack.  All of these suites can provide a lot more than just what a profiler can do and can save your team a ton of time and money.  Alfresco's support team also finds JStack thread dumps useful.  If we see a lot of blocked threads that can help narrow down the source of a problem.  Regardless of the tools you choose, setting up good monitoring can help you find emerging performance problems before they become user problems.




This guide is nowhere near comprehensive, but it does cover some of the most common causes of Alfresco performance issues we have seen in the support world.  In the future we'll do a deeper dive into the index, repository and index tier caching, content transformer selection and execution, permission checking, governance services and other advanced topics in performance and scalability.

If you are serious about Alfresco in your IT infrastructure, you most certainly have a "User Acceptance Tests" (UAT) environment. If you don't... you really should consider setting one up (don't make common mistakes)!

When you initially set up your environments everything is quiet: users are not using the system yet and you just don't care about data freshness. However, soon after you go live, the production system will start being fed with data. It is this data, stored in Alfresco Content Services, that makes your system or application valuable.

When the moment comes to upgrade or deploy a new customization/application, you will obviously test it first on your UAT (or pre-production or test, whatever you call it) environment. Yes you will!

When you do so, having a UAT environment that doesn't have an up-to-date set of data can make the tests pointless or more difficult to interpret. This is also true if you plan to kick off performance tests: if the tests are done on a data set that is only one third the size of the production data, they are pointless.

Basically that's why you need to refresh your UAT data with production data every now and then or at least when you know it's going to be needed.

The scope of this document is not to provide a step-by-step guide on how to refresh your repository. Alfresco Content Services being a platform, this depends heavily on what you actually do with your repository, the kind of customizations you are using and any third-party apps that may be linked to Alfresco. This document will mainly highlight things you should thoroughly check when refreshing your dataset.



Some things must be validated before going further:

  • Production & UAT environments should have the same architecture (same number of servers, same components installed, and so on and so forth...)
  • Production & UAT environments should have the same sizing (while you can forget this for functional tests only, this is a true requirement for performance tests of course)
  • Production & UAT environments should be hosted on different, clearly separated networks (especially if using clustering)


What is it about?

In order to refresh your UAT repository with data from production you will simply go through the normal restore process of an Alfresco repository.

Here I consider backup strategy to be out of scope. If you don't have proper backups already set up, that's where you should start: Performing a hot backup | Alfresco Documentation


The required assets to restore are:

  • Alfresco's database
  • Filesystem repository
  • Indexes

Before you start your fresh UAT

There are a number of things you should check before starting your refreshed environment.


Reconfigure cluster

Although the recommendation is to isolate environments, it is still better to specify a different cluster configuration for each environment. That allows for less confusing administration and log analysis, and also prevents information leaking from one network to another in case the isolation is not airtight.

When starting a refreshed UAT cluster, always make sure you set a cluster password or cluster name that is different from the production cluster's. Doing so prevents cluster communication between nodes that are not actually part of the same cluster:


Alfresco 4.2 onward

Alfresco pre-4.2
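As a rough sketch of the settings involved (the property names below are assumptions to confirm against your version's clustering documentation), the change goes in alfresco-global.properties on the UAT nodes:

```
# Alfresco 4.2 onward (Hazelcast-based repository clustering)
alfresco.hazelcast.password=uat-only-secret

# Alfresco pre-4.2 (JGroups-based clustering)
alfresco.cluster.name=uat-cluster
```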


On the Share side, it is possible to change more parameters in order to isolate clusters but we will still apply the same logic for the sake of simplicity. Here you would change the Hazelcast password in the custom-slingshot-application-context.xml configuration file inside the {web-extension} directory.

<hz:topic id="topic" instance-ref="webframework.cluster.slingshot" name="slingshot-topic"/>
<hz:hazelcast id="webframework.cluster.slingshot">
    <hz:group name="slingshot" password="notthesamesecret"/>
    <hz:network port="5801" port-auto-increment="true">
        <hz:multicast enabled="true" multicast-group="" multicast-port="54327"/>
        <hz:tcp-ip enabled="false">
            <hz:members></hz:members>
        </hz:tcp-ip>
    </hz:network>
</hz:hazelcast>
Email notifications

It's very unlikely that your UAT environment needs to send emails or notifications to real users. Your production system is already sending digest and other emails to users and you don't want them to get confused because they received similar emails from other systems. So you have to make sure emails are either:

  • Sent to a black hole destination
  • Sent to some other place where users can't see them

If you really don't care about emails generated by Alfresco, then you can choose the "black hole" option. There are many ways to do that, such as configuring your local MTA to deliver all emails to a single local user and optionally linking that mailbox to /dev/null (with Postfix you could use the canonical_maps directive and mbox storage). Another way is to use the Java DevNull SMTP server. It is very simple to use, as it is just a jar file you can launch:

java -jar -console -p 10025 DevNull.jar

On the other hand, as part of your user tests, you may be interested in knowing and analyzing which emails are generated by your Alfresco instance. In this case you can still use the previous options. Both are able to store emails instead of swallowing them: Postfix by not linking the mbox storage to /dev/null, and the DevNull SMTP server by using the "-s /some/path/" option. However, storing emails on the filesystem is not really handy if you want to check their content and the way it renders (for instance).

If emails are a matter of interest, then you can use other products like MailHog. Such tools offer an SMTP server that stores emails for you instead of sending them to the outside world, and they also offer a neat way to visualize them, just like a webmail client would.


MailHog's WebUI is a service that also offers advanced features like POP3 (so you can see emails in "real-life" clients), SPAM score testing, content analysis and, for subscription-based users, collaboration features.


Whichever option you choose, and based on the chosen configuration, you'll have to switch the following properties for your UAT Alfresco nodes:
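For instance, pointing outbound mail at a local catch-all SMTP listener such as the DevNull server shown above typically involves the standard Alfresco outbound email properties in alfresco-global.properties (the values here match the `-p 10025` example and are illustrative only):

```
mail.host=localhost
mail.port=10025
mail.smtp.auth=false
```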

Jobs & FSTR synchronisation

Alfresco allows an administrator to schedule jobs and set up replication to another remote repository.

Scheduled jobs are carried over from the production environment to UAT if you cloned environments or did a backup/restore of production data. However, you most certainly don't want the same job to run twice from two different environments.

Whether or not a job should run in UAT depends on many factors and is very much related to what the job actually does. We cannot give a list of precise actions to take here in order to avoid problems. It is the administrator's call to review scheduled jobs and decide whether they should or can be disabled.

Jobs to review can be found in Spring bean definition files like ${extensionRoot}/extension/my-scheduler-context.xml.

One easy way to disable a job is to set its cron expression to a date in the far future (or past):


<property name="cronExpression">
    <value>0 50 0 * * ? 1970</value>
</property>

The repository can also hold synchronization jobs, mainly jobs used in File Transfer Receiver (FTR) setups.

In that case the administrator surely has to disable such jobs (or at least reconfigure them), as you do not want UAT frozen data to be synced to a remote location where live production data is expected!

Disabling these jobs is pretty simple. You can do it using the Share web UI by going to "Repository \ Data Dictionary \ Transfers \ Default Target group \ Group1" and editing the properties of the "Group1" folder. In the property editor form, just untick the "Activated" checkbox.


Repository ID & Cloud synchronization

Alfresco repository IDs must be universally unique, and of course, if you clone environments, you create duplicate repository IDs. One of the well known issues that duplicate IDs can trigger affects hybrid cloud setups where synchronization is enabled between the production environment and the Cloud. If your UAT servers connect to the cloud with production's ID, you can be sure synchronization will fail at some point and could even trigger data loss on your production system. You really want to avoid that from happening!

One very easy way to prevent this from happening is to simply disable cloud sync on the UAT environment.


Any string other than "PRODUCTION" can be used here. Also be aware that this property can only be set in file
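Assuming the property alluded to above is system.serverMode set in alfresco-global.properties (an assumption; confirm against your version's documentation or with Alfresco support), the change would look like this:

```
# Anything other than PRODUCTION marks this server as a non-production clone
# (hypothetical value shown)
system.serverMode=UAT
```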


Also, if you are using APIs that need to specify the repository ID when querying Alfresco (like the old CMIS endpoint used to), such API calls may stop working in UAT, since the repository ID is now the one from production. This applies when the calls were initially written against a previous hard-coded ID rather than retrieving it dynamically, which would be a poor approach in most cases anyway.

Starting with Alfresco 4.2, CMIS returns the string "-default-" as the repository ID for all new API endpoints (e.g. AtomPub /alfresco/api/-default-/public/cmis/versions/1.1/atom), while the previous endpoint (e.g. AtomPub /alfresco/cmisatom) returns a universally unique identifier (UUID).


If you think you need to change the repository ID, please contact Alfresco support. It makes the procedure heavier (a re-index is expected) and should be thoroughly planned.


Carrying unwanted configuration

If you stick to best practices for production, you probably try to keep all your configuration in properties or XML files in the {extensionRoot} directory.

But on the other hand, you may sometimes use the great facilities offered by Alfresco Enterprise, such as the JMX interface or the admin console. You must then remember that those tools persist configuration information to the database. This means that, when restoring a database from one environment to another, you may end up starting an Alfresco instance with the wrong parameters.

Here is a quite handy SQL query you can use *before* starting your new Alfresco UAT. It reports all the properties that are stored in the database, so you can make sure none of them is harmful or points to a production system.

SELECT APSVk.string_value AS property, APSVv.string_value AS value
  FROM alf_prop_link APL
    JOIN alf_prop_value APVk ON APVk.id = APL.key_prop_id
    JOIN alf_prop_value APVv ON APVv.id = APL.value_prop_id
    JOIN alf_prop_string_value APSVk ON APSVk.id = APVk.long_value
    JOIN alf_prop_string_value APSVv ON APSVv.id = APVv.long_value
  WHERE APL.key_prop_id <> APL.value_prop_id
    AND APL.root_prop_id IN (SELECT prop1_id FROM alf_prop_unique_ctx);

                 property                 |                value
------------------------------------------+-------------------------------------
 alfresco.port                            | 8084

Do not try to delete those entries from the database straight away: this is likely to break things!


If any property is conflicting with the new environment, it should be removed.

Do it wisely! An administrator should ALWAYS prefer using the "Revert" operations available through the JMX interface!

The "revert()" method is available using jconsole, in the Mbean tab, within the appropriate section:

revert property using jconsole

"revert()" may revert more properties than just the one you target. If unsure how to get rid of a single property, please contact Alfresco support.


Other typical properties to change:

When moving between environments, the properties below are likely to differ in the UAT environment (that may not be the case for you, or you may have others). As said earlier, they should be set in the ${extensionRoot} folder to a value that is specific to UAT (and they should not be present in the database):
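By way of illustration only, typical candidates are the environment-specific hostnames and endpoints. The property names below are common Alfresco settings; the values are obviously placeholders:

```
db.url=jdbc:postgresql://uat-db.example.com:5432/alfresco
alfresco.host=uat-alfresco.example.com
share.host=uat-share.example.com
solr.host=uat-index.example.com
mail.host=uat-mail.example.com
```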