
Introduction

Alfresco recently released a new module that allows modern SSO to be set up using the SAML protocol. SAML is a standard defined by a set of specifications from the OASIS consortium.

Like Kerberos, SAML is considered a secure approach to SSO: it involves signing messages and possibly encrypting them. But unlike Kerberos - which is more targeted at local networks or VPN-extended networks - SAML is a really good fit for internet and SaaS services. Essentially, SAML requires an Identity Provider (often referred to as the IdP) and a Service Provider (SP) to communicate with each other. Many cloud services offer SAML Service Provider features, and sometimes even the IdP feature (for example Google: Set up your own custom SAML application - G Suite Administrator Help).

LemonLDAP::NG is open-source software implemented as a handler for the Apache httpd web server. LemonLDAP::NG supports a wide variety of authentication protocols (HTTP header based, CAS, OpenID Connect, OAuth, Kerberos, ...) and backends (MySQL, LDAP, flat files).

 

Pre-requisites & Context

LemonLDAP must be installed and configured with an LDAP backend.
Doing so is out of the scope of this document. Please refer to:

LemonLDAP::NG Download
LemonLDAP::NG DEB install page
LemonLDAP::NG LDAP backend configuration

If you just want to test SAML with LemonLDAP::NG and you don’t want the burden of setting up LDAP and configuring LemonLDAP::NG accordingly, you can use the “demo” backend which is used by default “out of the box”.
In this case you can use the demo user “dwho” (password “dwho”).

At the moment the Alfresco SAML module doesn’t handle the user registry part of the repository. This means that users have to exist prior to logging in using SAML.
As a consequence, either Alfresco must be set up with LDAP synchronisation enabled - synchronisation should be done against the same directory LemonLDAP::NG uses as an LDAP backend for authentication - or users must have been created by administrators (e.g. using the Share admin console, CSV import, the People API…). A minimal synchronisation sketch is shown below.
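
For reference, a minimal LDAP synchronisation sketch in alfresco-global.properties could look like the following (the host, search bases and credentials are assumptions to adapt to your directory):

authentication.chain=alfrescoNtlm1:alfrescoNtlm,ldap1:ldap
ldap.authentication.active=false
ldap.authentication.java.naming.provider.url=ldap://ldap.acme.com:389
ldap.synchronization.active=true
ldap.synchronization.java.naming.security.principal=cn=admin,dc=acme,dc=com
ldap.synchronization.java.naming.security.credentials=secret
ldap.synchronization.userSearchBase=ou=people,dc=acme,dc=com
ldap.synchronization.groupSearchBase=ou=groups,dc=acme,dc=com
ldap.synchronization.userIdAttributeName=uid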

Both the SAML Identity Provider and the Service Provider must be time synchronized using NTP.
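
For example, on a systemd-based Linux host you can quickly verify that the clock is synchronized with:

$ timedatectl status

(or "$ ntpq -p" if the host runs ntpd)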

In the document below we assume that the ACME company has set up their SSO system using LemonLDAP::NG on the acme.com domain.

Component / URL:

  • authentication portal (where users are redirected in order to log in): https://auth.acme.com
  • manager (for administration purposes - used further in this document): https://manager.acme.com


On the other end, their ECM system is hosted on a Debian-like system at alfresco.myecm.org (possibly a SaaS provider or their AWS instance of Alfresco). ACME wants to integrate the Share UI interface with their SSO system.

The Identity Provider

SAMLv2 required libraries

While Alfresco uses the OpenSAML Java bindings for its SAML module, LemonLDAP::NG uses the Perl bindings of the LASSO library. Even though LemonLDAP::NG is installed and running, the required libraries may not be installed, as they are not direct dependencies.
LASSO is a pretty active project and bugs are fixed regularly. I would therefore advise using the latest version available on their website instead of the one provided by your distribution.
For example if using a Debian based distribution:

$ cat <<EOT | sudo tee /etc/apt/sources.list.d/lasso.list
deb http://deb.entrouvert.org/ jessie main
deb-src http://deb.entrouvert.org/ jessie main
EOT
$ sudo wget -O - https://deb.entrouvert.org/entrouvert.gpg | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install liblasso-perl

Make sure you are using the latest version of the LASSO library and its Perl bindings (2.5.1-2 fixes some important issues with SHA2).

LemonLDAP::NG SAMLv2 Identity Provider

As you may know, SAML makes extensive use of XML DSig. As a specification, DSig provides guidance on how to hash, sign and encrypt XML content.
In SAML, signing and encryption rely on asymmetric cryptographic keys.
We will therefore need to generate such keys (here RSA) to sign and possibly encrypt SAML messages. LemonLDAP::NG offers the possibility to use different keys for signing and encrypting SAML messages. If you plan to use both signing and encryption, please use the same key for both (follow the procedure below only once, for signing; encryption will use the same key).

Log in to the LemonLDAP::NG manager (usually manager.acme.com), go to the menu “SAML 2 Service \ Security Parameters \ Signature” and click on “New keys”.
Type in a password that will be used to encrypt the private key and remember it!

LemonLDAP::NG signing keys

You’ll need the password in order to generate certificates later on!

We now need to setup the SAML metadata that every Service Provider will use (among which Alfresco Share, and possibly AOS and Alfresco REST APIs).
In the LemonLDAP::NG manager, inside the menu “SAML 2 Service \ Organization”, fill the form with:

Display Name: the ACME company
Name: acme
URL: http://acme.com

Of course you will use values that match your environment

Next, in the “General Parameters \ Issuer modules \ SAML” menu, make sure the SAML Issuer module is configured as follows:

Activation: On
Path: ^/saml/
Use rule: On

Note that it is possible to allow SAML connections only under certain conditions, by using the “Special rule” option.
You then need to define a Perl expression that returns either true or false (more information here).

And that’s it: LemonLDAP::NG is now a SAML Identity Provider!

In order to configure Alfresco Service providers we need to export the signing key as a certificate. To do so, copy the private key that was generated in the LemonLDAP::NG manager to a file (e.g. saml.key) and generate a self-signed cert using this private key.

$ openssl req -new -days 365 -nodes -x509 -key saml.key -out saml.crt

use something like CN=LemonLDAP, OU=Premier Services, O=Alfresco, L=Maidenhead, ST=Berkshire, C=UK as a subject

Keep the saml.crt file somewhere you can find it for later use.
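
You can double-check its subject and validity dates at any time with:

$ openssl x509 -in saml.crt -noout -subject -dates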

SAMLv2 Service Provider

Install SAML Alfresco module package

The Alfresco SAML module can be downloaded from the Alfresco support portal. Only enterprise customers are entitled to this module.
So, we download alfresco-saml-1.0.x.zip and unzip it somewhere. Then, after stopping Alfresco, we copy the AMP files to the appropriate amps directories within the Alfresco install directory and deploy them.

$ cp alfresco-saml-repo-1.0.1.amp <ALFRESCO_HOME>/amps
$ cp alfresco-saml-share-1.0.1.amp <ALFRESCO_HOME>/amps_share
$ ./bin/apply_amps.sh

We now have to generate the certificate we will be using on the SP side:

$ keytool -genkeypair -alias my-saml-key -keypass change-me -storepass change-me -keystore my-saml.keystore -storetype JCEKS

You can use something like CN=Share, OU=Premier Services, O=Alfresco, L=Maidenhead, ST=Berkshire, C=UK as a subject

You can of course choose to use a different password and alias, just remember them for later use.
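
If you prefer a non-interactive run, the subject can be passed on the command line with -dname (same alias and passwords as above; adjust the values to your environment):

$ keytool -genkeypair -alias my-saml-key -keyalg RSA -keysize 2048 -validity 365 -dname "CN=Share, OU=Premier Services, O=Alfresco, L=Maidenhead, ST=Berkshire, C=UK" -keypass change-me -storepass change-me -keystore my-saml.keystore -storetype JCEKS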

The keystore must be copied somewhere and Alfresco configured to retrieve it.

$ mv my-saml.keystore alf_data/keystore
$ cat <<EOT > alf_data/keystore/my-saml.keystore-metadata.properties
aliases=my-saml-key
keystore.password=change-me
my-saml-key.password=change-me
EOT
$ cat <<EOT >> tomcat/shared/classes/alfresco-global.properties

saml.keystore.location=\${dir.keystore}/my-saml.keystore
saml.keystore.keyMetaData.location=\${dir.keystore}/my-saml.keystore-metadata.properties
EOT

Make sure that:

  • the keystore file is readable by Alfresco (and only by Alfresco);
  • the alias and passwords match the ones you used when generating the keystore with the keytool command (you can double-check with the keytool -list command shown below).
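
To double-check the keystore content:

$ keytool -list -v -keystore alf_data/keystore/my-saml.keystore -storetype JCEKS -storepass change-me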

The next step is to merge the whole <filter/> element provided in the SAML distribution (in the share-config-custom.xml.sample file) into your own share-config-custom.xml (which should be located in your {extensionroot} directory).
Below is an example section of the CSRF policy:

...
    <config evaluator="string-compare" condition="CSRFPolicy" replace="true">

 

    <!--
        If using https make a CSRFPolicy with replace="true" and override the properties section.
        Note, localhost is there to allow local checks to succeed.

 

        I.e.
        <properties>
            <token>Alfresco-CSRFToken</token>
            <referer>https://your-domain.com/.*|http://localhost:8080/.*</referer>
            <origin>https://your-domain.com|http://localhost:8080</origin>
        </properties>
    -->

 

        <filter>

 

            <!-- SAML SPECIFIC CONFIG -  START -->

 

            <!--
             Since we have added the CSRF filter with filter-mapping of "/*" we will catch all public GET's to avoid them
             having to pass through the remaining rules.
             -->
            <rule>
                <request>
                    <method>GET</method>
                    <path>/res/.*</path>
                </request>
            </rule>

 

            <!-- Incoming posts from IDPs do not require a token -->
            <rule>
                <request>
                    <method>POST</method>
                    <path>/page/saml-authnresponse|/page/saml-logoutresponse|/page/saml-logoutrequest</path>
                </request>
            </rule>

 

            <!-- SAML SPECIFIC CONFIG -  STOP -->

 

            <!-- EVERYTHING BELOW FROM HERE IS COPIED FROM share-security-config.xml -->

 

            <!--
             Certain webscripts shall not be allowed to be accessed directly form the browser.
             Make sure to throw an error if they are used.
             -->
            <rule>
                <request>
                    <path>/proxy/alfresco/remoteadm/.*</path>
                </request>
                <action name="throwError">
                    <param name="message">It is not allowed to access this url from your browser</param>
                </action>
            </rule>

 

            <!--
             Certain Repo webscripts should be allowed to pass without a token since they have no Share knowledge.
             TODO: Refactor the publishing code so that form that is posted to this URL is a Share webscript with the right tokens.
             -->
            <rule>
                <request>
                    <method>POST</method>
                    <path>/proxy/alfresco/api/publishing/channels/.+</path>
                </request>
                <action name="assertReferer">
                    <param name="referer">{referer}</param>
                </action>
                <action name="assertOrigin">
                    <param name="origin">{origin}</param>
                </action>
            </rule>

 

            <!--
             Certain Surf POST requests from the WebScript console must be allowed to pass without a token since
             the Surf WebScript console code can't be dependent on a Share specific filter.
             -->
            <rule>
                <request>
                    <method>POST</method>
                    <path>/page/caches/dependency/clear|/page/index|/page/surfBugStatus|/page/modules/deploy|/page/modules/module|/page/api/javascript/debugger|/page/console</path>
                </request>
                <action name="assertReferer">
                    <param name="referer">{referer}</param>
                </action>
                <action name="assertOrigin">
                    <param name="origin">{origin}</param>
                </action>
            </rule>

 

            <!-- Certain Share POST requests does NOT require a token -->
            <rule>
                <request>
                    <method>POST</method>
                    <path>/page/dologin(\?.+)?|/page/site/[^/]+/start-workflow|/page/start-workflow|/page/context/[^/]+/start-workflow</path>
                </request>
                <action name="assertReferer">
                    <param name="referer">{referer}</param>
                </action>
                <action name="assertOrigin">
                    <param name="origin">{origin}</param>
                </action>
            </rule>

 

            <!-- Assert logout is done from a valid domain, if so clear the token when logging out -->
            <rule>
                <request>
                    <method>POST</method>
                    <path>/page/dologout(\?.+)?</path>
                </request>
                <action name="assertReferer">
                    <param name="referer">{referer}</param>
                </action>
                <action name="assertOrigin">
                    <param name="origin">{origin}</param>
                </action>
                <action name="clearToken">
                    <param name="session">{token}</param>
                    <param name="cookie">{token}</param>
                </action>
            </rule>

 

            <!-- Make sure the first token is generated -->
            <rule>
                <request>
                    <session>
                        <attribute name="_alf_USER_ID">.+</attribute>
                        <attribute name="{token}"/>
                        <!-- empty attribute element indicates null, meaning the token has not yet been set -->
                    </session>
                </request>
                <action name="generateToken">
                    <param name="session">{token}</param>
                    <param name="cookie">{token}</param>
                </action>
            </rule>

 

            <!-- Refresh token on new "page" visit when a user is logged in -->
            <rule>
                <request>
                    <method>GET</method>
                    <path>/page/.*</path>
                    <session>
                        <attribute name="_alf_USER_ID">.+</attribute>
                        <attribute name="{token}">.+</attribute>
                    </session>
                </request>
                <action name="generateToken">
                    <param name="session">{token}</param>
                    <param name="cookie">{token}</param>
                </action>
            </rule>

 

            <!--
             Verify multipart requests from logged in users contain the token as a parameter
             and also correct referer & origin header if available
             -->
            <rule>
                <request>
                    <method>POST</method>
                    <header name="Content-Type">multipart/.+</header>
                    <session>
                        <attribute name="_alf_USER_ID">.+</attribute>
                    </session>
                </request>
                <action name="assertToken">
                    <param name="session">{token}</param>
                    <param name="parameter">{token}</param>
                </action>
                <action name="assertReferer">
                    <param name="referer">{referer}</param>
                </action>
                <action name="assertOrigin">
                    <param name="origin">{origin}</param>
                </action>
            </rule>

 

            <!--
             Verify that all remaining state changing requests from logged in users' requests contains a token in the
             header and correct referer & origin headers if available. We "catch" all content types since just setting it to
             "application/json.*" since a webscript that doesn't require a json request body otherwise would be
             successfully executed using i.e."text/plain".
             -->
            <rule>
                <request>
                    <method>POST|PUT|DELETE</method>
                    <session>
                        <attribute name="_alf_USER_ID">.+</attribute>
                    </session>
                </request>
                <action name="assertToken">
                    <param name="session">{token}</param>
                    <param name="header">{token}</param>
                </action>
                <action name="assertReferer">
                    <param name="referer">{referer}</param>
                </action>
                <action name="assertOrigin">
                    <param name="origin">{origin}</param>
                </action>
            </rule>
        </filter>
    </config>
...

Configure SAML Alfresco module

We can now configure the SAML service providers we need. Alfresco offers 3 different service providers that can be configured/enabled separately:

  • Share (the Alfresco collaborative UI)
  • AOS (the new Sharepoint protocol interface)
  • REST api (the Alfresco RESTful api)

Configuration can be done in several ways.

Configuring SAML SP using subsystem files:

The Alfresco SAML distribution comes with example SAML configuration files. Reusing them is very convenient and allows for a quick setup.
We’ll copy the files for the required SP and configure each SP as needed.

$ cp -a ~/saml/alfresco/extension/subsystems tomcat/shared/classes/alfresco/extension

Then, to configure the Share SP for example, rename the sample file and make sure it contains the needed properties:

$ mv tomcat/shared/classes/alfresco/extension/subsystems/SAML/share/share/my-custom-share-sp.properties.sample tomcat/shared/classes/alfresco/extension/subsystems/SAML/share/share/my-custom-share-sp.properties

my-custom-share-sp.properties:

saml.sp.isEnabled=true
saml.sp.isEnforced=false
saml.sp.idp.spIssuer.namePrefix=
saml.sp.idp.description=LemonLDAP::NG
saml.sp.idp.sso.request.url=https://auth.acme.com/saml/singleSignOn
saml.sp.idp.slo.request.url=https://auth.acme.com/saml/singleLogout
saml.sp.idp.slo.response.url=https://auth.acme.com/saml/singleLogoutReturn
saml.sp.idp.spIssuer=http://alfresco.myecm.org:8080/share
saml.sp.user.mapping.id=Subject/NameID
saml.sp.idp.certificatePath=${dir.keystore}/saml.crt

Of course you should use URLs matching your domain name!

As the configuration points to the IdP certificate, we also need to copy the certificate we generated earlier to the Alfresco server, in the alf_data/keystore folder (or any other folder you may have used as dir.keystore).

$ cp saml.crt alf_data/keystore

Configuring SAML SP using the Alfresco admin console:

Configure SAML service provider using the Alfresco admin console (/alfresco/s/enterprise/admin/admin-saml).
Set the following parameters:

Of course you should use URLs matching your domain name!

Below is a screenshot of what it would look like:

Leaving “Force SAML connection” unset lets the user log in either as a SAML-authenticated user, or as another user using a different subsystem.

Download the metadata and certificates from the bottom of the page, and then import the certificate you generated earlier using openssl in the Alfresco admin console.
To finish with Alfresco configuration, tick the “Enable SAML authentication (SSO)” box.

Create the SAML Service provider on the Identity Provider

The identity provider must be aware of the SP and its configuration. Using the LemonLDAP::NG manager, go to the “SAML Service Provider” section and add a new service provider.
Give it a name like “Alfresco-Share”.

Upload the metadata file exported from Alfresco admin console.

Under the newly created SP, in the “Options \ Authentication response” menu set the following parameters:

Default NameID Format: Unspecified
Force NameID session key: uid

Note that you could use whatever session key is available and fits your needs. Here, uid makes sense for use with Alfresco logins and works for the “Demo” authentication backend in LemonLDAP::NG. If a real LDAP backend is available and Alfresco is syncing users from that same LDAP directory, then the session key used as the NameID value should match the ldap.synchronization.userIdAttributeName defined in Alfresco’s LDAP authentication subsystem.

Optionally, you can also send additional information in the authentication response to the Share SP. To do so, under the newly created SP there is a section called “Exported attributes”. Configure it as follows:

This requires that the appropriate keys are exported as variables whose names are used as "Key name".

So here, we would have the following LemonLDAP::NG exported variables:

  • firstname
  • lastname
  • mail

Hacks ‘n tweaks

At this point, provided you met the pre-requisites, you should be able to log in without any problems. However, there may still be some issues with SP-initiated logout (initiating logout from the IdP should work though), depending on the versions of SP and IdP you use. Logout relies on the SAML SLO profile, and the way it is implemented in both Alfresco and LemonLDAP::NG still has some interoperability issues at the moment.

On the Alfresco side, SAML module version 1.0.1 is impacted by MNT-18064, which prevents SLO from working properly with LemonLDAP::NG. There is a small patch attached to the JIRA that can be used and adapted to match the NameID format used by your IdP (for the configuration described here, that would be "UNSPECIFIED"):

This JIRA is to be fixed in the next alfresco-saml module release (probably 1.0.2). In the meantime you can use the patch attached to the JIRA.

The LemonLDAP::NG project crew kindly considered rewriting their sessionIndex generation algorithm in order to avoid interoperability problems and security issues. This is needed in order to work with Alfresco and should be added in 1.9.11; previous versions won’t work.

In the meantime you can use the patch attached to LEMONLDAP-1261.

Overview

 

One of the most critical and time-consuming processes related to the Alfresco Content Connector for Salesforce module is migrating all the existing Notes & Attachments content from the Salesforce instance to the on-premise Alfresco Content Services (ACS) instance. This requires a lot of planning and thought as to how this use case can be implemented and how the content migration process can be completed successfully. When end users start using the connector widget in Salesforce they will be able to upload new documents, make changes to existing documents, etc. within Salesforce, but the actual content is stored in an ACS instance. Some of the core components/steps involved in this process are:

  • Number of documents/attachments that need to be migrated to Alfresco
  • Salesforce API limits available per 24 hour period. Use SOQL query or ACC API functions to retrieve the content. We need to make sure the available API calls are not fully used by the migration activity.
  • Naming conventions - some of the special characters used in SFDC's filename may not be supported in Alfresco, so some consideration must be given to manipulate the filenames in Alfresco when the documents are saved.
  • Perform a full content migration activity in a Salesforce DEV/UAT environment that has a complete copy of production data. This will assist in determining issues such as naming conventions, unsupported documents/document types etc.
  • Custom Alfresco WebScripts/Scheduled Actions to import all content from SFDC to Alfresco
  • Make sure there are enough system resources in the Alfresco server to hold all the content in the repository. This includes disk capacity, heap/memory allocated to JVM, CPU etc. 

We will look in to all these core components in more detail later in this blog post.

 

Why Content Migration?

 

Why would we want to migrate all the existing content from SFDC to Alfresco? Of course there are many reasons why this migration activity is required or even mandatory for the businesses and end users who will use this module.

  • End users need to be able to access/find the legacy or existing documents and contracts in the same way as they normally do.
  • Moving content over to an ACS instance can save businesses a significant amount of money, since Salesforce storage costs are high.
  • Easier access to all documents/notes using the ACC search feature available in the widget.
  • Content can be accessed in both Salesforce and Alfresco depending on the end user's preference. Custom Smart Folders can be configured in ACS to get a holistic view of the content that end users require. For example, end users may want to see only the content related to their region, in which case a Smart Folder can be configured to query for a Salesforce object based on a particular metadata property value, such as sobjectAccountRegion, sobjectAccount:Regional_Team, etc.

 

Steps Involved

 

  1. Determine the total number of Notes & Attachments that must be migrated/imported to ACS - This can be determined by using the Salesforce Developer Workbench. Use a SOQL query to get a list of all Notes/Attachments of a specific Salesforce object, for example Accounts, Opportunities, etc.

SOQL Query

         You may also use the REST Explorer feature in the Workbench to execute this query.

Some sample SOQL queries are:

  • To retrieve attachments of all Opportunity objects in SFDC: Select Name, Id, ParentId from Attachment where ParentId IN (Select Id from Opportunity)
  • To retrieve Notes of all Account objects in SFDC: Select Id, (Select Title, Id, ParentId, Body from Notes) from Account
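
If you prefer to script this step instead of using the Workbench UI, the same SOQL can also be run against the Salesforce REST query endpoint, for example with curl (the instance URL, API version and access token below are placeholders):

$ curl -H "Authorization: Bearer <access_token>" "https://<your_instance>.salesforce.com/services/data/v39.0/query/?q=SELECT+Name,Id,ParentId+FROM+Attachment+WHERE+ParentId+IN+(SELECT+Id+FROM+Opportunity)"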

REST Explorer View

  

   2.   Develop Alfresco webscripts/asynchronous actions to retrieve and migrate all content from SFDC to Alfresco - It is probably a good idea to develop an Alfresco asynchronous action (Java based) as opposed to a webscript to perform the migration process. This ensures the actual migration runs in the background and avoids the timeouts we might see with normal webscripts. Depending on the amount of content, it can take a few hours for the action to complete successfully. We will use the ACC module's available APIs to make a connection to Salesforce and then use the relevant functions to retrieve the content. The code snippets to make a connection and execute a SOQL query are below. To get the content of an Attachment object in SFDC use the getBlob method. It should be something like the following:

apiResult = connection.getApi().sObjectsOperations(getInstanceUrl()).getBlob("Attachment", documentId, "Body");

Open a connection to SFDC

 

SOQL Query

 

Once the connection and query are established you can then look to save the file in Alfresco based on end-user/business needs. The code snippet to save the document in Alfresco is below.

 

Code
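
For reference, a minimal sketch of that save step using the Alfresco foundation Java API could look like the following (the injected services, the target folder resolution and the variable names are assumptions, not the connector's own code):

// create the document node under the folder mapped to the SFDC object
Map<QName, Serializable> props = new HashMap<>();
props.put(ContentModel.PROP_NAME, fileName);
NodeRef document = nodeService.createNode(
        targetFolder,
        ContentModel.ASSOC_CONTAINS,
        QName.createQName(NamespaceService.CONTENT_MODEL_1_0_URI, fileName),
        ContentModel.TYPE_CONTENT,
        props).getChildRef();
// write the binary retrieved from Salesforce (e.g. the stream obtained from the getBlob call)
ContentWriter writer = contentService.getWriter(document, ContentModel.PROP_CONTENT, true);
writer.setMimetype(mimetypeService.guessMimetype(fileName));
writer.putContent(attachmentStream);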

 

   3.   “Warm up” the Alfresco folder structures without human intervention - One of the prerequisites to the content migration process is pre-creating the folder structure for all the SFDC objects within ACS and making sure the folders are mapped to the appropriate SFDC objects. This can be achieved by exporting all the SFDC object details into a CSV file. The CSV file must contain information such as the SFDC Record Id, AccountId, Opportunity Id, Region, etc. The SFDC Record ID is unique for each and every Salesforce object, and the Content Connector widget identifies the mapping between ACS and SFDC using this ID. Before executing the content migration code, we need to make sure all the objects exist in ACS first. Once the CSV file is generated, develop a custom ACS webscript to process the CSV file line by line, create the appropriate folder structures and assign metadata values accordingly. Once ready, execute the webscript to auto-create all folder structures in ACS. A simple code snippet is below.

 

Warmup Code
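
As an illustration only, the core of such a webscript could be sketched in Java as below (the CSV parsing, the parent folder and the property QName are assumptions - the real property names come from the Content Connector model):

for (String line : csvLines) {
    String[] cols = line.split(",");
    String sfdcRecordId = cols[0];
    // one folder per SFDC record, named after the record id
    NodeRef folder = fileFolderService.create(parentFolder, sfdcRecordId, ContentModel.TYPE_FOLDER).getNodeRef();
    // stamp the SFDC record id on the folder so the connector widget can resolve the mapping
    // (placeholder QName - use the property defined by the ACC content model)
    nodeService.setProperty(folder, QName.createQName("sfdc.model", "recordId"), sfdcRecordId);
}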

 

   4.   Trigger the migration code to import all content from SFDC to ACS - Once the folder hierarchy for the required SFDC objects is set up in ACS, you can execute the asynchronous action developed in Step 2. To execute the Java asynchronous action you may create a simple webscript which triggers it. A simple code snippet is below.

var document = companyhome.childByNamePath("trigger.txt");
var sfdcAction = actions.create("get-sfdcopps-attach");
sfdcAction.executeAsynchronously(document);
model.result = document;

You may choose to execute this in multiple transactions instead of a single transaction. If there are tens of thousands of documents that need to be imported, the best approach is to use multiple transactions at the migration code level (see the sketch below).
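
A minimal batching sketch using Alfresco's RetryingTransactionHelper (the batch size, the list of ids and the importAttachment helper are assumptions):

RetryingTransactionHelper txnHelper = transactionService.getRetryingTransactionHelper();
int batchSize = 100;
for (int i = 0; i < attachmentIds.size(); i += batchSize) {
    final List<String> batch = attachmentIds.subList(i, Math.min(i + batchSize, attachmentIds.size()));
    // each batch is committed in its own transaction, so a failure only affects that batch
    txnHelper.doInTransaction(() -> {
        for (String id : batch) {
            importAttachment(id);   // fetch the blob from SFDC and save it into ACS
        }
        return null;
    }, false, true);                // readOnly = false, requiresNew = true
}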

Once the migration process is complete, the documents should be located in the relevant folder hierarchy within ACS and be accessible from the associated object page in SFDC.

 

ACC Documents

 

   5.   Validate the content - Once the document import process is complete, make sure to test and validate that all the documents were imported to the appropriate folder hierarchy in ACS and that they are accessible within the Salesforce Content Connector widget. Key things to check are thumbnails, previews, file names, file content, etc.

 

Hope all this information helps in your next Alfresco Content Services content migration activity.

1.     Project Objective

The aim of this blog is to show you how to create and run a Docker container with a full ELK (Elasticsearch, Logstash and Kibana) environment containing the necessary configuration and scripts to collect and present data to monitor your Alfresco application.

 

Elastic tools can ease the processing and manipulation of large amounts of data collected from logs, operating system, network, etc.

 

Elastic tools can be used to search for data such as errors, exceptions and debug entries and to present statistical information such as throughput and response times in a meaningful way. This information is very useful when monitoring and troubleshooting Alfresco systems.

 

2.     Install Docker on Host machine

Install Docker on your host machine (server) as per Docker website. Please note the Docker Community Edition is sufficient to run this project (https://www.docker.com/community-edition)

 

3.     Virtual Memory

Elasticsearch uses a hybrid mmapfs / niofs directory by default to store its indices. The default operating system limits on mmap counts are likely to be too low, which may result in out of memory exceptions.

 

On Linux, you can increase the limits by running the following command as root on the host machine:

 

# sysctl -w vm.max_map_count=262144

 

To set this value permanently, update the vm.max_map_count setting in /etc/sysctl.conf. To verify the value has been applied run:

 

# sysctl vm.max_map_count
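
The permanent setting is a single line in /etc/sysctl.conf, which can be appended and reloaded without a reboot, for example:

# echo "vm.max_map_count=262144" >> /etc/sysctl.conf
# sysctl -p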

 

4.     Download “Docker-ELK-Alfresco-Monitoring” container software

Download the software to create the Docker container from GitHub: https://github.com/miguel-rodriguez/Docker-ELK-Alfresco-Monitoring and extract the files to the file system.

 

5.     Creating the Docker Image

Before creating the Docker image we need to configure access to Alfresco’s database from the Docker container. Assuming the files have been extracted to /opt/docker-projects/Docker-ELK-Alfresco-Monitoring-master, edit files activities.properties and workflows.properties and set the access to the DB server as appropriate, for example:

 

#postgresql settings

db_type=postgresql

db_url=jdbc:postgresql://172.17.0.1:5432/alfresco

db_user=alfresco

db_password=admin

 

Please make sure the database server allows remote connections to Alfresco’s database. A couple of examples of how to configure the database are shown here:

  • For MySQL

Access your database server as an administrator and grant the correct permissions i.e.

 

# mysql -u root -p

grant all privileges on alfresco.* to alfresco@'%' identified by 'admin';

 

The grant command grants access to all tables in the ‘alfresco’ database to the ‘alfresco’ user from any host, using the ‘admin’ password.

Also make sure the bind-address parameter in my.cnf allows for external binding i.e. bind-address = 0.0.0.0

 

  • For PostgreSQL

Change the file ‘postgresql.conf’ to listen on all interfaces

 

listen_addresses = '*'

 

then add an entry in file ‘pg_hba.conf’ to allow connections from any host

 

host all all 0.0.0.0/0 trust

 

Restart PostgreSQL database server to pick up the changes.
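
Before building the image it is worth checking that the database is reachable with the new settings, for example (adjust host, port and credentials):

$ psql -h 172.17.0.1 -p 5432 -U alfresco -d alfresco -c "select count(*) from alf_node;"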

We have installed a small Java application inside the container, in the /opt/activities folder, that executes calls against the database configured in the /opt/activities/activities.properties file.

For example to connect to PostgreSQL we have the following settings:

 

db_type=postgresql

db_url=jdbc:postgresql://172.17.0.1:5432/alfresco

db_user=alfresco

db_password=admin

 

We also need to set the timezone in the container. This can be done by editing the following entry in the startELK.sh script.

 

export TZ=GB

 

From the command line execute the following command to create the Docker image:

 

# docker build --tag=alfresco-elk /opt/docker-projects/Docker-ELK-Alfresco-Monitoring-master/

Sending build context to Docker daemon  188.9MB

Step 1/33 : FROM sebp/elk:530

530: Pulling from sebp/elk

.......

 

6.     Creating the Docker Container

Once the Docker image has been created we can create the container from it by executing the following command:

 

# docker create -it -p 5601:5601 --name alfresco-elk alfresco-elk:latest

 

7.     Starting the Docker Container

Once the Docker container has been created it can be started with the following command:

 

# docker start alfresco-elk

 

Verify the ELK stack is running by accessing Kibana on http://localhost:5601 on the host machine.

At this point Elasticsearch and Kibana do not have any data…so we need to get Alfresco’s logstash agent up and running to feed some data to Elasticsearch.

 

8.     Starting logstash-agent

The logstash agent consists of logstash and some other scripts to capture entries from Alfresco log files, JVM stats using jstatbeat (https://github.com/cero-t/jstatbeat), entries from Alfresco audit tables, DB slow queries, etc.

 

Copy the logstash-agent folder to a directory on all the servers running Alfresco or Solr applications.

Assuming you have copied the logstash-agent folder to /opt/logstash-agent, edit the file /opt/logstash-agent/run_logstash.sh and set the following properties according to your own settings:

 

export tomcatLogs=/opt/alfresco/tomcat/logs

export logstashAgentDir=/opt/logstash-agent

export logstashAgentLogs=${logstashAgentDir}/logs

export alfrescoELKServer=172.17.0.2


 9.    Configuring Alfresco to generate data for monitoring

Alfresco needs some additional configuration to produce data to be sent to the monitoring Docker container.

 

9.1   Alfresco Logs

Alfresco logs i.e. alfresco.log, share.log, solr.log or the equivalent catalina.out can be parsed to provide information such as number of errors or exceptions over a period of time. We can also search these logs for specific data.

 

The first thing is to make sure the logs are displaying the full date time format at the beginning of each line. This is important so we can display the entries in the correct order.

Make sure in your log4j properties files (there is more than one) the file layout pattern is as follows:

 

log4j.appender.File.layout.ConversionPattern=%d{yyyy-MM-dd} %d{ABSOLUTE} %-5p [%c] [%t] %m%n

 

This will produce log entries with the date at the beginning of the line as this one:

2016-09-12 12:16:28,460 INFO  [org.alfresco.repo.admin] [localhost-startStop-1] Connected to database PostgreSQL version 9.3.6

Important Note: If you upload catalina files then don’t upload alfresco (alfresco, share, solr) log files for the same time period since they contain the same entries and you will end up with duplicate entries in the Log Analyser tool.

 

Once the logs are processed the resulting data is shown:

  • Number of errors, warnings, debug and fatal messages, etc. over time
  • Total number of errors, warnings, debug, fatal messages, etc
  • Common messages that may reflect issues with the application
  • Number of entries grouped by java class
  • Number of exceptions logged
  • All log files are searchable using ES (Elasticsearch) search syntax

 

 

9.2    Document Transformations

Alfresco performs document transformations for document previews, thumbnails, indexing content, etc. To monitor document transformations enable logging for class “TransformerLog”  by adding the following line to tomcat/shared/classes/alfresco/extension/custom-log4j.properties on all alfresco nodes:

 

log4j.logger.org.alfresco.repo.content.transform.TransformerLog=debug

 

The following is a sample output from alfresco.log file showing document transformation times, document extensions, transformer used, etc.

 

2016-07-14 18:24:56,003  DEBUG [content.transform.TransformerLog] [pool-14-thread-1] 0 xlsx png  INFO Calculate_Memory_Solr Beta 0.2.xlsx 200.6 KB 897 ms complex.JodConverter.Image<<Complex>>

 

Once Alfresco logs are processed the following data is shown for transformations:

  • Response time of transformation requests over time
  • Transformation throughput
  • Total count of transformations grouped by file type
  • Document size, transformation time, transformer used, etc

 

 

9.3    Tomcat Access Logs

Tomcat access logs can be used to monitor HTTP requests, throughput and response times. In order to get the right data format in the logs we need to add/replace the “Valve” entry in tomcat/conf/server.xml file, normally located at the end of the file, with this one below.

 

<Valve
  className="org.apache.catalina.valves.AccessLogValve"
  directory="logs"
  prefix="access-" suffix=".log"
  pattern='%a %l %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i" %D "%I"'
  resolveHosts="false"
/>

 

 

 For further clarification on the log pattern refer to: https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Access_Logging

Sample output from the tomcat access log under the tomcat/logs directory. The important fields here are the HTTP request, the HTTP response status (i.e. 200) and the time taken to process the request (i.e. 33 milliseconds).

 

127.0.0.1 - CN=Alfresco Repository Client, OU=Unknown, O=Alfresco Software Ltd., L=Maidenhead, ST=UK, C=GB [14/Jul/2016:18:49:45 +0100] "POST /alfresco/service/api/solr/modelsdiff HTTP/1.1" 200 37 "-" "Spring Surf via Apache HttpClient/3.1" 33 "http-bio-8443-exec-10"

 

Once the Tomcat access logs are processed the following data is shown: 

  • Response time of HTTP requests over time
  • HTTP traffic throughput
  • Total count of responses grouped by HTTP response code
  • Tomcat access logs files are searchable using ES (Elasticsearch) search syntax

 

  

9.4    Solr Searches

We can monitor Solr queries and response times by enabling debug for class SolrQueryHTTPClient by adding the following entry to tomcat/shared/classes/alfresco/extension/custom-log4j.properties on all Alfresco (front end) nodes:

 

log4j.logger.org.alfresco.repo.search.impl.solr.SolrQueryHTTPClient=debug

 

Sample output from alfresco.log file showing Solr searches response times:

 

DEBUG [impl.solr.SolrQueryHTTPClient] [http-apr-8080-exec-6]    with: {"queryConsistency":"DEFAULT","textAttributes":[],"allAttributes":[],"templates":[{"template":"%(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT TAG)","name":"keywords"}],"authorities":["GROUP_EVERYONE","ROLE_ADMINISTRATOR","ROLE_AUTHENTICATED","admin"],"tenants":[""],"query":"((test.txt  AND (+TYPE:\"cm:content\" +TYPE:\"cm:folder\")) AND -TYPE:\"cm:thumbnail\" AND -TYPE:\"cm:failedThumbnail\" AND -TYPE:\"cm:rating\") AND NOT ASPECT:\"sys:hidden\"","locales":["en"],"defaultNamespace":"http://www.alfresco.org/model/content/1.0","defaultFTSFieldOperator":"OR","defaultFTSOperator":"OR"}

 2016-03-19 19:55:54,106 

 

DEBUG [impl.solr.SolrQueryHTTPClient] [http-apr-8080-exec-6] Got: 1 in 21 ms

 

Note: There is no specific transaction id to correlate a Solr search to the corresponding response. The best way to do this is to look at the time when the search and response were logged, together with the Java thread name; this should give you a match for the query and its response.

Once Alfresco logs are processed the following data is shown for Solr searches:

  • Response time for Solr searches over time
  • Solr searches throughput
  • Solr queries, number of results found and individual response times

 

 

9.5    Database Monitoring

Database performance can be monitored with two different tools: p6spy and Packetbeat. The main difference between these tools is that p6spy acts as a proxy JDBC driver while Packetbeat is a network traffic sniffer. Also, Packetbeat can only sniff traffic for MySQL and PostgreSQL databases, while p6spy can also handle Oracle, among others.

 

P6spy

P6spy software is delivered as a jar file that needs to be placed in the application class path i.e. tomcat/lib/ folder. There are 3 steps to get p6spy configured and running.

 

  • Place p6spy jar file in tomcat/lib/ folder
  • Create a spy.properties file, also in the tomcat/lib/ folder, with the following configuration

 

modulelist=com.p6spy.engine.spy.P6SpyFactory,com.p6spy.engine.logging.P6LogFactory,com.p6spy.engine.outage.P6OutageFactory
appender=com.p6spy.engine.spy.appender.FileLogger
deregisterdrivers=true
dateformat=MM-dd-yy HH:mm:ss:SS
autoflush=true
append=true
useprefix=true

# Update driverlist with the correct driver, i.e.
# driverlist=oracle.jdbc.OracleDriver
# driverlist=org.mariadb.jdbc.Driver
# driverlist=org.postgresql.Driver
driverlist=org.postgresql.Driver

# Location where spy.log file will be created
logfile=/opt/logstash-agent/logs/spy.log

# Set the execution threshold to log queries taking longer than 1000 milliseconds (slow queries only)
executionThreshold=1000

 

Note: if there are no queries taking longer than the value in executionThreshold (in milliseconds) then the file will not be created.

Note: set the “logfile” variable to the logs folder inside the logstash-agent path as shown above.

 

  • Add entry to tomcat/conf/Catalina/localhost/alfresco.xml file

 

Example for PostgreSQL:

 

<Resource
  defaultTransactionIsolation="-1"
  defaultAutoCommit="false"
  maxActive="275"
  initialSize="10"
  password="admin"
  username="alfresco"
  url="jdbc:p6spy:postgresql://localhost/p6spy:alfresco"
  driverClassName="com.p6spy.engine.spy.P6SpyDriver"
  type="javax.sql.DataSource"
  auth="Container"
  name="jdbc/dataSource"
/>

 

Example for Oracle:

 

<Resource
  defaultTransactionIsolation="-1"
  defaultAutoCommit="false"
  maxActive="275"
  initialSize="10"
  password="admin"
  username="alfresco"
  url="jdbc:p6spy:oracle:thin:@192.168.56.101:1521:XE"
  driverClassName="com.p6spy.engine.spy.P6SpyDriver"
  type="javax.sql.DataSource"
  auth="Container"
  name="jdbc/dataSource"
/>

 

 Example for MariaDB:

 

<Resource
  defaultTransactionIsolation="-1"
  defaultAutoCommit="false"
  maxActive="275"
  initialSize="10"
  password="admin"
  username="alfresco"
  url="jdbc:p6spy:mariadb://localhost:3306/alfresco"
  driverClassName="com.p6spy.engine.spy.P6SpyDriver"
  type="javax.sql.DataSource"
  auth="Container"
  name="jdbc/dataSource"
/>

 

Once the spy.log file has been processed the following information is shown:

  • DB Statements execution time over time
  • DB Statements throughput over time
  • Table showing individual DB statements and execution times
  • DB execution times by connection id 

 

 

9.6    Alfresco Auditing

If you want to audit Alfresco access you can enable auditing by adding the following entries to alfresco-global.properties file:

 

# Enable auditing
audit.enabled=true
audit.alfresco-access.enabled=true
audit.tagging.enabled=true
audit.alfresco-access.sub-actions.enabled=true
audit.cmischangelog.enabled=true

 

Now you can monitor all the events generated by the alfresco-access audit group.
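
As a quick sanity check that events are being recorded, you can also query the audit application directly through the repository audit REST API (adjust host and credentials to your environment):

# curl -u admin:admin "http://localhost:8080/alfresco/service/api/audit/query/alfresco-access?limit=10&verbose=true"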

 

 

Note: Only one of the logstash agents should collect Alfresco's audit data, since the script gathers data for the whole cluster/solution. So edit the file logstash-agent/run_logstash.sh on the Alfresco node that should collect the audit data and set the variable collectAuditData to "yes" as indicated below (leave it unset on the other nodes):

 

collectAuditData="yes"

 

Note: Also make sure you update the login credentials for Alfresco in the audit*.sh files. The default is admin/admin.

 

10.    Starting and Stopping the logstash agent

The logstash agent script can be started from the command line with "./run_logstash.sh start" as shown below:

 

./run_logstash.sh start
Starting logstash
Starting jstatbeat
Starting dstat
Staring audit access script

 

and can be stopped with the command "./run_logstash.sh stop" as shown below:

 

./run_logstash.sh stop
Stopping logstash
Stopping jstatbeat
Stopping dstat
Stopping audit access script

 

11. Accessing the Dashboard

Finally, access the dashboard by going to http://<docker host IP>:5601 (use the IP of the server where you installed the Docker container), clicking on the “Dashboard” link on the left panel and then clicking on the “Activities” link.

 

 

The data should be available for the selected time period.

 

 

Navigate to the other dashboards by clicking on the appropriate link.

 

 

12.    Accessing the container

To enter the running container use the following command:

 

# docker exec -i -t alfresco-elk bash

 

And to exit the container just type “exit” and you will find yourself back on the host machine.

 

13.    Stopping the container

To stop the container from running type the following command on the host machine:

 


# docker stop alfresco-elk

 

14.    Removing the Docker Container

To delete the container you first need to stop the container and then run the following command:

 

# docker rm alfresco-elk

 

15.    Removing the Docker Image

To delete the image you first need to remove the container and then run the following command:

 

# docker rmi alfresco-elk:latest

 

16.   Firewall ports

 

If you have a firewall make sure the following ports are open:

 

Redis: 6379

Kibana: 5601

Database server: this depends on the DB server being used i.e. PostgreSQL is 5432, MySQL 3306, etc

 

 

Happy Monitoring!!!

Troubleshooting application performance can be tricky, especially when the application in question has dependencies on other systems.  Alfresco's Content Services (ACS) platform is exactly that type of application.  ACS relies on a relational database, an index, a content store, and potentially several other supporting applications to provide a broad set of services related to content management.  What do you do when you have a call that is slow to respond, or a page that seems to take forever to load?  After several years of helping customers diagnose and fix these kinds of issues, my advice is to start at the bottom and work your way up (mostly).

 

When faced with a performance issue in Alfresco Content Services, the first step is to identify exactly what calls are responding slowly. If you are using CMIS or the ACS REST API, this is simple enough: you'll know which call is running slowly and exactly how you are calling it. It's your code making the call, after all. If you are using an ADF application, Alfresco Share or a custom UI it can become a bit more involved. Identifying the exact call is still straightforward, and you can approach this the same way you would approach it for any web application. I usually use Chrome's built-in dev tools for this purpose. Take as an example the screenshot below, which shows some of the requests captured when loading the Share Document Library on a test site:

 

 

In this panel we can see the individual XHR requests that Share uses to populate the document library view.  This is the first place to look if we have a page loading slowly.  Is it the document list that is taking too long to load?  Is it the tag list?  Is it a custom component?  Once we know exactly what call is responding slowly, we can begin to get to the root of our performance issue.

 

When you start troubleshooting ACS performance, it pays to start at the bottom and work your way up. This usually means starting at the JVM. Take a look at your JVM stats with your profiler of choice. Do you see excessive CPU utilization? How often is garbage collection running? Is the system constantly running at or close to your maximum memory allocation? Is there enough system memory available to the operating system to support the amount that has been allocated to the JVM without swapping? It is difficult to provide "one size fits all" guidance for JVM tuning as the requirements will vary based on the type of workload Alfresco is handling. Luis Cabaceira has provided some excellent guidance on this subject in his blog. I highly recommend his series of articles on performance, tuning and scale. When troubleshooting ACS performance, start by ensuring you see healthy JVM behavior across all of your application tiers. Avoid the temptation to just throw more memory at the problem, as this can sometimes make things worse.

 

Unpacking Search Performance

 

Assuming that the JVM behavior looks normal, the next step is to look at the other components on which ACS depends. There are three main subsystems that ACS uses to read / write information: the database, the index, and the content store. Before we can start troubleshooting, we need to know which one(s) are being used in the use case that is experiencing a performance problem. In order to do this, you will need to know a bit about how search is configured on your Alfresco installation. Depending on the version you have installed, Alfresco Content Services (5.2+) / Alfresco One (4.x / 5.0.x / 5.1.x) supports multiple search subsystem options and configurations. It could be Solr 1.4, Solr 4, Solr 6, or your queries could be going directly against the database. If you are on Alfresco 4.x, your system could also be configured to use the legacy Lucene index subsystem, but that is out of scope for this guide. The easiest way to find out which index subsystem is in use is to look at the admin console. Here's a screenshot from my test 5.2 installation that shows the options:

 

 

Now that we know for sure which search subsystem is configured, we need to know a little bit more about search configuration.  Alfresco Content Services supports something known as Transactional Metadata Queries.  This feature was added to supplement Solr for certain use cases.  The way Solr and ACS are integrated is "eventually consistent".  That is to say that content added to the repository is not indexed in-transaction.  Instead, Solr queries the repository for change sets, and then indexes those changes.  This makes the whole system more scalable and performant when compared with the older Lucene implementation, especially where large documents are concerned.  The drawback to this is that content is not immediately queryable when it is added.  Transactional Metadata Queries work around this by using the metadata in the database to perform certain types of queries, allowing for immediate results.  When troubleshooting performance, it is important to know exactly what type of query is executed, and whether it runs against the database or the index.  Transactional metadata queries can be independently turned on or off to various degrees for both Alfresco Full Text Search and CMIS.  To find out how your system is configured, we can again rely on the ACS admin console:

 

The full scope of Transactional Metadata Queries is too broad for this guide, but everything you need to know is in the Alfresco documentation on the topic.  Armed with knowledge of our search subsystem and Transactional Metadata Query configurations, we can get down to the business of troubleshooting our queries.  Given a particular CMIS or AFTS query, how do we know if it is being executed against the DB or the index?  If this is a component running a query you wrote, then you can look at the Transactional Metadata Query documentation to see if Alfresco would try to run it against the database.  If you are troubleshooting a query baked into the product, or you want to see for sure how your own query is being executed, turn on debug logging for class DbOrIndexSwitchingQueryLanguage.  This will tell you for sure exactly where the query in question is being run.
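
For example, a log4j override entry like the following should enable it (the package shown is an assumption based on recent 5.x releases; confirm the fully qualified class name for your version):

log4j.logger.org.alfresco.repo.search.impl.querymodel.impl.db.DbOrIndexSwitchingQueryLanguage=debug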

 

The Database

 

If you suspect that the cause may be a slow DB query, there are several ways to investigate.  Every DB platform that Alfresco supports for production use has tools to identify slow queries.  That's a good place to start, but sometimes it isn't possible to do because you, as a developer or ACS admin, don't have the right access to the DB to use those tools.  If that's the case you can contact your DBA or you can look at it from the application server side.  To get the app server view of your query performance you again have a few options.  You could use a JDBC proxy driver like Log4JDBC or JDBCSpy that can output query timing to the logs.  It seems that Log4JDBC has seen more recent development so that might be the better choice if you go the proxy driver route.  Another option is to attach a profiler. JProfiler and YourKit both support probing JDBC performance.  YourKit is what we use most often at Alfresco, and here's a small example of what it can show us about our database connections:

 

 

With this view it is straightforward to see what queries are taking the most time.  We can also profile DB connection open / close and several other database related bits that may be of interest.  The ACS schema is battle tested and performant at this point in the product lifecycle, but it is fairly common to see slow queries as a result of a database configuration problem, an overloaded shared database server, poor network connection to the database, out of date index statistics or a number of other causes.  If you see a slow query show up during your analysis, you should first check the database server configuration and tuning.  If you suspect a poorly optimized query (which is rare) contact Alfresco support.

 

One other common source of database related performance woes is the database connection pool.  Alfresco recommends setting the maximum database connection pool size on each cluster node to the number of concurrent application server worker threads + 75 to cover overhead from scheduled jobs, etc.  If you have Tomcat configured to allow for 200 worker threads (200 concurrent HTTP connections) then you'll need to set the database pool maximum size to 275.  Note that this may also require you to increase the limit on the database side as well.  If you have a lot of requests waiting on a connection from the pool that is not going to do good things for performance.
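
In alfresco-global.properties the pool maximum is a single property, so for the 200-thread example above a sketch would be:

db.pool.max=275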

 

The Index

 

The other place where a query can run is against the ACS index.  As stated earlier, the index may be one of several types and versions, depending on exactly what version of Alfresco Content Services / Alfresco One you are using and how it is configured.  The good news is that you can get total query execution time the same way no matter which version of Solr your Alfresco installation is using.  To see the exact query that is being run, how long it takes to execute and how many results are being returned, just turn on debug logging for class SolrQueryHttpClient.  This will output debug information to the log that will tell you exactly what queries are being executed and how long each execution takes.  Note that this is the query time as returned in the result set, and should just be the Solr execution time without including the round trip time to / from the server.  This is an important distinction, especially where large result sets are concerned.  If the connection between ACS and the search service is slow then a query may complete very quickly but the results could take a while to arrive back at the application server.  In this case the index performance may be just fine, but the network performance is the bottleneck.
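
The exact class name is SolrQueryHTTPClient, so the corresponding entry in a custom-log4j.properties override is:

log4j.logger.org.alfresco.repo.search.impl.solr.SolrQueryHTTPClient=debug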

 

If the queries are running slowly, there are several things to check.  Good Solr performance depends heavily on good underlying disk I/O performance.  Alfresco has some specific recommendations for minimum disk performance.  A fast connection between the index and repository tiers is essential, so make sure that any load balancers or other network hardware that sit between the tiers are providing good performance.  Another thing to check is the Solr cache configuration.  Alfresco's search subsystem provides a number of caches that improve search performance at the cost of additional memory.  Make sure your system is sized appropriately using the guidance Alfresco provides on index memory requirements and cache sizes.  Alfresco's index services and Solr can show you detailed cache statistics that you can use to better understand critical performance factors like hit rates, evictions, warm up times, etc as shown in this screenshot from Alfresco 5.2 with Solr 4:

 

 

In the case of a large repository, it might also help to take a deeper look at how sharding is configured including the number of shards and hosts and whether or not the configuration is appropriate.  For example, if you are sharding by ACL and most of your documents have the same permissions, then it's possible the shards are a bit unbalanced and the majority of requests are hitting a single shard.  For this case, sharding by DBID (which ensures an even distribution) might be more appropriate and yield better performance.

 

It is also possible that a slow running query against the index might need some tuning itself.  The queries that Alfresco uses are well optimized, but if you are developing an extension and want to time your own queries, I recommend looking at the Alfresco JavaScript Console.  This is one of the best community-developed extensions out there, and it can show you the execution time for a chunk of Alfresco server-side JavaScript.  If all that JavaScript does is execute a query, you can get a good idea of your query performance and tweak / tune it accordingly.
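As a rough sketch of the idea, the following server-side JavaScript (the query string is hypothetical) can be pasted into the JavaScript Console to time a single FTS query:

// time a single fts-alfresco query from the Alfresco JavaScript Console
var start = new Date().getTime();
var results = search.query({
    query: 'TYPE:"cm:content" AND cm:name:"report*"',   // hypothetical query
    language: 'fts-alfresco'
});
var elapsed = new Date().getTime() - start;
logger.log("Returned " + results.length + " nodes in " + elapsed + " ms");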

 

The Content Store

 

Of all of the subsystems used for storing data in Alfresco, the content store is the one that has (typically) the least impact on overall system performance.  The content store is only used when reading / writing content streams.  This may be when a new document is uploaded, when a document is previewed, or when Solr requests a text version of a document for indexing.  Poor content store performance can show itself as long upload times under load, long preview times, or long delays when Solr is indexing content for the first time.  Troubleshooting this means looking at disk utilization or (if the content store resides on a remote filesystem) network utilization.

 

Into The Code

 

A full discussion of profiling running code would turn this from an article into a book, but any good systems person should know how to hook up a profiler or APM tool and look for long-running calls.  Many Alfresco customers use tools like AppDynamics or New Relic to do just that.  Splunk is also a common choice, as is the open source ELK stack.  All of these suites can provide a lot more than a profiler can and can save your team a ton of time and money.  Alfresco's support team also finds jstack thread dumps useful: if we see a lot of blocked threads, that can help narrow down the source of a problem.  Regardless of the tools you choose, setting up good monitoring can help you find emerging performance problems before they become user problems.
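If you want to capture a thread dump manually, something as simple as the following is usually enough (replace <pid> with the id of the Alfresco JVM process):

jstack -l <pid> > /tmp/alfresco-threads-$(date +%s).txt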

 

Conclusion

 

This guide is nowhere near comprehensive, but it does cover some of the most common causes of Alfresco performance issues we have seen in the support world.  In the future we'll do a deeper dive into the index, repository and index tier caching, content transformer selection and execution, permission checking, governance services and other advanced topics in performance and scalability.

If you are serious about Alfresco in your IT infrastructure you most certainly have a "User Acceptance Tests" environment. If you don't... you really should consider setting one up (don't make Common mistakes)!

When you initially set up your environments everything is quiet: users are not using the system yet and you just don't care about data freshness. However, soon after you go live, the production system will start being fed with data. Basically it is this data, stored in Alfresco Content Services, that makes your system or application valuable.

When the moment comes to upgrade or deploy a new customization/application, you will obviously test it first on your UAT (or pre-production or test, whatever you call it) environment. Yes you will!

When you do so, having a UAT environment that doesn't have an up-to-date set of data can make the tests pointless or more difficult to interpret. This is also true if you plan to kick off performance tests: if the tests are done on a data set that is only one third the size of the production data, they are pointless.

Basically that's why you need to refresh your UAT data with production data every now and then or at least when you know it's going to be needed.

The scope of this document is not to provide you with a step-by-step guide on how to refresh your repository. Alfresco Content Services being a platform, this highly depends on what you actually do with your repository, the kind of customizations you are using and the 3rd-party apps that may be linked to Alfresco in any way. This document will mainly highlight things you should thoroughly check when refreshing your dataset.

 

Prerequisites

Some things must be validated before going further:

  • Production & UAT environments should have the same architecture (same number of servers, same components installed, and so on)
  • Production & UAT environments should have the same sizing (while you can ignore this for purely functional tests, it is a hard requirement for performance tests of course)
  • Production & UAT environments should be hosted on different, clearly separated networks (especially if using clustering)

 

What is it about?

In order to refresh your UAT repository with data from production you will simply go through the normal restore process of an Alfresco repository.

Here I consider backup strategy out of scope... If you don't have proper backups already set up, that's where you should start: Performing a hot backup | Alfresco Documentation

      

The required assets to restore are:

  • Alfresco's database
  • Filesystem repository
  • Indexes

Before you start your fresh UAT

There are a number of things you should check before starting your refreshed environment.

 

Reconfigure cluster

Although the recommendation is to isolate environments, it is better to also use a different cluster configuration for each environment. That allows for less confusing administration and log analysis, and also prevents information from leaking from one network to the other in case isolation is not that good.

When starting a refreshed UAT cluster, you should always make sure you are setting a cluster password or a cluster name that is different from the production cluster. Doing so prevents cluster communication between nodes that are actually not part of the same cluster:

Alfresco 4.2 onward:

alfresco.hazelcast.password=someotherpassword

Alfresco pre-4.2:

alfresco.cluster.name=uatCluster

On the Share side, it is possible to change more parameters in order to isolate clusters but we will still apply the same logic for the sake of simplicity. Here you would change the Hazelcast password in the custom-slingshot-application-context.xml configuration file inside the {web-extension} directory.

<hz:topic id="topic" instance-ref="webframework.cluster.slingshot" name="slingshot-topic"/>
   <hz:hazelcast id="webframework.cluster.slingshot">
     <hz:config>
       <hz:group name="slingshot" password="notthesamepsecret"/>
       <hz:network port="5801" port-auto-increment="true">
         <hz:join>
           <hz:multicast enabled="true" multicast-group="224.2.2.5" multicast-port="54327"/>
           <hz:tcp-ip enabled="false">
             <hz:members></hz:members>
           </hz:tcp-ip>
        </hz:join>
...

Email notifications

It's very unlikely that your UAT environment needs to send emails or notifications to real users. Your production system is already sending digest and other emails to users and you don't want them to get confused because they received similar emails from other systems. So you have to make sure emails are either:

  • Sent to a black hole destination
  • Sent to some other place where users can't see them

If you really don't care about emails generated by Alfresco, then you can choose the "black hole" option. There are many different ways to do that, such as configuring your local MTA to send all emails to a single local user and optionally linking that mailbox to /dev/null (with Postfix you could use the canonical_maps directive and mbox storage). Another way would be to use the Java DevNull SMTP server. It is very simple to use as it is just a jar file you can launch:

java -jar DevNull.jar -console -p 10025

On the other hand, as part of your user tests, you may be interested in knowing and analyzing what emails are generated by your Alfresco instance. In this case you could still use the previous options. Both are able to store emails instead of swallowing them: Postfix by not linking the mbox storage to /dev/null, and the DevNull SMTP server by using the "-s /some/path/" option. However, storing emails on the filesystem is not really handy if you want to check their content and the way they render, for instance.

If emails are a matter of interest then you can use other products like MailHog or mailtrap.io. Both offer an SMTP server that stores emails for you instead of sending them to the outside world, and they also offer a neat way to visualize them, just like a webmail would.
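For instance, MailHog can be started as a throwaway Docker container (image name and ports below are the MailHog defaults at the time of writing):

# SMTP listener on 1025, web UI on 8025
docker run -d -p 1025:1025 -p 8025:8025 mailhog/mailhog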

 

Mailhog WebUI

Mailtrap.io is a service that also offers advanced features like POP3 (so you can see emails in "real-life" clients), spam score testing, content analysis and, for subscription-based users, collaboration features.

 

Whichever option you choose, and based on the chosen configuration, you'll have to switch the following properties for your UAT Alfresco nodes (an example follows the list):

mail.host
mail.port
mail.smtp.auth
mail.smtps.auth
mail.username
mail.password
mail.smtp.starttls.enable
mail.protocol
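For example, pointing a UAT node at the local DevNull SMTP server started above could look like this (illustrative values, adjust to your chosen tool):

mail.host=localhost
mail.port=10025
mail.smtp.auth=false
mail.smtps.auth=false
mail.username=
mail.password=
mail.smtp.starttls.enable=false
mail.protocol=smtp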

Jobs & FSTR synchronisation

Alfresco allows an administrator to schedule jobs and setup replication to another remote repository.

Scheduled jobs are carried over from production to UAT if you cloned environments or did a backup/restore of production data. However, you most certainly don't want the same job to run twice from two different environments.

Defining whether or not a job should run in UAT depends on a lot of factors and is very much related to what the job actually does. Here we cannot give a list of precise actions to take in order to avoid problems. It is the administrator's call to review scheduled jobs and decide whether or not they should (or can) be disabled.

Jobs to review can be found in Spring bean definition files like ${extensionRoot}/extension/my-scheduler-context.xml.

One easy way to disable a job is to set its cron expression to a date far in the future (or in the past):

 

<property name="cronExpression">
    <value>0 50 0 * * ? 1970</value>
</property>

The repository can also hold synchronization jobs, mainly those used in File Transfer Receiver (FTR) setups.

In that case the administrator surely has to disable such jobs (or at least reconfigure them), as you do not want frozen UAT data to be synced to a remote location where live production data is expected!

Disabling this kind of job is pretty simple. You can do it using the Share web UI by going to "Repository \ Data Dictionary \ Transfers \ Default Target group \ Group1" and editing the properties of the "Group1" folder. In the property editor form, just untick the "Activated" checkbox.

 

Repository ID & Cloud synchronization

Alfresco repository IDs must be universally unique, and of course if you clone environments you create duplicate repository IDs. One of the well-known issues that can be triggered by duplicate IDs concerns hybrid cloud setups where synchronization is enabled between the production environment and the my.alfresco.com Cloud. If your UAT servers connect to the cloud with the production ID you can be sure synchronization will fail at some point and could even trigger data loss on your production system. You really want to prevent that from happening!

One very easy way to prevent this is to simply disable cloud sync on the UAT environment:

system.serverMode=UAT

Any string other than "PRODUCTION" can be used here. Also be aware that this property can only be set in the alfresco-global.properties file.

 

Also, if you are using APIs that need to specify the repository ID when calling Alfresco (like the old CMIS endpoint used to), such API calls may stop working in UAT as the repository ID is now the one from production (in the case where the calls were initially written against the previous UAT ID instead of discovering the ID at runtime, which would be a poor approach in most cases).

Starting with Alfresco 4.2, CMIS returns the string "-default-" as the repository ID for all new API endpoints (e.g. the AtomPub binding at /alfresco/api/-default-/public/cmis/versions/1.1/atom), while the previous endpoint (e.g. the AtomPub binding at /alfresco/cmisatom) returns a Universally Unique IDentifier.

 

If you think you need to change the repository ID, please contact Alfresco support. It makes the procedure heavier (a re-index is expected) and should be thoroughly planned.

 

Carrying unwanted configuration

If you stick to the best practices for production, you probably try to have all your configuration in properties or xml files in the {extensionRoot} directory.

But on the other hand, you may sometimes use the great facilities offered by Alfresco Enterprise, such as the JMX interface or the admin console. You must then remember that those tools persist configuration information to the database. This means that, when restoring a database from one environment to another, you may end up starting an Alfresco instance with the wrong parameters.

Here is a quite handy SQL query you can use *before* starting your new Alfresco UAT. It will report all the properties that are stored in the database. You can then make sure none of them is harmful or points to a production system.

SELECT APSVk.string_value AS property, APSVv.string_value AS value
  FROM alf_prop_link APL
    JOIN alf_prop_value APVv ON APL.value_prop_id=APVv.id
    JOIN alf_prop_value APVk ON APL.key_prop_id=APVk.id
    JOIN alf_prop_string_value APSVk ON APVk.long_value=APSVk.id
    JOIN alf_prop_string_value APSVv ON APVv.long_value=APSVv.id
WHERE APL.key_prop_id <> APL.value_prop_id
    AND APL.root_prop_id IN (SELECT prop1_id FROM alf_prop_unique_ctx);

                 property                 |                value
------------------------------------------+--------------------------------------
alfresco.port                            | 8084

Do not try to delete those entries from the database straight away; this is likely to break things!

   

If any property is conflicting with the new environment, it should be removed.

Do it wisely! An administrator should ALWAYS prefer using the "Revert" operations available through the JMX interface!

The "revert()" method is available using jconsole, in the Mbean tab, within the appropriate section:

revert property using jconsole

"revert()" may revert more properties than just the one you target. If unsure how to get rid of a single property, please contact alfresco support.

   

Other typical properties to change:

When moving between environments, the properties below are likely to be different in the UAT environment (that may not be the case for you, or you may have others). As said earlier, they should be set in the ${extensionRoot} folder to a value that is specific to UAT (and they should not be present in the database):

ldap.authentication.java.naming.provider.url
ldap.synchronization.java.naming.security.principal
ldap.synchronization.java.naming.security.credentials
solr.host
solr.port

WCMQS Part I

Posted by fmalagrino Employee Apr 28, 2017

What is WCMQS?

Alfresco Web Quick Start is a set of website design templates and sample architecture, built on the Alfresco Share content management and collaboration framework.

With Quick Start, developers can rapidly build customized and dynamic web applications with powerful content management features for the business users without having to start from scratch.

Using standard development tools developers can quickly deploy the comprehensive content management capabilities of Alfresco to build new and innovative web applications. Developed using the Spring framework with Alfresco Surf, the Web Quick Start allows developers to easily extend Alfresco to add new features to support the demands of the business.

Why is it good to know WCMQS?

  • WCMQS is a powerful component for creating a website
  • You can customize your website as you wish
  • You can create web scripts in JavaScript or Java
  • You can use JavaScript frameworks like AngularJS or ExpressJS, libraries like jQuery, and responsive frameworks like Bootstrap or Foundation

 

How do you install WCMQS?

 

There are two ways to install WCMQS in your Alfresco application:

1) If you use the installer, you need to tick the Web Quick Start checkbox.

2) If you are installing Alfresco manually using the WAR files, you only need to deploy the WCMQS WAR alongside your Alfresco application.

 

To check that WCMQS has been installed correctly on your Alfresco application, you can create a collaboration site:

 

 

 

After you have created your collaboration site it is time to add the web quick start dashlet.

 

The screen captures below show how to add a dashlet in Alfresco.

 

 

Choose "Add Dashlet"

 

 

Drag the "Web Quick Start" into one of the columns

 

The added dashlet will grant you access to two prototype websites: 

Finance (single language)

Government (multilanguage)

 

In my example I select Finance.

 

After you select one of the two prototypes it will create an example website inside the document library.

 

 

Configuring WCMQS:

 

After the installation it is time to configure WCMQS.

 

By default, WCMQS creates two folders: one for editorial content and one for live content:

 

The Editorial and Live folders must each have a host configured as a property, and the two hosts must be different. By default the Editorial folder uses the host "localhost" while the Live folder is configured to "127.0.0.1". Both are configured on port 8080 with the context name "wcmqs".

 

 

 

 

 

If you have a different port or want a different name, you need to click the Editorial or Live folder and modify its properties.

 

In my example, the port is 8888 (because my Alfresco runs on port 8888) and my website will be called test.

 

 

Same configuration for live (apart from the host name):

 

 

Once you have decided what the site is called, go to tomcat/webapps/, copy wcmqs.war and rename the copy to the name of your site (here test.war); Alfresco will create your new site. Restart Tomcat and test your site by going to the host:port/context of the site, which in my case is localhost:8888/test/.
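In practice this boils down to something like the following on the Alfresco server (paths assume a standard Tomcat layout and are only an example):

cd /opt/alfresco/tomcat/webapps
cp wcmqs.war test.war        # "test" is the site name chosen above
# restart Tomcat, then browse to http://localhost:8888/test/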

 

 

If you have done everything correctly you should see the following site:

 

The aim of this blog is to introduce you to Enterprise Integration Patterns and to show you how to create an application to integrate Alfresco with an external application…in this case we will be sending documents on request from Alfresco to Box based on CMIS queries. We will store both the content and the metadata in Box.

 

1.    Enterprise Integration Patterns

EIP (Enterprise Integration Patterns) defines a language consisting of 65 integration patterns (http://www.enterpriseintegrationpatterns.com/patterns/messaging/toc.html) to establish a technology-independent vocabulary and a visual notation for designing and documenting integration solutions.

Why EIP? Today's applications rarely live in isolation. Architecting integration solutions is a complex task.

The lack of a common vocabulary and body of knowledge for asynchronous messaging architectures makes it difficult to avoid common pitfalls.

For example the following diagram shows how content from one application is routed and transformed to be delivered to another application. Each step can be further detailed with specific annotations.

 

 

  • Channel Patterns describe how messages are transported across a Message Channel. These patterns are implemented by most commercial and open source messaging systems.
  • Message Construction Patterns describe the intent, form and content of the messages that travel across the messaging system.
  • Routing Patterns discuss how messages are routed from a sender to the correct receiver. Message routing patterns consume a message from one channel and republish it, usually without modification, to another channel based on a set of conditions.
  • Transformation Patterns change the content of a message, for example to accommodate different data formats used by the sending and the receiving system. Data may have to be added, taken away or existing data may have to be rearranged.
  • Endpoint Patterns describe how messaging system clients produce or consume messages.
  • System Management Patterns describe the tools to keep a complex message-based system running, including dealing with error conditions, performance bottlenecks and changes in the participating systems.

 

The following example shows how to maintain the overall message flow when processing a message consisting of multiple elements, each of which may require different processing.

               

 

2.    Apache Camel

Apache Camel (http://camel.apache.org/) is an integration framework whose main goal is to make integration easier. It implements many of the EIP patterns and allows you to focus on solving business problems, freeing you from the burden of plumbing.

At a high level, Camel is composed of components, routes and processors. All of these are contained within the CamelContext.

 

The CamelContext provides access to many useful services, the most notable being components, type converters, a registry, endpoints, routes, data formats, and languages.

 

  • Components: A Component is essentially a factory of Endpoint instances. To date, there are over 80 components in the Camel ecosystem, ranging in function from data transports to DSLs, data formats and so on, e.g. cmis, http, box, salesforce, ftp, smtp, etc.
  • Endpoints: An endpoint is the Camel abstraction that models the end of a channel through which a system can send or receive messages. Endpoints are usually created by a Component and are usually referred to in the DSL via their URIs, e.g. cmis://cmisServerUrl[?options].
  • Routes: The steps taken to send a message from one endpoint to another endpoint.
  • Type Converters: Camel provides a built-in type-converter system that automatically converts between well-known types. This system allows Camel components to easily work together without type mismatches.
  • Data Formats: Allow messages to be marshaled to and from binary or text formats to support a kind of Message Translator, e.g. gzip, json, csv, crypto, etc.
  • Registry: Contains a registry that allows you to look up beans, e.g. a bean that defines the JDBC data source.
  • Languages: To wire processors and endpoints together to form routes, Camel defines a DSL. DSLs include, among others, Java, Groovy, Scala and Spring XML.

 

3.    Building an Integration Application

 

The aim of the application is to send documents on request from Alfresco to Box. We will store both the content and the metadata in Box.

To build an EIP Application we are going to use:

  • Maven to build the application
  • Spring-boot to run the application
  • Apache Camel to integrate Alfresco and Box

 

The full source code is available on GitHub: https://github.com/miguel-rodriguez/Alfresco-Camel

 

The basic message flow is as follows:

 


 

 

3.1          Maven

Apache Maven (https://maven.apache.org/) is a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project's build, reporting and documentation from a central piece of information.

 

3.1.1    Maven Pom.xml

For our project the pom.xml brings the required dependencies such as Camel and ActiveMQ. The pom.xml file looks like this:

 

<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>

   

    <groupId>support.alfresco</groupId>

    <artifactId>camel</artifactId>

    <name>Spring Boot + Camel</name>

    <version>0.0.1-SNAPSHOT</version>

    <description>Project Example.</description>

 

    <!-- Using Spring-boot 1.4.3 -->

    <parent>

        <groupId>org.springframework.boot</groupId>

        <artifactId>spring-boot-starter-parent</artifactId>

        <version>1.4.3.RELEASE</version>

    </parent>

 

    <!-- Using Camel version 2.18.1 -->

    <properties>

        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>

        <camel-version>2.18.1</camel-version>

        <app.version>1.0-SNAPSHOT</app.version>

    </properties>

 

    <!-- Spring -->

    <dependencies>

        <dependency>

            <groupId>org.springframework.boot</groupId>

            <artifactId>spring-boot-starter-web</artifactId>

        </dependency>

 

        <!-- The Core Camel Java DSL based router -->

        <dependency>

            <groupId>org.apache.camel</groupId>

            <artifactId>camel-core</artifactId>

            <version>${camel-version}</version>

        </dependency>

 

        <!-- Camel Spring support -->

        <dependency>

            <groupId>org.apache.camel</groupId>

            <artifactId>camel-spring</artifactId>

            <version>${camel-version}</version>

        </dependency>

 

        <!-- Camel Metrics based monitoring component -->

        <dependency>

            <groupId>org.apache.camel</groupId>

            <artifactId>camel-metrics</artifactId>

            <version>${camel-version}</version>

        </dependency>

 

        <!-- Camel JMS support -->

        <dependency>

            <groupId>org.apache.camel</groupId>

            <artifactId>camel-jms</artifactId>

            <version>${camel-version}</version>

        </dependency>

 

        <!-- ActiveMQ component for Camel -->

        <dependency>

            <groupId>org.apache.activemq</groupId>

            <artifactId>activemq-camel</artifactId>

        </dependency>

 

        <!-- Camel CMIS which is based on Apache Chemistry support -->

        <dependency>

            <groupId>org.apache.camel</groupId>

            <artifactId>camel-cmis</artifactId>

            <version>2.14.1</version>

        </dependency>

 

        <!-- Camel Stream (System.in, System.out, System.err) support -->

        <dependency>

            <groupId>org.apache.camel</groupId>

            <artifactId>camel-stream</artifactId>

            <version>${camel-version}</version>

        </dependency>

 

        <!-- Camel JSON Path Language -->

        <dependency>

            <groupId>org.apache.camel</groupId>

            <artifactId>camel-jsonpath</artifactId>

            <version>${camel-version}</version>

        </dependency>

 

       <!-- Apache HttpComponents HttpClient - MIME coded entities -->

        <dependency>

            <groupId>org.apache.httpcomponents</groupId>

            <artifactId>httpmime</artifactId>

        </dependency>

 

        <!-- Camel HTTP (Apache HttpClient 4.x) support -->

        <dependency>

            <groupId>org.apache.camel</groupId>

            <artifactId>camel-http4</artifactId>

            <version>${camel-version}</version>

        </dependency>

 

        <!-- Camel SQL support -->

        <dependency>

            <groupId>org.apache.camel</groupId>

            <artifactId>camel-sql</artifactId>

            <version>${camel-version}</version>

        </dependency>

 

        <!-- Camel Zip file support -->

        <dependency>

            <groupId>org.apache.camel</groupId>

            <artifactId>camel-zipfile</artifactId>

            <version>${camel-version}</version>

        </dependency>

 

        <!-- Support for PostgreSQL database -->

        <dependency>

            <groupId>org.postgresql</groupId>

            <artifactId>postgresql</artifactId>

            <exclusions>

                <exclusion>

                    <groupId>org.slf4j</groupId>

                    <artifactId>slf4j-simple</artifactId>

                </exclusion>

            </exclusions>

        </dependency>

 

        <!-- Camel Component for Box.com -->

        <dependency>

            <groupId>org.apache.camel</groupId>

            <artifactId>camel-box</artifactId>

            <version>${camel-version}</version>

        </dependency>

 

        <!-- Camel script support -->

        <dependency>

            <groupId>org.apache.camel</groupId>

            <artifactId>camel-script</artifactId>

            <version>${camel-version}</version>

        </dependency>

 

        <!-- A simple Java toolkit for JSON -->

        <dependency>

            <groupId>com.googlecode.json-simple</groupId>

            <artifactId>json-simple</artifactId>

            <version>1.1.1</version>

            <!--$NO-MVN-MAN-VER$-->

        </dependency>

 

        <!-- XStream is a Data Format which to marshal and unmarshal Java objects to and from XML -->

        <dependency>

            <groupId>org.apache.camel</groupId>

            <artifactId>camel-xstream</artifactId>

            <version>2.9.2</version>

        </dependency>

 

        <!-- Jackson XML is a Data Format to unmarshal an XML payload into Java objects or to marshal Java objects into an XML payload -->

        <dependency>

            <groupId>org.apache.camel</groupId>

            <artifactId>camel-jackson</artifactId>

            <version>2.9.2</version>

        </dependency>

 

        <!-- test -->

        <dependency>

            <groupId>org.apache.camel</groupId>

            <artifactId>camel-test</artifactId>

            <version>${camel-version}</version>

            <scope>test</scope>

        </dependency>

 

        <!-- logging -->

        <dependency>

            <groupId>commons-logging</groupId>

            <artifactId>commons-logging</artifactId>

            <version>1.1.1</version>

        </dependency>

 

        <dependency>

            <groupId>org.apache.logging.log4j</groupId>

            <artifactId>log4j-api</artifactId>

            <scope>test</scope>

        </dependency>

 

        <dependency>

            <groupId>org.apache.logging.log4j</groupId>

            <artifactId>log4j-core</artifactId>

            <scope>test</scope>

        </dependency>

 

        <dependency>

            <groupId>org.apache.logging.log4j</groupId>

            <artifactId>log4j-slf4j-impl</artifactId>

            <scope>test</scope>

        </dependency>

 

        <!--  monitoring -->

        <dependency>

            <groupId>org.springframework.boot</groupId>

            <artifactId>spring-boot-starter-remote-shell</artifactId>

        </dependency>

 

        <dependency>

            <groupId>org.jolokia</groupId>

            <artifactId>jolokia-core</artifactId>

        </dependency>

 

    </dependencies>

    <build>

        <plugins>

            <plugin>

                <groupId>org.springframework.boot</groupId>

                <artifactId>spring-boot-maven-plugin</artifactId>

            </plugin>

        </plugins>

    </build>

</project>

 

 

3.2          Spring Boot

 

Spring Boot (https://projects.spring.io/spring-boot/) makes it easy to create stand-alone, production-grade Spring based Applications that you can "just run". Most Spring Boot applications need very little Spring configuration.

 

Features

  • Create stand-alone Spring applications
  • Embed Tomcat, Jetty or Undertow directly (no need to deploy WAR files)
  • Provide opinionated 'starter' POMs to simplify your Maven configuration
  • Automatically configure Spring whenever possible
  • Provide production-ready features such as metrics, health checks and externalized configuration

 

3.2.1       Spring Boot applicationContext.xml

We use the applicationContext.xml to define the java beans used by our application. Here we define the beans for connecting to Box, Database connectivity, ActiveMQ and Camel. For the purpose of this application we only need ActiveMQ and Box connectivity.

 

<?xml version="1.0" encoding="UTF-8"?>

<beans xmlns="http://www.springframework.org/schema/beans" xmlns:jdbc="http://www.springframework.org/schema/jdbc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd

        http://www.springframework.org/schema/jdbc http://www.springframework.org/schema/jdbc/spring-jdbc-3.0.xsd

        http://camel.apache.org/schema/spring http://camel.apache.org/schema/spring/camel-spring.xsd">

   

 <!-- Define configuration file application.properties -->

    <bean id="placeholder" class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">

        <property name="locations">

            <list>

                <value>classpath:application.properties</value>

            </list>

        </property>

        <property name="ignoreResourceNotFound" value="false" />

        <property name="searchSystemEnvironment" value="true" />

        <property name="systemPropertiesModeName" value="SYSTEM_PROPERTIES_MODE_OVERRIDE" />

    </bean>

   

    <!--  Bean for Box authentication. Please note you need a Box developer account -->

    <bean id="box" class="org.apache.camel.component.box.BoxComponent">

        <property name="configuration">

            <bean class="org.apache.camel.component.box.BoxConfiguration">

                <property name="userName" value="${box.userName}" />

                <property name="userPassword" value="${box.userPassword}" />

                <property name="clientId" value="${box.clientId}" />

                <property name="clientSecret" value="${box.clientSecret}" />

            </bean>

        </property>

    </bean>

 

    <!-- Define database connectivity -->

    <bean id="dataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">

        <property name="driverClassName" value="org.postgresql.Driver" />

        <property name="url" value="jdbc:postgresql://localhost:5432/alfresco" />

        <property name="username" value="alfresco" />

        <property name="password" value="admin" />

    </bean>

   

    <!-- Configure the Camel SQL component to use the JDBC data source -->

    <bean id="sql" class="org.apache.camel.component.sql.SqlComponent">

        <property name="dataSource" ref="dataSource" />

    </bean>

   

    <!-- Create a connection to ActiveMQ -->

    <bean id="jmsConnectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">

        <property name="brokerURL" value="tcp://localhost:61616" />

    </bean>

   

    <!-- Create Camel context -->

    <camelContext id="camelContext" xmlns="http://camel.apache.org/schema/spring" autoStartup="true">

        <routeBuilder ref="myRouteBuilder" />

    </camelContext>

   

    <!-- Bean defining Camel routes -->

    <bean id="myRouteBuilder" class="support.alfresco.Route" />

</beans>

 

3.2.2       Application.java

The Application class is used to run our Spring application

 

package support.alfresco;

import org.springframework.boot.SpringApplication;
import org.springframework.context.annotation.ImportResource;

@ImportResource("applicationContext.xml")
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}

 

3.2.3       Route.java

In the Route.java file we define the Camel routes to send traffic from Alfresco to Box.

The code below shows the routes to execute a CMIS query, download the content and properties, compress them and upload them to Box.

 

//////////////////////////////////
// Download Alfresco documents  //
//////////////////////////////////
from("jms:alfresco.downloadNodes")
    .log("Running query: ${body}")
    .setHeader("CamelCMISRetrieveContent", constant(true))
    .to(alfrescoSender + "&queryMode=true")
    // Class FileContentProcessor is used to store the files in the filesystem together with the metadata
    .process(new FileContentProcessor());

////////////////////////////////////////
// Move documents and metadata to Box //
////////////////////////////////////////
from("file:/tmp/downloads?antInclude=*")
    .marshal().zipFile()
    .to("file:/tmp/box");

from("file:/tmp/metadata?antInclude=*")
    .marshal().zipFile()
    .to("file:/tmp/box");

from("file:/tmp/box?noop=false&recursive=true&delete=true")
    .to("box://files/uploadFile?inBody=fileUploadRequest");

 

Let’s break it down…

 

1. We read request messages containing a CMIS query from an ActiveMQ queue

from("jms:alfresco.downloadNodes")

 

For example, a CMIS query to get the nodes in a specific folder looks like…

SELECT * FROM cmis:document WHERE IN_FOLDER ('workspace://SpacesStore/56c5bc2e-ea5c-4f6a-b817-32f35a7bb195') and cmis:objectTypeId='cmis:document'

 

For testing purposes we can fire the message requests directly from the ActiveMQ admin UI (http://127.0.0.1:8161/admin/).
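Alternatively, a request can be posted to the queue from the command line through the broker's REST bridge (this assumes the default admin/admin credentials and that the web console is enabled):

curl -u admin:admin \
  --data-urlencode "body=SELECT * FROM cmis:document WHERE IN_FOLDER ('workspace://SpacesStore/56c5bc2e-ea5c-4f6a-b817-32f35a7bb195')" \
  "http://127.0.0.1:8161/api/message/alfresco.downloadNodes?type=queue"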

 

 

2. We send the CMIS query to Alfresco defined as “alfrescoSender”

.to(alfrescoSender + "&queryMode=true")

 

3. The Alfresco sender endpoint is defined in application.properties.
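The exact value is installation specific; a hypothetical entry following the camel-cmis URI format (placeholder host and credentials) could look like:

alfresco.sender=cmis://http://localhost:8080/alfresco/cmisatom?username=admin&password=admin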

 

It is then mapped to the "alfrescoSender" variable in Route.java:

public static String alfrescoSender;

@Value("${alfresco.sender}")

public void setAlfrescoSender(String inSender) {

        alfrescoSender = inSender;

}

   

4. We store the files retrieved by the CMIS query in the filesystem using class FileContentProcessor for that job

.process(new FileContentProcessor());

 

5. Zip the content file and the metadata file 

from("file:/tmp/downloads?antInclude=*")

.marshal().zipFile()

.to("file:/tmp/box");

                                       

from("file:/tmp/metadata?antInclude=*")

.marshal().zipFile()

.to("file:/tmp/box");

 

6. And finally upload the content to Box 

from("file:/tmp/box?noop=false&recursive=true&delete=true")

.to("box://files/uploadFile?inBody=fileUploadRequest");

 

 

 4.    Building and Running the application

To build the application using maven we execute the following command: 

mvn clean install

 

To run the application execute the following command:

mvn spring-boot:run

5.    Monitoring with Hawtio

Hawtio (http://hawt.io) is a pluggable management console for Java stuff which supports any kind of JVM, any kind of container (Tomcat, Jetty, Karaf, JBoss, Fuse Fabric, etc), and any kind of Java technology and middleware.

Hawtio can help you visualize routes with real-time updates on message metrics.

 

 

You can get statistical data for each individual route.

 

 

I hope this basic introduction to EIP and Apache Camel gives you some idea on how to integrate different applications using the existing end points provided by Apache Camel.

If you need to create a custom properties file and you want to use it in Process Services, you can do so using a Java delegate or a Spring bean.

 

In my case, I used a Java Delegate.

 

First of all, create your custom property file and upload it to the following folder:

 

/alfresco/process-services-1.6.0/tomcat/webapps/activiti-app/WEB-INF/classes/

 

Now it is time to create the Java delegate. But first it is important to understand the command that loads the properties file:

 

It is just one line of code:

InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream("generic.properties");

With this line of code, you tell the Java delegate which properties file to load.

Once it is loaded you can read the properties from the file and use their values in Process Services.

String host = "";

String username = "";

String password = "";

Properties properties = new Properties();

properties.load(inputStream);

host = properties.getProperty("generic.host");

username   = properties.getProperty("generic.username");

password = properties.getProperty("generic.password");

execution.setVariable("host", host);

execution.setVariable("username", username);

execution.setVariable("password", password);

Let's explain the code a bit.

 

First, as a best practice, we create some strings and set their values to empty; otherwise you could get a NullPointerException.

 

After that, we load our custom properties file into the Properties object.

 

Then we set our strings to the values of the properties defined in the file.

 

In our example, generic.host will have some value inside the property file.

 

After we have assigned our strings all the values we need from the properties file, we can set them as process variables:

 

execution.setVariable("host", host);

 

This piece of code means that your process has a variable called host, and you are assigning it the string value of the host entry in the properties file (generic.host).

 

After you finish the code it is time to export it as a jar and add the jar inside the following path:

 

/alfresco/process-services-1.6.0/tomcat/webapps/activiti-app/WEB-INF/lib/

 

To apply your new Java delegate you need to create a service task and set the name of your class in the "class" field. Below are a few screenshots showing how to do this:

 

1. A very simple example workflow

the process

 

 

2. Configure the global variable:

 

3. The variables set in step 2.

 

 

4. Set up the Java delegate (adding the class name to the "class" field):

 

 

I created 2 forms for testing that my Java delegate is loaded correctly.

 

If the host variable is empty the process should go one way; if it is set correctly it should go to the form that displays the information read from the properties file.

 

How to configure this:

 

1. Create the condition on one arrow:

 

Condition on arrow

 

2. Enter the condition (host: not empty) that will route to the "success" form:

 

 

3. And the second condition:

 

 

 

If the host is not empty the process should go to the human task with the following form:

 

 

The reference form here is "success" and this is how it looks:

 

 

Otherwise, if there is some problem with the Java delegate, it will go to the other form, called "wrong":

 

 

 

For testing you can use the following snippets:

 

Java code:

package my.Test;

import java.io.InputStream;
import java.util.Properties;

import org.activiti.engine.delegate.DelegateExecution;
import org.activiti.engine.delegate.JavaDelegate;


public class GenericCustomProperties implements JavaDelegate {

    String host = "";
    String username = "";
    String password = "";

    public void execute(DelegateExecution execution) throws Exception {
        InputStream inputStream = this.getClass().getClassLoader()
                .getResourceAsStream("generic.properties");
        Properties properties = new Properties();
        properties.load(inputStream);
        host = properties.getProperty("generic.host");
        username = properties.getProperty("generic.username");
        password = properties.getProperty("generic.password");
        execution.setVariable("host", host);
        execution.setVariable("username", username);
        execution.setVariable("password", password);
    }
}

Please be sure to create a Java project with a package called my.Test containing a class called

GenericCustomProperties.java

 

Now that you have created this Java class, export it as a JAR and upload it to the following path: /alfresco/process-services/tomcat/webapps/activiti-app/WEB-INF/lib/

 

As custom properties, you can create a file called generic.properties, open it and write the following snippet:

generic.host=127.0.0.1:8080
generic.username=username
generic.password=password

 

The file needs to be located inside the following path:

 

/alfresco/process-services-1.6.0/tomcat/webapps/activiti-app/WEB-INF/classes/

Load balancing a network protocol is quite common nowadays. There are loads of ways to do it for HTTP for instance, and generally speaking all "single flow" protocols can be load-balanced quite easily. However, some protocols are not as simple as HTTP and require several connections. This is exactly the case with FTP.

 

Reminder: FTP modes

Let's take a deeper look at the FTP protocol, in order to better understand how we can load-balance it. In order for an FTP client to work properly, two connections must be opened between the client and the server:

  • A control connection
  • A data connection

The control connection is initiated by the FTP client to TCP port 21 on the server. On the other hand, the data connection can be created in different ways. The first way is through an "active" FTP session. In this mode the client sends a "PORT" command which opens one of its network ports at random and instructs the server to connect to it using port 20 as the source port. This mode is usually discouraged, or even prevented by server configuration, for security reasons (the server initiates the data connection to the client). The second FTP mode is the "passive" mode. When using passive mode the client sends a "PASV" command to the server. In response the server opens a TCP port and sends its number and IP address as part of the PASV reply so the client knows what socket to use. Modern FTP clients usually try this mode first if supported by the server. There is a third mode, the "extended passive" mode. It is very similar to the "passive" mode but the client sends an "EPSV" command (instead of "PASV") and the server responds with only the number of the TCP port that has been chosen for the data connection (without sending the IP address). An example of both replies is shown below.
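As an illustration, the replies look roughly like this (addresses and ports are made up); in the PASV case the data port is encoded in the last two numbers of the reply:

PASV
227 Entering Passive Mode (192,168,0,39,78,33)
(the client opens its data connection to 192.168.0.39 on port 78 * 256 + 33 = 20001)

EPSV
229 Entering Extended Passive Mode (|||20001|)
(the client opens its data connection to port 20001 on the same IP address it used for the control connection)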

 

Load balancing concepts

So now that we know how FTP works, we also know that load-balancing FTP requires balancing both the control connections and the data connections. The load balancer must also make sure that data connections are sent to the right backend server: the one which replied to the client command.

 

Alfresco configuration

From your ECM side, there is not much to do but there are some pre-requisites:

  • Alfresco nodes must belong to the same (working) cluster
  • Alfresco nodes must be reachable from the load balancer on the FTP ports
  • No FTP-related properties should have been persisted in the database

The Alfresco configuration presented below is valid for both load balancing methods presented later. Technically, not every bit of this Alfresco configuration is required for each method, but applying the config as shown will work in both cases.

First of all, you should prefer setting FTP options in the alfresco-global.properties file, as the Alfresco cluster nodes need different settings, which you cannot achieve using either the admin console or the JMX interface.

If you have already set FTP parameters using JMX (or the admin console), those parameters are persisted in the database and need to be removed from there (using the "revert" action in JMX, for example).

Add the following to your alfresco-global.properties and restart Alfresco:

 

### FTP Server Configuration ###
ftp.enabled=true
ftp.port=2121
ftp.dataPortFrom=20000
ftp.dataPortTo=20009

 

ftp.dataPortFrom and ftp.dataPortTo properties need to be different on all servers. So if there were 2 Alfresco nodes alf1 and alf2, the properties for alf2 could be:

ftp.dataPortFrom=20010
ftp.dataPortTo=20019

 

Load balancing with LVS/Keepalived

 

Keepalived is a Linux-based load-balancing system. It wraps the IPVS (also called LVS) software stack from the Linux-HA project and offers additional features like backend monitoring and VRRP redundancy. The schema below shows how Keepalived proceeds with FTP load-balancing. It tracks control connections on port 21 and dynamically handles the data connections using a Linux kernel module called "ip_vs_ftp", which inspects the control connection in order to know which port will be used to open the data connection.

 

 

Configuration steps are quite simple.

 

First install the software:

sudo apt-get install keepalived

Then create a configuration file using the sample:

sudo cp /usr/share/doc/keepalived/samples/keepalived.conf.sample /etc/keepalived/keepalived.conf

Edit the newly created file in order to add a new virtual server and the associated backend servers:

 

virtual_server 192.168.0.39 21 {

    delay_loop 6

    lb_algo rr

    lb_kind NAT

    protocol TCP

    real_server 10.1.2.101 2121 {

        weight 1

        TCP_CHECK {

            connect_port 2121

            connect_timeout 3

        }

    }

    real_server 10.1.2.102 2121 {

        weight 1

        TCP_CHECK {

            connect_port 2121

            connect_timeout 3

        }

    }

}

In a production environment you will most certainly want to use an additional VRRP instance to ensure a highly available load balancer. Please refer to the Keepalived documentation in order to set that up or just use the example given in the distribution files.

The example above defines a virtual server that listens on socket 192.168.0.39:21. Connections sent to this socket are redirected to backend servers using the round-robin algorithm (others are available) after masquerading the source IP address. Additionally, we need to load the FTP helper in order to track FTP data connections:

 

echo 'ip_vs_ftp' >> /etc/modules
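Adding the module to /etc/modules only takes effect at the next boot; to load it immediately you can also run:

sudo modprobe ip_vs_ftp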

It is important to note that this setup leverages the FTP kernel helper, which reads the content of the FTP frames. This means that it doesn't work when FTP is secured using SSL/TLS.

 

Secure FTP load-balancing

 

Before you go any further:

 

This method has a huge advantage: it can handle FTPS (SSL/TLS). However, it also has a big disadvantage: it doesn't work when the load balancer behaves as a NAT gateway (which is basically what HAProxy does).
This is mainly because at the moment Alfresco doesn't comply with the necessary pre-requisites for secure FTP to work.

 

Some FTP clients may work even with this limitation. It may happen to work if the server is using IPv6, or for clients using the "Extended Passive Mode" on IPv4 (which is normally used for IPv6 only). To better understand how, please see FTP client and passive session behind a NAT.

 

This means that what's below will mainly only work with the macOS ftp command line and probably no other FTP client!

Don't spend time on it and use the previous method if you need other FTP clients or if you have no control over what FTP client your users have.

 

Load balancing with HAProxy

 

This method can also be adapted to Keepalived using iptables mangling and "fwmark" (see Keepalived secure FTP), but you should only need it if you are bound to FTPS, as plain FTP is much better handled by the previous method.

HAProxy is a modern and widely used load balancer. It provides similar features to Keepalived and much more. Nevertheless, HAProxy is not able to track data connections as related to the overall FTP session. For this reason we have to trick the FTP protocol in order to provide connection consistency within the session. Basically we will split the load balancing into several parts:

  • control connection load-balancing
  • data connection load balancing for each backend server

So if we have 2 backend servers - as shown in the schema below - we will create 3 load balancing connection pools (let's call them that for now).

 

 

First install the software:

sudo apt-get install haproxy

HAProxy has the notion of "frontends" and "backends". Frontends let you define specific sockets (or sets of sockets), each of which can be linked to different backends. So we can use the configuration below:

frontend alfControlChannel

    bind *:21

    default_backend alfPool

frontend alf1DataChannel

    bind *:20000-20009

    default_backend alf1

frontend alf2DataChannel

    bind *:20010-20019

    default_backend alf2

backend alfPool

    server alf1 10.1.2.101:2121 check port 2121 inter 20s

    server alf2 10.1.2.102:2121 check port 2121 inter 20s

backend alf1

    server alf1 10.1.2.101:2121 check port 2121 inter 20s

backend alf2

    server alf2 10.1.2.102:2121 check port 2121 inter 20s

 

So in this case the frontend that handles the control connection load-balancing (alfControlChannel) alternately sends requests to all backend servers (alfPool). Each server (alf1 & alf2) will negotiate a data transfer socket on a different frontend (alf1DataChannel & alf2DataChannel). Each of these frontends will only forward data connections to its single corresponding backend (alf1 or alf2), thus making the load balancing sticky. And... job done!

About three years ago, Alfresco created a new branch of our support and services organization, the Premier Services team.  Since then, the Premier Services team has grown into a first class global support organization, handling many of Alfresco's largest and most complex strategic accounts.  Our group consists of some of Alfresco's most seasoned and senior support staff and has a presence in APAC, EMEA and the US serving customers worldwide.  Today we start a new chapter in our journey.

 

One of the benefits of working with large accounts is the breadth and depth of problems we get to help them solve.  Premier Services accounts tend to be those with extensive integrations, demanding uptime, reliability and performance requirements, complex business environments and product extensions.  We are launching our blog to share best practices, interesting insights, code / configuration examples and problem solving tips that arise from our work.  This blog will also serve as a platform for sharing new service offerings and updates to existing offerings, and to give our customers and community members some insights into the direction that our service offerings will take as they evolve.  For our inaugural blog post, we'd like to talk about three recent changes to the Alfresco Premier Services offerings which we think will help reduce confusion about what we deliver and give our customers some extra value.  

 

First, you may have noticed that our web site has been updated as a part of Alfresco's Digital Business Platform launch.  As a part of this update we have started the process of merging two of our premier services offerings.  Previously we offered an On-site Services Engineer (OSE) and a Remote Services Engineer (RSE).  Going forward we are combining these into a single Premier Services Engineer (PSE) offering.  We can still deliver it on-site or remote, and the pricing has not changed.  Where there were differences in the services, we've taken the more generous option and made it the default.  For example, OSE and RSE service used to come with a different number of Alfresco University Passports.  Going forward all PSE customers will get five passports included with their service.  Future passports issued to Premier Services accounts will also include a certification voucher.

 

The other changes in our service are additions intended to help with a pair of common customer requests.  It is common for customers to start to staff up at the beginning of an Alfresco project, and we are often asked to help evaluate potential hires.  In support of this request we have created a set of hiring profiles that identify the core skills that we find will enable a new hire to come up to speed quickly on the Alfresco platform.  These hiring profiles and assessments are available now to all Premier Services accounts.  A second major change is the addition of Alfresco Developer Support to Premier Services accounts on the Alfresco Digital Business Platform.  What this means is that if you are a Premier Services customer and are using Alfresco Content Services 5.2+ AND Alfresco Process Services powered by Activiti 1.6+, you will get Developer Support for two of your support contacts included with your service at no additional cost.  This is a huge addition to the Premier Services portfolio, and we're excited to be able to offer it in conjunction with our peers on the dev support team.

 

Stay tuned for more from the Premier Services team, our next blog posts will take an in-depth look at some challenging technical issues.