Project Objective

In this blog we cover the deployment of ACS (Alfresco Content Services) 6.1 Enterprise on AWS. The deployment consists of two major steps:

  1. Building AWS AMIs (Amazon Machine Images) containing a base installation of ACS 6.1 Enterprise.
  2. Building AWS infrastructure (VPC, ELB, RDS, etc) and deploying ACS 6.1 Enterprise to it.

 

Please make sure your Alfresco license subscription entitles you to install and run ACS 6.1 Enterprise and Alfresco Search Services with Insight Engine.

 

The final ACS architecture looks like this:

 

Disclaimer

The tools and code used in this blog to deploy ACS 6.1 Enterprise are not officially supported by Alfresco. They are used as a proof of concept to show how you can use open-source tools to deploy and configure resources and applications to AWS in an automated way.

 

The software used to build and deploy ACS 6.1 Enterprise is available in a public repository in GitHub.

 

Software requirements

The following software is required to create and deploy resources in AWS:

  • Packer - used to automate the creation of AMIs.
  • Terraform - used to create and update infrastructure resources.
  • Ansible - an IT automation tool used to deploy and configure systems.
  • AWS CLI - the AWS Command Line Interface, a unified tool to manage your AWS services.

 

Make sure these tools are installed on the computer you are using to build and deploy ACS 6.1 Enterprise.

 

Authenticating to AWS

Both Packer and Terraform need to authenticate to AWS in order to create resources. Since we are creating a large number of infrastructure resources, the user we authenticate with needs to have administrator access.

There are multiple ways to configure AWS credentials for Packer and Terraform; see the Packer and Terraform documentation for the details.

 

Note: Whatever method you use, make sure you keep your AWS credentials private and secure at all times.
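As a minimal sketch, one common method is to export the standard AWS environment variables, which both Packer and Terraform pick up automatically (the values below are placeholders):

export AWS_ACCESS_KEY_ID=xxxxxxxxx
export AWS_SECRET_ACCESS_KEY=xxxxxxxxx
export AWS_DEFAULT_REGION=eu-west-2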


Authenticating to Nexus

Nexus is a repository manager used by Alfresco to publish software artifacts. Packer will connect to the Nexus repository to download the software needed to install ACS 6.1 Enterprise.

Alfresco Enterprise customers receive Nexus credentials as part of their license subscription. Please contact your CRM if you don't know or don't have your Nexus credentials.

 

Building AWS AMIs

The first step to deploy ACS 6.1 Enterprise is to build two types of AMIs:

  1. Repository AMI - containing the Alfresco Repository, Share and ADW (Alfresco Digital Workspace).
  2. Search Services AMI - containing Alfresco Search Services with Insight Engine.

 

Repository AMI

For this process we use Packer and Ansible. We first export the "nexus_user" and "nexus_password" environment variables containing the credentials to access the Nexus repository. These are stored in the ~/.nexus-cfg file, which contains the following:

export nexus_user=xxxxxxxxx
export nexus_password=xxxxxxxxx

 

Note that the .nexus-cfg file is in the user home folder; keep this file and its contents private and secure at all times.
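As a small sketch, you can tighten the file's permissions and load the variables into your current shell before running the build:

chmod 600 ~/.nexus-cfg
source ~/.nexus-cfg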

 

If you want to include custom AMPs, add them to the amps and amps_share folders and they will be deployed to the AMI.

For custom JAR files, add them to the modules/platform and modules/share folders.

 

If you want to deploy ADW (Alfresco Digital Workspace) place the digital-workspace.war file in the acs-61-files/downloaded folder.

 

We can now execute packer by calling the build_61_AMI.sh script.

cd acs-61-repo-aws-packer
./build_61_AMI.sh

This shell script loads the Nexus environment variables and calls packer build with a template file for provisioning the AMI and a variables file containing deployment-specific information such as your default VPC ID, the AWS region, etc.

Make sure you change the value of the vpc_id variable to your default VPC ID.
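If you prefer not to edit the variables file, Packer also accepts overrides on the command line; a sketch (the template file name here is illustrative, use the one shipped in the acs-61-repo-aws-packer folder):

packer build -var 'vpc_id=vpc-0123456789abcdef0' template.json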

 

Search Services AMI

As in the previous section, we use Packer and Ansible to create the Search Services AMI.

 

Make sure you change the value of the vpc_id variable to your default VPC Id before running the build_61_AMI.sh script.

cd acs-61-repo-aws-packer
./build_61_AMI.sh

 

As the script runs you can see what it is doing during its execution...

▶ ./build_61_AMI.sh
amazon-ebs output will be in this color.

==> amazon-ebs: Prevalidating AMI Name: acs-61-repo-1557828971
amazon-ebs: Found Image ID: ami-00846a67
==> amazon-ebs: Creating temporary keypair: packer_5cda956b-bd62-1d09-cef2-639152741025
==> amazon-ebs: Creating temporary security group for this instance: packer_5cda956b-345b-2321-afd5-40b1b06a6bc1
==> amazon-ebs: Authorizing access to port 22 from 0.0.0.0/0 in the temporary security group...
==> amazon-ebs: Launching a source AWS instance...
==> amazon-ebs: Adding tags to source instance
amazon-ebs: Adding tag: "Name": "Packer Builder"
amazon-ebs: Instance ID: i-0f80505eb56dccbb7
==> amazon-ebs: Waiting for instance (i-0f80505eb56dccbb7) to become ready...

 

On completion the script outputs the ID of the newly created AMI. Keep track of both AMI IDs, as we will need them in the Terraform scripts next.

Build 'amazon-ebs' finished.

==> Builds finished. The artifacts of successful builds are:
--> amazon-ebs: AMIs were created:
eu-west-2: ami-08fd6196500dbcb01
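If you lose track of the IDs, you can list the AMIs owned by your account with the AWS CLI; a sketch assuming the AMI names use the acs-61 prefix seen in the output above:

aws ec2 describe-images --owners self --filters "Name=name,Values=acs-61-*" --query "Images[*].[ImageId,Name]" --output table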

 

Building the AWS Infrastructure and Deploying ACS 6.1 Enterprise

Now that we have created both the Repository and the Search Services AMIs we can start building the AWS infrastructure and deploy ACS 6.1 Enterprise.

In the acs-61-aws-terraform folder we have the terraform.tfvars file containing configuration specific to the AWS and ACS deployments.

 

Some of the variables that will need to be updated are:

  • resource-prefix - a prefix added to the name of every resource created, to identify the resources belonging to this deployment.
  • aws-region
  • aws-availability-zones
  • vpc-cidr
  • autoscaling-group-key-name
  • s3-bucket-location

 

And, of course, we need to set the Auto Scaling image IDs to the newly generated AMIs:

  • autoscaling-repo-group-image-id
  • autoscaling-solr-group-image-id
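As a minimal sketch, the relevant entries in terraform.tfvars might look like the following (every value below is a placeholder; the exact value formats are defined by the project's variable declarations):

resource-prefix = "acs-61"
aws-region = "eu-west-2"
aws-availability-zones = "eu-west-2a,eu-west-2b"
vpc-cidr = "10.0.0.0/16"
autoscaling-group-key-name = "my-key-pair"
s3-bucket-location = "my-acs-bucket"
autoscaling-repo-group-image-id = "ami-08fd6196500dbcb01"
autoscaling-solr-group-image-id = "ami-0123456789abcdef0"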

 

Once the configuration has been set we are ready to start building the solution. We first need to initialize Terraform with the "terraform init" command:

 

▶ terraform init
Initializing modules...
- module.vpc
- module.rds
- module.alfresco-repo
- module.alfresco-solr
- module.bastion
- module.alb
- module.internal-nlb
- module.activemq

Initializing provider plugins...

The following providers do not have any version constraints in configuration,
so the latest version was installed.

To prevent automatic upgrades to new major versions that may contain breaking
changes, it is recommended to add version = "..." constraints to the
corresponding provider blocks in configuration, with the constraint strings
suggested below.

* provider.aws: version = "~> 2.10"

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

 

We can now issue the apply command to start the build. Upon completion (it takes around 15 minutes) we get a notification of the URLs available to connect to Alfresco.
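The command itself is just the following (add -auto-approve if you want to skip the interactive confirmation):

▶ terraform apply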

Apply complete! Resources: 51 added, 0 changed, 0 destroyed.

Outputs:

Alfresco Digital Workspace = http://acs-61-alb-1503842282.eu-west-2.elb.amazonaws.com/digital-workspace
Alfresco Share = http://acs-61-alb-1503842282.eu-west-2.elb.amazonaws.com/share
Alfresco Solr = http://acs-61-alb-1503842282.eu-west-2.elb.amazonaws.com/solr
Alfresco Zeppelin = http://acs-61-alb-1503842282.eu-west-2.elb.amazonaws.com/zeppelin
RDS Endpoint = acs-61-db.cmftgvbzqrto.eu-west-2.rds.amazonaws.com:3306
VPC ID = vpc-006f0c6354656e96d5c

 

To destroy the resources issue a "terraform destroy" command.
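Terraform prints the destruction plan and asks for confirmation:

▶ terraform destroy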

 

Terraform will perform the following actions:
.......

Plan: 0 to add, 0 to change, 51 to destroy.

Do you really want to destroy all resources?
Terraform will destroy all your managed infrastructure, as shown above.
There is no undo. Only 'yes' will be accepted to confirm.

 

To do list

There are a couple of things to add to this project:

  • CI/CD scripts - I have already implemented this and will publish it on a different blog.
  • On the Search Services instances we should download a backup of the Solr indexes when starting a new instance instead of building the indexes from scratch.

  • Purpose

The purpose of this blog is to show how to scan images containing text so that the text is indexed and searchable by Alfresco. The following file types are supported: PNG, BMP, JPEG, GIF, TIFF and PDF (containing images).

 

For this exercise we are going to use a Linux OS, but this solution should work equally well on Windows.

 

To scan images we are going to use Tesseract OCR (tesseract). This package contains an OCR engine (libtesseract) and a command-line program (tesseract).

Tesseract has unicode (UTF-8) support, and can recognize more than 100 languages "out of the box".

 

  • Tesseract

Since we are using tesseract-ocr we need to install the tesseract software for our Linux distribution (version 3 or greater).

Please follow the instructions explained here: Installing Tesseract
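As a quick sketch, on most distributions the package can also be installed from the standard repositories (package names may vary by distribution and version):

# Debian/Ubuntu
sudo apt-get install tesseract-ocr
# RHEL/CentOS (with the EPEL repository enabled)
sudo yum install tesseract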

 

  • Transformation context file

Create a file named transformer-context.xml in Alfresco's extension folder, i.e. tomcat/shared/classes/alfresco/extension, with the following content:

 

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>
<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor
     license agreements. See the NOTICE file distributed with this work for additional
     information regarding copyright ownership. The ASF licenses this file to
     You under the Apache License, Version 2.0 (the "License"); you may not use
     this file except in compliance with the License. You may obtain a copy of
     the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required
     by applicable law or agreed to in writing, software distributed under the
     License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS
     OF ANY KIND, either express or implied. See the License for the specific
     language governing permissions and limitations under the License. -->

<beans>

     <!-- Transforms from TIFF to plain text using Tesseract
           and a custom script -->

     <bean id="transformer.worker.ocr.tiff"
          class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker">

          <property name="mimetypeService">
               <ref bean="mimetypeService" />
          </property>
          <property name="checkCommand">
               <bean class="org.alfresco.util.exec.RuntimeExec">
                    <property name="commandsAndArguments">
                         <map>
                              <entry key=".*">
                                   <list>
                                        <value>${tesseract.exe}</value>
                                        <value>-v</value>
                                   </list>
                              </entry>
                         </map>
                    </property>
                    <property name="errorCodes">
                         <value>2</value>
                    </property>
               </bean>
          </property>

          <property name="transformCommand">
               <bean class="org.alfresco.util.exec.RuntimeExec">
                    <property name="commandsAndArguments">
                         <map>
                              <entry key=".*">
                                   <list>
                                        <value>${ocr.script}</value>
                                        <value>${source}</value>
                                        <value>${target}</value>
                                   </list>
                              </entry>
                         </map>
                    </property>
                    <property name="errorCodes">
                         <value>1,2</value>
                    </property>
                    <property name="waitForCompletion">
                         <value>true</value>
                    </property>
               </bean>
          </property>
          <property name="transformerConfig">
               <ref bean="transformerConfig" />
          </property>
     </bean>

     <bean id="transformer.ocr.tiff"
          class="org.alfresco.repo.content.transform.ProxyContentTransformer"
          parent="baseContentTransformer">

          <property name="worker">
               <ref bean="transformer.worker.ocr.tiff" />
          </property>
     </bean>

     <!-- Transforms from PDF to TIFF using Ghostscript -->
     <bean id="transformer.worker.pdf.tiff"
          class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker">

          <property name="mimetypeService">
               <ref bean="mimetypeService" />
          </property>
          <property name="checkCommand">
               <bean name="transformer.ImageMagick.CheckCommand" class="org.alfresco.util.exec.RuntimeExec">
                    <property name="commandsAndArguments">
                         <map>
                              <entry key=".*">
                                   <list>
                                        <value>${ghostscript.exe}</value>
                                        <value>-v</value>
                                   </list>
                              </entry>
                         </map>
                    </property>
               </bean>
          </property>

          <property name="transformCommand">
               <bean class="org.alfresco.util.exec.RuntimeExec">
                    <property name="commandsAndArguments">
                         <map>
                              <entry key=".*">
                                   <list>
                                        <value>${ghostscript.exe}</value>
                                        <value>-o</value>
                                        <value>${target}</value>
                                        <value>-sDEVICE=tiff24nc</value>
                                        <value>-r300</value>
                                        <value>${source}</value>
                                   </list>
                              </entry>
                         </map>
                    </property>
                    <property name="errorCodes">
                         <value>1,2</value>
                    </property>
                    <property name="waitForCompletion">
                         <value>true</value>
                    </property>
               </bean>
          </property>
          <property name="transformerConfig">
               <ref bean="transformerConfig" />
          </property>
     </bean>

     <bean id="transformer.pdf.tiff"
          class="org.alfresco.repo.content.transform.ProxyContentTransformer"
          parent="baseContentTransformer">

          <property name="worker">
               <ref bean="transformer.worker.pdf.tiff" />
          </property>
     </bean>

</beans>

 

We can see we are using a few variables here:

  • tesseract.exe: the tesseract binary, normally installed as /usr/bin/tesseract
  • ocr.script: the script we call to transform images to text, installed in the Alfresco home folder as ocr.sh
  • ghostscript.exe: the Ghostscript binary, usually the gs executable
  • source: the source image file
  • target: the resulting text file
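Before wiring these into Alfresco it is worth exercising the same two commands by hand; a sketch using the same options as the transformer definitions above (file names are illustrative):

# PDF to TIFF with Ghostscript (mirrors the transformer.worker.pdf.tiff command)
gs -o page.tiff -sDEVICE=tiff24nc -r300 document.pdf
# TIFF to plain text with tesseract (what ocr.sh will do)
tesseract page.tiff page -l eng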

 

  • OCR Script

The next step is to create the ocr.sh script. The location of the script will also be referenced in the alfresco-global.properties file by the ocr.script property, as shown later in this blog.

 

Assuming Alfresco is installed in /opt/alfresco, create a file named /opt/alfresco/ocr.sh with the following content:

#!/bin/bash

# save arguments to variables
SOURCE=$1
TARGET=$2
TMPDIR=/tmp/tesseract
FILENAME=$(basename "$SOURCE")
OCRFILE=$FILENAME.tif
# export so tesseract picks up the OS libraries rather than Alfresco's bundled ones
export LD_LIBRARY_PATH=/usr/lib

# create the temp directory if it doesn't exist
mkdir -p $TMPDIR

# uncomment to log what is being transformed
# echo "from $SOURCE to $TARGET" >>/tmp/ocrtransform.log

cp -f "$SOURCE" $TMPDIR/"$OCRFILE"

# call tesseract; it writes the recognized text to $TARGET
/usr/bin/tesseract $TMPDIR/"$OCRFILE" "${TARGET%.*}" -l eng
rm -f $TMPDIR/"$OCRFILE"

A couple of points to consider here:

  • We export LD_LIBRARY_PATH to point at the OS library path so tesseract finds the libraries it requires. If we don't, it inherits the library path defined by Alfresco, which points to the commons/lib folder, and those library versions may not be the ones tesseract needs.
  • We define the location of the tesseract binary as /usr/bin/tesseract. If it is installed in a different location, adjust the path accordingly.

 

Finally, make sure the ocr.sh file has the executable permission set. You can set it with the following command: chmod 755 /opt/alfresco/ocr.sh

 

  • Tesseract properties

The next step is to define a set of properties for tesseract in alfresco-global.properties.

# OCR Script
ocr.script=/opt/alfresco/ocr.sh

#GS executable
ghostscript.exe=gs

#Tesseract executable
tesseract.exe=tesseract

# Define a default priority for this transformer
content.transformer.ocr.tiff.priority=10

# List the transformations that are supported
content.transformer.ocr.tiff.extensions.tiff.txt.supported=true
content.transformer.ocr.tiff.extensions.tiff.txt.priority=10
content.transformer.ocr.tiff.extensions.jpg.txt.supported=true
content.transformer.ocr.tiff.extensions.jpg.txt.priority=10
content.transformer.ocr.tiff.extensions.png.txt.supported=true
content.transformer.ocr.tiff.extensions.png.txt.priority=10
content.transformer.ocr.tiff.extensions.gif.txt.supported=true
content.transformer.ocr.tiff.extensions.gif.txt.priority=10

# Define a default priority for this transformer
content.transformer.pdf.tiff.available=true
content.transformer.pdf.tiff.priority=10
# List the transformations that are supported
content.transformer.pdf.tiff.extensions.pdf.tiff.supported=true
content.transformer.pdf.tiff.extensions.pdf.tiff.priority=10

content.transformer.complex.Pdf2OCR.available=true
# Commented to be compatible with Alfresco 5.x
# content.transformer.complex.Pdf2OCR.failover=ocr.pdf
content.transformer.complex.Pdf2OCR.pipeline=pdf.tiff|tiff|ocr.tiff
content.transformer.complex.Pdf2OCR.extensions.pdf.txt.supported=true
content.transformer.complex.Pdf2OCR.extensions.pdf.txt.priority=10

# Disable the OOTB transformers
content.transformer.double.ImageMagick.extensions.pdf.tiff.supported=false
content.transformer.complex.PDF.Image.extensions.pdf.tiff.supported=false
content.transformer.ImageMagick.extensions.pdf.tiff.supported=false
content.transformer.PdfBox.extensions.pdf.txt.supported=false
content.transformer.TikaAuto.extensions.pdf.txt.supported=false

 

The main property to consider is ocr.script, pointing to the location of the ocr.sh file; adjust it accordingly. All other properties can be left as they are.

 

  • Debugging

There are two areas we can debug:

  1. The Alfresco transformation service
  2. Tesseract execution

 

Alfresco Transformation Service

To debug the transformation service edit the file tomcat/shared/classes/alfresco/extension/custom-log4j.properties and add the following line at the bottom:

 

log4j.logger.org.alfresco.repo.content.transform=trace

 

Alfresco needs restarting to pick up this debug entry.

 

Tesseract execution

To get some execution information from tesseract edit the file /opt/alfresco/ocr.sh and uncomment the following entry by removing the '#' from the beginning of the line:

 

# echo "from $SOURCE to $TARGET" >>/tmp/ocrtransform.log

 

Now, when an image file with text is loaded into Alfresco, we can see entries like the following in the alfresco.log file showing the ocr.sh script being called.

2017-10-10 15:20:17,182  DEBUG [content.transform.RuntimeExecutableContentTransformerWorker] [http-bio-8443-exec-6] Transformation completed: 
   source: ContentAccessor[ contentUrl=store:///opt/alfresco/tomcat/temp/Alfresco/ComplextTransformer_intermediate_pdf_9017478201188837562.tiff, mimetype=image/tiff, size=24925880, encoding=UTF-8, locale=en_GB]
   target: ContentAccessor[ contentUrl=store://2017/10/10/15/20/d3b4b9aa-ad28-4c8c-ae86-f99938bf4125.bin, mimetype=text/plain, size=1173, encoding=UTF-8, locale=en_GB]
   options: {maxSourceSizeKBytes=-1, pageLimit=-1, use=index, timeoutMs=120000, maxPages=-1, contentReaderNodeRef=null, sourceContentProperty=null, readLimitKBytes=-1, contentWriterNodeRef=null, targetContentProperty=null, includeEmbedded=null, readLimitTimeMs=-1}
   result: 
Execution result: 
   os:         Linux
   command:    /opt/alfresco/ocr.sh /opt/alfresco/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_source_5734790636289670188.tiff /opt/alfresco/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_target_1506982845420553983.txt
   succeeded:  true
   exit code:  0
   out:        
   err:        Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Page 1
 2017-10-10 15:20:17,183  TRACE [content.transform.TransformerLog] [http-bio-8443-exec-6] 4.1.2         tiff txt  INFO <<TemporaryFile>> 23.7 MB 1,950 ms ocr.tiff<<Runtime>>
 2017-10-10 15:20:17,183  TRACE [content.transform.TransformerDebug] [http-bio-8443-exec-6] 4.1.2         Finished in 1,950 ms

 

We can also take a look at the /tmp/ocrtransform.log file to see what files have been processed.

from /opt/alfresco/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_source_5734790636289670188.tiff to /opt/alfresco/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_target_1506982845420553983.txt

 

That's it, you should now be able to search for the text contained in the image files.

 

  • References

Most of the information in this blog comes from this GitHub repository: https://github.com/bchevallereau/alfresco-tesseract, with some additional adjustments and inclusions.

1.     Project Objective

The aim of this blog is to show you how to create and run a Docker container with a full ELK (Elasticsearch, Logstash and Kibana) environment containing the necessary configuration and scripts to collect and present data to monitor your Alfresco application.

 

Elastic tools can ease the processing and manipulation of large amounts of data collected from logs, operating system, network, etc.

 

Elastic tools can be used to search for data such as errors, exceptions and debug entries and to present statistical information such as throughput and response times in a meaningful way. This information is very useful when monitoring and troubleshooting Alfresco systems.

 

2.     Install Docker on Host machine

Install Docker on your host machine (server) as per the Docker website. Please note the Docker Community Edition is sufficient to run this project (https://www.docker.com/community-edition).

 

3.     Virtual Memory

Elasticsearch uses a hybrid mmapfs/niofs directory by default to store its indices. The default operating system limit on mmap counts is likely to be too low, which may result in out-of-memory exceptions.

 

On Linux, you can increase the limits by running the following command as root on the host machine:

 

# sysctl -w vm.max_map_count=262144

 

To set this value permanently, update the vm.max_map_count setting in /etc/sysctl.conf. To verify the value has been applied run:

 

# sysctl vm.max_map_count
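As a minimal sketch, one way to persist the setting and reload it (run as root):

# echo 'vm.max_map_count=262144' >> /etc/sysctl.conf
# sysctl -p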

 

4.     Download “Docker-ELK-Alfresco-Monitoring” container software

Download the software to create the Docker container from GitHub: https://github.com/miguel-rodriguez/Docker-ELK-Alfresco-Monitoring and extract the files to the file system.
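For example, assuming you want the files under /opt/docker-projects (matching the paths used below), one way to fetch and extract the master branch as a ZIP is:

# cd /opt/docker-projects
# curl -L -o elk.zip https://github.com/miguel-rodriguez/Docker-ELK-Alfresco-Monitoring/archive/master.zip
# unzip elk.zip

This creates the Docker-ELK-Alfresco-Monitoring-master folder referenced in the next section.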

 

5.     Creating the Docker Image

Before creating the Docker image we need to configure access to Alfresco’s database from the Docker container. Assuming the files have been extracted to /opt/docker-projects/Docker-ELK-Alfresco-Monitoring-master, edit the files activities.properties and workflows.properties and set the access to the DB server as appropriate, for example:

 

#postgresql settings
db_type=postgresql
db_url=jdbc:postgresql://172.17.0.1:5432/alfresco
db_user=alfresco
db_password=admin

 

Please make sure the database server allows remote connections to Alfresco’s database. A couple of examples of how to configure the database are shown here:

  • For MySQL

Access your database server as an administrator and grant the correct permissions i.e.

 

# mysql -u root -p
grant all privileges on alfresco.* to alfresco@'%' identified by 'admin';

 

The grant command grants access to all tables in the ‘alfresco’ database to the ‘alfresco’ user connecting from any host with the ‘admin’ password.

Also make sure the bind-address parameter in my.cnf allows for external binding i.e. bind-address = 0.0.0.0

 

  • For PostgreSQL

Change the file ‘postgresql.conf’ to listen on all interfaces

 

listen_addresses = '*'

 

then add an entry in file ‘pg_hba.conf’ to allow connections from any host

 

host all all 0.0.0.0/0 trust

 

Restart PostgreSQL database server to pick up the changes.

We have installed a small Java application inside the container, in the /opt/activities folder, which executes calls against the database configured in the /opt/activities/activities.properties file.

For example to connect to PostgreSQL we have the following settings:

 

db_type=postgresql
db_url=jdbc:postgresql://172.17.0.1:5432/alfresco
db_user=alfresco
db_password=admin

 

We also need to set the timezone in the container; this can be done by editing the following entry in the startELK.sh script:

 

export TZ=GB

 

From the command line execute the following command to create the Docker image:

 

# docker build --tag=alfresco-elk /opt/docker-projects/Docker-ELK-Alfresco-Monitoring-master/
Sending build context to Docker daemon  188.9MB
Step 1/33 : FROM sebp/elk:530
530: Pulling from sebp/elk
.......

 

6.     Creating the Docker Container

Once the Docker image has been created we can create the container from it by executing the following command:

 

# docker create -it -p 5601:5601 --name alfresco-elk alfresco-elk:latest

 

7.     Starting the Docker Container

Once the Docker container has been created it can be started with the following command:

 

# docker start alfresco-elk

 

Verify the ELK stack is running by accessing Kibana at http://localhost:5601 on the host machine.

At this point Elasticsearch and Kibana do not have any data, so we need to get Alfresco’s logstash agent up and running to feed some data to Elasticsearch.
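As a quick check from the command line on the host (a sketch; an HTTP response means Kibana is up):

curl -I http://localhost:5601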

 

8.     Starting logstash-agent

The logstash agent consists of logstash and some other scripts to capture entries from Alfresco log files, JVM stats using jstatbeat (https://github.com/cero-t/jstatbeat), entries from Alfresco audit tables, DB slow queries, etc.

 

Copy the logstash-agent folder to a directory on all the servers running Alfresco or Solr applications.

Assuming you have copied the logstash-agent folder to /opt/logstash-agent, edit the file /opt/logstash-agent/run_logstash.sh and set the following properties according to your own settings:

 

export tomcatLogs=/opt/alfresco/tomcat/logs
export logstashAgentDir=/opt/logstash-agent
export logstashAgentLogs=${logstashAgentDir}/logs
export alfrescoELKServer=172.17.0.2


9.     Configuring Alfresco to generate data for monitoring

Alfresco needs some additional configuration to produce data to be sent to the monitoring Docker container.

 

9.1   Alfresco Logs

Alfresco logs, i.e. alfresco.log, share.log, solr.log or the equivalent catalina.out, can be parsed to provide information such as the number of errors or exceptions over a period of time. We can also search these logs for specific data.

 

The first thing is to make sure the logs display the full date-time format at the beginning of each line. This is important so we can display the entries in the correct order.

Make sure that in your log4j properties files (there is more than one) the file layout pattern is as follows:

 

log4j.appender.File.layout.ConversionPattern=%d{yyyy-MM-dd} %d{ABSOLUTE} %-5p [%c] [%t] %m%n

 

This will produce log entries with the date at the beginning of the line, like this one:

2016-09-12 12:16:28,460 INFO  [org.alfresco.repo.admin] [localhost-startStop-1] Connected to database PostgreSQL version 9.3.6

Important note: if you upload catalina files, don’t upload the alfresco (alfresco, share, solr) log files for the same time period; they contain the same entries and you will end up with duplicates in the Log Analyser tool.

 

Once the logs are processed the following data is shown:

  • Number of errors, warnings, debug, fatal messages, etc. over time
  • Total number of errors, warnings, debug, fatal messages, etc.
  • Common messages that may reflect issues with the application
  • Number of entries grouped by Java class
  • Number of exceptions logged
  • All log files are searchable using ES (Elasticsearch) search syntax

 

 

9.2    Document Transformations

Alfresco performs document transformations for document previews, thumbnails, indexing content, etc. To monitor document transformations, enable logging for the class “TransformerLog” by adding the following line to tomcat/shared/classes/alfresco/extension/custom-log4j.properties on all Alfresco nodes:

 

log4j.logger.org.alfresco.repo.content.transform.TransformerLog=debug

 

The following is sample output from the alfresco.log file showing document transformation times, document extensions, the transformer used, etc.

 

2016-07-14 18:24:56,003  DEBUG [content.transform.TransformerLog] [pool-14-thread-1] 0 xlsx png  INFO Calculate_Memory_Solr Beta 0.2.xlsx 200.6 KB 897 ms complex.JodConverter.Image<<Complex>>

 

Once Alfresco logs are processed the following data is shown for transformations:

  • Response time of transformation requests over time
  • Transformation throughput
  • Total count of transformations grouped by file type
  • Document size, transformation time, transformer used, etc

 

 

9.3    Tomcat Access Logs

Tomcat access logs can be used to monitor HTTP requests, throughput and response times. To get the right data format in the logs we need to add or replace the “Valve” entry in the tomcat/conf/server.xml file, normally located at the end of the file, with the one below.

 

<Valve
  className="org.apache.catalina.valves.AccessLogValve"
  directory="logs"
  prefix="access-" suffix=".log"
  pattern='%a %l %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i" %D "%I"'
  resolveHosts="false"
/>

 

 

For further clarification on the log pattern refer to: https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Access_Logging

Below is sample output from a Tomcat access log under the tomcat/logs directory. The important fields here are the HTTP request, the HTTP response status (200) and the time taken to process the request (33 milliseconds):

 

127.0.0.1 - CN=Alfresco Repository Client, OU=Unknown, O=Alfresco Software Ltd., L=Maidenhead, ST=UK, C=GB [14/Jul/2016:18:49:45 +0100] "POST /alfresco/service/api/solr/modelsdiff HTTP/1.1" 200 37 "-" "Spring Surf via Apache HttpClient/3.1" 33 "http-bio-8443-exec-10"

 

Once the Tomcat access logs are processed the following data is shown: 

  • Response time of HTTP requests over time
  • HTTP traffic throughput
  • Total count of responses grouped by HTTP response code
  • Tomcat access logs files are searchable using ES (Elasticsearch) search syntax

 

  

9.4    Solr Searches

We can monitor Solr queries and response times by enabling debug for the class SolrQueryHTTPClient, adding the following entry to tomcat/shared/classes/alfresco/extension/custom-log4j.properties on all Alfresco (front-end) nodes:

 

log4j.logger.org.alfresco.repo.search.impl.solr.SolrQueryHTTPClient=debug

 

Sample output from the alfresco.log file showing Solr search response times:

 

DEBUG [impl.solr.SolrQueryHTTPClient] [http-apr-8080-exec-6]    with: {"queryConsistency":"DEFAULT","textAttributes":[],"allAttributes":[],"templates":[{"template":"%(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT TAG)","name":"keywords"}],"authorities":["GROUP_EVERYONE","ROLE_ADMINISTRATOR","ROLE_AUTHENTICATED","admin"],"tenants":[""],"query":"((test.txt  AND (+TYPE:\"cm:content\" +TYPE:\"cm:folder\")) AND -TYPE:\"cm:thumbnail\" AND -TYPE:\"cm:failedThumbnail\" AND -TYPE:\"cm:rating\") AND NOT ASPECT:\"sys:hidden\"","locales":["en"],"defaultNamespace":"http://www.alfresco.org/model/content/1.0","defaultFTSFieldOperator":"OR","defaultFTSOperator":"OR"}

 2016-03-19 19:55:54,106 

 

DEBUG [impl.solr.SolrQueryHTTPClient] [http-apr-8080-exec-6] Got: 1 in 21 ms

 

Note: there is no specific transaction ID to correlate a Solr search with its response. The best way is to look at the time when the search and the response were logged, together with the Java thread name; this should give you a match for the query and its response.

Once Alfresco logs are processed the following data is shown for Solr searches:

  • Response time for Solr searches over time
  • Solr searches throughput
  • Solr queries, number of results found and individual response times

 

 

9.5    Database Monitoring

Database performance can be monitored with two different tools: p6spy and Packetbeat. The main difference between them is that p6spy acts as a proxy JDBC driver while Packetbeat is a network traffic sniffer. Also, Packetbeat can only sniff traffic for MySQL and PostgreSQL databases, while p6spy can also handle Oracle, among others.

 

P6spy

P6spy is delivered as a jar file that needs to be placed in the application class path, i.e. the tomcat/lib/ folder. There are three steps to get p6spy configured and running.

 

  • Place the p6spy jar file in the tomcat/lib/ folder
  • Create a spy.properties file, also in the tomcat/lib/ folder, with the following configuration

 

modulelist=com.p6spy.engine.spy.P6SpyFactory,com.p6spy.engine.logging.P6LogFactory,com.p6spy.engine.outage.P6OutageFactory
appender=com.p6spy.engine.spy.appender.FileLogger
deregisterdrivers=true
dateformat=MM-dd-yy HH:mm:ss:SS
autoflush=true
append=true
useprefix=true

# Set driverlist to the correct driver, e.g.
# driverlist=oracle.jdbc.OracleDriver
# driverlist=org.mariadb.jdbc.Driver
driverlist=org.postgresql.Driver

# Location where the spy.log file will be created
logfile=/opt/logstash-agent/logs/spy.log

# Execution threshold in milliseconds: log only queries taking longer than 1000 ms (slow queries)
executionThreshold=1000

 

Note: if there are no queries taking longer than the executionThreshold value (in milliseconds), the file will not be created.

Note: set the “logfile” variable to the logs folder inside the logstash-agent path, as shown above.

 

  • Add an entry to the tomcat/conf/Catalina/localhost/alfresco.xml file

 

Example for PostgreSQL:

 

<Resource
  defaultTransactionIsolation="-1"
  defaultAutoCommit="false"
  maxActive="275"
  initialSize="10"
  password="admin"
  username="alfresco"
  url="jdbc:p6spy:postgresql://localhost/p6spy:alfresco"
  driverClassName="com.p6spy.engine.spy.P6SpyDriver"
  type="javax.sql.DataSource"
  auth="Container"
  name="jdbc/dataSource"
/>

 

Example for Oracle:

 

<Resource
  defaultTransactionIsolation="-1"
  defaultAutoCommit="false"
  maxActive="275"
  initialSize="10"
  password="admin"
  username="alfresco"
  url="jdbc:p6spy:oracle:thin:@192.168.56.101:1521:XE"
  driverClassName="com.p6spy.engine.spy.P6SpyDriver"
  type="javax.sql.DataSource"
  auth="Container"
  name="jdbc/dataSource"
/>

 

 Example for MariaDB:

 

<Resource
  defaultTransactionIsolation="-1"
  defaultAutoCommit="false"
  maxActive="275"
  initialSize="10"
  password="admin"
  username="alfresco"
  url="jdbc:p6spy:mariadb://localhost:3306/alfresco"
  driverClassName="com.p6spy.engine.spy.P6SpyDriver"
  type="javax.sql.DataSource"
  auth="Container"
  name="jdbc/dataSource"
/>

 

Once the spy.log file has been processed the following information is shown:

  • DB Statements execution time over time
  • DB Statements throughput over time
  • Table showing individual DB statements and execution times
  • DB execution times by connection id 

 

 

9.6    Alfresco Auditing

If you want to audit Alfresco access you can enable auditing by adding the following entries to alfresco-global.properties file:

 

# Enable auditing
audit.enabled=true
audit.alfresco-access.enabled=true
audit.tagging.enabled=true
audit.alfresco-access.sub-actions.enabled=true
audit.cmischangelog.enabled=true

 

Now you can monitor all the events generated by alfresco-access audit group.

 

 

Note: Only one of the logstash agents should collect Alfresco's audit data, since the script gathers data for the whole cluster/solution. Edit the file logstash-agent/run_logstash.sh on just one of the Alfresco nodes and set the variable collectAuditData to "yes" as indicated below:

 

collectAuditData="yes"

 

Note: Also make sure you update the login credentials for Alfresco in the audit*sh files. They default to admin/admin.

 

10.    Starting and Stopping the logstash agent

The logstash agent script can be started from the command line with "./run_logstash.sh start" as shown below:

 

./run_logstash.sh start
Starting logstash
Starting jstatbeat
Starting dstat
Staring audit access script

 

and can be stopped with the command "./run_logstash.sh stop" as shown below:

 

./run_logstash.sh stop
Stopping logstash
Stopping jstatbeat
Stopping dstat
Stopping audit access script

 

11. Accessing the Dashboard

Finally, access the dashboard by going to http://<docker host IP>:5601 (use the IP of the server where you installed the Docker container), clicking on the “Dashboard” link on the left panel and then clicking on the “Activities” link.

 

 

The data should be available for the selected time period.

 

 

Navigate to the other dashboards by clicking on the appropriate link.

 

 

12.    Accessing the container

To enter the running container use the following command:

 

# docker exec -i -t alfresco-elk bash

 

And to exit the container just type “exit” and you will find yourself back on the host machine.

 

13.    Stopping the container

To stop the container from running type the following command on the host machine:

 


# docker stop alfresco-elk

 

14.    Removing the Docker Container

To delete the container you first need to stop the container and then run the following command:

 

# docker rm alfresco-elk

 

15.    Removing the Docker Image

To delete the image you first need to remove the container and then run the following command:

 

# docker rmi alfresco-elk:latest

 

16.   Firewall ports

 

If you have a firewall, make sure the following ports are open:

 

Redis: 6379

Kibana: 5601

Database server: this depends on the DB server being used, i.e. PostgreSQL is 5432, MySQL 3306, etc.

 

 

Happy Monitoring!!!

The aim of this blog is to introduce you to Enterprise Integration Patterns and to show you how to create an application to integrate Alfresco with an external application…in this case we will be sending documents on request from Alfresco to Box based on CMIS queries. We will store both the content and the metadata in Box.

 

1.    Enterprise Integration Patterns

EIP (Enterprise Integration Patterns) defines a language consisting of 65 integration patterns (http://www.enterpriseintegrationpatterns.com/patterns/messaging/toc.html) to establish a technology-independent vocabulary and a visual notation to design and document integration solutions.

Why EIP? Today's applications rarely live in isolation, and architecting integration solutions is a complex task. The lack of a common vocabulary and body of knowledge for asynchronous messaging architectures makes it difficult to avoid common pitfalls.

For example, the following diagram shows how content from one application is routed and transformed to be delivered to another application. Each step can be further detailed with specific annotations.

 

 

  • Channel Patterns describe how messages are transported across a Message Channel. These patterns are implemented by most commercial and open source messaging systems.
  • Message Construction Patterns describe the intent, form and content of the messages that travel across the messaging system.
  • Routing Patterns discuss how messages are routed from a sender to the correct receiver. Message routing patterns consume a message from one channel and republish it, usually without modification, to another channel based on a set of conditions.
  • Transformation Patterns change the content of a message, for example to accommodate different data formats used by the sending and the receiving system. Data may have to be added, taken away or existing data may have to be rearranged.
  • Endpoint Patterns describe how messaging system clients produce or consume messages.
  • System Management Patterns describe the tools to keep a complex message-based system running, including dealing with error conditions, performance bottlenecks and changes in the participating systems.

 

The following example shows how to maintain the overall message flow when processing a message consisting of multiple elements, each of which may require different processing.

               

 

2.    Apache Camel

Apache Camel (http://camel.apache.org/) is an integration framework whose main goal is to make integration easier. It implements many of the EIP patterns and allows you to focus on solving business problems, freeing you from the burden of plumbing.

At a high level, Camel is composed of components, routes and processors, all of which are contained within the CamelContext.

 

The CamelContext provides access to many useful services, the most notable being components, type converters, a registry, endpoints, routes, data formats, and languages.

 

  • Components: a Component is essentially a factory of Endpoint instances. To date, there are over 80 components in the Camel ecosystem, ranging in function from data transports to DSLs, data formats and so on, e.g. cmis, http, box, salesforce, ftp, smtp, etc.
  • Endpoints: an Endpoint is the Camel abstraction that models the end of a channel through which a system can send or receive messages. Endpoints are usually created by a Component and are usually referred to in the DSL via their URIs, e.g. cmis://cmisServerUrl[?options].
  • Routes: the steps taken to send a message from one endpoint to another endpoint.
  • Type Converters: Camel provides a built-in type-converter system that automatically converts between well-known types. This system allows Camel components to easily work together without type mismatches.
  • Data Formats: allow messages to be marshalled to and from binary or text formats to support a kind of Message Translator, e.g. gzip, json, csv, crypto, etc.
  • Registry: contains a registry that allows you to look up beans, e.g. a bean that defines the JDBC data source.
  • Languages: to wire processors and endpoints together to form routes, Camel defines a DSL. DSLs include, among others, Java, Groovy, Scala and Spring XML.

 

3.    Building an Integration Application

 

The aim of the application is to send documents on request from Alfresco to Box. We will store both the content and the metadata in Box.

To build an EIP Application we are going to use:

  • Maven to build the application
  • Spring-boot to run the application
  • Apache Camel to integrate Alfresco and Box

 

The full source code is available on GitHub: https://github.com/miguel-rodriguez/Alfresco-Camel

 

The basic message flow is as follows:

 


 

 

3.1          Maven

Apache Maven (https://maven.apache.org/) is a software project management and comprehension tool. Based on the concept of a project object model (POM), Maven can manage a project's build, reporting and documentation from a central piece of information.

 

3.1.1    Maven Pom.xml

For our project the pom.xml brings in the required dependencies such as Camel and ActiveMQ. The pom.xml file looks like this:

 

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>support.alfresco</groupId>
    <artifactId>camel</artifactId>
    <name>Spring Boot + Camel</name>
    <version>0.0.1-SNAPSHOT</version>
    <description>Project Example.</description>

    <!-- Using Spring-boot 1.4.3 -->
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>1.4.3.RELEASE</version>
    </parent>

    <!-- Using Camel version 2.18.1 -->
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <camel-version>2.18.1</camel-version>
        <app.version>1.0-SNAPSHOT</app.version>
    </properties>

    <!-- Spring -->
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <!-- The Core Camel Java DSL based router -->
        <dependency>
            <groupId>org.apache.camel</groupId>
            <artifactId>camel-core</artifactId>
            <version>${camel-version}</version>
        </dependency>

        <!-- Camel Spring support -->
        <dependency>
            <groupId>org.apache.camel</groupId>
            <artifactId>camel-spring</artifactId>
            <version>${camel-version}</version>
        </dependency>

        <!-- Camel Metrics based monitoring component -->
        <dependency>
            <groupId>org.apache.camel</groupId>
            <artifactId>camel-metrics</artifactId>
            <version>${camel-version}</version>
        </dependency>

        <!-- Camel JMS support -->
        <dependency>
            <groupId>org.apache.camel</groupId>
            <artifactId>camel-jms</artifactId>
            <version>${camel-version}</version>
        </dependency>

        <!-- ActiveMQ component for Camel -->
        <dependency>
            <groupId>org.apache.activemq</groupId>
            <artifactId>activemq-camel</artifactId>
        </dependency>

        <!-- Camel CMIS which is based on Apache Chemistry support -->
        <dependency>
            <groupId>org.apache.camel</groupId>
            <artifactId>camel-cmis</artifactId>
            <version>2.14.1</version>
        </dependency>

        <!-- Camel Stream (System.in, System.out, System.err) support -->
        <dependency>
            <groupId>org.apache.camel</groupId>
            <artifactId>camel-stream</artifactId>
            <version>${camel-version}</version>
        </dependency>

        <!-- Camel JSON Path Language -->
        <dependency>
            <groupId>org.apache.camel</groupId>
            <artifactId>camel-jsonpath</artifactId>
            <version>${camel-version}</version>
        </dependency>

        <!-- Apache HttpComponents HttpClient - MIME coded entities -->
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpmime</artifactId>
        </dependency>

        <!-- Camel HTTP (Apache HttpClient 4.x) support -->
        <dependency>
            <groupId>org.apache.camel</groupId>
            <artifactId>camel-http4</artifactId>
            <version>${camel-version}</version>
        </dependency>

        <!-- Camel SQL support -->
        <dependency>
            <groupId>org.apache.camel</groupId>
            <artifactId>camel-sql</artifactId>
            <version>${camel-version}</version>
        </dependency>

        <!-- Camel Zip file support -->
        <dependency>
            <groupId>org.apache.camel</groupId>
            <artifactId>camel-zipfile</artifactId>
            <version>${camel-version}</version>
        </dependency>

        <!-- Support for PostgreSQL database -->
        <dependency>
            <groupId>org.postgresql</groupId>
            <artifactId>postgresql</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-simple</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <!-- Camel Component for Box.com -->
        <dependency>
            <groupId>org.apache.camel</groupId>
            <artifactId>camel-box</artifactId>
            <version>${camel-version}</version>
        </dependency>

        <!-- Camel script support -->
        <dependency>
            <groupId>org.apache.camel</groupId>
            <artifactId>camel-script</artifactId>
            <version>${camel-version}</version>
        </dependency>

        <!-- A simple Java toolkit for JSON -->
        <dependency>
            <groupId>com.googlecode.json-simple</groupId>
            <artifactId>json-simple</artifactId>
            <version>1.1.1</version>
            <!--$NO-MVN-MAN-VER$-->
        </dependency>

        <!-- XStream is a Data Format used to marshal and unmarshal Java objects to and from XML -->
        <dependency>
            <groupId>org.apache.camel</groupId>
            <artifactId>camel-xstream</artifactId>
            <version>2.9.2</version>
        </dependency>

        <!-- Jackson XML is a Data Format to unmarshal an XML payload into Java objects or to marshal Java objects into an XML payload -->
        <dependency>
            <groupId>org.apache.camel</groupId>
            <artifactId>camel-jackson</artifactId>
            <version>2.9.2</version>
        </dependency>

        <!-- test -->
        <dependency>
            <groupId>org.apache.camel</groupId>
            <artifactId>camel-test</artifactId>
            <version>${camel-version}</version>
            <scope>test</scope>
        </dependency>

        <!-- logging -->
        <dependency>
            <groupId>commons-logging</groupId>
            <artifactId>commons-logging</artifactId>
            <version>1.1.1</version>
        </dependency>

        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-slf4j-impl</artifactId>
            <scope>test</scope>
        </dependency>

        <!-- monitoring -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-remote-shell</artifactId>
        </dependency>

        <dependency>
            <groupId>org.jolokia</groupId>
            <artifactId>jolokia-core</artifactId>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

 

 

3.2          Spring Boot

 

Spring Boot (https://projects.spring.io/spring-boot/) makes it easy to create stand-alone, production-grade Spring based Applications that you can "just run". Most Spring Boot applications need very little Spring configuration.

 

Features

  • Create stand-alone Spring applications
  • Embed Tomcat, Jetty or Undertow directly (no need to deploy WAR files)
  • Provide opinionated 'starter' POMs to simplify your Maven configuration
  • Automatically configure Spring whenever possible
  • Provide production-ready features such as metrics, health checks and externalized configuration

 

3.2.1       Spring Boot applicationContext.xml

We use applicationContext.xml to define the Java beans used by our application. Here we define beans for connecting to Box, database connectivity, ActiveMQ and Camel. For the purposes of this application we only need the ActiveMQ and Box connectivity.

 

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:jdbc="http://www.springframework.org/schema/jdbc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/jdbc http://www.springframework.org/schema/jdbc/spring-jdbc-3.0.xsd
        http://camel.apache.org/schema/spring http://camel.apache.org/schema/spring/camel-spring.xsd">

    <!-- Define configuration file application.properties -->
    <bean id="placeholder" class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
        <property name="locations">
            <list>
                <value>classpath:application.properties</value>
            </list>
        </property>
        <property name="ignoreResourceNotFound" value="false" />
        <property name="searchSystemEnvironment" value="true" />
        <property name="systemPropertiesModeName" value="SYSTEM_PROPERTIES_MODE_OVERRIDE" />
    </bean>

    <!-- Bean for Box authentication. Please note you need a Box developer account -->
    <bean id="box" class="org.apache.camel.component.box.BoxComponent">
        <property name="configuration">
            <bean class="org.apache.camel.component.box.BoxConfiguration">
                <property name="userName" value="${box.userName}" />
                <property name="userPassword" value="${box.userPassword}" />
                <property name="clientId" value="${box.clientId}" />
                <property name="clientSecret" value="${box.clientSecret}" />
            </bean>
        </property>
    </bean>

    <!-- Define database connectivity -->
    <bean id="dataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
        <property name="driverClassName" value="org.postgresql.Driver" />
        <property name="url" value="jdbc:postgresql://localhost:5432/alfresco" />
        <property name="username" value="alfresco" />
        <property name="password" value="admin" />
    </bean>

    <!-- Configure the Camel SQL component to use the JDBC data source -->
    <bean id="sql" class="org.apache.camel.component.sql.SqlComponent">
        <property name="dataSource" ref="dataSource" />
    </bean>

    <!-- Create a connection to ActiveMQ -->
    <bean id="jmsConnectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
        <property name="brokerURL" value="tcp://localhost:61616" />
    </bean>

    <!-- Create Camel context -->
    <camelContext id="camelContext" xmlns="http://camel.apache.org/schema/spring" autoStartup="true">
        <routeBuilder ref="myRouteBuilder" />
    </camelContext>

    <!-- Bean defining Camel routes -->
    <bean id="myRouteBuilder" class="support.alfresco.Route" />
</beans>

 

3.2.2       Application.java

The Application class is used to run our Spring application:

 

package support.alfresco;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.ImportResource;

// Boots the Spring context defined in applicationContext.xml
@SpringBootApplication
@ImportResource("applicationContext.xml")
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}

 

3.2.3       Route.java

In the Route.java file we define the Camel routes to send traffic from Alfresco to Box.

The code below shows the routes that execute the CMIS query, download the content and properties, compress them and upload them to Box:

 

//////////////////////////////////
// Download Alfresco documents  //
//////////////////////////////////
from("jms:alfresco.downloadNodes")
    .log("Running query: ${body}")
    .setHeader("CamelCMISRetrieveContent", constant(true))
    .to(alfrescoSender + "&queryMode=true")
    // Class FileContentProcessor is used to store the files in the filesystem together with the metadata
    .process(new FileContentProcessor());

/////////////////////////////////////////
// Move documents and metadata to Box  //
/////////////////////////////////////////
from("file:/tmp/downloads?antInclude=*")
    .marshal().zipFile()
    .to("file:/tmp/box");

from("file:/tmp/metadata?antInclude=*")
    .marshal().zipFile()
    .to("file:/tmp/box");

from("file:/tmp/box?noop=false&recursive=true&delete=true")
    .to("box://files/uploadFile?inBody=fileUploadRequest");

 

Let’s break it down…

 

1. We read request messages containing a CMIS query from an ActiveMQ queue:

from("jms:alfresco.downloadNodes")

 

For example, a CMIS query to get the nodes in a specific folder looks like this:

SELECT * FROM cmis:document WHERE IN_FOLDER ('workspace://SpacesStore/56c5bc2e-ea5c-4f6a-b817-32f35a7bb195') and cmis:objectTypeId='cmis:document'

 

For testing purposes we can fire the message requests directly from the ActiveMQ admin UI (http://127.0.0.1:8161/admin/).

 

 

2. We send the CMIS query to Alfresco defined as “alfrescoSender”

.to(alfrescoSender + "&queryMode=true")

 

3. The Alfresco sender is defined in application.properties (the entry is not shown here; it is a camel-cmis endpoint URI of the form cmis://cmisServerUrl[?options]) and mapped to the "alfrescoSender" variable in Route.java:

public static String alfrescoSender;

@Value("${alfresco.sender}")
public void setAlfrescoSender(String inSender) {
    alfrescoSender = inSender;
}

   

4. We store the files retrieved by the CMIS query in the filesystem, using the FileContentProcessor class for that job:

.process(new FileContentProcessor());

 

5. Zip the content file and the metadata file:

from("file:/tmp/downloads?antInclude=*")
    .marshal().zipFile()
    .to("file:/tmp/box");

from("file:/tmp/metadata?antInclude=*")
    .marshal().zipFile()
    .to("file:/tmp/box");

 

6. And finally, upload the content to Box:

from("file:/tmp/box?noop=false&recursive=true&delete=true")
    .to("box://files/uploadFile?inBody=fileUploadRequest");

 

 

4.    Building and Running the application

To build the application using Maven we execute the following command:

mvn clean install

 

To run the application execute the following command:

mvn spring-boot:run

5.    Monitoring with Hawtio

Hawtio (http://hawt.io) is a pluggable management console for Java which supports any kind of JVM, any kind of container (Tomcat, Jetty, Karaf, JBoss, Fuse Fabric, etc.) and any kind of Java technology and middleware.

Hawtio can help you visualize routes with real-time updates on message metrics.

 

 

You can get statistical data for each individual route.

 

 

I hope this basic introduction to EIP and Apache Camel gives you some ideas on how to integrate different applications using the existing endpoints provided by Apache Camel.