
Introduction

 

Some of this week's questions have been related to query templates. Specifically, how can I use my custom properties in Share live search and standard search to find content? To do this you need to change some query templates.

 

What is a query template?

 

A query template is a way of taking a user query and generating a more complex query, somewhat like the dismax parsers in SOLR. The Share template for live search looks like this:

 

%(cm:name cm:title cm:description TEXT TAG)

 

The % identifies something to replace and is followed by a field or, in this case, a group of fields to use for the replacement. Whatever query a user enters is applied to those fields by the template. For groups of fields, the per-field queries are ORed together.

 

So for example, if you search for alfresco in the live search it will generate:

 

(cm:name:alfresco OR cm:title:alfresco OR cm:description:alfresco OR TEXT:alfresco OR TAG:alfresco)

 

If you search for =Alfresco in the live search it will generate:

 

(=cm:name:Alfresco OR =cm:title:Alfresco OR =cm:description:Alfresco OR =TEXT:Alfresco OR =TAG:Alfresco)

 

Multiple words are ANDed together (by default), so for one two you get:

 

(cm:name:(one AND two) OR  cm:title:(one AND two) OR cm:description:(one AND two) OR TEXT:(one AND two) OR TAG:(one AND two))

 

If you search for the phrase "alfresco is great":

 

(cm:name:"alfresco is great" OR cm:title:"alfresco is great" OR cm:description:"alfresco is great" OR TEXT:"alfresco is great" OR TAG"alfresco is great")

 

Here, the template is simply defining the fields used for search. You can also give each field a different importance if you specify the fields individually rather than using a replacement group. For example:

 

(%cm:name^10 OR  %cm:title^2  OR %cm:description^1.5 OR %TEXT OR %TAG)

 

Here we are ranking name matches higher than title, title over description, and all of them over TEXT (content) and TAGs.

 

Query templates can contain any query element, so we could limit the results to certain types, although this would be better expressed as a filter query:

(%cm:name^10 OR  %cm:title^2  OR %cm:description^1.5 OR %TEXT OR %TAG) AND TYPE:content

 

You could split your template into two parts - one for content and one for folders  - if you want to change the balance of relevance between them.

 

Customizing Share Templates

 

There are two templates for Share: one for live search and one for standard search. (The configuration of advanced search is discussed elsewhere.) The templates live in:

<tomcat>\shared\classes\alfresco\extension\templates\webscripts\org\alfresco\slingshot\search

 

You will need two files in that directory with the following default content:

  • search.get.config.xml

    <search>
       <default-operator>AND</default-operator>
       <default-query-template>%(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT TAG)</default-query-template>
    </search>

  • live-search-docs.get.config.xml

    <search>
      <default-operator>AND</default-operator>
      <default-query-template>%(cm:name cm:title cm:description TEXT TAG)</default-query-template>
    </search>

You will probably want to change them so they do not use field groups, so that you can add boosting as described above.

It is then easy to add your own properties and boosting to one or both of the templates.
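
For example, a live-search-docs.get.config.xml that boosts name and title and adds a custom property might look like this (a sketch; my:reference is just an illustrative property name from a hypothetical custom model):

    <search>
       <default-operator>AND</default-operator>
       <default-query-template>(%cm:name^10 OR %cm:title^2 OR %cm:description^1.5 OR %my:reference OR %TEXT OR %TAG)</default-query-template>
    </search>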

 

Search public API

 

The search public API in ACS 5.2 and later supports templates (as the Java API has for some time). Each template is mapped to a field. 

{
    "query": {
        "language": "afts",
        "query": "WOOF:alfresco"
    },
    "include": ["properties"],
    "templates": [
        {
            "name": "WOOF",
            "template": "(%cm:name OR %cm:content^200 OR %cm:title OR %cm:description) AND TYPE:content"
        }
    ]
}

 

A template is assigned to a name which can be used in the query language just like any other field. If the name of a template is set as the default field, any part of the query that does not specify a field will go to the template. That is how Share maps the user query to the template and exposes a default Google-like query, while still allowing advanced users to execute any AFTS query. Most of the time it uses the default field and the template above.
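
For example, the request below (a sketch; the template name WOOF is arbitrary) registers the template and sets it as the default field via defaults.defaultFieldName, so the bare term alfresco is expanded by the template:

{
    "query": {
        "language": "afts",
        "query": "alfresco"
    },
    "defaults": {
        "defaultFieldName": "WOOF"
    },
    "include": ["properties"],
    "templates": [
        {
            "name": "WOOF",
            "template": "(%cm:name OR %cm:content^200 OR %cm:title OR %cm:description) AND TYPE:content"
        }
    ]
}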

 

As of 5.2.1, templates can also be used in the CMIS QL CONTAINS() expression:

{
    "query": {
        "language": "cmis",
        "query": "select * from cmis:document where CONTAINS('alfresco')"
    },
    "include": ["properties"],
    "templates": [
        {
            "name": "TEXT",
            "template": "%cmis:name OR %cmis:description^200"
        }
    ]
}

CMIS CONTAINS() defines TEXT as the default field. This normally goes to cm:content but can be redefined, as we do here.

 

Summary

 

Query templates are a great way to hide query mapping from the end user. The Share templates allow you to add your own properties to Share queries and tweak the weight given to each field - perhaps giving name matches greater prominence.

We are happy to announce our latest release of Community RM based on the public community code-line. Community release 2.6.a is now available and provides support for Alfresco Community 201702. In this release we managed to include some exciting new features as well as some important bug fixes.

 

Governance Services v1 REST API

Just as we promised in the last release blog post, we have completed a set of REST API endpoints enabling you to use custom clients to manage the RM repository. You can easily explore and test the endpoints using the GS API Explorer included in the release. Deploy the REST API Explorer on the same port as Alfresco and test the REST API with your server directly from the API Explorer.

 

 

 

The endpoints cover basic RM functionality. You can manage the RM site, record categories, record folders, unfiled containers and unfiled record folders, upload new records or declare an existing file as a record, file records in the fileplan and get information about transfers.
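
As a quick check you can also call the endpoints directly; for example, listing the Governance Services sites on a local installation with default admin credentials (a sketch - the base path shown is my assumption for the GS v1 API, and the API Explorer shows the exact paths available in your version):

# List Governance Services sites (normally just the single RM site)
curl -u admin:admin "http://localhost:8080/alfresco/api/-default-/public/gs/versions/1/gs-sites"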

 

You'll notice that the rm-community project now has a new module called rm-community-rest-api-explorer which contains the Swagger REST API definition. Even though we don't generate our backend code from the Swagger definition file, it should still be useful for those wanting to generate API client code.

 

RM Benchmark Driver

We have also been working on our benchmark driver, which allows us to evaluate our performance and has already proven itself useful by helping us detect and fix concurrency and performance issues. If you want to evaluate the performance yourself you can use the rm-benchmark project together with the Alfresco Benchmark Server.

 

To start the RM benchmark driver, first start the Alfresco Benchmark Server as described in its readme file, then run mvn clean install -Pstart-benchmark in the rm-benchmark project. You'll notice that when creating a new test you can choose the Alfresco Records Management Benchmark Driver.

 

 

One of the main reasons for using the benchmark driver is to create large amounts of data. This driver allows you to create the RM site, add users, create a fileplan structure and upload records in a very configurable way. By setting Maximum Active Loaders you can see how the environment behaves when accessed by multiple users at once, as would happen in a real environment.

 

 

Policy notification for record declaration

 

We added a new policy to notify observers when a file has been declared as a record. To register a behaviour for this policy, implement RecordManagementPolicies.BeforeRecordDeclaration, which fires just before the document is declared as a record, or RecordManagementPolicies.OnRecordDeclaration, which fires right after the document has been declared as a record. Both events carry the reference to the node that is being declared as a record. For more information refer to the Jira story.

 

What's coming up next?

 

We plan to enhance the REST API with more RM-specific endpoints, allowing, for example, disposition schedule and holds management. You might have noticed that even though you can only have one RM site on your server, the REST API endpoints with names like gs-sites, file-plans and so on seem to allow you to manage multiple sites. You can't yet, but you might be able to in the future.

 

Please let us know what you think of RM 2.6.a and what you'd like to see in future versions. Records Management Community is an open source project, so do get involved and help us develop the product.

 


Introduction

 

This blog compares query support provided by transactional metadata query (TMDQ) and the Index Engine. The two differ in a number of respects which are described here. This blog is an evolution of material previously presented at the Alfresco Summit in 2013.

 

TMDQ delivers support for the transactional queries that used to be provided by Lucene in Alfresco Content Services (ACS) prior to version 5.0. In ACS 4.1, SOLR was introduced with an eventually consistent query model. In 5.0, Lucene support was removed in favour of TMDQ. As TMDQ replaced Lucene, some restrictions on its use are similar. For example, both post-process results for permissions in a similar way and, as a result, there are restrictions on the scale of the result sets they can cope with. The Index Engine has no such restrictions. It also seems from use that if a query can be run against the database, its scope is such that it will probably have no issue with the number of results returned.

 

 

Overview

 

Some queries can be executed either transactionally against the database or with eventual consistency against the Index Engine. Only queries using the AFTS or CMIS query languages can be executed against the database. The Lucene query language cannot be used against the database, while selectNodes (XPATH) on the Java API always goes against the database, walking and fetching nodes as required.

 

In general, TMDQ does not support: structural queries, full text search, special fields like SITE that are derived from structure, and long strings (> 1024 characters). Text fields support exact(ish) and pattern-based matching subject to the database collation. Filter queries are rewritten along with the main query to create one large query. Ordering is fine, but again subject to database collation for text.

 

TMDQ does not support faceting. It does not support any aggregation: this includes counting the total number of matches for the query. FINGERPRINT support is only on the Index Server.

 

AFTS and CMIS queries are parsed to an abstract form. This is then sent to an execution engine. Today, there are two execution engines: the database and the Index Engine. The default is to try the DB first and fall back to the Index Engine if the query is not supported against the DB. This is configurable for a search sub-system and per query using the Java API. Requesting consistency should appear in the public API "some time soon".
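
From the Java API, this is controlled per query on org.alfresco.service.cmr.search.SearchParameters; a minimal sketch (assuming a searchService bean is already injected) looks like this:

// Force a metadata query to run transactionally against the database.
// QueryConsistency.EVENTUAL forces the Index Engine instead, and
// QueryConsistency.TRANSACTIONAL_IF_POSSIBLE gives the try-the-database-first behaviour described above.
SearchParameters sp = new SearchParameters();
sp.addStore(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE);
sp.setLanguage(SearchService.LANGUAGE_FTS_ALFRESCO);
sp.setQuery("=cm:name:\"budget.xlsx\" AND TYPE:\"cm:content\"");
sp.setQueryConsistency(QueryConsistency.TRANSACTIONAL);

ResultSet results = searchService.query(sp);
try
{
    for (NodeRef nodeRef : results.getNodeRefs())
    {
        // process each matching node
    }
}
finally
{
    results.close();
}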

 

Migrations from Alfresco Content Services prior to 5.0 will require two optional patches to be applied to support TMDQ. Migrations to 5.0 require one patch; 5.0 to 5.1 requires a second. New installations will support TMDQ by default. The patches add supporting indexes that make the database ~25% larger.

 

 

Public API and TMDQ

 

From the public API, anything that is not a simple query, a filter query, an option that affects these, or an option that affects what is returned for each node in the results, is not supported by TMDQ. The next two sections consider what each query language supports.

 

Explicitly, TMDQ supports the following (a minimal example request follows the list): 

  • query
  • paging
  • include
  • includeRequest
  • fields
  • sort
  • defaults
  • filterQueries
  • scope (single)
  • limits for permission evaluation
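
A request that stays within these options, and so is eligible for TMDQ, might look like this (a sketch; the property values are just illustrations):

{
    "query": {
        "language": "afts",
        "query": "=cm:name:\"budget.xlsx\" AND TYPE:\"cm:content\""
    },
    "paging": {
        "maxItems": 25,
        "skipCount": 0
    },
    "sort": [{"type": "FIELD", "field": "cm:name", "ascending": true}],
    "filterQueries": [{"query": "ASPECT:\"cm:titled\""}],
    "include": ["properties"]
}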

 

The default limits for permission evaluation will restrict the results returned from TMDQ based on both the number of results processed and time taken. These can be increased if required.
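
The limits can be raised in alfresco-global.properties; a sketch of the relevant settings (I believe these are the standard property names, but verify them against your version):

# Maximum number of results to evaluate permissions against per query
system.acl.maxPermissionChecks=5000
# Maximum time (in milliseconds) to spend evaluating permissions per query
system.acl.maxPermissionCheckTimeMillis=20000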

 

The public API does not support TMDQ for:

  • templates
  • localisation and timezone
  • facetQueries
  • facetFields
  • facetIntervals
  • pivots
  • stats
  • spellcheck
  • highlight
  • range facets
  • SOLR date math

 

Some of these options will be ignored and the query will still produce transactional results; others will cause the query to fall back to the Index Engine and return eventually consistent results.

 

The public API will ignore the SQL select part of a CMIS query and decorate the results as it would do for AFTS.

 

 

CMIS QL & TMDQ

 

For the CMIS query language, all expressions except for CONTAINS(), SCORE() and IN_TREE() can now be executed against the database. Most data types are supported except for the CMIS URI and HTML types. Strings are supported, but only if they are 1024 characters or less in length. In Alfresco Content Services 5.0, OR, decimal and boolean types were not supported; they are from 5.1 on. Primary and secondary types are supported and require inner joins to link them together - they can be somewhat tedious to write and use.

 

You can skip joins to secondary types from the fetch in CMIS using the public API. You would need an explicit SELECT list and supporting joins from a CMIS client. You still need joins to secondary types for predicates and ordering. As CMIS SQL supports ordering as part of the query language you have to do it there and not via the public API sort.  
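
For example, a query that joins from cmis:document to a secondary (aspect) type so it can use one of the aspect's properties in a predicate and in the ordering might look like this (a sketch using the standard cm:titled aspect):

{
    "query": {
        "language": "cmis",
        "query": "SELECT D.*, T.* FROM cmis:document AS D JOIN cm:titled AS T ON D.cmis:objectId = T.cmis:objectId WHERE T.cm:title = 'Project Contract' ORDER BY T.cm:title"
    }
}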

 

Post 5.2, left outer join from primary and secondary types to secondary types will also be supported. This covers queries to find documents that do not have an aspect applied - which is currently best implemented using something like

CONTAINS('-ASPECT:hidden')

today.

For multi-valued properties, the CMIS query language supports ANY semantics from SQL 92. A query against a multi-lingual property like title or description is treated as multi-valued and may match in any language. In the results you will see the best value for your locale - which may not match the query. Ordering will consider any value.

 

UPPER() and LOWER()

 

UPPER() and LOWER() functions were in early drafts of the CMIS 1.0 specification and were subsequently dropped; they are not part of the CMIS 1.0 or 1.1 specifications. They are currently supported in the CMIS query language for TMDQ only, as a way to address specific database collation issues and case sensitivity. Only equality is supported; LIKE is not currently supported. For example:

 

{
   "query": {
       "language": "cmis",
       "query" : "select * from cmis:document where LOWER(cmis:name) = 'project contract.pdf'"
   }
}

 

Alfresco FTS QL & TMDQ

 

It is more difficult to write AFTS queries that use TMDQ, as the default behaviour is to use full text queries for text: these cannot go against the database. Again, special fields like SITE and TAG that are derived from structure will not go to the database. TYPE, ASPECT and the related exact matches are OK. All property data types are fine, but strings again have to be less than 1024 characters in length. Text queries have to be prefixed with = to avoid full text search. PARENT is supported. OR is supported in 5.1 and later.
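
For example, of the following two queries the first can be answered by TMDQ, while the second uses an untyped full text term and the SITE field, so it can only go to the Index Engine:

=cm:name:"budget.xlsx" AND TYPE:"cm:content"

budget AND SITE:finance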

 

Ranges are not currently supported; there is no fundamental reason for this, it simply requires generating a composite constraint, which we have not done yet. PATH is not supported, nor is ANCESTOR.

 

Subtle differences

 

  1. The database has fixed collation as defined by the database schema. SOLR can use any collation. The two engines can produce different results for lexical comparison, case sensitivity and ordering, particularly when using mltext properties.

  2. The database results include hidden nodes. You can exclude them in the query. The Index Engine will never include hidden nodes and respects the index control aspect.

  3. The database post-filters the results to apply permissions. TMDQ is not intended to scale to more than tens of thousands of nodes. It will not perform well for users who can read 1 node in a million. It cannot and will not tell you how many results matched the query. To do this could require an inordinate number of permission checks. It does enough to give you the page requested and to tell you if there is more. The Index Engine can apply permissions at query and facet time to billions of nodes.
    For the same reason, do not expect any aggregation support in TMDQ: there is currently no plan to push access restriction into the database at query time.

  4. CONTAINS() support is actually more complicated. The pure CMIS part of the query and CONTAINS() part are melded together into a single abstract query representation. If the overall query, both parts, can go against the database that is fine. You have to follow the rules for AFTS & TMDQ. By default, in CMIS the CONTAINS() expression implies full text search so queries will go to the Index Server.

  5. The database does not score. It will return results in some order that depends on the query plan, unless you ask for specific ordering. For a three-part OR query where some docs match more than one constraint, all matches are treated as equal. In the Index Engine, the more parts of an OR that match, the higher the score. The docs that match more optional parts of the query will come higher up.

  6. Queries from Share will not use TMDQ as they will most likely have a full text part to the query and ask for facets.

 

Summary

 

Transactional Metadata Query and the Index Engine are intended to support different use cases. They differ in the queries and options that they support, and subtly in the results with respect to collation and scoring. We default to trying transactional support first for historical reasons, and it seems to be what most people prefer if they can have it.

1.     Project Objective

The aim of this blog is to show you how to create and run a Docker container with a full ELK (Elasticsearch, Logstash and Kibana) environment containing the necessary configuration and scripts to collect and present data to monitor your Alfresco application.

 

Elastic tools can ease the processing and manipulation of large amounts of data collected from logs, operating system, network, etc.

 

Elastic tools can be used to search for data such as errors, exceptions and debug entries and to present statistical information such as throughput and response times in a meaningful way. This information is very useful when monitoring and troubleshooting Alfresco systems.

 

2.     Install Docker on Host machine

Install Docker on your host machine (server) as per Docker website. Please note the Docker Community Edition is sufficient to run this project (https://www.docker.com/community-edition)

 

3.     Virtual Memory

Elasticsearch uses a hybrid mmapfs / niofs directory by default to store its indices. The default operating system limit on mmap counts is likely to be too low, which may result in out of memory exceptions.

 

On Linux, you can increase the limits by running the following command as root on the host machine:

 

# sysctl -w vm.max_map_count=262144

 

To set this value permanently, update the vm.max_map_count setting in /etc/sysctl.conf. To verify the value has been applied run:

 

# sysctl vm.max_map_count
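
The corresponding permanent entry in /etc/sysctl.conf is a single line:

vm.max_map_count=262144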

 

4.     Download “Docker-ELK-Alfresco-Monitoring” container software

Download the software to create the Docker container from GitHub: https://github.com/miguel-rodriguez/Docker-ELK-Alfresco-Monitoring and extract the files to the file system.

 

5.     Creating the Docker Image

Before creating the Docker image we need to configure access to Alfresco’s database from the Docker container. Assuming the files have been extracted to /opt/docker-projects/Docker-ELK-Alfresco-Monitoring-master, edit files activities.properties and workflows.properties and set the access to the DB server as appropriate, for example:

 

#postgresql settings

db_type=postgresql

db_url=jdbc:postgresql://172.17.0.1:5432/alfresco

db_user=alfresco

db_password=admin

 

Please make sure the database server allows remote connections to Alfresco’s database. A couple of examples of how to configure the database are shown here:

  • For MySQL

Access your database server as an administrator and grant the correct permissions i.e.

 

# mysql -u root -p

grant all privileges on alfresco.* to alfresco@'%' identified by 'admin';

 

The grant command grants access to all tables in the ‘alfresco’ database to the ‘alfresco’ user from any host, using the ‘admin’ password.

Also make sure the bind-address parameter in my.cnf allows for external binding i.e. bind-address = 0.0.0.0

 

  • For PostgreSQL

Change the file ‘postgresql.conf’ to listen on all interfaces

 

listen_addresses = '*'

 

then add an entry in file ‘pg_hba.conf’ to allow connections from any host

 

host all all 0.0.0.0/0 trust

 

Restart the PostgreSQL database server to pick up the changes.

We have installed a small Java application inside the container, in the /opt/activities folder, that executes calls against the database configured in the /opt/activities/activities.properties file.

For example to connect to PostgreSQL we have the following settings:

 

db_type=postgresql

db_url=jdbc:postgresql://172.17.0.1:5432/alfresco

db_user=alfresco

db_password=admin

 

We also need to set the timezone in the container; this can be done by editing the following entry in the startELK.sh script.

 

export TZ=GB

 

From the command line execute the following command to create the Docker image:

 

# docker build --tag=alfresco-elk /opt/docker-projects/Docker-ELK-Alfresco-Monitoring-master/

Sending build context to Docker daemon  188.9MB

Step 1/33 : FROM sebp/elk:530

530: Pulling from sebp/elk

.......

 

6.     Creating the Docker Container

Once the Docker image has been created we can create the container from it by executing the following command:

 

# docker create -it -p 5601:5601 --name alfresco-elk alfresco-elk:latest

 

7.     Starting the Docker Container

Once the Docker container has been created it can be started with the following command:

 

# docker start alfresco-elk

 

Verify the ELK stack is running by accessing Kibana on http://localhost:5601 on the host machine.

At this point Elasticsearch and Kibana do not have any data, so we need to get Alfresco’s logstash agent up and running to feed some data to Elasticsearch.

 

8.     Starting logstash-agent

The logstash agent consists of logstash and some other scripts to capture entries from Alfresco log files, JVM stats using jstatbeat (https://github.com/cero-t/jstatbeat), entries from Alfresco audit tables, DB slow queries, etc.

 

Copy the logstash-agent folder to a directory on all the servers running Alfresco or Solr applications.

Assuming you have copied the logstash-agent folder to /opt/logstash-agent, edit the file /opt/logstash-agent/run_logstash.sh and set the following properties according to your own settings:

 

export tomcatLogs=/opt/alfresco/tomcat/logs

export logstashAgentDir=/opt/logstash-agent

export logstashAgentLogs=${logstashAgentDir}/logs

export alfrescoELKServer=172.17.0.2


 9.    Configuring Alfresco to generate data for monitoring

Alfresco needs some additional configuration to produce data to be sent to the monitoring Docker container.

 

9.1   Alfresco Logs

Alfresco logs i.e. alfresco.log, share.log, solr.log or the equivalent catalina.out can be parsed to provide information such as number of errors or exceptions over a period of time. We can also search these logs for specific data.

 

The first thing is to make sure the logs are displaying the full date time format at the beginning of each line. This is important so we can display the entries in the correct order.

Make sure in your log4j properties files (there is more than one) the file layout pattern is as follows:

 

log4j.appender.File.layout.ConversionPattern=%d{yyyy-MM-dd} %d{ABSOLUTE} %-5p [%c] [%t] %m%n

 

This will produce log entries with the date at the beginning of the line as this one:

2016-09-12 12:16:28,460 INFO  [org.alfresco.repo.admin] [localhost-startStop-1] Connected to database PostgreSQL version 9.3.6

Important Note: If you upload catalina files then don’t upload alfresco (alfresco, share, solr) log files for the same time period since they contain the same entries and you will end up with duplicate entries in the Log Analyser tool.

 

Once the logs are processed the resulting data is shown:

  • Number of errors, warnings, debug and fatal messages, etc. over time
  • Total number of errors, warnings, debug, fatal messages, etc
  • Common messages that may reflect issues with the application
  • Number of entries grouped by java class
  • Number of exceptions logged
  • All log files are searchable using ES (Elasticsearch) search syntax

 

 

9.2    Document Transformations

Alfresco performs document transformations for document previews, thumbnails, indexing content, etc. To monitor document transformations enable logging for class “TransformerLog”  by adding the following line to tomcat/shared/classes/alfresco/extension/custom-log4j.properties on all alfresco nodes:

 

log4j.logger.org.alfresco.repo.content.transform.TransformerLog=debug

 

The following is a sample output from alfresco.log file showing document transformation times, document extensions, transformer used, etc.

 

2016-07-14 18:24:56,003  DEBUG [content.transform.TransformerLog] [pool-14-thread-1] 0 xlsx png  INFO Calculate_Memory_Solr Beta 0.2.xlsx 200.6 KB 897 ms complex.JodConverter.Image<<Complex>>

 

Once Alfresco logs are processed the following data is shown for transformations:

  • Response time of transformation requests over time
  • Transformation throughput
  • Total count of transformations grouped by file type
  • Document size, transformation time, transformer used, etc

 

 

9.3    Tomcat Access Logs

Tomcat access logs can be used to monitor HTTP requests, throughput and response times. In order to get the right data format in the logs we need to add/replace the “Valve” entry in tomcat/conf/server.xml file, normally located at the end of the file, with this one below.

 

<Valve

  className="org.apache.catalina.valves.AccessLogValve"   

  directory="logs"

  prefix="access-" suffix=".log"

  pattern='%a %l %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i" %D "%I"'

  resolveHosts="false"

/>

 

 

 For further clarification on the log pattern refer to: https://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Access_Logging

Sample output from the Tomcat access log under the tomcat/logs directory. The important fields here are the HTTP request, the HTTP response status (i.e. 200) and the time taken to process the request (i.e. 33 milliseconds):

 

127.0.0.1 - CN=Alfresco Repository Client, OU=Unknown, O=Alfresco Software Ltd., L=Maidenhead, ST=UK, C=GB [14/Jul/2016:18:49:45 +0100] "POST /alfresco/service/api/solr/modelsdiff HTTP/1.1" 200 37 "-" "Spring Surf via Apache HttpClient/3.1" 33 "http-bio-8443-exec-10"

 

Once the Tomcat access logs are processed the following data is shown: 

  • Response time of HTTP requests over time
  • HTTP traffic throughput
  • Total count of responses grouped by HTTP response code
  • Tomcat access logs files are searchable using ES (Elasticsearch) search syntax

 

  

9.4    Solr Searches

We can monitor Solr queries and response times by enabling debug for class SolrQueryHTTPClient by adding the following entry to tomcat/shared/classes/alfresco/extension/custom-log4j.properties on all Alfresco (front end) nodes:

 

log4j.logger.org.alfresco.repo.search.impl.solr.SolrQueryHTTPClient=debug

 

Sample output from alfresco.log file showing Solr searches response times:

 

DEBUG [impl.solr.SolrQueryHTTPClient] [http-apr-8080-exec-6]    with: {"queryConsistency":"DEFAULT","textAttributes":[],"allAttributes":[],"templates":[{"template":"%(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT TAG)","name":"keywords"}],"authorities":["GROUP_EVERYONE","ROLE_ADMINISTRATOR","ROLE_AUTHENTICATED","admin"],"tenants":[""],"query":"((test.txt  AND (+TYPE:\"cm:content\" +TYPE:\"cm:folder\")) AND -TYPE:\"cm:thumbnail\" AND -TYPE:\"cm:failedThumbnail\" AND -TYPE:\"cm:rating\") AND NOT ASPECT:\"sys:hidden\"","locales":["en"],"defaultNamespace":"http://www.alfresco.org/model/content/1.0","defaultFTSFieldOperator":"OR","defaultFTSOperator":"OR"}

 2016-03-19 19:55:54,106 

 

DEBUG [impl.solr.SolrQueryHTTPClient] [http-apr-8080-exec-6] Got: 1 in 21 ms

 

Note: There is no specific transaction id to correlate the Solr search to the corresponding response. The best way to do this is to look at the time when the search and response were logged, together with the Java thread name; this should give you a match for the query and its response. 

Once Alfresco logs are processed the following data is shown for Solr searches:

  • Response time for Solr searches over time
  • Solr searches throughput
  • Solr queries, number of results found and individual response times

 

 

9.5    Database Monitoring

Database performance can be monitored with two different tools: p6spy and Packetbeat. The main difference between these tools is that p6spy acts as a proxy JDBC driver while Packetbeat is a network traffic sniffer. Also, Packetbeat can only sniff traffic for MySQL and PostgreSQL databases, while p6spy can also handle Oracle among others.

 

P6spy

P6spy software is delivered as a jar file that needs to be placed in the application class path i.e. tomcat/lib/ folder. There are 3 steps to get p6spy configured and running.

 

  • Place p6spy jar file in tomcat/lib/ folder
  • Create a spy.properties file, also in the tomcat/lib/ folder, with the following configuration

 

modulelist=com.p6spy.engine.spy.P6SpyFactory,com.p6spy.engine.logging.P6LogFactory,com.p6spy.engine.outage.P6OutageFactory

appender=com.p6spy.engine.spy.appender.FileLogger

deregisterdrivers=true

dateformat=MM-dd-yy HH:mm:ss:SS

appender=com.p6spy.engine.spy.appender.FileLogger

autoflush=true

append=true

useprefix=true

 

# Update driver list correct driver i.e.

# driverlist=oracle.jdbc.OracleDriver

# driverlist=org.mariadb.jdbc.Driver

# driverlist=org.postgresql.Driver

driverlist=org.postgresql.Driver

 

# Location where spy.log file will be created

logfile=/opt/logstash-agent/logs/spy.log

 

# Set the execution threshold to log queries taking longer than 1000 milliseconds (slow queries only)

executionThreshold=1000

 

Note: if there are no queries taking longer than the value in executionThreshold (in milliseconds) then the file will not be created.

Note: set the “logfile” variable to the logs folder inside the logstash-agent path as shown above.

 

  • Add entry to tomcat/conf/Catalina/localhost/alfresco.xml file

 

Example for PostgreSQL:

 

<Resource

  defaultTransactionIsolation="-1"

  defaultAutoCommit="false"

  maxActive="275"

  initialSize="10"

  password="admin"

  username="alfresco"

  url="jdbc:p6spy:postgresql://localhost/p6spy:alfresco"

  driverClassName="com.p6spy.engine.spy.P6SpyDriver"

  type="javax.sql.DataSource"

  auth="Container"

  name="jdbc/dataSource"

/>

 

Example for Oracle:

 

<Resource

  defaultTransactionIsolation="-1"

  defaultAutoCommit="false"

  maxActive="275"

  initialSize="10"

  password="admin"

  username="alfresco"

  url="jdbc:p6spy:oracle:thin:@192.168.56.101:1521:XE"   

  driverClassName="com.p6spy.engine.spy.P6SpyDriver"

  type="javax.sql.DataSource"

  auth="Container"

  name="jdbc/dataSource"

/>

 

 Example for MariaDB:

 

<Resource

  defaultTransactionIsolation="-1"

  defaultAutoCommit="false"

  maxActive="275"

  initialSize="10"

  password="admin"

  username="alfresco"

  url="jdbc:p6spy:mariadb://localhost:3306/alfresco"

  driverClassName="com.p6spy.engine.spy.P6SpyDriver"  

  type="javax.sql.DataSource"

  auth="Container"

  name="jdbc/dataSource"

/>

 

Once the spy.log file has been processed the following information is shown:

  • DB Statements execution time over time
  • DB Statements throughput over time
  • Table showing individual DB statements and execution times
  • DB execution times by connection id 

 

 

9.6    Alfresco Auditing

If you want to audit Alfresco access you can enable auditing by adding the following entries to alfresco-global.properties file:

 

# Enable auditing
audit.enabled=true
audit.alfresco-access.enabled=true
audit.tagging.enabled=true
audit.alfresco-access.sub-actions.enabled=true
audit.cmischangelog.enabled=true

 

Now you can monitor all the events generated by alfresco-access audit group.

 

 

Note: Only one of the logstash agents should collect Alfresco's audit data, since the script gathers data for the whole cluster/solution. So edit the file logstash-agent/run_logstash.sh on just one of the Alfresco nodes and set the variable collectAuditData to "yes" as indicated below:

 

collectAuditData="yes"

 

Note: Also make sure you update the login credentials for Alfresco in the audit*sh files. Defaults to admin/admin.

 

10.    Starting and Stopping the logstash agent

The logstash agent script can be started from the command line with "./run_logstash.sh start" as shown below:

 

./run_logstash.sh start
Starting logstash
Starting jstatbeat
Starting dstat
Starting audit access script

 

and can be stopped with the command "./run_logstash.sh stop" as shown below:

 

./run_logstash.sh stop
Stopping logstash
Stopping jstatbeat
Stopping dstat
Stopping audit access script

 

11. Accessing the Dashboard

Finally, access the dashboard by going to this URL: http://<docker host IP>:5601 (use the IP of the server where you installed the Docker container), clicking on the “Dashboard” link on the left panel and then clicking on the “Activities” link.

 

 

The data should be available for the selected time period.

 

 

Navigate to the other dashboards by clicking on the appropriate link.

 

 

12.    Accessing the container

To enter the running container use the following command:

 

# docker exec -i -t alfresco-elk bash

 

And to exit the container just type “exit” and you will find yourself back on the host machine.

 

13.    Stopping the container

To stop the container from running type the following command on the host machine:

 


# docker stop alfresco-elk

 

14.    Removing the Docker Container

To delete the container you first need to stop the container and then run the following command:

 

# docker rm alfresco-elk

 

15.    Removing the Docker Image

To delete the image you first need to remove the container and then run the following command:

 

# docker rmi alfresco-elk:latest

 

16.   Firewall ports

 

If you have a firewall make sure the following ports are open:

 

Redis: 6379

Kibana: 5601

Database server: this depends on the DB server being used, e.g. PostgreSQL is 5432, MySQL 3306, etc.

 

 

Happy Monitoring!!!


Explaining Eventual Consistency

Posted by andy1 Employee Jun 19, 2017

Introduction

 

Last week, eventual consistency cropped up more than usual: what it means, how to understand it and how to deal with it. This is not about the pros and cons of eventual consistency, when you may want transactional behaviour, etc. This post describes what eventual consistency is and its foibles in the context of the Alfresco Index Engine. So here are the answers to last week's questions ....

 

Background

 

Back in the day, Alfresco 3.x supported a transactional index of metadata using Apache Lucene. Alfresco 4.0 introduced an eventually consistent index based on Apache SOLR 1.4. Alfresco 5.0 moved to SOLR 4 and also introduced transactional metadata query (TMDQ). TMDQ was added specifically to support the transactional use cases that used to be addressed by the Lucene index in previous versions. TMDQ uses the database and adds a bunch of required indexes as optional patches. Alfresco 5.1 supports a later version of SOLR 4 and made improvements to TMDQ. Alfresco Content Services 5.2 supports SOLR 4, SOLR 6 and TMDQ.

 

When changes are made to the repository they are picked up by SOLR via a polling mechanism. The required updates are made to the Index Engine to keep the two in sync. This takes some time. The Index Engine may well be in a state that reflects some previous version of the repository. It will eventually catch up and be consistent with the repository - assuming it is not forever changing.

 

When a query is executed it can happen in one of two ways. By default, if the query can be executed against the database it is; if not, it goes to the Index Engine. There are some subtle differences between the results: for example, collation and how permissions are applied. Some queries are just not supported by TMDQ, for example facets, full text, "in tree" and structure. If a query is not supported by TMDQ it can only go to the Index Engine.

 

What does eventual consistency mean?

 

If the Index Engine is up to date, a query against the database or the Index Engine will see the same state. The results may still be subtly different - this will be the next topic! If the index engine is behind the repository then a query may produce results that do not, as yet, reflect all the changes that have been made to the repository.

 

Nodes may have been deleted

  • Nodes are present in the index but deleted from the repository
    • Deleted nodes are filtered from the results when they are returned from the query
      • As a result you may see a "short page" of results even though there are more results
      • (we used to leave in a "this node has been deleted" placeholder but this annoyed people more)
    • The result count may be lower than the facet counts
    • Faceting will include the "to be deleted nodes" in the counts
      • There is no sensible post-query fix for this other than re-querying to filter stuff out, and someone could have deleted more in the meantime

 

Nodes may have been added

  • Nodes have been added to the repository but are not yet in the index at all
    • These new nodes will not be found in the results or included in faceting
  • Nodes have been added to the repository but only the metadata is present in the index
    • These nodes cannot be found by content

 

Nodes metadata has changed

  • The index reflects out of date metadata
    • Some out of date nodes may be in the results when they should not be 
    • Some out of date nodes may be missing from the results when they should not be
    • Some nodes may be counted in the wrong facets due to out of date metadata
    • Some nodes may be ordered using out of date metadata

 

Node Content has changed

  • The index reflects out of date content but the metadata is up to date
    • Some out of date nodes may be in the results when they should not be 
    • Some out of date nodes may be missing from the results when they should not be

 

Node Content and metadata has changed

  • The index reflects the out of date metadata and content
  • The index reflects out of date content (the metadata is updated first)
    • Some out of date nodes may be in the results when they should not be 
    • Some out of date nodes may be missing from the results when they should not be
    • Some nodes may be counted in facets due to out of date metadata

 

An update has been made to an ACL (adding an access control entry to a node)

  • The old ACL is reflected in queries
    • Some out of date nodes may be in the results when they should not be
    • Some out of date nodes may be missing from the results when they should not be
    • The ACLs that are enforced may be out of date but are consistent with the repository state when the node was added to the index. Again, to be clear, the node and ACL may be out of date but permission for the content and metadata is consistent with this prior state. For nodes in the version index, they are assigned the ACL of the "live" node when the version was added to the index.

 

A node may be continually updated

  • It is possible that such a node may never appear in the index.
  • By default, when the Index Engine tracks the repository it only picks up changes that are older than one second. This is configurable. If we are indexing node 27 in state 120, we only add information for node 27 if it is still in that state. If it has moved on to state 236, say, we will skip node 27 until we are indexing state 236 - assuming it has not moved on again. This avoids pulling "later" information into the index which may have an updated ACE or present an overall view inconsistent with a repository state. Any out of date-ness means we have older information in the index - never newer information. 

 

How do I deal with eventual consistency?

 

To a large extent this depends on your use case. If you do need a transactional answer, the default behaviour will give you one if it can. For some queries it is not possible to get a transactional answer. You can force this in the Java API and it will be coming soon in the public API.

 

If you are using SOLR 6, the response from the search public API will return some information to help. It will report the index state consistent with the query.

 

...

"context": {

    "consistency": {

        "lastTxId": 18

    }
},

....

 

This can be compared with the last transaction on the repository. If they are equal the query was consistent.

 

In fact, we know the repository state for each node when we added it to the index. In the future we may check if the index state for a node reflects the repository state for the same node - we can mark nodes as potentially out of date - but only for the page of results. Faceting and aggregation is much more of a pain. Marking potentially out of date nodes and providing other indicators of consistency are on the backlog for the public API.

 

If your query goes to the Index Server and it is not up to date, you could see any of the issues described above under what eventual consistency means.

 

Using the Index Engine based on SOLR 6 gives better consistency for metadata updates. Some update operations that infrequently require many nodes to be updated are now done in the background - these are mostly move and rename operations that affect structure. So a node is now renamed quickly, and any structural information that consequently changes on all of its children is updated afterwards. Alfresco Search Services 1.0.0 also includes improved commit coordination and concurrency improvements. These both reduce the time for changes to be reflected in the index. Some of the delay also comes from the work that SOLR does before an index goes live. This can be reduced by tuning, but the cost is usually a query performance hit later.

 

Hybrid Query?

 

Surely we can take the results from the Index Engine for  transactions 1-1076 and add 1077 - 2012 from TMDQ?

 

It's not quite that simple. TMDQ does not support all queries, it does not currently support faceting and aggregation, scoring does not really exist, and collation is not as flexible or the same. You would also have to reinvent the query coordination that already exists in SOLR to combine the two result sets. It turns out to be a difficult, but not forgotten, problem.

 

Summary

 

For most use cases eventual consistency is perfectly fine. For transactional use cases TMDQ is the only solution unless the index and repository are in sync. The foibles of eventual consistency are well known and hopefully clearer, particularly in the context of the Alfresco Index Server.

 

 

 

Alfresco SDK Setup on Eclipse

  1.      Install Maven:
  •    Add the path of the bin folder of the extracted apache-maven-3.5.0 directory to the PATH environment variable.
  •    Confirm the installation with mvn -v.
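
For example, on Linux or macOS (assuming Maven was extracted to /opt/apache-maven-3.5.0; adjust the path to match your installation):

export PATH=/opt/apache-maven-3.5.0/bin:$PATH
mvn -v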

 

  2.      Create Maven project:
  •    Open Eclipse (make sure a firewall is not blocking your internet connection).
  •    Go to Window – Open Perspective – Java.
  •    File – New – Project – Maven Project – leave everything as default – click Next – click the Configure button – Add Remote Catalog:

Catalog File: https://artifacts.alfresco.com/nexus/content/groups/public/archetype-catalog.xml

Description: Alfresco Public

  •    Click OK.
  •    On the ‘New Maven Project’ dialog, select All Catalogs in the drop down.
  •    Type org.alfresco.maven in the filter and select allinone – click Next.

Group Id: org.alfresco.training

Artifact Id: ws4js-repo

  •    Click Finish.
  3.      If an error appears saying ‘plugin execution not covered by lifecycle configuration’:

Right-click the error – Quick Fix – select ‘Permanently mark goal run in pom.xml as ignored in Eclipse build’.

  4.      Right-click the project – select Maven – Update Project.
  5.      Create a custom Run Configuration for the Maven project:
  • On the Eclipse toolbar select 'Run', and then 'Run Configurations’ – This will open the 'Run Configurations' dialog box.
  •    Select the Maven Build "Maven Integration Test REPO" and edit the Base directory to point to your project. Make sure it matches the following:

Base directory: ${workspace_loc:/ws4js-repo}

Goals: clean install -DskipTests=true alfresco:run

 

Note: If an error appears saying ‘no compiler is provided in this environment; perhaps you are running on a JRE rather than a JDK’:

Go to Window – Preferences – Java – Installed JREs.

Change the path from the JRE to a JDK and click OK.

Then run the configuration again. It will take some time.

 

  6.      Check in the browser: localhost:8080/share

Credentials:

Login: admin

Password: admin
