
This article explains how to migrate a project from Alfresco Maven SDK version 2.2.0 to the new version 3.0.

 

Before starting, I recommend reading the official SDK 3.0 documentation: http://docs.alfresco.com/5.2/concepts/sdk-getting-started.html. The Alfresco SDK 3.0 Beta User Guide is also a good reference: https://community.alfresco.com/docs/DOC-6855-alfresco-sdk-30-beta-user-guide

 

Why upgrade to the new SDK

 

SDK version 3 gets rid of the compatibility matrix and supports the following Alfresco versions:

  • 4.2.X
  • 5.0.X
  • 5.1.X
  • 5.2.X

 

The SDK also supports the Community releases.

 

The target version is set by properties (alfresco.platform.version and alfresco.share.version), so it is easy to switch target versions.

 

Upgrading to the newest version of the SDK itself is also easier than before, since the SDK version is set using the property alfresco.sdk.version.
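
As a reference, here is roughly what these properties look like in the pom.xml of a generated project (the version numbers below are only an illustration and should match your target environment):

<properties>
   <alfresco.sdk.version>3.0.1</alfresco.sdk.version>
   <alfresco.platform.version>5.2.e</alfresco.platform.version>
   <alfresco.share.version>5.2.f</alfresco.share.version>
</properties>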

 

SDK 3 now produces JAR files by default, but can also produce AMPs if you need them. All the archetypes have been revamped to follow a standard folder structure.
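
If you do need an AMP, SDK 3 builds it with the maven-assembly-plugin and an AMP assembly descriptor. The snippet below is only a sketch of that idea (the exact plugin version and descriptor path shipped in your generated project may differ):

<plugin>
   <artifactId>maven-assembly-plugin</artifactId>
   <executions>
      <execution>
         <id>build-amp-file</id>
         <phase>package</phase>
         <goals>
            <goal>single</goal>
         </goals>
         <configuration>
            <appendAssemblyId>false</appendAssemblyId>
            <descriptor>src/main/assembly/amp.xml</descriptor>
         </configuration>
      </execution>
   </executions>
</plugin>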

 

Hot reloading is available using HotSwapAgent or JRebel (the latter requires a license). Both allow you to reload classes and web resources, but JRebel is more powerful and also allows reloading of Spring context files.

For more information about hot reloading and how to set it up, please refer to the official documentation: http://docs.alfresco.com/5.2/concepts/sdk-hot-reloading.html

 

The new version of the SDK is highly configurable. The Alfresco Maven plugin, which handles running and testing your modules, offers 40 configurable parameters. Execution is controlled by the Maven command "mvn install alfresco:run".

 

Migrating your project

 

The revamped SDK 3.0 project structure is a major change from version 2.2.0, so the two are no longer compatible. The recommended approach is to create a new project based on version 3.0 and migrate your project files into it.

 

Generate and configure your new project

 

The Maven archetypes are up to date and can be used to generate a new project compatible with the new version, using the command "mvn archetype:generate -Dfilter=org.alfresco:".

Since we are migrating an old All-In-One project, we will use the all-in-one archetype (org.alfresco.maven.archetype:alfresco-allinone-archetype) in version 3.0.1.

Maven will take care of generating the folder structure and the default configuration.
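
If you prefer a non-interactive run, the same archetype can also be invoked in batch mode; the groupId/artifactId values below are just placeholders for your own project:

mvn archetype:generate -B \
   -DarchetypeGroupId=org.alfresco.maven.archetype \
   -DarchetypeArtifactId=alfresco-allinone-archetype \
   -DarchetypeVersion=3.0.1 \
   -DgroupId=com.example -DartifactId=aio30 -Dversion=1.0-SNAPSHOT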

 

By default, generated projects are configured to use the following Community versions:

  • Platform : 5.2.e
  • Share: 5.2.f

 

You can adjust the versions by updating the following properties:

  • alfresco.platform.version
  • alfresco.share.version

 

If you want to use the Enterprise version, you will need to update the property "maven.alfresco.edition", replacing "community" with "enterprise". You will also need valid credentials for the alfresco-private repository to retrieve the Enterprise artifacts.
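
The credentials are usually declared in your ~/.m2/settings.xml; a minimal sketch, assuming the repository id referenced by your project is alfresco-private-repository:

<settings>
   <servers>
      <server>
         <id>alfresco-private-repository</id>
         <username>your-username</username>
         <password>your-password</password>
      </server>
   </servers>
</settings>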

 

If you need to test your extension with external dependencies (such as Records Management), you can configure the Alfresco Maven Plugin to install those modules into your platform or Share WARs when running locally.

 

In order to achieve this, you will need to add the modules to the "platformModules" and "shareModules" lists in the configuration section of the "alfresco-maven-plugin".
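
Schematically, the relevant part of the pom.xml looks like this (a trimmed-down sketch, not the full plugin configuration):

<plugin>
   <groupId>org.alfresco.maven.plugin</groupId>
   <artifactId>alfresco-maven-plugin</artifactId>
   <configuration>
      <platformModules>
         <!-- moduleDependency entries for repository-tier modules -->
      </platformModules>
      <shareModules>
         <!-- moduleDependency entries for Share-tier modules -->
      </shareModules>
   </configuration>
</plugin>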

 

For example, if you want to add Records Management Community version 2.6.c to your platform and Share, you need to add:

  • In the platformModules section:

 

<moduleDependency>
   <groupId>org.alfresco</groupId>
   <artifactId>alfresco-rm-community-repo</artifactId>
   <version>2.6.c</version>
   <type>amp</type>
</moduleDependency>

 

  • In the shareModules section:

 

<moduleDependency>
   <groupId>org.alfresco</groupId>
   <artifactId>alfresco-rm-community-share</artifactId>
   <version>2.6.c</version>
   <type>amp</type>
</moduleDependency>

 

You can find this information in the Alfresco Nexus repository.

You can start Alfresco locally using the Alfresco Maven plugin to test your configuration.
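
For example, from the project root (depending on the archetype version, the generated project may also ship run.sh/run.bat wrapper scripts that do essentially the same thing):

mvn clean install alfresco:run
# or, while iterating, skip the unit tests to get faster feedback
mvn clean install alfresco:run -DskipTests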

 

Move your project to the SDK 3.0 project

 

In the following section I will use an out-of-the-box SDK 2.2.0 project generated with Maven:

 

$ mvn archetype:generate -Dfilter=org.alfresco.maven.archetype:

 

With the following parameters:

 

Archetype version: 2.2.0

Define value for property 'groupId': com.example

Define value for property 'artifactId': aio

[INFO] Using property: version = 1.0-SNAPSHOT

Define value for property 'package' com.example: :

 

The target project will also be generated with Maven, using the same command and the following parameters:

Archetype version: 3.0.1

Define value for property 'groupId': com.example

Define value for property 'artifactId': aio30

[INFO] Using property: version = 1.0-SNAPSHOT

Define value for property 'package' com.example: :

 

So our project to migrate holds two extension sub-modules:

  • aio-repo-amp: I will refer to the root of this sub-module as AIO_REPO_ROOT
  • aio-share-amp: I will refer to the root of this sub-module as AIO_SHARE_ROOT

 

And the new project also holds two extension sub-modules:

  • aio30-platform-jar: I will refer to the root of this sub-module as AIO30_PLATFORM_ROOT
  • aio30-share-jar: I will refer to the root of this sub-module as AIO30_SHARE_ROOT

 

Java Code

 

Java code is the easy part, since the location of Java classes is standard in Maven projects:

  • Copy AIO_REPO_ROOT/src/main/java/* to AIO30_PLATFORM_ROOT/src/main/java
  • Copy AIO_SHARE_ROOT/src/main/java/* to AIO30_SHARE_ROOT/src/main/java

 

Config folder (PROJECT_ROOT/src/main/amps/config)

 

When an AMP is installed into alfresco.war (or share.war), the content of this folder is expanded into <ROOT>/WEB-INF/classes. In the new SDK, such content goes into the resources folder, since the resources folder is equivalent to the classpath root.

  • Copy the content of AIO_REPO_ROOT/src/main/amps/config/* to AIO30_PLATFORM_ROOT/src/main/resources
  • Copy the content of AIO_SHARE_ROOT/src/main/amps/config/* to AIO30_SHARE_ROOT/src/main/resources

 

Web folder (PROJECT_ROOT/src/main/amps/web)

 

This folder is used to hold web resources (CSS, JS, ...). In the new SDK, these kinds of files can go into two locations:

  • PROJECT_ROOT/src/main/resources/META-INF/resources
  • PROJECT_ROOT/src/main/assembly/web

 

The first option is compatible with delivering either a JAR file or an AMP file; the second can only be used if you plan to deliver an AMP file. With the first option, your files are packaged inside the JAR file and cannot be overridden by another module or by someone with access to the filesystem. The second option is the opposite: the files are expanded into your WAR file and can be overridden by another module or by someone with access to the filesystem.

 

Since only the first option is compatible in all cases, I highly recommend using that solution:

  • Copy the content of AIO_REPO_ROOT/src/main/amps/web/* to AIO30_PLATFORM_ROOT/src/main/resources/META-INF/resources
  • Copy the content of AIO_SHARE_ROOT/src/main/amps/web/* to AIO30_SHARE_ROOT/src/main/resources/META-INF/resources

 

Module.properties file

 

In SDK 2.2.0 the module.properties file is located in the "amps" folder (PROJECT_ROOT/src/main/amps). In SDK 3.0 this file goes into the alfresco/module/<module_name> folder under resources.

  • Copy the file AIO_REPO_ROOT/src/main/amps/module.properties to AIO30_PLATFORM_ROOT/src/main/resources/alfresco/module/<module_name>

 

Fixing paths

 

In Spring context files, some paths may depend on the artifactId. You need to fix those paths before testing your migration.

There are two ways to fix this (see the sketch after this list):

  • Change the artifactId to reflect your old artifactId
  • Rename the folders to use the new artifactId
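
For example, a Spring import such as the one below (the file names are purely illustrative) would still point at the old artifactId after the copy and needs to be adjusted to the new module name:

<!-- before: path built from the old artifactId -->
<import resource="classpath:alfresco/module/aio-repo-amp/context/service-context.xml"/>
<!-- after: path matching the new artifactId -->
<import resource="classpath:alfresco/module/aio30-platform-jar/context/service-context.xml"/>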

 

Summary

 

In order to migrate our project from SDK 2.2.0 to SDK 3.0 we have to (a shell sketch of these steps follows the list):

  • Copy AIO_REPO_ROOT/src/main/java/* to AIO30_PLATFORM_ROOT/src/main/java
  • Copy AIO_SHARE_ROOT/src/main/java/* to AIO30_SHARE_ROOT/src/main/java
  • Copy the content of AIO_REPO_ROOT/src/main/amps/config/* to AIO30_PLATFORM_ROOT/src/main/resources
  • Copy the content of AIO_SHARE_ROOT/src/main/amps/config/* to AIO30_SHARE_ROOT/src/main/resources
  • Copy the content of AIO_REPO_ROOT/src/main/amps/web/* to AIO30_PLATFORM_ROOT/src/main/resources/META-INF/resources
  • Copy the content of AIO_SHARE_ROOT/src/main/amps/web/* to AIO30_SHARE_ROOT/src/main/resources/META-INF/resources
  • Fix the paths that depend on the artifactId
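
Put together as a rough shell sketch (the four variables are assumed to point at the sub-module roots described above; adjust them to your actual paths):

cp -a "$AIO_REPO_ROOT/src/main/java/." "$AIO30_PLATFORM_ROOT/src/main/java/"
cp -a "$AIO_SHARE_ROOT/src/main/java/." "$AIO30_SHARE_ROOT/src/main/java/"
cp -a "$AIO_REPO_ROOT/src/main/amps/config/." "$AIO30_PLATFORM_ROOT/src/main/resources/"
cp -a "$AIO_SHARE_ROOT/src/main/amps/config/." "$AIO30_SHARE_ROOT/src/main/resources/"
cp -a "$AIO_REPO_ROOT/src/main/amps/web/." "$AIO30_PLATFORM_ROOT/src/main/resources/META-INF/resources/"
cp -a "$AIO_SHARE_ROOT/src/main/amps/web/." "$AIO30_SHARE_ROOT/src/main/resources/META-INF/resources/"
# module.properties is moved separately, as described earlier,
# and the artifactId-dependent paths in the Spring context files are fixed by hand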

It has now been a few days since DevCon 2018 wrapped up, and after digesting everything that happened I feel like it is a good time to write down a few thoughts.  In short, DevCon was AWESOME!  It's no secret that DevCon was always a favorite of our community, our partners and of course, the extended team at Alfresco.  This year's event felt like we never stopped doing it.  The attendees were engaged, the energy was high, and the sessions were informative and fun.  For that, we owe Kristen Gastaldo, Francesco Corti, Richard Esplin and their collaborators at the Order of the Bee a huge amount of gratitude.  Even though it was a reboot of a crowd favorite event, a lot has changed.

 

The Alfresco community has always been a driving force at DevCon, as it should be.  We are an open source company, after all, and when you are an open source company your community plays a pivotal role in your success.  This year that was on full display.  Not only were a huge number of our community superstars there in person, but even the talk selection was a joint effort between the community and the company.  The talks themselves were also a great mix of Alfrescians and our extended community, partners and customers.  If I recall correctly it was almost a neat split right down the middle.  Collaboration with the community was a running theme throughout the conference, and was the focus of one of the talks that wrapped up the first day.  Suffice to say, there are a TON of opportunities to get involved.  Whether you blog, write tutorials, create and contribute your own community extensions or work on an existing one, or submit pull requests to one of our numerous open source projects, it isn't hard to find a way to contribute.

 

As for the talks themselves, the content was of the quality we expect from our team and community.  Several people from the Alfresco Strategic Services team were among those selected to present, but unfortunately I didn't get to see all of them due to conflicts with my own speaking slots and other conference duties.  Jose Portillo spoke on Solr Sharding, Luis Cabaceira shared some great work he has been doing with Piergiorgio Lucidi on ManifoldCF, Richard McKnight covered some best practices for building Alfresco extensions, and Mohammed Gazal took his first turn on the stage to present some work on PDF templating.  Luckily the talks were all recorded, and will be posted in the coming weeks.  If you missed something the first time around, you'll have the whole event at your fingertips. 

 

One thing in particular that I liked about this year's DevCon was the mix between what is best practice today, what is coming from Alfresco, and a third category that you might call "the art of the possible".  Of course we had some great sessions on securing content with our governance tools, exporting content, building ACS extensions, tuning Solr and building resilient systems, but we also had a lot of forward looking sessions focused on AWS, ADF and other cutting edge stuff coming out of Alfresco.  I'd like to see this continue in the future.  It's important for us to share the best way to do things today as well as preparing ourselves and our community for the future.

 

With so many submissions not everybody was able to be included in the official conference agenda, but we are working to find ways to get that knowledge out into the world.  Perhaps as blog posts, or maybe some community webcasts?  Not sure yet but I'd love to hear how you would like us to continue the momentum that DevCon started.  I'm coming off this conference with a huge amount of energy, and can't wait to see what the rest of 2018 has in store! 

One of the recurring issues we see raised by customers is slow SQL queries.

Most of the time those are first witnessed through Share UI page load times or through long-running CMIS queries.

Pinpointing those queries can be a pain, and this blog post aims at providing some help with the troubleshooting process.

We will cover:

  • Things to check in the first place
  • Different ways of getting information on the query execution time
  • Isolating where the RDBMS is spending most of its time
  • Some tools and configuration options to help proactively track this kind of issue

We hope this content will be useful in real life, but having a DBA who takes care of the Alfresco database is really the best thing you can offer your Alfresco application!

Preliminary checks

Before blaming the database, it's always a good idea to check that the database engine has appropriate resources in order to deliver good performance. You cannot expect any DB engine to perform well with Alfresco on limited resources. For PostgreSQL there are plenty of resources on the web about sizing a database cluster (here I use "cluster" in the PostgreSQL sense, which is different from what we call a cluster in Alfresco).

 

Latency

Network latency can be a performance killer. Opening connections is a fairly intensive process, and if your network is slow it will impact the application. Simple network tests can help you make sure the network is delivering a good enough transport layer. The ping utility is really the first thing to look at. A ping test between Alfresco and its DB server should show a latency under 1 ms on a directly connected network (Gb Ethernet), or between 1 and 5 ms if the DB server and the Alfresco server are connected through routed networks. A value around or above 10 ms is definitely not what Alfresco expects from a DB server.

alxgomz@alfresco:~$ ping -c 5 -s500 192.168.0.68
PING 192.168.0.68 (192.168.0.68) 500(528) bytes of data.
508 bytes from 192.168.0.68: icmp_seq=1 ttl=64 time=0.436 ms
508 bytes from 192.168.0.68: icmp_seq=2 ttl=64 time=0.364 ms
508 bytes from 192.168.0.68: icmp_seq=3 ttl=64 time=0.342 ms
508 bytes from 192.168.0.68: icmp_seq=9 ttl=64 time=0.273 ms
508 bytes from 192.168.0.68: icmp_seq=10 ttl=64 time=0.232 ms

--- 192.168.0.68 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 8997ms
rtt min/avg/max/mdev = 0.232/0.329/0.436/0.066 ms

 

Some more advanced utilities allow you to send TCP packets, which are more representative of the actual time spent opening a TCP session (again, you should not see values above 10 ms):

alxgomz@alfresco:~$ sudo hping3 -s 1025 -p 80 -S -c 5 192.168.0.68
HPING 192.168.0.68 (eth0 192.168.0.68): S set, 40 headers + 0 data bytes
len=44 ip=192.168.0.68 ttl=64 DF id=0 sport=80 flags=SA seq=0 win=29200 rtt=3.5 ms
len=44 ip=192.168.0.68 ttl=64 DF id=0 sport=80 flags=SA seq=1 win=29200 rtt=3.4 ms
len=44 ip=192.168.0.68 ttl=64 DF id=0 sport=80 flags=SA seq=2 win=29200 rtt=3.4 ms
len=44 ip=192.168.0.68 ttl=64 DF id=0 sport=80 flags=SA seq=3 win=29200 rtt=3.6 ms
len=44 ip=192.168.0.68 ttl=64 DF id=0 sport=80 flags=SA seq=4 win=29200 rtt=3.5 ms

--- 192.168.0.68 hping statistic ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 3.4/3.5/3.6 ms

 

Overall network quality is also something to check. If your network devices spend their time reassembling out-of-order packets or retransmitting them, application performance will suffer. This can be checked by taking a network dump during query execution and opening it with Wireshark.

In order to take the dump you can use the following tcpdump command:

alxgomz@alfresco:~$ tcpdump -ni any port 5432 and host 192.168.0.68 -w /tmp/pgsql.pcap

Opening it in Wireshark should give you an idea very quickly: if you see a dump with a lot of red/black lines then there might be an issue that needs further investigation (those lines are colored this way if Wireshark has the right coloring rules applied).

 

RDBMS configuration

PostgreSQL ships with a relatively low-end default configuration. This is intended to allow it to run on a wide range of hardware (or VM) configurations. However, if you are running Alfresco (and its database) on high-end systems, you will most certainly want to tune the configuration in order to get the best out of your resources.

This wiki page presents in detail many parameters you may want to tweak: Tuning Your PostgreSQL Server - PostgreSQL wiki

The first one to look at is the shared_buffers size. It sets the size of the area PostgreSQL uses to cache data, thus improving performance. Some of these parameters should also be "in sync" with your Alfresco configuration. For example, by default Alfresco allows 275 Tomcat threads at peak time. Each of these threads should be able to open a database connection; as a consequence, PostgreSQL (when installed using the installer) sets the max_connections parameter to 300. However, we need to understand that each connection will consume resources, and in the first place memory. The amount of memory dedicated to a PostgreSQL process (which handles a SQL query) is controlled by the work_mem parameter. By default it has a value of 4MB, meaning we can calculate the amount of physical RAM needed by the database server in order to handle peak load:

work_mem * max_connections =  4MB * 300 = 1.2GB

Add the size of the shared_buffers to this and you'll have a good estimate of the amount of RAM PostgreSQL needs to handle peak loads with the default configuration. There are some other important values to fiddle with (like effective_cache_size, checkpoint_completion_target, ...), but making sure the ones above are aligned with both your Alfresco configuration and the hardware resources of your database host is really where to start (refer to the link above).
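
As an illustration only, a postgresql.conf could start from something like the values below; the right numbers depend entirely on your host's RAM and workload and should be validated by your DBA:

max_connections = 300            # aligned with the Alfresco connection pool at peak time
work_mem = 4MB                   # memory per sort/hash operation, per connection
shared_buffers = 4GB             # often sized around 25% of the available RAM
effective_cache_size = 8GB       # rough estimate of the OS page cache PostgreSQL can count on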

A qualified DBA should configure and maintain Alfresco's database to ensure its continuous performance and stability. If you don't have a DBA internally, there are also dozens of companies offering good services around PostgreSQL configuration and tuning.

Monitoring

This is key in the troubleshooting process. Although monitoring will not give you the solution to a performance issue, it will help you get on the right track, and having monitoring in place on the DB server is important. If you can correlate an increasingly slow application with a global load increase on the database server, then you've got a good suspect. There are different things to monitor on the DB server, but you should at least have the bare minimum:

  • CPU
  • RAM usage
  • disk IO & disk space

Spikes in CPU and disk IO usage can be the sign of a table that grew large without appropriate indexes.

Spikes in used disk space can be explained by the RDBMS creating temporary work files due to a lack of physical memory.

Monitoring the RAM can help you anticipate disk cache memory starvation (PostgreSQL relies heavily on this kind of memory).

Alfresco has some tables that are known to potentially grow very large. A DBA should monitor their size, both in terms of number of rows and of disk size. The query below is an example of how you can do it:

SELECT table_name AS tableName,
       (total_bytes / 1024 / 1024) AS total,
       row_estimate AS rowEstimate,
       (index_bytes / 1024 / 1024) AS INDEX,
       (table_bytes / 1024 / 1024) AS TABLE
FROM (
    SELECT *,
           total_bytes - index_bytes - COALESCE(toast_bytes, 0) AS table_bytes
    FROM (
        SELECT c.oid,
               nspname AS table_schema,
               relname AS table_name,
               c.reltuples AS row_estimate,
               pg_total_relation_size(c.oid) AS total_bytes,
               pg_indexes_size(c.oid) AS index_bytes,
               pg_total_relation_size(reltoastrelid) AS toast_bytes
        FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
        WHERE relkind = 'r') a) a
WHERE table_schema = 'public' ORDER BY total_bytes DESC;

Usually alf_prop_*, alf_audit_entry and possibly alf_node_properties are the tables that may show up in the result set. There is no rule of thumb that dictates when a table is too large; it is more a matter of monitoring how the tables grow over time.

Another useful thing to monitor is the creation/usage of temporary files. When your DB is not correctly tuned, or doesn't have enough memory allocated to it, it may create temporary files on disk if it needs to do further work on a big result set. This is obviously not as fast as working in memory and should be avoided. If you don't know how to monitor that with your usual monitoring system, there are some good tools that help a DBA be aware of such things happening.
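
A quick way to check whether temporary files are being created at all is the pg_stat_database view, for example (adjust the database name to your own):

SELECT datname, temp_files, temp_bytes
  FROM pg_stat_database
 WHERE datname = 'alfresco';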

pgbadger is an open source tool which does that, and many other things. If you don't already use it, I strongly encourage you to deploy it!

 

Debugging: what takes long, and how long does it take?

Monitoring should have helped you pinpoint that your DB server is overloaded, making your application slow. This could be because it is undersized for its workload (in which case you would probably see steady high resource usage), or because some specific operations are too expensive for some reason. In the former case there is not much you can do apart from upgrading either the resources or the DB architecture (but those are not topics I want to cover here). In the latter case, getting to know how long a query takes is really what people usually want to know. We accomplish that by using one of the debug options below.

 

At the PostgreSQL level

In my opinion, the best place to get this information is really on the DB server. That's where we get to know the actual execution time, without accounting for network round-trip time and other delays. Moreover, PostgreSQL makes it very easy to enable this debug, and you don't even need to restart the server. There are many ways to enable debug in PostgreSQL, but the one that's the most interesting to us is log_min_duration_statement. By default it has a value of "-1", which means nothing will be logged based on its execution time. But if, for example, we set in the postgresql.conf file:

log_min_duration_statement = 250

log_line_prefix = '%t [%p-%l] %q%u@%d '

Any query that takes more than 250 milliseconds to execute will be logged.

Setting the log_min_duration_statement value to zero will cause the system to log every single query. While this can be useful for debugging or for a temporary audit, it will not be very helpful here, as we really want to target slow queries only.
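
If you prefer not to change postgresql.conf globally, the same threshold can be scoped to the Alfresco database only (assuming it is named alfresco) and reverted just as easily; new connections will pick it up:

=# ALTER DATABASE alfresco SET log_min_duration_statement = 250;
=# ALTER DATABASE alfresco RESET log_min_duration_statement;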

      If you are interested in profiling your DB, have a look at the great pgbadger tool from Dalibo.
The Alfresco installer sets the log_min_messages parameter to fatal by default. This prevents log_min_duration_statement from working. Make sure it is set back to its default value (warning) or to another level that still includes LOG messages.

Then without interrupting the service, PostgreSQL can be reloaded in order for changes to take effect:

$ pg_ctl -U postgres -W -D /data/postgres/9.4/main reload

Adapt the command above to your needs, using the appropriate paths.

                 

 

This will produce output such as the example below:

2017-12-12 19:54:55 CET [5305-1] LOG: duration: 323 ms execution : select
      pv.id as prop_id,
      pv.actual_type_id as prop_actual_type_id,
      pv.persisted_type as prop_persisted_type,
      pv.long_value as prop_long_value,
      sv.string_value as prop_string_value
   from
      alf_prop_value pv
      join alf_prop_string_value sv on (sv.id = pv.long_value and pv.persisted_type = $1)
   where
      pv.actual_type_id = $2 and
      sv.string_end_lower = $3 and
      sv.string_crc = $4
DETAIL: parameters: $1 = '3', $2 = '1', $3 = '4ed-f2f1d29e8ee7', $4 = '593150149'

Here we gather some important information:

  1. The date and time the query was executed at the beginning of the first line.
  2. The process ID and session line number. As PostgreSQL forks a new process for each connection, we can map process IDs to pooled connections. A single connection may contain different transactions, which in turn contain several statements. Each new statement processed increments the session line number.
  3. The execution time on the first line
  4. The execution stage. A query is executed in several steps. With Alfresco making heavy usage of bind parameters, we will often see several lines for the same query (one for each step):
    1. prepare (when the query is parsed),
    2. bind (when parameters are replaced by their values and execution is planned),
    3. execution (when the query is actually executed).
  5. The query itself starting at the end of the first line and continuing on subsequent lines. Here it contains parameters and can't be executed as is.
  6. The bind parameters on the last line.

In order to get the overall execution time we have to sum up the execution times of the different steps. This may seem painful, but it delivers a fine-grained breakdown of the query execution. However, most of the time the majority of the execution time is spent in the execute stage; to better understand what's going on at this stage, we need to dive deeper into the RDBMS (see the next chapter about explain plans).

 

At the application level

It is also possible to debug SQL queries in a very granular manner at the Alfresco level. However, it is important to note that this method is far more intrusive, as it requires adding additional jar files, modifying the configuration of the application and restarting the application server. It may not be well suited for production environments where any downtime is a problem.

Also note that execution times reported with this method include network round-trip times. In normal circumstances this should only add a few milliseconds, but it could be much more on a poor network.

To allow debugging at the application level we will use a jdbc proxy: p6spy

The impact on application performance largely depends on the number of queries that end up being logged.

 

First of all, we will get the latest p6spy jar file from the GitHub repository.

Copy this file to the Tomcat lib/ directory and add a spy.properties file in the same location, containing the lines below:

driverlist:org.postgresql.Driver

executionThreshold=250

This will mimic the behaviour we had previously when debugging with PostgreSQL, meaning only queries that take more than 250 milliseconds will be logged.

We then need to tweak the alfresco-global.properties file in order to make it use the p6spy driver instead of the actual driver:

 

db.driver=com.p6spy.engine.spy.P6SpyDriver
db.url=jdbc:p6spy:postgresql://${db.host}/${db.name}

 

Alfresco must now be restarted, and a new file called spy.log should then be available, containing lines like the one shown below:

1513100550017|410|statement|connection 14|update alf_lock set version = version + 1, lock_token = ?, start_time = ?, expiry_time = ? where excl_resource_id = ? and lock_token = ?|update alf_lock set version = version + 1, lock_token = 'not-locked', start_time = 0, expiry_time = 0 where excl_resource_id = 9 and lock_token = 'f7a21222-64f9-40ea-a00a-ef95052dafe9'

We find here similar values to what we had with PostgreSQL:

  1. The timestamp at which the query was executed on the application server
  2. The execution time in milliseconds
  3. The connection ID
  4. The query string without bind parameters
  5. The query string with evaluated bind parameters

 

Understanding the execution plan

Now that we have pinpointed the problematic query (or queries), we can dive deeper into PostgreSQL's logic to understand why the query is slow. An RDBMS relies on its query planner to decide how to deal with a query.

The query planner itself makes decisions based on the structure of the query, the structure of the database (e.g. the presence and types of indexes) and also based on statistics the system maintains during its execution. The more accurate those statistics are, the more efficient the query planner will be.

 

Explain plans

In order to know what the query planner would do for a specific query, it is possible to run it prefixed with the "EXPLAIN ANALYZE" statement.

To make this chapter more hands-on, we'll proceed with an example. Let's consider a query which is issued while browsing the repository (getting node information based on the parent). Using one of the methods we've seen above, we have identified that query, and running it prefixed with "EXPLAIN ANALYZE" returns the following:

                                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=6614.54..13906.94 rows=441 width=676) (actual time=1268.230..1728.432 rows=421 loops=1)
   ->  Hash Left Join  (cost=6614.26..13770.40 rows=441 width=639) (actual time=1260.966..1714.577 rows=421 loops=1)
         Hash Cond: (childnode.store_id = childstore.id)
         ->  Hash Right Join  (cost=6599.76..13749.84 rows=441 width=303) (actual time=1251.427..1704.866 rows=421 loops=1)
               Hash Cond: (prop1.node_id = childnode.id)
               ->  Bitmap Heap Scan on alf_node_properties prop1  (cost=2576.73..9277.09 rows=118749 width=71) (actual time=938.409..1595.742 rows=119062 loops=1)
                     Recheck Cond: (qname_id = 26)
                     Heap Blocks: exact=5205
                     ->  Bitmap Index Scan on fk_alf_nprop_qn  (cost=0.00..2547.04 rows=118749 width=0) (actual time=934.132..934.132 rows=119178 loops=1)
                           Index Cond: (qname_id = 26)
               ->  Hash  (cost=4017.52..4017.52 rows=441 width=232) (actual time=90.488..90.488 rows=421 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 83kB
                     ->  Nested Loop  (cost=0.83..4017.52 rows=441 width=232) (actual time=8.228..90.239 rows=421 loops=1)
                           ->  Index Scan using idx_alf_cass_pri on alf_child_assoc assoc  (cost=0.42..736.55 rows=442 width=8) (actual time=2.633..58.377 rows=421 loops=1)
                                 Index Cond: (parent_node_id = 31890)
                                 Filter: (type_qname_id = 33)
                           ->  Index Scan using alf_node_pkey on alf_node childnode  (cost=0.42..7.41 rows=1 width=232) (actual time=0.075..0.075 rows=1 loops=421)
                                 Index Cond: (id = assoc.child_node_id)
                                 Filter: (type_qname_id = ANY ('{142,24,51,200,204,206,81,213,97,103,231,104,107}'::bigint[]))
         ->  Hash  (cost=12.00..12.00 rows=200 width=344) (actual time=9.523..9.523 rows=6 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 1kB
               ->  Seq Scan on alf_store childstore  (cost=0.00..12.00 rows=200 width=344) (actual time=9.517..9.518 rows=6 loops=1)
   ->  Index Scan using alf_transaction_pkey on alf_transaction childtxn  (cost=0.28..0.30 rows=1 width=45) (actual time=0.032..0.032 rows=1 loops=421)
         Index Cond: (id = childnode.transaction_id)
Planning time: 220.119 ms
Execution time: 1728.608 ms

Although I never faced it with PostgreSQL (more so with Oracle), there are cases where the explain plan is different depending on whether you pass the query as a complete string or use bind parameters.

In that case, the parameterized query that shows up as slow in the SQL debug logs might appear fast when executed manually.

To get the explain plan of these slow parameterized queries, PostgreSQL has a loadable module which can log explain plans the same way we did with log_min_duration_statement. See the auto_explain documentation for more details.
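
A minimal sketch of what enabling it could look like in postgresql.conf (the values are just examples, and loading the module through shared_preload_libraries requires a server restart):

shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = 250   # in milliseconds, same spirit as log_min_duration_statement
auto_explain.log_analyze = true       # include actual run times, not only estimates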

    

We can indeed see that the query is taking rather long just by looking at the planning and execution times at the end of the output. However, reading the full explain plan, and more importantly understanding it, can be challenging.

In the explain plan, PostgreSQL breaks the query down into "nodes". Those "nodes" represent actions the RDBMS has to run through in order to execute the query. For example, at the bottom of the plan we find the "scan nodes", which are the statements that actually return rows from tables, while higher up in the plan we have "nodes" that correspond to aggregations or ordering. In the end we have an indented, hierarchical tree of the query from which we can detail each step. Each node (line starting with "->") is shown with:

  • its type, i.e. whether the system is using an index, and of what kind (index scan), or no index at all (sequential scan)
  • its estimated cost, an arbitrary representation of how costly an operation is
  • the estimated number of rows the operation would return

And many more details that make it somewhat hard to read.

To make it more confusing, some fields, like "cost" or "actual time", carry two different values. To keep it short, you should only consider the second one.

      

The purpose of this article is not to teach you how to fully understand a query plan, so instead we will use a very handy online tool which will parse the output for us and point out the problems we may have: New explain | explain.depesz.com

Simply by pasting the same output and submitting the form, we get a better view of what's going on and what we need to look at.

The "exclusive" color mode gives the best representation of how efficient each individual node is.

"Inclusive" mode is cumulative (so the top row will always be dark red as it's equal to the total execution time).

"rows x" shows how accurately the query planner is able to guess the number of rows.

If using "mixed" color mode, each cell in each mode's column will have its own color (which can be a little bit harder to read).

So here we can see straight away that nodes #4 and #5 are where we are spending most of the time. We can see that those nodes return a large number of rows (more than 100,000, while there are only 421 of them in the final result set), meaning that the available indexes and statistics are not good enough.

Alfresco normally provides all the necessary indexes for the database to deliver good performance in most cases, so queries that under-perform because of indexes usually point to indexes that are actually missing. Fortunately, Alfresco also delivers a convenient way to check the database schema for any inconsistency.

 

Alfresco schema validation

When connected to the JMX interface, in the MBeans tab, it is possible to trigger a schema validation while Alfresco is running (go to "Alfresco \ DatabaseInformation \ SchemaValidator \ Operations" and launch "validateSchema()").

This will produce the output below in the Alfresco log file:

2017-12-18 15:21:32,194 WARN [domain.schema.SchemaBootstrap] [RMI TCP Connection(6)-10.1.2.101] Schema validation found 1 potential problems, results written to: /opt/alfresco/bench/tomcat/temp/Alfresco/Alfresco-PostgreSQLDialect-Validation-alf_-4191170423343040157.txt
2017-12-18 15:21:32,645 INFO [domain.schema.SchemaBootstrap] [RMI TCP Connection(6)-10.1.2.101] Compared database schema with reference schema (all OK): class path resource [alfresco/dbscripts/create/org.hibernate.dialect.PostgreSQLDialect/Schema-Reference-ACT.xml]

The log file points us to another file where we can see the details of the validation process. Here, for example, we can see that an index is indeed missing:

alfresco@alfresco:/opt/alfresco/bench$ cat /opt/alfresco/bench/tomcat/temp/Alfresco/Alfresco-PostgreSQLDialect-Validation-alf_-4191170423343040157.txt
Difference: missing index from database, expected at path: .alf_node_properties.alf_node_properties_pkey

The index can now be re-created, either by taking a fresh install as a model or by getting in touch with Alfresco Support to know how to create it.

The resulting, more efficient, query plan is much better.

 

Database statistics

Statistics are really critical to PostgreSQL performance, as they are what mainly drives the query planner's efficiency. With accurate statistics, PostgreSQL will make good decisions when planning a query. And of course, inaccurate statistics lead to bad decisions and thus bad performance.

PostgreSQL has an internal process in charge of keeping statistics up to date (in addition to other housekeeping tasks): the autovacuum process.

All versions of PostgreSQL that Alfresco supports have this capability, and it should always be active! By default this process will try to update statistics according to configuration options set in postgresql.conf. The options below can be useful to fine-tune the autovacuum behaviour (those are the defaults):

autovacuum=true #Enable the autovacuum daemon
autovacuum_analyze_threshold=50 #Number of tuples modifications that trigger ANALYZE
autovacuum_analyze_scale_factor = 0.1 #Fraction of table modified to trigger ANALYZE
default_statistics_target = 100 # Amount of information to store in the statistics

autovacuum_analyze_threshold and autovacuum_analyze_scale_factor are combined to decide when an ANALYZE is triggered: roughly, when the number of modified tuples exceeds autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * (number of rows in the table).

         

If you see a slow but constant degradation in query performance, it may be that some tables grew large enough to make the default parameters less efficient than they used to be. As tables grow, the scale factor can make statistics updates very infrequent; lowering autovacuum_analyze_scale_factor will make statistics updates more frequent, thus ensuring the stats are more up to date.

The distribution of data within a database table can also change during its lifetime, either because of a data model change or simply because of new use cases. Raising default_statistics_target will make the daemon collect and process more data from the tables when generating or updating statistics, thus making the statistics more accurate.

Of course, asking for more frequent updates and more accurate statistics has an impact on the resources needed by the autovacuum process. Such tweaking should be done carefully by your DBA.

Also, it is important to note that the options above apply to every table. You may want to do something more targeted at known big tables. This is doable by changing the storage options of the specific tables:

=# ALTER TABLE alf_node_prop_string_value
-#     ALTER COLUMN string_value SET STATISTICS 1000;

The SQL statement above is really just an example and must not be used without prior investigation.