
Alfresco Best Practices

Blog Post created by lcabaceira Employee on Apr 26, 2017

Hi everyone, as some of you know, I'm speaking at BeeCon 2017 on "Alfresco Best Practices". Because this is a vast topic, it was impossible to consolidate everything I wanted to say within my 30-minute presentation. This blog post is intended to support and consolidate the ideas presented at BeeCon 2017.

 

The post is divided into sections, and those may receive edits in the future, so stay tuned and be sure to provide me with your comments and opinions. Working together we can consolidate this important topic and help drive success for customers, partners and integrators of Alfresco.

 

Let me start by giving a huge thanks to the folks at Alfresco who contributed to the content of this post with their mentoring, thoughts, opinions, documents and other forms of content that made it possible. Big kudos to Rui Fernandes, Derek Hulley, Maurizio Pillitu, Gabriele Columbro, Miguel Rodriguez, Alex Strachan, Andy Hunt, Philippe Dubois and many other important colleagues at Alfresco. Without their input, know-how and shared experience, this blog post would not have been possible. It's a BIG post, so prepare to scroll down!

 

P.S. - If you like my posts and feel that you benefit from reading them, be sure to leave a comment; that way I know someone actually read my stuff.

 

1 - Quality Assurance - Use only a supported stack

Using a supported software stack to host your Alfresco infrastructure is the first best practice of this guide. Alfresco has invested a lot of time and resources in quality assurance.

Software is complex and by its very nature each new release may add new functionality and new bugs. These can vary in sensitivity, from simply not working as the end user expects them to work, to causing full-blown system crashes. For companies that rely on Alfresco to provide content management services (either for traditional management of office documents, supporting the content delivered to your web site, managing and controlling corporate records, or for managing team content) having access to the system is critical.

Alfresco spends a lot of time and effort ensuring that as many issues and bugs as possible are identified and fixed before Alfresco Enterprise is released. The QA process starts with Alfresco Community. Before the final build is released, the Community version goes through a QA process: dedicated QA engineers validate the release running on an Open Source stack. After this version is released, the engineers create a branch in the code line, and this becomes Alfresco Enterprise (see diagram below).

 

 

QA now kicks in in a major way. Over a period of some months, a team of dedicated QA engineers runs almost 5000 tests against a range of different technology stacks, both Open Source and proprietary, to identify and fix as many problems as possible. The release is tested for stability, scalability and security, in both single-system and clustered configurations. The Alfresco Engineering team works to correct any issues or bugs that are identified.

Although the flexibility of Alfresco means you can deploy it on platforms not listed on the SPM, you are assuming a big RISK, as those platforms did not go through the QA process described above.

For the most recent SPM, check: https://www.alfresco.com/services/subscription/supported-platforms

 

2 - Alfresco Maintenance Best Practices

Maintain awareness of all Alfresco Service packs (sets of hot-fixes)

 

QA testing, community usage and enterprise customers: essentially, millions of users are constantly testing the Alfresco product set. The Alfresco engineering team publishes Service Packs on a regular basis.

 

At present, Alfresco releases a Service Pack every 2 or 3 months. As a general plan, Alfresco Service Packs are created periodically, with release notes indicating the areas of code updates and the specific issues that are addressed. You should plan and create the internal processes, time and resources to update to the latest Alfresco Service Pack as soon as possible. The goal is for Alfresco Service Packs to be easy to apply, to contain specific fixes, and to make no changes to the integration/extension points (APIs).

 

By keeping current on the latest Service Pack, you minimize the risk of end users encountering interruptions in the overall ECM application/solution. In addition, if a hot-fix is needed in your environment, the Engineering team can often resolve the issue and provide a fix in a more expeditious manner.

To stay agile when upgrading your system with Service Packs and hot-fixes, any customizations in place should include automated unit tests covering a substantial part of your code, so that the manual testing required before applying a Service Pack can be reduced to a minimum and executed fast. Major version upgrades will in general require more thorough testing and validation.

 

3 - Cherish your database

Alfresco relies heavily on fast, highly transactional interaction with the RDBMS, so the health of the underlying database system is vital.


Database performance is normally directly related to the overall performance of the solution, so it is important to have the database properly sized and tuned for each specific use case. Alfresco projects that expect a lot of concurrent usage and a repository with a considerable number of nodes (>20 million) will benefit from an active-active database cluster with at least 2 machines. This helps to increase the transactional throughput of the database, which results in an overall performance increase.

 

When a database is too large for any significant percentage of it to fit into the cache, each requested key/data pair can result in a single I/O. If both the database and the log are on a single disk, then for each transaction the Alfresco DB is potentially performing several file system operations:

 

  • Disk seek to database file
  • Database file read
  • Disk seek to log file
  • Log file write
  • Flush log file information to disk
  • Disk seek to update log file metadata (for example, inode information)
  • Log metadata write
  • Flush log file metadata to disk

 

On big databases we expect quite a lot of disk I/O, which is slow relative to other system resources such as CPU. Faster disks can help in such situations, but there are other ways to increase transactional throughput (scale up, scale out).

3.1 - Monitor your database

Monitoring your database performance is very important, as it can reveal potential performance problems or scaling needs early.

 

The following aspects should be monitored and checked on a regular basis:

  • Transactions
  • Number of Connections
  • Slow Queries
  • Query Plans
  • Critical DM database queries ( # documents of each mime type, … )
  • Database server health (cpu, memory, IO, Network)
  • Database sizing statistics (growth, etc)
  • Peak Period of resource usage
  • Indexes Size and Health

 

If you find that the query plans for the slow queries can be improved by creating new indexes, don't be afraid to do so. Make sure you maintain those new indexes across Alfresco upgrades.
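As a sketch of the idea (the index name is hypothetical, and you should verify table and column names against your Alfresco version's schema; PostgreSQL syntax shown):

```sql
-- Hypothetical supporting index for a slow property lookup
-- (test in Stage first, and keep it across upgrades).
CREATE INDEX idx_custom_node_props
    ON alf_node_properties (qname_id, string_value);

-- Re-check the execution plan afterwards:
EXPLAIN ANALYZE
SELECT node_id
  FROM alf_node_properties
 WHERE qname_id = 42
   AND string_value = 'invoice';
```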

 

3.2 - When running On-Prem, consider using a physical server

Alfresco recommends that the database server be a physical server, as most of the vendors' available benchmarks show increased performance of physical servers over virtualized ones. Regarding response time, the golden rule is that the response time from the DB should be around 2ms or lower. Making sure the database is performing well (via wise monitoring) is key to avoiding performance problems, as this will be one of the areas most stressed by your use case.

 

3.3 - Choose and size your database appropriately

 

 

If the number of nodes and the expected concurrency increase (for example, when the database grows to more than 40 million nodes and the number of concurrent users increases considerably), we suggest adopting an active-active database cluster approach. Considering our existing customers, the biggest running repositories are on Oracle (most of them RAC).

 

There are other (cheaper) solutions on the market that also allow the setup of an active-active database cluster, such as MariaDB. MariaDB uses Galera for active-active master clustering. Galera is a new kind of clustering engine which, unlike traditional MySQL master-slave replication, provides master-master replication and thus enables a new kind of scalability architecture for MySQL/MariaDB/Percona.
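A minimal sketch of the Galera-related settings in my.cnf (node addresses, paths and the cluster name are placeholders; consult the MariaDB/Galera documentation for your version):

```ini
[mysqld]
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name=alfresco_cluster
wsrep_cluster_address=gcomm://10.0.0.1,10.0.0.2,10.0.0.3
```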

 

If you are running a cloud-based deployment on AWS, consider using Aurora, as it can scale really well. Alfresco recently released the results of a 1 billion document benchmark based on Amazon Aurora.

 

3.4 - Verify the network latency when communicating with the database

In regards to latency in communication, the golden rule is that the response time from the DB in general, should be around 2ms or lower. 

 

3.5 - Database Tuning and the DBA role

Regular maintenance and tuning of the Alfresco database is necessary. Specifically, all of the database servers that Alfresco supports require at the very least that some form of index statistics maintenance be performed at frequent, regular intervals to maintain optimal Alfresco performance.

 

Index maintenance can have a severe impact on Alfresco performance while in progress, hence it needs to be discussed with your project team and scheduled appropriately. Make sure your database is tuned for your usage patterns: high throughput, long running queries, decision support, mixed usage. Finally check that the specific supported database being used is configured properly and according to Alfresco documentation.

 

Database tuning and maintenance are the responsibility of the customer's DBA. Overall, the recommendation is to apply any tuning (including new indexes) that can contribute to increased performance of your database.

 

Warnings:

  • Any new indexes should be created and tested in the Stage environment before implementation into the Production environment.
  • You should make sure you maintain the new indexes across product upgrades and verify that the possible schema changes are not affecting your indexes.
  • Update your database statistics regularly or have Auto Update Statistics turned on.
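The statistics refresh in the last warning can, on PostgreSQL for example, be done manually or left to autovacuum (a sketch; scheduling and thresholds are your DBA's call):

```sql
-- Manual refresh of planner statistics for the whole database:
ANALYZE;

-- Or tune the autovacuum daemon in postgresql.conf, e.g.:
-- autovacuum = on
-- autovacuum_analyze_scale_factor = 0.05
```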

 

4 - Application Server Best Practices

 

Your repository can more easily get maxed out if you set the application server to process more connections/threads than the machine can handle. Consider first "bottlenecking" (reducing) the number of application server connector threads so that each call gets the CPU and memory resources it needs to perform properly, and your system keeps a more stable response time over peak usage (scale the database connection pool maximum accordingly, as mentioned before). You may need to scale up (or scale out your cluster) to handle a bigger number of calls during peak usage, instead of just increasing the number of threads of one node's application server.

 

I've seen better results by reducing the number of application server threads, but the natural tendency appears to be to increase the number of threads. This normally leads to contention, as we end up having many threads in a "WAIT" state, just waiting for a resource (normally a CPU cycle or a database connection) to become available.
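As an illustration of capping the connector rather than inflating it, a Tomcat server.xml sketch (the numbers are assumptions to be sized against your own hardware, and matched to your database connection pool maximum):

```xml
<!-- Deliberately bounded connector: fewer threads, each getting the
     CPU/memory it needs; excess requests wait in the accept queue. -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="80"
           acceptCount="200"
           connectionTimeout="20000" />
```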

 

4.1 - RAM is good, but not a guarantee of performance

Alfresco is a content manager, and as such its performance benefits from having enough RAM to minimize input/output. Nothing will be cheaper than spending on enough RAM, but the exact amount varies depending on your application/solution; you can start with 8 GB, but be sure that your servers can be expanded to meet your future needs. Don't over-allocate memory to your application servers, as it can result in longer GC cycles.

 

The default memory of an Alfresco installation is normally too small for any production deployment and should be considered appropriate only for development environments.
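An illustrative sizing sketch (the 8 GB figure is an assumption to be adjusted to your workload): fixing -Xms equal to -Xmx avoids heap-resizing pauses, while still leaving RAM for the OS page cache.

```shell
# Illustrative heap settings for the Alfresco JVM (tune to your solution)
export JAVA_OPTS="-Xms8g -Xmx8g"
echo "$JAVA_OPTS"
```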

 

There are numerous details and factors when it comes to performance tuning, and several other documentation areas to consult. If your performance bottleneck is I/O access speed or CPU, don't expect that giving more RAM to your system will necessarily help.

 

5 - Alfresco Configuration Best Practices

5.1 - Understand JMX configuration and priority

Alfresco allows most of its configuration to be set through JMX (or the Share admin console) on the fly, without a restart, and in a cluster-aware way. This is very handy for resolving live issues and for testing in general. But beware that those settings are stored in the database and have priority over the static settings (the file configuration on your server, in general alfresco-global.properties), which will be active again when you restart the system.

 

This has a drawback: your static configuration may not reflect the real configuration state of your system, which can make life complicated for system administrators in certain circumstances (especially newcomers).

5.2 - Keep your static configuration files up to date

Consider it a best practice to keep the static files up to date with your configuration, and also to make sure they are, as much as possible, the ones actively configuring your system. In general this includes reverting any configuration set through JMX as soon as possible, so that the system once again uses the static configuration with whatever values you find appropriate.

 

5.3 - Distribute your load: divide and conquer

One of the secrets of a successful architecture is to know exactly what processes occur, when and where, and what resources those processes influence. Having this information gives the architect the power to "divide and conquer".

 

Working with a fairly flexible technology, we can wisely divide the overall processing across the resources (servers), achieving the necessary balance. Consider, for example, the scheduled jobs that Alfresco executes: in a distributed architecture there are many advantages to offloading some of those jobs to a specific server, releasing important resources on the servers that are actually serving user requests.

 

From an Alfresco perspective, offloading (disabling) these scheduled jobs on the front-end servers is no more than configuring their cron expressions to execute very far in the future, and having a dedicated server (normally separated from the cluster) execute those jobs.
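A sketch of the trick on the front-end nodes (the property name here is a made-up example; each job has its own property or Spring bean depending on your Alfresco version):

```properties
# Front-end nodes: push the trigger into a year that never fires
my.custom.job.cronExpression=0 0 4 * * ? 2099

# Dedicated jobs server: keep the real schedule
# my.custom.job.cronExpression=0 0 4 * * ?
```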

 

Overall, don't abuse one layer of your architecture. Don't rely exclusively on the search engine or, the opposite, forget you have a search engine and overuse the database. Many tasks can be optimized if you rely on common browsing through your repository (accessing the database) and don't depend, without reason, solely on the search engine (accessing Solr/Lucene), for path lookups for example.

 

This is also important to take into consideration when designing your repository. Some space structures will fit your discovery requirements better, and should be preferred to others that oblige you to rely on search for every operation.

If you already have the node reference, you don't need to query for the node with SearchService. Just instantiate the object passing the node reference and you are done, without stressing the search layer for no reason.
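A sketch using the Foundation API (assumes an injected nodeService and an already-known node id):

```java
// No SearchService needed: build the NodeRef directly from the known id
NodeRef nodeRef = new NodeRef("workspace://SpacesStore/" + knownNodeId);
Serializable name = nodeService.getProperty(nodeRef, ContentModel.PROP_NAME);
```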

 

5.4 - Externalize your custom configuration

Hard-coding configuration is bad. You can externalize your customizations' configuration just as Alfresco does, and use the same alfresco-global.properties file to configure your own custom modules. Above all, avoid the need to recompile your code because of parameter value changes; it's ugly, problematic and basically absurd. Also avoid setting configuration in XML files directly (the Spring *-context.xml files on the classpath under alfresco/extension). Make the maintenance of your customization easier, so that people only need to care about one file (in general alfresco-global.properties) and don't have to tweak XML files directly, which could also more easily lead to errors (badly formatted XML, etc.).
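A sketch of the pattern (the module, bean and property names are invented for illustration):

```properties
# alfresco-global.properties
mymodule.notification.enabled=true
mymodule.notification.batchSize=50
```

referenced from the module's Spring context via placeholders:

```xml
<bean id="myModule.notificationService"
      class="com.example.NotificationService">
    <property name="enabled" value="${mymodule.notification.enabled}"/>
    <property name="batchSize" value="${mymodule.notification.batchSize}"/>
</bean>
```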

 

6 - Solr Architecture Best Practices

 

Running a remote, dedicated Solr engine is a known Alfresco best practice. Adding a dedicated tracking instance of Alfresco on the Solr machines optimizes this architecture further and is known to increase the overall performance of the system.

 

6.1 - Dedicated tracking instance on the Solr node(s)

Having a dedicated tracking Alfresco instance on the Solr nodes allows for what we call "dedicated tracking". These nodes serve as a database and Solr proxy, and they are also used for text extraction of documents (which is used for full-text indexing).

 

By having these nodes perform the text extraction jobs, we offload the main repository instances, leaving more resources for user requests. We also avoid network traffic by issuing local tracking requests.

The number of documents and transactions will impact memory consumption on these nodes, so we advise scaling up the memory of these machines to cope with the extra Alfresco instance.

 

6.2 - Solr Considerations for large repositories

Big repository sizes (>20 million nodes) affect average response times for search operations, and most importantly for certain kinds of global searches that are very sensitive to the overall size of the repository. In those cases, the searches most affected will also be the ones for which the user-group authority structure is more complex, with users belonging to many groups on average.

 

The architectural layer most affected by the increase in repository size is the search layer (Solr), so you need to tune/size it accordingly.

 

6.3 - Solr Tuning

There are a lot of options for tuning Solr. In the following blog post you'll find all the information you need regarding tuning Solr, whether for indexing or searching.

 

 

I've also included a chapter in this post on Solr index tuning.

 

7 - Garbage collection analysis 

Regular analysis of the garbage collection logs is also a known best practice; the health of the garbage collection engine is normally related to the overall effectiveness of memory usage across the system. This is valid for Alfresco, Solr and any other client that is part of the deployment.

 

The best practice is to choose an analysis timeframe which is known to be the period when the system is most heavily used, and monitor the garbage collection operations that happened during that period.

There are several tools available to analyze garbage collection logs, but the one I think generates the most accurate report is Censum from jClarity. It's possible to download a trial version of this tool and use it to analyze GC logs for 7 days.

The sample picture above shows the summary data generated by Censum after performing a GC analysis. This is just one of the important screens and diagrams created by the tool.

 

Censum is a nice tool that takes log files from the complex Java™ (JVM) garbage collection sub-system and gives you meaningful answers, providing clear infographics and making recommendations based on the analysis results. A free alternative can be found at http://gceasy.io

 

7.1 - Understanding Garbage collection

The various memory zones in the JVM are shown in the following figure. The illustration will help in understanding the analysis below.

 

The choice of GC strategy depends on the hardware and the use case. The ParNew GC (-XX:+UseParNewGC) focuses on the new generation and is relevant for machines with 2+ CPUs/cores. It can be used along with the CMS, whereas the previous implementation (-XX:+UseParallelGC) is similar but a little less efficient and lacks the synchronization logic necessary to be compatible with the CMS. The use of ParNew alone is recommended for non-interactive work, such as batch processing and/or bulk loading via the Foundation API. More recently the G1 algorithm was introduced; this is a very flexible algorithm that can adapt to most use cases.

The CMS GC (ConcMarkSweep), also called the Low Pause Collector, focuses on the Old Generation and allows certain phases of the collection to share processor resources with application threads, minimizing the stop-the-world state. Solr, for example, shows advantages in using this garbage collection algorithm as opposed to the default one.
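An illustrative flag set for the CMS + ParNew combination discussed above, plus GC logging so the logs can later be fed to an analysis tool (this matches Java 7/8-era JVMs; G1 and newer JVMs use different options):

```shell
# CMS + ParNew, with GC logging enabled for later analysis
GC_OPTS="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC"
LOG_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/alfresco/gc.log"
export JAVA_OPTS="$GC_OPTS $LOG_OPTS"
echo "$JAVA_OPTS"
```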

7.2 - Garbage collector common problems

Look for high pauses  - When analyzing your garbage collection effectiveness you should look for high pauses.

High pauses from garbage collection can indicate a number of problems. A high percentage of time spent paused in GC may mean that the heap has been under-sized, causing frequent full GC activity. A high longest-pause value (in seconds) is an indication that the heap is too large, causing long individual garbage collections.

Look for premature promotion of objects - Premature promotion is a condition that occurs when objects that should be collected in a young generation pool (Eden or the Survivor "From" space) are instead promoted to Tenured (Old) space. A consequence of premature promotion is additional pressure on Tenured space, which results in more frequent collections of Tenured. More frequent Tenured collections will interfere with your application's performance.

There are a number of possible causes for this problem:

  • The Eden and/or Survivor spaces may be too small.
  • The -XX:MaxTenuringThreshold flag may have been set too low.

 

There are also a number of possible solutions for this problem:

  • Alter the size of the young space via the -XX:NewRatio property.
  • Alter the size of the Survivor spaces (relative to Eden) via the -XX:SurvivorRatio=<N> property, using the information provided by the tenuring graphs. This flag divides Young into N + 2 chunks: N chunks are assigned to Eden and 1 chunk each to the To and From spaces respectively.
  • Alter the -XX:MaxTenuringThreshold property.

 

Note: enlarging the Survivor spaces will result in less space being assigned to Eden. The size of Eden times your allocation rate yields the frequency of collections in the young generation. Be sure to increase the size of Young so that Eden stays the same size, in order to avoid increasing the number of young generational collections.
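To make the N + 2 division concrete, a quick back-of-the-envelope calculation (the sizes are illustrative):

```shell
# Young generation of 1024 MB with -XX:SurvivorRatio=6:
# young is split into 6 + 2 = 8 chunks; 6 go to Eden, 1 each to To/From.
YOUNG_MB=1024
RATIO=6
CHUNK=$((YOUNG_MB / (RATIO + 2)))
EDEN=$((CHUNK * RATIO))
echo "Eden=${EDEN}MB Survivor(each)=${CHUNK}MB"
```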

Look for periodic calls to System.gc()

Periodic calls to trigger a Full GC, either via System.gc() or Runtime.gc(), preempt the natural flow of GC and corrupt the numerous metrics that help the collectors run as optimally as possible. There are many possible sources for these calls, which may include:

  • An RMI call from a remote JVM.
  • A Timer, TimerTask or ScheduleExecutor thread.
  • A cron or Quartz job.
  • Some external trigger causing the Memory MXBean to trigger a Full GC.

 

Sometimes, especially in low-latency applications, this might be deliberate and an acceptable thing to do. To immediately resolve this problem you can add the -XX:+DisableExplicitGC flag. For example:   java -XX:+DisableExplicitGC $OTHER_JVM_ARGS

 

You should also find and remove the explicit calls, if any. If it's an RMI call, find the remote JVM that is connecting to your application; the properties sun.rmi.dgc.server.gcInterval and sun.rmi.dgc.client.gcInterval control the maximum allowable milliseconds between these types of full collections. Locate the System.gc() call(s) in your code and remove them, and check for any scheduled tasks that may be executing System.gc() call(s).
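If the Full GCs come from RMI distributed GC and cannot simply be disabled, the intervals can be widened instead (values are in milliseconds; one hour is shown here purely as an example):

```shell
java -Dsun.rmi.dgc.client.gcInterval=3600000 \
     -Dsun.rmi.dgc.server.gcInterval=3600000 \
     $OTHER_JVM_ARGS
```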

 

Look for high memory utilization over a long period of time

 

High memory utilization can be an indication of a memory leak, or simply that the heap isn't big enough. You can also check the growth rate of memory usage by analyzing the GC logs.

A memory leak occurs when a retained object that has outlived its usefulness is never de-referenced. Because a reference to the object still exists, it cannot be garbage collected. As more objects are retained there will be less working memory, and eventually memory will be exhausted, resulting in an OutOfMemoryError.

 

An alternative reason for an OutOfMemoryError is that your working set size (that is, the volume of data retained in the heap) completely fills the Tenured pool. In this case you will need to increase the size of Tenured space.

The only way to accurately detect whether this is a memory leak or an undersized heap is to use a memory profiler that tracks generational counts. Currently the only profiler that correctly tracks generational counts is the NetBeans profiler. A class that is leaking tends to have a generational count that tracks the number of Tenured collections. An alternative technique is to generate a heap dump and then find objects that have a large retained set size. Be aware that caches are objects with large retained set sizes; that said, a cache can be the cause of a memory leak should it employ a poor cache eviction policy. Refer to the cache section of this document for instructions on how to tune/verify the repository caches.

 

If there is no memory leak and you still see high memory utilization, then the easiest solution is to increase the size of Tenured space. One way to do this is to increase the total heap size using -Xmx. Another solution is to reduce the live set size by improving the memory efficiency of your application.

 

8 - Repository Modeling and Design Best Practices

This section contains some recommendations that can contribute to increased performance of your repository. They result from our field experience working on several different customer accounts.

 

8.1 - Limit the groups hierarchy depth to 5

ACL checks are known to slow down performance when the maximum group hierarchy depth exceeds 5 levels. Our advice is, when possible, to limit the maximum group hierarchy depth to 5.

 

8.2 - Limit the maximum number of nodes in a folder

When using Share or another client to browse a repository folder, Alfresco needs to perform a series of actions before it actually renders or delivers the content (permission checking, etc.). ACL checks go against the database and occupy a thread in Tomcat, causing CPU consumption.

The more nodes that reside inside a specific folder, the slower the response time will be.

We recommend, when possible, limiting the maximum number of document nodes inside a folder to 2000.

 

8.3 - Number of sites

The number of sites in the system has some influence on performance, especially when checking site membership for users. Although this is (of the three limits suggested here) the factor with the least impact, we recommend keeping the number of sites below 5000.

A smart way to avoid this site-membership checking overhead is, when possible, to remove the My Sites dashlet from the users' initial dashboard. This can be done globally by configuring a new presets.xml file in Alfresco Share.

 

8.4 - Keep a low user-to-groups membership ratio

The number of groups a user belongs to has an impact on performance when rendering some client pages (especially some Share dashlets, like the My Sites dashlet). Alfresco runs some complex queries (based on the user's group membership) when determining the assets to render on some Share pages.

 

We advise keeping the number of groups a user belongs to low. When possible, and to optimize Share client rendering performance, a user should not belong to more than 5 or 6 groups.

 

8.5 - Limit the folder hierarchy depth

The depth of the folder hierarchy also has an impact when browsing and performing document actions under a certain folder. We recommend, when possible, limiting the maximum depth of a folder hierarchy to 15 levels.

 

8.6 - Remove Orphaned content

Alfresco by default won't delete any binary content from your repository. All content finally deleted from the trash can is moved to the deleted content store folder on the file system and kept there until you delete it. Make sure you regularly delete old orphaned content in order not to occupy more file system disk space than needed.

 

8.7 - Scale out after Scale Up

Scaling out a cluster (adding more nodes) has a drawback in cluster communications overhead. This is much reduced when using Solr, since cluster communications reduce to cache synchronization only; but if you are using Lucene, your cluster should in general not have more than 4 nodes. Consider optimizing your different layers and scaling up each of your nodes before increasing the number of nodes in your cluster just because you can.

 

8.8 - Load-balance wisely

Load balancing your nodes can be done in different ways; the most important distinction is with or without sticky sessions. For load balancing the front-end Alfresco Share layer you should use sticky sessions, but between the other layers (Share to Alfresco, and Alfresco to Solr) this is in general not necessary.

 

8.9 - Cron Expressions and Scheduled tasks

Make sure you review your scheduled jobs' cron expressions, to confirm you are really scheduling your processes as intended. Check the documentation online to avoid scheduling a process that should execute every minute to execute every second. In a cluster environment, make sure you avoid triggering duplicate jobs from each Alfresco node; this should in general be done by implementing your custom scheduled jobs in a cluster-aware mode (check Alfresco's out-of-the-box implementations to use as an example).
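To make the every-second vs. every-minute pitfall concrete: Quartz cron expressions have seconds as the first field (the property name below is illustrative):

```properties
# Quartz fields: second minute hour day-of-month month day-of-week [year]
# * * * * * ?   -> fires every second (rarely what you want)
# 0 * * * * ?   -> fires every minute, at second 0
# Nightly at 03:00:
my.custom.job.cronExpression=0 0 3 * * ?
```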

 

8.10 -  Predict your performance degradations

Was your system working fine, and all of a sudden (or maybe gradually) its performance started to degrade?

An administrator should have visibility into which operations are being executed, how much of the overall work they represent, and how they affect your system and in which layers. He should be able to detect changes in usage patterns (a part of the solution that was almost unused is becoming more important, while others are no longer used as much) and to understand what change in the end users' situation inspired that (a business process getting deprecated in favor of another one). Ideally this should be foreseeable, but many times it won't be.

 

Understanding the causes of whatever is going on in the system will help you solve or work around it (in the meantime), and give you the chance to eventually deal with the root cause, which may not be purely technical.

 

Remember that your system will grow over time (the repository size and also the number of users), so the initial sizing may no longer be appropriate for the actual repository size and number of concurrent users after the first couple of years. You should be ready for that by properly benchmarking your system with future growth in mind, in order to plan a calendar for scaling your system up/out (bigger/more machines).

 

8.11 - Trashcan Cleaning

In general, cleaning the trashcan is considered a best practice (from a maintenance perspective) and it can lead to performance gains, especially for long-lived repositories where trashcan cleaning never, or almost never, happened. By cleaning the trashcan you will:

 

  • Reduce the information stored in the database. Reducing the size of the tables, database indexes and related objects.
  • Reduce the size of your Solr indexes and the space occupied by them.
  • Reduce the size of the content store on disk.

 

A big trashcan means you end up with a much bigger (and slower) database than you need, and a much bigger index (and potentially slower Solr), occupying much more storage than you need.

 

There are add-ons for automatic (periodic) cleaning of trashcans (including criteria based on deletion date). Note that content in the trashcan gets indexed (FTS) but is never used by Alfresco, so the less content you have in the trashcan the better. Also take into consideration that merging a smaller index takes less time.

 

Recommendation

 

Install an add-on for automatic (periodic) cleaning of the user trashcans. Check:

 

 

 

Note: You can also skip the archiving of deleted content by adding the sys:temporary aspect to your content before you delete it.
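As a minimal sketch (Alfresco repository JavaScript, assuming a node is in scope as `document`), skipping the trashcan on deletion could look like this:

```
// Hedged sketch: runs inside the Alfresco repository (Rhino), not standalone.
// Adding sys:temporary before deletion makes the node bypass the trashcan
// (no archive node is created in archive://SpacesStore).
document.addAspect("sys:temporary");
document.remove();
```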

 

9   Repository Performance Best Practices - Unused features

We’ve decided to include a specific chapter on disabling unused features, as it can have a big impact on performance. Disabling features that are not being used releases important resources, allowing them to be used for active tasks and contributing to increased performance.

 

9.1 - Disable thumbnails and documents previews

Alfresco creates document thumbnails by default as part of any upload. If you are not making any use of thumbnails, disabling their creation can speed up uploads, as no transformers will be involved. To disable this, set the following property in alfresco-global.properties on your Alfresco nodes.

 

system.thumbnail.generate=false

 

9.2 - Disable share web-preview

When users access Alfresco via the Share interface and open the document details page, a full document preview is generated. This involves calls to various third-party tools such as OpenOffice, Ghostscript and ImageMagick to create a Flash version of the document. If previews are not being used, we can prevent their creation by including the following snippet of XML in the share-config-custom.xml file.

 

<config evaluator="string-compare" condition="DocumentDetails" replace="true">

      <document-details>

         <!-- display web previewer on document details page -->

         <display-web-preview>false</display-web-preview>

      </document-details>

   </config>

 

This file is normally under <tomcat>/shared/classes/alfresco/web-extension.

 

The recommendations below typically result from analyzing the alfresco-global.properties files present on production Alfresco nodes.

 

9.3 - Disable Replication

If you are not using the Alfresco replication features, you can set the following property to false to disable the Alfresco replication service.

 

replication.enabled=false

 

9.4 - Disable Transfer service receiver

If you are not using the replication service receiver, this service can also be disabled.

 

transferservice.receiver.enabled=false

 

9.5 - Disable Cloud-sync features

If you are not using any cloud-sync features, they can be disabled.

 

syncService.mode=OFF

sync.mode=OFF

sync.pullJob.enabled=false

sync.pushJob.enabled=false

 

9.6 - Disable user quotas

Checking for user quotas can add some overhead to Alfresco. If you are not using user quotas, this feature can also be disabled.

 

system.usages.enabled=false

system.usages.clearBatchSize=0

 

9.7 - Disable creation of users home folders

Alfresco automatically creates a home folder for each new user. If your users are not utilizing this folder for any business-related tasks, you can disable the automatic creation of home folders for new users.

 

home.folder.creation.eager=false

 

9.8      Disable JodConverter sub-system

JodConverter is used together with other third-party tools to generate previews and thumbnails. If those are not being used, we can safely disable JodConverter.

 

jodconverter.enabled=false

 

9.9      Disable unused virtual file systems

Alfresco enables access to its repository using virtual file system and mail protocols such as CIFS, WebDAV, NFS, FTP and IMAP. Those that are not being used can be safely disabled.

 

# Disables the Alfresco NFS server.
nfs.enabled=false

# Disables the Alfresco CIFS server.
cifs.enabled=false

# Disables the Alfresco FTP server.
ftp.enabled=false

# Disables the Alfresco IMAP server.
imap.enabled=false

 

9.10 Disable activities feed

 

If the activities feed is not being used, disabling it will prevent the regular checks on the activities tables and will again save system resources.

 

activities.feed.notifier.enabled=false

activities.feed.cleaner.enabled=false

activities.post.cleaner.enabled=false

 

10  Content Modeling Best Practices

 

This section contains a series of hints that may help you tune your content model.

Every additional indexed property slows down writes to the nodes that contain it, so while there’s a temptation to naively configure every single property in a content model as indexed, “just in case”, this can have a significant impact on the performance of the repository and the size of the indexes on disk. The recommendation here is to index just the bare minimum number of properties required for querying, also taking into account that the transactional metadata query mechanism (aka MDQ) doesn’t require that the queried properties are marked as indexed.
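As an illustration (the acme namespace and property names are invented for this sketch), per-property index control in a model definition looks roughly like this:

```
<!-- Hypothetical sketch: index only what you query on -->
<property name="acme:internalNotes">
   <type>d:text</type>
   <!-- keep this property out of the index entirely -->
   <index enabled="false"/>
</property>
<property name="acme:invoiceNumber">
   <type>d:text</type>
   <index enabled="true">
      <atomic>true</atomic>        <!-- indexed in the transaction (metadata) -->
      <stored>false</stored>       <!-- don't store the value in the index -->
      <tokenised>false</tokenised> <!-- exact-match searches only -->
   </index>
</property>
```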

 

You should use an aspect-oriented model whenever possible. Note that you can make aspects mandatory for specific types, so that users/coders won't have to worry about making sure the aspect(s) are always added. Normally you would have a base document type with whatever global properties are important to all documents in your company, and work with aspects (when possible) for related properties. When necessary or applicable, you can create specialized types.

 

Properties common to various content types should live on aspects. Having properties on aspects gives you the flexibility to add the properties only to certain types. Making an aspect mandatory for a type gives you the ability to require those properties for that type: whenever the aspect is present, its mandatory properties are required.
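Putting the pieces above together, a hedged sketch of an aspect-oriented model (the "acme" namespace, type and aspect names are all invented; the model `<imports>` for the d: and cm: namespaces are omitted for brevity):

```
<!-- Hypothetical sketch of an aspect-oriented content model -->
<model name="acme:contentModel" xmlns="http://www.alfresco.org/model/dictionary/1.0">
   <namespaces>
      <namespace uri="http://www.acme.com/model/content/1.0" prefix="acme"/>
   </namespaces>
   <types>
      <!-- Root ("marker") type: no properties of its own, useful for
           grouping descendants and finding them with queries -->
      <type name="acme:document">
         <parent>cm:content</parent>
         <mandatory-aspects>
            <!-- the aspect is always applied, coders don't have to remember it -->
            <aspect>acme:classifiable</aspect>
         </mandatory-aspects>
      </type>
   </types>
   <aspects>
      <!-- Shared properties live on an aspect so any type can reuse them -->
      <aspect name="acme:classifiable">
         <properties>
            <property name="acme:department">
               <type>d:text</type>
            </property>
         </properties>
      </aspect>
   </aspects>
</model>
```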

 

The rule of thumb for Java inheritance is a maximum depth of 3. The same thinking can be applied to content models; the rule of thumb is to avoid unnecessary depth.

 

You can use types that have no properties. That's good for grouping descendants or even just for use as a "marker" so that you can easily find instances of that type with queries.

 

You can have different content models and put them in different XML files.

 

10.1 - Don’t repeat the same properties on the content model

You should not define the same property (in practical terms: same meaning, same constraints, etc.) for different types in your content model. Leverage the capabilities of aspects and type inheritance to define each property only once.

 

10.2 - Content model changes must be incremental

This means that at each moment of your model's development you should carry just the minimum amount of load in your content model (types and aspects). Don’t define properties you are not sure you really need. Don't define types or aspects if you are not sure you need them.

 

10.3 - Use root types in your content model

This will make it easier to separate your solution's content and folder types from others in the repository, which can be important for many reasons: policies, searches, etc.

 

10.4 - Index only what you need, use index-control aspect

You don't need to mark all the metadata of your custom model as searchable if you are not going to use those fields as search criteria. Indexing only the fields you need is especially important for very large repositories, but it is generally a very good practice.

You can also turn off full text search if you are not going to use it. Even the archive core can be deactivated for indexing purposes if you don't need to search your trashcan (but you will then need some mechanism to empty it regularly that does not rely on the search engine).

Nowadays (in the most recent Alfresco 4 versions) you can also disable content indexing for certain types, or for all of them.

Also, for large repositories (and for those cases where you don't need to check permissions because you are using system-user access), you can completely disable the permission checks on Solr as an index/search optimization.
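A minimal sketch of applying the index-control aspect from Alfresco repository JavaScript (assuming a node in scope as `document`; aspect and property names are the standard cm:indexControl ones):

```
// Hedged sketch: exclude a node from indexing via cm:indexControl.
var props = new Array();
props["cm:isIndexed"] = false;         // exclude the node from the index
props["cm:isContentIndexed"] = false;  // exclude its content (FTS) as well
document.addAspect("cm:indexControl", props);
```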

 

10.5 - Remember Solr eventual consistency

If you are using Solr as the search engine, remember that Solr indexing happens asynchronously, so don't expect a search executed right after the creation of a node to return that node; you may have to wait a little. Careful design of your solution should, most of the time, avoid depending on a newly created node being immediately present in search results.
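For example (a repository JavaScript sketch; the file name is invented), keep the reference returned at creation time instead of searching for the new node:

```
// Hedged sketch: createFile returns the new ScriptNode immediately, so there
// is no need to search for it (Solr may not have indexed it yet).
var report = userhome.createFile("monthly-report.txt");
report.content = "draft";
// A nodeRef lookup goes to the database, not to the eventually consistent index:
var same = search.findNode(report.nodeRef);
```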

 

10.6 - Leverage properly the out of the box Alfresco content model

It’s good to leverage the out-of-the-box Alfresco content model for your custom solution: if you need a title, for example, just use the already available titled aspect. But don’t be arbitrary and use the out-of-the-box content model with a completely different meaning. As an example: don’t use special tags on the title property to flag custom behaviours completely unrelated to the meaning of the property.

 

10.7 - Content Modelling Golden Rules

  • Do not change the Alfresco out-of-the-box content model
  • Consider implementing a root scope type
  • From the beginning, add only the properties that you really need, as it is hard to remove properties later.
  • Avoid unnecessary content model depth
  • Use aspects when you can; they are really flexible and provide lots of extra possibilities.
  • Create multiple content models to keep them in order
  • Implement one Java class that corresponds to each of the custom models.

 

11 - Keynotes on Monitoring 

Monitoring your architecture is very important and will help you understand which components are being stressed by your use case. Identifying the most relevant key performance indicators of the Alfresco applications plays an important role in overall application administration.

 

The diagram below shows some of the factors in the deployment that should be analyzed on a regular basis.

Notice that certain inspection targets have overhead when they are being inspected.  These targets might not be appropriate for long-term monitoring or may require tuning to minimize impact.

11.1 - Database Key Performance Indicators

 

Database tools will vary by vendor, but all of Alfresco’s supported databases have tools that can tell you the following key performance indicators.

 

  • Response time
  • Blocked queries
  • Top queries by frequency and / or time
  • Slow Queries
  • Average number of Transactions per second (during a peak period)
  • Number of Connections (during a peak period)
  • Query Plans
  • Database server health (Cpu, memory, IO, Network)
  • Indexes Size and Health

 

Alfresco (like most Java applications that access a database) uses JDBC. Inspecting JDBC can yield useful information for developers; the tools below can be used to inspect it.

 

  • jdbcspy
  • Log4jdbc – more active development, easier to configure (IMHO)

 

11.2 - What to monitor/report in Production?

Production monitoring should cover some basic information about Alfresco and its environment: making sure the JVM isn’t constantly stuck in “stop the world” GC, for example, or monitoring your database connection pool sizes so you can be alerted if they are consistently near the maximum. As far as test/pre-production environments are concerned, monitor what you need, when you need it.

 

We advise producing a weekly report with the database KPIs referred to before, as well as the following:

 

  1. Alfresco statistics (# of logged in users, DB pool used / free connections)
  2. JVM statistics (memory / CPU, GC time)
  3. SOLR statistics (queries / sec, response time)

 

11.3 - What monitoring Tools to use?

Monitoring your Alfresco architecture is a known best practice. It allows you to track and store all relevant system metrics and events, which can help you:

  • Troubleshoot possible problems
  • Verify system health
  • Check user behaviour
  • Build a robust historical data warehouse for later analysis and capacity planning

 

You have two choices to enable monitoring on top of Alfresco: a very simple approach with relevant data for initial and continuous monitoring, and a more robust one that I consider to be one of the best open-source monitoring solutions I’ve seen at work.

 

  • Java Melody
  • Alfresco Monitoring by Miguel Rodriguez (Alfresco employee)

 

I will explain both of these tools in the next sections.

 

11.3.1      Alfresco Monitoring by Miguel Rodriguez

 

This is a fully open-source stack of monitoring tools that together build a global monitoring solution.

The solution uses the following open-source products.

The solution monitors all layers of the application, producing valuable data on all critical aspects of the infrastructure. This allows pro-active system administration, as opposed to a reactive way of facing possible problems: predicting problems before they happen and taking the necessary measures to maintain a healthy system on all layers.

I see this approach as both a monitoring and a capacity planning system, able to provide “near” real-time information updates, customized reporting and a custom search mechanism over the collected data.

The diagram below shows how the different components of the solution integrate. Note that data from all nodes and the various layers of the application is centralized in a single location.

The sample architecture being monitored in the diagram consists of a cluster of two Alfresco/Share nodes serving user requests and two Alfresco/Solr nodes for indexing/searching content.

You can find much more detailed information on this monitoring solution, including a breakdown of all monitoring stages, in the following blog post.

 

 

11.3.2               Monitoring Alfresco with JavaMelody

 

JavaMelody is an open-source monitoring solution for Java applications, very simple to configure yet producing detailed runtime statistics. It is currently distributed at:

 

 


 

JavaMelody is used to monitor Java or Java EE application servers in QA and production environments. It is a tool to measure and calculate statistics on the real operation of an application, based on how users actually use it. It is very easy to integrate into most applications and is lightweight, with almost no impact on target systems.

 

This tool is mainly based on statistics of requests and on evolution charts; for that reason it’s an important add-on to our Alfresco project, as it allows us to see in real time the evolution charts of the most important aspects of our application.

 

It includes summary charts showing the evolution over time of the following indicators:

 

  • Number of executions, mean execution times and percentage of errors of http requests, sql requests, jsp pages or methods of business façades (if EJB3, Spring or Guice)
  • Java memory
  • Java CPU
  • Number of user sessions
  • Number of jdbc connections
  • ….

 

These charts can be viewed on the current day, week, month, year or custom period.

 

JavaMelody’s architecture is lightweight, so it has a lower overhead compared to other solutions I have seen. The overhead is so low that it can be enabled continuously in QA and production environments. A really nice thing is the storage of historical data: you can look at the same graphs spanning a week, a month or a year without setting up any additional infrastructure.

 

JavaMelody helps you to

  • Identify problems before they become too serious
  • Optimize based on the more limiting response times
  • Find the root causes of response times
  • Verify the real improvement after optimization
  • Check out the different types of memory being used (Heap, PermGen, Cache etc.)
  • Give facts about the average response times and number of executions
  • Provide data to help improve applications in QA and production 

 

Detailed documentation can be found at:

 

Installing JavaMelody specifically for Alfresco

The following step-by-step procedure shows how to set up JavaMelody for Alfresco monitoring.

 

STEP 1 – Download JavaMelody and dependencies

 

Download the latest JavaMelody distribution jar from Javamelody

 

https://github.com/javamelody/javamelody/releases/download/javamelody-core-1.57.0/javamelody-1.57.0.jar and copy it to the alfresco/WEB-INF/lib directory.

 

You also need to include some dependencies to have the full feature set enabled (PDF export):

  • jrobin-1.5.9.jar
  • iText-2.1.7.jar

 

STEP 2 – Update alfresco.war deployment descriptor (web.xml)

 

To enable the JavaMelody integration you need to update the alfresco deployment descriptor.

 

Location: <tomcatHome>/webapps/alfresco/WEB-INF/web.xml

 

Edit the web.xml file and add the JavaMelody monitoring filter:

 

…..

   <filter>

     <filter-name>monitoring</filter-name>

     <filter-class>net.bull.javamelody.MonitoringFilter</filter-class>

   </filter>

   <filter-mapping>

     <filter-name>monitoring</filter-name>

     <url-pattern>/*</url-pattern>

   </filter-mapping>

 

   <listener>

       <listener-class>net.bull.javamelody.SessionListener</listener-class>

   </listener>

…..

 

 Update spring contextConfigLocation

…..

   <!-- Spring Application Context location monitoring-spring-datasource.xml  monitoring-spring.xml  -->

 

   <context-param>

      <param-name>contextConfigLocation</param-name>

      <param-value>

            classpath:net/bull/javamelody/monitoring-spring-datasource.xml

            /WEB-INF/web-application-context.xml

      </param-value>

      <description>Spring config file location</description>

   </context-param>

….

 

NOTE : The SQL time measured is the time from invoking the statement from JDBC until there is a resultset. It does not include the time to fetch the data for a select, nor does it include the time for jdbc ResultSet.next().

 

STEP 3 – Securing Java Melody

 

If you want BASIC authentication with username and password, but do not want to use a realm and "security-constraint" in web.xml, you can add the parameter "authorized-users" in web.xml, in context or in system properties, like the other JavaMelody parameters (since v1.53).

For example in your WEB-INF/web.xml file:

<filter>

        <filter-name>monitoring</filter-name>

        <filter-class>net.bull.javamelody.MonitoringFilter</filter-class>

        <init-param>

                <param-name>authorized-users</param-name>

                <param-value>user1:pwd1, user2:pwd2</param-value>

        </init-param>

</filter>

 

NOTE: You can disable JavaMelody by setting the disabled=true init parameter in web.xml or by adding -Djavamelody.disabled=true as a system property.

 

Installing JavaMelody Globally for all applications

 

You can integrate JavaMelody on every web-app deployed on your application server.

 

Step 1

Configure JavaMelody monitoring on the Alfresco Tomcat by copying itextpdf-5.5.2.jar, javamelody.jar and jrobin-1.5.9.jar to the Tomcat shared lib folder under <tomcat_install_dir>\shared\lib, or to your application server's global classloader location (if not Tomcat).

 

Step 2

Edit the global Tomcat web.xml file (<path_to>\tomcat\conf\web.xml) to enable JavaMelody monitoring on every application. Add the following filter:

 

<filter>
   <filter-name>monitoring</filter-name>
   <filter-class>net.bull.javamelody.MonitoringFilter</filter-class>
</filter>
<filter-mapping>
   <filter-name>monitoring</filter-name>
   <url-pattern>/*</url-pattern>
</filter-mapping>
<listener>
   <listener-class>net.bull.javamelody.SessionListener</listener-class>
</listener>

 

And that’s about it. After restarting, you can access the monitoring of every application at http://<your_host>:<server_port>/<web-app-context>/monitoring, for example http://localhost:8080/alfresco/monitoring

JavaMelody Maven Integration

 

You can avoid all the configuration mentioned above if you use Maven to build your project; in that case you can include the dependencies in your project build. The dependencies are the following:

<!-- Minimal dependencies for JavaMelody -->

<dependency>

    <groupId>net.bull.javamelody</groupId>

    <artifactId>javamelody-core</artifactId>

    <version>1.57.0</version>

</dependency>

 

<dependency>

    <groupId>org.jrobin</groupId>

    <artifactId>jrobin</artifactId>

    <version>1.5.9</version>

</dependency>

 

12 - Solr Best Practices - Tuning cores for indexing 

 

Optimize your ACL policy: re-use your permissions, use inheritance and use groups. Don’t set up specific permissions for users or groups at the folder level. Try to re-use your ACLs.

 

When fully indexing a big repository with Solr, there are some parameters that may save your life and prevent your indexing from taking weeks or months. The most important ones are the number of threads to use for multi-threaded indexing and the main index merge factor (both in the workspace Solr core configuration).

 

As a general rule of thumb, consider dedicating one third of the CPU cores of your Solr machine (and of the Alfresco repository machine used for index tracking) to space indexing, so that raising the default number of indexing threads from 3 to that value, only during full indexing, will make it finish sooner. Likewise for the main merge factor (whose default is also 3): increasing it to closer to 10 (for large repositories) will make the indexing work lighter and faster. The two numbers (index threads and main merge factor) don’t need to be aligned.

 

For Lucene, it may also be an optimization to make full-text indexing always asynchronous, i.e. disabling in-transaction content indexing.

12.1      Indexing Performance Golden Rules

  • Have local indexes (don’t use shared folders or NFS) and use fast hardware (RAID, SSD, ...)
  • Tune the mergeFactor: 25 is ideal for indexing, while 2 is ideal for search.
  • Tune your RAM buffer size (ramBufferSizeMB) in solrconfig.xml; 32 MB by default
  • Analyze your indexing processes (check Alfresco repository health)
  • Tune the transformations that occur on the repository side; set a transformation timeout.
  • Closely monitor the Solr JVM (especially GC and heap usage)
  • Enable GC logs, analyze GC performance, tune the GC algorithm
  • Do you really need tracking to happen every 15 seconds?
  • Use a dedicated tracking Alfresco instance; there are several architecture options
  • Use Solr 4 sharding capabilities to divide your index into smaller chunks
  • Increase your index batch counts to get more results per call to the indexing webscript
    • In each core's solrcore.properties, raise the batch count to 2000
  • Impacting factors in indexing:
    • JVM memory and CPU usage on the repository layer (text extraction/transformations)
    • JVM memory, CPU, disk I/O and disk cache size on the Solr layer
    • Number of threads for indexing, Solr caches
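For the “enable GC logs” item above, a hedged sketch of JVM options for a HotSpot JVM of the Java 7/8 era (contemporary with the Alfresco versions discussed; the log file path is an invented example):

```
# Example GC-logging flags for the Solr JVM (pre-Java-9 HotSpot syntax)
JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/solr-gc.log"
```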

 

ramBufferSizeMB

ramBufferSizeMB sets the amount of RAM that may be used by Solr indexing for buffering added documents and deletions before they are flushed to disk. Increasing this to 64 or even 128 has generally proven to increase performance, but this depends on the amount of free memory you have available.
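As a sketch, in solrconfig.xml these two settings sit together (Solr 1.4/4-era index defaults; the values below are examples for bulk indexing, not universal defaults):

```
<!-- solrconfig.xml sketch -->
<indexDefaults>
   <ramBufferSizeMB>64</ramBufferSizeMB>
   <mergeFactor>25</mergeFactor> <!-- favours bulk indexing; lower values favour search -->
</indexDefaults>
```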

 

Analyze your Indexing process

During the indexing process, plug in a monitoring tool (e.g. YourKit) to check the repository's health. Sometimes, during indexing, the repository layer executes heavy, IO/CPU/memory-intensive operations, like transforming content to text in order to send it to Solr for indexing. This can become a bottleneck when, for example, the transformations are not working properly or the GC cycles are taking a lot of time.

 

Don't forget to tune your Solr garbage collector

Solr operations are memory intensive, so tuning the garbage collector is an important step to achieve good performance. jClarity's Censum tool is really good for analyzing GC logs, but there are others.

 

12.2 - Solr tracking frequency

Consider whether you really need tracking to happen every 15 seconds (the default). This can be configured in the Solr configuration files via the cron frequency property.

 

alfresco.cron=0/15 * * * * ? *

 

This property can heavily affect performance, for example during bulk ingestion of documents or during a Lucene-to-Solr migration. You can change it to 30 seconds or more while you are re-indexing.

This will allow more time for the indexing threads to perform their work before more arrives in their queue.

Also increase your index batch count to get more results per call to the indexing webscript on the repository side: in each core's solrcore.properties, raise the batch count to 2000 or more (alfresco.batch.count=2000).
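Taken together, a sketch of solrcore.properties during a re-index (revert these afterwards; the 30-second cron and batch count of 2000 are the example values from above):

```
# solrcore.properties sketch while re-indexing
alfresco.cron=0/30 * * * * ? *
alfresco.batch.count=2000
```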

 

12.3 - Make sure you have a large disk cache

For index updates, Solr relies on fast bulk reads and writes. One way to satisfy these requirements is to ensure that a large disk cache is available. Use local indexes and the fastest disks possible.

 

In a nutshell, you want to have enough memory available in the OS disk cache so that the important parts of your index, or ideally your entire index, will fit into the cache. Let’s say that you have a Solr index size of 8GB. If your OS, Solr’s Java heap, and all other running programs require 4GB of memory, then an ideal memory size for that server is at least 12GB. You might be able to make it work with 8GB total memory (leaving 4GB for disk cache), but that also might NOT be enough.

 

12.4 - Re-Indexing is always possible

Over time, searches conducted in Alfresco against metadata or content may exhibit slower performance or return erroneous results. Missing, corrupted or stale indexes, caused by certain environmental events or problematic customizations, contribute to this behavior. A full reindex is recommended as part of an Alfresco support regimen to ensure a quality search experience. The frequency of a full reindex will depend upon the usage pattern of Alfresco. To assess the state of the current indexes, Alfresco offers administrator tools and interfaces for Solr and, to a lesser extent, for Lucene.

 

When needed, the re-indexing (for Lucene or Solr) can be done in an environment parallel to the main production system (no need to configure a cluster or to use or affect the main production servers).

 

12.5 - Keeping your index small

Housekeeping and tuning your indexes is an important part of application maintenance.

 

Removing the node types that are not being searched (and therefore do not need to be indexed) can have a positive impact on the Solr Index performance.

 

You need to execute a database query to check the number of nodes per type that exist in the Alfresco database.

You should get a table similar to this one:

 

 

I recommend running a database query that counts the number of nodes per type, building a table like the one above, and deciding on a list of node types that can be blacklisted. Removing all nodes of such types from the Solr index will drastically lower the index size, improving indexing, bulk ingestion and query operations.
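A hedged sketch of such a query against the Alfresco database schema (the alf_node/alf_qname join below matches the 4.x-era schema; verify table and column names for your version before running it):

```
-- Sketch: count nodes per type in the Alfresco database
SELECT q.local_name AS node_type, COUNT(*) AS node_count
FROM alf_node n
JOIN alf_qname q ON n.type_qname_id = q.id
GROUP BY q.local_name
ORDER BY node_count DESC;
```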

 

Preventing specific node-types from being indexed

Each Alfresco node carrying the cm:indexControl aspect with the cm:isIndexed property set to false will not be processed/indexed by Solr. This logic should be applied to any Alfresco node that is not intended to be searchable; it will keep the Solr dataset smaller and better performing.

 

Sample list:

  • cm:thumbnail
  • ver2:versionHistory
  • sys:deleted

 

Nodes may also be blacklisted based upon aspects (e.g. sys:hidden) by implementing a simple Alfresco behaviour. The behaviour below forces the use of the cm:indexControl aspect for all Alfresco nodes that are not meant to be searchable. This code can drastically lower the Alfresco Solr index size.

 

Pseudo code example can be found below.

 

import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.alfresco.model.ContentModel;
import org.alfresco.repo.node.NodeServicePolicies;
import org.alfresco.repo.policy.Behaviour;
import org.alfresco.repo.policy.Behaviour.NotificationFrequency;
import org.alfresco.repo.policy.JavaBehaviour;
import org.alfresco.repo.policy.PolicyComponent;
import org.alfresco.service.cmr.repository.ChildAssociationRef;
import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.NodeService;
import org.alfresco.service.namespace.NamespaceService;
import org.alfresco.service.namespace.QName;

public class Unindex implements NodeServicePolicies.OnCreateNodePolicy {

    private static final List<QName> BLACKLISTED_TYPES = new ArrayList<QName>();
    static {
        BLACKLISTED_TYPES.add(ContentModel.TYPE_THUMBNAIL);
        BLACKLISTED_TYPES.add(ContentModel.TYPE_FAILED_THUMBNAIL);
        // .... more types here
    }

    private PolicyComponent policyComponent; // injected via Spring
    private NodeService nodeService;         // injected via Spring
    private Behaviour onCreateNode;

    public void init() {
        // Bind the behaviour to node creation, fired on transaction commit
        this.onCreateNode = new JavaBehaviour(this, "onCreateNode",
                NotificationFrequency.TRANSACTION_COMMIT);
        QName onCreateNodeQName = QName.createQName(NamespaceService.ALFRESCO_URI, "onCreateNode");
        for (QName type : BLACKLISTED_TYPES) {
            this.policyComponent.bindClassBehaviour(onCreateNodeQName, type, this.onCreateNode);
        }
    }

    public void onCreateNode(ChildAssociationRef childAssocRef) {
        // Flag every newly created node of a blacklisted type as not indexed
        NodeRef childRef = childAssocRef.getChildRef();
        Map<QName, Serializable> props = new HashMap<QName, Serializable>();
        props.put(ContentModel.PROP_IS_INDEXED, Boolean.FALSE);
        nodeService.addAspect(childRef, ContentModel.ASPECT_INDEX_CONTROL, props);
    }
}

 

Removing specific node-types from the index

 

For existing nodes, a JavaScript action can be implemented to apply this logic across the entire repository (by recursively navigating the repository tree from the root down).

 

The next code snippet shows the logic, applied recursively from a single starting node:

var BLACKLISTED_TYPES = [
    "{http://www.alfresco.org/model/content/1.0}thumbnail",
    "{http://www.alfresco.org/model/content/1.0}failedThumbnail"];

function needsIndexControl(node)
{
    return BLACKLISTED_TYPES.indexOf(node.type) >= 0;
}

function controlIndexing(node)
{
    if (node.isContainer)
    {
        // Recurse into the children of container nodes
        for each (n in node.children)
        {
            controlIndexing(n);
        }
    }
    else if (node.isDocument && needsIndexControl(node))
    {
        // Exclude blacklisted documents from the index
        var props = new Array();
        props["cm:isIndexed"] = false;
        node.addAspect("cm:indexControl", props);
        node.save();
    }
}

 

12.6 - Troubleshooting indexing problems

 

Database – If it’s a database performance issue, adding more connections to the connection pool can normally increase performance.

I/O – IO problems typically occur when using virtualized environments. You should use hdparm to check read/write disk speed if you are running on a Linux-based system (there are also variations for Windows). Example: sudo hdparm -Tt /dev/sda

 

The rule for troubleshooting is: test and measure initial performance, apply some tuning and parameter changes, then retest and measure again until you reach the necessary performance.

Plug in a profiling tool such as YourKit to both the repository and Solr servers to help with the troubleshooting.

 

Tools to work with your indexes

 

It does not matter whether you are using the legacy Lucene index subsystem (pre-4.x) or SOLR (4.x+): the underlying index is still a Lucene index, and the same tools can be used to check its health.


One tool that is normally used to check the indexes is Luke. Luke versions are important, as they track Lucene versions. If you are going to use Luke to make any changes to your index (NOT recommended for anything other than testing things in a dev sandbox!), it is important to use the right version of Luke, or your index may no longer be readable by Alfresco. If you are just going to inspect a copy of your index, the latest version is recommended.

 

There are some best practices to be followed when using Luke.

 

  • ALWAYS work against a copy of your index, never the actual running index
  • Luke versions can be important

 

What can we learn from our index w/ Luke?

 

  • Most common terms in our index
  • Total document count and size
  • Is the index optimized?
  • How are properties tokenized?
  • What properties are stored in the index?
  • GREAT tool for Lucene query development

 

The CheckIndex tool is bundled with each Lucene distribution; note that it may take a lot of time to check a large index. The -fix option may delete documents from the index; re-indexing or restoring from backup is usually a better strategy. Regardless of what tools you use to check your index, ALWAYS work against a backup, never the live index! It may be tempting to try the -fix option. Not recommended.

 

13 - Solr Best Practices - Tuning cores for search

We can also tune our Solr cores to be more responsive to search operations (sometimes at the expense of indexing performance).

 

13.1 - When applicable, disable full text indexing

If full-text search is not being used at all and all the searches currently being performed are metadata searches, you can disable the full-text indexing features, saving important resources and keeping a much smaller index.

We can disable all full-text indexing activities and tune our search layer for performance on metadata-based searches, making use of the transactional metadata queries feature of Alfresco.

To disable full-text search we need to configure the workspace://SpacesStore Solr core. Edit its solrcore.properties file and set the following property:

  • index.transformContent=false

This will prevent full text indexing. You will need to restart the server after performing this change.
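If you go down the metadata-only route, you can also steer eligible queries to the database with Alfresco's transactional metadata query settings in alfresco-global.properties (a sketch; check the documentation of your specific version for supported values):

```properties
# run metadata-only queries against the database when possible
solr.query.fts.queryConsistency=TRANSACTIONAL_IF_POSSIBLE
solr.query.cmis.queryConsistency=TRANSACTIONAL_IF_POSSIBLE
```

Queries that cannot be expressed against the database still fall through to the index, so this setting is safe to combine with the full-text indexing change above.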

 

13.2 - If applicable, Disable the solr archive core

If you are not planning to search for deleted content, you can safely disable the indexing of archived content.

13.2.1 - Disabling the archive core backup scheduled task

If you disabled the archive core you also need to disable the archive core backup scheduled task. We do this by setting the cron expression to a date in the very far future. You should do this on every Alfresco node (including any existing tracking instance):

# disabling the archive backup as we are not using archive search

solr.backup.archive.cronExpression=* * * * * ? 2199

solr.backup.archive.numberToKeep=0

 

13.3 - Optimize your queries

In general you should only worry about this when optimization is needed. For those cases you may want to try variations of your query that put the burden of some filtering or ordering on “your side” rather than on the search engine. In particular, you may want to try executing your search as the system user and afterwards filter the results based on the user's role or permissions. You may find considerable improvements for certain cases.

Also, if your search is basically around specific metadata values that will return a couple of results that may need to be filtered according to some complicated logical condition (potentially including many ORs), try to move that filtering to your custom code and just execute the search for the specific metadata values.

 

13.4 - Search simplification – avoid many ORs on your searches

If your query is slow, consider how well your model fits the queries you need to make. Possibly introducing some level of typification (“search for contracts”) will help you limit the scope of your queries and make them faster. Sometimes content models are created too detached from the final real business usage, which ends up leading to conditions combining many ORs over metadata. By leveraging aspects you can often avoid that and get better performance on your searches.

 

13.5 - Don’t hide Index corruptions

If your searches in your custom code are returning non-existent nodes, your indexes are corrupted and you should not try to hide that. Don’t silently remove the non-existent nodes from your search results in your custom code (without at least logging the problem). You would be hiding your index corruption problems from administrators, and the unhealthy state of your indexes will end up hitting the performance of your system sooner rather than later, besides functionally affecting other parts of your code.

Review your code, especially around silent catches of exceptions that are most probably generating the corruption of your indexes, and handle the root cause of the index corruption directly. And remember that you will also need a full re-indexing of your system afterwards.

 

13.6 - Search Performance Golden Rules

  • Use local folders for the indexes (don’t use shared folders such as NFS)
  • Use fast hardware (RAID, SSD, ...)
  • Tune the mergeFactor; a mergeFactor of 2 is ideal for search.
  • Increase your query caches and the RAMBuffer.
  • Avoid path search queries; those are known to be slow.
  • Avoid using sort; you can sort your results on the client side using JS or any client-side framework of your choice.
  • Avoid * searches, avoid ALL searches
  • Tune your garbage collector policies and JVM memory settings.
  • Consider lowering the memory on the JVM if the total heap that you are assigning is not being used. Big JVM heap sizes lead to bigger GC pauses.
  • Get the fastest CPU you can; search is CPU intensive rather than RAM intensive.
  • Optimize your ACL policy: re-use your permissions, use inheritance and use groups. Don’t set up specific permissions for users or groups at folder level. Try to re-use your ACLs.
  • Upgrade your Alfresco release with the latest service packs and hotfixes. Those contain the latest Solr improvements and bug fixes, which can have a great impact on the overall search performance.
  • Make sure you are using only one transformation subsystem. Check alfresco-global.properties and see that you are using either OOoDirect or JodConverter; never enable both subsystems.
  • Use Solr 4 sharding capabilities
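As an illustration of the mergeFactor and RAM buffer advice above, these knobs live in each core's solrconfig.xml (the values below are a hedged example, not a universal recommendation; measure before and after changing them):

```xml
<indexConfig>
  <!-- a low mergeFactor favours search speed over indexing throughput -->
  <mergeFactor>2</mergeFactor>
  <!-- a larger RAM buffer means fewer segment flushes while indexing -->
  <ramBufferSizeMB>64</ramBufferSizeMB>
</indexConfig>
```

A mergeFactor of 2 keeps the index in very few segments, which speeds up searches but makes indexing pay the merge cost more often.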

 

14 - Security Best Practices

14.1 - Perimeter Security

Make sure you set up firewalls properly. If you want your repository to be unavailable for direct end-user access, remember that Share has a proxy that gives end users access to the repository REST API. This may be especially problematic if you have customizations in place that execute as administrator on behalf of normally authenticated end users. You may want to limit access from outside your local network to only certain parts of the remote API.

 

14.2 - Default Passwords

Apart from changing the default admin password, remember to change the DB and JMX access passwords.

 

14.3 - Run alfresco as a non-privileged user

Running Alfresco as a non-root user protects the system in case of a compromised or failing application, and gives you greater control over operating-system resource consumption. It is important to configure all Alfresco ports above 1024, because of the restriction preventing non-root users from binding to privileged ports on Linux-like systems.

 

14.4 - Installation files and Permissions

In order to prevent unwanted access to the main and critical components, it is highly recommended to apply a restrictive set of permissions to the files and folders containing critical information. Set permissions for configuration files, the content store, indexes and logs. Only the user running Alfresco should be able to access those folders.
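A hedged shell sketch of such a lockdown (the paths and the alfresco user/group below are placeholders; adjust them to your installation layout):

```shell
# example only: adjust paths, user and group to your installation
chown -R alfresco:alfresco /opt/alfresco/alf_data /opt/alfresco/tomcat/shared
chmod -R 700 /opt/alfresco/alf_data            # content store and indexes
chmod 600 /opt/alfresco/tomcat/shared/classes/alfresco-global.properties
```

alfresco-global.properties deserves the tightest permissions, since it typically contains the database credentials.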

 

14.5 - Consider using SSL for all Alfresco services

There is no doubt that it is highly recommended to use the SSL variants of all activated services. HTTPS can be enabled in three different ways: adding an appliance supporting SSL offloading to your network, activating HTTPS on a front-end web server (Apache, IIS, etc.), or configuring HTTPS on the application server itself, the first option being the most optimized one. The same can be done for the FTP SSL variant; refer to the official documentation for details on how to configure it. Implementing SSL on the SharePoint protocol for Alfresco should be considered compulsory, because it will help you avoid issues on the client side, mostly for MS Office on Windows XP, Windows 7 and Mac OS X.

 

Check the official documentation to learn how to enable it.  SMTPS is also supported by Alfresco for both incoming (to the repository) and outgoing (to an MTA) email.  The IMAP and JGroups implementations in Alfresco do not support SSL variants, but there are other ways to put them behind SSL. Additionally, in case you use external or third-party authentication, remember to encrypt the communication between Alfresco and those servers. For the transfer service, avoid using plain HTTP and an admin-privileged user.

 

14.6 - Disable the guest user

Disable guest access, depending on the kind of authentication chosen, and limit the number of users, or specify the nominated users that can get access to the repository.
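With the default alfrescoNtlm authentication subsystem this is a single property in alfresco-global.properties:

```properties
# disable guest logins for the alfrescoNtlm authentication subsystem
alfresco.authentication.allowGuestLogin=false
```

Other authentication subsystems (LDAP, Kerberos, external) have their own guest-access settings, so check the configuration of every member of your authentication chain.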

 

14.7 - Limit the users and state of the repository

Consider limiting the number of users that can access the repository, allowing only certain nominated users, or changing the state of the repository (read-write or read-only) to increase security or to prevent changes at certain moments.
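Both limits can be sketched in alfresco-global.properties (the user list below is a placeholder; check your version's documentation for the exact behaviour of these properties):

```properties
# cap concurrent users and restrict logins to a nominated list
server.maxusers=100
server.allowedusers=admin,jsmith,mjones
```

Leaving server.allowedusers empty allows all users, so only set it when you genuinely want a closed list.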

If you are synchronizing users from your LDAP, make sure you only synchronize the branches that you really expect to contain Alfresco users.

 

14.8 - Session Timeout

Set timeouts at your convenience for WebDAV, Share and CIFS.

 

15 - Development Best practices

15.1 - Own your code

If you outsourced the custom development of your solution to somebody else (and maybe you don’t even have the source code with you...), or your solution is considered complete and you don’t have the original developers available anymore, you are probably in trouble.

If you think that from now on all you need is to properly administer your solution, and that technical ownership of the custom code is not needed, you are wrong. Your custom code was not developed by Alfresco, and most probably not by your administrators, and custom code needs to be validated, improved and reviewed over time (during upgrades, changing requirements, detected bugs).

Make sure your company “owns” the custom code it is using, meaning, has internal (or external from a close and trustful collaborator/partner) knowledge and control over it.

 

15.2 - Use Version control

Always use a version control system for your code and configuration customizations. Unfortunately this very basic principle is still ignored many times by some small development teams, but the truth is that even individual developers working alone should use one.

You can use any version control system you prefer (Subversion, Git, Mercurial, etc.), but use one. And make regular, in general centralized, backups of your version control system, whatever it is. Nowadays you also have wonderful free options available in the cloud (Google Code, GitHub, etc.).

 

15.3 - Unit Tests

It's considered more than a best practice to include, as part of your development cycle, the creation and execution of unit tests that exercise your solution's individual pieces of code, and also functional tests that cover broader scopes of your solution's functionality. Those tests can be automated, coupled with load tests (benchmarks), and become part of your continuous integration development cycle.

 

15.4 - Avoid huge methods and classes

This is completely generic advice, and a very basic one, not specifically related to Alfresco at all. But we have seen Alfresco custom code hit hard by this terrible practice so many times that it’s worth mentioning here as well. Methods and classes should be small. A method should do just one thing, without hundreds of lines. Classes should not do unrelated things and turn themselves into bags of methods and unstructured code. If your code does not comply with this very basic rule, it will be unmaintainable and will suffer from bigger problems than any Alfresco-specific programming advice may be able to solve.

Also in the context of generic advice: avoid repeated code, use proper names for classes and variables, etc...

 

15.5 - Close your streams and ResultSets

This is fairly basic advice that you must follow. When you search in Alfresco you get ResultSet objects that work much the same way as the old JDBC ResultSet ones, and demand proper closing after use. So you should organize your code properly with try-catch-finally to close those resources. With Solr, closing result sets is less critical, but it should still be done for Lucene compatibility. The same also applies to any streams you may have in your code, just as in any Java code.

As a side note, please avoid code with many nested try-catch-finally blocks. There’s no reason for a method’s code to have more than one try-catch-finally. If you have many nested ones, you are probably doing too many things in the same method and your code will be much harder to maintain.
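A minimal sketch of the closing pattern, assuming the usual Alfresco SearchService and an FTS query (this is illustrative Alfresco-API code, not runnable outside a repository context):

```java
import org.alfresco.service.cmr.repository.StoreRef;
import org.alfresco.service.cmr.search.ResultSet;
import org.alfresco.service.cmr.search.SearchService;

public class SearchExample {
    private SearchService searchService; // injected via Spring

    public int countResults(String ftsQuery) {
        ResultSet results = null;
        try {
            results = searchService.query(StoreRef.STORE_REF_WORKSPACE_SPACESSTORE,
                    SearchService.LANGUAGE_FTS_ALFRESCO, ftsQuery);
            return results.length();
        } finally {
            if (results != null) {
                results.close(); // always release the underlying resources
            }
        }
    }
}
```

The single finally block guarantees the result set is closed whether the iteration succeeds or throws.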

 

15.6 - Different solutions can be in different repositories

If you are implementing very different and independent solutions, each with a considerable repository size, you should consider splitting your repository into more than one. Having a dedicated repository for each solution gives you a better chance to optimize it for the different usage each solution has.

As an example, having a typical medium-large collaboration deployment of Alfresco share the same repository as the backend of a high-usage custom solution may lead you into trouble that would be easier to avoid with split repositories. Besides, maintaining different solutions on the same repository can be a headache from an upgrade point of view that you don't necessarily need to have.

 

15.7 - You don’t need to “re-invent the wheel”

Alfresco comes with a very powerful REST API out of the box, and also CMIS support. Consider carefully whether you really need that extra custom webscript.  It will be one more piece of API for you to support directly, and you might be just as well (or probably better) served by what is available out of the box and supported by Alfresco.

Try to use the Alfresco Public API as much as possible, since it’s the one built with version maintainability in mind and it will possibly make your application compatible with both Alfresco Cloud and On-Premise. If CMIS is good enough for you, even better: your solution will be compatible with any major ECM. Only when functional or performance requirements leave you with no other choice should you consider building your own REST API.

Don't create a personal workflow engine for some BPM-type functionality just because you don’t want to spend time learning a new technology (most importantly BPMN 2.0 for Activiti). These are usually easier to learn than you think, and it will pay back in the future, especially compared with the maintenance headaches your own code may bring when new change requirements come up.

Check Alfresco Add-Ons and the Alfresco Partner Solution Showcase; if your problem is generic enough, it might already be solved by the community or by our partner network. Consider giving your own code back to the community as well: if the problem your code solves is generic enough to interest others, your code will also become eligible for improvements coming from others.

 

15.8 - Avoid deprecated or soon to be deprecated components

You shouldn't start any new development using any of the following components:

  • Alfresco old AVM WCM (deprecated)
  • JCR (use Open CMIS instead)
  • Old Soap API (use Rest APIs: CMIS and Public API or custom Web Scripts)

 

If your project is already based on customizations using one of those components, you should consider a review asap. (In any case, bear in mind that the only component which right now is really deprecated is the first mentioned: Alfresco AVM WCM.) 

 

15.9 - Use transactions Properly

Avoid explicit transaction handling. You would have to add try-catch-finally blocks to handle rollback and commit explicitly, your code would get harder to read and maintain, and you would make mistakes more easily. Alfresco provides an alternative that encapsulates the transaction for you in a much nicer way: take a look at RetryingTransactionCallback usage.

 

If you are managing many transactions in a loop, make sure you really create a new transaction for each iteration and don't end up using the same transaction all the time. This can easily happen by not paying attention to the extra boolean arguments on transaction creation. With RetryingTransactionCallback this is controlled by the extra boolean arguments of the doInTransaction method.

Finally, if your transactions are read-only you should also create them as such. With RetryingTransactionCallback this is again controlled by the extra boolean arguments of the doInTransaction method. For webscripts, make sure to mark them as read-only in the webscript configuration if it applies.
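A hedged sketch of the RetryingTransactionHelper pattern described above (the helper is normally obtained from the ServiceRegistry; the work inside the callback is illustrative):

```java
import org.alfresco.repo.transaction.RetryingTransactionHelper;
import org.alfresco.repo.transaction.RetryingTransactionHelper.RetryingTransactionCallback;

public class TxExample {
    private RetryingTransactionHelper txnHelper; // injected, e.g. from ServiceRegistry

    public void doWorkInNewReadWriteTxn() {
        RetryingTransactionCallback<Void> work = new RetryingTransactionCallback<Void>() {
            public Void execute() throws Throwable {
                // repository work goes here; retried automatically on concurrency failures
                return null;
            }
        };
        // arguments: callback, readOnly=false, requiresNew=true
        txnHelper.doInTransaction(work, false, true);
    }
}
```

The two booleans are exactly the ones the text warns about: readOnly marks the transaction read-only, and requiresNew forces a fresh transaction instead of joining an existing one.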

 

15.10 - Be Careful with Exception Handling

You have to be careful with exception handling, especially if you don't guarantee an exception is re-thrown after handling. In Alfresco (if you are using Lucene) you may end up with inconsistent indexes that contain results rejected by the database (your transaction didn't roll back after a database error). If your searches are returning nodes that don't exist, most probably your code is already harming your indexes.

In any case, you should consider exception handling a terrible practice if you are using it for flow control (if an exception happens here do this, if it happens there do that...), or if you are not re-throwing your exceptions (except in those very simple, ignorable cases where there is nothing you can do about the exception and it has no important impact).

 

15.11 - Leverage a custom ContentModel class for custom property QNames

Alfresco exposes a ContentModel class providing access to all the out-of-the-box content model QName objects for types, aspects and properties. You should use it instead of instantiating the QNames yourself all the time. You should also consider creating and using a similar class containing the QName constants that correspond to your custom model.
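Such a constants class for a hypothetical custom model could look like this (the namespace URI, type, aspect and property names below are all placeholders):

```java
import org.alfresco.service.namespace.QName;

// QName constants for a hypothetical custom model, mirroring Alfresco's own ContentModel
public interface MyContentModel {
    String MY_MODEL_URI = "http://www.example.com/model/content/1.0"; // placeholder URI

    QName TYPE_CONTRACT   = QName.createQName(MY_MODEL_URI, "contract");
    QName ASPECT_REVIEWED = QName.createQName(MY_MODEL_URI, "reviewed");
    QName PROP_EXPIRY     = QName.createQName(MY_MODEL_URI, "expiryDate");
}
```

Referencing MyContentModel.PROP_EXPIRY everywhere avoids both repeated QName.createQName calls and typo-prone string literals scattered through the code.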

 

15.12 - Don’t run everything as system or admin

For most cases you will want to leverage Alfresco's permission control in order to control access to your content per user, group, etc. So, for normal end users, you generally don't need to call remote APIs that end up executing as the system user or as an administrator. This is problematic from a security point of view and may indicate a poor permission-control design in your application.

And please remember that the administration user of your repository may change password and doesn't necessarily need to always be “admin”... So when running as administrator, use the Alfresco AuthenticationUtil class helper methods, and don't rely on authenticating with hard-coded credentials in your code.
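The helper-method approach, sketched (runAs and the RunAsWork callback are part of the real AuthenticationUtil API; the unit of work here is illustrative):

```java
import org.alfresco.repo.security.authentication.AuthenticationUtil;
import org.alfresco.repo.security.authentication.AuthenticationUtil.RunAsWork;

public class RunAsExample {
    public String doPrivilegedWork() {
        // execute a unit of work as the system user, without any hard-coded credentials
        return AuthenticationUtil.runAs(new RunAsWork<String>() {
            public String doWork() throws Exception {
                // privileged repository work goes here
                return AuthenticationUtil.getFullyAuthenticatedUser();
            }
        }, AuthenticationUtil.getSystemUserName());
    }
}
```

Outside the callback, the original user's authentication is restored automatically, so the elevated scope stays as narrow as possible.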

 

15.13 - Don’t go threading on your own

If you need parallel code, don't roll your own threading. Alfresco has a thread pool that you should use for your asynchronous/parallel code (the ThreadPoolExecutor bean in Alfresco is defaultAsyncThreadPool).
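Wiring that pool into a custom bean is plain Spring injection (MyWorker, its package and its property name are hypothetical; defaultAsyncThreadPool is the bean named in the text):

```xml
<bean id="myWorker" class="com.example.MyWorker">
    <!-- re-use Alfresco's pool instead of creating your own threads -->
    <property name="threadPoolExecutor" ref="defaultAsyncThreadPool"/>
</bean>
```

Your bean then submits Runnables to the injected executor rather than calling new Thread() itself.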

 

15.14  - Scheduled Jobs should use JobLockService

If you are implementing scheduled jobs, you should use the Alfresco JobLockService in cluster scenarios. This guarantees that your jobs are not executed more often than they should be, nor simultaneously on different nodes of the cluster (and without the extra maintenance overhead of switching some scheduled jobs on in some nodes and off in others, which also has the drawback of removing HA for your scheduled job).
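A hedged sketch of the lock-then-work pattern (the lock name and time-to-live are illustrative; JobLockService.getLock throws LockAcquisitionException when another node already holds the lock):

```java
import org.alfresco.repo.lock.JobLockService;
import org.alfresco.repo.lock.LockAcquisitionException;
import org.alfresco.service.namespace.NamespaceService;
import org.alfresco.service.namespace.QName;

public class MyClusterSafeJob {
    private static final QName LOCK =
            QName.createQName(NamespaceService.SYSTEM_MODEL_1_0_URI, "myScheduledJob");
    private JobLockService jobLockService; // injected

    public void execute() {
        String token;
        try {
            token = jobLockService.getLock(LOCK, 30000L); // 30s time-to-live
        } catch (LockAcquisitionException e) {
            return; // another cluster node is already running this job
        }
        try {
            // ... the actual job work ...
        } finally {
            jobLockService.releaseLock(token, LOCK);
        }
    }
}
```

Every node can keep the same Quartz schedule; whichever node wins the lock does the work, which preserves HA for the job.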

 

15.15 - Use application level cache if needed

If you can easily cache results at your custom application level and avoid hitting your Alfresco platform needlessly, do it. If it’s not easy, or it’s not clear that you need it, wait until there’s a real reason for doing it, since an application-level cache will add another level of complexity to your system.

 

15.16 - Foundational service beans (Avoid lowercase beans)

Avoid using the lowercase-identified beans (nodeService, searchService, etc.) when injecting the Foundational Services into your custom beans. You should use the uppercase ones instead: NodeService, SearchService, etc. The latter are the ones that properly wrap the necessary transaction and security checks.

The “lower case” versions of the Alfresco service beans (i.e. those whose name starts with a lowercase letter, e.g. nodeService) are configured to bypass Alfresco’s security, transaction and auditing checks, with no recourse for the administrator to turn them back on. There have been persistent (but incorrect) rumors in the Alfresco community that these versions of the services perform significantly better than the official (“upper case”) versions, but that hasn’t been the case since at least Alfresco v2.x.

 

15.17 - Use setProperties instead of many setProperty calls

The Alfresco NodeService possesses 2 “equivalent” methods for setting a single property (setProperty) or many properties (setProperties). In general you should use the latter for setting many properties on the same node, in place of calling the setProperty method many times. This is not only a cosmetic question: depending on the behaviours bound to the OnUpdateProperties policy, each individual setProperty call can trigger its own round of policy firing, while a single setProperties call does it once.
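A sketch of the batched approach (note that setProperties replaces the node's full property map, so the existing properties are read first; the two properties chosen here are just an example):

```java
import java.io.Serializable;
import java.util.Map;

import org.alfresco.model.ContentModel;
import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.NodeService;
import org.alfresco.service.namespace.QName;

public class PropertiesExample {
    private NodeService nodeService; // injected

    public void updateTitleAndDescription(NodeRef node, String title, String description) {
        // one read, one write, one round of policy firing
        Map<QName, Serializable> props = nodeService.getProperties(node);
        props.put(ContentModel.PROP_TITLE, title);
        props.put(ContentModel.PROP_DESCRIPTION, description);
        nodeService.setProperties(node, props);
    }
}
```

Two separate setProperty calls would achieve the same end state, but with twice the policy and transaction overhead.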

 

15.18 - Transfer, Copy, Move, Inject

If you have to copy or move content from one repository to another, you should be aware that Alfresco has a Transfer Service meant for this, so there’s no need to reinvent the wheel in an unsupported way and most probably with worse performance.

There is also the danger of considering different repositories when the same work could be accomplished by just one properly sized repository, with no need for transfer, just an internal copy or move. (Don't optimize first, but only when needed.)

And worst of all, don’t use remote services to make internal copies, like trying to use CIFS, WebDAV, FTP or custom webscripts to copy content internally... We have seen this done on some occasions and it just doesn't make sense.

Also, if you need to massively inject content into a repository, you should avoid using protocols like CIFS (and only eventually FTP). There are better options, like the out-of-the-box massive in-place bulk import, and even CMIS and Java-based custom webscripts if really needed.

 

15.19 - Authentication tickets

Authentication ticket expiration can be configured so you don't need to create a new ticket for every webscript request of the same user; you can keep tickets in memory at your custom application layer for reuse. You can even configure authentication tickets never to expire, or set them to expire after a specific time if you need it.

If you need it, you can also persist the tickets on disk so that they are reloaded after a restart (and so truly never expire). For Alfresco to persist the tickets you will need to add diskPersistent="true" to the cache configuration of ticketsCache.

 

15.20 - Rules and Policies

In general, a good rule of thumb is to base yourself on the “space-time” scope.

Meaning: if your automation is meant to be executed everywhere in the repository (all spaces) or almost everywhere (for a certain type of content), and to be executed “always” (you won’t need to easily activate or deactivate the automation), then you are most probably talking about a policy. On the opposite side, if your automation is to be executed just once in a while, or only in a very limited space, then you are probably better off with a content rule.

In any case, make sure your policies (and even more so your rules) don’t always execute for every content or folder update in your repository. This hardly ever makes sense, and if your solution somehow needs it, it should be applied only to your custom model's root content or folder types, so that it does not contaminate other usage the repository may have beyond your solution.

As general advice, especially when implementing customizations that have a horizontal impact on the whole repository, consider very carefully the performance and scalability impact of your customization. This applies to policies and content rules, but also to customizations around Spring interceptors, including Acegi security customizations. You may be creating an enormous performance bottleneck if every action on the repository demands complex checks, communication with other systems, etc.

 

15.21 -  Attribute Service

Leverage system-wide state (including across cluster node members) through usage of the Attribute Service, with no need, for example, to create nodes storing the status values as properties.

 

15.22  - Don’t use println

You shouldn’t use System.out.println in your Alfresco custom code. Alfresco uses log4j for logging, and so should you. It makes your code more professional and your log levels easier to control.

 

15.23 - Use proper placeholder property names

For your solution's custom property placeholder names, leverage a “solution namespace” for clarity when configuring your specific solution parameters. For example, don’t use cleaner.expiryDates; if it’s something specific to your customization, better use solutionName.cleaner.expiryDates (an equivalent rule should be used for your bean ids). This way, all customization-specific parameters can easily be identified together in a block in the configuration files, separated from the rest of the Alfresco product configuration and other customizations in place, and, most importantly, they will not inadvertently conflict with other settings.

 

15.24 - Avoid Spring application-context-aware code

Avoid making your code Spring application-context aware by directly accessing Spring’s application context to get the beans you need.

If for some strange reason your legacy code lives outside the scope of Spring, there are mechanisms for accessing the Spring beans, in general using class variables on singletons that explicitly receive the proper injections from the Spring configuration.

 

15.25 - Don’t explicitly open connections to other databases

Alfresco database access is configured as a data source with a connection pool, set up and configured to optimize database connections and properly handle communication with the database. This is straightforward for J2EE applications, but we still see projects leveraging other databases that open explicit connections to those databases inside Alfresco custom code. You should set up your other databases as data sources too, and leverage connection pools for those databases just the same way Alfresco does with its own database.
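A minimal Spring sketch of such a pooled data source (Commons DBCP is one common choice; the driver, URL, credentials and pool size below are placeholders):

```xml
<bean id="myOtherDataSource" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close">
    <property name="driverClassName" value="org.postgresql.Driver"/>
    <property name="url" value="jdbc:postgresql://dbhost:5432/otherdb"/>
    <property name="username" value="appuser"/>
    <property name="password" value="secret"/>
    <!-- pooled connections instead of ad-hoc DriverManager.getConnection calls -->
    <property name="maxActive" value="20"/>
</bean>
```

Your custom beans then receive this data source by injection, exactly as Alfresco's own DAOs do with the repository database.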

 

16 - Alfresco Cache (Hazelcast / Hazelcast tuning)

Alfresco uses Hazelcast for clustering and caching. The database is now used as the central place of discovery for cluster nodes (UDP multicast discovery via JGroups is no longer used).

 

This means that if we want to remove a node from the cluster (for example the Alfresco tracking node) we need to specify alfresco.cluster.enabled=false in that node's properties file. The repository caches are separated into 2 different levels:

  • L1 = The transactional cache (TransactionalCache.java)
  • L2 = Hazelcast distributed Cache (>4.2.X)

The level 1 cache commits to L2 cache.

 

16.1 - Tracing Hazelcast usage

Transactional Caches (Level 1)

Alfresco version 5.0 introduced a way to trace transactional cache usage (much like the previous EHCache tracing mechanism). Unfortunately that tracing is not available in version 4.2.X; one way to get this feature would be to open a support ticket requesting a back-port.

Hazelcast caches (Level 2)

 

Using Hazelcast Management Center (mancenter) you can trace L2 cache usage; for more information check http://docs.alfresco.com/4.2/tasks/hazelcast-setup.html

 

16.2 - Hazelcast jmx options

Adding the following options to your JVM will expose the JMX features of Hazelcast:

-Dhazelcast.jmx=true -Dhazelcast.jmx.detailed=true

 

16.3 - Hazelcast factories in Alfresco

In Alfresco, Hazelcast works with factories that allow the creation of caches.


Cache-Factory - Allows for the creation of caches

Messenger-Factory - Abstraction over the Hazelcast topic (publish/subscribe messaging system)

LockStore Factory - Where in-memory locks are kept

 

16.4 - Defining your own caches

You can define your own caches as in the example below.

 

<bean name="contentDataSharedCache" factory-bean="cacheFactory" factory-method="createCache">

<constructor-arg value="cache.customContDataCache"/>

</bean>

 

cache.customContDataCache.maxItems=130000

cache.customContDataCache.timeToLiveSeconds=0

cache.customContDataCache.maxIdleSeconds=0

cache.customContDataCache.cluster.type=fully-distributed

cache.customContDataCache.backup-count=1

cache.customContDataCache.eviction-policy=LRU

cache.customContDataCache.eviction-percentage=25

cache.customContDataCache.merge-policy=hz.ADD_NEW_ENTRY

 

Note that there is no corresponding hazelcast-tcp.xml entry for the custom cache; the factory does all the configuration programmatically, using the cache name customContDataCache as a prefix to discover the remaining configuration properties.

 

16.5 - Hazelcast cache mechanisms

With Hazelcast the cache is distributed across the cluster members, achieving a more even distribution of memory usage.

In the Alfresco implementation you have several mechanisms available to define different cache cluster types.

16.5.1 - Fully Distributed

This is the normal value for a Hazelcast cache. Cache values (key-value pairs) will be evenly distributed across cluster members. It leads to more remote lookups when a get request is issued and the value is present on another (remote) node.

16.5.2 - Local cache (local)

You may not actually want some caches to be clustered (or distributed) at all, so this option works as an unclustered cache.

16.5.3 - Invalidating

This is a local (but cluster-aware) cache that sets up a messenger which sends invalidation messages to the remaining cluster nodes when you update an item in the cache. It can be useful to store something that is not serializable, because all the values stored in a Hazelcast cache must be serializable (that is the way Hazelcast sends the information to the member it is going to be stored on).

If you have a cache with an enormous number of reads and very rare writes, using an invalidating cache can be the best approach, very similar to the way EHCache used to work. This was also introduced because there were some non-serializable values in Alfresco that could not reside in a fully-distributed cache. 

 

The way we define the caches in our cache.properties file is as follows:

 

cache.aclSharedCache.tx.maxItems=40000

cache.aclSharedCache.maxItems=100000

cache.aclSharedCache.timeToLiveSeconds=0

cache.aclSharedCache.maxIdleSeconds=0

cache.aclSharedCache.cluster.type=fully-distributed

cache.aclSharedCache.backup-count=1

cache.aclSharedCache.eviction-policy=LRU

cache.aclSharedCache.eviction-percentage=25

cache.aclSharedCache.merge-policy=hz.ADD_NEW_ENTRY

 

Note the notion of cache backups (backup-count), which guarantees that a distributed cache keeps a specific number of backups; in case a node holding part of the cache dies, those cache entries are still accessible. The more backups you have, the more memory is consumed.

 

16.6 - Hazelcast known issues and limitations

The known issues and limitations we know about in Hazelcast are:

  • Hazelcast SSL and Hazelcast Encryption not supported
  • Server list doesn’t clean itself – users do this manually in admin console
  • Cluster enabled license installation requires restart
  • No way to protect against multiple non-clustered servers against DB
  • No easy/direct support for Hazelcast near-cache
  • IPv6 unknown quantity

With a Hazelcast Enterprise license, customers can get off-heap cache storage, avoiding garbage-collection pauses while still making large amounts of memory available to Hazelcast.

 

16.7 - Tuning Hazelcast

To perform a cache tuning exercise we need to analyze 3 relevant factors:

- type of data

- how often it changes

- number of gets compared to the number of writes

If we can identify caches whose values do not change often, it’s worth trying to set them to invalidating and checking the performance results.

Note that in distributed caches, when we have a lot of remote gets and the objects being stored are big, the remote get operation is going to be slow. This is mainly because the object is serialized, and it needs to be deserialized before its content is made available; that operation can take some time depending on the size of the object.

We also need to consider that in distributed caches, when there are a lot of remote gets, the network traffic will increase.

On the other hand, if we choose an invalidating cache mechanism and the caches change often, the invalidation messages can also be a point of network stress. So overall it’s all about analyzing the trade-offs of each mechanism and choosing the most appropriate one for each use case.

Outcomes