
Introducing Solr 6.3 and Alfresco Search Services

Blog Post created by gethin Employee on Dec 22, 2016

With the release of Alfresco 5.2, some exciting new changes are coming to Alfresco’s search capabilities. This post looks at some of the highlights of the new Alfresco Search Services 1.0, with a brief description of the major enhancements and most relevant features.

 

Alfresco Search Services Introduction

 

Welcome to Alfresco Search Services 1.0, based on Solr 6.3.

 

New Packaging / Execution

 

Apache Solr is an integral part of the Alfresco ECM solution, which is why a Solr.war has always been bundled with the Alfresco repository.  The Solr.war is a repurposed version of Solr with lots of Alfresco-specific goodies added in.  With Solr 5.0, the Apache Solr project decided to take control of how Solr runs: instead of relying on an application server, Solr became an independently runnable executable.

 

With Alfresco Search Services now based on Solr 6.3, we have picked up this change: you no longer deploy a Solr.war to your application server.  The Alfresco team also took the opportunity to repackage and refactor some of the search code, improve test coverage (up 30%) and improve performance.  SSL support, for example, has been enhanced.

 

Independent Releases

 

Independent packaging frees customers and the community from a monolithic release lifecycle.  New features and bug fixes can be released more quickly and more easily.  When new features are added to the Apache Solr project, we can now benefit from them sooner and with less disruption.  To mark this change, we decided to give the Solr artifact a new name: “Alfresco Search Services”.

 

New features and Enhancements

 

With the introduction of the new Alfresco Search Services, let's look at some of the major new features and enhancements.

 

Change Tracking

 

The Alfresco Trackers are the heart of the Alfresco Search solution, and the tracking logic has changed significantly in this release.  Some of the changes came out of the work on the 1 Billion Benchmark; others are the result of analysis and performance enhancements.  Indexing and committing new information to the index no longer waits for warming.  We have also eliminated some of the pausing that occurred when indexing and querying at the same time, and enhanced the commit tracking.  In particular, large indexes doing bulk loading should see significant improvements.

 

Sharding

 

Some larger repositories benefit from sharding the data across multiple cores. For an introduction to sharding, see the documentation.  You now have greater control over how your data is sharded.  

  • The default sharding method is to use the Alfresco node reference - this is the most common use case.  
  • If your repository makes extensive use of access controls then an ACL_ID approach may be most suitable.
  • If it makes sense to group your data by DATE then date-based sharding is now available.
  • If your use case doesn’t fit into any of the previous categories then you can choose any text-based property to shard on.  This is known as PROPERTY based sharding.
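To make the difference between the sharding options more concrete, here is a minimal Python sketch of the routing idea: pick a key for each node and map it to a shard number. It is only an illustration, not the routing code used by Alfresco Search Services. The function name `shard_for`, the node dictionary layout, the `NODE_REF` label for the default method, the MD5-based hash and the monthly grouping for DATE are all assumptions made for this example.

```python
import hashlib
from datetime import date


def _stable_hash(value: str) -> int:
    """Hash a string to a non-negative integer that is stable across runs."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for(node: dict, method: str, shard_count: int, prop: str = "") -> int:
    """Return a shard index in [0, shard_count) for a node (illustrative only)."""
    if method == "NODE_REF":      # hypothetical label for the default method
        key = str(node["node_ref"])
    elif method == "ACL_ID":      # nodes sharing an ACL end up on the same shard
        key = str(node["acl_id"])
    elif method == "DATE":        # assumption: group nodes by creation month
        created = node["created"]
        return (created.year * 12 + created.month - 1) % shard_count
    elif method == "PROPERTY":    # any text property chosen by the administrator
        key = str(node["properties"][prop])
    else:
        raise ValueError(f"unknown sharding method: {method}")
    return _stable_hash(key) % shard_count


node = {
    "node_ref": "workspace://SpacesStore/example-1",  # made-up node reference
    "acl_id": 7,
    "created": date(2016, 12, 22),
    "properties": {"cm:creator": "gethin"},
}
for method in ("NODE_REF", "ACL_ID", "DATE"):
    print(method, shard_for(node, method, shard_count=4))
print("PROPERTY", shard_for(node, "PROPERTY", shard_count=4, prop="cm:creator"))
```

The point of the sketch is the choice of key: whatever you shard on decides which documents sit together, so it is worth picking the method that matches how your users actually query.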

 

Fingerprints

 

It is now possible to generate a "fingerprint" of a document's content. This can be used to find similar documents. This feature makes use of Minhash work contributed by Alfresco to the Apache Lucene project.
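If you have not come across MinHash before, the following Python sketch shows the general idea: break the text into overlapping word groups (shingles), hash them several times, and keep the smallest hash from each round. Two documents with many shingles in common will agree on many of those minimum values. This is a toy version for illustration only; the shingle size, the number of hashes and the SHA-1 based hashing are assumptions of this example and do not describe the Lucene or Alfresco implementation.

```python
import hashlib


def shingles(text: str, size: int = 3) -> set:
    """Split text into overlapping groups of `size` consecutive words."""
    words = text.lower().split()
    if len(words) < size:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def fingerprint(text: str, num_hashes: int = 64, size: int = 3) -> list:
    """Return a MinHash signature: one minimum hash value per seeded hash."""
    parts = shingles(text, size)
    if not parts:
        raise ValueError("cannot fingerprint empty text")
    signature = []
    for seed in range(num_hashes):
        salt = str(seed).encode("utf-8")
        signature.append(min(
            int.from_bytes(hashlib.sha1(salt + s.encode("utf-8")).digest()[:8], "big")
            for s in parts
        ))
    return signature


def similarity(sig_a: list, sig_b: list) -> float:
    """Estimate similarity as the fraction of signature positions that agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


original = fingerprint("the quick brown fox jumps over the lazy dog near the river bank")
edited = fingerprint("the quick brown fox jumps over the lazy dog near the old mill")
print(similarity(original, edited))
```

The fraction of matching positions is an estimate of how much the two shingle sets overlap, which is why a compact fingerprint is enough to find near-duplicates without comparing the full text.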

 

Hit Highlighting

 

Search result term highlighting is now available in Alfresco Share, made possible by changes to Alfresco Search Services.  If you are not a Share user, an API is available for your own custom solution.
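For a custom client, highlighting is requested as part of the JSON body sent to the search API described later in this post. The Python sketch below only builds such a body. The field names used here (`highlight`, `prefix`, `postfix`, `fields`) reflect my understanding of the request format and should be treated as assumptions until checked against the API reference; the helper name `highlight_request` and the default fields are made up for this example.

```python
import json


def highlight_request(term: str,
                      fields=("cm:name", "cm:content"),
                      prefix: str = "<em>",
                      postfix: str = "</em>") -> dict:
    """Build a search request body asking for highlighted snippets.

    The structure is an assumption for illustration; verify it against
    the search API documentation before relying on it.
    """
    return {
        "query": {"query": term},
        "highlight": {
            "prefix": prefix,      # text inserted before each matched term
            "postfix": postfix,    # text inserted after each matched term
            "fields": [{"field": name} for name in fields],
        },
    }


print(json.dumps(highlight_request("alfresco"), indent=2))
```

Keeping the markers configurable means the same request can serve an HTML page and, say, a plain-text client that prefers different delimiters.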

 

Tagging and Excluding Facet Filters

 

You can now present a user with a list of facets that can be independently selected; this is known as “multi-select faceting”.  Normally, facets are calculated only from the documents that remain after filtering.  With multi-select faceting it is now possible to show facets for all documents, not just those that remain after filtering.  This feature hasn’t made it into Share yet, but look out for it in the advanced search screen under the “filter by” categories.
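The difference is easiest to see with a small in-memory example. The Python sketch below counts facet values twice: once the normal way, and once ignoring the filter that applies to the facet's own field, which is the effect that tagging and excluding a filter gives you. It does not call Alfresco at all; the function `facet_counts` and the sample documents are invented for illustration.

```python
from collections import Counter


def facet_counts(docs, filters, facet_field, exclude_own_filter=False):
    """Count values of facet_field across the documents that pass the filters.

    When exclude_own_filter is True, the filter on facet_field itself is
    ignored, so the other values stay visible and selectable.
    """
    active = {
        field: allowed
        for field, allowed in filters.items()
        if not (exclude_own_filter and field == facet_field)
    }
    matching = [
        doc for doc in docs
        if all(doc[field] in allowed for field, allowed in active.items())
    ]
    return Counter(doc[facet_field] for doc in matching)


docs = [
    {"name": "a.pdf", "mimetype": "pdf", "creator": "alice"},
    {"name": "b.pdf", "mimetype": "pdf", "creator": "bob"},
    {"name": "c.txt", "mimetype": "text", "creator": "alice"},
    {"name": "d.docx", "mimetype": "word", "creator": "alice"},
]
filters = {"mimetype": {"pdf"}}  # the user has ticked "pdf"

print(dict(facet_counts(docs, filters, "mimetype")))
# {'pdf': 2}
print(dict(facet_counts(docs, filters, "mimetype", exclude_own_filter=True)))
# {'pdf': 2, 'text': 1, 'word': 1}
```

In the first result the user can no longer see that text and Word documents exist; in the second they can add those values to the selection without clearing the filter first.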

 

Indexing Multiple Document Versions

 

A standard search usually looks at the most recent version of a document.  It is now possible to tell Alfresco to index every version of a document, so that you can search for content in old versions.  This clearly increases the size of your index, so it should be used with caution, but some use cases require searching the entire history of a document.  This new feature is not exposed in Share; it is only available via the REST API.

 

New JSON API

 

With recent Alfresco releases a great deal of effort has gone into enhancing the Alfresco REST API to give a comprehensive and consistent developer experience; Alfresco Search is no exception.  There are three new targeted search APIs for specific use cases:

GET /public/alfresco/versions/1/queries/nodes
GET /public/alfresco/versions/1/queries/people
GET /public/alfresco/versions/1/queries/sites
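As a rough idea of how a client might address these three endpoints, the Python sketch below builds the request URL for each of them. The base URL (a local server at `localhost:8080` with the `-default-` tenant), the `term` and `maxItems` query parameters, and the helper name `query_url` are assumptions for this example; check the API reference for the parameters your version supports.

```python
from urllib.parse import urlencode

# Assumption: a local Alfresco instance using the default tenant.
BASE_URL = "http://localhost:8080/alfresco/api/-default-"

QUERY_KINDS = ("nodes", "people", "sites")


def query_url(kind: str, term: str, max_items: int = 10) -> str:
    """Build the URL for one of the targeted query endpoints."""
    if kind not in QUERY_KINDS:
        raise ValueError(f"kind must be one of {QUERY_KINDS}")
    # urlencode takes care of escaping spaces and special characters.
    params = urlencode({"term": term, "maxItems": max_items})
    return f"{BASE_URL}/public/alfresco/versions/1/queries/{kind}?{params}"


for kind in QUERY_KINDS:
    print("GET", query_url(kind, "search services"))
```

An authenticated GET to one of these URLs should return a JSON list of matching entries, which is usually all a type-ahead box or a people picker needs.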

 

There is a fourth, more powerful API for searches that fall outside these basic requirements.

POST /public/search/versions/1/search

 

The APIs are JSON-based, so they can be called directly from clients; no Java classes or configuration are required.  Of course, new features such as search highlighting and multi-select faceting are already available for you to use.
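To show how little is needed on the client side, here is a Python sketch that prepares a POST to the search endpoint using only the standard library. It builds the request object without sending it, so it can be run without a server. The base URL, the minimal `{"query": {"query": ...}}` body and the helper name `search_request` are assumptions for illustration, and authentication is deliberately left out.

```python
import json
from urllib.request import Request

# Assumption: a local Alfresco instance using the default tenant.
BASE_URL = "http://localhost:8080/alfresco/api/-default-"
SEARCH_PATH = "/public/search/versions/1/search"


def search_request(term: str) -> Request:
    """Prepare (but do not send) a JSON POST to the search endpoint."""
    body = {"query": {"query": term}}  # assumed minimal request body
    return Request(
        BASE_URL + SEARCH_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = search_request("alfresco")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To send it, add credentials and pass the request to urllib.request.urlopen.
```

Because the request and response are plain JSON, the same call works equally well from a browser, a script or a mobile app.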

 

What to do next

 

In this post we introduced the new Alfresco Search Services 1.0, released with Alfresco 5.2.  By working with, and committing to, the Solr project, Alfresco’s open source culture brings benefits to the Alfresco community and the wider Solr community.  Relevant features and enhancements have been introduced thanks to the use of Solr 6.3.  In the coming weeks, more technical documentation will be written to describe how to use them in practice.  Until then, we encourage you to try it, join the community and contribute to the success of Alfresco Search.

 

Download Alfresco Search Services 1.0 EA (md5: e72e43be38292564765264b40f1d7ad3)
