Sharding with Alfresco Search Services and Solr 6

Blog Post created by harry.peek@alfresco.com Employee on Jan 6, 2017

Read more about the changes and new features introduced with Solr 6 here.

 

In this post we will share more information about setting up an Alfresco Enterprise/Solr sharded search index. If you haven't already, see this post for more info on installing Solr 6.

 

When an index grows too large to be stored on a single search server, it can be distributed across multiple search servers. This is known as sharding. The sharded index can then be searched using Alfresco/Solr's distributed search capabilities. Alfresco/Solr offers several methods for routing documents and ACLs to shards.

 

In this post we will focus on the out-of-the-box approach, which is sharding by Node ID. In Alfresco/Solr's configuration this is referred to as DBID sharding, because the DBID field holds the Node ID in the Solr index. With DBID sharding, each document is routed to a shard based on a hash of its Node ID. Hashing on the Node ID is a simple approach that relies on randomness to distribute documents evenly across the shards.
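
As a rough illustration of hash-based routing, the sketch below maps a Node ID to a shard by hashing it and taking the remainder modulo the shard count. The shard_for function and the sample IDs are hypothetical, and the CRC from POSIX cksum is only a stand-in: the hash that Alfresco/Solr actually applies to the DBID is not described in this post.

```shell
#!/bin/sh
# Illustrative only: the CRC from POSIX cksum stands in for whatever hash
# Alfresco/Solr applies internally to the DBID.
shard_for() {
    dbid=$1
    shard_count=$2
    # cksum prints "<checksum> <byte count>"; keep only the checksum.
    hash=$(printf '%s' "$dbid" | cksum | cut -d ' ' -f 1)
    echo $((hash % shard_count))
}

# Route a few hypothetical Node IDs across 4 shards.
for dbid in 1001 1002 1003 1004 1005 1006; do
    echo "node $dbid -> shard $(shard_for "$dbid" 4)"
done
```

The same Node ID always lands on the same shard, and the spread across shards depends only on how evenly the hash distributes the IDs.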

 

When using the DBID sharding approach, all ACLs are indexed on each shard. This ensures that the ACL for each node is co-located on the same shard as the node, which is required for proper access control enforcement. Because every ACL is duplicated on every shard, the DBID sharding method is best suited to use cases with a large number of nodes but a smaller number of ACLs.

 

Follow the steps described below to complete the sharding setup and test.

 

Switch the Search Services to Solr 6

  1. Go to the Alfresco Admin Console.
  2. Go to the Search Service Console.
  3. Select Solr 6 as the search service.
  4. Save the Search Service settings.

 

Turn on Dynamic Sharding

  1. Go to the Index Service Sharding page as shown in the screenshot below.

 

 

  2. Check the Dynamic Shard Instance Registration checkbox.
  3. Save the Index Service Sharding settings.

 

Install Solr 6

  1. Solr 6 is not installed with the Alfresco Installer, so you'll need the Alfresco/Solr 6 zip file - download it here.
  2. Create a directory for each shard in the distributed index. This can be on the same server or different servers.
  3. Unzip the Solr 6 zip file in each directory.
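
Steps 2 and 3 can be scripted when all shards live on one server. This is a minimal sketch: the shardN directory names, the base directory, and the zip file name in the usage comment are placeholders rather than names from the Alfresco distribution.

```shell
#!/bin/sh
# Create one directory per shard under base_dir and unzip the Solr 6
# distribution into each. Directory names (shard0, shard1, ...) are hypothetical.
create_shard_dirs() {
    base_dir=$1
    shard_count=$2
    zip_file=$3
    i=0
    while [ "$i" -lt "$shard_count" ]; do
        dir="$base_dir/shard$i"
        mkdir -p "$dir" || return 1
        # -q: quiet output, -d: extract into the given directory
        unzip -q "$zip_file" -d "$dir" || return 1
        i=$((i + 1))
    done
}

# Example call (placeholder paths):
# create_shard_dirs /opt/alfresco-solr 2 /tmp/solr6-distribution.zip
```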

 

Edit the shared.properties

  1. Inside each of the Solr 6 directories there is a directory called solrhome.
  2. Edit the solrhome/conf/shared.properties file.
  3. Change the solr.port property to be the port you want to start Solr on.
  4. Change the solr.host property to the host that Solr is running on.
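
The two property edits can also be applied from the command line. The sketch below assumes that shared.properties uses plain key=value lines and that a property may be present but commented out with a leading #. The set_property helper and the host and port values in the usage comment are illustrative.

```shell
#!/bin/sh
# Set key=value in a properties file. An existing line for the key
# (commented out or not) is replaced; otherwise the line is appended.
# Assumes the value contains no '|' or '&' characters.
set_property() {
    file=$1
    key=$2
    value=$3
    if grep -q "^#*[[:space:]]*${key}=" "$file"; then
        # Write to a temporary file first, then swap it into place.
        sed "s|^#*[[:space:]]*${key}=.*|${key}=${value}|" "$file" > "$file.tmp" \
            && mv "$file.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example (placeholder values):
# set_property solrhome/conf/shared.properties solr.host search1.example.com
# set_property solrhome/conf/shared.properties solr.port 8984
```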

 

Start each Solr instance

For each Solr install:

  1. At the same level as the solrhome directory there will be a solr directory.

  2. Enter the solr directory and enter the following command:

    ./bin/solr start

    This will start Solr on the default port (8983).

    To start Solr on a different port enter the command:

    ./bin/solr start -p PORT_NUMBER

    Replace PORT_NUMBER with the port you will be starting Solr on.

  3. Open a browser and go to the Solr admin screen:
    http://hostname:port/solr

You will see a Solr 6 admin screen without any cores created.
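
When several shards run on one server, the start step can be looped. This sketch assumes a hypothetical layout of base_dir/shardN/solr and assigns consecutive ports starting from a chosen first port; only the ./bin/solr start -p command comes from the steps above. The port passed for each instance should match the solr.port value set in that instance's shared.properties.

```shell
#!/bin/sh
# Start one Solr instance per shard directory, on consecutive ports.
# The base_dir/shardN/solr layout is an assumption, not part of the distribution.
start_shards() {
    base_dir=$1
    first_port=$2
    shard_count=$3
    i=0
    while [ "$i" -lt "$shard_count" ]; do
        port=$((first_port + i))
        # Run from inside the solr directory, as in the manual steps.
        ( cd "$base_dir/shard$i/solr" && ./bin/solr start -p "$port" ) || return 1
        i=$((i + 1))
    done
}

# Example call (placeholder path): two shards on ports 8983 and 8984.
# start_shards /opt/alfresco-solr 8983 2
```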

 

Add Index Servers

  1. Go to the Index Server Sharding page in the Alfresco Admin Console.
  2. Choose Manage.
  3. A window will display where you can add new index servers.
  4. Add the base URL for each Solr server using the form, for example: http://hostname:port/solr.
  5. Stay on the Manage screen for the next step.

 

Add a Shard Group

  1. In the Manage screen add a Shard Group.
  2. Specify the number of shards, shard instances (replicas of shards), and the core name.
  3. Select Create Shards.

 

This will create Solr cores for each shard and shard replica on the index servers that have been registered. The cluster is now created and will begin tracking the Alfresco repository and indexing documents and ACLs across the sharded index.
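
One way to confirm that the cores exist is to query Solr's CoreAdmin STATUS endpoint on each index server. The sketch below only builds and requests the URL. The host names in the usage comment are placeholders, and it assumes the servers accept plain HTTP requests without client certificates, which may not match your security setup.

```shell
#!/bin/sh
# Build the CoreAdmin STATUS URL for an index server base URL
# such as http://hostname:port/solr.
core_status_url() {
    # Strip one trailing slash so the path is joined cleanly.
    base_url=${1%/}
    echo "$base_url/admin/cores?action=STATUS&wt=json"
}

# Print the core status reported by each index server passed as an argument.
check_cores() {
    for server in "$@"; do
        echo "== $server"
        # -s: no progress output, -f: treat HTTP errors as failures
        curl -sf "$(core_status_url "$server")" || echo "no response from $server"
    done
}

# Example call (placeholder hosts):
# check_cores http://search1.example.com:8983/solr http://search2.example.com:8984/solr
```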

 

 

Please let us know how you get on: leave a comment or email harry.peek@alfresco.com.
