We are upgrading our Alfresco 5.2 community edition with Solr 4 index to Alfresco 6.1 with Solr 6.
Everything went fine until we tried the upgrade with our production data. The Solr 6 index is built successfully within 1-2 hours and searching is possible, but the Cascade Tracker, which updates the paths of child nodes when a parent node is moved, would run for months.
The problem is that getting the node metadata from the repository takes 5-10s for each child node, and if a child node needs to be updated in the index, the update takes 30-50s.
There are roughly 100,000 cascading transactions with hundreds of child nodes each, so at this rate it would take about a year to finish.
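As a rough sanity check of that estimate, here is a back-of-envelope sketch in Python. The transaction count and the 5-10s per child node come from the measurements above; the figure of 100 children per transaction is only an assumed low end of "hundreds", and the `workers` parameter assumes perfect parallelism, which a real tracker will not reach.

```python
# Back-of-envelope estimate of the total Cascade Tracker runtime.
SECONDS_PER_DAY = 24 * 60 * 60


def cascade_days(transactions, children_per_tx, seconds_per_child, workers=1):
    """Estimated wall-clock days, assuming work splits evenly across workers."""
    total_seconds = transactions * children_per_tx * seconds_per_child
    return total_seconds / workers / SECONDS_PER_DAY


# Figures from the post: ~100,000 cascading transactions, 5-10 s per child node.
# ASSUMPTION: 100 children per transaction (the low end of "hundreds").
low = cascade_days(100_000, 100, 5)
high = cascade_days(100_000, 100, 10)
print(f"single worker: {low:.0f} to {high:.0f} days")
```

Even the metadata fetch alone, without the slower index updates, lands well beyond a year for a single worker under these assumptions, so several worker threads only bring it down to the order of months.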
I cloned Alfresco Search Service and put some logging and time measurement in the CascadeTracker and SolrInformationServer to find this out.
After some hours Solr consumes all available memory (no matter how much we give it), uses a lot of CPU and slows down the whole system. Meanwhile the PostgreSQL database is very quiet and does not have much to do.
I tried more RAM (32GB), more CPUs (6) and more worker threads, but nothing helps.
I did not find anything about this problem and I have run out of ideas.
Some facts about our setup:
- approx. 5,000,000 nodes and 1,000,000 transactions
- 5000 sites
- Index size: 2GB
- alf_data: 75GB
- Alfresco Search Service 1.3.0.6
- ACS 6.1.2-ga
- Postgres 11.5
- docker-compose deployment
Any ideas are welcome!
Finally, after some debugging, I found the problem. There are cm:person nodes with >20,000 paths. This happens because we have many sites and each user has a path in every site. Fortunately I found a toggle in the source code to ignore these nodes. After some searching for that toggle I found out that I need to set the following properties.
On ACS:
search.solrTrackingSupport.ignorePathsForSpecificTypes=true
On Solr 6 in solrcore.properties or shared.properties:
alfresco.metadata.skipDescendantDocsForSpecificTypes=true
alfresco.metadata.ignore.datatype.0=cm:person
alfresco.metadata.ignore.datatype.1=app:configurations
alfresco.metadata.ignore.datatype.2=cm:authorityContainer
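If it helps anyone, here is a minimal Python sketch for checking that a Solr properties file really contains these four settings, one per line. The helper names (`parse_properties`, `missing_settings`) are hypothetical and not part of Alfresco or Solr, and the parser is a simplification of the Java properties format (it only handles `key=value` lines and `#`/`!` comments).

```python
# Settings from the solution above; keys and values are copied from the post.
REQUIRED_SOLR_PROPS = {
    "alfresco.metadata.skipDescendantDocsForSpecificTypes": "true",
    "alfresco.metadata.ignore.datatype.0": "cm:person",
    "alfresco.metadata.ignore.datatype.1": "app:configurations",
    "alfresco.metadata.ignore.datatype.2": "cm:authorityContainer",
}


def parse_properties(text):
    """Parse simple key=value lines; a simplified stand-in for Java properties."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_settings(text, required=REQUIRED_SOLR_PROPS):
    """Return the required settings that are absent or have a different value."""
    props = parse_properties(text)
    return {k: v for k, v in required.items() if props.get(k) != v}


# One property per line: nothing is reported as missing.
sample = "\n".join(f"{k}={v}" for k, v in REQUIRED_SOLR_PROPS.items())
print(missing_settings(sample))

# Everything pasted onto a single line: the whole rest of the line becomes the
# value of the first key, so all four settings are reported as missing.
one_line = " ".join(f"{k}={v}" for k, v in REQUIRED_SOLR_PROPS.items())
print(sorted(missing_settings(one_line)))
```

The second case is the reason to keep each property on its own line when copying them into solrcore.properties or shared.properties.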
On what kind of infrastructure are you running this Docker-based deployment? Given the extremely long times to load metadata / update the index, I am left wondering if there are any excessive IO / network delays. I am also not quite clear on why the cascade tracker is running at all - if you have just built the SOLR 6 index from scratch, I would not expect there to be any transactions to cascade, unless those have been created since the build was completed.
Did you happen to perform a memory dump of the SOLR process to see which parts of the application caused the high memory issue? It is often hard to guess remotely what may be going on (especially in a scenario that appears to be quite strange to me), without having actual memory measurements / analysis results.
Hi Axel,
thanks for your quick answer. We are currently testing it on a single Docker host, which is a VMware VM with SAN storage. Writing this, I remember reading something about bottlenecks with SAN storage. Maybe we will try it with local storage.
The cascade tracker becomes that slow after initial indexing finishes. We have not created or modified any documents in between.
I do not have a memory dump yet, but I will take one ...