
Index Tip Recovery For Large Repositories

Question asked by callermd on Feb 12, 2008
Latest reply on Jul 4, 2009 by oznevo
Hi All,

After bashing my head against a wall trying to figure out how to recover our indexes (300,000 transactions, taking well over 24 hours), I have put together some tips to help any poor souls in a similar situation. This was done against 2.1 CE.

node.index.FullIndexRecoveryComponent

- First, remember to blow away the existing
lucene-indexes
directory AFTER shutting down Tomcat.

- Set
index.recovery.mode=FULL

- One of the most frustrating things is that it is really hard to tell whether anything is actually happening. To make your life easier, edit your log4j.properties (on FC with a WAR install it is /var/lib/tomcat5/webapps/alfresco/WEB-INF/classes/log4j.properties) and set
log4j.logger.org.alfresco.repo.node.index.FullIndexRecoveryComponent=debug

You should now get
Reindexing transaction: xxxxx
messages, which will help you figure out what is going on.

- I changed
lucene.indexer.batchSize=1000
from its really high value. This breaks the index recovery into chunks and has the happy effect of flushing the Lucene indexes to disk in the lucene-indexes directory as it goes. The flushing makes the percentage indicator in the log more accurate, and it seems to make nasty surprises like the disk running out of space or the JVM running out of heap a lot less heartbreaking.

- Following the benchmarks and batch-indexing recommendations at http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html I jacked up
lucene.indexer.mergeFactor=1000
from 10. However, I did not benchmark this change in any rigorous fashion.

- After it is done you should set the batch size back, as there is an open issue, AR-1280 (documented in the file), about something bad happening if you don't.

- Also remember to set
index.recovery.mode=AUTO
instead of FULL once the recovery has finished.
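
If you want a rough idea of how far the recovery has got, counting the "Reindexing transaction:" debug messages in the log is one way to do it. Below is a minimal Python sketch, not anything that ships with Alfresco. The function name, the script name and the log path are made up for illustration, the text after the colon is assumed to be a single token, and I am assuming roughly one message per transaction, so treat the percentage as an estimate only.

```python
import re
import sys

# Matches the debug message quoted in the tips above. What follows the colon
# (the transaction ID format) is an assumption: any run of non-space characters.
PATTERN = re.compile(r"Reindexing transaction:\s*(\S+)")


def reindex_progress(lines, total_transactions):
    """Return (count, last_id, percent) for the reindex messages in `lines`."""
    count = 0
    last_id = None
    for line in lines:
        match = PATTERN.search(line)
        if match:
            count += 1
            last_id = match.group(1)
    # Avoid dividing by zero when the total is unknown.
    percent = 100.0 * count / total_transactions if total_transactions else 0.0
    return count, last_id, percent


# Hypothetical usage: python reindex_progress.py /path/to/alfresco.log 300000
if __name__ == "__main__" and len(sys.argv) == 3:
    log_path, total = sys.argv[1], int(sys.argv[2])
    with open(log_path, errors="replace") as log:
        count, last_id, percent = reindex_progress(log, total)
    print(f"{count} of {total} transactions seen ({percent:.1f}%), last: {last_id}")
```

Point it at whichever file your log4j configuration writes to, and pass your own transaction count as the second argument.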

Cheers!
