
Alfresco FULL reindex of 2.2M documents doesn't complete

Question asked by bnmaxim on Jan 17, 2011
Latest reply on Jan 18, 2011 by mrogers
Hi Everyone,

We're using the 3.3.3 Enterprise version. After a major crash, in all probability caused by stressing the app too much (we were jamming documents into Alfresco through every orifice, and by orifice I mean 'repository interface'), we had to do a full reindex, but Alfresco then failed to restart. It ran out of memory, which we managed to fix by giving it 8 GB of heap. The repository now holds 2.2 million documents.
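
To confirm that a larger heap setting actually reached the JVM, a minimal sketch can print the maximum heap the running JVM will use. The post doesn't say how the 8 GB was configured; `-Xmx8g` is the standard HotSpot flag for that size and is assumed here. The check only reports on the JVM it runs in, so it is only meaningful when started with the same options as Alfresco.

```java
public class HeapCheck {

    // Maximum heap, in megabytes, that this JVM will attempt to use (Runtime.maxMemory()).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Assumed launch: java -Xmx8g HeapCheck. The reported figure is usually close to,
        // and can be slightly below, 8192 MB depending on the garbage collector.
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```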

We have run a lot of tests and tried all sorts of different parameters, but apparently nothing can jiggle the beast back to life.

This is what happens: the reindexing starts and runs until the log file says it has reached 100%. We never get the message saying it has completely finished, though. Stopping and restarting with indexing in AUTO mode yields exactly the same behaviour.

Obviously, Alfresco doesn't start up without a complete index. So what precisely happens?

The Java threads basically lock each other out over time, until in the end there's just one doing the job (this might be normal behaviour; could someone confirm?). That one just runs and never stops: it uses between 70% and 100% of one CPU core and runs forever (it ran from Thursday afternoon until Monday morning). The indexing never finished.
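
One way to see which thread is burning the core, and where it is at that moment, is to compare per-thread CPU time between two samples and print each thread's top stack frame. The sketch below does this with the standard `java.lang.management.ThreadMXBean`, but only for threads in its own JVM; the `busy-worker` thread is a hypothetical stand-in for a long-running worker. Against a live Alfresco instance the same information would have to come from outside the process, e.g. via `jstack`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BusyThreadSampler {

    // CPU time a thread used between two samples, and its top stack frame at the second sample.
    static final class Sample {
        final String name;
        final long cpuNanos;
        final String topFrame;

        Sample(String name, long cpuNanos, String topFrame) {
            this.name = name;
            this.cpuNanos = cpuNanos;
            this.topFrame = topFrame;
        }
    }

    // Samples per-thread CPU time twice, intervalMillis apart, for the threads of this JVM only.
    static List<Sample> sample(long intervalMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Sample> result = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return result;
        }
        Map<Long, Long> before = new HashMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            before.put(t.getId(), mx.getThreadCpuTime(t.getId()));
        }
        try {
            Thread.sleep(intervalMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return result;
        }
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            long id = e.getKey().getId();
            Long start = before.get(id);
            long end = mx.getThreadCpuTime(id);
            // Skip threads that appeared after the first sample or whose CPU time is unavailable (-1).
            if (start == null || start < 0 || end < 0) {
                continue;
            }
            StackTraceElement[] stack = e.getValue();
            String top = stack.length > 0 ? stack[0].toString() : "<no frames>";
            result.add(new Sample(e.getKey().getName(), end - start, top));
        }
        result.sort((a, b) -> Long.compare(b.cpuNanos, a.cpuNanos)); // busiest first
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a worker that keeps one core busy.
        Thread busy = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // spin until interrupted
            }
        }, "busy-worker");
        busy.setDaemon(true);
        busy.start();

        long intervalMillis = 500;
        for (Sample s : sample(intervalMillis)) {
            // Approximate share of one core over the sampling interval.
            double pct = 100.0 * s.cpuNanos / (intervalMillis * 1_000_000.0);
            System.out.printf("%-20s %5.1f%%  at %s%n", s.name, pct, s.topFrame);
        }
        busy.interrupt();
    }
}
```

Taking several such samples over time gives the same picture as a series of thread dumps, with the added CPU figure showing whether a thread is actually working or just parked.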

I believe, though, that we've got something interesting to go on: we took a few thread dumps throughout last week and today, and they all show the same behaviour:

there's the "main" thread, waiting in the

                org.alfresco.repo.node.index.AbstractReindexComponent.waitForAsynchronousReindexing(AbstractReindexComponent.java:1101)

method for the other threads to complete.
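
The pattern described here, one thread blocking until a pool of worker threads finishes, can be sketched generically with `java.util.concurrent`. This is not Alfresco's `AbstractReindexComponent` code (which isn't shown in the post); the task count, sleep times and the `awaitAll` helper are hypothetical. It illustrates why a single worker that never returns keeps the waiting thread parked indefinitely, and how a timed wait can at least report which tasks are still outstanding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class WaitForWorkers {

    // Waits up to timeoutMillis in total; returns the indexes of tasks that have not finished.
    static List<Integer> awaitAll(List<Future<?>> futures, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        List<Integer> unfinished = new ArrayList<>();
        for (int i = 0; i < futures.size(); i++) {
            long remaining = Math.max(0, deadline - System.nanoTime());
            try {
                futures.get(i).get(remaining, TimeUnit.NANOSECONDS);
            } catch (TimeoutException e) {
                unfinished.add(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                unfinished.add(i);
            } catch (ExecutionException e) {
                // The task threw instead of hanging; this sketch counts it as finished.
            }
        }
        return unfinished;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            // Hypothetical workload: task 2 effectively never finishes, the others take 100 ms.
            final long sleepMillis = (i == 2) ? 60_000 : 100;
            futures.add(pool.submit(() -> {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        // An untimed futures.get(2).get() here would park the main thread for the whole minute.
        System.out.println("Unfinished after 1 s: " + awaitAll(futures, 1_000));
        pool.shutdownNow(); // interrupts the remaining task so the JVM can exit
    }
}
```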

And then there's the only thread doing any apparent work, "indexTrackerThread1". It is always the same thread per Alfresco instance that keeps going: it was indexTrackerThread1 on Thursday, Friday and Monday. The number varies between instances; if we start Alfresco up again, the thread might be called indexTrackerThread5 or some other number. This thread is basically stuck here:


 
                at sun.nio.ch.NativeThread.current(Native Method)
                at sun.nio.ch.NativeThreadSet.add(NativeThreadSet.java:27)
                at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:610)
                at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:113)
                at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136)
                at org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:247)
                at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
                at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:116)
                at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:92)
                at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:82)

It is basically hanging here: regardless of when I take the thread dump, it's at this point. Here's the source code around this line, org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:113):

      
    long pos = getFilePointer();
    while (bb.hasRemaining()) {
        int i = channel.read(bb, pos);
        if (i == -1)
            throw new IOException("read past EOF");
        pos += i;
    }

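
To see the loop's contract in isolation, here is a self-contained version of the excerpt that reads from a small temporary file instead of a Lucene index (the file name, contents and the `readFully` helper are made up for illustration). Each positional `FileChannel.read(ByteBuffer, long)` either returns the number of bytes read, which advances `pos`, or -1 once `pos` is at or past the end of the file, which ends the loop with the same "read past EOF" exception. So the loop itself should only spin in place if reads kept returning 0 bytes; a thread that shows up in this frame in every dump may equally be making a very large number of short reads (for example while scanning index terms) rather than being stuck in a single call.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionalReadLoop {

    // Fills 'length' bytes starting at 'pos' using positional reads, like the Lucene excerpt above.
    // Each successful read advances pos; -1 (position at or past end of file) ends the loop with an exception.
    static byte[] readFully(FileChannel channel, long pos, int length) throws IOException {
        ByteBuffer bb = ByteBuffer.allocate(length);
        while (bb.hasRemaining()) {
            int i = channel.read(bb, pos); // positional read: the channel's own position is not moved
            if (i == -1) {
                throw new IOException("read past EOF");
            }
            pos += i;
        }
        return bb.array();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment", ".bin");
        Files.write(file, "0123456789".getBytes(StandardCharsets.US_ASCII));
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            System.out.println(new String(readFully(channel, 3, 4), StandardCharsets.US_ASCII)); // prints "3456"
            try {
                readFully(channel, 8, 4); // only 2 bytes remain, so the next read returns -1
            } catch (IOException e) {
                System.out.println("error: " + e.getMessage()); // prints "error: read past EOF"
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```
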
I have no idea how Lucene and the indexer work, but it strikes me as pretty odd that every time I look at the program's progress, it's doing exactly the same thing (for bloody days now).

Could this bit of Lucene code be the issue?

I'd really really appreciate some help here.

Thank you very much. I hope the length of the post doesn't frighten you.
Any feedback is welcome. Don't hesitate to throw in your two cents.

Max
