AnsweredAssumed Answered

Alfresco write-read 500 MO when adding a 200 ko pdf document

Question asked by dbas2000 on Sep 4, 2012
Latest reply on Sep 14, 2012 by t_sato
Our Alfresco output 500 MO to and from the disk where lucene-indexes are located, for each document added to the repository.

Our documents are only small pieces of PDFA files (200 ko) !

So, it takes more than 10 seconds to process a document and 2000 documents are inserted per days.

This scenario appear in our quality and production environment.
Configuration information :

    Alfresco is installed in a virtual environment (Vmware).
    Indexes and documents are in separate disk.
    The database is installed in a other server.
    Alfresco version : Community 3.4.e
    Java version     : Java(TM) SE Runtime Environment (build 1.6.0_27-b07) Java HotSpot(TM) 64-Bit Server VM
    OS Distribution  : Centos 6.2 2.6.32-220.4.1.el6.x86_64
    Database         : PostgreSQL 9.0.8 /Linux
    Repository size  : 240 GO for 1'600'000 documents
    Indexes size     : 6.6 GO
JVM Parameters       
           
        
  $JAVA_OPTS="-Xss512K -XX:MaxPermSize=256m -Xms2048m -Xmx2048m  
                    -Dalfresco.home=/soi/app/alfresco-ged34e
                    -Dhttp.proxyHost=xx.xx.xx.xx -Dhttp.proxyPort=xxxx 
                    -Duser.dir=${CATALINA_HOME}/user.dir 
                    -Duser.home=${CATALINA_HOME}/user.home 
                    -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=xxxx 
                    -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false 
                    -Djava.rmi.server.hostname=myserver.mydomain.ch  
                    -server -Dsun.security.ssl.allowUnsafeRenegotiation=true 
                    -Dfile.encoding=UTF-8 -Duser.country=CH -Duser.language=fr -Duser.timezone=Europe/Zurich 
                    -XX:NewRatio=1 -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode 
                    -XX:CMSInitiatingOccupancyFraction=80" 


[size=150]Full description [/size]


Addind a new document produce 500 MO

iotop  show more than 500 MO writed and readed for inserting a only 200 ko PDFA file

Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
21138 be/4 alfresco    526.40 M    584.30 M  0.00 % 10.17 % java -Dnop -Xss512K -XX:MaxPermSize=256m -Xms1024~t/tem
21186 be/4 alfresco     17.55 M    448.00 K  0.00 %  3.50 % java -Dnop -Xss512K -XX:MaxPermSize=256m -Xms1024~t/tem


At the same time, the cpu works hard

 
PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
21068 alfresco  20   0 2119m 1.2g  14m S 92.2 63.2   6:43.85 java
21197 root      20   0  177m 9252 2528 S  0.7  0.5   0:04.51 iotop


Output log

below is just end of the log when adding a document. At the end of the log, there is 8 seconds elapsed between these two lines

2012-09-04 09:56:33,514 User:myuser DEBUG [lucene.fts.FullTextSearchIndexerImpl] FTS index request for workspace://SpacesStore
2012-09-04 09:56:41,987  DEBUG [lucene.fts.FullTextSearchIndexerImpl] FTS Indexing workspace://SpacesStore at Tue Sep 04 09:56:41 CEST 2012


2012-09-04 09:56:02,912 User:myuser DEBUG [impl.lucene.ADMLuceneIndexerImpl] Creating indexer
2012-09-04 09:56:02,912 User:myuser DEBUG [impl.lucene.ADMLuceneIndexerImpl] Update node workspace://SpacesStore/9efb97f2-c2e3-4b96-a0af-52fbf61d5158
2012-09-04 09:56:02,921 User:myuser DEBUG [impl.lucene.FilterIndexReaderByStringId] Applying deletions FOR 40c30117-e988-4db0-b0a8-52eb18408da8 (the index ito which these are applied is the previous one …)
2012-09-04 09:56:02,937 User:myuser DEBUG [impl.lucene.FilterIndexReaderByStringId] Applying deletions FOR 707e8a54-c785-453b-8722-6acb5d5d3e46 (the index ito which these are applied is the previous one …)
2012-09-04 09:56:03,263 User:myuser DEBUG [impl.lucene.FilterIndexReaderByStringId] Applying deletions FOR 75f2c3d4-6885-47a9-b3cd-2285245fefc9 (the index ito which these are applied is the previous one …)
2012-09-04 09:56:04,527 User:myuser DEBUG [impl.lucene.FilterIndexReaderByStringId] Applying deletions FOR 4968aab5-4fff-4dda-bb8e-c079cbf8c8ac (the index ito which these are applied is the previous one …)
2012-09-04 09:56:05,416 User:myuser DEBUG [impl.lucene.FilterIndexReaderByStringId] Applying deletions FOR 4c33d483-5ff5-4b1f-9d5c-e74446bfe3d3 (the index ito which these are applied is the previous one …)
2012-09-04 09:56:06,323 User:myuser DEBUG [impl.lucene.FilterIndexReaderByStringId] Applying deletions FOR 27b977cf-4739-4e5c-84ce-27bda6d4d526 (the index ito which these are applied is the previous one …)
2012-09-04 09:56:06,334 User:myuser DEBUG [lucene.analysis.MLTokenDuplicator] Locale fr_CH using EXACT_LANGUAGE_AND_ALL is [, {fr}]
2012-09-04 09:56:06,347 User:myuser DEBUG [lucene.analysis.MLTokenDuplicator] Locale fr_CH using EXACT_LANGUAGE_AND_ALL is [, {fr}]
2012-09-04 09:56:06,358 User:myuser DEBUG [lucene.analysis.MLAnalayser] Loading org.apache.lucene.analysis.fr.FrenchAnalyzer for fr_CH
2012-09-04 09:56:06,358 User:myuser DEBUG [lucene.analysis.MLTokenDuplicator] Locale fr_CH using ALL_ONLY is []
2012-09-04 09:56:06,359 User:myuser DEBUG [lucene.analysis.MLAnalayser] Loading org.apache.lucene.analysis.fr.FrenchAnalyzer for fr_CH
2012-09-04 09:56:06,359 User:myuser DEBUG [lucene.analysis.MLTokenDuplicator] Locale fr_CH using ALL_ONLY is []
2012-09-04 09:56:06,359 User:myuser DEBUG [lucene.analysis.MLAnalayser] Loading org.apache.lucene.analysis.fr.FrenchAnalyzer for fr_CH
2012-09-04 09:56:06,360 User:myuser DEBUG [lucene.analysis.MLTokenDuplicator] Locale fr_CH using ALL_ONLY is []
2012-09-04 09:56:06,361 User:myuser DEBUG [lucene.analysis.MLAnalayser] Loading org.apache.lucene.analysis.fr.FrenchAnalyzer for fr_CH
2012-09-04 09:56:06,361 User:myuser DEBUG [lucene.analysis.MLTokenDuplicator] Locale fr_CH using ALL_ONLY is []
2012-09-04 09:56:06,361 User:myuser DEBUG [lucene.analysis.MLAnalayser] Loading org.apache.lucene.analysis.fr.FrenchAnalyzer for fr_CH
2012-09-04 09:56:06,366 User:myuser DEBUG [lucene.analysis.MLTokenDuplicator] Locale fr_CH using ALL_ONLY is []
2012-09-04 09:56:06,366 User:myuser DEBUG [impl.lucene.LuceneAnalyser] Loaded org.alfresco.repo.search.impl.lucene.analysis.AlfrescoStandardAnalyser for type {http://www.alfresco.org/model/dictionary/1.0}boolean
2012-09-04 09:56:06,366 User:myuser DEBUG [lucene.analysis.MLAnalayser] Loading org.apache.lucene.analysis.fr.FrenchAnalyzer for fr_CH
2012-09-04 09:56:06,367 User:myuser DEBUG [lucene.analysis.MLTokenDuplicator] Locale fr_CH using ALL_ONLY is []
2012-09-04 09:56:06,369 User:myuser DEBUG [impl.lucene.LuceneAnalyser] Loaded org.alfresco.repo.search.impl.lucene.analysis.DateAnalyser for type {http://www.alfresco.org/model/dictionary/1.0}datetime
2012-09-04 09:56:06,370 User:myuser DEBUG [lucene.analysis.MLAnalayser] Loading org.apache.lucene.analysis.fr.FrenchAnalyzer for fr_CH
2012-09-04 09:56:06,371 User:myuser DEBUG [lucene.analysis.MLTokenDuplicator] Locale fr_CH using ALL_ONLY is []
2012-09-04 09:56:06,371 User:myuser DEBUG [impl.lucene.LuceneAnalyser] Loaded org.alfresco.repo.search.impl.lucene.analysis.DateAnalyser for type {http://www.alfresco.org/model/dictionary/1.0}datetime
2012-09-04 09:56:06,372 User:myuser DEBUG [lucene.analysis.MLAnalayser] Loading org.apache.lucene.analysis.fr.FrenchAnalyzer for fr_CH
2012-09-04 09:56:06,373 User:myuser DEBUG [lucene.analysis.MLTokenDuplicator] Locale fr_CH using ALL_ONLY is []
2012-09-04 09:56:06,374 User:myuser DEBUG [lucene.analysis.MLAnalayser] Loading org.apache.lucene.analysis.fr.FrenchAnalyzer for fr_CH
2012-09-04 09:56:06,374 User:myuser DEBUG [lucene.analysis.MLTokenDuplicator] Locale fr_CH using ALL_ONLY is []
2012-09-04 09:56:06,375 User:myuser DEBUG [lucene.analysis.MLAnalayser] Loading org.apache.lucene.analysis.fr.FrenchAnalyzer for fr_CH
2012-09-04 09:56:06,376 User:myuser DEBUG [lucene.analysis.MLTokenDuplicator] Locale fr_CH using EXACT_LANGUAGE_AND_ALL is [, {fr}]
2012-09-04 09:56:06,377 User:myuser DEBUG [impl.lucene.LuceneAnalyser] Loaded org.alfresco.repo.search.impl.lucene.analysis.LongAnalyser for type {http://www.alfresco.org/model/dictionary/1.0}long
2012-09-04 09:56:06,377 User:myuser DEBUG [lucene.analysis.MLAnalayser] Loading org.apache.lucene.analysis.fr.FrenchAnalyzer for fr_CH
2012-09-04 09:56:06,378 User:myuser DEBUG [lucene.analysis.MLTokenDuplicator] Locale fr_CH using ALL_ONLY is []
2012-09-04 09:56:06,381 User:myuser DEBUG [lucene.analysis.MLAnalayser] Loading org.apache.lucene.analysis.fr.FrenchAnalyzer for fr_CH
2012-09-04 09:56:06,381 User:myuser DEBUG [lucene.analysis.MLTokenDuplicator] Locale fr_CH using ALL_ONLY is []
2012-09-04 09:56:06,382 User:myuser DEBUG [lucene.analysis.MLAnalayser] Loading org.apache.lucene.analysis.fr.FrenchAnalyzer for fr_CH
2012-09-04 09:56:06,382 User:myuser DEBUG [lucene.analysis.MLTokenDuplicator] Locale fr_CH using ALL_ONLY is []
2012-09-04 09:56:33,514 User:myuser DEBUG [lucene.fts.FullTextSearchIndexerImpl] FTS index request for workspace://SpacesStore
2012-09-04 09:56:41,987  DEBUG [lucene.fts.FullTextSearchIndexerImpl] FTS Indexing workspace://SpacesStore at Tue Sep 04 09:56:41 CEST 2012
2012-09-04 09:56:41,988  DEBUG [impl.lucene.ADMLuceneIndexerImpl] Creating indexer
2012-09-04 09:56:41,989  DEBUG [impl.lucene.FilterIndexReaderByStringId] Applying deletions FOR 5bcc8dd0-8b6c-4874-adc8-31db4c61264a (the index ito which these are applied is the previous one …)
2012-09-04 09:56:42,399  DEBUG [lucene.fts.FullTextSearchIndexerImpl] FTS index completed for workspace://SpacesStore … 0 remaining


When starting, Alfresco read 7 GO

I also noticed that when Alfresco starts, it reads 7 GO before the webapp is alive.

Mesured with iotop :            
            
Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
21070 be/4 alfresco      7.17 G    121.76 M  0.02 % 23.50


No change when disabling cm:content indexing


   <type name="cm:content">
         <title>Content</title>
         <parent>cm:cmobject</parent>
         <archive>true</archive>
         <properties>
            <property name="cm:content">
               <type>d:content</type>
               <mandatory>false</mandatory>
               <!– Although content is marked as indexed atomically it may end up asynchronous –>
               <!– if the content conversion will take too long. Content that does not require conversion –>
               <!– to UTF8 test/plain will always be indexed atomically –>
               <index enabled="false">
                  <atomic>true</atomic>
                  <stored>false</stored>
                  <tokenised>true</tokenised>
               </index>
            </property>
         </properties>
      </type>

For our business solution, whe don't need fulltext search, so I do understand why Lucene output 500 MO when cm:content indexing is disabled !

Conclusion

I looking for a solution to reduce these IO writing, any help welcome !


Best regards

Pierre-André

Outcomes