201702 solr6 not indexing all documents

xarope
Member II

201702 solr6 not indexing all documents

I've just upgraded my previous alfresco 50d instance to 201702, and also switched to solr6.

Left the new instance indexing over the weekend, but on Monday morning when I checked (after almost 36 hours), it had only indexed perhaps 12% of documents. From previous experience: (1) the CPU usage is not high enough to indicate it is continuing the indexing (it is indexing new documents, but not the existing/old ones), and (2) 36 hours was more than enough time for solr4 to reindex 90+% of the documents when I upgraded from alfresco 4.2 to 502d.

As an example, on the previous alfresco 50d instance, my solr4 indexes were taking up 235GB of disk space, alfresco core had 3880462 current documents, and archive core had 2036546 documents.  Currently, the new solr6 index is only taking up 3.5GB of disk space, whilst alfresco core has only 475244 current documents, and archive 432393.  Unless solr6 is super-efficient to almost 2 orders of magnitude, I think I have a serious index issue!
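For anyone wanting to sanity-check the comparison, here is a minimal Python sketch of the arithmetic. It only uses the figures quoted above; the dictionary and variable names are made up for illustration.

```python
import math

# Figures quoted in the post: solr4 on the old instance, solr6 on the new one.
SOLR4 = {"disk_gb": 235.0, "alfresco": 3880462, "archive": 2036546}
SOLR6 = {"disk_gb": 3.5, "alfresco": 475244, "archive": 432393}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# How much larger the old index is, as a plain ratio and in orders of magnitude.
disk_ratio = SOLR4["disk_gb"] / SOLR6["disk_gb"]
disk_orders = math.log10(disk_ratio)

# Share of the old document counts that the new index has reached so far.
alfresco_pct = percent(SOLR6["alfresco"], SOLR4["alfresco"])
archive_pct = percent(SOLR6["archive"], SOLR4["archive"])

print(f"disk: solr4 index is {disk_ratio:.1f}x larger ({disk_orders:.2f} orders of magnitude)")
print(f"alfresco core: {alfresco_pct:.1f}% of the old document count")
print(f"archive core: {archive_pct:.1f}% of the old document count")
```

The alfresco core figure comes out at about 12%, in line with the estimate mentioned earlier, while the disk usage differs by a factor of roughly 67.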

I don't see any errors in the solr6.log nor solr-8983-console.log, related to this.

Anybody encountered similar issues?  If by the end of the week the indexing hasn't substantially increased in document count, then I will probably have to switch back to solr4.

2 Replies
xarope
Member II

Re: 201702 solr6 not indexing all documents

I also ran a solr6 report (i.e. http://<solrserver>:8983/solr/admin/cores?action=REPORT&wt=xml ), here is the alfresco core section:

<long name="DB acl transaction count">8229</long>
<long name="Count of duplicated acl transactions in the index">0</long>
<long name="Count of acl transactions in the index but not the DB">0</long>
<long name="Count of missing acl transactions from the Index">0</long>
<long name="Index acl transaction count">8231</long>
<long name="Index unique acl transaction count">8231</long>
<long name="Last indexed change set commit time">1491820680654</long>
<str name="Last indexed change set commit date">2017-04-10T18:38:00</str>
<long name="Last changeset id before holes">-1</long>
<long name="DB transaction count">1372926</long>
<long name="Count of duplicated transactions in the index">0</long>
<long name="Count of transactions in the index but not the DB">134</long>
<long name="First transaction in the index but not the DB">815448</long>
<long name="Count of missing transactions from the Index">1054</long>
<long name="First transaction missing from the Index">4920501</long>
<long name="Index transaction count">179183</long>
<long name="Index unique transaction count">179183</long>
<long name="Index node count">243985</long>
<long name="Count of duplicate nodes in the index">71</long>
<long name="First duplicate node id in the index">4772300</long>
<long name="Index error count">4</long>
<long name="Count of duplicate error docs in the index">0</long>
<long name="Index unindexed count">7236</long>
<long name="Count of duplicate unindexed docs in the index">0</long>
<long name="Last indexed transaction commit time">1491877238167</long>
<str name="Last indexed transaction commit date">2017-04-11T10:20:38</str>

Based on the documentation in Unindexed Solr Transactions | Alfresco Documentation, it seems like there are index errors, e.g. transaction 815448 ("First transaction in the index but not the DB") and 4920501 ("First transaction missing from the Index").  More digging around the solr6 documentation to see if I can understand why!
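To put a number on how far behind the index is, the report values can be compared directly. This is a rough Python sketch, not an official tool: the `<lst>` wrapper around the sample is an assumption (the fragment above is pasted without its enclosing elements), and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A few values copied from the report above.
# The <lst> wrapper is an assumption so that the fragment parses as one document.
SAMPLE = """
<lst name="alfresco">
  <long name="DB transaction count">1372926</long>
  <long name="Count of missing transactions from the Index">1054</long>
  <long name="Index transaction count">179183</long>
  <long name="Index node count">243985</long>
  <long name="Index error count">4</long>
  <long name="Index unindexed count">7236</long>
</lst>
"""


def read_longs(xml_text):
    """Collect every <long name="...">value</long> into a dict of ints."""
    root = ET.fromstring(xml_text.strip())
    return {el.get("name"): int(el.text) for el in root.iter("long")}


def transaction_coverage(values):
    """Fraction of DB transactions that appear in the index."""
    return values["Index transaction count"] / values["DB transaction count"]


values = read_longs(SAMPLE)
coverage = transaction_coverage(values)
gap = values["DB transaction count"] - values["Index transaction count"]

print(f"indexed transactions: {coverage:.1%} of the DB count")
print(f"transactions in the DB but not yet counted in the index: {gap}")
```

With these numbers only about 13% of the DB transactions are in the index, which is a far bigger gap than the 1054 the report lists as missing.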

xarope
Member II

Re: 201702 solr6 not indexing all documents

Since I couldn't figure this out, I snapshot'd and created a new database and alfresco+solr instance, and installed solr4.  In less than 24 hours, solr4 has indexed 1259637 documents in the alfresco core, and 1158783 documents in the archive core.

If I use a psql statement (I'm using postgresql) to get the number of "content" objects, "SELECT count(*) FROM alf_node AS a, alf_qname AS q WHERE a.type_qname_id=q.id AND q.local_name='content';", I get 1443451.  So that's pretty close.  But I still can't understand why solr6 has been stuck, for more than 4 days now, at the same ~400K documents (and not 1.4M).

Looks like I will be switching back to solr4 over this weekend.  I hope others have better luck with solr6.