
Tips on troubleshooting individual file indexing with Alfresco 5.2 / Solr 6

Question asked by dbiggins on Apr 24, 2018
Latest reply on Apr 25, 2018 by dbiggins

I have an Alfresco Community (201707) installation that I am using to compare the default Solr 4 against the Solr 6 shipped in the alfresco-search-services-1.1.0 install.

 

After a full index with Solr 4, I get the following info from the solr4 admin page:

Num Docs: 163458

Max Docs: 163458

...

Deleted Docs: 0

...

Master (Searching) 1524504594659 159 6.5 GB

...

Nodes in Index: 70921
Transactions in Index: 80844
Approx transactions remaining: 0

...

Unindexed Nodes: 11441
Error Nodes in Index: 0

In the Solr 4 SUMMARY report, I can see that indexing is done:

Node count with FTSStatus Clean    69165
Node count with FTSStatus Dirty    0
Node count with FTSStatus New    0

To test the Solr 6 setup, I stop the Alfresco app, make the changes to the Alfresco install for Solr 6, start the Solr server and then the Alfresco server, and let it reindex. It plugs along for a few hours and then completes with the following stats:

 

Num Docs: 164357

Max Doc: 164357

...

Deleted Docs: 0

...

Master (Searching)     1524581958240 586 2.48 GB

and in the SUMMARY report:

Alfresco Nodes in Index    70937
Alfresco Transactions in Index    81470
Alfresco Unindexed Nodes    11698
Alfresco Error Nodes in Index    0

Node count with FTSStatus Clean    69181
Node count with FTSStatus Dirty    0
Node count with FTSStatus New    0

When I run the ERROR query, I get nothing:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"ERROR*",
      "wt":"json"}},
  "response":{"numFound":0,"start":0,"docs":[]
  }}

So the indexer looks finished, and the volume is comparable to the Solr 4 setup.
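For anyone who wants the two runs side by side, here is a rough sketch that just subtracts the figures I pasted above. The shortened labels are my own (Solr 6 prefixes its labels with "Alfresco"), and nothing here talks to Solr; the numbers are copied by hand from the admin page and SUMMARY reports.

```python
# Figures copied by hand from the Solr 4 and Solr 6 reports above.
# The dictionary keys are shortened labels chosen for this sketch.
solr4 = {
    "Num Docs": 163458,
    "Nodes in Index": 70921,
    "Transactions in Index": 80844,
    "Unindexed Nodes": 11441,
    "Error Nodes in Index": 0,
    "FTSStatus Clean": 69165,
}
solr6 = {
    "Num Docs": 164357,
    "Nodes in Index": 70937,
    "Transactions in Index": 81470,
    "Unindexed Nodes": 11698,
    "Error Nodes in Index": 0,
    "FTSStatus Clean": 69181,
}


def deltas(before, after):
    """Return after - before for every counter present in `before`."""
    return {name: after[name] - before[name] for name in before}


for name, diff in deltas(solr4, solr6).items():
    print(f"{name}: {solr4[name]} -> {solr6[name]} ({diff:+d})")
```

The counters differ only slightly between the two runs, which is why I consider them comparable.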

 

What first concerned me was the significantly smaller index size after a complete reindex: 6.5 GB on Solr 4 versus 2.48 GB on Solr 6, when I was expecting a 15% size increase with the introduction of fingerprints.
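To put numbers on that gap, a quick back-of-the-envelope sketch. The 15% growth figure is only my expectation, not a measured or documented value, and the sizes are the ones reported on the two admin pages.

```python
# Index sizes as reported on the Solr 4 and Solr 6 admin pages.
solr4_gb = 6.5
solr6_gb = 2.48

# Assumption: the growth I expected from fingerprints, not a measured value.
expected_growth = 0.15

expected_gb = solr4_gb * (1 + expected_growth)  # size I expected on Solr 6
ratio = solr6_gb / solr4_gb                     # actual Solr 6 size relative to Solr 4

print(f"expected about {expected_gb:.2f} GB, got {solr6_gb} GB")
print(f"Solr 6 index is {ratio:.0%} of the Solr 4 index")
```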

 

Some docs never show up in a full-text search result set, even though they have the index aspect attached. I tried to reindex one of those docs with the call below, but no luck:

http://[myip]:8983/solr/admin/cores?action=reindex&query=sys%5C%3Anode%5C-dbid%3A135156
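In case the escaping in that URL looks odd, this is a minimal sketch of how it is built: the Solr query is `sys\:node\-dbid:135156`, then percent-encoded. The helper name `build_reindex_url` is made up for illustration, and `[myip]` is still a placeholder for the Solr host.

```python
from urllib.parse import quote


def build_reindex_url(host, port, dbid):
    """Build the core-admin reindex URL for a single node DBID.

    Hypothetical helper for illustration only; it does not send the request.
    """
    # Escape ':' and '-' inside the field name for the Solr query syntax,
    # then percent-encode the whole query string (safe="" also encodes ':').
    query = rf"sys\:node\-dbid:{dbid}"
    return (
        f"http://{host}:{port}/solr/admin/cores"
        f"?action=reindex&query={quote(query, safe='')}"
    )


print(build_reindex_url("[myip]", 8983, 135156))
```

This produces the same URL I used above, so I don't think the escaping is the problem.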

 

At reindex time I saw a few

"FlateFilter: stop reading corrupt stream due to a DataFormatException"

and

"An error occured when reading table hmtx"

But no more than I saw on the Solr 4 setup.

 

Any thoughts on how best to troubleshoot the inconsistencies? 

 

Also, I know I can't upgrade to pdfbox 2.0.x in 5.2, but has anyone been able to replace pdfbox-1.8.10.jar with pdfbox-1.8.13.jar to get past the pdfbox problems?
