
Solr Does Not Index Xml Elements / Attributes

Question asked by larophel on Aug 23, 2013
Latest reply on Aug 26, 2013 by larophel
Hello,

I have noticed that Solr sometimes does not index Xml element names and Xml attribute values.
Element content, on the other hand, is always indexed.
I am using Alfresco 4.2.c Community, on Linux. The installation was performed using the default options.

I put together a list of steps that can be used to reproduce this issue:

1. Start Alfresco (./alfresco.sh start) and log in to Alfresco Share.
2. Click on "Repository".
3. Click on "Upload" and upload a simple Xml file. You can use the attached Xml file as an example (remove the "_.txt" extension).
4. Wait some time to ensure that Solr finishes processing / indexing the new file. One minute should be enough.
5. Restart Alfresco (./alfresco.sh restart).
6. Refresh the browser page.
7. Click on "Copy to" in the menu for the file from step 3. You can use the same folder as the copy target.
8. Wait some time to ensure that Solr finishes processing the new file. Then search for an attribute value that appears in the file, e.g. "myvalueb".

Expected result: the search result lists the files from both step 3 and step 7.
Actual result: the search result only shows the file from step 3.
Searching for element content (e.g. "myvaluea") will list both files.
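One possible reading of this symptom (an assumption, not something confirmed here) is that the copy was indexed from its text content only, with the markup discarded. The sketch below is plain Python and does not use Alfresco or Solr in any way. The sample document mirrors the lines visible in the debug log further down, and the helper name text_only is made up for this illustration. It only shows which of the two search terms would survive if element names and attributes were dropped before indexing.

```python
import xml.etree.ElementTree as ET

# Sample document, reconstructed from the lines shown in the debug log.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<mytaga>
<mytagb>myvaluea</mytagb>
<mytagc myattr="myvalueb"/>
</mytaga>
"""


def text_only(xml_bytes):
    """Return only the character data of the document.

    Element names and attribute values are dropped, which is the
    assumed (unconfirmed) behaviour being illustrated.
    """
    root = ET.fromstring(xml_bytes)
    return "".join(root.itertext())


extracted = text_only(SAMPLE)
print("myvaluea" in extracted)  # element content
print("myvalueb" in extracted)  # attribute value
```

If the copy really were indexed this way, a search for "myvaluea" would match it and a search for "myvalueb" would not, which is the behaviour described above.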

Please note: I have observed that sometimes the restart of Alfresco is not needed to reproduce the problem. However, the only consistent way I have found to reproduce this issue is with the restart.

I am attaching two log files from runs of the above-mentioned test. For the first run, I used the default logging level. For the second run, I enabled DEBUG output for Solr.
In the "catalina_solr_debug.txt" file, you can observe that the file from step 3 gets indexed with Xml elements and attributes:


SOLR DEBUG 2013-08-23 12:52:30:707 content.wire:70 - << "<?xml version="1.0" encoding="UTF-8"?>[\n]"
SOLR DEBUG 2013-08-23 12:52:30:707 content.wire:70 - << "<mytaga>[\n]"
SOLR DEBUG 2013-08-23 12:52:30:707 content.wire:70 - << "<mytagb>myvaluea</mytagb>[\n]"
SOLR DEBUG 2013-08-23 12:52:30:707 content.wire:70 - << "<mytagc myattr="myvalueb"/>[\n]"
SOLR DEBUG 2013-08-23 12:52:30:708 content.wire:70 - << "</mytaga>[\n]"


The file from step 7 does not:


SOLR DEBUG 2013-08-23 12:56:30:626 content.wire:70 - << "[\n]"
SOLR DEBUG 2013-08-23 12:56:30:626 content.wire:70 - << "myvaluea[\n]"
SOLR DEBUG 2013-08-23 12:56:30:626 content.wire:70 - << "[\n]"
SOLR DEBUG 2013-08-23 12:56:30:626 content.wire:70 - << "[\n]"
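To compare the two excerpts without reading them by eye, the wire lines can be reassembled into the content that was received. This is a minimal sketch in plain Python. It assumes the log line format is exactly as shown above, and the names WIRE_LINE and wire_payload are made up for this illustration. It has not been run against the full attached log files.

```python
import re

# Matches the quoted payload of a "content.wire" line as shown above.
# The greedy group keeps inner quotes such as myattr="myvalueb".
WIRE_LINE = re.compile(r'content\.wire:\d+ - << "(.*)"$')

STEP3_LOG = r'''
SOLR DEBUG 2013-08-23 12:52:30:707 content.wire:70 - << "<?xml version="1.0" encoding="UTF-8"?>[\n]"
SOLR DEBUG 2013-08-23 12:52:30:707 content.wire:70 - << "<mytaga>[\n]"
SOLR DEBUG 2013-08-23 12:52:30:707 content.wire:70 - << "<mytagb>myvaluea</mytagb>[\n]"
SOLR DEBUG 2013-08-23 12:52:30:707 content.wire:70 - << "<mytagc myattr="myvalueb"/>[\n]"
SOLR DEBUG 2013-08-23 12:52:30:708 content.wire:70 - << "</mytaga>[\n]"
'''

STEP7_LOG = r'''
SOLR DEBUG 2013-08-23 12:56:30:626 content.wire:70 - << "[\n]"
SOLR DEBUG 2013-08-23 12:56:30:626 content.wire:70 - << "myvaluea[\n]"
SOLR DEBUG 2013-08-23 12:56:30:626 content.wire:70 - << "[\n]"
SOLR DEBUG 2013-08-23 12:56:30:626 content.wire:70 - << "[\n]"
'''


def wire_payload(log_text):
    """Join the payloads of all wire lines into one string."""
    parts = []
    for line in log_text.splitlines():
        match = WIRE_LINE.search(line.strip())
        if match:
            # The logger writes a line feed as the literal text "[\n]".
            parts.append(match.group(1).replace("[\\n]", "\n"))
    return "".join(parts)


original = wire_payload(STEP3_LOG)
copy = wire_payload(STEP7_LOG)
print("myvalueb" in original, "myvalueb" in copy)
print("<mytagb>" in original, "<mytagb>" in copy)
```

For these two excerpts, the attribute value and the element name appear only in the content received for the file from step 3.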


Any help in solving this issue would be greatly appreciated.

Regards,
