
Noob-ish Lucene content question: stored & tokenized?

Question asked by seanoc5 on Apr 25, 2008
Latest reply on May 8, 2008 by seanoc5
Hello,
I want to search Alfresco's indexes from some legacy Lucene code I have. More specifically, I want to get the token stream for the content of the indexed files. Is this possible? Easy? My assumption is that I am missing something obvious :-).

I found:
http://wiki.alfresco.com/wiki/Full-Text_Search_Configuration
which seems to tell me that I can set whether a property is tokenized and stored in alfresco\tomcat\webapps\alfresco\web-inf\classes\alfresco\model\contentModel.xml

I made the change, restarted Alfresco/Tomcat, and added a document to the repository, but I still can't find the stored content in the index.

With Luke 0.8.1, I can find the index that seems to hold my info: the file name shows up in the QNAME field, and a search gets a hit on the content. But the @{http…}content field comes up as not present, or not stored.

Ultimately, I use the token stream to find the start and end offsets of each hit so that I can analyze the text of the actual hits. I may very well be doing this bass-ackwards, but I have it working with a Lius implementation and would love to switch over to Alfresco.
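
To make the goal concrete, here is a rough, self-contained sketch of the kind of offset lookup I mean. It deliberately uses no Lucene or Alfresco classes: the OffsetSketch class, its Tok type, and the simple letter/digit tokenizer are all hypothetical stand-ins for a real analyzer's token stream, so treat it as an illustration of the idea rather than how either library does it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class OffsetSketch {

    /** A term plus its character offsets in the original text (end is exclusive). */
    public static final class Tok {
        public final String term;
        public final int start;
        public final int end;

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Splits text into lowercased runs of letters/digits, recording their offsets. */
    public static List<Tok> tokenize(String text) {
        List<Tok> toks = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++; // skip separators such as spaces and punctuation
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            toks.add(new Tok(text.substring(start, i).toLowerCase(Locale.ROOT), start, i));
        }
        return toks;
    }

    /** Returns {start, end} offset pairs for every token equal to the query term. */
    public static List<int[]> hitOffsets(String text, String queryTerm) {
        String wanted = queryTerm.toLowerCase(Locale.ROOT);
        List<int[]> hits = new ArrayList<>();
        for (Tok t : tokenize(text)) {
            if (t.term.equals(wanted)) {
                hits.add(new int[] {t.start, t.end});
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Alfresco indexes content; the content field is tokenized.";
        for (int[] hit : hitOffsets(text, "content")) {
            // Print each hit's offsets and the matching slice of the original text.
            System.out.println(hit[0] + "-" + hit[1] + ": " + text.substring(hit[0], hit[1]));
        }
    }
}
```

The point is that this only works when the original text (or stored offsets) is available next to the hit, which is why I care whether the content field is stored.
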
Any pointers?

Sean
