Lucene and stop words in Alfresco

Question asked by hbf on Dec 30, 2007
Latest reply on Feb 7, 2008 by andy
Dear list,

Suppose I have a text document in Alfresco containing the phrase "time is money". I want users to be able to enter "money is time" and find the document. That is, I want to find all documents that contain all the words the user enters, in any order.

Reading Alfresco's Search documentation, I could not find a way to formulate such a query.

Maybe I am missing something?! If so, apologies!

Here is what I have found out:

The query
TEXT:"money is time"
internally drops the stop word "is" and therefore searches for "money" followed by one or more stop words followed by "time". It will therefore NOT match.

The query
TEXT:"money" AND TEXT:"is" AND TEXT:"time"
searches for all documents containing the three words "money", "is", and "time". As "is" is a stop word, it does not occur in the index and therefore the query returns NO result.

The query
TEXT:"money" AND TEXT:"time"
searches for all documents containing the two words "money" and "time". It finds the document …

… however, I cannot easily generate this query as it requires me to drop all words that Alfresco's analyzer considers stop words.

Is there another way to perform a query for all documents containing a given set of words (possibly including stop words)?

If not, I see two ways out:

* Alfresco exposes the list of stop words (not nice).
* Alfresco's query parser recognizes stop words and handles them accordingly. (It would drop the clause 'AND TEXT:"is"' from the query 'TEXT:"money" AND TEXT:"is" AND TEXT:"time"' for example.)
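
To illustrate what either option would amount to, here is a minimal client-side sketch in Python. It assumes the stop word list is known; the list below is a made-up placeholder, not the list Alfresco's analyzer actually uses, and the function name build_all_words_query is hypothetical. It also assumes the analyzer lowercases terms and splits on non-word characters, which may not hold for every configuration.

```python
import re

# ASSUMPTION: placeholder stop word list for illustration only.
# The real list depends on the analyzer Alfresco is configured with.
ASSUMED_STOP_WORDS = frozenset({"a", "an", "and", "is", "of", "the", "to"})


def build_all_words_query(user_input, field="TEXT", stop_words=ASSUMED_STOP_WORDS):
    """Build an AND query over all non-stop words in user_input.

    Words are lowercased, split on non-word characters, de-duplicated
    (keeping first occurrence order), and stop words are dropped.
    Returns an empty string if nothing but stop words was entered.
    """
    words = re.findall(r"\w+", user_input.lower())
    kept = []
    for word in words:
        if word not in stop_words and word not in kept:
            kept.append(word)
    return " AND ".join(f'{field}:"{word}"' for word in kept)


print(build_all_words_query("money is time"))
# prints: TEXT:"money" AND TEXT:"time"
```

If the user enters only stop words, the function returns an empty string, so the caller has to decide what to do in that case (for example, skip the search).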

Many thanks,
Kaspar

P.S. This question is Lucene related. However, I am posting it here and not to the Lucene mailing list as it depends on Alfresco's particular adaptation of Lucene. Not knowing the details, I might be wrong, of course.
