
[Index & Search] Removing effects of language / locale

Question asked by jayjayecl on Aug 29, 2008
Latest reply on Sep 11, 2008 by andy
Hello,

First, I should say that I have read a lot about the indexing process, the searching process, and the effect of languages on both.
I read the following topics, and a few others:

http://forums.alfresco.com/en/viewtopic.php?f=4&t=10114&hilit=search+and+locale
and
http://forums.alfresco.com/en/viewtopic.php?f=9&t=9524&hilit=search+and+locale


My problem is that our users come from different countries and languages and mix CIFS and web client usage (for uploading files or searching for them), so the results of any search are poor. What I mean is: they like the way CIFS/Windows search works.
They have a lot of trouble getting the right results through the web client or a search portlet (via web services), because of all the stemming/analysis that is performed during indexing.

So, I'd like to configure a simple indexing analysis that just strips accents (for French and Spanish words) but keeps the words unstemmed.
If the users search for "procedure", they want to find files containing "procédure" or even "ProCéDUre", whatever the locale of their web client, the locale of the document, or the way the file was uploaded.

I was wondering if it is as simple as:
- declaring the same LuceneCustomAnalyzer in every DataTypeAnalyzers_locale.properties file
- creating this LuceneCustomAnalyzer from the French one, removing the call to FrenchStemmer, and customizing it to strip accents.
Is this the right way to do it?

Is there anything I forgot? For example, I assume that without stemming, a search for "procedureS" (plural) will no longer return files containing "procedure" (singular).
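
To make the expected behaviour concrete, here is a minimal sketch in plain Java of the folding I have in mind: strip accents and lowercase, with no stemming. It is not an Alfresco or Lucene analyzer; the class AccentFolding and the method fold are hypothetical names used only for illustration, and it relies only on java.text.Normalizer from the JDK. A real custom analyzer would have to apply the same idea to each token, at both index time and query time.

```java
import java.text.Normalizer;
import java.util.Locale;

public class AccentFolding {

    // Hypothetical helper (not part of Alfresco or Lucene):
    // folds a term the way the custom analyzer is meant to.
    static String fold(String term) {
        // NFD decomposition splits "é" into "e" plus a combining acute accent.
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        // Drop the combining marks, then lowercase independently of the JVM locale.
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(fold("procédure"));  // procedure
        System.out.println(fold("ProCéDUre"));  // procedure
        // No stemming: the plural stays a different term from the singular.
        System.out.println(fold("procedureS").equals(fold("procedure"))); // false
    }
}
```

With this kind of folding, "procedure", "procédure" and "ProCéDUre" all end up as the same term, while "procedureS" does not, which is the trade-off mentioned above. Note that NFD does not decompose ligatures such as "œ", so those would need separate handling.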

Thank you all
