
Lucene search tokenization problem

Question asked by grk on Sep 25, 2014
So, my Alfresco installation runs with the Lucene indexing subsystem. Since I'm in Greece, some folders have Greek names and others have English names. Client browsers also differ: some use the "en" locale and others use "el" (Greek).

So my problem is the following:
- Suppose we have two folders, let's call them "Folder A" and "Folder B" (in the real application their names are in Greek).
- If a user creates "Folder A" from a browser with locale "en", the node's sys:locale property is set to "en". The same applies to Greek names and locale "el".
- When I run a Lucene query for the name "Folder" from the administration panel, I get different results depending on my browser locale.

As I read on the wiki, this is a tokenization problem:
"All uploaded content is tokenized according to the user's locale. (It is not yet possible to specify locale on upload). At search time, the user's locale is used for tokenization. Locale(s) can also be explicitly specified using the SearchParameters object."
Probably one of the Greek characters in the folder's name is tokenized differently under the "el" and "en" locales.
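To make the suspected mismatch concrete, here is a small, self-contained Java sketch. The two analyzers are hypothetical stand-ins, not Alfresco's actual configuration: the "en" one only splits on whitespace and lowercases, while the "el" one also strips accents (such as the tonos) and folds the final sigma, which is the kind of normalization a Greek analyzer may apply. The sketch also shows one possible reading of the wiki's note that locale(s) can be specified through SearchParameters: tokenize the query under every locale and accept a match from any of them. Whether Alfresco's `SearchParameters.addLocale` behaves exactly like this is an assumption to check against your version.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical simulation of locale-dependent tokenization.
 * Neither analyzer is Alfresco's real configuration; both are illustrative assumptions.
 */
public class LocaleTokenizationDemo {

    // Assumed "en" analyzer: whitespace split plus lowercasing only.
    static List<String> analyzeEn(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Assumed "el" analyzer: as "en", then strip combining marks (e.g. tonos)
    // and fold final sigma to medial sigma.
    static List<String> analyzeEl(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : analyzeEn(text)) {
            String decomposed = Normalizer.normalize(word, Normalizer.Form.NFD);
            String unaccented = decomposed.replaceAll("\\p{M}", "");
            tokens.add(unaccented.replace('ς', 'σ'));
        }
        return tokens;
    }

    static List<String> analyze(String text, String locale) {
        return "el".equals(locale) ? analyzeEl(text) : analyzeEn(text);
    }

    // The name is indexed under one locale; the query matches if any token
    // produced under any of the query locales equals an indexed token.
    static boolean matches(String indexedName, String indexLocale,
                           String query, String... queryLocales) {
        Set<String> indexed = new HashSet<>(analyze(indexedName, indexLocale));
        for (String locale : queryLocales) {
            for (String token : analyze(query, locale)) {
                if (indexed.contains(token)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String name = "Φάκελος Α";  // "Folder A" in Greek, created from an "el" browser
        String query = "Φάκελος";    // "Folder"
        System.out.println(matches(name, "el", query, "el"));       // true: same locale
        System.out.println(matches(name, "el", query, "en"));       // false: token mismatch
        System.out.println(matches(name, "el", query, "en", "el")); // true: both locales tried
    }
}
```

Under these assumptions, the "en" query produces the token `φάκελος` while the "el" index holds `φακελοσ`, so the two never meet unless the query is also tokenized under "el".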

How can I overcome this problem? It's not feasible to keep changing the client's locale, and Lucene queries also need to run from other places in Alfresco (JavaScript, etc.), where there is limited control over the locale.

Will changing to Solr solve this problem?