Lucene Bug or Alfresco Bug? Or both?

Question asked by sergiofigueras on Sep 10, 2010
Hi everyone,
First of all, I started a topic a few weeks ago where the problem is described: http://forums.alfresco.com/en/viewtopic.php?f=14&t=28778

Second, the problem was with the default "pt" tokenizer. I used Luke to inspect the Lucene index, and any content I uploaded was being mangled during tokenization. For example, the text "Passo pálida e triste. Ouço dizer" was tokenized as "Pass p lid e trist. Ou o dize". That led to different search results.
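To illustrate what I think is going on, here is a minimal sketch in plain Python. It is not Lucene or Alfresco code, and the two function names are made up for this post. It assumes the "pt" tokenizer treats anything outside a-z/A-Z as a word separator, which would explain why "pálida" becomes "p lid" and "Ouço" becomes "Ou o". The dropped final letters ("Passo" to "Pass", "dizer" to "dize") look like a separate stemming step and are not modeled here.

```python
import re

def ascii_only_tokens(text):
    # Hypothetical model of the broken behavior: only ASCII letters count
    # as word characters, so accented letters act as separators.
    return re.findall(r"[A-Za-z]+", text)

def unicode_tokens(text):
    # Unicode-aware alternative: [^\W\d_] matches any letter, including
    # accented ones, so words stay in one piece.
    return re.findall(r"[^\W\d_]+", text)

sample = "Passo pálida e triste. Ouço dizer"

print(ascii_only_tokens(sample))
# ['Passo', 'p', 'lida', 'e', 'triste', 'Ou', 'o', 'dizer']

print(unicode_tokens(sample))
# ['Passo', 'pálida', 'e', 'triste', 'Ouço', 'dizer']
```

If the index holds "p" and "lida" but the query is analyzed as "pálida" (or the other way around), the terms never match, which would fit the search differences I saw.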

To solve that, I changed the default analyzer to AlfrescoStandardAnalyzer in /tomcat/webapps/alfresco/WEB-INF/classes/alfresco/model/dataTypeAnalyzers… and that resolved everything.

Does anybody else think this is bad behavior for the Lucene analyzer?

I also think I've found a problem in Alfresco itself. I was uploading close to 70k documents through the Alfresco WebService using an adapter that I wrote. The problem is that 99% of the uploaded documents were in the "pt" locale (I saw that in Luke), but some documents had the "en" locale. The question is: if I upload 70k docs from a single machine, reading from a database and then uploading, always with my "pt" locale, why did some documents end up with the "en" locale? I can't understand that… Can anybody?
