
Weakness of full-text indexing design?

Question asked by kgeis on Apr 1, 2010
Latest reply on Apr 6, 2010 by andy
I am wondering about the design of the full-text indexing in Alfresco.  I'm reading the wiki and the code for ADMLuceneIndexerImpl, and it's clear that content properties are converted to text and then that text is indexed by Lucene.

Now what about the case where I have structured data inside my content and I want to preserve some of that structure when it is indexed?  For example, let's say I have an XML document with information captured from a feedback form.

<feedback>
  <first-name>Ken</first-name>
  <last-name>Geis</last-name>
  <title>Programmer</title>
</feedback>
The most obvious transform from this XML to text is:

Ken Geis Programmer
However, I want to prevent Lucene from treating this as a single text field.  I don't want a search for the phrase "Geis Programmer" to retrieve this document.  I see some black magic done in ADMLuceneIndexerImpl that makes me wonder: is it as easy as transforming the input to the following?

Ken\u0000Geis\u0000Programmer
If this isn't possible, I might have a problem.
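
To make the concern concrete, here is a toy model in plain Java. It uses no Lucene or Alfresco classes, and the names PhraseGapSketch, index, and matchesPhrase are made up for this illustration. It assumes a phrase matches only when its terms sit at consecutive token positions, which is the general idea behind positional phrase matching, and it shows how leaving a gap in the positions between element values would stop a phrase from matching across them. Whether the \u0000 separator actually produces such a gap in ADMLuceneIndexerImpl is exactly what I don't know.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PhraseGapSketch {

    /**
     * Builds a toy positional index: token -> positions where it occurs.
     * After each value, the position counter is advanced by `gap` extra
     * slots (hypothetical parameter, loosely modelled on the idea of a
     * position increment gap between field values).
     */
    static Map<String, List<Integer>> index(List<String> values, int gap) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int position = 0;
        for (String value : values) {
            for (String token : value.trim().toLowerCase().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // skip the empty token produced by a blank value
                }
                postings.computeIfAbsent(token, k -> new ArrayList<>()).add(position);
                position++;
            }
            position += gap; // leave a hole so neighbouring values are not adjacent
        }
        return postings;
    }

    /** Returns true when the phrase terms occur at consecutive positions. */
    static boolean matchesPhrase(Map<String, List<Integer>> postings, String phrase) {
        String[] terms = phrase.trim().toLowerCase().split("\\s+");
        List<Integer> starts = postings.get(terms[0]);
        if (starts == null) {
            return false;
        }
        for (int start : starts) {
            boolean allAdjacent = true;
            for (int i = 1; i < terms.length; i++) {
                List<Integer> positions = postings.get(terms[i]);
                if (positions == null || !positions.contains(start + i)) {
                    allAdjacent = false;
                    break;
                }
            }
            if (allAdjacent) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("Ken", "Geis", "Programmer");

        // Flattened text: all tokens are adjacent, so the unwanted phrase matches.
        System.out.println(matchesPhrase(index(values, 0), "Geis Programmer"));   // true

        // With a gap between values, the cross-element phrase no longer matches.
        System.out.println(matchesPhrase(index(values, 100), "Geis Programmer")); // false

        // Single terms are still found either way.
        System.out.println(matchesPhrase(index(values, 100), "Programmer"));      // true
    }
}
```

In this model the flattened form behaves like a gap of 0, which is the behavior I want to avoid; any positive gap is enough to block an exact phrase across two elements.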
