Bad or broken search results in the repository

Question asked by keithveleba on Jan 27, 2010

I recently rolled out a pilot project at work using Alfresco Community 3.2.
It hosts about 350,000 assets (JPEGs, PDFs and OpenOffice Writer documents), representing about 10 years of archive material.
Our goal is to use it as the basis for our content repository and workflow system going forward (I work for a newspaper).
I am seeing inconsistent full-text search results, and I can't find any resources to help me diagnose and correct the problem. The Lucene settings are unchanged from the base install.

The repository holds OpenOffice Writer files of stories written by reporters, along with copies of our published pages as PDFs. For each story we keep both the story document and the PDF of the page on which it was published.

An example search that seems inconsistent:

I want to pull up all documents that contain both "osborn" and "gardening".

In the simple search box, I enter those words separated by a space.

The search returns all documents (ODT and PDF) that match "osborn" OR "gardening". If I then put a plus sign in front of osborn (to tell the query engine that "osborn" MUST match), only the PDFs show up in the results; the ODT files do not. I'm puzzled, because full-text search seems to work, but only for OR searches; AND searches seem to break the results. Is something broken, or am I fighting a bad index?
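For reference, here is a toy model of how Lucene-style boolean prefixes are generally interpreted: a term prefixed with "+" is required, while bare terms are optional and only affect ranking once any required term is present, so "+osborn gardening" matches every document containing "osborn" whether or not it contains "gardening". This is a simplified sketch, not Alfresco's or Lucene's actual code: the `search` and `parse_query` functions and the document names are hypothetical, "-" (prohibited) terms are ignored, and the ODT entry is deliberately modelled as having only title terms indexed, to show how a document missing its body text can appear in an OR search yet drop out of a "+osborn" search.

```python
from typing import Dict, List, Set, Tuple


def parse_query(query: str) -> Tuple[List[str], List[str]]:
    """Split a query into required (+term) and optional (bare) terms."""
    required: List[str] = []
    optional: List[str] = []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        else:
            optional.append(token)
    return required, optional


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the sorted names of documents matching the query."""
    required, optional = parse_query(query)
    hits = []
    for doc, terms in index.items():
        if required:
            # Every +term must be present; bare terms would only affect ranking.
            if all(t in terms for t in required):
                hits.append(doc)
        elif any(t in terms for t in optional):
            # No required terms: any single optional term is enough (OR).
            hits.append(doc)
    return sorted(hits)


# Hypothetical toy index: document name -> set of indexed terms.
toy_index: Dict[str, Set[str]] = {
    "page_12.pdf": {"osborn", "gardening", "column"},
    "page_13.pdf": {"osborn", "council"},
    # Assumption for illustration: only title terms indexed, body text missing.
    "gardening_tips.odt": {"gardening", "tips"},
}

if __name__ == "__main__":
    for q in ("osborn gardening", "+osborn gardening", "+osborn +gardening"):
        print(q, "->", search(toy_index, q))
```

Under these assumptions, "osborn gardening" returns all three documents, "+osborn gardening" returns only the two PDFs, and "+osborn +gardening" (both terms required) returns only `page_12.pdf`.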

The success of this pilot (and a potential Alfresco sale) rides on consistent search results. Has anyone else run into problems like this?

Any and all help would be greatly appreciated.