
PDF Indexing

Question asked by smca on Apr 30, 2007
Latest reply on May 31, 2007 by forcev
I am new to Alfresco: I downloaded and installed it just a few hours ago. It's up and running, and I can log in, upload files, and create users.

The problem I am facing is that the PDF files I uploaded can only be searched by filename/title, while other files (e.g. Word .doc files) can also be searched by the words they contain.

So, if I have Raptors.doc and Raptors.pdf, and both contain the word "Toronto", a search for "Raptors" retrieves both documents, but a search for "Toronto" retrieves only the .doc file, not the .pdf file.

As a side note, both the .pdf and the .doc files were created from the same source in Google Docs & Spreadsheets.

Am I missing something?

Thanks in advance.

____________________________________

Added after the initial post
____________________________________

I noticed that PDFs under 500 KB are indexed without problems. However, for *large* PDFs (say, above 3 MB, which isn't really large; I have PDFs well over 70 MB), I get the following error message:

Metadata extraction failed: reader: ContentAccessor[ contentURL=store://C:\Alfresco\tomcat\temp\Alfresco\alfresco39438.upload, mimetype=application/pdf, size=12976429, encoding=UTF-8] extracter: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter@17349d7
