
OCR

Question asked by p4w3l on Dec 25, 2015
Latest reply on Jan 4, 2016 by openpj
I'm trying to find some reading on how indexing works in Alfresco when you have an image, such as a scanned PDF, and want to keep the original scan while still being able to find the document through search.

Could someone explain the relevant parts of the Alfresco architecture and the two approaches I can see (as far as I understand it)?

1. No transformation of the stored content is necessary: the document only needs to be OCRed for as long as it takes to index it. Later, when a user searches, the search finds the original document (the scan). There is no need to keep the document as text; it just needs to be findable. Is this approach possible?

2. Say we have a document scan and want an OCRed version of it as text or as a searchable PDF. Will a transformer replace the original document? How can I arrange things so that both the original (the scanned PDF) and the OCRed version (the searchable PDF) are kept but appear as a single item?
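To make the two approaches concrete, here is a toy model of what I mean. It is plain Python, not Alfresco code, and every name in it (`Document`, `Repository`, `fake_ocr`, the `renditions` dict) is hypothetical. Approach 1 runs OCR only to feed the search index and always returns the untouched scan; approach 2 additionally stores the OCR output next to the original on the same item.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    node_id: str
    content: bytes  # the original scan; never modified
    # Hypothetical slot for derived versions (e.g. OCR output) kept on the same item
    renditions: dict[str, bytes] = field(default_factory=dict)


def fake_ocr(content: bytes) -> str:
    """Hypothetical stand-in for a real OCR engine: treats the bytes as their text."""
    return content.decode("utf-8", errors="ignore")


class Repository:
    """Toy repository with a word -> node_id inverted index (not Alfresco's)."""

    def __init__(self) -> None:
        self.docs: dict[str, Document] = {}
        self.index: dict[str, set[str]] = {}

    def add(self, doc: Document, keep_ocr_copy: bool = False) -> None:
        self.docs[doc.node_id] = doc
        # Approach 1: OCR text is used only to build the index, then discarded
        text = fake_ocr(doc.content)
        for word in text.lower().split():
            self.index.setdefault(word, set()).add(doc.node_id)
        # Approach 2: also keep the OCR output alongside the untouched original
        if keep_ocr_copy:
            doc.renditions["ocr-text"] = text.encode("utf-8")

    def search(self, word: str) -> list[Document]:
        # Hits always point back to the original documents, sorted by id
        ids = sorted(self.index.get(word.lower(), set()))
        return [self.docs[i] for i in ids]
```

For example, after `repo.add(Document("scan-1", b"Invoice ACME"))` and `repo.add(Document("scan-2", b"Contract ACME"), keep_ocr_copy=True)`, searching for `"acme"` returns both original scans, and only `scan-2` carries an extra OCR copy. My question is essentially which of these two shapes Alfresco supports and how.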

I'm trying to learn Alfresco and map its abstractions to real user needs.
