Linux OCR

Question asked by even4 on May 1, 2009
Latest reply on Aug 1, 2012 by wmay
I have installed Alfresco 3.1 and have it running smoothly on Debian Lenny using Apache/Tomcat.

I'm now looking at OCR and have installed OCRopus and Tesseract. Both run perfectly from the command line. I have tried to set up an OCR transformation XML file for Alfresco, without any luck.

Has anyone successfully integrated OCRopus/Tesseract with Alfresco? If so, could you post your XML and any other specific modifications you needed to make?

I understand Tesseract can't convert PDF, but for now TIFF to text is OK. I'm hoping Tesseract comes along in leaps and bounds now that it is a Google-funded project, as there seems to be a big gap in quality between the Windows and Linux OCR options.
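
For reference, the TIFF-to-text step I'm hoping Alfresco can trigger looks roughly like the sketch below. It only wraps the standard `tesseract <image> <outputbase> [-l <lang>]` command line; the function names and file names are my own placeholders, not anything from Alfresco, and it assumes `tesseract` is on the PATH.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(tif_path, output_base, lang=None):
    """Build the argument list for: tesseract <image> <outputbase> [-l <lang>].

    The function name and parameters are illustrative placeholders.
    """
    cmd = ["tesseract", str(tif_path), str(output_base)]
    if lang:
        cmd += ["-l", lang]
    return cmd


def tif_to_text(tif_path, output_base, lang=None):
    """Run tesseract on a TIFF and return the recognised text.

    Tesseract appends ".txt" to the output base itself, so the
    result is read back from "<output_base>.txt".
    """
    subprocess.run(build_tesseract_command(tif_path, output_base, lang), check=True)
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

Calling `tif_to_text("scan.tif", "scan")` should leave the text in `scan.txt` and return it; what I can't work out is how to get Alfresco to run the equivalent command as a transformation.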

Any help is appreciated, and I will share whatever I work out about getting OCR working well with Alfresco on Linux.