
"OCR Extract" action doesn't work well (alfresco-simple-ocr + pdfsandwich)

Question asked by hisayo-s on Aug 16, 2018
Latest reply on Jul 2, 2019 by fedorow



I'm using Alfresco 5.2 Community Edition on CentOS 7.5, and Alfresco itself works well.


Now I am trying to add OCR functionality to Alfresco, so I installed alfresco-simple-ocr (simple-ocr-repo-2.3.1.jar) and pdfsandwich.


With pdfsandwich version 1.4 installed, the "Extract OCR" action triggered by the rule does run, and version 1.1 of the PDF file is created automatically. But every page of the OCR'd PDF is blank: no images, no characters.


Next, I installed pdfsandwich version 1.6 instead of version 1.4 and tried again. This time the "Extract OCR" action triggered by the rule does NOT seem to run, and version 1.1 of the PDF file is never created.


I tried pdfsandwich versions 1.4, 1.5, 1.6 and 1.7 on the command line, and they all work well except version 1.7. (Version 1.7 prints what looks like a bug message on the command line.) With versions 1.4, 1.5 and 1.6, the exit code is zero.





The attached file is a screenshot of the rule definition (in Japanese). It says:

When items are created or enter this folder, OR when items are updated,

AND MIME-type is "Adobe PDF Document",

execute "Extract OCR".

- Continue on error: Checked

- Execute the rule in the background: Checked





### Alfresco Simple OCR ###

ocr.extra.commands=-verbose -rgb -lang jpn
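
For anyone who wants to repeat the command-line check, here is a minimal sketch in Python. It assumes pdfsandwich is on the PATH and that the options from ocr.extra.commands are placed before the input file, with the output passed via -o; that ordering, the file names, and the helper functions build_command and run_ocr are my assumptions for illustration, not necessarily how alfresco-simple-ocr builds its own command.

```python
import shlex
import subprocess
import sys

# Same extra options as the ocr.extra.commands property above.
EXTRA_COMMANDS = "-verbose -rgb -lang jpn"


def build_command(input_pdf, output_pdf, extra=EXTRA_COMMANDS):
    """Build the pdfsandwich argument list (hypothetical helper).

    Assumed order: extra options, input file, then "-o" and the output file.
    """
    return ["pdfsandwich", *shlex.split(extra), input_pdf, "-o", output_pdf]


def run_ocr(input_pdf, output_pdf):
    """Run pdfsandwich and return (exit code, combined stdout/stderr)."""
    result = subprocess.run(
        build_command(input_pdf, output_pdf),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep error messages together with normal output
        text=True,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Usage (file names are placeholders): python check_ocr.py input.pdf output_ocr.pdf
    code, output = run_ocr(sys.argv[1], sys.argv[2])
    print(output)
    print("exit code:", code)
```

Running the script as the same OS user that runs Alfresco, once per installed pdfsandwich version, would show whether the exit code and messages match what I see in my own shell.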




Can anyone please help me?