
"OCR Extract" action doesn't work well (alfresco-simple-ocr + pdfsandwich)

Question asked by hisayo-s on Aug 16, 2018
Latest reply on Aug 30, 2018 by angelborroy

Hello,

 

I'm using Alfresco 5.2 Community Edition on CentOS 7.5, and Alfresco itself works well.

 

Now I am trying to add OCR to Alfresco, so I installed alfresco-simple-ocr (simple-ocr-repo-2.3.1.jar) and pdfsandwich.

 

With pdfsandwich version 1.4 installed, the "Extract OCR" rule action runs and a version 1.1 PDF file is created automatically. However, every page of the OCR PDF is blank: no images, no characters.

 

Next, I installed pdfsandwich version 1.6 in place of version 1.4 and tried again. This time the "Extract OCR" rule action does not seem to run at all, and the version 1.1 PDF file is never created.

 

I also ran pdfsandwich versions 1.4, 1.5, 1.6 and 1.7 on the command line, and all of them work well except version 1.7, which prints an error message that looks like a bug. With versions 1.4, 1.5 and 1.6, the exit code is zero.
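For reference, the command-line check described above could be sketched roughly as follows. The helper name `check_ocr` and the file paths are hypothetical, chosen only for illustration. It runs a command, then reports the exit code and whether a non-empty output file exists. It cannot detect blank pages; it only confirms that a file was written.

```shell
#!/bin/sh
# Hypothetical helper (not part of pdfsandwich or alfresco-simple-ocr):
# run a command, then report its exit code and whether the expected
# output file exists and is non-empty.
# Usage: check_ocr OUTPUT_FILE COMMAND [ARGS...]
check_ocr() {
    out_file=$1
    shift
    "$@"                 # run the OCR command exactly as given
    rc=$?
    if [ "$rc" -eq 0 ] && [ -s "$out_file" ]; then
        echo "exit=$rc output=present"
    elif [ "$rc" -eq 0 ]; then
        echo "exit=$rc output=missing-or-empty"
    else
        echo "exit=$rc output=not-checked"
    fi
    return "$rc"
}

# Example call (placeholder paths):
# check_ocr /tmp/out.pdf /usr/local/bin/pdfsandwich -rgb -lang jpn /tmp/in.pdf -o /tmp/out.pdf
```

Running the same check as the operating-system user that Tomcat runs under, rather than as an interactive login user, may show whether the environment differs between the two.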

 

---------------------------------------------------------------------------

RULE DEFINITION

 

The attached file is a screenshot of the rule definition (in Japanese).

When items are created or enter this folder, OR when items are updated,

AND MIME-type is "Adobe PDF Document",

execute "Extract OCR".

- Continue on error: Checked

- Run rule in background: Checked

 

---------------------------------------------------------------------------

/opt/alfresco-community/tomcat/shared/classes/alfresco-global.properties

 :

 :
### Alfresco Simple OCR ###
ocr.command=/usr/local/bin/pdfsandwich
ocr.output.verbose=true
ocr.output.file.prefix.command=-o

ocr.extra.commands=-verbose -rgb -lang jpn
ocr.server.os=linux
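
To compare against the manual command-line runs, the settings above can be assembled into a single command line. This is only a sketch: the variable names and the helper `build_ocr_cmd` are hypothetical, and the argument order (input file, then output prefix and output file, then extra options) is an assumption, not something taken from the alfresco-simple-ocr source.

```shell
#!/bin/sh
# Values copied from the alfresco-global.properties excerpt above.
OCR_COMMAND=/usr/local/bin/pdfsandwich
OCR_OUTPUT_PREFIX=-o
OCR_EXTRA="-verbose -rgb -lang jpn"

# Hypothetical helper: print the command line for one input/output pair.
# ASSUMPTION: the argument order is a guess and may differ from what
# alfresco-simple-ocr actually executes.
build_ocr_cmd() {
    printf '%s %s %s %s %s\n' \
        "$OCR_COMMAND" "$1" "$OCR_OUTPUT_PREFIX" "$2" "$OCR_EXTRA"
}

# Print the assembled command for placeholder paths.
build_ocr_cmd /tmp/in.pdf /tmp/out.pdf
```

Pasting the printed line into a shell opened as the Tomcat service user would exercise the same binary and options that the rule is configured with.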

 

---------------------------------------------------------------------------

 

Could anyone please help me?
