With the approach suggested by Fedorow in https://hub.alfresco.com/t5/alfresco-content-services-forum/quot-ocr-extract-quot-action-doesn-t-wor..., I was able to make OCR work with Alfresco 6.1.0 and Docker.
I updated /ocr_input and /ocr_output to /usr/local/tomcat/ocr_input and /usr/local/tomcat/ocr_output so that the Alfresco container can access these folders without any access issues.
Thanks, Fedorow!
Below are the changes I made to docker-compose.yml and ocrmypdf.sh:
docker-compose.yml
...
services:
  alfresco:
    ...
    volumes:
      - ocr-input:/usr/local/tomcat/ocr_input
      - ocr-output:/usr/local/tomcat/ocr_output
    ...
  ocrmypdf:
    ...
    volumes:
      - ocr-input:/usr/local/tomcat/ocr_input
      - ocr-output:/usr/local/tomcat/ocr_output
    ...
volumes:
  ...
  ocr-input:
    external: true
  ocr-output:
    external: true
...
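Because ocr-input and ocr-output are declared with external: true, docker-compose does not create them and refuses to start if they are missing, so they have to exist before the stack is brought up. A minimal sketch, assuming a local Docker engine and the two volume names from the compose file; the helper name create_ocr_volumes and the DOCKER_BIN override are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: create the external volumes that docker-compose.yml expects.
# DOCKER_BIN can be overridden (assumption), e.g. to point at a wrapper around the Docker CLI.
DOCKER_BIN="${DOCKER_BIN:-docker}"

create_ocr_volumes() {
  local vol
  for vol in ocr-input ocr-output; do
    # "docker volume inspect" exits non-zero when the volume does not exist yet
    if ! "$DOCKER_BIN" volume inspect "$vol" >/dev/null 2>&1; then
      "$DOCKER_BIN" volume create "$vol" >/dev/null
      echo "created $vol"
    else
      echo "exists $vol"
    fi
  done
}
```

Run create_ocr_volumes once on the Docker host, then start the stack with docker-compose up -d as usual. The inspect-then-create check makes it safe to run again later.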
bin/ocrmypdf.sh
#!/bin/bash
# shared directories, mounted in both the alfresco and the ocrmypdf containers
INPUT_DIR=/usr/local/tomcat/ocr_input
OUTPUT_DIR=/usr/local/tomcat/ocr_output
# ocrmypdf hostname
OCRMYPDF_SERVER="ocrmypdf"
# identify parameters, input and output file
array=( "$@" )
len=${#array[@]}
ARGS=${array[@]:0:$len-2}
# read the last two arguments directly so paths containing spaces stay intact
INPUT_FILE_PARAM="${array[$len-2]}"
OUTPUT_FILE_PARAM="${array[$len-1]}"
# extract filenames
INPUT_FILE=$(basename "$INPUT_FILE_PARAM")
OUTPUT_FILE=$(basename "$OUTPUT_FILE_PARAM")
# copy and remote-shell commands (plain cp is enough because the directories are shared volumes)
SCP=cp
SSH=ssh
USER=root
# copy original pdf to the shared input directory
$SCP "$INPUT_FILE_PARAM" "$INPUT_DIR/"
# execute ocrmypdf program on the ocrmypdf server
$SSH "$USER@$OCRMYPDF_SERVER" "/usr/bin/ocr.sh $ARGS $INPUT_DIR/$INPUT_FILE $OUTPUT_DIR/$OUTPUT_FILE"
# copy transformed pdf back to alfresco path
$SCP "$OUTPUT_DIR/$OUTPUT_FILE" "$OUTPUT_FILE_PARAM"
# remove temporary files
rm -f "$INPUT_DIR/$INPUT_FILE" "$OUTPUT_DIR/$OUTPUT_FILE"
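The first lines of ocrmypdf.sh split the command line into the options that are passed through to ocrmypdf and the last two arguments (input PDF and output PDF). That split can be tried on its own with a small sketch like the one below; the function name split_ocr_args and the sample arguments are hypothetical. It reads the last two arguments directly from the array, which keeps paths containing spaces intact.

```shell
#!/bin/bash
# Hypothetical helper mirroring the argument handling in ocrmypdf.sh:
# everything except the last two arguments is treated as ocrmypdf options,
# the second-to-last argument is the input PDF and the last one is the output PDF.
split_ocr_args() {
  local args=( "$@" )
  local len=${#args[@]}
  if (( len < 2 )); then
    echo "usage: split_ocr_args [options...] input.pdf output.pdf" >&2
    return 1
  fi
  echo "options: ${args[*]:0:len-2}"
  echo "input:   ${args[len-2]}"
  echo "output:  ${args[len-1]}"
}
```

For example, split_ocr_args -l eng --deskew "/tmp/in file.pdf" /tmp/out.pdf prints the options as "-l eng --deskew", the input as "/tmp/in file.pdf" and the output as "/tmp/out.pdf".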
After these changes to docker-compose.yml and ocrmypdf.sh, I was able to run OCR successfully with Alfresco 6.1.
Since we run our Alfresco instance on Kubernetes with a Helm deployment, I need to configure these volumes in the values.yaml file, but I am not sure how. Does anyone know how to make a similar configuration in Kubernetes?
Any help appreciated.
- Sriram
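On Kubernetes, the two Docker named volumes would roughly correspond to two PersistentVolumeClaims that both the Alfresco pod and the ocrmypdf pod mount at /usr/local/tomcat/ocr_input and /usr/local/tomcat/ocr_output (the ocrmypdf hostname used by the script would then have to resolve to a Service of that name). Because two pods share each claim, the claims usually need the ReadWriteMany access mode when the pods can land on different nodes, which only some storage classes support. The sketch below only generates the claim manifests; the helper name ocr_pvc_manifest, the 1Gi default size and an RWX-capable default storage class are assumptions, and how the claims are mounted into the Alfresco pod from values.yaml depends on what the Helm chart version in use exposes.

```shell
#!/bin/bash
# Hypothetical helper: print a PersistentVolumeClaim manifest for one shared OCR directory.
# Assumes the cluster's default storage class supports ReadWriteMany (e.g. an NFS-backed class),
# because the Alfresco pod and the ocrmypdf pod both mount the same claim.
# Usage: ocr_pvc_manifest NAME [SIZE]
ocr_pvc_manifest() {
  local name="$1" size="${2:-1Gi}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: ${size}
EOF
}
```

Both claims could then be applied in one go, for example with { ocr_pvc_manifest ocr-input; echo '---'; ocr_pvc_manifest ocr-output; } | kubectl apply -f - (add -n with the namespace of the Alfresco release).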
Hello, I'm having trouble configuring ocrmypdf in Alfresco. I am using the "alfresco-content-repository-community:6.2.0-ga" version. After following the setup instructions, the option to configure the OCR action is not displayed in Alfresco. Could you help me? Here is the link to the project I'm running:
https://github.com/guilhermekelling/ocr.git
Thank you,
Guilherme Kelling
OCR Extract is an action that we assign to a folder as a rule. Could you please check whether the "OCR Extract" action is available under the actions in the folder rule?
If you don't see the action in the folder rule, then "simple-ocr-repo-2.3.1.jar" is either not properly installed in the Alfresco repository, or you should look in the logs for any exceptions related to it.
- Sriram
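Both checks (is the JAR deployed, and are there exceptions mentioning it) can be scripted. A minimal sketch; the function name check_ocr_module is hypothetical, and because the directory holding repository JARs and the location of the repository log depend on how Alfresco was installed (for example inside the Docker image), they are passed in as arguments rather than assumed:

```shell
#!/bin/bash
# Hypothetical helper: report whether simple-ocr-repo-*.jar is present in the given
# directories and print exception lines that mention OCR from the given log file.
# Usage: check_ocr_module LOG_FILE DIR [DIR...]
check_ocr_module() {
  local log_file="$1"; shift
  local -a jars=()
  local dir jar
  for dir in "$@"; do
    # only look directly inside each directory; ignore directories that do not exist
    while IFS= read -r jar; do
      jars+=( "$jar" )
    done < <(find "$dir" -maxdepth 1 -name 'simple-ocr-repo-*.jar' 2>/dev/null)
  done
  if (( ${#jars[@]} > 0 )); then
    printf 'module found: %s\n' "${jars[@]}"
  else
    echo "module NOT found"
  fi
  # case-insensitive: lines containing "exception" that also mention "ocr"
  grep -i 'exception' "$log_file" 2>/dev/null | grep -i 'ocr'
  return 0
}
```

If the output says "module NOT found" for the directory where the repository loads its JARs, the module was copied to the wrong place, which matches the problem reported in the reply that follows.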
Good morning Sriram,
You were right: the file "simple-ocr-repo-2.3.1.jar" was not in the right location. After adjusting the configuration, the option was displayed in Alfresco.
Thank you for your help.
Hi @guilhermekellin,
Great that @SriramG was able to help you resolve your problem - thanks for reporting back. I've marked this as solved.
Kind regards,