Hello,
I'm using the Alfresco add-on alfresco-simple-ocr with pdfsandwich to extract data from an invoice. Everything works fine, but the results are not very accurate.
My invoice has this template:
And the result that I get after OCR is this:
BILLING ADDRESS INVOICE XXXX XXXX XXX XXXX XXXX Number 545614513 XXXX XXXX Date May 30, 2019 XXXX XXXX Delivery No. INV1254 DELIVERY ADDRESS Your Request Date May 30, 2019 XXXX XXXX XXX XXXX XXXX Your Order No. SO655614 XXXX XXXX Contract No. - XXXX XXXX Quote No. SO655614 Customer No. Your Contact XXXXX (152)-568-5458 Our Contact XXXXX admin@yourcompany.example.com Pos. Prod.No. Description Qty Price/Item (USD) VAT Total (USD) 1 P_21154 XXXX 1 0.20 15% 0.20 Total USD (excl. taxes) 0.20 VAT 0.03 Total Net Price in USD (incl. VAT) 0.23
So my questions are:
- How can I improve the accuracy of the results? For example, I sometimes get a '5' instead of an 'S', or '8' instead of '8'...
- How can I get the results in blocks, part by part: part 1, part 2, and then part 3?
I tried cropping the invoice with the command line below, and it gives me the results I want. How can I do the same from inside Alfresco?
convert -density 200 INVOICE.pdf -crop 100x50% +repage \( -clone 0 -crop 50x100% +repage -reverse \) -delete 0 -reverse INVOICE-out.pdf
alfresco-global.properties:
# OCR #
ocr.command=/usr/bin/pdfsandwich
ocr.output.verbose=true
ocr.output.file.prefix.command=-o
ocr.extra.commands=-verbose -rgb -lang fra+eng -nopreproc
ocr.server.os=linux
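One possible way to run the crop from inside Alfresco is to point ocr.command at a small wrapper that crops first and then calls pdfsandwich. The sketch below only builds the two command lines, reusing the convert arguments and pdfsandwich options already shown in this post. The function names, the work_dir default, and the "-cropped.pdf" suffix are hypothetical, and how alfresco-simple-ocr actually passes the input and output paths to ocr.command is not confirmed here, so that part would need checking against the add-on.

```python
import subprocess
from pathlib import PurePosixPath


def build_commands(input_pdf, output_pdf, work_dir="/tmp"):
    """Return two argv lists: crop with ImageMagick, then OCR with pdfsandwich.

    work_dir and the "-cropped.pdf" suffix are illustrative assumptions.
    """
    cropped = str(PurePosixPath(work_dir) / (PurePosixPath(input_pdf).stem + "-cropped.pdf"))
    # Same arguments as the convert command in the post. The parentheses need
    # no backslash here because no shell is involved.
    crop_cmd = [
        "convert", "-density", "200", input_pdf,
        "-crop", "100x50%", "+repage",
        "(", "-clone", "0", "-crop", "50x100%", "+repage", "-reverse", ")",
        "-delete", "0", "-reverse", cropped,
    ]
    # Same pdfsandwich options as ocr.extra.commands in alfresco-global.properties.
    ocr_cmd = [
        "/usr/bin/pdfsandwich", "-verbose", "-rgb",
        "-lang", "fra+eng", "-nopreproc",
        "-o", output_pdf, cropped,
    ]
    return crop_cmd, ocr_cmd


def run_pipeline(input_pdf, output_pdf):
    """Run both steps in order, stopping if either command fails."""
    for cmd in build_commands(input_pdf, output_pdf):
        subprocess.run(cmd, check=True)
```

Calling build_commands("/data/INVOICE.pdf", "/data/INVOICE-ocr.pdf") returns the crop command followed by the OCR command; run_pipeline executes them and needs ImageMagick and pdfsandwich installed on the server.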
Try OCRmyPDF, it will give you better results.
https://ocrmypdf.readthedocs.io/en/latest/
OCRmyPDF gives me more or less the same results.
How can I OCR block by block (part 1, part 2, part 3)?
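On the accuracy point raised in the first post (an 'S' read as a '5'), since switching engines did not help, one workaround is to post-process fields whose format is known in advance. The sketch below is only an illustration of that idea, not something either OCR tool provides: the mapping table and the function name normalize_numeric are hypothetical, and it should only be applied to fields that are expected to contain digits only, such as the invoice number.

```python
# Letters that OCR engines often produce in place of digits
# (an illustrative, non-exhaustive table).
LOOKALIKE_TO_DIGIT = str.maketrans({"S": "5", "O": "0", "B": "8", "I": "1", "l": "1"})


def normalize_numeric(field):
    """Map look-alike letters to digits; use only on fields known to be numeric."""
    return field.translate(LOOKALIKE_TO_DIGIT)


print(normalize_numeric("S4S6l4S13"))  # -> 545614513
```

A value such as the order number SO655614 legitimately contains letters, so running it through this function would corrupt it. That is why the correction has to be applied per field rather than to the whole OCR output.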
I am having the same issue. Did you find a solution?
I'm still facing the same issue.
Does anyone have an idea how to extract the text from the invoice by parts?
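As a starting point for extracting by parts, the page can be split into regions before OCR so that each region is recognized separately. The sketch below computes three crop boxes, assuming the parts are the top-left quarter, the top-right quarter, and the full-width bottom half, which is what the convert command in the first post appears to produce. The function name and the part1/part2/part3 keys are hypothetical, and the integer halving only approximates how ImageMagick rounds percentage crops.

```python
def split_into_parts(width, height):
    """Return (left, upper, right, lower) pixel boxes for the three invoice parts."""
    mid_x = width // 2
    mid_y = height // 2
    return {
        "part1": (0, 0, mid_x, mid_y),       # top-left quarter
        "part2": (mid_x, 0, width, mid_y),   # top-right quarter
        "part3": (0, mid_y, width, height),  # full-width bottom half
    }


# An A4 page rendered at 200 dpi is roughly 1654 x 2338 pixels.
for name, box in split_into_parts(1654, 2338).items():
    print(name, box)
```

The boxes use the (left, upper, right, lower) order that Pillow's Image.crop expects, so each one could be cropped from a rendered page image and passed to the OCR engine on its own. The text then comes back per part instead of as one mixed stream.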