
Show Tesseract OCR output PDF file as the original TIFF?

Question asked by cperez on Apr 14, 2014
Latest reply on Apr 21, 2014 by cperez
Hi all!!

I installed Tesseract on my server to convert TIFF files into PDF files.

I use the following code in an Ubuntu terminal:


find . -maxdepth 1 -name "*.tif" -print0 | while IFS= read -r -d '' n; do
tesseract "$n" "$n" -l eng hocr;
hocr2pdf -i "$n" -n -o "$n.pdf" < "$n.html";
done


This code only produces a PDF with plain text, similar to the original but not identical.
I want the PDF to look exactly like the scanned image (TIFF), with all the lines and images, but I have not found any way to do this.

Can anyone tell me a way to do it with Tesseract or, in the worst case, with another free application?

Thanks a lot in advance!!
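A minimal sketch of the same loop, assuming hocr2pdf from the ExactImage package: its `-n` (`--no-image`) option tells hocr2pdf not to place the scanned image over the text, which would explain a PDF that contains only text. The variant below drops `-n`, so the page image should stay on top of the OCR text layer (a searchable PDF that looks like the TIFF). The function name `ocr_tiffs_to_pdf` is hypothetical, and the `.hocr` check assumes that newer Tesseract versions write `.hocr` where older ones wrote `.html`.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper name): OCR every TIFF in the current directory
# and build a PDF that keeps the scanned page image over the OCR text.
# Assumes tesseract (hOCR output) and ExactImage's hocr2pdf are installed.

ocr_tiffs_to_pdf() {
  local n hocr
  find . -maxdepth 1 -name "*.tif" -print0 | while IFS= read -r -d '' n; do
    # Writes "$n.html" (older Tesseract) or "$n.hocr" (newer Tesseract).
    tesseract "$n" "$n" -l eng hocr
    if [ -f "$n.hocr" ]; then hocr="$n.hocr"; else hocr="$n.html"; fi
    # No -n (--no-image) here: hocr2pdf places the page image over the text,
    # so the PDF looks like the original scan but its text is searchable.
    hocr2pdf -i "$n" -o "$n.pdf" < "$hocr"
  done
}
```

Run it from the directory that contains the TIFF files, e.g. `ocr_tiffs_to_pdf`. As an alternative, newer Tesseract releases (3.03 and later, as far as I know) can write such an image-plus-text PDF directly with the `pdf` config, e.g. `tesseract page.tif page -l eng pdf`.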
