Tiger OCR integration

Document created by resplin Employee on Jun 6, 2015
Version 1

This page is obsolete.

The official documentation is at: http://docs.alfresco.com




Cognitive Enterprises (ocr.com) has focused on OCR software since 1988. Their technology is built into many applications worldwide through a library called Tiger OCR.


Benefits of this technology


At Intelliant we like it for several reasons:


  • Remarkably fast.
  • Light (the library and uncompressed dictionaries weigh only 8 MB).
  • Accurate. Recognition accuracy is quite good for the main European languages.
  • Character coordinates. Intelliant OCR provides not only the text contained in the images but also the coordinates of each character on the page. With this information we can create searchable PDF files that include both the image and the text, similar to Acrobat Capture output.
  • Low runtime fees. This allowed us to build OCR capabilities into our applications for a fraction of the usual price, putting the 'paperless office' within reach of small and medium businesses.
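
To illustrate why per-character coordinates matter for searchable PDFs, here is a minimal Python sketch of one intermediate step: merging character boxes into word boxes, which is the unit a text layer is typically positioned with. The `CharBox` and `WordBox` types, the coordinate layout, and the `max_gap` threshold are assumptions made for this example; they are not the actual output format of Tiger OCR or Intelliant OCR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CharBox:
    """One recognized character with its bounding box (hypothetical layout)."""
    char: str
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class WordBox:
    """A word and the smallest box that encloses all of its characters."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def group_into_words(chars: List[CharBox], max_gap: float = 2.0) -> List[WordBox]:
    """Merge the characters of one text line into words.

    Assumes `chars` is in reading order along a single line. A word ends at a
    whitespace character, or when the horizontal gap to the previous character
    is larger than `max_gap`.
    """
    words: List[WordBox] = []
    current: List[CharBox] = []

    def flush() -> None:
        # Emit the characters collected so far as one word, then reset.
        if current:
            words.append(WordBox(
                text="".join(c.char for c in current),
                x0=min(c.x0 for c in current),
                y0=min(c.y0 for c in current),
                x1=max(c.x1 for c in current),
                y1=max(c.y1 for c in current),
            ))
            current.clear()

    for c in chars:
        if c.char.isspace():
            flush()
            continue
        if current and c.x0 - current[-1].x1 > max_gap:
            flush()
        current.append(c)
    flush()
    return words
```

Each resulting `WordBox` could then be written as invisible text at the same position over the scanned image, so that a search hit highlights the right area of the page.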

A straightforward integration


One of our tools, Intelliant OCR, is a command-line application, which makes it easy to integrate into other document management systems. As a demonstration, we provide configuration files that allow it to be plugged 'out of the box' into Alfresco ECM:


  • ocr-transformers-context.xml defines OCR transformations that you can use within basic content rules (for example, to transform into PDF all TIFF images dropped into a specific folder).

For a typical Windows installation of the Tomcat bundle, both files need to be placed in the following directory: C:\Program Files\alfresco-1.3.0\tomcat\shared\classes\alfresco\extension.
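
The mechanism behind such an integration is that the host system fills in a command template with a source and a target file and then runs the external program. The following Python sketch shows that idea outside Alfresco, under stated assumptions: the executable name `ocr-tool` and its two positional arguments are hypothetical (this page does not give the actual Intelliant OCR command line), and the helper names are invented for the example.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence

# Hypothetical command template. The real executable name and options of
# Intelliant OCR are not documented on this page.
COMMAND_TEMPLATE = ("ocr-tool", "{source}", "{target}")


def build_command(template: Sequence[str], source: Path, target: Path) -> List[str]:
    """Fill the {source} and {target} placeholders in each argument."""
    return [arg.format(source=source, target=target) for arg in template]


def pending_tiffs(folder: Path) -> List[Path]:
    """List TIFF files in `folder` that have no PDF of the same name yet."""
    return sorted(
        p for p in folder.iterdir()
        if p.suffix.lower() in {".tif", ".tiff"}
        and not p.with_suffix(".pdf").exists()
    )


def tiff_to_pdf(source: Path, target: Path,
                template: Sequence[str] = COMMAND_TEMPLATE,
                timeout: float = 120.0) -> Path:
    """Run the external tool and check that it produced the target file."""
    cmd = build_command(template, source, target)
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(cmd, check=True, timeout=timeout)
    if not target.exists():
        raise RuntimeError(f"OCR tool did not create {target}")
    return target
```

A caller would loop over `pending_tiffs(folder)` and call `tiff_to_pdf(p, p.with_suffix(".pdf"))` for each file. Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces, such as the installation directory above.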

You will find additional information in the Alfresco Documentation:


Although other popular solutions are available on the market, we believe that our approach can simplify the whole development and administration process by providing a single point of control, within the Alfresco repository interface, for all of your organization's content rules.


Feedback


We welcome your feedback on the intended functionality of these configuration files and, in particular, on any use cases that this integration suits especially well. Please don't hesitate to contact us with any questions, comments, or inquiries.

--Franz 14:04, 15 August 2006 (BST)
