The official documentation is at: http://docs.alfresco.com
Cognitive Enterprises (ocr.com) has been developing OCR software since 1988. Its technology is built into many applications worldwide through a library called Tiger OCR.
Benefits of this technology
At Intelliant we like it for several reasons:
- Remarkably fast.
- Lightweight (the library and uncompressed dictionaries take up only 8 MB).
- Accurate. Recognition accuracy is quite good for the main European languages.
- Character coordinates. Intelliant OCR provides not only the text contained in the images but also the coordinates of each character on the page. This lets us create searchable PDF files that include both the image and the text, similar to Acrobat Capture output.
- Low runtime fees. This policy let us incorporate OCR capabilities into our applications for a fraction of the usual price, bringing the 'paperless office' within reach of small and medium businesses.
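Character coordinates are what make a searchable PDF possible: each recognized character is written as invisible text at the position of its glyph in the scanned image. Placing it requires converting the image's pixel coordinates (origin at the top-left, y growing downwards) into PDF points (1/72 inch, origin at the bottom-left). The following Python sketch shows only that conversion; the `CharBox` structure is a hypothetical stand-in, not the actual Intelliant OCR output format.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72.0  # PDF user-space unit: 1 point = 1/72 inch


@dataclass
class CharBox:
    """One recognized character and its bounding box in image pixels.

    Hypothetical structure, assuming the usual raster convention:
    origin at the top-left corner of the page, y growing downwards.
    """
    char: str
    left: int
    top: int
    right: int
    bottom: int


def to_pdf_box(box: CharBox, dpi: float, page_height_px: int) -> tuple[float, float, float, float]:
    """Convert a pixel box to PDF points (x0, y0, x1, y1), origin bottom-left."""
    scale = POINTS_PER_INCH / dpi          # points per pixel at this resolution
    page_height_pt = page_height_px * scale
    x0 = box.left * scale
    x1 = box.right * scale
    # Flip the vertical axis: in PDF, y grows upwards from the bottom edge.
    y0 = page_height_pt - box.bottom * scale
    y1 = page_height_pt - box.top * scale
    return (x0, y0, x1, y1)
```

For a 300 dpi scan of an 11-inch-tall page (3300 pixels), a character whose box starts 300 pixels from the left and top edges lands 72 points (one inch) from the left edge, with its top at y = 720 points.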
A straightforward integration
One of our tools, Intelliant OCR, is a command-line application, so it can easily be integrated into other document management systems. As a demonstration, we provide configuration files that plug it 'out of the box' into Alfresco ECM:
- ocr-transformers-context.xml defines OCR transformations that you can use within basic content rules (for example, converting to PDF every TIFF image dropped into a specific folder),
- web-client-config-custom.xml adds RTF to the list of MIME types available for content transformations.
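Because the OCR engine is driven from the command line, a repository essentially needs a table mapping each source/target MIME type pair to a command template, plus a way to substitute the input and output paths. The Python sketch below illustrates that pattern and the TIFF-to-PDF folder rule; it is not Alfresco's implementation (the real transformer definitions are in ocr-transformers-context.xml), and the executable name `intelliant-ocr`, its argument order, and the `${source}`/`${target}` placeholder names are assumptions for illustration.

```python
import subprocess
from pathlib import Path
from string import Template

# Hypothetical table: (source MIME type, target MIME type) -> command template.
# "intelliant-ocr" and its arguments are placeholders, not the real command line.
TRANSFORMATIONS = {
    ("image/tiff", "application/pdf"): "intelliant-ocr ${source} ${target}",
    ("image/tiff", "application/rtf"): "intelliant-ocr ${source} ${target}",
}

TIFF_SUFFIXES = {".tif", ".tiff"}


def build_command(template: str, source: Path, target: Path) -> list[str]:
    """Split the template into arguments, then substitute the file paths.

    Splitting before substituting keeps paths that contain spaces
    (for example under Program Files) intact as single arguments.
    """
    return [Template(arg).substitute(source=str(source), target=str(target))
            for arg in template.split()]


def transform(source: Path, target: Path, source_mime: str, target_mime: str) -> None:
    """Run the command registered for this MIME type pair; raise on failure."""
    template = TRANSFORMATIONS.get((source_mime, target_mime))
    if template is None:
        raise ValueError(f"no transformer for {source_mime} -> {target_mime}")
    # check=True raises CalledProcessError if the OCR tool exits non-zero.
    subprocess.run(build_command(template, source, target), check=True)


def apply_tiff_to_pdf_rule(folder: Path) -> list[Path]:
    """Mimic the example content rule: convert every TIFF in `folder` to PDF."""
    outputs = []
    for image in sorted(folder.iterdir()):
        if image.is_file() and image.suffix.lower() in TIFF_SUFFIXES:
            pdf = image.with_suffix(".pdf")
            transform(image, pdf, "image/tiff", "application/pdf")
            outputs.append(pdf)
    return outputs
```

Keeping the command in a template rather than in code is what lets a configuration file, instead of a recompiled application, decide which external tool handles which conversion.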
For a typical Windows installation of the Tomcat bundle, place both files in the following directory: C:\Program Files\alfresco-1.3.0\tomcat\shared\classes\alfresco\extension.
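Installing the files is a plain copy into that directory. The Python sketch below automates it and refuses to proceed if either file or the target directory is missing, since a missing extension directory usually points to a different installation path. The function name `install_config` is hypothetical; the default path is the one given above for a typical Windows Tomcat-bundle install.

```python
import shutil
from pathlib import Path

CONFIG_FILES = ("ocr-transformers-context.xml", "web-client-config-custom.xml")

# Default from the text for a typical Windows Tomcat-bundle install of
# Alfresco 1.3.0; pass a different directory if Alfresco lives elsewhere.
DEFAULT_EXTENSION_DIR = Path(r"C:\Program Files\alfresco-1.3.0\tomcat\shared\classes\alfresco\extension")


def install_config(src_dir: Path, extension_dir: Path = DEFAULT_EXTENSION_DIR) -> list[Path]:
    """Copy both configuration files into Alfresco's extension directory."""
    if not extension_dir.is_dir():
        raise FileNotFoundError(f"extension directory not found: {extension_dir}")
    installed = []
    for name in CONFIG_FILES:
        source = src_dir / name
        if not source.is_file():
            raise FileNotFoundError(source)
        # copy2 also preserves timestamps; it returns the destination path.
        installed.append(Path(shutil.copy2(source, extension_dir / name)))
    return installed
```

Alfresco reads these files at startup, so the new configuration is typically picked up only after the server is restarted.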
You will find additional information in the Alfresco Documentation:
- Basic Tutorial,
- Repository Configuration,
- Content Transformations,
- Transformation Mimetype,
- Web Client Configuration Guide, etc.
Although other popular solutions are available on the market, we believe our approach simplifies the whole development and administration process by providing a single point of control, within the Alfresco repository interface, for all of your organization's content rules.
We welcome your feedback on the intended functionality of these configuration files and, in particular, on any use cases this integration suits well. Please don't hesitate to contact us with any questions, comments, or inquiries.
--Franz 14:04, 15 August 2006 (BST)