Tesseract full integration

normando · ‎30 Oct 2010

Hi everyone. I hope the following is useful to anyone who wants to run text searches on scanned TIF documents on Linux.

First, a few clarifications.

Tesseract 2.x does NOT work with files that have the .tiff extension; it only works with .tif files. Version 3.0 does handle them. Below I post the version for Tesseract 2.x, given that Alfresco changes TIF files to TIFF during the transformation process and then saves them back as .TIF (does anyone know whether this can be changed so that it stays .tif throughout?).

There is one more problem: Tesseract always appends .txt to the output file name, so if we pass a name that already carries the extension we end up with archivo.txt.txt. For this reason I wrote a wrapper that avoids the problem, so that Alfresco can index the result correctly.
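
To make the naming issue concrete, here is a minimal Python sketch (not part of the original setup) of how a wrapper could derive the output name to hand to Tesseract. The function name `tesseract_output_base` is hypothetical; the only assumption is the behaviour described above, namely that Tesseract appends .txt on its own.

```python
import os


def tesseract_output_base(target_path):
    """Return the output name to pass to Tesseract for a given target.

    Tesseract appends ".txt" itself, so a trailing ".txt" on the target
    is removed. Any other name is passed through unchanged, in which case
    the result will be written to target_path + ".txt".
    """
    base, ext = os.path.splitext(target_path)
    return base if ext == ".txt" else target_path


# Passing the stripped name avoids ending up with archivo.txt.txt
print(tesseract_output_base("/tmp/archivo.txt"))
```

The bash wrapper in this post does the same thing with a shell parameter expansion on the target argument.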

First, test Tesseract from the console:

tesseract archivo.tif archivosalida -l spa

Leave archivosalida without an extension.

If everything works correctly, create a file named ocrtiff-transform-context.xml in /tomcat/shared/classes/alfresco/extension with the following content:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>

<beans>
    <bean id="transformer.worker.ocr.tiff" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker">

        <property name="mimetypeService">
            <ref bean="mimetypeService" />
        </property>

          <property name="checkCommand">
             <bean class="org.alfresco.util.exec.RuntimeExec">
                <property name="commandsAndArguments">
                    <map>
                        <entry key=".*">
                            <list>
<!--                            <value>tesseract</value> -->
                                <value>/opt/alfresco/ocr</value>
                            </list>
                        </entry>
                    </map>
                </property>
                <property name="errorCodes">
                   <value>2</value>
                </property>
             </bean>
          </property>

          <property name="transformCommand">
             <bean class="org.alfresco.util.exec.RuntimeExec">
                <property name="commandsAndArguments">
                    <map>
                        <entry key=".*">
                            <list>
<!--                            <value>tesseract</value>
                                <value>${source}</value>
                                <value>${target}</value>
                                <value>-l</value>
                                <value>spa</value> -->
                                <value>/opt/alfresco/ocr</value>
                                <value>${source}</value>
                                <value>${target}</value>
                            </list>
                        </entry>
                    </map>
                </property>
                <property name="errorCodes">
                   <value>1,2</value>
                </property>
             </bean>
          </property>

          <property name="explicitTransformations">
             <list>
                <bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails">
                    <property name="sourceMimetype"><value>image/tiff</value></property>
                    <property name="targetMimetype"><value>text/plain</value></property>
                </bean>
             </list>
          </property>
    </bean>

    <bean id="transformer.ocr.tiff" class="org.alfresco.repo.content.transform.ProxyContentTransformer" parent="baseContentTransformer">
        <property name="worker">
            <ref bean="transformer.worker.ocr.tiff" />
        </property>
    </bean>
</beans>

Next, a small wrapper that makes all the necessary adjustments so that Tesseract doesn't die trying. In the root of your Alfresco installation (in my case /opt/alfresco), create a file named 'ocr' with execute permissions (755) and the following content:

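#!/bin/bash
# Save arguments to variables
SOURCE="$1"
TARGET="$2"
TMPDIR=/tmp
FILENAME=$(basename "$SOURCE")
# Tesseract 2.x only accepts .tif, so work on a copy with that extension
OCRFILE="$FILENAME.tif"

# To see what happens, uncomment the next line
#echo "from $SOURCE to $TARGET" >>/tmp/ocrtransform.log

cp -f "$SOURCE" "$TMPDIR/$OCRFILE"

# Call tesseract; it appends .txt itself, so strip the extension from $TARGET
tesseract "$TMPDIR/$OCRFILE" "${TARGET%.*}" -l spa
STATUS=$?
rm -f "$TMPDIR/$OCRFILE"
# Return tesseract's exit status so Alfresco can detect a failed OCR run
exit $STATUS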

If you want to see what happens when Alfresco runs the transformations, uncomment the log line.
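
For anyone who prefers Python over bash, the same wrapper idea can be sketched as follows. This is a rough port under stated assumptions, not a tested replacement: the function name `run_ocr` and the injectable `run` parameter are hypothetical (the latter only exists so the logic can be exercised without Tesseract installed), and it uses a unique temporary file instead of a fixed name under /tmp.

```python
import os
import shutil
import subprocess
import tempfile


def run_ocr(source, target, lang="spa", run=subprocess.call):
    """Copy source to a temporary .tif file, OCR it, then clean up.

    Returns whatever exit code the tesseract call produced.
    """
    # Tesseract 2.x rejects ".tiff", so work on a copy that ends in ".tif".
    fd, tmp_tif = tempfile.mkstemp(suffix=".tif")
    os.close(fd)
    try:
        shutil.copyfile(source, tmp_tif)
        # Tesseract appends ".txt" on its own, so drop the target's extension.
        output_base = os.path.splitext(target)[0]
        return run(["tesseract", tmp_tif, output_base, "-l", lang])
    finally:
        # Always remove the temporary copy, even if the call fails.
        os.remove(tmp_tif)
```

A script built on this would call `sys.exit(run_ocr(sys.argv[1], sys.argv[2]))` and be referenced from the transformCommand in place of /opt/alfresco/ocr.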

Now all that is left is to restart Alfresco and upload any .tif file so that it gets indexed correctly. It's really fast, folks.

Post any questions here. To those who know more than I do, I'd appreciate any improvement to the script and the transformer. Thanks.

Happy ocring :-)

normando · ‎30 Oct 2010

I forgot to mention that my Alfresco version is 3.4a.

cesarista · ‎30 Nov 2010

Hi Normando:

Here is a small evolution of the script:

http://blyx.com/2010/11/30/integracion-de-ocr-en-alfresco
http://www.zylk.net/web/guest/web-2-0/blog/-/blogs/integracion-de-ocr-en-alfresco

Regards.

–C.

normando · ‎30 Nov 2010

Many thanks, César. Very good article.

I would only add that, in my experience, you can upload compressed TIF files. A file that normally weighs 2 MB comes to about 50 KB when compressed (black and white), and Tesseract does its job perfectly well.

I haven't tried your script yet, but I really liked the "cleanup" you do on the recognized characters.

Regards

normando · ‎30 Nov 2010

César, I tried to implement everything and it didn't work for me. Of course I adjusted the paths and verified that every piece works separately.

That is, I get no errors in the logs. And the file /tmp/ocr.log contains this:

/opt/alfresco/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_source_7788580097067181368.tiff

I don't know much Python, but is Tesseract trying to run the OCR on a file with the .tiff extension or with .tif?

Details of the bug in Tesseract 2.x:
http://code.google.com/p/tesseract-ocr/issues/detail?id=163

Thanks

cesarista · ‎1 Dec 2010

Hi Normando:

Here are some tips for getting it up and running:

- First of all, run the Python script on your tif or tiff file from a console and check that both the script and Tesseract itself work.


ocr-simple.py imagen.tif 

- Check that the Python file has execute permissions in $ALF_BIN (chmod +x ocr-simple.py).

- I can't tell you about Tesseract (better to use the .tif extension; I read about that bug a while ago), but Alfresco selects the transformer by mimetype, and that mimetype covers both tif and tiff files.

- Finally, configure log4j to see the transformer traces (this is detailed in the article).

Regards, and let me know how it goes.

–C.

normando · ‎3 Dec 2010

Hi César

Well, I tried the command you suggested and it works correctly, generating the resulting text file.

ocr-simple.py imagen.tif archivosalida

I added the Spanish language option to the command, because in my case it recognizes more words:

command = popen('/usr/bin/tesseract '+sys.argv[1]+' /tmp/tesser-$$ -l spa 2> /dev/null; cat /tmp/tesser-$$.txt')

It doesn't work for me with Alfresco because Alfresco generates .tiff files even when the ones I uploaded are .tif, and that is the argument it passes to the script: a .tiff file instead of a .tif one.

The word cleanup you do is very interesting, although I think it is too restrictive, since it doesn't return words with accents, and words in uppercase are cut out of the output entirely. At least in my case.
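
As an illustration of a less restrictive cleanup, here is a minimal Python 3 sketch of a word filter that keeps accented and uppercase words. It is not the filter used in ocr-simple.py (whose code is not shown in this thread); the function name `keep_words` and the minimum length of 3 are assumptions chosen for the example.

```python
import re
import unicodedata


def keep_words(text, min_len=3):
    """Return runs of letters of at least min_len characters.

    Accented and uppercase letters are kept; digits, underscores and
    punctuation act as separators.
    """
    # Compose accents (e.g. "o" + combining acute -> "ó") so they count as letters.
    text = unicodedata.normalize("NFC", text)
    # [^\W\d_] matches word characters except digits and underscore.
    pattern = re.compile(r"[^\W\d_]{%d,}" % min_len)
    return pattern.findall(text)


print(keep_words("El CAMIÓN llegó a Málaga ~~ 1a"))
# → ['CAMIÓN', 'llegó', 'Málaga']
```

Short OCR noise such as "~~" or "1a" is dropped, while "CAMIÓN" and "llegó" survive.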

I'll keep working on improving the first script, which makes it possible to use the .tiff files that Tesseract 2.x doesn't recognize. I'll also take the very good ideas from yours and publish it once it is well tuned.

Thanks again for sharing your knowledge.

cesarista · ‎3 Dec 2010

Hi Normando:

I don't understand why you say that "Alfresco generates .tiff files".

Alfresco treats a tif or tiff file as mimetype "image/tiff" (Alfresco 3.3 Community). If you upload a file with the tif extension, the extension in Alfresco is also tif, and the script and the transformer work properly. Have you checked with log4j that the transformer is actually being executed?

"It doesn't work for me with Alfresco because Alfresco generates .tiff files even when the ones I uploaded are .tif, and that is the argument it passes to the script: a .tiff file instead of a .tif one."

Regards.

–C.

cesarista · ‎6 Dec 2010

Hi:

Here are some tests, this time in Alfresco Share:

http://www.zylk.net/web/guest/web-2-0/blog/-/blogs/integracion-ocr-en-alfresco-share

Regards.

–C.

urban · ‎20 Dec 2010

Hi, I'm trying to integrate Alfresco and Tesseract. I have version 3.4b installed, upgraded from version 3.2, which was installed as a package on Ubuntu 9.04. I tested the ocr-simple.py script from the command line and it works correctly, but I can't get it to work from Alfresco. Since Alfresco is installed on top of Tomcat, I don't know which path to put the script in. Here is the output of alfresco.log.

14:50:55,459 INFO  [org.alfresco.config.JndiPropertiesFactoryBean] Loading properties file from class path resource [alfresco/repository.properties]
14:50:55,465 INFO  [org.alfresco.config.JndiPropertiesFactoryBean] Loading properties file from class path resource [alfresco/domain/transaction.properties]
14:50:55,478 INFO  [org.alfresco.config.JndiPropertiesFactoryBean] Loading properties file from file [/var/lib/tomcat6/webapps/alfresco/WEB-INF/classes/alfresco/module/test/alfresco-global.properties]
14:50:55,485 INFO  [org.alfresco.config.JndiPropertiesFactoryBean] Loading properties file from URL [file:/usr/share/tomcat6/shared/classes/alfresco-global.properties]
14:50:55,603 INFO  [org.alfresco.config.JndiPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/alfresco-shared.properties]
14:50:55,739 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/version.properties]
14:50:55,773 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/domain/cache-strategies.properties]
14:51:09,257 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: ProxyContentTransformer[ average=0ms]
14:51:13,206 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: StringExtractingContentTransformer[ average=0ms]
14:51:13,212 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: BinaryPassThroughContentTransformer[ average=0ms]
14:51:13,224 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: PdfBoxContentTransformer[ average=0ms]
14:51:13,232 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: FailoverContentTransformer[ average=0ms]
14:51:13,241 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: ComplexContentTransformer[ average=0ms]
14:51:13,242 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: ProxyContentTransformer[ average=0ms]
14:51:13,242 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: ComplexContentTransformer[ average=0ms]
14:51:13,515 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: TextToPdfContentTransformer[ average=0ms]
14:51:13,518 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: ComplexContentTransformer[ average=0ms]
14:51:13,588 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: TikaAutoContentTransformer[ average=0ms]
14:51:13,595 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: PoiHssfContentTransformer[ average=0ms]
14:51:13,602 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: PoiContentTransformer[ average=0ms]
14:51:13,609 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: PoiOOXMLContentTransformer[ average=0ms]
14:51:13,623 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: TextMiningContentTransformer[ average=0ms]
14:51:13,630 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: HtmlParserContentTransformer[ average=0ms]
14:51:13,636 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: MediaWikiContentTransformer[ average=0ms]
14:51:13,637 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: ComplexContentTransformer[ average=0ms]
14:51:13,643 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: MailContentTransformer[ average=0ms]
14:51:13,671 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: EMLTransformer[ average=0ms]
14:51:13,679 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: ArchiveContentTransformer[ average=0ms]
14:51:15,785 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: ProxyContentTransformer[ average=0ms]
14:51:15,786 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: ComplexContentTransformer[ average=0ms]
14:51:15,787 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: ComplexContentTransformer[ average=0ms]
14:51:15,787 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: ComplexContentTransformer[ average=0ms]
14:51:17,538 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Registered general transformer: 
   transformer: ProxyContentTransformer[ average=0ms]
14:51:17,597 INFO  [org.springframework.extensions.webscripts.TemplateProcessorRegistry] Registered template processor Repository Template Processor for extension ftl
14:51:17,601 INFO  [org.springframework.extensions.webscripts.ScriptProcessorRegistry] Registered script processor Repository Script Processor for extension js
14:51:27,254 INFO  [org.alfresco.repo.domain.schema.SchemaBootstrap] Esquema gestionado por el gestor de base de datos org.hibernate.dialect.MySQLInnoDBDialect.
14:51:27,695 INFO  [org.alfresco.repo.domain.schema.SchemaBootstrap] No se hicieron cambios en el esquema.
14:51:27,765 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Starting 'sysAdmin' subsystem, ID: [sysAdmin, default]
14:51:27,780 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/version.properties]
14:51:27,781 INFO  [org.alfresco.config.JndiPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/alfresco-shared.properties]
14:51:27,782 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/domain/cache-strategies.properties]
14:51:27,794 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Startup of 'sysAdmin' subsystem, ID: [sysAdmin, default] complete
14:51:30,802 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Starting 'thirdparty' subsystem, ID: [thirdparty, default]
14:51:30,861 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/version.properties]
14:51:30,862 INFO  [org.alfresco.config.JndiPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/alfresco-shared.properties]
14:51:30,864 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/domain/cache-strategies.properties]
14:51:30,870 WARN  [org.alfresco.util.exec.RuntimeExec] The bean RuntimeExec property 'commandMap' has been deprecated; use 'commandsAndArguments' instead.  See https://issues.alfresco.com/jira/browse/ETHREEOH-579.
14:51:30,872 WARN  [org.alfresco.util.exec.RuntimeExec] The bean RuntimeExec property 'commandMap' has been deprecated; use 'commandsAndArguments' instead.  See https://issues.alfresco.com/jira/browse/ETHREEOH-579.
14:51:31,177 DEBUG [org.alfresco.util.exec.RuntimeExec] Execution result: 
   os:         Linux
   command:    [/usr/bin/alfresco-pdf2swf, -V]
   succeeded:  true
   exit code:  0
   out:        pdf2swf - part of swftools 2009-03-15-1014

   err:        
14:51:31,680 DEBUG [org.alfresco.util.exec.RuntimeExec] Execution result: 
   os:         Linux
   command:    [/usr/bin/convert, /tmp/tomcat6-temp/Alfresco/ImageMagickContentTransformerWorker_init_source_1247669936733175951.gif[0], /tmp/tomcat6-temp/Alfresco/ImageMagickContentTransformerWorker_init_target_7357412200573451800.png]
   succeeded:  true
   exit code:  0
   out:        
   err:        
14:51:31,810 DEBUG [org.alfresco.util.exec.RuntimeExec] Execution result: 
   os:         Linux
   command:    [/usr/bin/convert, -version]
   succeeded:  false
   exit code:  1
   out:        Version: ImageMagick 6.4.5 2009-06-04 Q16 OpenMP http://www.imagemagick.org
Copyright: Copyright (C) 1999-2008 ImageMagick Studio LLC


   err:        
14:51:31,816 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Startup of 'thirdparty' subsystem, ID: [thirdparty, default] complete
14:51:31,816 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Starting 'OOoDirect' subsystem, ID: [OOoDirect, default]
14:51:31,854 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/version.properties]
14:51:31,855 INFO  [org.alfresco.config.JndiPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/alfresco-shared.properties]
14:51:31,855 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/domain/cache-strategies.properties]
14:51:32,308 DEBUG [org.alfresco.util.exec.RuntimeExec] Execution result: 
   os:         Linux
   command:    [/usr/bin/soffice, -accept=socket,host=127.0.0.1,port=8100;urp;StarOffice.ServiceManager, -headless, -norestore]
   succeeded:  true
   exit code:  0
   out:        
   err:        
14:51:32,481 WARN  [org.alfresco.util.OpenOfficeConnectionTester] No se pudo establecer la conexión a OpenOffice
14:51:32,497 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Startup of 'OOoDirect' subsystem, ID: [OOoDirect, default] complete
14:51:34,976 INFO  [org.alfresco.repo.admin.ConfigurationChecker] El directorio raíz de Alfresco ('dir.root') es: /var/lib/alfresco
14:51:35,414 INFO  [org.alfresco.repo.admin.patch.PatchExecuter] Comprobando si hay parches para aplicar …
14:51:36,256 INFO  [org.alfresco.repo.admin.patch.PatchExecuter] No se requieren parches.
14:51:36,271 INFO  [org.alfresco.repo.module.ModuleServiceImpl] Encontrado(s) 0 módulo(s).
14:51:36,351 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Starting 'fileServers' subsystem, ID: [fileServers, default]
14:51:36,419 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/version.properties]
14:51:36,421 INFO  [org.alfresco.config.JndiPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/alfresco-shared.properties]
14:51:36,426 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/domain/cache-strategies.properties]
14:51:36,824 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Starting 'Authentication' subsystem, ID: [Authentication, managed, passthru1]
14:51:36,845 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/version.properties]
14:51:36,848 INFO  [org.alfresco.config.JndiPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/alfresco-shared.properties]
14:51:36,848 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/domain/cache-strategies.properties]
14:51:48,505 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Startup of 'Authentication' subsystem, ID: [Authentication, managed, passthru1] complete
14:51:48,506 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Starting 'Authentication' subsystem, ID: [Authentication, managed, ldap-ad1]
14:51:48,528 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/version.properties]
14:51:48,529 INFO  [org.alfresco.config.JndiPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/alfresco-shared.properties]
14:51:48,530 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/domain/cache-strategies.properties]
14:51:48,950 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Startup of 'Authentication' subsystem, ID: [Authentication, managed, ldap-ad1] complete
14:51:53,956 ERROR [org.alfresco.fileserver] Failed to get local domain/workgroup name, using default of WORKGROUP
14:51:53,957 ERROR [org.alfresco.fileserver] (This may be due to firewall settings or incorrect <broadcast> setting)
14:51:54,085 ERROR [org.alfresco.fileserver] [FTP] FTP Socket error : java.net.BindException: Permission denied
14:51:54,095 ERROR [org.alfresco.fileserver] java.net.BindException: Permission denied
java.net.BindException: Permission denied
   at java.net.PlainSocketImpl.socketBind(Native Method)
   at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:365)
   at java.net.ServerSocket.bind(ServerSocket.java:319)
   at java.net.ServerSocket.<init>(ServerSocket.java:185)
   at java.net.ServerSocket.<init>(ServerSocket.java:141)
   at org.alfresco.jlan.ftp.FTPServer.run(FTPServer.java:555)
   at java.lang.Thread.run(Thread.java:662)
14:51:54,093 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Startup of 'fileServers' subsystem, ID: [fileServers, default] complete
14:51:54,098 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Starting 'imap' subsystem, ID: [imap, default]
14:51:54,132 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/version.properties]
14:51:54,132 INFO  [org.alfresco.config.JndiPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/alfresco-shared.properties]
14:51:54,133 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/domain/cache-strategies.properties]
14:51:54,201 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Startup of 'imap' subsystem, ID: [imap, default] complete
14:51:54,201 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Starting 'email' subsystem, ID: [email, outbound]
14:51:54,216 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/version.properties]
14:51:54,217 INFO  [org.alfresco.config.JndiPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/alfresco-shared.properties]
14:51:54,218 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/domain/cache-strategies.properties]
14:51:54,292 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Startup of 'email' subsystem, ID: [email, outbound] complete
14:51:54,293 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Starting 'email' subsystem, ID: [email, inbound]
14:51:54,307 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/version.properties]
14:51:54,307 INFO  [org.alfresco.config.JndiPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/alfresco-shared.properties]
14:51:54,308 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/domain/cache-strategies.properties]
14:51:54,375 WARN  [org.springframework.beans.GenericTypeAwarePropertyDescriptor] Invalid JavaBean property 'blockedSenders' being accessed! Ambiguous write methods found next to actually used [public void org.alfresco.email.server.EmailServer.setBlockedSenders(java.util.List)]: [public void org.alfresco.email.server.EmailServer.setBlockedSenders(java.lang.String)]
14:51:54,375 WARN  [org.springframework.beans.GenericTypeAwarePropertyDescriptor] Invalid JavaBean property 'allowedSenders' being accessed! Ambiguous write methods found next to actually used [public void org.alfresco.email.server.EmailServer.setAllowedSenders(java.util.List)]: [public void org.alfresco.email.server.EmailServer.setAllowedSenders(java.lang.String)]
14:51:54,440 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Startup of 'email' subsystem, ID: [email, inbound] complete
14:51:54,441 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Starting 'googledocs' subsystem, ID: [googledocs, default]
14:51:54,482 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/version.properties]
14:51:54,483 INFO  [org.alfresco.config.JndiPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/alfresco-shared.properties]
14:51:54,484 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/domain/cache-strategies.properties]
14:51:54,838 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Startup of 'googledocs' subsystem, ID: [googledocs, default] complete
14:51:54,863 INFO  [org.alfresco.repo.usage.UserUsageTrackingComponent] Enabled - calculate missing user usages …
14:51:54,878 INFO  [org.alfresco.repo.usage.UserUsageTrackingComponent] Found 0 users to recalculate
14:51:54,878 INFO  [org.alfresco.repo.usage.UserUsageTrackingComponent] … calculated missing usages for 0 users
14:51:54,878 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Starting 'Synchronization' subsystem, ID: [Synchronization, default]
14:51:54,893 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/version.properties]
14:51:54,894 INFO  [org.alfresco.config.JndiPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/alfresco-shared.properties]
14:51:54,894 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/domain/cache-strategies.properties]
14:51:54,934 INFO  [org.alfresco.repo.security.sync.ChainingUserRegistrySynchronizer] Synchronizing users and groups with user registry 'ldap-ad1'
14:51:54,967 INFO  [org.alfresco.repo.security.sync.ChainingUserRegistrySynchronizer] Retrieving groups changed since 14-dic-2010 14:56:54 from user registry 'ldap-ad1'
14:51:54,995 DEBUG [org.alfresco.repo.security.sync.ldap.LDAPUserRegistry] Found 0
14:51:55,008 INFO  [org.alfresco.repo.security.sync.ChainingUserRegistrySynchronizer] ldap-ad1 Group Analysis: Commencing batch of 0 entries
14:51:55,011 INFO  [org.alfresco.repo.security.sync.ChainingUserRegistrySynchronizer] ldap-ad1 Group Analysis: Completed batch of 0 entries
14:51:55,028 INFO  [org.alfresco.repo.security.sync.ChainingUserRegistrySynchronizer] Retrieving users changed since 14-dic-2010 14:56:01 from user registry 'ldap-ad1'
14:51:55,038 INFO  [org.alfresco.repo.security.sync.ChainingUserRegistrySynchronizer] ldap-ad1 User Creation and Association: Commencing batch of 0 entries
14:51:55,039 INFO  [org.alfresco.repo.security.sync.ChainingUserRegistrySynchronizer] ldap-ad1 User Creation and Association: Completed batch of 0 entries
14:51:55,094 INFO  [org.alfresco.repo.security.sync.ChainingUserRegistrySynchronizer] Finished synchronizing users and groups with user registry 'ldap-ad1'
14:51:55,094 INFO  [org.alfresco.repo.security.sync.ChainingUserRegistrySynchronizer] 0 user(s) and 0 group(s) processed
14:51:55,107 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Startup of 'Synchronization' subsystem, ID: [Synchronization, default] complete
14:51:55,181 INFO  [org.alfresco.service.descriptor.DescriptorService] Alfresco JVM - v1.6.0_22-b04; maximum heap size 989,875MB
14:51:55,182 INFO  [org.alfresco.service.descriptor.DescriptorService] Alfresco started (Community): Current version 3.4.0 (b 3262) schema 4111 - Originally installed version 3.2.0 (@build-number@) schema 2019
14:51:55,183 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Starting 'Replication' subsystem, ID: [Replication, default]
14:51:55,196 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/version.properties]
14:51:55,197 INFO  [org.alfresco.config.JndiPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/alfresco-shared.properties]
14:51:55,197 INFO  [org.alfresco.config.FixedPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/domain/cache-strategies.properties]
14:51:55,206 INFO  [org.alfresco.repo.management.subsystems.ChildApplicationContextFactory] Startup of 'Replication' subsystem, ID: [Replication, default] complete
14:52:07,756 INFO  [org.springframework.extensions.webscripts.DeclarativeRegistry] Registered 371 Web Scripts (+0 failed), 612 URLs
14:52:07,757 INFO  [org.springframework.extensions.webscripts.DeclarativeRegistry] Registered 2 Package Description Documents (+0 failed) 
14:52:07,757 INFO  [org.springframework.extensions.webscripts.DeclarativeRegistry] Registered 1 Schema Description Documents (+0 failed) 
14:52:07,759 INFO  [org.springframework.extensions.webscripts.AbstractRuntimeContainer] Initialised Repository Web Script Container (in 11349.13ms)
14:52:07,761 INFO  [org.springframework.extensions.webscripts.TemplateProcessorRegistry] Registered template processor freemarker for extension ftl
14:52:07,762 INFO  [org.springframework.extensions.webscripts.ScriptProcessorRegistry] Registered script processor javascript for extension js
14:52:18,642 INFO  [org.springframework.extensions.webscripts.DeclarativeRegistry] Registered 257 Web Scripts (+0 failed), 265 URLs
14:52:18,645 INFO  [org.springframework.extensions.webscripts.DeclarativeRegistry] Registered 8 Package Description Documents (+0 failed) 
14:52:18,645 INFO  [org.springframework.extensions.webscripts.DeclarativeRegistry] Registered 0 Schema Description Documents (+0 failed) 
14:52:18,796 INFO  [org.springframework.extensions.webscripts.AbstractRuntimeContainer] Initialised Spring Surf Container Web Script Container (in 2657.7637ms)
14:52:18,879 INFO  [org.springframework.extensions.webscripts.TemplateProcessorRegistry] Registered template processor freemarker for extension ftl
14:52:19,037 INFO  [org.springframework.extensions.webscripts.ScriptProcessorRegistry] Registered script processor javascript for extension js
14:52:19,164 INFO  [org.springframework.extensions.webscripts.TemplateProcessorRegistry] Registered template processor freemarker for extension ftl
14:52:19,170 INFO  [org.springframework.extensions.webscripts.ScriptProcessorRegistry] Registered script processor javascript for extension js
14:52:19,336 INFO  [org.springframework.extensions.webscripts.TemplateProcessorRegistry] Registered template processor freemarker for extension ftl
14:52:19,342 INFO  [org.springframework.extensions.webscripts.ScriptProcessorRegistry] Registered script processor javascript for extension js
14:57:13,230 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Searched for transformer: 
   source mimetype: image/tiff
   target mimetype: text/plain
   transformers: [ProxyContentTransformer[ average=0ms]]
14:57:13,685 DEBUG [org.alfresco.util.exec.RuntimeExec] Execution result: 
   os:         Linux
   command:    [/usr/bin/python, /home/urbano/Escritorio/ocr-simple.py, /tmp/tomcat6-temp/Alfresco/RuntimeExecutableContentTransformerWorker_source_1870219105233062900.tiff, /tmp/tomcat6-temp/Alfresco/RuntimeExecutableContentTransformerWorker_target_3626504261222258866.txt]
   succeeded:  true
   exit code:  0
   out:        
   err:        cat: /tmp/tesser-6386.txt: No existe el fichero ó directorio

14:57:16,909 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Searched for transformer: 
   source mimetype: image/tiff
   target mimetype: text/plain
   transformers: [ProxyContentTransformer[ average=455ms]]
14:58:15,077 DEBUG [org.alfresco.repo.content.transform.ContentTransformerRegistry] Searched for transformer: 
   source mimetype: image/tiff
   target mimetype: text/plain
   transformers: [ProxyContentTransformer[ average=455ms]]
14:58:15,377 DEBUG [org.alfresco.util.exec.RuntimeExec] Execution result: 
   os:         Linux
   command:    [/usr/bin/python, /home/urbano/Escritorio/ocr-simple.py, /tmp/tomcat6-temp/Alfresco/RuntimeExecutableContentTransformerWorker_source_4665339913640775042.tiff, /tmp/tomcat6-temp/Alfresco/RuntimeExecutableContentTransformerWorker_target_4203595087035328580.txt]
   succeeded:  true
   exit code:  0
   out:        
   err:        cat: /tmp/tesser-6399.txt: No existe el fichero ó directorio

I look forward to your help. Thanks in advance.
Urbano.
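
The last two RuntimeExec entries in the log above report succeeded: true and exit code 0 even though cat could not find the Tesseract output file, so Alfresco has no way to notice that the OCR step produced nothing. A minimal Python sketch of a check that turns a missing output file into a non-zero exit code could look like this. The function name `run_and_verify`, the injectable `run` parameter and the choice of exit code 1 are assumptions for illustration; this is not the code of ocr-simple.py.

```python
import os
import subprocess


def run_and_verify(argv, expected_output, run=subprocess.call):
    """Run an OCR command and fail if it did not produce expected_output.

    Returns the command's own exit code when it is non-zero, 1 when the
    command claims success but the output file is missing, and 0 otherwise.
    """
    code = run(argv)
    if code != 0:
        return code
    # A zero exit code alone is not enough: check that the text file exists.
    return 0 if os.path.isfile(expected_output) else 1
```

A wrapper that ends with `sys.exit(run_and_verify(...))` would let the errorCodes setting of the transformer bean catch the failure, assuming 1 is listed there as in the configuration from the first post.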