
Get source filename from transformer-worker

Question asked by normando on Sep 7, 2011
Latest reply on Sep 8, 2011 by normando
Hello.

I am new to Java, but I have learned a few things while working with Alfresco.

I currently have a transformer context running that performs OCR on all TIF files (pasted at the bottom).
Now I need to do the same, but select the specific Tesseract dictionary based on the filename.

For example: if I upload a file named 123123123-spa.tif, I want to append "-l spa" to the end of the tesseract command, because the filename indicates that the file is in Spanish. The same applies to English, German, and so on.
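
To make the naming rule concrete, here is a minimal Java sketch of the mapping I have in mind. The class name `TesseractLang`, the method `fromFilename`, and the exact pattern (a dash, three letters, then `.tif` or `.tiff`) are only my assumptions for illustration, not anything provided by Alfresco. It also assumes the original filename is already available as a string, which is exactly the part I do not know how to get from the transformer.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: derives the Tesseract "-l" value from a filename.
class TesseractLang {

    // Assumed naming rule: "<anything>-<three letters>.tif" or ".tiff".
    private static final Pattern SUFFIX =
            Pattern.compile(".*-([a-z]{3})\\.tiff?$", Pattern.CASE_INSENSITIVE);

    // Returns the language code in lower case, or empty if the name
    // does not follow the assumed rule.
    static Optional<String> fromFilename(String name) {
        Matcher m = SUFFIX.matcher(name);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(fromFilename("123123123-spa.tif")); // Optional[spa]
        System.out.println(fromFilename("scan.tif"));          // Optional.empty
    }
}
```

With a result like "spa", the command would become `tesseract ${source} ${target} -l spa`; with an empty result it would fall back to the command without "-l".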

Is there something like an "if" statement that can be used inside a bean, and how can I get the real filename (not the node name) from ${source} and ${target}?

Thank you for the help.


<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>

<beans>
    <bean id="transformer.worker.ocr.tiff" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker">

      <property name="mimetypeService">
         <ref bean="mimetypeService" />
      </property>

      <property name="checkCommand">
         <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandsAndArguments">
               <map>
                  <entry key=".*">
                     <list>
                        <!-- <value>tesseract</value> -->
                        <value>/opt/alfresco/ocr</value>
                     </list>
                  </entry>
               </map>
            </property>
            <property name="errorCodes">
               <value>2</value>
            </property>
         </bean>
      </property>

      <property name="transformCommand">
         <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandsAndArguments">
               <map>
                  <entry key=".*">
                     <list>
                        <!-- <value>tesseract</value>
                        <value>${source}</value>
                        <value>${target}</value>
                        <value>-l</value>
                        <value>spa</value> -->
                        <value>/opt/alfresco/ocr</value>
                        <value>${source}</value>
                        <value>${target}</value>
                     </list>
                  </entry>
               </map>
            </property>
            <property name="errorCodes">
               <value>1,2</value>
            </property>
         </bean>
      </property>

      <property name="explicitTransformations">
         <list>
            <bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails">
               <property name="sourceMimetype"><value>image/tiff</value></property>
               <property name="targetMimetype"><value>text/plain</value></property>
            </bean>
         </list>
        </property>
    </bean>

    <bean id="transformer.ocr.tiff" class="org.alfresco.repo.content.transform.ProxyContentTransformer" parent="baseContentTransformer">
      <property name="worker">
         <ref bean="transformer.worker.ocr.tiff" />
      </property>
    </bean>
</beans>
