
Run pdfsandwich on PDF file upload

Question asked by nic on May 6, 2015
Hello,

I'm experimenting with Alfresco Community Edition 5.0.d and trying to integrate OCR for PDF files. The installation runs on an Ubuntu server. There is a command-line tool called pdfsandwich that automatically performs the necessary steps (split, transform, tesseract-ocr, etc.) on a given PDF. I now want to integrate this tool into Alfresco: whenever a PDF file is uploaded to any document library, pdfsandwich should be executed on the uploaded file.

I already tried to write the following transformer:


<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>

<beans>
   <bean id="transformer.worker.pdfocr" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker">
      <property name="mimetypeService">
         <ref bean="mimetypeService" />
      </property>
      <property name="checkCommand">
         <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandsAndArguments">
               <map>
                  <entry key=".*">
                     <list>
                        <value>ls</value>
                        <value>/usr/bin/pdfsandwich</value>
                     </list>
                  </entry>
               </map>
            </property>
         </bean>
      </property>

      <property name="transformCommand">
         <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandsAndArguments">
               <map>
                  <entry key=".*">
                     <list>
                        <value>/usr/bin/pdfsandwich</value>
                        <value>${source}</value>
                        <value>${target}</value>
                     </list>
                  </entry>
               </map>
            </property>
            <property name="errorCodes">
               <value>1,2</value>
            </property>
         </bean>
      </property>

      <property name="explicitTransformations">
         <list>
            <bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails">
               <property name="sourceMimetype"><value>application/pdf</value></property>
               <property name="targetMimetype"><value>application/pdf</value></property>
            </bean>
         </list>
      </property>
   </bean>

   <bean id="transformer.pdfocr" class="org.alfresco.repo.content.transform.ProxyContentTransformer" parent="baseContentTransformer">
      <property name="worker">
         <ref bean="transformer.worker.pdfocr" />
      </property>
   </bean>
</beans>


But then I get the following errors:


2015-05-06 13:13:39,269 ERROR [org.alfresco.repo.content.transform.ContentTransformerHelper] [localhost-startStop-1] content.transformer.pdfocr.extensions.pdf.pdf.priority=50
2015-05-06 13:13:39,269 ERROR [org.alfresco.repo.content.transform.ContentTransformerHelper] [localhost-startStop-1] content.transformer.pdfocr.extensions.pdf.pdf.supported=true


I think there are two problems. First, pdfsandwich is only given an input file. I tried the configuration without

<property name="targetMimetype"><value>application/pdf</value></property>

but that didn't help.
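
For the first problem, one variant of the transformCommand that might be worth trying passes the target file explicitly. This is only a sketch, not a verified fix: it assumes that pdfsandwich accepts an `-o` option naming the output file (as its documentation describes), and it has not been tested against this installation. Everything else is copied unchanged from the transformer above.

```xml
<property name="transformCommand">
   <bean class="org.alfresco.util.exec.RuntimeExec">
      <property name="commandsAndArguments">
         <map>
            <entry key=".*">
               <list>
                  <value>/usr/bin/pdfsandwich</value>
                  <value>${source}</value>
                  <!-- Assumption: "-o" tells pdfsandwich where to write the OCRed PDF.
                       Without it, the tool chooses its own output name next to the
                       input file and ${target} would not be written. -->
                  <value>-o</value>
                  <value>${target}</value>
               </list>
            </entry>
         </map>
      </property>
      <property name="errorCodes">
         <value>1,2</value>
      </property>
   </bean>
</property>
```

Running the same command by hand on the server with two real file paths in place of ${source} and ${target} would show whether the option behaves as assumed before Alfresco is restarted.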

Second, I'm trying to define a transformation whose source and target MIME types are identical, and I'm not sure whether that works.

Is there a simple way to run a command-line tool whenever a PDF is uploaded to any repository?

Thank you
