
Content Transformation mimetype confusion

Question asked by deisenlord on Mar 20, 2013
Latest reply on Mar 21, 2013 by deisenlord
Please see my simpler explanation below first. Thanks, David

I need to extract some fields from a PDF form in a workflow. I have a third-party Linux command-line tool (pdftk) that does this perfectly, so I added a content transformation to my extensions; it is included at the bottom of this post. The end goal is to extract the fields to a CSV file, so the transform is supposed to go from PDF to CSV. As near as I can tell, although the text/csv mimetype exists, there is no transform from application/pdf to text/csv, only to text/plain. Hence my extension. At first this appears to work just fine, but then I find that preview is broken for PDFs. When I turn on debugging, I see that the system has decided to use my transform for csv -> swf as well as pdf -> text, csv -> text and damn near everything else!

Incorrect csv -> text

2013-03-20 14:06:30,260  DEBUG [content.transform.TransformerDebug] [http-bio-8443-exec-7] 7.1           csv  txt  <<TemporaryFile>> 2.1 KB transformer.pdf2csvtxt<<Runtime>>
2013-03-20 14:06:30,297  DEBUG [util.exec.RuntimeExec] [http-bio-8443-exec-7] Execution result:
   os:         Linux
   command:    /usr/local/bin/pdftk /opt/alfresco-4.2.d.2/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_source_661257349285517878.csv dump_data_fields output /opt/alfresco-4.2.d.2/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_target_1434143270609748430.txt
   succeeded:  false
   exit code:  1
   out:
   err:        Error: Failed to open PDF file:
   /opt/alfresco-4.2.d.2/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_source_661257349285517878.csv


Incorrect csv -> swf

2013-03-20 14:08:35,389  DEBUG [content.transform.TransformerDebug] [http-apr-80-exec-10] 8.2           csv  swf  dje20130320-4.csv 2.1 KB transformer.pdf2csvtxt<<Runtime>>
2013-03-20 14:08:35,423  DEBUG [util.exec.RuntimeExec] [http-apr-80-exec-10] Execution result:
   os:         Linux
   command:    /usr/local/bin/pdftk /opt/alfresco-4.2.d.2/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_source_6168968677349312104.csv dump_data_fields output /opt/alfresco-4.2.d.2/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_target_3388825081834715207.swf
   succeeded:  false
   exit code:  1
   out:
   err:        Error: Failed to open PDF file:
   /opt/alfresco-4.2.d.2/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_source_6168968677349312104.csv


In fact, when I list the registered mimetypes via the localhost/alfresco/service mimetypes page, EVERYTHING that didn't have a default transformation is now handled by my extension. Here are the first few lines for application/pdf:

application/pdf - pdf
Extractors: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter
Transformable To:
application/acp = Proxy via: org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker(/usr/local/bin/pdftk)
application/dita+xml = Proxy via: org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker(/usr/local/bin/pdftk)

Signed,
Confused. Why are all conversions between pdf, csv, text and swf now going through my extension? I specified an explicitTransformations property.

My extension


<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>

<beans>
        <!-- Worker that runs pdftk to dump the PDF form fields into the target file -->
        <bean id="transformer.worker.pdf2csv" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker">
                <property name="mimetypeService">
                        <ref bean="mimetypeService" />
                </property>
                <!-- Availability check: succeeds only if pdftk exists at this path -->
                <property name="checkCommand">
                        <bean class="org.alfresco.util.exec.RuntimeExec">
                                <property name="commandsAndArguments">
                                        <map>
                                                <entry key="Linux">
                                                        <list>
                                                                <value>ls</value>
                                                                <value>/usr/local/bin/pdftk</value>
                                                        </list>
                                                </entry>
                                        </map>
                                </property>
                        </bean>
                </property>

                <!-- ${source} and ${target} are replaced with the temporary source and target file paths -->
                <property name="transformCommand">
                        <bean class="org.alfresco.util.exec.RuntimeExec">
                                <property name="commandsAndArguments">
                                        <map>
                                                <entry key="Linux">
                                                        <list>
                                                                <value>/usr/local/bin/pdftk</value>
                                                                <value>${source}</value>
                                                                <value>dump_data_fields</value>
                                                                <value>output</value>
                                                                <value>${target}</value>
                                                        </list>
                                                </entry>
                                        </map>
                                </property>
                                <!-- pdftk exit codes that are treated as a failed transformation -->
                                <property name="errorCodes">
                                        <value>1,2</value>
                                </property>
                        </bean>
                </property>

                <!-- Intended to limit this transformer to application/pdf -> text/csv -->
                <property name="explicitTransformations">
                        <list>
                                <bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails">
                                        <property name="sourceMimetype"><value>application/pdf</value></property>
                                        <property name="targetMimetype"><value>text/csv</value></property>
                                </bean>
                        </list>
                </property>
        </bean>

        <!-- Proxy transformer that delegates the actual work to the worker above -->
        <bean id="transformer.pdf2csv" class="org.alfresco.repo.content.transform.ProxyContentTransformer" parent="baseContentTransformer">
                <property name="worker">
                        <ref bean="transformer.worker.pdf2csv" />
                </property>
        </bean>
</beans>
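A possible explanation, offered as an assumption rather than a confirmed diagnosis: `explicitTransformations` marks application/pdf -> text/csv as an explicit pairing but does not appear to restrict which pairs the transformer accepts, and `RuntimeExecutableContentTransformerWorker` seems to report itself able to transform any pair once its check command succeeds, which would match the csv -> txt and csv -> swf log entries above. Limiting the pairs on the proxy transformer itself may be what is missing. The sketch below assumes that this Alfresco 4.x build accepts a `supportedTransformations` property on `ProxyContentTransformer` (inherited from the base transformer classes) and provides the `org.alfresco.repo.content.transform.SupportedTransformation` class with `sourceMimetype`/`targetMimetype` properties; both should be checked against the installed version.

```xml
<!-- Sketch only: assumes Alfresco 4.x supports the "supportedTransformations"
     property and the SupportedTransformation class on content transformers. -->
<bean id="transformer.pdf2csv" class="org.alfresco.repo.content.transform.ProxyContentTransformer" parent="baseContentTransformer">
        <property name="worker">
                <ref bean="transformer.worker.pdf2csv" />
        </property>
        <!-- Only offer application/pdf -> text/csv, so this transformer should
             no longer be chosen for csv -> txt, csv -> swf and other pairs -->
        <property name="supportedTransformations">
                <list>
                        <bean class="org.alfresco.repo.content.transform.SupportedTransformation">
                                <property name="sourceMimetype"><value>application/pdf</value></property>
                                <property name="targetMimetype"><value>text/csv</value></property>
                        </bean>
                </list>
        </property>
</bean>
```

The worker bean can stay as it is. If the assumption holds, listing the mimetypes again should show the pdftk proxy only under application/pdf -> text/csv, and PDF previews should go back to the stock transformers.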
