
Extraction of custom metadata properties from docx files

Question asked by mrksjs on May 7, 2012
Latest reply on Oct 4, 2012 by kbonnet
Hello,

I have Alfresco Community 4.0.d running and I'm trying to extract custom metadata properties from a .docx file.
I found some threads about this, but no working solution.

- I created a custom model with a custom content type and a custom aspect.
- I created a folder with a rule that:
  - specializes all content to my type
  - adds the custom aspect
  - extracts common metadata
- I created an XML file in /tomcat/shared/classes/alfresco/extension with a bean definition for 'org.alfresco.repo.content.metadata.PoiMetadataExtracter' and added my mappings.

my-metadata-extracter-context.xml:
    <bean id="extracter.Poi" class="org.alfresco.repo.content.metadata.PoiMetadataExtracter" parent="baseMetadataExtracter">
        <property name="inheritDefaultMapping">
            <value>true</value>
        </property>
        <property name="mappingProperties">
            <props>
                <prop key="namespace.prefix.my">http://www.mymodel.de/model/1.0</prop>
                <!-- default property 'author' works, 'user1' won't -->
                <prop key="author">my:ptNr</prop>
            </props>
        </property>
    </bean>
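To make the mapping semantics concrete, here is a simplified Python model of what a mappingProperties block like the one above appears to do, based only on what the DEBUG log in this post shows: `namespace.prefix.*` entries declare prefixes, and every other key names a raw extracted property and maps it to prefixed target properties. `parse_mapping` and `apply_mapping` are hypothetical helpers, not Alfresco code, and the comma-separated multi-target form is an assumption.

```python
# Simplified, hypothetical model of an Alfresco-style "mappingProperties" block.
# This is NOT Alfresco's implementation; it only mirrors what the DEBUG log shows.

NS_PREFIX = "namespace.prefix."


def parse_mapping(props):
    """Turn a props dict into {raw key: [fully qualified property names]}."""
    # First pass: collect the namespace prefixes, e.g. "my" -> "http://...".
    prefixes = {key[len(NS_PREFIX):]: uri
                for key, uri in props.items() if key.startswith(NS_PREFIX)}
    mapping = {}
    for key, value in props.items():
        if key.startswith(NS_PREFIX):
            continue
        qnames = []
        # Assumption: several targets may be listed, separated by commas.
        for target in value.split(","):
            prefix, local = target.strip().split(":", 1)
            qnames.append("{%s}%s" % (prefixes[prefix], local))
        mapping[key] = qnames
    return mapping


def apply_mapping(raw, mapping):
    """Copy raw values onto mapped names; keys the parser never reported are skipped."""
    system = {}
    for raw_key, qnames in mapping.items():
        if raw_key in raw:
            for qname in qnames:
                system[qname] = raw[raw_key]
    return system


if __name__ == "__main__":
    # Abridged raw values from the log in this post: there is no "user1" key.
    raw = {"author": "ABC DEF", "subject": "QWERTZ", "description": "QWERTZ"}
    ns = {"namespace.prefix.my": "http://www.mymodel.de/model/1.0"}
    works = parse_mapping({**ns, "author": "my:ptNr"})
    fails = parse_mapping({**ns, "user1": "my:ptNr"})
    print(apply_mapping(raw, works))  # {'{http://www.mymodel.de/model/1.0}ptNr': 'ABC DEF'}
    print(apply_mapping(raw, fails))  # {}
```

The point the model illustrates: a mapping only renames keys, it cannot produce a value the underlying parser never extracted, which would be consistent with 'author' working and 'user1' producing nothing.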

Log file:
2012-05-05 04:01:27,899  DEBUG [content.metadata.MetadataExtracterRegistry] [main] Registering metadata extracter: org.alfresco.repo.content.metadata.PoiMetadataExtracter@114beb40
2012-05-05 04:01:27,915  DEBUG [content.metadata.AbstractMappingMetadataExtracter] [main] Added mapping from author to [{http://www.mymodel.de/model/1.0}ptNr]

2012-05-05 04:04:05,249  DEBUG [content.metadata.AbstractMappingMetadataExtracter] [http-8080-1] Converted extracted raw values to system values:
   Raw Properties:    {Last-Author=asdf, subject=QWERTZ, Application-Name=Microsoft Office Word, Author=ABC DEF, Application-Version=14.0000, Character-Count-With-Spaces=4385, date=2012-04-16T08:26:00Z, publisher=null, creator=ABC DEF, Word-Count=601, Creation-Date=2012-04-16T08:26:00Z, author=ABC DEF, title=null, created=2012-04-16T08:26:00Z, Line-Count=31, description=QWERTZ, Paragraph-Count=8, Last-Printed=2009-10-07T07:41:00Z, Revision-Number=4, Template=default.dotx, Page-Count=3, Last-Modified=2012-05-02T16:11:00Z, xmpTPg:NPages=3, Character Count=3792, Content-Type=application/vnd.openxmlformats-officedocument.wordprocessingml.document, comments=null}
   System Properties: {{http://www.mymodel.de/model/1.0}ptNr=ABC DEF, {http://www.alfresco.org/model/content/1.0}description=QWERTZ, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=ABC DEF, {http://www.alfresco.org/model/content/1.0}created=2012-04-16T08:26:00Z}
2012-05-05 04:04:05,265  DEBUG [content.metadata.AbstractMappingMetadataExtracter] [http-8080-1] Extracted Metadata from ContentAccessor[ contentUrl=store:///opt/alfresco-4.0.d/tomcat/temp/Alfresco/alfresco4310738247854243061.upload, mimetype=application/vnd.openxmlformats-officedocument.wordprocessingml.document, size=80093, encoding=UTF-8, locale=en_US]
  Found: {Last-Author=asdf, subject=QWERTZ, Application-Name=Microsoft Office Word, Author=ABC DEF, Application-Version=14.0000, Character-Count-With-Spaces=4385, date=2012-04-16T08:26:00Z, publisher=null, creator=ABC DEF, Word-Count=601, Creation-Date=2012-04-16T08:26:00Z, author=ABC DEF, title=null, created=2012-04-16T08:26:00Z, Line-Count=31, description=QWERTZ, Paragraph-Count=8, Last-Printed=2009-10-07T07:41:00Z, Revision-Number=4, Template=default.dotx, Page-Count=3, Last-Modified=2012-05-02T16:11:00Z, xmpTPg:NPages=3, Character Count=3792, Content-Type=application/vnd.openxmlformats-officedocument.wordprocessingml.document, comments=null}
  Mapped and Accepted: {{http://www.alfresco.org/model/content/1.0}created=Mon Apr 16 10:26:00 CEST 2012, {http://www.mymodel.de/model/1.0}ptNr=ABC DEF, {http://www.alfresco.org/model/content/1.0}description={en_US=QWERTZ}, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=ABC DEF}


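The default 'author' property is mapped fine (my:ptNr=ABC DEF in the log), but I can't retrieve the custom property 'user1': it doesn't even show up in the raw properties. :'(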
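One way to check whether 'user1' is actually stored in the file is to look inside the .docx itself. A .docx is a ZIP package, and Word conventionally keeps custom document properties in the part `docProps/custom.xml` (that fixed part name is an assumption; strictly it is located via the package relationships). The standard-library Python sketch below lists them; `read_custom_properties` is a hypothetical helper, not part of Alfresco. If 'user1' appears here but not among the raw properties in the log, that would suggest the value is in the file and the extracter simply isn't reporting it.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the OOXML custom document properties part.
CUSTOM_NS = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"
# Assumed part name; Word normally writes custom properties here.
CUSTOM_PART = "docProps/custom.xml"


def read_custom_properties(path):
    """Return {name: value} for the custom properties stored in a .docx file."""
    with zipfile.ZipFile(path) as docx:
        if CUSTOM_PART not in docx.namelist():
            return {}  # the document has no custom properties part at all
        root = ET.fromstring(docx.read(CUSTOM_PART))
    result = {}
    for prop in root.findall("{%s}property" % CUSTOM_NS):
        # Each property wraps one typed value element, e.g. <vt:lpwstr>.
        value_elem = next(iter(prop), None)
        result[prop.get("name")] = value_elem.text if value_elem is not None else None
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python read_custom_props.py mydocument.docx
    for name, value in read_custom_properties(sys.argv[1]).items():
        print(f"{name} = {value}")
```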
