
Custom metadata extraction fails.

Question asked by fintan on Feb 1, 2016
Hi,

I am trying to extract metadata from PDF and TIFF files on Alfresco 5.0.d. I've embedded the metadata with exiftool, using the command, config and JSON below.


exiftool -config exif_config.pl -json=D003038_003.json -v D003038_Athassal\ Abbey/D003038_003.tif


%Image::ExifTool::XMP::xxx = (
    GROUPS => { 0 => 'XMP', 1 => 'xxx', 2 => 'Document' },
    NAMESPACE => { 'xxx' => 'http://XMP.xxx.org/xxx/1.0/'},
    WRITABLE => 'string',

    County => { },
    NatMonNo => { }
);

[       
    {
      "SourceFile": "D003038_Athassal Abbey/D003038_003.pdf",
      "Filename" : "D003038_003.pdf",
      "XMP-dc:Relation" : "D003038_Athassal Abbey/D003038_003.tif",
      "Title" : "Athassal Abbey",
      "Subject" : "Elevations 25-35 Keyplan",
      "Keywords" : "Religious house, Augustinian Canons",
      "CreationDate" : "2005:03:11T00:00:00",
      "Author" : "CS",
      "county" : "Tipperary South",
      "NatMonNo" : 120
    }
]


The embedding works fine, so I set up the extractor context file with this inside:


  <bean class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter" id="extracter.PdfBoxXXX" parent="baseMetadataExtracter">
    <property name="inheritDefaultMapping">
      <value>true</value>
    </property>
    <property name="mappingProperties">
      <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
        <property name="location">
          <value>classpath:alfresco/extension/PdfBoxMetadataExtracter.properties</value>
        </property>
      </bean>
    </property>
  </bean>


The corresponding properties file:

#
# PdfBoxMetadataExtracter - default mapping
#
# author: Derek Hulley

# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0
namespace.prefix.xxx=http://www.xxx.ie

# Mappings
title=cm:title
description=cm:description
creator=cm:author
dc\:rights=cm:rights
dc\:relation=cm:relation
createdate=cm:created
subject=cm:taggable
       
county=cm:companyaddress2
       
NatMonNo=xxx:natmonno
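
To make the intent of these mappings explicit, here is a minimal Python sketch of how I understand the prefixes to expand into the fully qualified names that appear in the Alfresco log. It is only an illustration, not Alfresco's own parser: the function names `parse_properties` and `resolve_mappings` are my own, and the parsing is a simplified reading of the .properties format (only `=` separators and the `\:` escape are handled).

```python
# Simplified reading of a Java-style .properties mapping file.
# Illustration only: this is NOT how Alfresco itself loads the file.

SAMPLE = r"""
# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0
namespace.prefix.xxx=http://www.xxx.ie

# Mappings
title=cm:title
dc\:rights=cm:rights
county=cm:companyaddress2
NatMonNo=xxx:natmonno
"""

NAMESPACE_KEY = "namespace.prefix."


def parse_properties(text):
    """Parse key=value lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line in this simplified reading
        # "dc\:rights" is an escaped colon inside the key
        props[key.strip().replace("\\:", ":")] = value.strip()
    return props


def resolve_mappings(props):
    """Expand 'prefix:local' targets into '{namespace-uri}local' form."""
    namespaces = {
        key[len(NAMESPACE_KEY):]: uri
        for key, uri in props.items()
        if key.startswith(NAMESPACE_KEY)
    }
    resolved = {}
    for key, target in props.items():
        if key.startswith(NAMESPACE_KEY):
            continue
        prefix, _, local = target.partition(":")
        if prefix not in namespaces:
            raise ValueError("undeclared namespace prefix: %r" % prefix)
        resolved[key] = "{%s}%s" % (namespaces[prefix], local)
    return resolved


if __name__ == "__main__":
    for source, qname in resolve_mappings(parse_properties(SAMPLE)).items():
        print(source, "->", qname)
```

Under that reading, `county` should end up as `{http://www.alfresco.org/model/content/1.0}companyaddress2`, which is the name I would expect to see in the log output if the extraction had worked.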


The model file:


    <aspects>
        <aspect name="xxx:location">
            <title>xxx: Location</title>
            <properties>
                <property name="cm:companyaddress2">
                    <title>County</title>
                    <type>d:text</type>
                    <mandatory>false</mandatory>
                    <constraints>
                        <constraint ref="xxx:counties"/>
                    </constraints>
                </property>
            </properties>
        </aspect>
    </aspects>


The aspect works fine when applied to a document: I can add, edit and delete it. Extraction of the predefined metadata (title, description, etc.) works as well, but for the life of me I can't get the county property to be extracted.

I've tried various combinations with no result. The output from Alfresco is:


Updating node properties for "workspace://SpacesStore/0ef51606-eecb-4284-a152-a5d95e3fbfc5" from {} to {{http://www.alfresco.org/model/content/1.0}name=D003038_003.pdf, {http://www.alfresco.org/model/system/1.0}node-dbid=2253, {http://www.alfresco.org/model/system/1.0}store-identifier=SpacesStore, {http://www.alfresco.org/model/system/1.0}node-uuid=0ef51606-eecb-4284-a152-a5d95e3fbfc5, {http://www.alfresco.org/model/content/1.0}modified=Mon Feb 01 14:41:38 GMT 2016, {http://www.alfresco.org/model/system/1.0}locale=en_GB, {http://www.alfresco.org/model/content/1.0}created=Mon Feb 01 14:41:38 GMT 2016, {http://www.alfresco.org/model/system/1.0}store-protocol=workspace, {http://www.alfresco.org/model/content/1.0}creator=admin, {http://www.alfresco.org/model/content/1.0}modifier=admin}.
Creating child association from "workspace://SpacesStore/17127dac-3b59-4b7b-a6a3-890ddcf68ea7" to "workspace://SpacesStore/0ef51606-eecb-4284-a152-a5d95e3fbfc5".
Updating node properties for "workspace://SpacesStore/ac79fcb4-334f-4702-853e-e6ba99b8e11b" from {{http://www.alfresco.org/model/content/1.0}name=documentLibrary, {http://www.alfresco.org/model/content/1.0}tagScopeCache=null, {http://www.alfresco.org/model/system/1.0}node-dbid=873, {http://www.alfresco.org/model/system/1.0}store-identifier=SpacesStore, {http://www.alfresco.org/model/site/1.0}componentId=documentLibrary, {http://www.alfresco.org/model/system/1.0}locale=en_GB, {http://www.alfresco.org/model/content/1.0}owner=admin, {http://www.alfresco.org/model/system/1.0}node-uuid=ac79fcb4-334f-4702-853e-e6ba99b8e11b, {http://www.alfresco.org/model/content/1.0}modified=Wed Jan 27 12:15:31 GMT 2016, {http://www.alfresco.org/model/content/1.0}created=Wed Jan 27 10:10:56 GMT 2016, {http://www.alfresco.org/model/system/1.0}store-protocol=workspace, {http://www.alfresco.org/model/content/1.0}creator=admin, {http://www.alfresco.org/model/content/1.0}description={en=Document Library}, {http://www.alfresco.org/model/content/1.0}modifier=admin} to {{http://www.alfresco.org/model/content/1.0}tagScopeCache=contentUrl=store://2016/2/1/14/41/d8c1d252-2ffa-4066-8b41-c91c9b5c75fe.bin|mimetype=text/plain|size=26|encoding=UTF-8|locale=en_US_|id=1323, {http://www.alfresco.org/model/content/1.0}name=documentLibrary, {http://www.alfresco.org/model/system/1.0}node-dbid=873, {http://www.alfresco.org/model/system/1.0}store-identifier=SpacesStore, {http://www.alfresco.org/model/site/1.0}componentId=documentLibrary, {http://www.alfresco.org/model/system/1.0}locale=en_GB, {http://www.alfresco.org/model/content/1.0}owner=admin, {http://www.alfresco.org/model/system/1.0}node-uuid=ac79fcb4-334f-4702-853e-e6ba99b8e11b, {http://www.alfresco.org/model/content/1.0}modified=Wed Jan 27 12:15:31 GMT 2016, {http://www.alfresco.org/model/content/1.0}created=Wed Jan 27 10:10:56 GMT 2016, {http://www.alfresco.org/model/system/1.0}store-protocol=workspace, 
{http://www.alfresco.org/model/content/1.0}creator=admin, {http://www.alfresco.org/model/content/1.0}description={en=Document Library}, {http://www.alfresco.org/model/content/1.0}modifier=admin}.
Updating node properties for "workspace://SpacesStore/894cf30c-1c89-436c-a3a6-139bae368b37" from {{http://www.alfresco.org/model/content/1.0}name=nms, {http://www.alfresco.org/model/site/1.0}sitePreset=site-dashboard, {http://www.alfresco.org/model/content/1.0}tagScopeCache=null, {http://www.alfresco.org/model/system/1.0}node-dbid=855, {http://www.alfresco.org/model/system/1.0}store-identifier=SpacesStore, {http://www.alfresco.org/model/system/1.0}locale=en_GB, {http://www.alfresco.org/model/content/1.0}title={en=nms}, {http://www.alfresco.org/model/system/1.0}node-uuid=894cf30c-1c89-436c-a3a6-139bae368b37, {http://www.alfresco.org/model/content/1.0}modified=Wed Jan 27 10:10:56 GMT 2016, {http://www.alfresco.org/model/site/1.0}siteVisibility=PUBLIC, {http://www.alfresco.org/model/content/1.0}created=Wed Jan 27 10:10:55 GMT 2016, {http://www.alfresco.org/model/system/1.0}store-protocol=workspace, {http://www.alfresco.org/model/content/1.0}description={en=test nms}, {http://www.alfresco.org/model/content/1.0}creator=admin, {http://www.alfresco.org/model/content/1.0}modifier=admin} to {{http://www.alfresco.org/model/content/1.0}tagScopeCache=contentUrl=store://2016/2/1/14/41/756ff773-079e-410b-8a1b-1095ea9798ea.bin|mimetype=text/plain|size=26|encoding=UTF-8|locale=en_US_|id=1324, {http://www.alfresco.org/model/site/1.0}sitePreset=site-dashboard, {http://www.alfresco.org/model/content/1.0}name=nms, {http://www.alfresco.org/model/system/1.0}node-dbid=855, {http://www.alfresco.org/model/system/1.0}store-identifier=SpacesStore, {http://www.alfresco.org/model/system/1.0}locale=en_GB, {http://www.alfresco.org/model/content/1.0}title={en=nms}, {http://www.alfresco.org/model/system/1.0}node-uuid=894cf30c-1c89-436c-a3a6-139bae368b37, {http://www.alfresco.org/model/content/1.0}modified=Wed Jan 27 10:10:56 GMT 2016, {http://www.alfresco.org/model/site/1.0}siteVisibility=PUBLIC, {http://www.alfresco.org/model/content/1.0}created=Wed Jan 27 10:10:55 GMT 2016, 
{http://www.alfresco.org/model/system/1.0}store-protocol=workspace, {http://www.alfresco.org/model/content/1.0}description={en=test nms}, {http://www.alfresco.org/model/content/1.0}creator=admin, {http://www.alfresco.org/model/content/1.0}modifier=admin}.
2016-02-01 14:41:43,269  INFO  [web.scripts.DictionaryQuery] [http-apr-8080-exec-1] Successfully retrieved Data Dictionary from Alfresco.
2016-02-01 14:41:43,517  INFO  [management.subsystems.ChildApplicationContextFactory] [http-apr-8080-exec-4] Starting 'Transformers' subsystem, ID: [Transformers, default]
2016-02-01 14:41:43,779  INFO  [management.subsystems.ChildApplicationContextFactory] [http-apr-8080-exec-4] Startup of 'Transformers' subsystem, ID: [Transformers, default] complete
Updating node properties for "workspace://SpacesStore/0ef51606-eecb-4284-a152-a5d95e3fbfc5" from {{http://www.alfresco.org/model/content/1.0}name=D003038_003.pdf, {http://www.alfresco.org/model/content/1.0}autoVersionOnUpdateProps=false, {http://www.alfresco.org/model/system/1.0}node-dbid=2253, {http://www.alfresco.org/model/system/1.0}store-identifier=SpacesStore, {http://www.alfresco.org/model/content/1.0}versionLabel=1.0, {http://www.alfresco.org/model/system/1.0}locale=en_GB, {http://www.alfresco.org/model/content/1.0}content=contentUrl=store://2016/2/1/14/41/e78c72d0-577c-4622-ba85-a00f8164f9c1.bin|mimetype=application/pdf|size=1497301|encoding=UTF-8|locale=en_GB_|id=1321, {http://www.alfresco.org/model/content/1.0}autoVersion=true, {http://www.alfresco.org/model/content/1.0}title={en_GB=Athassal Abbey}, {http://www.alfresco.org/model/content/1.0}author=CS, {http://www.alfresco.org/model/content/1.0}taggable=[workspace://SpacesStore/481c43aa-9a83-43f1-b1a5-dbeeccad390d], {http://www.alfresco.org/model/system/1.0}node-uuid=0ef51606-eecb-4284-a152-a5d95e3fbfc5, {http://www.alfresco.org/model/content/1.0}modified=Mon Feb 01 14:41:42 GMT 2016, {http://www.alfresco.org/model/content/1.0}created=Mon Feb 01 14:41:38 GMT 2016, {http://www.alfresco.org/model/system/1.0}store-protocol=workspace, {http://www.alfresco.org/model/content/1.0}description={en_GB=Elevations 25-35 Keyplan}, {http://www.alfresco.org/model/content/1.0}creator=admin, {http://www.alfresco.org/model/content/1.0}modifier=admin, {http://www.alfresco.org/model/content/1.0}initialVersion=true} to {{http://www.alfresco.org/model/content/1.0}autoVersionOnUpdateProps=false, {http://www.alfresco.org/model/content/1.0}name=D003038_003.pdf, {http://www.alfresco.org/model/content/1.0}lastThumbnailModification=[doclib:1454337707490], {http://www.alfresco.org/model/system/1.0}node-dbid=2253, {http://www.alfresco.org/model/system/1.0}store-identifier=SpacesStore, 
{http://www.alfresco.org/model/content/1.0}versionLabel=1.0, {http://www.alfresco.org/model/system/1.0}locale=en_GB, {http://www.alfresco.org/model/content/1.0}content=contentUrl=store://2016/2/1/14/41/e78c72d0-577c-4622-ba85-a00f8164f9c1.bin|mimetype=application/pdf|size=1497301|encoding=UTF-8|locale=en_GB_|id=1321, {http://www.alfresco.org/model/content/1.0}autoVersion=true, {http://www.alfresco.org/model/content/1.0}title={en_GB=Athassal Abbey}, {http://www.alfresco.org/model/content/1.0}taggable=[workspace://SpacesStore/481c43aa-9a83-43f1-b1a5-dbeeccad390d], {http://www.alfresco.org/model/content/1.0}author=CS, {http://www.alfresco.org/model/system/1.0}node-uuid=0ef51606-eecb-4284-a152-a5d95e3fbfc5, {http://www.alfresco.org/model/content/1.0}modified=Mon Feb 01 14:41:42 GMT 2016, {http://www.alfresco.org/model/content/1.0}created=Mon Feb 01 14:41:38 GMT 2016, {http://www.alfresco.org/model/system/1.0}store-protocol=workspace, {http://www.alfresco.org/model/content/1.0}description={en_GB=Elevations 25-35 Keyplan}, {http://www.alfresco.org/model/content/1.0}creator=admin, {http://www.alfresco.org/model/content/1.0}modifier=admin, {http://www.alfresco.org/model/content/1.0}initialVersion=true}.
Updating node properties for "workspace://SpacesStore/261c4638-bbbb-43f5-994a-5ffb7f22895d" from {} to {{http://www.alfresco.org/model/content/1.0}name=doclib, {http://www.alfresco.org/model/system/1.0}node-dbid=2256, {http://www.alfresco.org/model/system/1.0}store-identifier=SpacesStore, {http://www.alfresco.org/model/system/1.0}node-uuid=261c4638-bbbb-43f5-994a-5ffb7f22895d, {http://www.alfresco.org/model/content/1.0}modified=Mon Feb 01 14:41:47 GMT 2016, {http://www.alfresco.org/model/system/1.0}locale=en_US, {http://www.alfresco.org/model/content/1.0}created=Mon Feb 01 14:41:47 GMT 2016, {http://www.alfresco.org/model/content/1.0}contentPropertyName={http://www.alfresco.org/model/content/1.0}content, {http://www.alfresco.org/model/system/1.0}store-protocol=workspace, {http://www.alfresco.org/model/content/1.0}creator=admin, {http://www.alfresco.org/model/content/1.0}modifier=admin}.
Updating node properties for "workspace://SpacesStore/7de49008-eff1-4ee0-bf0e-2faf8cf6c7f2" from {} to {{http://www.alfresco.org/model/content/1.0}isContentIndexed=true, {http://www.alfresco.org/model/content/1.0}name=7de49008-eff1-4ee0-bf0e-2faf8cf6c7f2, {http://www.alfresco.org/model/system/1.0}node-dbid=2257, {http://www.alfresco.org/model/system/1.0}store-identifier=SpacesStore, {http://www.alfresco.org/model/system/1.0}node-uuid=7de49008-eff1-4ee0-bf0e-2faf8cf6c7f2, {http://www.alfresco.org/model/content/1.0}modified=Mon Feb 01 14:41:47 GMT 2016, {http://www.alfresco.org/model/system/1.0}locale=en_US, {http://www.alfresco.org/model/content/1.0}created=Mon Feb 01 14:41:47 GMT 2016, {http://www.alfresco.org/model/system/1.0}store-protocol=workspace, {http://www.alfresco.org/model/content/1.0}creator=admin, {http://www.alfresco.org/model/content/1.0}isIndexed=false, {http://www.alfresco.org/model/content/1.0}modifier=admin}
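
Since these log lines are hard to read by eye, here is a small Python sketch for checking whether a given property ever shows up in them. It is only a rough helper under my own assumptions: the regular expression is a guess at the `{uri}local=value` shape seen above, `property_names` and `mentions` are names I made up, and the sample line is an abbreviated copy of one of the entries.

```python
import re

# Matches "{namespace-uri}localName=" keys; values such as "{en_GB=...}" are
# skipped because no local name follows their closing brace.
QNAME_KEY = re.compile(r"\{([^{}]+)\}([\w.-]+)=")

# Abbreviated copy of one "Updating node properties" entry from the log.
SAMPLE_LINE = (
    'Updating node properties for '
    '"workspace://SpacesStore/0ef51606-eecb-4284-a152-a5d95e3fbfc5" from {} to '
    '{{http://www.alfresco.org/model/content/1.0}name=D003038_003.pdf, '
    '{http://www.alfresco.org/model/content/1.0}title={en_GB=Athassal Abbey}, '
    '{http://www.alfresco.org/model/content/1.0}author=CS, '
    '{http://www.alfresco.org/model/system/1.0}node-dbid=2253}.'
)


def property_names(log_line):
    """Return the set of '{uri}local' property keys found in one log line."""
    return {"{%s}%s" % (uri, local) for uri, local in QNAME_KEY.findall(log_line)}


def mentions(log_text, local_name):
    """True if any line of the log sets a property with this local name."""
    suffix = "}" + local_name
    return any(
        key.endswith(suffix)
        for line in log_text.splitlines()
        for key in property_names(line)
    )


if __name__ == "__main__":
    for name in ("title", "author", "companyaddress2"):
        print(name, mentions(SAMPLE_LINE, name))
```

In the output pasted above, title, author, description and taggable are set on the PDF node, but I see no entry for companyaddress2 anywhere.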


Is there something I'm missing? Thanks

  Fintan
