
Extracting XML meta data into aspects

Question asked by samuel.penn on Sep 29, 2008
Latest reply on Nov 6, 2008 by mcrocker
Hi,

I'm currently trying to get my head around the XML Meta Data extractor as described at http://wiki.alfresco.com/wiki/Metadata_Extraction.

One thing the documentation doesn't make clear is what happens if the extractor is set up to write data into a property that exists only in an aspect. Is the aspect automatically added to the document, or must it already be applied to the document? I'd like to define an aspect that describes some of the form fields in the WCM form data, and have the extractor add that aspect automatically as required. I can't find any mention of whether something needs to be configured for this to happen, or whether it's impossible.

Having set up an extractor in wcm-xml-metadata-extracter-context.xml and switched on debug logging for metadata, I can see that my configuration is being picked up when the server starts:


16:27:55,613 DEBUG [content.metadata.AbstractMappingMetadataExtracter] Added mapping from atoz to [{http://www.centrom.com/alfresco/localgov/model}atoz]
16:27:55,629 DEBUG [metadata.xml.XPathMetadataExtracter] Added mapping from atoz to /art:article/art:header/art:atoz/text()

However, when I save a suitable web form in WCM, I see the following in the logs:


16:29:18,083 DEBUG [content.metadata.MetadataExtracterRegistry] Finding extractors for text/xml
16:29:18,130 DEBUG [metadata.xml.XPathMetadataExtracter]
No working metadata extractor could be found:
   Document: ContentAccessor[ contentUrl=store://2008/9/29/16/29/cf7eb2e7-e0e5-4cca-972f-655a78f91e98.bin, mimetype=text/xml, size=760, encoding=UTF-8, locale=en_US]
16:29:18,130 DEBUG [metadata.xml.XPathMetadataExtracter]
XML metadata extractor redirected:
   Reader:    ContentAccessor[ contentUrl=store://2008/9/29/16/29/cf7eb2e7-e0e5-4cca-972f-655a78f91e98.bin, mimetype=text/xml, size=760, encoding=UTF-8, locale=en_US]
   Extracter: null
   Metadata: {{http://www.alfresco.org/model/content/1.0}name=metatest.xml, {http://www.alfresco.org
/model/system/1.0}node-dbid=19105, {http://www.alfresco.org/model/system/1.0}store-identifier=hertsm
ere--admin--preview, {http://www.alfresco.org/model/wcmappmodel/1.0}orginalparentpath=hertsmere--adm
in--preview:/www/avm_webapps/ROOT, {http://www.alfresco.org/model/content/1.0}content=contentUrl=sto
re://2008/9/29/16/29/cf7eb2e7-e0e5-4cca-972f-655a78f91e98.bin|mimetype=text/xml|size=760|encoding=UT
F-8|locale=en_US_, {http://www.alfresco.org/model/content/1.0}owner=admin, {http://www.alfresco.org/
model/content/1.0}title={en_US=metatest.xml}, {http://www.alfresco.org/model/content/1.0}modified=Mo
n Sep 29 16:29:17 BST 2008, {http://www.alfresco.org/model/system/1.0}node-uuid=UNKNOWN, {http://www
.alfresco.org/model/wcmappmodel/1.0}parentformname=web-article, {http://www.alfresco.org/model/conte
nt/1.0}created=Mon Sep 29 16:29:17 BST 2008, {http://www.alfresco.org/model/system/1.0}store-protoco
l=avm, {http://www.alfresco.org/model/content/1.0}creator=admin, {http://www.alfresco.org/model/cont
ent/1.0}modifier=admin, {http://www.alfresco.org/model/wcmappmodel/1.0}renditions=[/www/avm_webapps/
ROOT/metatest.jsp]}

The 'No working metadata extractor could be found' message suggests that the extractor isn't actually being found. I was also under the impression that extraction only happens when the form content is published to the staging sandbox, but this debug output appears when I save the form content in the user's sandbox, and I get no metadata debug output at all when the form content is pushed to staging.

Looking at any version of the metadata.xml file in the node browser shows that neither the aspect nor any extracted metadata has been added.

The meta data extraction config I'm using is below - could anyone tell me if it looks sensible?


   <bean id="extracter.xml.centrom.ArticleModelMetadataExtracter"
         class="org.alfresco.repo.content.metadata.xml.XPathMetadataExtracter"
         parent="baseMetadataExtracter"
         init-method="init" >
      <property name="mappingProperties">
         <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="properties">
               <props>
                  <prop key="namespace.prefix.lg">http://www.centrom.com/alfresco/localgov/model</prop>
                  <prop key="atoz">lg:atoz</prop>
               </props>
            </property>
         </bean>
      </property>
     
      <property name="xpathMappingProperties">
         <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="properties">
               <props>
                  <prop key="namespace.prefix.art">http://www.centrom.com/localgov/wcm/article</prop>
                  <prop key="atoz">/art:article/art:header/art:atoz/text()</prop>
               </props>
            </property>
         </bean>
      </property>
   </bean>
  
  
   <!--
      This selector examines the XML documents, executing the given XPath statements until a
      match is made.
   -->
   <bean id="extracter.xml.centrom.selector.XPathSelector"
         class="org.alfresco.repo.content.selector.XPathContentWorkerSelector"
         init-method="init">
      <property name="workers">
         <map>
            <entry key="/art:article">
               <ref bean="extracter.xml.centrom.ArticleModelMetadataExtracter" />
            </entry>
         </map>
      </property>
   </bean>
  
   <bean id="extracter.xml.centrom.XMLMetadataExtracter"
         class="org.alfresco.repo.content.metadata.xml.XmlMetadataExtracter"
         parent="baseMetadataExtracter">

      <property name="registry">
         <ref bean="avmMetadataExtracterRegistry" />
      </property>

      <property name="overwritePolicy">
         <value>EAGER</value>
      </property>
      <property name="selectors">
         <list>
            <ref bean="extracter.xml.centrom.selector.XPathSelector" />
         </list>
      </property>
   </bean>
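
One way to narrow down a "No working metadata extractor could be found" message is to rule out the XPath expressions themselves, outside Alfresco. The following is a minimal sketch using only the JDK's XPath API. It is not how Alfresco evaluates the mappings internally. The class name and the sample document are hypothetical (the real form instance XML is not shown in this post); only the namespace URI and the XPath are taken from the configuration above.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathMappingCheck {

    // Namespace URI copied from the xpathMappingProperties in the config above.
    static final String ART_NS = "http://www.centrom.com/localgov/wcm/article";

    // Hypothetical sample document; substitute the XML actually saved by the web form.
    static final String SAMPLE =
        "<art:article xmlns:art=\"" + ART_NS + "\">"
      + "<art:header><art:atoz>Bins</art:atoz></art:header>"
      + "</art:article>";

    /** Evaluates an XPath expression against an XML string, binding the 'art' prefix. */
    static String evaluate(String xml, String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Namespace awareness is needed for prefixed steps such as art:article to match.
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "art".equals(prefix) ? ART_NS : XMLConstants.NULL_NS_URI;
            }
            // Reverse lookups are not needed for XPath evaluation in this sketch.
            public String getPrefix(String uri) { return null; }
            public Iterator<String> getPrefixes(String uri) { return null; }
        });

        // Returns the string value of the first matching node, or "" if nothing matches.
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(evaluate(SAMPLE, "/art:article/art:header/art:atoz/text()"));
    }
}
```

An empty result for the real form XML would point at the expression or the namespace URI. A non-empty result would suggest the problem lies in how the selector or registry is wired up rather than in the XPath.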

My aspect is defined as follows:


   <namespaces>
      <namespace uri="http://www.centrom.com/alfresco/localgov/model" prefix="lg"/>
   </namespaces>
  
    <aspects>
        <aspect name="lg:article">
            <title>Article Aspect</title>
            <properties>
                <property name="lg:atoz">
                    <type>d:text</type>
                </property>
            </properties>
        </aspect>
    </aspects>


Thanks,
Sam.
