
Metadata Extraction to Tags

Blog Post created by ragauss Employee on May 31, 2013

The Alfresco Tags You Know and Love



The tagging capabilities in Alfresco and Share provide an easy way for users to associate tags with a piece of content and filter content sets by those tags. It's an excellent way to add more context for other members of a team, and is particularly useful for visual content where there may not be any associated text to describe or enable searching for what's depicted.



Some file formats may contain metadata that can aid in that description and searching, and that metadata is likely something Alfresco can already extract, but we wouldn't get the slick user experience that tags afford us.



'Wait! What if we could map extracted metadata to standard tags?' Hey, that's a great idea!

Tag Mapping



We introduced the ability to map metadata extraction to tags in 4.2.c, but it's not enabled out of the box. Let's take a quick look at how things worked, what was changed, then how you might use it.

Metadata Extraction Mapping Refresher



ContentMetadataExtracter is the action executer which does the work of getting the proper MetadataExtracter from the MetadataExtracterRegistry, then calls its extract method to fill in a properties map.



AbstractMappingMetadataExtracter, which is what most metadata extractors extend from, allows you to map incoming metadata fields to Alfresco properties.
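As a rough illustration of that kind of mapping, an overriding extractor bean might look like the sketch below. The bean id, class, and parent are assumed from a stock install, and the author field is only an example, so check them against your own version before relying on it.

```xml
<!-- Sketch only: bean id, class and parent are assumed from a stock Alfresco install -->
<bean id='extracter.PDFBox' class='org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter' parent='baseMetadataExtracter'>
    <!-- Keep the extractor's default mappings and add to them -->
    <property name='inheritDefaultMapping'>
        <value>true</value>
    </property>
    <property name='mappingProperties'>
        <props>
            <!-- Declare each namespace prefix used on the right-hand side -->
            <prop key='namespace.prefix.cm'>http://www.alfresco.org/model/content/1.0</prop>
            <!-- Incoming metadata field 'author' fills the cm:author property -->
            <prop key='author'>cm:author</prop>
        </props>
    </property>
</bean>
```

The key of each entry is the raw metadata field name produced by the extractor, and the value is the Alfresco property it should populate.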

Tags Refresher



Alfresco's tags are stored and displayed using the cm:taggable property inside the cm:taggable aspect. A type of category node is created for each tag (or linked to if it already exists) and is associated with the tagged content as a property by the TaggingService.



In the past you couldn't just map your free-form string metadata fields to cm:taggable, because it expects a nodeRef to perform that property linking.

What's Changed



Now we catch MalformedNodeRefExceptions related to tags in AbstractMappingMetadataExtracter, and if enableStringTagging=true the raw string values are passed on as-is to the next step. There may be some cases where you actually have a tag's nodeRef as a metadata field in your binary file, in which case no MalformedNodeRefException is thrown and your content is linked to that existing tag.



Once we've returned to ContentMetadataExtracter, the properties modified by the metadata extractor are iterated over and set by the NodeService. It's during that process that we look for cm:taggable and use the TaggingService to create or link the raw string tags, provided enableStringTagging=true and the TaggingService is set.



Multi-valued metadata fields are supported of course, and a tag will be created or linked for each value.

How to Use it



Again, to make all this magic happen you must currently set the taggingService property on ContentMetadataExtracter and set enableStringTagging=true. Your overriding bean definition might look like this:

<bean id='extract-metadata' class='org.alfresco.repo.action.executer.ContentMetadataExtracter' parent='action-executer'>
    <property name='nodeService'>
        <ref bean='NodeService' />
    </property>
    <property name='contentService'>
        <ref bean='ContentService' />
    </property>
    <property name='dictionaryService'>
        <ref bean='dictionaryService' />
    </property>
    <property name='taggingService'>
        <ref bean='TaggingService' />
    </property>
    <property name='metadataExtracterRegistry'>
        <ref bean='metadataExtracterRegistry' />
    </property>
    <property name='applicableTypes'>
        <list>
            <value>{http://www.alfresco.org/model/content/1.0}content</value>
        </list>
    </property>
    <property name='carryAspectProperties'>
        <value>true</value>
    </property>
    <property name='enableStringTagging'>
        <value>true</value>
    </property>
</bean>


Then define your metadata extractor mapping, something like:

dc\:subject=cm:taggable
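If you'd rather keep that mapping in Spring than in a properties file, a minimal sketch of the same line expressed inline on an extractor bean follows. The bean id and class here are hypothetical placeholders for whichever extractor actually emits dc:subject in your setup, and the parent bean name is assumed from a stock install.

```xml
<!-- Sketch only: 'extracter.Example' and 'org.example.ExampleMetadataExtracter' are hypothetical placeholders -->
<bean id='extracter.Example' class='org.example.ExampleMetadataExtracter' parent='baseMetadataExtracter'>
    <!-- Keep the extractor's default mappings and add the tag mapping on top -->
    <property name='inheritDefaultMapping'>
        <value>true</value>
    </property>
    <property name='mappingProperties'>
        <props>
            <prop key='namespace.prefix.cm'>http://www.alfresco.org/model/content/1.0</prop>
            <!-- The colon in the key needs no backslash here; that escape is only for .properties files -->
            <prop key='dc:subject'>cm:taggable</prop>
        </props>
    </property>
</bean>
```

Either form should give the same mapping; the properties file simply needs the colon in the key escaped, which is why the line above is written as dc\:subject.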


IPTC Keywords Example



The Media Management module supports full IPTC extraction for images. IPTC is where the keywords used by so many photo editing and organization programs are stored, which makes it a perfect candidate for mapping to Alfresco tags:



[Image: Tag Mapping]



What other metadata fields are you thinking of mapping to tags?
