@sanjaybandhniya I hope you have read the information shared above.
"TikaAutoMetadataExtracter takes care of other mimetypes which don't have specific extractors. It uses AutoDetectParser for parsing and extraction, e.g. for images."
I also gave an example of TikaAutoMetadataExtracter and the other extractors, under the bold heading: "An example for Images, PDF, Office"
Look at this bean definition, which was provided in the above response as well:
<bean id="extracter.TikaAuto" class="org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter" parent="baseMetadataExtracter">
This is my bean.
<bean id="extracter.TikaAuto" class="org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter" parent="baseMetadataExtracter">
    <constructor-arg>
        <ref bean="tikaConfig" />
    </constructor-arg>
    <property name="overwritePolicy">
        <value>EAGER</value>
    </property>
    <property name="mappingProperties">
        <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="location">
                <value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>
            </property>
        </bean>
    </property>
</bean>

<bean id="extracter.PDFBox" class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter" parent="baseMetadataExtracter">
    <property name="documentSelector" ref="pdfBoxEmbededDocumentSelector" />
    <property name="overwritePolicy">
        <value>EAGER</value>
    </property>
    <property name="mappingProperties">
        <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="location">
                <value>classpath:alfresco/metadata/PdfBoxMetadataExtracter.properties</value>
            </property>
        </bean>
    </property>
</bean>

<bean id="extracter.Poi" class="org.alfresco.repo.content.metadata.PoiMetadataExtracter" parent="baseMetadataExtracter">
    <property name="poiFootnotesLimit" value="${content.transformer.Poi.poiFootnotesLimit}" />
    <property name="poiExtractPropertiesOnly" value="${content.transformer.Poi.poiExtractPropertiesOnly}" />
    <property name="poiAllowableXslfRelationshipTypes">
        <list>
            <!-- These values are valid for Office 2007, 2010 and 2013 -->
            <value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/presProps</value>
            <value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/viewProps</value>
        </list>
    </property>
    <property name="overwritePolicy">
        <value>EAGER</value>
    </property>
    <property name="mappingProperties">
        <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="location">
                <value>classpath:alfresco/metadata/PoiMetadataExtracter.properties</value>
            </property>
        </bean>
    </property>
</bean>
Your bean looks correct. What is the config in these files?
PdfBoxMetadataExtracter.properties
PoiMetadataExtracter.properties
TikaAutoMetadataExtracter.properties
Properties file with my custom properties:
namespace.prefix.ks=http://www.alfresco.com/model/custom-model/1.0
created=ks:originalCreationDate
modified=ks:originalModificationDate
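For readers unfamiliar with the mapping file format, here is a minimal sketch of how a `prefix:localName` target can be expanded against the `namespace.prefix.*` entries in the same file. It uses only `java.util.Properties`; the class `MappingResolver` and its methods are hypothetical names for illustration and are not Alfresco's actual implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of Alfresco: illustrates the mapping format only.
public class MappingResolver {
    private static final String NS_KEY = "namespace.prefix.";

    // Parses mapping text with the standard .properties rules.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Expands the target of sourceKey from "prefix:localName" to
    // "{namespaceUri}localName" using the namespace.prefix.* entries.
    static String expand(Properties props, String sourceKey) {
        String target = props.getProperty(sourceKey);
        if (target == null) {
            throw new IllegalArgumentException("No mapping for " + sourceKey);
        }
        int colon = target.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Target has no prefix: " + target);
        }
        String uri = props.getProperty(NS_KEY + target.substring(0, colon));
        if (uri == null) {
            throw new IllegalArgumentException("Undeclared prefix in: " + target);
        }
        return "{" + uri + "}" + target.substring(colon + 1);
    }

    public static void main(String[] args) {
        Properties props = load(
              "namespace.prefix.ks=http://www.alfresco.com/model/custom-model/1.0\n"
            + "created=ks:originalCreationDate\n"
            + "modified=ks:originalModificationDate\n");
        // prints {http://www.alfresco.com/model/custom-model/1.0}originalCreationDate
        System.out.println(expand(props, "created"));
    }
}
```

The expanded form matches the `{namespace}localName` notation that appears in the extractor debug log, which makes it easier to compare a mapping file against log output.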
My content model:
<aspects>
    <aspect name="ks:importedDoc">
        <properties>
            <property name="ks:originalCreationDate">
                <type>d:date</type>
            </property>
            <property name="ks:originalModificationDate">
                <type>d:date</type>
            </property>
        </properties>
    </aspect>
</aspects>
It is working for PDF and Office files.
Hmm, that is kind of weird. It should work, I think. Let me try it at my end and see what I get.
It seems to work perfectly. Try re-checking the configs and logs and see what you get.
Here is the test I did:
<bean id="extracter.TikaAuto" class="org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter" parent="baseMetadataExtracter">
    <constructor-arg>
        <ref bean="tikaConfig" />
    </constructor-arg>
    <property name="overwritePolicy">
        <value>EAGER</value>
    </property>
    <property name="mappingProperties">
        <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="location">
                <value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>
            </property>
        </bean>
    </property>
</bean>

<bean id="extracter.PDFBox" class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter" parent="baseMetadataExtracter">
    <property name="documentSelector" ref="pdfBoxEmbededDocumentSelector" />
    <property name="overwritePolicy">
        <value>EAGER</value>
    </property>
    <!-- Including custom properties -->
    <property name="mappingProperties">
        <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="location">
                <value>classpath:alfresco/metadata/PdfBoxMetadataExtracter.properties</value>
            </property>
        </bean>
    </property>
</bean>

<bean id="extracter.Poi" class="org.alfresco.repo.content.metadata.PoiMetadataExtracter" parent="baseMetadataExtracter">
    <property name="poiFootnotesLimit" value="${content.transformer.Poi.poiFootnotesLimit}" />
    <property name="poiExtractPropertiesOnly" value="${content.transformer.Poi.poiExtractPropertiesOnly}" />
    <property name="poiAllowableXslfRelationshipTypes">
        <list>
            <!-- These values are valid for Office 2007, 2010 and 2013 -->
            <value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/presProps</value>
            <value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/viewProps</value>
        </list>
    </property>
    <property name="overwritePolicy">
        <value>EAGER</value>
    </property>
    <!-- Including custom properties -->
    <property name="mappingProperties">
        <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
            <property name="location">
                <value>classpath:alfresco/metadata/PoiMetadataExtracter.properties</value>
            </property>
        </bean>
    </property>
</bean>
TikaAutoMetadataExtracter.properties
# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0
namespace.prefix.exif=http://www.alfresco.org/model/exif/1.0
namespace.prefix.audio=http://www.alfresco.org/model/audio/1.0

#Custom model namespace
namespace.prefix.demo=http://www.github.com/abhinavmishra14/model/demo/1.0

# OOTB Default Mappings
author=cm:author
title=cm:title
description=cm:description
created=cm:created

# Custom Properties to be mapped
created=demo:originCreatedDate
modified=demo:originModifiedDate

geo\:lat=cm:latitude
geo\:long=cm:longitude
tiff\:ImageWidth=exif:pixelXDimension
tiff\:ImageLength=exif:pixelYDimension
tiff\:Make=exif:manufacturer
tiff\:Model=exif:model
tiff\:Software=exif:software
tiff\:Orientation=exif:orientation
tiff\:XResolution=exif:xResolution
tiff\:YResolution=exif:yResolution
tiff\:ResolutionUnit=exif:resolutionUnit
exif\:Flash=exif:flash
exif\:ExposureTime=exif:exposureTime
exif\:FNumber=exif:fNumber
exif\:FocalLength=exif:focalLength
exif\:IsoSpeedRatings=exif:isoSpeedRatings
exif\:DateTimeOriginal=exif:dateTimeOriginal
xmpDM\:album=audio:album
xmpDM\:artist=audio:artist
xmpDM\:composer=audio:composer
xmpDM\:engineer=audio:engineer
xmpDM\:genre=audio:genre
xmpDM\:trackNumber=audio:trackNumber
xmpDM\:releaseDate=audio:releaseDate
#xmpDM:logComment
xmpDM\:audioSampleRate=audio:sampleRate
xmpDM\:audioSampleType=audio:sampleType
xmpDM\:audioChannelType=audio:channelType
xmpDM\:audioCompressor=audio:compressor
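Note that the file above lists the `created` key twice. The sketch below shows what plain `java.util.Properties` does with a repeated key, on the assumption that the mapping file ends up in a `java.util.Properties` object (which is what Spring's `PropertiesFactoryBean` produces). The class name `DuplicateKeyDemo` is hypothetical, and the sketch says nothing about any further processing Alfresco applies afterwards.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class: shows java.util.Properties behavior only.
public class DuplicateKeyDemo {

    // Parses the given text with the standard .properties rules.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(
              "created=cm:created\n"
            + "created=demo:originCreatedDate\n");
        // The later entry replaces the earlier one.
        // prints demo:originCreatedDate
        System.out.println(props.getProperty("created"));
    }
}
```

Under that assumption, only the later `created` entry survives loading, so the earlier `created=cm:created` line has no effect in this file.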
PdfBoxMetadataExtracter.properties
# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0

#Custom model namespace
namespace.prefix.demo=http://www.github.com/abhinavmishra14/model/demo/1.0

# OOTB Default Mappings
author=cm:author
title=cm:title
subject=cm:description

# Custom Properties to be mapped
created=demo:originCreatedDate
modified=demo:originModifiedDate
PoiMetadataExtracter.properties
# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0

#Custom model namespace
namespace.prefix.demo=http://www.github.com/abhinavmishra14/model/demo/1.0

# OOTB Default Mappings
author=cm:author
title=cm:title
description=cm:description

# Custom Properties to be mapped
created=demo:originCreatedDate
modified=demo:originModifiedDate
ContentModel:
<aspect name="demo:testAuditMetadata">
<title>Test Audit Metadata</title>
<description>Test Audit Metadata</description>
<properties>
<property name="demo:originCreatedDate">
<title>Original Created Date</title>
<description>Original Created Date</description>
<type>d:text</type>
</property>
<property name="demo:originModifiedDate">
<title>Original Modified Date</title>
<description>Original Modified Date</description>
<type>d:text</type>
</property>
</properties>
</aspect>
Log:
Image Extraction:
Mapped and Accepted: {{http://www.alfresco.org/model/exif/1.0}focalLength=4.5, {http://www.alfresco.org/model/exif/1.0}model=TG-5, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/exif/1.0}flash=false, {http://www.alfresco.org/model/exif/1.0}fNumber=8.0, {http://www.alfresco.org/model/exif/1.0}isoSpeedRatings=100, {http://www.alfresco.org/model/content/1.0}description={en_US=OLYMPUS DIGITAL CAMERA}, {http://www.alfresco.org/model/exif/1.0}dateTimeOriginal=Sun May 07 13:23:51 EDT 2017, {http://www.alfresco.org/model/exif/1.0}manufacturer=OLYMPUS CORPORATION, {http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2017-05-07T13:23:51, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=590, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=442, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.alfresco.org/model/exif/1.0}exposureTime=0.005}
2020-05-22 09:58:00,041 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-14] Completed metadata extraction:
   reader: ContentAccessor[ contentUrl=store://2020/5/22/9/57/db13881d-4caf-4a72-a481-054bb9246b63.bin, mimetype=image/jpeg, size=94399, encoding=UTF-8, locale=en_US]
   extracter: org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter@126bd574
   changed: {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2017-05-07T13:23:51,{http://www.alfresco.org/model/exif/1.0}focalLength=4.5, {http://www.alfresco.org/model/exif/1.0}model=TG-5, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/exif/1.0}flash=false, {http://www.alfresco.org/model/exif/1.0}fNumber=8.0, {http://www.alfresco.org/model/exif/1.0}isoSpeedRatings=100, {http://www.alfresco.org/model/content/1.0}description={en_US=OLYMPUS DIGITAL CAMERA}, {http://www.alfresco.org/model/exif/1.0}dateTimeOriginal=Sun May 07 13:23:51 EDT 2017, {http://www.alfresco.org/model/exif/1.0}manufacturer=OLYMPUS CORPORATION, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=590, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=442, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.alfresco.org/model/exif/1.0}exposureTime=0.005}
PDF Extraction:
Mapped and Accepted: {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2018-10-26T20:36:24Z, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.github.com/abhinavmishra14/model/demo/1.0}originModifiedDate=2018-10-26T20:36:28Z}
2020-05-22 09:58:11,676 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-2] Completed metadata extraction:
   reader: ContentAccessor[ contentUrl=store://2020/5/22/9/58/262e3dc1-5cfc-4558-9f01-fae20c5cae2d.bin, mimetype=application/pdf, size=3104712, encoding=UTF-8, locale=en_US]
   extracter: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter@8414655
   changed: {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2018-10-26T20:36:24Z, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.github.com/abhinavmishra14/model/demo/1.0}originModifiedDate=2018-10-26T20:36:28Z}
Office Extraction:
Mapped and Accepted: {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2020-02-10T16:13:00Z, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=Abhinav, {http://www.github.com/abhinavmishra14/model/demo/1.0}originModifiedDate=2020-02-10T20:05:00Z}
2020-05-22 09:58:22,021 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-11] Completed metadata extraction:
   reader: ContentAccessor[ contentUrl=store://2020/5/22/9/58/f3281f14-7ffb-4d91-a3b2-d0fc8de305d5.bin, mimetype=application/vnd.openxmlformats-officedocument.wordprocessingml.document, size=3075453, encoding=UTF-8, locale=en_US]
   extracter: org.alfresco.repo.content.metadata.PoiMetadataExtracter@2752d52e
   changed: {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2020-02-10T16:13:00Z, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=Abhinav, {http://www.github.com/abhinavmishra14/model/demo/1.0}originModifiedDate=2020-02-10T20:05:00Z}
Image metadata on share view details:
PDF And Office metadata on share view details:
Hi,
If possible, can you share the demo that you created? It is not working for images, even though I have used your code.
@sanjaybandhniya Please share your content model, Share config, bean definition, extractor properties and log here.
@abhinavmishra14 I have created a new thread. Please check it.
@sanjaybandhniya Find the demo project here:
https://github.com/abhinavmishra14/alfresco-metadataextraction-demo
I noticed a difference between the Community and Enterprise editions. The examples I gave above work perfectly on Enterprise 5.2.x (I used 5.2.6) and 6.1.x (I used 6.1), but on Community editions the properties files are not picked up reliably (the behavior is intermittent).
The only change I made for Community edition is the location value shown below, and with it the file is always picked up correctly.
<property name="mappingProperties">
    <bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
        <property name="location">
            <value>classpath:alfresco/module/${project.artifactId}/metadataextraction/TikaAutoMetadataExtracter.properties</value>
        </property>
    </bean>
</property>
On Enterprise, both work fine: the path above and the path given below.
<value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>
This one also works on both editions:
<value>classpath:alfresco/extension/metadata/TikaAutoMetadataExtracter.properties</value>
I am not sure how the two editions (Community and Enterprise) differ in terms of extension points. I tried looking at the source code but found no clues. The good news is that the other path I shared above (available in the demo project) works fine on both Community and Enterprise.
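One thing that may be worth checking, although nothing in this thread confirms it as the cause, is whether more than one file with the same classpath path is visible to the repository, since a `classpath:` location resolves to a single match. The following is a minimal diagnostic sketch using only JDK calls; the class `ClasspathCopies` is a hypothetical name, and it only gives a meaningful answer when run with the same class loader as the repository webapp.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of Alfresco or Spring.
public class ClasspathCopies {
    private static final String PREFIX = "classpath:";

    // Returns every classpath entry that provides the given location.
    // Accepts the path with or without the "classpath:" prefix.
    static List<URL> find(String location) {
        String path = location.startsWith(PREFIX)
                ? location.substring(PREFIX.length())
                : location;
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathCopies.class.getClassLoader();
        }
        try {
            return new ArrayList<>(Collections.list(loader.getResources(path)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String location = "classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties";
        // More than one URL here would mean several copies compete for the same path.
        for (URL url : find(location)) {
            System.out.println(url);
        }
    }
}
```

Run outside the repository, this simply prints nothing, because the file is not on the classpath of a standalone JVM.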
Hope this helps trim down your issue.