
Outlook msg extraction fails on Tika date format

Question asked by loftux Moderator on Dec 15, 2010
Latest reply on Dec 19, 2010 by loftux
I'm trying to get Outlook msg metadata extraction to work. It fails with:
Caused by: org.alfresco.service.cmr.repository.datatype.TypeConversionException: Unable to convert string to date: Thu, 19 Feb 2009 11:17:09 +0100 (CET)
   at org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter.makeDate(AbstractMappingMetadataExtracter.java:899)
   at org.alfresco.repo.content.metadata.TikaPoweredMetadataExtracter.makeDate(TikaPoweredMetadataExtracter.java:166)
   at org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter.convertSystemPropertyValues(AbstractMappingMetadataExtracter.java:798)
The mail extractor is able to extract all the other metadata; it is just the date that isn't recognized.
Judging from the error, this date format isn't supported. The TikaPoweredMetadataExtracter.java class already supports a number of additional date formats, but none of them seems to match the format I've encountered.
I've tried to set -Duser.country=US -Duser.language=en in JAVA_OPTS, but that didn't change anything.
So is it Outlook that set the date format in the msg file? The msg file in question comes from an Outlook client in an all-Swedish environment.
If so, then no config change in Alfresco will be able to support this. Could we change the TikaPoweredMetadataExtracter class to be configurable, so that those of us who happen to live in some obscure part of the world like Sweden can extend it with extra date formats?
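
To illustrate what I mean by an extra date format, here is a minimal sketch in plain Java. It is not the Alfresco code: the class name MailDateParser, the EXTRA_PATTERNS list, and the parse method are all made up for the example. The failing string has English day and month names followed by a numeric offset and a zone comment in parentheses, so the sketch assumes a pattern like "EEE, d MMM yyyy HH:mm:ss Z" with Locale.US would be enough. SimpleDateFormat.parse(String) does not need to consume the whole input, so the trailing " (CET)" is simply left unread.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper, not part of Alfresco or Tika.
class MailDateParser {
    // Assumed extra patterns; in a configurable extracter these would come from config.
    private static final String[] EXTRA_PATTERNS = {
        "EEE, d MMM yyyy HH:mm:ss Z", // e.g. Thu, 19 Feb 2009 11:17:09 +0100 (CET)
        "d MMM yyyy HH:mm:ss Z"       // same, without the day name
    };

    // Tries each pattern in turn and returns the first successful parse.
    static Date parse(String value) {
        for (String pattern : EXTRA_PATTERNS) {
            // Locale.US so that "Thu" and "Feb" are recognized regardless of the JVM locale.
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
            try {
                // parse(String) may stop before the end of the input,
                // so the trailing " (CET)" comment is ignored.
                return format.parse(value);
            } catch (ParseException e) {
                // Fall through and try the next pattern.
            }
        }
        throw new IllegalArgumentException("Unable to convert string to date: " + value);
    }
}
```

Calling MailDateParser.parse("Thu, 19 Feb 2009 11:17:09 +0100 (CET)") should then give a Date for 10:17:09 UTC on 19 Feb 2009. Because the locale is set explicitly on the formatter, it would not depend on -Duser.country or -Duser.language.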
