
Multiple metadata extractors for the PDF mimetype

Question asked by ingdade on Dec 18, 2013
I have to deal with PDF documents of various types.
I configured my test model with two types, each of which has different metadata.

Since my PDFs are text-based, I want to extract custom metadata from them, but the metadata is different for each type.

By studying the documentation and samples, I was able to write a custom metadata extractor.

The problem is that, like all of the samples, my extractor overrides the standard PDF extractor, and I can't choose which extractor to use.

I noticed that in Alfresco each extractor is associated with a mimetype and is registered in the metadataExtracterRegistry.
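A simplified model of a mimetype-keyed lookup may help show why a second PDF extractor either hides the standard one or is hidden by it. The types below (`SimpleExtracter`, `SimpleExtracterRegistry`, `PdfOnlyExtracter`) are hypothetical stand-ins, not Alfresco's API, and the first-match rule is an assumption about how the registry picks one extractor when several support the same mimetype.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for an extracter: only the mimetype check
// matters for this illustration.
interface SimpleExtracter {
    String name();
    boolean isSupported(String mimetype);
}

// Hypothetical mimetype-keyed registry. ASSUMPTION: lookup returns the first
// registered extracter that supports the mimetype, so only one PDF extracter
// can ever be returned for "application/pdf".
class SimpleExtracterRegistry {
    private final List<SimpleExtracter> extracters = new ArrayList<>();

    void register(SimpleExtracter extracter) {
        extracters.add(extracter);
    }

    SimpleExtracter getExtracter(String mimetype) {
        for (SimpleExtracter e : extracters) {
            if (e.isSupported(mimetype)) {
                return e;
            }
        }
        return null; // no extracter for this mimetype
    }
}

// Minimal concrete extracter used by the demo below.
class PdfOnlyExtracter implements SimpleExtracter {
    private final String name;

    PdfOnlyExtracter(String name) {
        this.name = name;
    }

    public String name() {
        return name;
    }

    public boolean isSupported(String mimetype) {
        return "application/pdf".equals(mimetype);
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        SimpleExtracterRegistry registry = new SimpleExtracterRegistry();
        registry.register(new PdfOnlyExtracter("standard-pdf"));
        registry.register(new PdfOnlyExtracter("custom-pdf"));
        // The mimetype alone cannot distinguish the two PDF extracters.
        System.out.println(registry.getExtracter("application/pdf").name()); // standard-pdf
    }
}
```

Because the lookup key is only the mimetype, registering a second PDF extractor cannot make both reachable through this kind of call.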

Is it possible to bypass the automatic detection of the extractor and specify which extractor to use?

I need more than one extractor for the PDF mimetype.

I thought I could write a custom action that invokes a particular extractor, so I tried, without success, to write one by duplicating org.alfresco.repo.action.executer.ContentMetadataExtracter.
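The custom-action idea boils down to a different dispatch key: choose the extractor from the node's model type rather than from its mimetype. The sketch below shows only that dispatch step. `TypeDispatchingExtractAction`, `bind`, `execute` and the type names `my:invoice` / `my:contract` are hypothetical, and the extractors are plain functions from document text to raw metadata rather than real Alfresco extracters.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatch step for a custom action (not Alfresco's
// ContentMetadataExtracter): the extracter is chosen by model type,
// so several extracters can coexist for the same PDF mimetype.
class TypeDispatchingExtractAction {
    // model type name -> extracter (document text in, raw metadata out)
    private final Map<String, Function<String, Map<String, String>>> extractersByType = new HashMap<>();

    void bind(String modelType, Function<String, Map<String, String>> extracter) {
        extractersByType.put(modelType, extracter);
    }

    Map<String, String> execute(String modelType, String documentText) {
        Function<String, Map<String, String>> extracter = extractersByType.get(modelType);
        if (extracter == null) {
            throw new IllegalArgumentException("No extracter bound to type " + modelType);
        }
        return extracter.apply(documentText);
    }
}

class TypeDispatchDemo {
    public static void main(String[] args) {
        TypeDispatchingExtractAction action = new TypeDispatchingExtractAction();
        // "my:invoice" and "my:contract" are made-up type names for illustration.
        action.bind("my:invoice", text -> Map.of("invoiceNo", text.trim()));
        action.bind("my:contract", text -> Map.of("contractId", text.trim()));
        System.out.println(action.execute("my:contract", " C-42 ")); // {contractId=C-42}
    }
}
```

Failing loudly on an unbound type is a design choice here; falling back to the mimetype-based extractor would be an equally reasonable alternative.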

I think the extractor is invoked and reads the custom metadata, but it doesn't populate the model properties, maybe because the mapping is missing.
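The suspected failure mode can be illustrated in isolation: raw metadata keys only become model properties if a mapping entry exists for them. The following is a hypothetical sketch of such a mapping step (`RawToPropertyMapper` and the property names are made up, not Alfresco's implementation); it shows that extraction can succeed while an empty mapping still yields no properties.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical mapping step: raw keys from an extractRaw-style method are
// translated into model property names. A raw key with no mapping entry is
// dropped, so an empty mapping produces no properties at all.
final class RawToPropertyMapper {
    private final Map<String, Set<String>> mapping; // raw key -> model property names

    RawToPropertyMapper(Map<String, Set<String>> mapping) {
        this.mapping = mapping;
    }

    Map<String, String> map(Map<String, String> raw) {
        Map<String, String> properties = new HashMap<>();
        for (Map.Entry<String, String> entry : raw.entrySet()) {
            // One raw key may feed several properties.
            for (String property : mapping.getOrDefault(entry.getKey(), Set.of())) {
                properties.put(property, entry.getValue());
            }
        }
        return properties;
    }
}

class MapperDemo {
    public static void main(String[] args) {
        Map<String, String> raw = Map.of("invoiceNo", "INV-7");
        RawToPropertyMapper withMapping =
                new RawToPropertyMapper(Map.of("invoiceNo", Set.of("my:invoiceNumber")));
        RawToPropertyMapper withoutMapping = new RawToPropertyMapper(Map.of());
        System.out.println(withMapping.map(raw));    // {my:invoiceNumber=INV-7}
        System.out.println(withoutMapping.map(raw)); // {}
    }
}
```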

I instantiated the extractor with a "new" statement, changing

MetadataExtracter extracter = metadataExtracterRegistry.getExtracter(mimetype);

to

EnhancedPdfExtracter extracter = new EnhancedPdfExtracter();

and then calling extracter.extract(..), even though my custom metadata extractor only implements the extractRaw method.
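The extract/extractRaw split described here is a template-method arrangement: the base class owns `extract`, and the subclass supplies only `extractRaw`. The sketch below models that split under one loudly stated assumption: the raw-key-to-property mapping is loaded by an initialization step that a container such as Spring would normally call, so an object built with `new` and never initialized maps nothing. `MappingExtracterSketch`, `init` and `EnhancedPdfExtracterSketch` are hypothetical names, not Alfresco classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical template-method sketch (NOT Alfresco's code): extract() is
// fixed in the base class, subclasses implement only extractRaw().
// ASSUMPTION: the mapping is filled in by init(), normally called by a container.
abstract class MappingExtracterSketch {
    private final Map<String, String> mapping = new HashMap<>(); // raw key -> property

    void init(Map<String, String> configuredMapping) {
        mapping.putAll(configuredMapping);
    }

    // Subclasses only read the document; they never see model property names.
    protected abstract Map<String, String> extractRaw(String documentText);

    final Map<String, String> extract(String documentText) {
        Map<String, String> properties = new HashMap<>();
        extractRaw(documentText).forEach((rawKey, value) -> {
            String property = mapping.get(rawKey);
            if (property != null) { // unmapped raw keys are silently ignored
                properties.put(property, value);
            }
        });
        return properties;
    }
}

// Hypothetical subclass standing in for the custom PDF extractor.
class EnhancedPdfExtracterSketch extends MappingExtracterSketch {
    @Override
    protected Map<String, String> extractRaw(String documentText) {
        Map<String, String> raw = new HashMap<>();
        raw.put("customField", documentText.trim());
        return raw;
    }
}

class LifecycleDemo {
    public static void main(String[] args) {
        EnhancedPdfExtracterSketch extracter = new EnhancedPdfExtracterSketch();
        System.out.println(extracter.extract("ABC")); // {} (never initialized)
        extracter.init(Map.of("customField", "my:customProperty"));
        System.out.println(extracter.extract("ABC")); // {my:customProperty=ABC}
    }
}
```

In this model, calling `extract` on a freshly constructed object runs `extractRaw` successfully yet returns nothing, which matches the symptom described above.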

Can anyone help me, or does anyone have an idea for using more than one PDF extractor?