POI - Extracting text from an MS Word document

Question asked by stebans on Aug 27, 2007
Latest reply on Jan 23, 2009 by brailateo

I would like to know how to extract a piece of text from an uploaded MS Word document. The text will be used as metadata.
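Whatever ends up producing the raw text, it usually needs trimming before it fits into a metadata property. Here is a minimal sketch in plain Java. It assumes the extracted text is already available as a `String`. The class name `MetadataSnippet` and the length limit are hypothetical, not part of POI or Alfresco.

```java
// Hypothetical helper: turns raw extracted document text into a short metadata value.
final class MetadataSnippet {

    private MetadataSnippet() {}

    static String snippet(String text, int maxChars) {
        if (text == null || maxChars <= 0) {
            return "";
        }
        // Collapse whitespace runs (Word paragraph marks \r, tabs, newlines) into single spaces.
        String normalized = text.replaceAll("\\s+", " ").trim();
        if (normalized.length() <= maxChars) {
            return normalized;
        }
        // Cut at the last space at or before the limit so words are not split.
        int cut = normalized.lastIndexOf(' ', maxChars);
        if (cut <= 0) {
            cut = maxChars; // no usable space: hard cut
        }
        return normalized.substring(0, cut);
    }
}
```

For example, `MetadataSnippet.snippet(text, 200)` would give at most 200 characters, with whitespace normalized.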

It's easy to extract text from a Word document with POI's HWPFDocument class, which is available in POI 3.0+ (poi-scratchpad-3.0.1-FINAL-20070705.jar). Unfortunately, POI hasn't been updated in Alfresco 2.1. It is still POI 2.5.1, without the extra scratchpad library. If I'm right, I cannot simply add the scratchpad jar, because it depends on POI 3.0+.

With the current configuration I cannot read content from any MS Word file. Do you have a workaround?
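If POI 3.x with the scratchpad jar can be deployed, HWPFDocument is the proper route (for instance `new HWPFDocument(in).getRange().text()`). If it cannot, one crude stopgap needs no POI at all. It is a `strings`-style scan of the raw .doc bytes that keeps runs of printable characters. Body text in a binary .doc is typically stored either as 8-bit characters or as UTF-16LE, so the scan tries both.

This is a heuristic, not a Word parser, and it has real limits:
- The output will also contain noise such as font names, style names and internal stream names.
- Non-ASCII characters are dropped.
- Ordering is not guaranteed.

At best it is suitable for a rough metadata snippet. The class name `CrudeDocText` and the `MIN_RUN` threshold are hypothetical. The sketch assumes a plain JDK 7+.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical crude "strings"-style scraper for binary .doc files. NOT POI and NOT a
// Word parser: it ignores the file structure and keeps runs of printable ASCII only.
final class CrudeDocText {

    // Hypothetical threshold: shorter runs are usually binary noise.
    static final int MIN_RUN = 4;

    private CrudeDocText() {}

    // Printable ASCII plus tab, CR and LF (Word uses CR as its paragraph mark).
    static boolean isTextByte(int b) {
        return (b >= 0x20 && b < 0x7F) || b == '\t' || b == '\r' || b == '\n';
    }

    // Runs of single-byte printable characters (8-bit text).
    static List<String> asciiRuns(byte[] data, int minRun) {
        List<String> runs = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (byte value : data) {
            int b = value & 0xFF;
            if (isTextByte(b)) {
                cur.append((char) b);
            } else {
                flush(cur, runs, minRun);
            }
        }
        flush(cur, runs, minRun);
        return runs;
    }

    // Runs of UTF-16LE code units whose high byte is 0 and low byte is printable ASCII.
    static List<String> utf16leRuns(byte[] data, int minRun) {
        List<String> runs = new ArrayList<>();
        // Try both byte alignments, since a run may start at an odd offset.
        for (int start = 0; start < 2; start++) {
            StringBuilder cur = new StringBuilder();
            for (int i = start; i + 1 < data.length; i += 2) {
                int lo = data[i] & 0xFF;
                int hi = data[i + 1] & 0xFF;
                if (hi == 0 && isTextByte(lo)) {
                    cur.append((char) lo);
                } else {
                    flush(cur, runs, minRun);
                }
            }
            flush(cur, runs, minRun);
        }
        return runs;
    }

    // Keep the current run if it is long enough, then reset the buffer.
    private static void flush(StringBuilder cur, List<String> runs, int minRun) {
        if (cur.length() >= minRun) {
            runs.add(cur.toString());
        }
        cur.setLength(0);
    }

    // All runs from both scans, one per line (8-bit runs first, then UTF-16LE runs).
    static String scrape(byte[] data) {
        List<String> all = new ArrayList<>(asciiRuns(data, MIN_RUN));
        all.addAll(utf16leRuns(data, MIN_RUN));
        return String.join("\n", all);
    }

    static String scrapeFile(Path doc) throws IOException {
        return scrape(Files.readAllBytes(doc));
    }
}
```

Because the result is noisy, it would normally be filtered or truncated before being stored as metadata. Treat it as a stopgap until a POI version with HWPF can be used.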

