How to get/extract content as a text string from a NodeRef?

Question asked by dynamolalit on May 20, 2010
Latest reply on Jun 9, 2010 by dynamolalit

I have a workflow that sends MS Word, PowerPoint, PDF, and Excel documents for approval to a pool of users. Once a document is approved, I need to extract its content into an XML file. I tried this code for the content transformation:

contentService = services.getContentService();
ContentReader reader = contentService.getReader(nodeRef, ContentModel.PROP_CONTENT);
if (reader != null && reader.exists())
{
    // get the transformer
    ContentTransformer transformer = contentService.getTransformer(reader.getMimetype(), MimetypeMap.MIMETYPE_TEXT_PLAIN);
    // is this transformer good enough?
    if (transformer != null)
    {
        // We have a transformer that is fast enough
        ContentWriter writer = contentService.getTempWriter();
        try
        {
            transformer.transform(reader, writer);
            // point the reader to the newly written content
            reader = writer.getReader();
            // Check that the reader is a view onto something concrete
            if (!reader.exists())
            {
                throw new ContentIOException("The transformation did not write any content, yet: \n"
                        + "   transformer:     " + transformer + "\n" + "   temp writer:     " + writer);
            }
            else
            {
                content = reader.getContentString();
            }
        }
        catch (ContentIOException e)
        {
            // transformation failed; content keeps its previous value
            logger.debug("Transformation to plain text failed", e);
        }
    }
}

logger.debug("Content as a string  :  " + content);

It gives me the content of the document as a string, but it also creates a text file with the same name as the content, which I do not want and which defeats the purpose of the workflow. How can I avoid that? I have read that transforming content results in a file being created.

Also, the content I got in the previous step is not pure text: it contains many stray characters that end up in the generated XML, which my external application then fails to parse.
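To illustrate the parsing problem, here is a minimal sketch of one possible cleanup step. It assumes the offending characters are ones the XML 1.0 specification does not allow at all (for example control characters left over from the binary document), which may or may not be the actual cause here. The class and method names (XmlTextSanitizer, stripInvalidXmlChars) are hypothetical and not part of Alfresco.

```java
public final class XmlTextSanitizer {

    private XmlTextSanitizer() {
    }

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the text with every code point XML 1.0 forbids removed. */
    public static String stripInvalidXmlChars(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Walk by code point so surrogate pairs are kept together;
            // an unpaired surrogate falls outside the valid ranges and is dropped.
            int cp = text.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical extracted text containing a NUL and a form feed
        String raw = "Title\u0000\u000C page 2";
        System.out.println(stripInvalidXmlChars(raw));
    }
}
```

This only removes characters that can never appear in an XML 1.0 document. Characters such as & and < are valid but still need escaping, which an XML writer normally does when the string is written as element text.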

How can I meet this requirement? I would appreciate any help or suggestions.