
How to get/extract content as a text string from a NodeRef?

Question asked by dynamolalit on May 20, 2010
Latest reply on Jun 9, 2010 by dynamolalit
Hi,

I have a workflow which sends MS Word/PowerPoint/PDF/Excel documents for approval to a pool of users. Once a document is approved, I need to extract its content into an XML file. I tried this code for content transformation:



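import org.alfresco.model.ContentModel;
import org.alfresco.repo.content.MimetypeMap;
import org.alfresco.repo.content.transform.ContentTransformer;
import org.alfresco.service.ServiceRegistry;
import org.alfresco.service.cmr.repository.ContentIOException;
import org.alfresco.service.cmr.repository.ContentReader;
import org.alfresco.service.cmr.repository.ContentService;
import org.alfresco.service.cmr.repository.ContentWriter;
import org.alfresco.service.cmr.repository.NodeRef;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Wrapper class and method names are illustrative; the body is the transformation code itself
public class ContentTextExtractor
{
    private static final Log logger = LogFactory.getLog(ContentTextExtractor.class);

    // Returns the plain-text version of the node's content, or null if none could be produced
    public String extractText(ServiceRegistry services, NodeRef nodeRef)
    {
        String content = null;
        ContentService contentService = services.getContentService();
        ContentReader reader = contentService.getReader(nodeRef, ContentModel.PROP_CONTENT);
        if (reader != null && reader.exists())
        {
            // look for a transformer from the document's mimetype to text/plain
            ContentTransformer transformer = contentService.getTransformer(reader.getMimetype(), MimetypeMap.MIMETYPE_TEXT_PLAIN);
            if (transformer != null)
            {
                ContentWriter writer = contentService.getTempWriter();
                writer.setMimetype(MimetypeMap.MIMETYPE_TEXT_PLAIN);
                // fix the output encoding so getContentString() decodes it consistently
                writer.setEncoding("UTF-8");
                try
                {
                    transformer.transform(reader, writer);
                    // point a reader at the newly written content
                    ContentReader textReader = writer.getReader();
                    if (textReader.exists())
                    {
                        content = textReader.getContentString();
                    }
                    else
                    {
                        logger.warn("The transformation did not write any content: transformer=" + transformer + ", temp writer=" + writer);
                    }
                }
                catch (ContentIOException e)
                {
                    // log the failure instead of silently swallowing it
                    logger.error("Text transformation failed for " + nodeRef, e);
                }
            }
        }
        logger.debug("Content as a string  :  " + content);
        return content;
    }
}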


It gives me the content of the document as a string, but it also creates a text file with the same name as the content, which I do not want because it defeats the purpose of the workflow. How can I avoid that? I have read that transforming content always results in a file being created.

Also, the content I got in the previous step is not pure text: it contains characters like   and many others like them, which end up in the generated XML and make my external application fail to parse it.
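If the characters that break the parser are control characters such as form feeds or NULs, note that XML 1.0 forbids those anywhere in a document, even as character references, so escaping alone will not help. Below is a minimal plain-Java sketch, with no Alfresco dependency, that assumes the extracted text is already in a String: it drops every code point outside the XML 1.0 `Char` production and then lets `javax.xml.stream.XMLStreamWriter` escape `&` and `<`. The class name `XmlSanitizer` and the `document`/`content` element names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, not part of Alfresco: makes extracted text safe to embed in XML 1.0
public class XmlSanitizer {

    /** Removes every code point that XML 1.0 does not allow (expects a non-null string). */
    public static String stripInvalidXmlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            // advance by one or two chars depending on whether cp is a surrogate pair
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // XML 1.0: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    // Unpaired surrogates (0xD800-0xDFFF) fall outside these ranges and are dropped too.
    private static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Wraps the sanitized text as document/content; the stream writer escapes markup characters. */
    public static String toXml(String text) {
        StringWriter buffer = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            xml.writeStartDocument("1.0");
            xml.writeStartElement("document");
            xml.writeStartElement("content");
            xml.writeCharacters(stripInvalidXmlChars(text));
            xml.writeEndElement();
            xml.writeEndElement();
            xml.writeEndDocument();
            xml.close();
        } catch (XMLStreamException e) {
            // writing to an in-memory StringWriter should not fail; surface it if it does
            throw new IllegalStateException(e);
        }
        return buffer.toString();
    }
}
```

For example, `XmlSanitizer.toXml(content)` could be applied to the string returned by the transformation before the XML file is written. Stripping silently discards characters; replacing them with a space instead is an alternative if page breaks should stay visible as word separators.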


How can I fulfill this requirement? I would appreciate any help or suggestions.
