
Logic of 'on upload' rules; PDF binary content to string

Question asked by chrisokelly on Jun 1, 2012
Hi guys,

I am creating a test script to see how I can work with PDF content. As a first step, I just wanted to write the content straight to a text file, so I created the following script:

pdfContent = document.content;
outputContent = space.createFile("PDFcontent.txt");
outputContent.content = pdfContent.content;
outputContent.save();

I assigned it to run on upload for a folder, but when I uploaded a PDF to the folder I got this error:
ERROR [extensions.webscripts.AbstractRuntime] [http-8443-31] Exception from executeScript - redirecting to status template error: 05011581 Failed to execute script 'workspace://SpacesStore/cfc8d9a9-1159-400e-b9ec-184c8dda8710': Existing file or folder PDFcontent.txt already exists
org.alfresco.scripts.ScriptException: 05011581 Failed to execute script 'workspace://SpacesStore/cfc8d9a9-1159-400e-b9ec-184c8dda8710': Existing file or folder PDFcontent.txt already exists
The PDF is not uploaded and the text file is not created (which makes the "already exists" message seem odd). This isn't a lasting problem; I have already worked around it by changing the script to:
pdfContent = document.content;
fileExists = space.childByNamePath("PDFcontent.txt");
if (!fileExists)
{
    outputContent = space.createFile("PDFcontent.txt");
}
else
{
    outputContent = fileExists;
}
outputContent.content = pdfContent.content;
outputContent.save();
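
The workaround above is a "get or create" pattern. Below is a minimal standalone sketch of the same logic that runs outside Alfresco. `MockFolder` and `getOrCreateFile` are hypothetical names: the mock only imitates the two calls the script uses (`childByNamePath` and `createFile`), including the "already exists" failure from the log.

```javascript
// Hypothetical stand-in for an Alfresco folder node ("space" in rule scripts).
// It only imitates childByNamePath() and createFile() as used in the script above.
class MockFolder {
  constructor() {
    this.children = new Map();
  }

  childByNamePath(name) {
    // Returns the child node, or null when no child has that name.
    return this.children.get(name) || null;
  }

  createFile(name) {
    // Imitates the "already exists" failure seen in the log.
    if (this.children.has(name)) {
      throw new Error("Existing file or folder " + name + " already exists");
    }
    const node = { name: name, content: "", save() {} };
    this.children.set(name, node);
    return node;
  }
}

// Return the existing child called `name`, or create it if it is missing.
function getOrCreateFile(folder, name) {
  return folder.childByNamePath(name) || folder.createFile(name);
}

// Usage: calling it twice reuses the file instead of throwing.
const folder = new MockFolder();
const first = getOrCreateFile(folder, "PDFcontent.txt");
const second = getOrCreateFile(folder, "PDFcontent.txt");
console.log(first === second); // true
```

Collapsing the `if`/`else` into one `||` expression relies on the lookup returning `null` for a missing child, which is what the mock does.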

However, I am a little confused about the logic that causes this. Is the 'on upload' event triggered more than once (perhaps when the upload starts and again when it finishes?), causing both runs to fail? Any advice is appreciated. As I said, I have worked around the actual issue, but I would like a better understanding of the rule logic to avoid similar problems in the future.
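
One possible explanation, offered as an assumption rather than a confirmed answer: if the rule also fires for files the script itself creates in the watched folder, then creating PDFcontent.txt triggers the rule a second time during the same upload, and that nested run's `createFile` collides with the file the outer run has just created. If the failure then rolls back the whole transaction, neither the PDF nor the text file would survive. The standalone simulation below models only that sequence (it does not model the rollback); `RuleFolder`, `naiveRule`, `guardedRule` and `upload` are hypothetical names, not Alfresco APIs. In a real repository, restricting the rule with a mimetype condition or writing the output to a different folder might achieve the same effect as the guard, but that is worth verifying.

```javascript
// Hypothetical model of a folder whose "on upload" rule runs synchronously
// for every file created in it, including files the rule script creates.
class RuleFolder {
  constructor(rule) {
    this.rule = rule;
    this.children = new Map();
  }

  createFile(name) {
    if (this.children.has(name)) {
      throw new Error("Existing file or folder " + name + " already exists");
    }
    const node = { name: name };
    this.children.set(name, node);
    this.rule(node, this); // the rule fires for the new file as well
    return node;
  }
}

const OUTPUT_NAME = "PDFcontent.txt";

// Original script logic: always create the output file.
function naiveRule(document, space) {
  space.createFile(OUTPUT_NAME);
}

// Guarded version: ignore the file this script writes itself.
function guardedRule(document, space) {
  if (document.name === OUTPUT_NAME) {
    return;
  }
  space.createFile(OUTPUT_NAME);
}

// Simulate uploading one PDF and report either "ok" or the error message.
function upload(rule) {
  const folder = new RuleFolder(rule);
  try {
    folder.createFile("invoice.pdf");
    return "ok";
  } catch (e) {
    return e.message;
  }
}

console.log(upload(naiveRule));   // Existing file or folder PDFcontent.txt already exists
console.log(upload(guardedRule)); // ok
```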

Edit:
As some may notice, the way I got the content in the script above was buggy and returned undefined (it was effectively reading document.content.content, oops). I've fixed that now, and the script does work in that a file PDFcontent.txt is created with the content of the uploaded PDF, but its mimetype is also PDF; it's essentially a copy of the original. My eventual goal is to have a script search the text of uploaded PDFs for specific strings, such as "invoice", and apply aspects and properties accordingly.
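
The keyword-to-aspect step of that goal can be sketched independently of how the text is extracted. This assumes the input is already plain text; `KEYWORD_ASPECTS`, `aspectsFor` and the aspect names `my:invoice` and `my:receipt` are hypothetical placeholders that would need a custom content model. In a rule script, each returned name could then be passed to `document.addAspect(...)`. Note that `RegExp.prototype.test` returns a boolean directly, so it replaces the `search(...) == -1 ? false : true` idiom.

```javascript
// Hypothetical keyword-to-aspect table; "my:invoice" and "my:receipt" are
// placeholder names, not aspects that exist in a stock repository.
const KEYWORD_ASPECTS = [
  { pattern: /invoice/i, aspect: "my:invoice" },
  { pattern: /receipt/i, aspect: "my:receipt" },
];

// Return the aspect names whose keyword appears in the document text.
// `text` must already be plain text, not the raw PDF data.
function aspectsFor(text) {
  return KEYWORD_ASPECTS
    .filter((entry) => entry.pattern.test(text))
    .map((entry) => entry.aspect);
}

console.log(JSON.stringify(aspectsFor("INVOICE #1234\nTotal due: 100.00"))); // ["my:invoice"]
console.log(JSON.stringify(aspectsFor("Meeting notes")));                     // []
```

The patterns have no `g` flag, so `test()` keeps no state between calls and the table can be reused safely for every document.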

At this point the script looks like this:
pdfContent = document.properties.content;
logger.warn(pdfContent);
fileExists = space.childByNamePath("PDFcontent.txt");
if (!fileExists)
{
    outputContent = space.createFile("PDFcontent.txt");
}
else
{
    outputContent = fileExists;
}
outputContent.properties.content = pdfContent;
isInvoice = ((pdfContent + "").search(/invoice/i) == -1) ? false:true;
if (isInvoice)
{
    outputContent.properties.content += "\n \n This document is an invoice";
}
outputContent.save();
However, as those more experienced than me may already have noticed, this doesn't work: the content property is returned as org.alfresco.repo.jscript.ScriptNode$ScriptContentData@8b95a65. Is there a way to get this as a string? I believe I could do this with a temporary file, by transforming the PDF to text and then reading the content property of that file, but that seems more than a little over-complicated. Is there something in the API to achieve this?
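
The `ScriptContentData@8b95a65` value is what a Java object without a meaningful `toString()` turns into when converted to a string: the class name plus a hash code. So `(pdfContent + "")` in the script above searches that label, not the document text. In the Alfresco JavaScript API, reading the `content` property of a ScriptContentData (like `document.content`) returns the content as a string, but for a PDF that string is the raw file data rather than readable text, so a PDF-to-text transformation is probably still needed; ScriptNode's `transformDocument("text/plain")` is one candidate to check against the API reference for your version. The standalone mock below (`MockContentData` and `isInvoice` are hypothetical, not Alfresco classes) only illustrates the difference between stringifying the object and reading its `content`.

```javascript
// Hypothetical stand-in for org.alfresco.repo.jscript.ScriptNode$ScriptContentData.
// Like a Java object without a custom toString(), it stringifies to "Class@hash".
class MockContentData {
  constructor(mimetype, text) {
    this.mimetype = mimetype;
    this._text = text;
  }

  // Imitates reading the `content` property, which returns the content as a string.
  get content() {
    return this._text;
  }

  toString() {
    return "org.alfresco.repo.jscript.ScriptNode$ScriptContentData@8b95a65";
  }
}

// Keyword check that only runs on plain text; returns false for anything else.
function isInvoice(contentData) {
  if (contentData.mimetype !== "text/plain") {
    return false; // raw PDF data would need transforming to text first
  }
  return /invoice/i.test(contentData.content);
}

const textRendition = new MockContentData("text/plain", "Invoice no. 42");

console.log(/invoice/i.test(textRendition + "")); // false: tests the "Class@hash" label
console.log(isInvoice(textRendition));            // true: tests the actual text
```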
