
Using ContentReader for transformation

Question asked by abruzzi on Sep 4, 2014
Latest reply on Sep 9, 2014 by kaynezhang
A number of years ago (v3.1.2) I integrated our OCR server (Abbyy Recognition Server) with Alfresco. The integration was a quick hack I wrote: a PHP script made the SOAP call to the OCR server, and a RuntimeExec bean called the PHP script.

Now I'm trying to significantly beef up the integration, so I'm starting by writing my own Java class to make the SOAP call. I'm new to Java, so I'm going slowly. I'm using javax.xml.soap (saaj-api) to build and process the SOAP call. The file itself is sent Base64-encoded inside the XML of the SOAP request. I've used the following code to get the file content from the ContentReader, encode it, and place it into the SOAP message:

        String fileStr = reader.getContentString();
        byte[] fileBin = fileStr.getBytes("US-ASCII");

        String fileB64 = DatatypeConverter.printBase64Binary(fileBin);


With the new Java code, the server complains that it is receiving an invalid file. So I used Wireshark (a packet sniffer) to capture the conversation between my code and the OCR server, and also the conversation from the old (working) PHP code. Since both captures process the exact same file, the Base64 encoding should be identical. Instead, the two look very similar but differ in places. I've tried different encodings in the .getBytes() method. The Base64 output changes, but it never matches the PHP version and never produces a file the server accepts:

Java Output:


PHP Output:


You can see they are very similar, but not identical. My main question is: is there some mistake in how I'm getting the content of the file out of the ContentReader (reader) and processing it that might be causing my problem? As I said, I'm pretty weak at Java.
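One plausible explanation, offered as a sketch rather than a confirmed diagnosis: getContentString() has to decode the file's bytes into characters using some character encoding. For a binary file such as a scanned image, byte sequences that are not valid text in that encoding get replaced during decoding, so no choice of encoding in getBytes() can recover the original bytes afterwards. The self-contained program below reproduces that round trip on a made-up 7-byte sample (UTF-8 stands in for whatever encoding the reader actually uses, which is an assumption) and contrasts it with a binary-safe path that Base64-encodes the raw bytes from an InputStream. The class and method names are hypothetical. It uses java.util.Base64 (Java 8+) instead of DatatypeConverter so that it runs on current JDKs; DatatypeConverter.printBase64Binary(byte[]) would encode the same byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class ContentBase64Sketch {

    // Reproduces the pattern from the question: decode the content to a String
    // (standing in for getContentString(); UTF-8 is an ASSUMED encoding), then
    // re-encode with US-ASCII. Malformed byte sequences become U+FFFD when
    // decoding, and every non-ASCII character becomes '?' when encoding, so
    // the result no longer matches the input.
    static byte[] viaString(byte[] content) {
        String text = new String(content, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    // Binary-safe path: copy the raw bytes with no character decoding at all.
    static byte[] readAllBytes(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    // Base64-encodes the whole stream and closes it afterwards.
    static String encodeStream(InputStream in) throws IOException {
        try (InputStream stream = in) {
            return Base64.getEncoder().encodeToString(readAllBytes(stream));
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample: a TIFF-style header ("II*\0") followed by bytes >= 0x80.
        byte[] sample = {0x49, 0x49, 0x2A, 0x00, (byte) 0xFF, (byte) 0x80, 0x41};

        String lossy = Base64.getEncoder().encodeToString(viaString(sample));
        String safe = encodeStream(new ByteArrayInputStream(sample));

        System.out.println("via String: " + lossy);
        System.out.println("raw bytes:  " + safe);
        System.out.println("raw path round-trips: "
                + Arrays.equals(sample, Base64.getDecoder().decode(safe)));
    }
}
```

In the Alfresco code, the stream would presumably come from reader.getContentInputStream() (Alfresco's ContentReader interface offers that method), so the text round trip through getContentString() is skipped entirely.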