Rekognising out-of-process extensions

Blog post created by gavincornwell on Sep 22, 2017

Event streams are becoming commonplace; they allow external applications to "listen" to what's happening within the server. AWS make use of this pattern in several of their services, S3 and DynamoDB being two examples.

 

This is a capability we are considering adding to the Digital Business Platform in the future. Rather than waiting for the new event stream, I thought it would be an interesting experiment to build an example using existing Alfresco technologies. The example takes the events generated for desktop sync and uses Apache Camel to push them to Kinesis Firehose. Once the events reach AWS they are processed by Lambda functions, and any uploaded images are analysed by Rekognition.
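To make the hand-off to Firehose concrete, here is a minimal sketch in Python. It is not the demo's code: the demo does this step in a Camel route, whereas this sketch swaps in boto3 to show the same idea. The event field names, the `send_event` helper and the stream name are assumptions made for illustration; `put_record` with `DeliveryStreamName` and `Record={"Data": ...}` is the boto3 Firehose call.

```python
import json


def to_firehose_record(event: dict) -> dict:
    """Wrap one event as a Firehose record: UTF-8 JSON plus a trailing newline."""
    # The newline keeps events separable once Firehose concatenates them into batches.
    payload = json.dumps(event, separators=(",", ":"), sort_keys=True) + "\n"
    return {"Data": payload.encode("utf-8")}


def send_event(event: dict, stream_name: str, client=None) -> dict:
    """Push one event to a delivery stream; pass a stub client to avoid AWS calls."""
    if client is None:
        import boto3  # imported lazily so to_firehose_record works without the SDK

        client = boto3.client("firehose")
    return client.put_record(
        DeliveryStreamName=stream_name,
        Record=to_firehose_record(event),
    )


# Hypothetical event: these field names are illustrative, not the desktop sync schema.
sample_event = {"type": "NODEADDED", "nodeId": "abc-123", "mimeType": "image/jpeg"}
record = to_firehose_record(sample_event)
```

Keeping the serialisation in its own function means it can be checked without AWS credentials; only `send_event` touches the network.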

 

The architecture for the example is shown in the diagram below.

 

[Architecture diagram]

 

The fictional use case is based on the claim process of a vehicle insurance company that only deals with cars, motorcycles and bicycles. The process automatically analyses images added to the system, converts them to a custom type, generates a unique identifier and, if Rekognition can detect the type of vehicle, sets a claim type property. Full technical details can be found here.
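The "set a claim type if the vehicle can be detected" step could look roughly like the sketch below. It is an illustration under stated assumptions, not the demo's Lambda code: the label names in `LABEL_TO_CLAIM_TYPE`, the claim type values, the confidence threshold and both function names are hypothetical. The response shape (a `Labels` list of items with `Name` and `Confidence`) and the `detect_labels` parameters are those of Rekognition's DetectLabels operation in boto3.

```python
from typing import Optional

# Assumed mapping from Rekognition label names to the fictional claim types.
LABEL_TO_CLAIM_TYPE = {"Car": "car", "Motorcycle": "motorcycle", "Bicycle": "bicycle"}


def claim_type_from_labels(labels: list, min_confidence: float = 75.0) -> Optional[str]:
    """Return the claim type of the most confident vehicle label, or None."""
    best = None
    best_confidence = min_confidence
    for label in labels:
        claim_type = LABEL_TO_CLAIM_TYPE.get(label.get("Name"))
        confidence = float(label.get("Confidence", 0.0))
        # Ignore non-vehicle labels and anything below the current best score.
        if claim_type is not None and confidence >= best_confidence:
            best, best_confidence = claim_type, confidence
    return best


def detect_claim_type(image_bytes: bytes, client=None) -> Optional[str]:
    """Ask Rekognition for labels and map them to a claim type."""
    if client is None:
        import boto3  # imported lazily so the pure function above needs no SDK

        client = boto3.client("rekognition")
    response = client.detect_labels(
        Image={"Bytes": image_bytes}, MaxLabels=20, MinConfidence=50.0
    )
    return claim_type_from_labels(response["Labels"])
```

Returning `None` when no vehicle label clears the threshold matches the use case: the claim type property is only set when a vehicle type is actually detected.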

 

The Camel route used to send events from ActiveMQ to Firehose, the custom model and the configuration are provided in simple JAR modules for the Repository and Share, available from here.

 

Everything required to deploy and run the demo is available in this GitHub repository. Clone the repository to your local machine using the command below and follow the deployment instructions.


git clone https://github.com/gavincornwell/firehose-rekognition-demo.git .

 

Once the repository is up and running, follow the detailed demo steps to upload images to the system and see the metadata in Alfresco get updated automatically.

 

If you take a look at the Lambda function used for processing the uploaded images, you'll see it's already getting quite complex and breaking a few best practices: it's doing more than one thing, the repository password is being passed in plain text, and the approach is not very scalable in terms of performance, cost or architecture.

 

The Lambda function is effectively a multi-stage process, which feels like an ideal fit for Step Functions!
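As a rough idea of what that could look like, the sketch below builds a small Amazon States Language definition in Python. It is a guess at one possible shape, not the planned implementation: the state names, the `$.vehicleDetected` field, the `build_state_machine` helper and the placeholder ARNs are all hypothetical.

```python
import json


def build_state_machine(detect_arn: str, update_arn: str) -> dict:
    """Build a two-task state machine with a Choice between them."""
    return {
        "Comment": "Sketch: analyse an uploaded image, then update its metadata",
        "StartAt": "DetectLabels",
        "States": {
            # Stage 1: a small Lambda that only calls Rekognition.
            "DetectLabels": {
                "Type": "Task",
                "Resource": detect_arn,
                "Next": "VehicleDetected",
            },
            # Branch on a flag the first Lambda is assumed to add to its output.
            "VehicleDetected": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.vehicleDetected",
                        "BooleanEquals": True,
                        "Next": "UpdateMetadata",
                    }
                ],
                "Default": "NoVehicleFound",
            },
            # Stage 2: a second Lambda that only updates the repository.
            "UpdateMetadata": {"Type": "Task", "Resource": update_arn, "End": True},
            "NoVehicleFound": {"Type": "Succeed"},
        },
    }


# Placeholder ARNs: substitute the real region, account ID and function names.
definition = build_state_machine(
    "arn:aws:lambda:REGION:ACCOUNT_ID:function:detect-labels",
    "arn:aws:lambda:REGION:ACCOUNT_ID:function:update-metadata",
)
definition_json = json.dumps(definition, indent=2)
```

Each Task state maps to one small Lambda function, so each function does a single thing and the orchestration lives in the definition rather than in code.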

 

Over the next few weeks I'm going to improve the solution by swapping the single Lambda function for several smaller Lambda functions orchestrated by Step Functions, simplifying and securing the infrastructure setup, and introducing a continuous delivery pipeline, so stay tuned!
