In my last post we looked at a potential out-of-process extension that analysed images using AWS Rekognition. That solution used a single large Lambda function; in this post we're going to examine an improved approach using Step Functions.
The architecture is shown below.
The use case has also been expanded since the first post: the Lambda function that processes the results from Rekognition now categorises images into Cars, Motorcycles, Boats, Electronics, Jewellery, Wristwatches, Clocks, Bicycles, Sport Equipment and Furniture. Any image that cannot be categorised is set to Unknown rather than having an aspect added.
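To make the categorisation step more concrete, here is a minimal sketch of how labels returned by Rekognition could be mapped to a category. It is not the code from the repository: the label names in the mapping, the 80% confidence threshold and the categorise function are all assumptions made for illustration. The only thing taken from Rekognition itself is the shape of each label, a dictionary with "Name" and "Confidence" keys.

```python
# Hypothetical mapping from Rekognition label names to the categories
# listed in this post. The label names are illustrative guesses, not a
# verified list of what Rekognition returns.
LABEL_TO_CATEGORY = {
    "Car": "Cars",
    "Motorcycle": "Motorcycles",
    "Boat": "Boats",
    "Electronics": "Electronics",
    "Jewelry": "Jewellery",
    "Wristwatch": "Wristwatches",
    "Clock": "Clocks",
    "Bicycle": "Bicycles",
    "Sport": "Sport Equipment",
    "Furniture": "Furniture",
}

# Assumed threshold: labels below this confidence are ignored.
MIN_CONFIDENCE = 80.0


def categorise(labels, min_confidence=MIN_CONFIDENCE):
    """Return the category of the most confident known label, or "Unknown".

    `labels` is a list of dicts shaped like the entries in the "Labels"
    list of a Rekognition detect_labels response.
    """
    best_category = "Unknown"
    best_confidence = min_confidence
    for label in labels:
        category = LABEL_TO_CATEGORY.get(label.get("Name"))
        confidence = label.get("Confidence", 0.0)
        # Only labels we know about, and that meet the threshold, count.
        if category is not None and confidence >= best_confidence:
            best_category = category
            best_confidence = confidence
    return best_category
```

Labels that do not appear in the mapping are simply skipped, so an image whose labels are all unrecognised, or all below the threshold, falls through to Unknown.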
The initial part of the solution is still the same: Camel routes events to Kinesis Firehose and accepted events are sent to S3, which in turn triggers a Lambda function. That Lambda function now parses the Alfresco events and executes a Step Function State Machine (shown in the diagram below) for each event.
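A rough sketch of what that triggering Lambda function might look like is shown below. It is written from the description above rather than copied from the repository, so treat the details as assumptions: the file written by Firehose is assumed to contain one JSON event per line, the State Machine ARN is assumed to arrive in a hypothetical STATE_MACHINE_ARN environment variable, and the function names are mine. The boto3 calls used (get_object and start_execution) are the standard ones.

```python
import json
import os
from urllib.parse import unquote_plus


def parse_events(payload):
    """Split a Firehose file into events, assuming one JSON event per line."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


def start_executions(events, sfn_client, state_machine_arn):
    """Start one State Machine execution per event and return the ARNs."""
    execution_arns = []
    for event in events:
        response = sfn_client.start_execution(
            stateMachineArn=state_machine_arn,
            input=json.dumps(event),  # the event becomes the execution input
        )
        execution_arns.append(response["executionArn"])
    return execution_arns


def handler(event, context, s3_client=None, sfn_client=None):
    """Lambda entry point for the S3 trigger.

    The optional client arguments exist only so the function can be
    exercised locally with stand-ins; Lambda itself passes two arguments.
    """
    if s3_client is None or sfn_client is None:
        import boto3  # imported lazily so local runs do not need it

        s3_client = s3_client or boto3.client("s3")
        sfn_client = sfn_client or boto3.client("stepfunctions")

    # Hypothetical environment variable holding the State Machine ARN.
    state_machine_arn = os.environ["STATE_MACHINE_ARN"]

    started = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        events = parse_events(body.decode("utf-8"))
        started.extend(start_executions(events, sfn_client, state_machine_arn))
    return {"started": started}
```

Because each event gets its own execution, the images are handled independently of one another even though the steps within a single execution run one after the other.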
The State Machine calls three separate, smaller Lambda functions. Each function does one thing and one thing only, and is re-usable outside of the Step Function. This is a much more scalable solution and allows the images to be processed in parallel.
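For readers who have not seen a State Machine definition before, the sketch below builds a three-step serial definition in the Amazon States Language as a Python dictionary. The state names and Lambda ARNs are placeholders I have invented for illustration; the real names and definition are in the repository.

```python
import json


def build_serial_definition(steps):
    """Chain Task states one after another.

    `steps` is a list of (state_name, lambda_arn) tuples in execution order.
    """
    if not steps:
        raise ValueError("at least one step is required")
    states = {}
    for index, (name, resource_arn) in enumerate(steps):
        state = {"Type": "Task", "Resource": resource_arn}
        if index + 1 < len(steps):
            state["Next"] = steps[index + 1][0]  # hand over to the next step
        else:
            state["End"] = True  # the last step finishes the execution
        states[name] = state
    return {"StartAt": steps[0][0], "States": states}


# Hypothetical state names and placeholder ARNs, not the ones in the demo.
STEPS = [
    ("FetchContent", "arn:aws:lambda:REGION:ACCOUNT_ID:function:fetch-content"),
    ("DetectLabels", "arn:aws:lambda:REGION:ACCOUNT_ID:function:detect-labels"),
    ("ProcessResults", "arn:aws:lambda:REGION:ACCOUNT_ID:function:process-results"),
]

DEFINITION_JSON = json.dumps(build_serial_definition(STEPS), indent=2)
```

Each Task state passes its output to the next state as input, which is what lets the individual functions stay small and unaware of each other.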
Everything required to deploy and run the demo is available in this GitHub repository. Clone the repository to your local machine using the command below and follow the much simpler deployment instructions.
git clone https://github.com/gavincornwell/firehose-step-functions-demo.git .
Once the demo is up and running, follow the detailed demo steps to upload images to the system and see the metadata in Alfresco get updated automatically.
Currently the State Machine is fairly simple and serial, but it lays the foundation for a more complex ingestion pipeline, which is something I may investigate in a future post.