2018

A few years ago Samuel wrote a blog post entitled “So, when is Alfresco moving to GitHub?” In it he presented a number of reasons why it was difficult to move our code from SVN to Git. Given the recent move of Share to GitHub, I thought it would be worth writing an update on the situation.

 

The vast majority of our production code is now in Git: ACS Repository, Activiti, Search Services, Records Management, Google Docs Module, Android App, iOS App, Share, ADF Components and more. That’s not including our enterprise code, which lives primarily in our hosted GitLab.

 

Some of this code has always been in Git, but much of it has been migrated from SVN. I migrated the Records Management codebase in 2015, and since then we’ve had a fairly continuous stream of migrations.* The obvious question is "What’s changed since Samuel’s post?"

 

The answer is that not very much has changed. The team decided not to leave older releases in SVN for exactly the reasons mentioned by Samuel. Cherry-picking changes from SVN to Git can be done with svn diff --git and git apply, but it’s a pain, and we make a lot of service pack changes in our products. The ACS repository codebase is large, and although we’ve split out several modules into smaller Git repositories, there is still over 2 GB of history in the Git repo.
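
As a rough illustration of the svn diff --git and git apply route mentioned above, here is a minimal Python sketch. The function names, the revision number and the SVN URL are all hypothetical; the only assumption about the tools is that svn (1.7 or later) and git are on the PATH. It is not the process the team actually used.

```python
import subprocess


def svn_diff_command(revision: int, svn_url: str) -> list:
    """Build the command that exports one SVN revision as a Git-style patch."""
    return ["svn", "diff", "--git", "-c", str(revision), svn_url]


def git_apply_command(check_only: bool = False) -> list:
    """Build the command that applies a patch read from standard input."""
    command = ["git", "apply"]
    if check_only:
        command.append("--check")  # dry run: report problems, change nothing
    command.append("-")  # "-" tells git apply to read the patch from stdin
    return command


def port_revision(revision: int, svn_url: str, git_checkout: str) -> None:
    """Carry a single SVN revision over to a Git working copy."""
    patch = subprocess.run(
        svn_diff_command(revision, svn_url),
        check=True,
        capture_output=True,
    ).stdout
    # Check first, so a patch that does not apply cleanly leaves the tree untouched.
    subprocess.run(git_apply_command(check_only=True), input=patch,
                   cwd=git_checkout, check=True)
    subprocess.run(git_apply_command(), input=patch,
                   cwd=git_checkout, check=True)
```

A call such as port_revision(1234, "https://svn.example.com/repo/trunk", "/path/to/git/checkout") would leave the change unstaged in the Git working copy, ready to be reviewed and committed by hand.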

 

The primary reason for migrating to Git is its popularity: Git is seven times more popular than any other version control system. Some of the reasons for this are Git’s local and lightweight branches (leading to cleaner workflows), faster access to history (since it’s all stored locally) and smaller overall repository size (leading to faster access to remote commits). Other reasons are historical: SVN has since greatly improved its merging and no longer needs a .svn directory in every folder. However, since GitHub is the largest and most popular open source host in the world, we want to use Git to make it easy for users to access our code.

 

 

* We’ve had migrations in the past too, e.g. Share extras in 2012, but the last year has seen a concerted effort to migrate projects.

I’m excited to announce that we’ve completed the move of Share’s entire codebase to GitHub.

 

The observant amongst you will notice that a couple of months back we moved the old Share GitHub mirror and stopped updating it - that’s because we’ve been transitioning both the code and our internal development processes to GitHub natively.

 

As of today, the new repositories are now public. The Share codebase is split across four repositories:

Share: https://github.com/Alfresco/Share

Surf: https://github.com/Alfresco/Surf

Surf Web Scripts: https://github.com/Alfresco/surf-webscripts

Aikau: https://github.com/Alfresco/aikau (this project has always been openly developed)

 

What this means for everyone who uses or develops against Share is that you’ve now got a much greater level of transparency over how we’re working, earlier visibility of what we’re doing and an increased opportunity to have input to that process. It makes it significantly easier for us to accept contributions as pull requests, and for those of you who want to apply custom patches to maintain your own forks. These benefits apply whether you’re Community or Enterprise users, as the codebase is exactly the same across the two versions.

 

We’ve written contribution guidelines to help, so please take a look at those: https://github.com/Alfresco/share/blob/develop/CONTRIBUTING.md

 

Those of you who are coming to DevCon, I look forward to discussing this there and maybe even working with you on some Share PRs at the Hack-a-thon.

At the end of my last post I alluded to an improved ingestion pipeline using Step Functions. This post looks at how we can use the recently announced AWS service Comprehend to analyse text files.

 

The updated architecture is shown below.

 

Demo Architecture

 

The Lambda function that fetches the content sets a flag to indicate whether the content is an image. The Step Function definition uses this flag to decide whether to call the ProcessImage or the ProcessText Lambda function, as shown in the definition below:
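
To make the branching concrete, here is a minimal sketch of such a Choice state, written as a Python dictionary in the shape of the Amazon States Language. The state names FetchContent and IsImage, the isImage field name and the placeholder ARNs are assumptions made for illustration, not taken from the demo repository, and the real definition may well contain further states after the processing step.

```python
import json

# Hypothetical state machine: only ProcessImage and ProcessText are names from the post.
STATE_MACHINE = {
    "StartAt": "FetchContent",
    "States": {
        "FetchContent": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FetchContent",
            "Next": "IsImage",
        },
        "IsImage": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.isImage", "BooleanEquals": True, "Next": "ProcessImage"}
            ],
            "Default": "ProcessText",
        },
        "ProcessImage": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:ProcessImage",
            "End": True,
        },
        "ProcessText": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:ProcessText",
            "End": True,
        },
    },
}


def next_state(choice_state: dict, state_input: dict) -> str:
    """Evaluate a Choice state locally (top-level BooleanEquals rules only)."""
    for rule in choice_state["Choices"]:
        field = rule["Variable"].split(".", 1)[1]  # "$.isImage" -> "isImage"
        if state_input.get(field) is rule["BooleanEquals"]:
            return rule["Next"]
    return choice_state["Default"]


if __name__ == "__main__":
    # Print the definition as the JSON document Step Functions expects.
    print(json.dumps(STATE_MACHINE, indent=2))
```

The next_state helper is only a local stand-in for what Step Functions does when it evaluates the Choice state; it is handy for checking the routing without deploying anything.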

 

Step Function Definition

 

We covered the behaviour of ProcessImage in the last post. The new ProcessText Lambda function takes the text and sends it to Comprehend to detect entities and to perform sentiment analysis. The function then looks for Person, Location and Date entities in the text and compares the positive and negative values from the sentiment analysis to determine the values for the properties on the acme:insuranceClaimReport custom type.
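
The following sketch shows one way the entity filtering and sentiment comparison described above could look. It works on dictionaries shaped like the responses of Comprehend's DetectEntities and DetectSentiment operations; the function name and the returned property names (people, locations, dates, tone) are hypothetical and are not the actual properties of the acme:insuranceClaimReport type.

```python
WANTED_TYPES = ("PERSON", "LOCATION", "DATE")


def summarise_text_analysis(entities_response: dict, sentiment_response: dict) -> dict:
    """Reduce Comprehend-style responses to a handful of metadata values."""
    found = {entity_type: [] for entity_type in WANTED_TYPES}
    for entity in entities_response.get("Entities", []):
        if entity["Type"] in found:
            found[entity["Type"]].append(entity["Text"])

    # Compare the positive and negative scores to decide the overall tone.
    scores = sentiment_response["SentimentScore"]
    if scores["Positive"] > scores["Negative"]:
        tone = "positive"
    elif scores["Negative"] > scores["Positive"]:
        tone = "negative"
    else:
        tone = "neutral"

    return {
        "people": found["PERSON"],       # hypothetical property names
        "locations": found["LOCATION"],
        "dates": found["DATE"],
        "tone": tone,
    }
```

In a deployed function the two input dictionaries would come from the Comprehend client, and the result would be mapped onto whatever properties the custom type really defines.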

 

Everything required to deploy and run the demo is available in this GitHub repository. Clone the repository to your local machine using the command below and follow the deployment instructions.

 

git clone https://github.com/gavincornwell/firehose-step-functions-demo.git .

 

Once the demo is up and running, follow the detailed demo steps to upload images and text files to the system and see the metadata in Alfresco get updated automatically.

 

I'll be doing a live demo of this at the forthcoming DevCon in Lisbon, Portugal. Hope to see you there!


Steps to Rekognition

Posted by gavincornwell Employee Jan 4, 2018

In my last post we looked at a potential out-of-process extension that analysed images using AWS Rekognition. The solution used a single large Lambda function; in this post we're going to examine an improved approach using Step Functions.

 

The architecture is shown below.

 

 

The use case has also been expanded since the first post: the Lambda function that processes the results from Rekognition now categorises images into Cars, Motorcycles, Boats, Electronics, Jewellery, Wristwatches, Clocks, Bicycles, Sport Equipment and Furniture. Any image that cannot be categorised is set to Unknown rather than adding an aspect.
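
A minimal sketch of this kind of categorisation is shown below. It takes a dictionary shaped like the response of Rekognition's DetectLabels operation and maps the most confident matching label to a category. The label names in the lookup table, the confidence threshold and the function name are assumptions for illustration; the demo's own mapping may differ.

```python
# Hypothetical mapping from Rekognition label names to the categories in the post.
CATEGORY_BY_LABEL = {
    "Car": "Cars",
    "Motorcycle": "Motorcycles",
    "Boat": "Boats",
    "Electronics": "Electronics",
    "Jewelry": "Jewellery",
    "Wristwatch": "Wristwatches",
    "Clock": "Clocks",
    "Bicycle": "Bicycles",
    "Sport": "Sport Equipment",
    "Furniture": "Furniture",
}


def categorise(labels_response: dict, min_confidence: float = 75.0) -> str:
    """Return the category of the most confident known label, or "Unknown"."""
    labels = sorted(
        labels_response.get("Labels", []),
        key=lambda label: label["Confidence"],
        reverse=True,
    )
    for label in labels:
        if label["Confidence"] < min_confidence:
            break  # every remaining label is even less confident
        category = CATEGORY_BY_LABEL.get(label["Name"])
        if category is not None:
            return category
    return "Unknown"
```

Returning a sentinel value such as Unknown keeps the calling code simple: it can always set the property without having to handle a missing result.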

 

The initial part of the solution is still the same: Camel is used to route events to Kinesis Firehose, and accepted events are sent to S3, which in turn triggers a Lambda function. That Lambda function now parses the Alfresco events and executes a Step Function State Machine (shown in the diagram below) for each event.
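
The parse-and-dispatch step described above might be sketched as follows. The sketch assumes that the object Firehose writes to S3 contains several JSON events back to back, possibly with no separator between them, and that each event becomes the input of one execution. The function names are hypothetical, and the start_execution callable is injected so the logic can be exercised without AWS; in a real Lambda it could be the start_execution method of a boto3 Step Functions client.

```python
import json


def split_json_objects(text: str) -> list:
    """Split a string of concatenated JSON documents into Python objects."""
    decoder = json.JSONDecoder()
    events = []
    position = 0
    while position < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        if text[position].isspace():
            position += 1
            continue
        event, position = decoder.raw_decode(text, position)
        events.append(event)
    return events


def start_executions(events: list, state_machine_arn: str, start_execution) -> int:
    """Start one state machine execution per event and return how many were started."""
    for event in events:
        start_execution(stateMachineArn=state_machine_arn, input=json.dumps(event))
    return len(events)
```

Starting a separate execution per event is what lets the downstream work proceed independently for each piece of content.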

 

 

The State Machine calls three separate, smaller Lambda functions. Each function does one thing and one thing only, and is re-usable outside of the Step Function. This is a much more scalable solution and allows the images to be processed in parallel.

 

Everything required to deploy and run the demo is available in this GitHub repository. Clone the repository to your local machine using the command below and follow the much simpler deployment instructions.

 

git clone https://github.com/gavincornwell/firehose-step-functions-demo.git .

 

Once the demo is up and running, follow the detailed demo steps to upload images to the system and see the metadata in Alfresco get updated automatically.

 

Currently the State Machine is fairly simple and serial but it lays the foundation for a more complex ingestion pipeline which is something I may investigate in a future post.

Following the successful release of Alfresco Content Services in 2017, we have been planning our next round of innovation. In this blog post, we share some of our plans so that you can prepare for the next release and provide feedback. At the end you will find a table summarizing the actions you should take.

 

Even though we refer to Alfresco Content Services, most of this information also applies to Alfresco Community Edition.

There are three overriding architectural goals for upcoming releases of Alfresco Content Services (ACS):

 

  • Improve integration across the Alfresco Digital Business Platform - encompassing products such as Alfresco Process Services and the Alfresco Application Development Framework. One of these planned improvements is a shared authentication system that supports additional modern protocols such as OpenID Connect. This improved integration will make it easier for Alfresco Content Services customers to benefit from the power of Alfresco Process Services when their use case requires it.

  • Provide a containerized deployment option that can be hosted on a range of infrastructure, both on-premises and by cloud providers such as AWS. These containers will also be portable between deployment environments such as dev, test, and production.

  • Further enhance the REST API to allow advanced customizations to be completed outside of the repository Java process, including APIs for batching requests and subscribing to system events. Integrations using the REST APIs are easier to maintain and upgrade than customizations within the repository.

 

These goals will require some significant architectural changes to the Content Repository, and so we expect the next release to be a major version, Alfresco Content Services 6.0, which we plan to release in 2018. You will see these changes begin to enter Alfresco Community Edition immediately.

 

In order to make these improvements, we need to change some features of the product that you might be using. Specifically, we want to make sure you are aware of the following plans:

 

  • Installation Bundles: Customers have asked us to reduce the amount of effort necessary to deploy Alfresco Content Services (ACS) in a production configuration. The ACS installers will be replaced with Docker containers, using Kubernetes and Helm. This deployment technology allows us to better define a standard production configuration while giving greater flexibility to our customers as they deploy into their environments.

  • Web Application Servers: As part of providing a containerized cloud-ready deployment, we will be removing the need to manage a separate web application container. Instead, configuration will be injected into the Docker container, reducing the effort required to set up, secure, and manage the application. In the next release, we will no longer support Alfresco Content Services deployed within J2EE web application servers such as JBoss, WebSphere, and WebLogic. Over the long term, we are considering embedding the web application server within the repository and making the content repository directly executable. As a result, it is likely that support for deploying into a separate Tomcat web container will be dropped in a future release.

  • Solaris and DB2: As we focus on the most widely used deployment platforms, we will be dropping support for the Solaris operating system and IBM’s DB2 database.

  • CIFS and NTLMv1: Due to security vulnerabilities in the protocols, we will be removing the ability to access Alfresco Content Services as a shared network drive using CIFS / SMBv1 and the ability to authenticate using NTLMv1. We recommend that customers needing shared network drive access use our AOS WebDAV when using Windows clients and our standards-compliant WebDAV when using non-Windows clients. Customers should also use Kerberos instead of NTLMv1 for SSO. We will continue to improve our implementations of WebDAV and Kerberos.

  • Legacy Solr: ACS 6.0 will leverage the advanced capabilities of Solr 6. Previous versions of Solr will no longer be used—Solr 1 will be removed from the product, and Solr 4 will be deprecated and remain in the product only to support upgrades. No functionality will be lost upgrading to Solr 6, but there are some different defaults affecting the way locale is handled that will require minor adjustments in customizations.

 

In addition, we plan to remove the following capabilities:

 

  • Alfresco Process Services Share Connector: Advanced content and process applications can be built with superior user experiences using process and content components from the Application Development Framework.

  • Repository Multi-Tenancy: The multi-tenant capability of the Content Repository will only be supported as part of an OEM agreement and we are likely to remove multi-tenancy from Alfresco Community Edition. The support of multi-tenancy in Alfresco Process Services remains unchanged.

  • Encrypted Node Properties: This capability provides a label for properties that are managed by client code and is used internally in modules provided by Alfresco. With the release of Alfresco Content Services 6.0, it will be considered part of the private API. Custom clients can achieve the same capability by using a Blob or Base64 String property and managing the encryption of the content within those properties.

  • CIFS Shortcuts: Alfresco Content Services’ CIFS implementation provided Windows Explorer shortcuts for ECM tasks. These will be removed along with support for CIFS shared network drives. We do not currently plan to move them to the WebDAV implementation.

  • Meeting Workspace and Document Workspace: These Share site types are not supported by recent releases of Microsoft Office, and so will be removed.

 

With the release of Alfresco Content Services version 6.0, the following features will continue to be available but are deprecated, and you should expect them to be removed in a future version:

 

  • Some Share Features: We will gradually simplify Share to focus on the most commonly used capabilities by removing the following lesser-used site components and dashlets: site blogs, site calendars, site data lists, site links, and site discussion forums. These use cases are better met with dedicated interfaces, either through integration with third party applications or through custom development.

  • Web Quick Start: Web Quick Start provides an add-on to Alfresco Share that demonstrates how to build a website on top of Alfresco Content Services. Though customers are welcome to continue using Web Quick Start, we will not be enhancing this product. There are many ways to use the Alfresco Digital Business Platform to deliver content to the Web, and we would be happy to discuss your specific needs with you or point you to a partner.

  • Alfresco in the Cloud: As the market for content collaboration technologies has evolved, we are evaluating replacements for Alfresco in the Cloud (my.alfresco.com). We will offer different synchronization solutions to supersede Alfresco Cloud Sync to my.alfresco.com. As a result, we are no longer adding new functionality to that service. As our new products mature, we will reach out to the customers who are using Alfresco in the Cloud to outline the replacements and possible timelines.

 

We also make the following recommendations to help those building applications on top of Alfresco Content Services to prepare for future releases:

 

  • The versioned REST API for ACS covers a wide range of use cases, and is preferred over in-process APIs for extending Alfresco Content Services. Integrations and customizations that use the REST API are easier to integrate into your own development processes and are easier to maintain when upgrading ACS.

  • In order to make it easier to design, deploy, and maintain custom workflows, in a future release we will be providing a platform-wide workflow service using Alfresco Process Services (powered by Activiti). This will replace the use of embedded Activiti for custom workflows. Future custom workflows will be implemented external to the Content Repository and will leverage the REST APIs of Alfresco Content Services. To be easily upgradable, new custom workflows should make local REST API calls in order to avoid using the in-process APIs.

  • ACS workflows are intended to automate the management of content items within the Content Repository, and APIs for custom workflows will continue to be available with subscriptions to Alfresco Content Services. A subscription to Alfresco Process Services (APS) is required for advanced process management use cases; APS is used for collecting, disseminating, integrating and coordinating information across an organization.

  • Though we continue to improve and maintain Share, we recommend that custom applications be built with the Application Development Framework (ADF). ADF components make it easier to assemble and maintain custom applications.
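
As a small illustration of the first recommendation, the sketch below lists the children of a folder through the versioned public REST API instead of an in-process API. The host, credentials and function names are placeholders chosen for the example, and basic authentication is used only to keep the sketch short.

```python
import base64
import json
import urllib.request


def children_url(base_url: str, node_id: str = "-root-", max_items: int = 10) -> str:
    """Build the URL that lists the children of a node in the v1 REST API."""
    return (
        f"{base_url.rstrip('/')}/alfresco/api/-default-/public/alfresco"
        f"/versions/1/nodes/{node_id}/children?maxItems={max_items}"
    )


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def list_children(base_url: str, username: str, password: str,
                  node_id: str = "-root-") -> list:
    """Return the names of the children of the given node."""
    request = urllib.request.Request(
        children_url(base_url, node_id),
        headers=basic_auth_header(username, password),
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return [item["entry"]["name"] for item in body["list"]["entries"]]
```

For example, list_children("http://localhost:8080", "admin", "admin") would return the names of the top-level folders of a local test installation, assuming one is running with those placeholder credentials. Because the client only depends on the HTTP contract, it lives outside the repository process and does not need to be rebuilt when the repository is upgraded.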

 

Thank you for the feedback you have previously given on our products, which has informed these changes. We think you will appreciate how these changes will allow us to evolve Alfresco Content Services to meet the needs of your organization both now and in the future. If you are a customer and have any questions, please reach out to your Customer Success Manager. If you are using one of our open source products and want to engage in the discussion, feel free to comment on this post. We look forward to continuing the conversation with you.

 

Regards,

 

The Alfresco Team

 

Table Summarizing Changes and Guidance

| Architecture Change | Guidance | Timing |
| --- | --- | --- |
| Improved REST APIs | Use the REST APIs instead of the in-process APIs. | Immediately |
| An eventual move to a platform workflow service | Custom ACS workflows should use REST calls to the Content Repository when possible. Use APS for process management across the organization. | Immediately |
| Simplify the Share UI | Integrate with 3rd party applications or develop custom interfaces. | Immediately |
| Containerized deployment | Transition your deployment from the installers toward container technology. | 6.0 release |
| Executable content repository | Move away from separate web application servers. | 6.0 release |
| No support for Solaris | Migrate to a different, supported OS. | 6.0 release |
| No support for DB2 | Migrate to a supported database. | 6.0 release |
| No support for CIFS or CIFS shortcuts | Use WebDAV. | 6.0 release |
| No support for NTLMv1 | Use Kerberos. | 6.0 release |
| Replace Solr 1 and Solr 4 | Upgrade to Alfresco Search Services powered by Solr 6. | 6.0 release |
| Discontinue the APS Share Connector | Leverage the Application Development Framework. | 6.0 release |
| Repository Multi-Tenancy only for OEMs | If you need multi-tenancy, talk to your Customer Care Representative about your use case. | 6.0 release |
| Encrypted Node Properties | Use a Blob or Base64 String property. | 6.0 release |
| Removal of Meeting Workspace and Document Workspace site types | Use standard collaboration sites in Share. | 6.0 release |
| Removal of some Share features: site blogs, site calendars, site data lists, site links, and site discussion forums | Develop a dedicated interface or use one provided by a third party. | Post 6.0 |
| Phasing out of Web Quick Start | Transition to another web delivery platform. | Post 6.0 |
| Phasing out of Alfresco in the Cloud | No action needed at this time. We will contact you when there is a timeline you should be aware of. | Post 6.0 |
