
Experiences with a very large number of users

Question asked by sbond on Nov 5, 2006
Latest reply on Nov 29, 2006 by derek
We are trying to make Alfresco work with a very large number of users from an LDAP directory. Our customer is an educational institution with 50,000 students and employees.

We plan to use the Alfresco repository to store, index and manage access rights on thousands of documents. Users will access documents from a portal server through web services. All users and groups will need to be present in the Alfresco repository.

First, we have identified an issue (AR-1057) with the import functionality for a very high number of nodes. Import performance degrades progressively during the LDAP user import. This is caused by the stateful Hibernate session, which keeps all loaded data in its level-1 cache. As the Hibernate documentation recommends, session.flush() and session.clear() should be called frequently during batch inserts.
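For reference, the batching pattern recommended by the Hibernate documentation looks roughly like the sketch below. StubSession and BATCH_SIZE are illustrative stand-ins, not Alfresco or Hibernate code; the point is only to show how periodic flush()/clear() keeps the level-1 cache bounded:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for a Hibernate Session's level-1 cache, used only to
// illustrate the batching pattern; the real fix would touch
// HibernateNodeDaoServiceImpl, not this stub.
class StubSession {
    private final List<Object> firstLevelCache = new ArrayList<>();
    void save(Object entity) { firstLevelCache.add(entity); }
    void flush() { /* a real session would write pending changes to the DB */ }
    void clear() { firstLevelCache.clear(); }
    int cachedEntities() { return firstLevelCache.size(); }
}

public class BatchImportSketch {
    static final int BATCH_SIZE = 500; // assumed batch size, tune as needed

    /** Imports userCount users, returning the peak level-1 cache size. */
    public static int importUsers(StubSession session, int userCount) {
        int maxCached = 0;
        for (int i = 0; i < userCount; i++) {
            session.save("user-" + i);
            maxCached = Math.max(maxCached, session.cachedEntities());
            // Flushing and clearing every BATCH_SIZE inserts keeps the
            // level-1 cache bounded instead of growing with every insert.
            if ((i + 1) % BATCH_SIZE == 0) {
                session.flush();
                session.clear();
            }
        }
        session.flush();
        session.clear();
        return maxCached; // peaks at BATCH_SIZE, not at userCount
    }

    public static void main(String[] args) {
        StubSession session = new StubSession();
        System.out.println(importUsers(session, 25_000)); // prints 500
    }
}
```

Without the periodic clear(), the cache would hold all 25,000 entities at once, which matches the memory growth we observed.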

We first tried adding a session.clear() call after the session.flush() in HibernateNodeDaoServiceImpl.setChildNameUnique(), but that caused integrity problems for imported nodes.

Our second attempt was to add the session.clear() to the HibernateNodeDaoServiceImpl.flush() method and to call NodeDaoService.flush() at the end of the DbNodeService.createNode() method. That works well: it cuts memory usage drastically and partially solves the performance problem.

However, the total import time is still high: a batch of 25,000 users takes 6 hours.

The time needed to search for a user is also very high. If the repository contains 45,000 users, an XPath search using Jaxen (as in InviteUsersWizard and ReassignTaskDialog) consumes a lot of memory and takes more than 1 hour… The same query using Lucene (as in UsersBean from the administration console) takes less than 5 seconds.
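The gap is easy to explain in miniature: a Jaxen XPath query has to walk every node in memory (a linear scan over all 45,000 users), while a Lucene query consults a prebuilt index. The sketch below is not Alfresco code, just an illustration of the two access patterns under that assumption:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Contrasts the two lookup strategies: linear scan (like an in-memory
// XPath walk) vs a prebuilt index probe (like a Lucene query).
public class LookupComparison {
    public static String scan(List<String> users, String wanted) {
        for (String u : users) {           // O(n): touches every user node
            if (u.equals(wanted)) return u;
        }
        return null;
    }

    public static String indexed(Map<String, String> index, String wanted) {
        return index.get(wanted);          // O(1): a single index probe
    }

    public static void main(String[] args) {
        List<String> users = new ArrayList<>();
        Map<String, String> index = new HashMap<>();
        for (int i = 0; i < 45_000; i++) {
            String name = "user" + i;
            users.add(name);
            index.put(name, name);         // built once, like a Lucene index
        }
        System.out.println(scan(users, "user44999"));    // prints user44999
        System.out.println(indexed(index, "user44999")); // prints user44999
    }
}
```

Both calls return the same result; only the cost differs, and the scan's cost grows with the user count, which is why the Jaxen path degrades so badly at 45,000 users.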

Is a method other than synchronisation envisaged for accessing LDAP users and groups from Alfresco? Many applications search the LDAP directory directly and create users dynamically in the local database only when they are needed.
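The pattern we have in mind could be sketched as below. LazyUserResolver and the ldapLookup function are hypothetical names; a real implementation would query the directory via JNDI and create an Alfresco person node, but the create-on-first-access logic is the same:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Sketch of "create on first access": instead of bulk-synchronising
// 50,000 LDAP entries, the local store is populated lazily.
public class LazyUserResolver {
    private final Map<String, String> localStore = new HashMap<>();
    // Stand-in for an LDAP search; a real version would use JNDI.
    private final Function<String, Optional<String>> ldapLookup;

    LazyUserResolver(Function<String, Optional<String>> ldapLookup) {
        this.ldapLookup = ldapLookup;
    }

    /** Returns the local user, creating it from LDAP on first access. */
    public Optional<String> resolve(String userName) {
        String local = localStore.get(userName);
        if (local != null) return Optional.of(local);        // already created
        Optional<String> fromLdap = ldapLookup.apply(userName);
        fromLdap.ifPresent(dn -> localStore.put(userName, dn)); // create locally
        return fromLdap;
    }

    public int localUserCount() { return localStore.size(); }

    public static void main(String[] args) {
        LazyUserResolver r = new LazyUserResolver(
            name -> Optional.of("uid=" + name + ",ou=people,dc=example"));
        r.resolve("alice");
        r.resolve("alice"); // second call hits the local store, not LDAP
        System.out.println(r.localUserCount()); // prints 1
    }
}
```

With this approach only the users who actually log in are ever materialised locally, so the 6-hour bulk import disappears entirely.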