Hello, we have an issue: once we go above roughly 100 users, we start seeing the errors below, and our application can no longer authenticate against Alfresco for any API calls, new tickets, or logoff events.
2017-01-19 03:49:28,254 ERROR [org.alfresco.util.transaction.TransactionSupportUtil] After completion (committed) TransactionalCache exception
org.alfresco.error.AlfrescoRuntimeException: 00191108829 Failed to transfer updates to shared cache
2017-01-19 04:59:02,914 ERROR [org.springframework.extensions.webscripts.AbstractRuntime] Exception from executeScript - redirecting to status template error: [CONCURRENT_MAP_PUT] Redo threshold[90] exceeded! Last redo cause: REDO_MAP_OVER_CAPACITY, Name: c:cache.ticketsCache
com.hazelcast.core.OperationTimeoutException: [CONCURRENT_MAP_PUT] Redo threshold[90] exceeded! Last redo cause: REDO_MAP_OVER_CAPACITY, Name: c:cache.ticketsCache
Please give more detail about your environment (including exact build versions of Alfresco, database, O/S etc). Is this using Alfresco Share with Alfresco Platform?
Is this a Community or an Enterprise cluster? See also [ACE-5184] Tomcat 7 classloader serializes authentication ticket retrieval - Alfresco JIRA (to see if there is any correlation).
Thanks,
Jan
Hi Jan,
We use Alfresco 5.0.3.1 enterprise version.
It runs on Windows Server 2012 R2.
We have two Alfresco nodes running in the cluster.
We do not use Alfresco Share but a third party user interface.
We have also logged a ticket for this with Alfresco Support, but wanted to see if the community has also had this issue.
Thanks
Thanks for the details. In addition to your support ticket (and any community feedback), please take a look at ACE-5184 in case there is any correlation.
Regards,
Jan
Unfortunately, the Hazelcast cache for tickets (like most other caches) has been configured to use synchronous replication of data to other cluster nodes. This can cause various issues to propagate across multiple members of the cluster. E.g. when a cluster node is suffering from excessive GC overhead, this might introduce delays significant enough to trigger timeouts in the communication and can even cause the cluster to dissolve in the worst case.
In your case it would be interesting to get more information out of the Hazelcast layer at the time of these errors, in particular why redos have to be performed. I was previously able to analyze internal issues with (the older version of) Hazelcast by setting the appropriate logger (com.hazelcast) to DEBUG via the Alfresco Support Tools addon. The resulting logs can then be passed on to Alfresco Support / used here for additional analysis.
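If the Support Tools addon is not at hand, the same logger can presumably be raised through the log4j configuration instead. A minimal sketch, assuming the log4j 1.x properties format used by Alfresco 5.0 and an override file in the extension directory (the file name and location below are assumptions, so check them against your installation):

```
# Assumed override file, e.g. <classpathRoot>/alfresco/extension/custom-log4j.properties
# Raises all Hazelcast loggers to DEBUG so the cause of the redos shows up in the log
log4j.logger.com.hazelcast=debug
```

Unlike the addon, a file-based change will typically need a restart to take effect, and Hazelcast DEBUG output can be very verbose, so it is probably best to revert it once the errors have been captured.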