Really interesting post.
We had exactly the same issues some time ago trying to do exactly the same thing. It was a long time ago, so I might be mistaken on some details.
During our multiple attempts to solve the issue, I remember that we reached a point where creating a folder threw a FileExistsException but at the same time we could not get and use the folder itself.
I thought that Alfresco always set the READ_COMMITTED isolation level on all connections, so I said "that must be a non-repeatable read". But I have just rechecked it, and the documentation clearly states that by default the database's default isolation level is used, so in the case of MySQL this is REPEATABLE_READ and non-repeatable reads could not happen. So now I'm a bit clueless about how this happened.
In this case retrying the transaction could in fact solve the issue: if the new folder already exists once the transaction is retried, the logic that checks for the folder's existence will see it and won't try to create it, so the exception won't happen again.
But of course that is something very specific to the use case, and a developer can't expect the RTH to be able to "guess" what different execution path the logic will follow when retrying.
I think that we tried to catch the FileExistsException and wrap it in some kind of retryable exception so that the RTH would just retry it, but I think that this path had other issues.
Maybe doing the whole operation in an RTH with a new transaction, catching the FileExistsException in your code and restarting the operation yourself (again in a new transaction) will do the trick. But to be honest I think that we tried this and a ton of other things, and in the end I think we just gave up.
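To make the "catch it and restart the operation yourself" idea concrete, here is a minimal sketch. It does not use the Alfresco API: `FolderStore` and `FolderExistsException` are hypothetical in-memory stand-ins for the repository and for FileExistsException, and each loop iteration stands in for what would be a fresh transaction in a real repository.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class GetOrCreateFolder {
    /** Hypothetical stand-in for Alfresco's FileExistsException. */
    static class FolderExistsException extends RuntimeException {
        FolderExistsException(String path) {
            super("Folder already exists: " + path);
        }
    }

    /** Hypothetical in-memory folder store; stands in for the repository. */
    static class FolderStore {
        private final Map<String, String> folders = new ConcurrentHashMap<>();

        /** Returns the folder id, or null if the folder is not visible. */
        String find(String path) {
            return folders.get(path);
        }

        /** Creates the folder, or throws if some other caller created it first. */
        String create(String path) {
            String id = "id-" + path;
            if (folders.putIfAbsent(path, id) != null) {
                throw new FolderExistsException(path);
            }
            return id;
        }
    }

    /**
     * Looks the folder up and creates it if missing. When creation fails
     * because the folder already exists, the whole check-then-create step
     * is restarted, up to maxAttempts times.
     */
    static String getOrCreate(FolderStore store, String path, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                String existing = store.find(path);
                return existing != null ? existing : store.create(path);
            } catch (FolderExistsException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                // Someone else won the race; the next attempt should see the folder.
            }
        }
    }
}
```

The important part is that the lookup is repeated on every attempt, so the retry takes the "folder already exists" path instead of blindly repeating the create. Whether a retried lookup actually sees the other thread's folder in a real repository depends on each attempt running in a genuinely new transaction.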
We had a similar issue (although we weren't creating folders as fine-grained as to the minute).
We found that you'd get two threads that try to create the same folder because before commit they can't see each other's new folders.
We ended up having another process create the folders ahead of time.
As you mentioned, this can leave a lot of empty folders which would have to be cleaned up later.
Would this be so bad if a cron task did it fairly often?
What if you changed the approach a bit:
1. don't trigger your action on an inbound rule for every document
2. use a cron task to handle documents in batches
3. for each batch:
a. make a pass through all the documents in the batch to determine required folders (to cover all the creation dates)
b. create all the folders
c. then move all the documents
That way you should never have one thread trying to create the same folder as another thread.
You can experiment with optimal batch size and frequency for your needs.
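The three passes per batch could be sketched roughly as below. This is only an illustration with in-memory collections, not Alfresco code: the `Doc` class, the `yyyy/MM/dd` folder layout and the map standing in for the repository are all assumptions made for the example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class BatchFolderPlanner {
    /** Hypothetical document: a name plus its creation date. */
    static final class Doc {
        final String name;
        final LocalDate created;

        Doc(String name, LocalDate created) {
            this.name = name;
            this.created = created;
        }
    }

    // Assumed folder layout: one folder per creation day.
    private static final DateTimeFormatter LAYOUT =
            DateTimeFormatter.ofPattern("yyyy/MM/dd", Locale.ROOT);

    static String folderFor(Doc doc) {
        return "/" + doc.created.format(LAYOUT);
    }

    /** Step a: one pass over the batch to find the distinct folders needed. */
    static SortedSet<String> requiredFolders(List<Doc> batch) {
        SortedSet<String> folders = new TreeSet<>();
        for (Doc doc : batch) {
            folders.add(folderFor(doc));
        }
        return folders;
    }

    /** Steps b and c: create every folder first, then move the documents. */
    static Map<String, List<String>> process(List<Doc> batch) {
        Map<String, List<String>> repo = new TreeMap<>(); // stand-in repository
        for (String folder : requiredFolders(batch)) {
            repo.put(folder, new ArrayList<>());
        }
        for (Doc doc : batch) {
            repo.get(folderFor(doc)).add(doc.name);
        }
        return repo;
    }
}
```

Because a single worker handles the whole batch and each folder is created exactly once before any document is moved, there is no second thread to race against, provided only one batch job runs at a time.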
Been a while, just wanted to update status.
We ended up developing the desired functionality in the client's Alfresco API wrapper, where all operations from different projects join together.
Just before the upload methods run, the desired folder structure is created, and documents are uploaded directly to their destination.
This goes through the OpenCMIS library, with no concurrency / thread-safety issues.
However, the solution is not generic enough for others to benefit from it.
__
@iblanco thanks for the insight, seems we're all working in the same direction and tripping over the same rocks along the way. I guess we'd need more advanced knowledge / time in order to provide a solution compatible with the current architecture design.
@tfrith, thanks for the idea, but one of our requirements is to perform the whole upload operation, up to the destination folder, synchronously, so batch moves afterwards are not an option.
stay hungry!