I am looking to save a checksum for managed content. We have multiple sources that save images to Alfresco and, unfortunately, we end up housing a lot of duplicates. I am looking into ways to alleviate the problem.
Not out of the box, but you can add it easily. I did something similar for another client. You can create a behavior that computes a hash on the content stream every time it is updated, and store that hash as a property on the content. Then, finding duplicates is just a matter of running a search for all documents that have that same hash value.
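To make the idea above a bit more concrete, here is a minimal sketch in plain Java of the two pieces: hashing a content stream and grouping documents that share a hash. It is not Alfresco code. The class name ContentChecksum, the method names, and the choice of SHA-256 are all assumptions for illustration, and the Alfresco side (registering the behavior, defining the custom property, running the search) is deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Alfresco.
class ContentChecksum {

    // Streams the content through SHA-256 and returns the digest as lowercase hex.
    // In a behavior, this is the value you would store as a property on the node.
    static String sha256Hex(InputStream in) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Takes a map of document id -> hash and returns hash -> ids,
    // keeping only hashes shared by two or more documents.
    static Map<String, List<String>> findDuplicates(Map<String, String> hashById) {
        Map<String, List<String>> idsByHash = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : hashById.entrySet()) {
            idsByHash.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        idsByHash.values().removeIf(ids -> ids.size() < 2);
        return idsByHash;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for image bytes arriving from different sources (made-up data).
        Map<String, byte[]> uploads = new LinkedHashMap<>();
        uploads.put("doc-1", "same image bytes".getBytes(StandardCharsets.UTF_8));
        uploads.put("doc-2", "other image bytes".getBytes(StandardCharsets.UTF_8));
        uploads.put("doc-3", "same image bytes".getBytes(StandardCharsets.UTF_8));

        Map<String, String> hashById = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : uploads.entrySet()) {
            hashById.put(e.getKey(), sha256Hex(new ByteArrayInputStream(e.getValue())));
        }
        System.out.println(findDuplicates(hashById));
    }
}
```

Keep in mind that a hash like this only catches byte-for-byte duplicates. The same picture re-encoded or resized will get a different hash.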
I think I saw that version 6.x added something related to checksums but I have not investigated to see if it is similar to what I describe above.
Jeff is right when he mentions "something related": in the newer versions you have document fingerprinting.
Document Fingerprints | Alfresco Documentation
You can also find related documents with fingerprinting.
I first saw it in a Tech Talk Live, and there is also an excellent article from Andy Hind about document fingerprints.
Maybe this helps...
I am not finding much documentation on fingerprinting of image and other media content. Any idea if this has been designed to cater toward text content?
Thank you. Before we implement something ourselves to save the hashes, I wanted to see whether Alfresco already had something to offer, so we don't reinvent the wheel. It looks like we are on v5.2, and I am not sure whether an upgrade is pending, so we might not be able to use the Document Fingerprint option yet.
Fingerprinting was designed for text only.
If you can turn your image into a text representation, then you can use it.