I appreciate the updated info about using the FINGERPRINT function in AFTS queries. While testing the search, though, I came up with some questions about how it is supposed to work. Specifically:
- What is the default overlap percentage if I don't specify it as the second value? When I run the AFTS query 'FINGERPRINT:52763', where 52763 is the DBID, I get 487 results. When I supply any overlap percentage ranging from 'FINGERPRINT:52763_1' to 'FINGERPRINT:52763_99', I get the same 1 result, which is the document I am using as the source in the search.
- I assume that the FINGERPRINT minhash is generated when the doc (mostly PDFs in our case) is created or updated. Should I ALWAYS receive one row (the source document) for the FINGERPRINT query if the text is extractable by Tika?
- I have two almost identical PDFs that don't show up in each other's FINGERPRINT queries; in fact, those queries return 0 rows. Does that mean there was a problem extracting the text for the minhash? If so, how can I query whether the minhash is empty?
I am using Alfresco Community 5.2 (201707), and Alfresco Search Services 1.1.
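
For anyone who wants to reproduce the comparison from my first question, here is a minimal sketch of how the query strings can be generated. The helper names `fingerprint_query` and `search_body` are made up for illustration, and the JSON body shape is only my assumption about how the AFTS query would be submitted through the Search REST API; the `FINGERPRINT:<DBID>` and `FINGERPRINT:<DBID>_<overlap>` forms are the ones I actually ran.

```python
import json


def fingerprint_query(dbid, overlap=None):
    """Build an AFTS FINGERPRINT term (hypothetical helper).

    dbid    -- DBID of the source document
    overlap -- optional overlap percentage appended as the second value
    """
    if overlap is None:
        return f"FINGERPRINT:{dbid}"
    return f"FINGERPRINT:{dbid}_{overlap}"


def search_body(afts_query):
    """Wrap an AFTS query in a request body (assumed Search REST API shape)."""
    return {"query": {"language": "afts", "query": afts_query}}


if __name__ == "__main__":
    # Print the request bodies for no overlap value and a few explicit ones.
    for overlap in (None, 1, 50, 99):
        print(json.dumps(search_body(fingerprint_query(52763, overlap))))
```

Running each generated query and comparing the result counts is how the 487 versus 1 difference above shows up.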