I use Alfresco Community 5.0.d and I would like to know whether it is possible to search Alfresco for all files that don't have an extension.
I do not know how to do the search.
It depends on which search you want to use. If you use the "Aikau" search in Share or Alfresco FTS, the search string !=cm:name:*.??? should do it. It should find all nodes whose name does not end with a three-character extension.
The question isn't necessarily which UI you use (Aikau faceted search or Node Browser, for instance), but whether the search services support this type of query. The problem with a wildcard-based approach in FTS is that, by design, it only scales to a certain number of documents in the system. This is a result of how the query is translated to the underlying Lucene system in SOLR. Also, the pattern *.??? assumes that all extensions are exactly three letters long, which might have been the standard in the old DOS 8.3 world, but all modern MS Office extensions have four letters.
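To illustrate the point about three-letter extensions, here is a minimal Python sketch. It uses the standard library's fnmatch glob matching as a local stand-in for the wildcard patterns; it does not reproduce how Alfresco FTS or SOLR evaluate them, and the file names are made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, chosen only to show the difference.
names = ["report.pdf", "report.docx", "archive.tar.gz", "README", "notes.md"]

def lacks_three_char_extension(name):
    # Mirrors the idea of !=cm:name:*.??? (no dot followed by exactly 3 chars at the end).
    return not fnmatchcase(name, "*.???")

def lacks_any_extension(name):
    # Mirrors the idea of !=cm:name:*.* (no dot anywhere in the name).
    return not fnmatchcase(name, "*.*")

three_char_hits = [n for n in names if lacks_three_char_extension(n)]
any_ext_hits = [n for n in names if lacks_any_extension(n)]

print(three_char_hits)
print(any_ext_hits)
```

With the three-character pattern, names such as report.docx or notes.md are reported as having no extension, while the *.* pattern only leaves README.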
Without having done a similar query myself on a large document base (i.e. more than just a couple tens of thousands of documents), I would assume the best way to work with this is by doing a CMIS query using the LIKE operator on cmis:name. The reasoning behind this is that a CMIS query using LIKE can actually be applied against the database instead of the SOLR index, and thus is not limited by the index query rewrite restrictions. The only thing you need to ensure is that the additional indexes for transactional metadata queries have been applied on the database system.
Hi Axel, I mentioned "Aikau" because it's the easiest way to test the FTS string. The query performs well on large document sets (tested with a repository of 1,000,000 documents), but paging through a large result set gets slower for later pages (and gets worse page by page).
It's true that it finds only three-character extensions, but it is easy to adapt :-)
I used ??? because I thought Solr would internally invert the query string (???.*) which would not be so expensive - do you know if this is correct?
I can't say how SOLR / Lucene handles this at a low level. I just remember running into maxBooleanClause limits with Alfresco SOLR before, due to the way Alfresco rewrote wildcard queries before sending them off to the SOLR / Lucene layer. This may have changed in Alfresco 5.0 or later versions, though...
Max boolean clauses should be no problem here; that limit hits you when using big "or" expressions. I hoped to eliminate that by using the '=' operator. (You see that I used 'hope' - what would we do without it :-)
I tried this in Alfresco Search with a small set in a site --> TYPE:"cm:content" AND !=cm:name:*.*
And then I played a little bit with mimetype facet, considering "Binary File (Octet Stream)", HTML and text mimetypes. I obtained some meaningful list, although not exactly accurate.
- What about a database query for doing LIKEs ?
As I said, you could try CMIS SQL queries using LIKE against the DB - in that case you would basically only test for the presence of a dot, e.g. do a where cmis:name NOT LIKE '%.%' query...
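As a rough sketch of that NOT LIKE idea, the following Python snippet builds the CMIS statement and a paged query URL for the CMIS 1.1 browser binding. The base URL is an assumption about a default local Alfresco install, the helper names are made up for illustration, and nothing here has been checked against a 5.0.d system; it only constructs strings and sends no request.

```python
from urllib.parse import urlencode

# Assumed endpoint of a local Alfresco install; adjust to your environment.
ASSUMED_BASE_URL = "http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/browser"

def build_no_extension_query(type_id="cmis:document"):
    # Names without any dot are treated as "no extension".
    return (
        "SELECT cmis:objectId, cmis:name "
        f"FROM {type_id} "
        "WHERE cmis:name NOT LIKE '%.%'"
    )

def build_query_url(base_url, statement, max_items=100, skip_count=0):
    # cmisselector=query, q, maxItems and skipCount are browser-binding
    # query parameters; the page size of 100 is an arbitrary choice.
    params = {
        "cmisselector": "query",
        "q": statement,
        "maxItems": max_items,
        "skipCount": skip_count,
    }
    return base_url.rstrip("/") + "?" + urlencode(params)

statement = build_no_extension_query()
url = build_query_url(ASSUMED_BASE_URL, statement, max_items=100, skip_count=0)
print(statement)
print(url)
```

The statement on its own can be pasted into the Node Browser or CMIS Workbench; the URL form is only useful if you want to page through results from a script.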
I think you could use a recursive script, but only for admin use. The runtime of the script would be long, so the browser will probably run into a timeout error. There should be no lock problem, though, because the script would only do read access, which results in "shared locks" on the DB that do not cause lock escalations. BUT if there is any insert or update request on the dataset, it will block the script until the update/insert is completed.
But your CMIS variant is far better. Is there any restriction on the number of fetched results when executing a CMIS DB query, as there was in the earlier search service?
a last update for this one
Did it for me. It found all files that lack an extension of at least one character. AND is implicit in the newer Alfresco versions.
Cesar Capillas' filter on cm:content was a good idea; I missed it on the first try (boaah... so many nodes without an extension).
I used the slingshot search via an AngularJS SPA, so the count of documents without an extension was available within milliseconds. This is a feature of Solr, which gives you the count of matches directly in the result header.
Thanks a lot for your answers.
I will try CMIS SQL queries
You can try CMIS SQL queries in the Node Browser (cmis-strict or cmis-alfresco) or in CMIS Workbench first, before writing custom CMIS code, for example in Java or Python.
As I wrote in another thread (Get all Childs of a Node )
just for completeness...