AnsweredAssumed Answered

Efficient CMIS query for nodes with optional aspects

Question asked by pco on Sep 29, 2014
Latest reply on Oct 2, 2014 by pco
Hi, folks,

I am trying to create a small groovy app with OpenCMIS and Alfresco Community 5.0.a as the backend (so CMIS 1.1 :-)
I experienced quick progress during the first weeks with Alfresco (Content Modelling, customizations) and OpenCMIS APIs. Having arrived on the fine-tuning level some questions arise where I am not sure how to proceed.

1. I want to provide a tabular view of all cmis:document objects of a folder (like detail view in Windows explorer for example)
2. Some columns refer to properties of aspects (e.g. number of pages inside PDF file)
3. Not all documents necessarily have all aspects (e.g. jpeg's don't have "number of pages" aspect, would not make any sense)

What I have done so far: I Started with Session.query method for creating the query. The tabular view renders the List of QueryResult-items. That means: Fast access to the repo, only one AtomPub-Request per query (verified with tcp sniffer) - quite snappy (OperationsContext is tuned appropriately). I can display basic data, like name, contentstream size, content type, thumbnail etc.

Example for OpenCMIS workbench (simplified):

def folder = session.getObjectByPath('<some folder>');
// hint: select * just for simplicity of example, don't do this in production, if performance matters
def result = session.query("select * from cmis:document where IN_FOLDER('${}')", false)

result.each {
    println it.getPropertyValueById('cmis:name')
    println it.getPropertyValueById('pdfp:numberOfPages')
    println '—————'

Problem: The optional Aspects. In order to fetch the "numberOfPages" for example, I would have to do an outer join, which is not yet supported by alfresco. This is how it should look like (note the "LEFT"):

def folder = session.getObjectByPath('<some folder>');
def result = session.query("select * from cmis:document d LEFT join pdfp:pages p on d.cmis:objectId=p.cmis:objectId where IN_FOLDER(d,'${}')", false)

result.each {
    println it.getPropertyValueById('cmis:name')
    println it.getPropertyValueById('pdfp:numberOfPages')
    println '—————'

If I omit the "LEFT" I am exactly getting what I want, but of course only for documents which have the aspect - too bad.

So I started to investigate workarounds. Some came to my mind:
1. After the main query, do a subsequent query for each optional aspect with "cmis:objectId in (<firstquery>.ids), then join in java
2. Load full-blown CmisObject after the query to get the info (n+1 problem, very bad idea)
3. Use Session.queryObjects to get full-blown CmisObjects right away (I tried this, but it turned out that aspect's properties are not selected, you need to call object.refresh() to get them, so again n+1)
4. #2 but with ehcache in front, so the repo call happens only on rare occasion
5. Do not add the data to the tabular view but only in the detail view

At the moment I am thinking about #4 or #5 because I wanted to keep my query code as simple as possible and rather wait for outer join to be implemented.

So: How are you guys coping with this? I can't imagine I am the only one who fell over this limitation.
Btw: Hope to see aggregation implemented in the future as well (Select distinct, count(), group by, all these would be very welcome)