pmonks2

cmis:objectId - A Case of Mistaken Identity

Blog Post created by pmonks2 on Feb 7, 2014

It's ok to store cmis:objectId's in my CMIS client application, right?



Ah such a simple question, yet hiding a plethora of probable pitfalls!



Over the last couple of years I've encountered (and held myself!) the misconception that cmis:objectId's are basically synonymous with NodeRef's, Alfresco's native form of identifier.  Unfortunately there is a subtle but significant difference that traps many an unwary CMIS client developer: an Alfresco NodeRef identifies an entire object including that object's version history (if any) - in effect documents and versions are fundamentally different types of 'thing', and versions don't have any independent notion of identity.  In contrast, in CMIS both documents and versions are the same (they're both cmis:documents) and are each uniquely identified by their own cmis:objectId.  From the versioning section of the spec (emphasis added):

Each version of a document object is itself a document object, i.e. has its own object id, property values, MAY be acted upon using all CMIS services that act upon document objects, etc.


So going back to our original question, the validity of storing a cmis:objectId for later use depends on what the CMIS client application is storing a reference to; some possibilities include:



  1. An unversioned object type (i.e. something other than cmis:document) => golden!


  2. An unversioned cmis:document => peachy!


  3. A specific version of a versioned cmis:document => good to go!


  4. The latest version of a versioned cmis:document => ruh roh raggy!!



1 problematic case out of 4 may not seem like too much of an issue, until we recall that versioning is enabled by default for all CMIS-accessible files in Alfresco (i.e. cmis:document and all sub-types).  Add to this that many CMIS client apps, regardless of the server they're connecting to, basically don't care about versioning (and when they do it's often limited to concurrency control via private working copies) - they simply want to treat the CMIS repository as a glorified file/folder store, reading and writing files as if they were flat, unversioned objects - and you start to appreciate the seriousness of the problem.

In short, cmis:objectId alone cannot satisfy the 80% use case of CMIS client applications i.e. version-agnostic file/folder CRUD!





So what's the alternative?



There are at least two approaches that I've come up with for working around this issue (and there may be more):



  1. Store a cmis:objectId and make additional CMIS calls to manually 'fast forward' to the latest version of the object on every subsequent CMIS call.


  2. Store a cmis:objectId for unversioned object types, and a cmis:versionSeriesId for versioned object types, and make subsequent CMIS calls appropriate to each.



Fast forward



With this approach, the CMIS client application would store the cmis:objectId as normal, but every single time it accesses the object it identifies, it would look up the cmis:objectId of the latest version of the object first, before continuing with the original operation.  In detail, this involves:



  1. Call the getObject service with the original cmis:objectId.


  2. Look for the cmis:versionSeriesId property in the response.  If the cmis:versionSeriesId property exists in the response:



    1. Call the getPropertiesOfLatestVersion service with the cmis:versionSeriesId.


    2. Pull out the cmis:objectId from the response - this is guaranteed to be the cmis:objectId of the latest version of the object, at the time of the call.


    3. Update the stored cmis:objectId with the retrieved cmis:objectId (optional).





  3. Call the desired CMIS service.



The advantages of this method is that the logic is reasonably clean and simple, but it has the downside of requiring at least 2, and sometimes 3, CMIS calls for every single 'original' CMIS call the client application wished to make (regardless of whether the object is versioned or not), as well as risking race conditions between steps 2.1 and 3 (i.e. when a new version of the object gets created by some other process between those two calls).



[UPDATE 2014-02-28] A vendor I'm working with mentioned another variation of this strategy that uses the 'cmis:isLatestVersion' property in step 2 to determine whether the cmis:objectId refers to the latest and greatest version or not.  Other than use of a different property, the logic remains much the same (the client application still needs to 'fast forward' to the latest version, using cmis:versionSeriesId).

Conditionally store either cmis:objectId or cmis:versionSeriesId



This approach involves storing cmis:objectIds for object types that are not versioned, and cmis:versionSeriesIds for object types that are.  Unversioned object types include everything that isn't a cmis:document (cmis:folder, cmis:relationship, cmis:policy and cmis:item), as well as, on a case-by-case basis, cmis:document and sub-types of cmis:document (whether such object types are versioned or not can be determined by retrieving the 'versionable' property for each cmis:document object type in the system).



For unversioned objects (i.e. those that have a cmis:objectId stored in the CMIS client application), CMIS service calls can be made directly by the client application, secure in the knowledge that the results will always refer to the latest version of the object (since, by definition, there can only ever be one version of such objects).



For versioned objects (i.e. those that have a cmis:versionSeriesId stored in the CMIS client application), one of two possible call sequences are required:



  1. If the CMIS client application only requires metadata, it can call one of the 'OfLatestVersion' services (getObjectOfLatestVersion or getPropertiesOfLatestVersion).


  2. For all other use cases:



    1. Call the getPropertiesOfLatestVersion service with the cmis:versionSeriesId.


    2. Pull out the cmis:objectId from the response - this is guaranteed to be the cmis:objectId of the latest version of the object, at the time of the call.


    3. Call the desired CMIS service with the retrieved cmis:objectId.






The advantage of this method is that it optimises the number of CMIS calls needed to perform such 'version independent' operations - often only requiring a single call.  The disadvantages are that it requires some initial 'discovery' calls to figure out exactly what's versioned vs what isn't, the client application's logic is more complex due to the two different types of CMIS identifier that must be used, and there is the risk of a race condition between steps 2.1 and 2.3 in the event of a concurrent update by another process.



You might be wondering why a CMIS client application can't simply store the cmis:versionSeriesId in all cases.  Unfortunately cmis:versionSeriesId is optional (you'll have to manually scroll down to the cmis:versionSeriesId definition in that reference) - a compliant CMIS repository does not have to provide this property for unversioned object types, and in my experience most don't.

This sux - surely there's something better?



I've been unable to come up with a better alternative based strictly on the CMIS 1.x specifications, but that doesn't mean others don't exist - I'd love to hear about them if you've come up with one.  That said, having worked fairly extensively with CMIS client application implementers over the last couple of years I'm reasonably certain there isn't a fundamentally better approach.



The good news is that the issue has been brought to the attention of the CMIS Technical Committee, and there is a proposal from Oracle for something called 'representative copies' that potentially has some overlap with this use case.



Speaking personally, I would like to see something along the lines of the following, minimally intrusive change:



  • Make cmis:versionSeriesId mandatory for all object types and rename it (e.g. to cmis:id) to show case its more general utility.


  • Update all services that receive a cmis:objectId to also support the new identifier property as an alternative.  When the new identifier is provided, the semantics would be 'perform the requested service against the latest version of the object'.


  • Remove the 'OfLatestVersion' services, as they would now be redundant.



Conclusion



CMIS is a valuable addition to the content management repertoire, but as with version 1s of most products, it has its share of flaws.  This particular flaw happens to be both subtle and of significant impact, which makes it all the more important for CMIS client application developers to understand it and factor it into their designs.



More generally, it is my opinion that this also reflects the specification's focus on addressing 'hard core ECM' requirements, to the (unintended) detriment of the 80% content management case i.e. simple file/folder CRUD.  I suspect no one on the CMIS TC realised at the time that the intersection of versioning and identity would 'bleed through' the basic file/folder CRUD use case in this way.



Ultimately the best way for problems like this to be fixed (or better yet, to not surface in the first place!) is community involvement.  I've found the CMIS TC to be an open and welcoming place, and I strongly encourage all CMIS client application implementers to get involved in the committee's good work, at the very least at the level of an observer (as I have).

Outcomes