Fixing Object Identity in CMIS - Some Proposed Solutions

Blog Post created by pmonks2 on Mar 10, 2014
As mentioned in my last post, the CMIS TC has been working on the issue of version-independent object identity, and that work is being tracked as CMIS-731.  At this point there are 2 basic proposals floating around (each with a number of variants), and I wanted to describe, compare & contrast them, and open the discussion up to the CMIS community for feedback.



Right now the committee is actively discussing these proposals, but my concern is that because there is minimal representation on the TC from the developers of CMIS client applications, the requirements and proposed solutions are being presented and discussed in something of a vacuum.  While I think I have a reasonable understanding of client application requirements (from my work with the Alfresco ecosystem), I'm still an indirect source: I'd much rather hear requirements from, and have proposals validated by, the developers of CMIS client applications.  The OASIS machinery can be a little intimidating to navigate, but a blog is a relatively low-pressure, easy place to provide that kind of feedback, so please don't hesitate to comment here!



I'll start by providing a summary of the two basic proposals and their primary variants as I understand them, and then compare and contrast those vs the requirements that the committee has discussed to date.  I'd then encourage you to provide any and all feedback you have (even if it's a short 'Peter, you're wrong and here's why: ...!') as comments on this post.



So without further ado, the proposals:

Extend applicability of cmis:versionSeriesId



This proposal is based on the observation that in the current versions of the specification (1.0 and 1.1), the cmis:versionSeriesId property (where present, and for the services that support it) already provides a version-agnostic identifier; the only gap is that it isn't available across all object types and services.
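To illustrate the gap, here is a rough Python sketch of the branching a client has to do today. The FakeServer class and its method signatures are hypothetical, simplified stand-ins for the getTypeDefinition, getObject and getObjectOfLatestVersion services (they are not a real CMIS binding or client library), and the call counter exists only to make the round trips visible.

```python
class FakeServer:
    """Hypothetical in-memory stand-in for a CMIS server that counts round trips."""

    def __init__(self):
        self.calls = 0
        self.type_definitions = {
            "cmis:document": {"versionable": True},
            "cmis:folder": {},  # folders carry no versioning information
        }
        self.objects = {
            "doc-42;1": {"name": "report", "versionLabel": "1.0"},
            "doc-42;2": {"name": "report", "versionLabel": "2.0"},
            "folder-7": {"name": "reports"},
        }
        self.latest_in_series = {"doc-42": "doc-42;2"}

    def get_type_definition(self, type_id):
        self.calls += 1
        return self.type_definitions[type_id]

    def get_object(self, object_id):
        self.calls += 1
        return self.objects[object_id]

    def get_object_of_latest_version(self, version_series_id):
        self.calls += 1
        return self.objects[self.latest_in_series[version_series_id]]


def fetch_current(server, properties):
    """Fetch the current state of an object using only what exists today."""
    # Round trip 1: find out whether the type is versionable at all.
    type_def = server.get_type_definition(properties["cmis:objectTypeId"])
    if type_def.get("versionable", False):
        # Round trip 2 (versioned case): fast forward to the latest version.
        return server.get_object_of_latest_version(properties["cmis:versionSeriesId"])
    # Round trip 2 (unversioned case): the object id is the only identifier.
    return server.get_object(properties["cmis:objectId"])
```

In this sketch every lookup costs two calls and one conditional, whichever branch is taken.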



Based on this observation, this proposal mandates cmis:versionSeriesId for all servers, regardless of whether they support versioning or not, and all CMIS services that today accept a cmis:objectId would offer an equivalent that accepts a cmis:versionSeriesId (the semantics being 'invoke this service as if the cmis:objectId of the latest version had been provided').  This could be achieved by continuing the xxxOfLatestVersion service pattern to its ultimate conclusion, or by overloading the existing services to support either cmis:objectId or cmis:versionSeriesId as input.
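The overloading alternative could look something like the following minimal in-memory sketch. The Repository class, its add_version helper, and the "series;n" object id format are all assumptions made up for illustration; the only thing taken from the proposal is the rule that a cmis:versionSeriesId behaves as if the cmis:objectId of the latest version had been provided.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    object_id: str          # stands in for cmis:objectId
    version_series_id: str  # stands in for cmis:versionSeriesId
    content: str


@dataclass
class Repository:
    """Hypothetical in-memory repository, not a real CMIS implementation."""
    by_object_id: dict = field(default_factory=dict)
    by_series_id: dict = field(default_factory=dict)

    def add_version(self, series_id: str, content: str) -> Version:
        """Append a new version to a series, creating the series if needed."""
        history = self.by_series_id.setdefault(series_id, [])
        # Assumed id format: "<series>;<n>", so object ids never equal series ids.
        version = Version(f"{series_id};{len(history) + 1}", series_id, content)
        history.append(version)
        self.by_object_id[version.object_id] = version
        return version

    def get_object(self, identifier: str) -> Version:
        """Overloaded lookup that accepts either kind of identifier."""
        if identifier in self.by_object_id:
            # A specific version was asked for: return exactly that version.
            return self.by_object_id[identifier]
        # Otherwise treat the input as a version series id and act as if
        # the object id of the latest version had been provided.
        return self.by_series_id[identifier][-1]


repo = Repository()
first = repo.add_version("doc-42", "draft")
second = repo.add_version("doc-42", "final")
```

Here repo.get_object("doc-42") returns the "final" version, while repo.get_object(first.object_id) still returns the "draft". A real server would also need a rule for telling the two identifier kinds apart, which this sketch sidesteps with its id format.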



This proposal also optionally renames cmis:versionSeriesId to something more descriptive (since its expanded semantics would no longer be limited to version series).  If the alternative of overloading the existing services to accept either cmis:objectId or cmis:versionSeriesId is selected, the proposal would also deprecate or remove the xxxOfLatestVersion services, since they would be redundant.

Basic Variant - extend the semantics for cmis:document types only



In this variant, cmis:versionSeriesId would only become mandatory for cmis:document and sub-types of it.  Other CMIS object types (cmis:folder, cmis:relationship, cmis:item and cmis:policy) would continue to not support this property, as is already the case in CMIS 1.0 and 1.1 (cmis:objectId would remain the only identifier for these object types).
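From a client's point of view, the basic variant still implies one branch when choosing which identifier to store as a long-lived reference. A minimal sketch, assuming the object's properties are available as a plain dictionary keyed by CMIS property id (the stable_identifier function is a hypothetical helper, not part of any client library):

```python
def stable_identifier(properties: dict) -> str:
    """Pick the identifier to persist for an object under the basic variant."""
    # cmis:baseTypeId is "cmis:document" for cmis:document and all of its
    # sub-types, so it covers custom document types as well.
    if properties["cmis:baseTypeId"] == "cmis:document":
        # Mandatory for documents in this variant, versioned or not.
        return properties["cmis:versionSeriesId"]
    # Folders, relationships, items and policies only have an object id.
    return properties["cmis:objectId"]
```

If cmis:versionSeriesId were mandatory for every object type, the helper would reduce to a single property lookup.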

Extended Variant - extend the semantics to all CMIS object types



In this variant, cmis:versionSeriesId would become mandatory for all object types - not just cmis:document but also cmis:folder, cmis:relationship, cmis:item and cmis:policy.

Add a new identifier



In this proposal, a new mandatory identifier would be added to the specification, tentatively called cmis:representativeCopyId at the time of writing (see CMIS-731).



All CMIS services that today accept a cmis:objectId would offer an equivalent that accepts a cmis:representativeCopyId, with the semantics being 'invoke this service as if the cmis:objectId of the latest version had been provided'.

Basic Variant - add the new identifier to cmis:document types only



In this variant, cmis:representativeCopyId would only become mandatory for cmis:document and sub-types of it.  Other CMIS object types (cmis:folder, cmis:relationship, cmis:item and cmis:policy) would not support this property (cmis:objectId would remain the only identifier for these object types).

Extended Variant - add the new identifier to all CMIS object types



In this variant, cmis:representativeCopyId would become mandatory for all object types - not just cmis:document but also cmis:folder, cmis:relationship, cmis:item and cmis:policy.

Comparison Matrix



With the basic proposals outlined, we can now compare these two proposals (and their variants) vs the requirements that the committee has identified to date (additional requirements welcome!):



  1. Avoids extra round trips to the server that are required today (e.g. calls to getTypeDefinition to figure out if a type is versioned or not, calls to 'fast forward' through a version history, etc.)
  2. Provides a single identifier that can be used for cmis:document and all sub-types
  3. Provides a single identifier that can be used for all CMIS object types
  4. Eliminates conditional logic around identifier handling in CMIS client applications
  5. Avoids identifier proliferation
  6. Avoids adding a 2nd identifier to object types that don't need it (cmis:folder etc.)
  7. Avoids potential confusion around the current semantics of 'version series'

SCORE (higher is better):

  • Extend cmis:versionSeriesId, cmis:document only: 4
  • Extend cmis:versionSeriesId, all CMIS object types: 5
  • Add a new identifier, cmis:document only: 4
  • Add a new identifier, all CMIS object types: 5


What's been apparent during the committee's discussions, and is borne out by this comparison, is that none of these proposals is a clear winner.  What this exercise does do, however, is focus the conversation on the key differentiating characteristics of the proposals, which are:



  • Line 3: Is it important to have a single identifier for all objects in CMIS, or is it acceptable to require client applications to deal with 2 (one for cmis:document and sub-types, and another for everything else)?


  • Line 4: What value should be placed on keeping all client applications simpler, even at the expense of more complex server side implementations?


  • Line 5: What is the value of keeping the specification simpler, by avoiding identifier proliferation?


  • Line 6: How bad is it to add another identifier property to object types that technically don't need it?


  • Line 7: Does expanding the semantics of cmis:versionSeriesId confuse or devalue the concept of 'version series'?



While I and the CMIS TC members have our own answers to these questions, I'm much more interested in hearing directly from the developers of CMIS client applications.  Which of these proposed solutions makes your life easiest?  Which requirements do you care about, and which don't matter?  What requirements are missing from the list above?



The window is closing on identifying a preferred solution to the long-standing problems of CMIS identity, and once closed it's unlikely to be reopened for a long time (if ever), so now is your chance to have a say!



As always the CMIS mailing list is the best place to leave ad-hoc feedback, but feel free to comment here and I'll pass your feedback along to the committee.

A note on Private Working Copies (PWCs)



One topic that's come up in the TC meetings is how this new mechanism should interact with Private Working Copies (PWCs).  As a refresher, a PWC is the temporary copy of a document that gets created when the checkOut service is called on it.



The complication revolves around whether a PWC, while it exists, should be the target of the proposed version-independent identifier (however it is implemented).  In the description of the checkOut service, the CMIS specification states:

until it is checked in (using the checkIn service), the PWC MUST NOT be considered the latest or latest major version in the version series.


which implies it should not be resolvable via the new identifier.



However, my experience has been that it is a common requirement for a CMIS client application to want to retrieve the latest 'usable' version of an object, which is the PWC for those users who have permission to access it, and the latest non-PWC version otherwise. So there's a real tension here between the spec's definition that PWCs are not versions, and the reasonable expectation that the new identifier would resolve to a PWC where appropriate.
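The 'latest usable version' rule can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the VersionSeries and DocVersion classes are hypothetical, and the permission check is reduced to a set of user names allowed to see the PWC, which is far simpler than a real repository's access control.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class DocVersion:
    object_id: str
    is_pwc: bool = False


@dataclass
class VersionSeries:
    versions: List[DocVersion]        # checked-in versions, oldest first
    pwc: Optional[DocVersion] = None  # present only while checked out
    pwc_readers: Set[str] = field(default_factory=set)  # assumed permission model


def latest_usable(series: VersionSeries, user: str) -> DocVersion:
    """Resolve a version-independent identifier for a given user."""
    # Users who may access the PWC get the PWC while it exists.
    if series.pwc is not None and user in series.pwc_readers:
        return series.pwc
    # Everyone else gets the latest checked-in version.
    return series.versions[-1]


series = VersionSeries(
    versions=[DocVersion("doc-42;1"), DocVersion("doc-42;2")],
    pwc=DocVersion("doc-42;pwc", is_pwc=True),
    pwc_readers={"alice"},
)
```

With this data, latest_usable(series, "alice") returns the PWC and latest_usable(series, "bob") returns "doc-42;2". A strictly spec-literal resolution would drop the first branch and always return the last checked-in version.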



The upshot is that further consideration is needed around how the new identifier would interact with PWCs, if at all. Feedback from CMIS client application developers is, again, very welcome.

Outcomes