AnsweredAssumed Answered

How to retrieve content details without the API

Question asked by titusnachbauer on Jan 19, 2015
Latest reply on Jan 21, 2015 by titusnachbauer
Yes, I know, do not touch the database. I won't. Let me explain :-)

I am a migration consultant and a client asked us to migrate data from an old Alfresco system. It is version Enterprise - v3.1.1 (229) from 2009. Now normally we would either crawl the frontend or use an API to access data like this, but we did not have access to both at the start of the project. So i went and read some documentation, trying to understand the inner workings of the database enough so that I can get some meaningful data out. And here is the good news: that was no problem in general. Thanks to helpful posts on this forum and pages like these: http://wiki.alfresco.com/wiki/Data_Dictionary_Guide

So last week I finally got access to the frontend and when I walked through that I found some missing data that I would like to add. There are a lot of files that have no modified date, modifier, creation date and so on, although their details page clearly shows them.

For these items ALF_NODE_PROPERTIES only has the name, title, icon and description. I would have expected the other fields to also be found here, based on what I saw in other items. Those items I found using this (admittedly creative) query:
SELECT n."assoc_id", n."node_id", Q.LOCAL_NAME AS "field", p.string_value AS "value"
FROM ALF_NODE_PROPERTIES p
INNER JOIN
    (SELECT na.source_node_id AS "node_id", na.id AS "assoc_id"
        FROM ALF_NODE_ASSOC na
        INNER JOIN ALF_QNAME q ON na.type_qname_id = q.id
        WHERE q.local_name = 'rootVersion'
        ) n
    ON (n."node_id" = p.node_id)
INNER JOIN ALF_QNAME q
    ON p.QNAME_ID = q.id)
UNION
(SELECT n."assoc_id", n."node_id", Q.LOCAL_NAME AS "field", p.string_value AS "value"
FROM ALF_NODE_PROPERTIES p
INNER JOIN
    (SELECT na.target_node_id AS "node_id", na.id AS "assoc_id"
        FROM ALF_NODE_ASSOC na
        INNER JOIN ALF_QNAME q ON na.type_qname_id = q.id
        WHERE q.local_name = 'rootVersion'
        ) n
    ON (n."node_id" = p.node_id)
INNER JOIN ALF_QNAME q
    ON p.QNAME_ID = q.id)";

Now this gives me a lot of usable information about documents, but as stated before, many items are just missing from this. There is no child node in ALF_CHILD_ASSOC and there is no other relations in ALF_NODE_ASSOC for these items. This being an enterprise system, the database is pretty extensive and it feels like I am almost there. So before I build the frontend crawler, debug all the issues with that and let it run all weekend, I would love to know where the information shown on the details pages is stored. Again, I only want to retrieve it, I won't store anything or mess up their database. Any hints will be much appreciated.

UPDATE:

Of course the ALF_NODE table contains the audit_created, audit_modifier, etc. columns, so that seems to give me the information I needed. Of course the big question now is: why do you store this in ALF_NODE_PROPERTIES for some assets? And what is the difference between the two, so what data is valid if there is a difference between ALF_NODE and ALF_NODE_PROPERTIES?

Outcomes