AnsweredAssumed Answered

Purge / Orphan Node Issue with 3.4 CE

Question asked by matosconsulting on Aug 8, 2014
Hi all,

I have a client with an issue that has stumped us for a couple of days. I'll try to summarize with relevant details as best as I can:

<ul>
<li>I'm a late comer to this project and did not set up the instance myself.</li>
<li>Client is a very heavy CIFS user with occasional webdav and web use.</li>
<li>Running CE 3.4 on a well sized server (32GB RAM, 8 core)</li>
<li>System has been running okay for a while, but requires a restart about once a week.</li>
<li>A few days ago, they began reporting that when many users (about 5-10) start performing a lot of CIFS operations, the system grinds to a halt.</li>
<li>I found that there was a huge bottleneck happening due a query that was running repeatedly and requiring a write to tmp disk file (Ref Item 1)</li>
<li>You'll notice that this first query is a query against the Deleted Content Store. After looking in to it, I saw that there were 660,000+ deleted items sitting in the trash.</li>
<li>I also found occasional slow queries (30 to 60 seconds) when performing an insert into alf_node_aspects (Ref Item 2)</li>
<li>So…I decided to try and clear out the trash. "Delete All Items" fails due to a ERROR [node.archive.NodeArchiveServiceImpl] Failed to get archive size / Node does not exist: archive://SpacesStore/[Node Id]</li>
<li>However, I am able to search for a subset of documents by name and delete them in batch</li>
<li>Digging in to the Alfresco Source and DB, I tried to understand what the real problem is. I found that I had an issue similar to <a href="https://forums.alfresco.com/forum/installation-upgrades-configuration-integration/configuration/node-does-not-exist-errors">this one</a>, in that there were 17,000+ nodes in alf_node that were duplicated with store_id=5 and store_id=6</li>
<li>On top of that, I found that the exception happening (Node does not exist) was referring to a node that actually <em>doesn't exist in alf_node</em>! Reviewing the source leads me to believe that the Purge process is trying to operate on a result set based on a cached index as opposed to what's in the alf_node table.</li>
<li>At this point, users are able to use Alfresco only when not performing CIFS operations. Once CIFS usage starts, the system grinds to a halt within 15-30 minutes, requiring a restart.</li>
<li>I ran a query from one of the Alfresco Team presentations to count "orphaned nodes" - and it came back with a count of 143,145.
</ul>

Now I'm posting here hoping someone can provide some insight or help. I believe the main issues are:

1.) We really need to purge the 550000+ "deleted items" somehow.
2.) We need to somehow fix whatever orphan pointers that the system is trying to delete during a purge all.
3.) We probably need to clean up the duplicate alf_node entries.

Any insight would be greatly appreciated.

Ref Item 1: This is the MySQL query that was running constantly (and taking 5 - 10 mins to complete) when users were working via CIFS. After I performed some tuning, this was no longer an issue.

SELECT assoc.id                  AS id,
       parentNode.id             AS parentNodeId,
       parentStore.protocol      AS parentNodeProtocol,
       parentStore.identifier    AS parentNodeIdentifier,
       parentNode.uuid           AS parentNodeUuid,
       childNode.id              AS childNodeId,
       childStore.protocol       AS childNodeProtocol,
       childStore.identifier     AS childNodeIdentifier,
       childNode.uuid            AS childNodeUuid,
       assoc.type_qname_id       AS type_qname_id,
       assoc.child_node_name_crc AS child_node_name_crc,
       assoc.child_node_name     AS child_node_name,
       assoc.qname_ns_id         AS qname_ns_id,
       assoc.qname_localname     AS qname_localname,
       assoc.is_primary          AS is_primary,
       assoc.assoc_index         AS assoc_index
FROM   alf_child_assoc assoc
       JOIN alf_node parentNode
         ON ( parentNode.id = assoc.parent_node_id )
       JOIN alf_store parentStore
         ON ( parentStore.id = parentNode.store_id )
       JOIN alf_node childNode
         ON ( childNode.id = assoc.child_node_id )
       JOIN alf_store childStore
         ON ( childStore.id = childNode.store_id )
WHERE  parentNode.id = 11


Ref Item 2: MySQL query taking 30-90 seconds sometimes - now this is the problem query after tuning.

insert into alf_node_aspects (node_id, qname_id) values (3671383, 20344);

Outcomes