
Age-Off Unaccessed Files

Question asked by jsb on Feb 17, 2016
Latest reply on Feb 18, 2016 by afaust
We have about 1 million files in our Alfresco that have not been <strong>accessed</strong> (i.e. viewed in Share/Explorer) in over a year, and we want to remove them. Beyond that, we want to implement an age-off policy that automatically removes files that haven't been accessed in a year.

I think the best way to do this would be with a Scheduled Action. I have two ideas for how to do this.

--------------------------------------------------
Approach #1
--------------------------------------------------
I have the scheduled action running, but I don't know how to query for what I want. Here are my two files:
scheduled-action-services-context.xml (More or less copied from somewhere else on the internet…)
<blockcode>
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>

<beans>
    <!--
    Define the model factory used to generate object models suitable for use with freemarker templates.
    -->
    <bean id="templateActionModelFactory" class="org.alfresco.repo.action.scheduled.FreeMarkerWithLuceneExtensionsModelFactory">
        <property name="serviceRegistry">
            <ref bean="ServiceRegistry"/>
        </property>
    </bean>
    <!-- Execute the script /Company Home/Data Dictionary/Scripts/ageoff.js -->
    <bean id="runScriptAction" class="org.alfresco.repo.action.scheduled.SimpleTemplateActionDefinition">
        <property name="actionName">
            <value>script</value>
        </property>
        <property name="parameterTemplates">
            <map>
                <entry>
                    <key>
                        <value>script-ref</value>
                    </key>
                    <!-- Note that as of Alfresco 4.0, due to a Spring upgrade, the FreeMarker ${foo} entries must be escaped -->
                    <value>\$\{selectSingleNode('workspace://SpacesStore', 'lucene', 'PATH:"/app:company_home/app:dictionary/app:scripts/cm:ageoff.js"' )\}</value>
                </entry>
            </map>
        </property>
        <property name="templateActionModelFactory">
            <ref bean="templateActionModelFactory"/>
        </property>
        <property name="dictionaryService">
            <ref bean="DictionaryService"/>
        </property>
        <property name="actionService">
            <ref bean="ActionService"/>
        </property>
        <property name="templateService">
            <ref bean="TemplateService"/>
        </property>
    </bean>
    <bean id="runScript" class="org.alfresco.repo.action.scheduled.CronScheduledQueryBasedTemplateActionDefinition">
        <property name="transactionMode">
            <value>UNTIL_FIRST_FAILURE</value>
        </property>
        <property name="compensatingActionMode">
            <value>IGNORE</value>
        </property>
        <property name="searchService">
            <ref bean="SearchService"/>
        </property>
        <property name="templateService">
            <ref bean="TemplateService"/>
        </property>
        <property name="queryLanguage">
            <value>lucene</value>
        </property>
        <property name="stores">
            <list>
                <value>workspace://SpacesStore</value>
            </list>
        </property>
        <property name="queryTemplate">
            <value>PATH:"/app:company_home"</value>
        </property>
        <property name="cronExpression">
            <!-- In reality this will be once a day, this is just for testing -->
            <value>0 0/3 * * * ?</value>
        </property>
        <property name="jobName">
            <value>jobD</value>
        </property>
        <property name="jobGroup">
            <value>jobGroup</value>
        </property>
        <property name="triggerName">
            <value>triggerD</value>
        </property>
        <property name="triggerGroup">
            <value>triggerGroup</value>
        </property>
        <property name="scheduler">
            <ref bean="schedulerFactory"/>
        </property>
        <property name="actionService">
            <ref bean="ActionService"/>
        </property>
        <property name="templateActionModelFactory">
            <ref bean="templateActionModelFactory"/>
        </property>
        <property name="templateActionDefinition">
            <ref bean="runScriptAction"/> <!-- This is the name of the action (bean) that gets run -->
        </property>
        <property name="transactionService">
            <ref bean="TransactionService"/>
        </property>
        <property name="runAsUser">
            <value>System</value>
        </property>
    </bean>
</beans>

</blockcode>


ageoff.js
<blockcode>

// I am testing with this date range because I am testing in a temporary 4.2.e instance.
var temp = "NOW-1YEAR/DAY TO NOW/DAY+1DAY"
// Real date range will be something like "MIN TO NOW-1YEAR/DAY"

// This is kind of what I want, but it doesn't work. I think my query is somehow wrong. Also, I don't
// think "@cm\\:accessed" exists, but when I replace it with "@cm\\:created" it doesn't seem to work anyway.
var docs = search.luceneSearch("PATH:\"/app:company_home/app:user_homes//*\" AND @cm\\:accessed:[" + temp + "] AND TYPE:\"cm:content\" AND -TYPE:\"cm:folder\"");

//-------------------------------------------------------------------
// This gets a list of everything in user homes. This works! (but it is not what I want)
//-------------------------------------------------------------------
//var docs = search.luceneSearch("PATH:\"/app:company_home/app:user_homes//*\" AND TYPE:\"cm:content\" AND -TYPE:\"cm:folder\"");

var dest;
for(dest=0; dest < docs.length; dest++) {
        // Instead of remove I think I want to set the sys:temporary aspect?
        var success = docs[dest].remove();
}
</blockcode>

<strong>Question</strong>: Is there a way to query based on when documents were last accessed or viewed (even just viewed in Share, not necessarily downloaded)? Google returns hits for cm:created and cm:modified, but not cm:accessed, so I suspect this approach is a dead end.
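For reference, the shape of the date-range query from ageoff.js can be sketched in plain Java. This is only a minimal sketch, not Alfresco API code: the class `AgeOffQuery` and its `build` method are hypothetical names, it does nothing more than assemble the query string, and it assumes the search subsystem accepts an explicit ISO 8601 cutoff in a `[MIN TO ...]` range. `cm:modified` stands in for the date property, because whether an accessed date is recorded at all is exactly the open question. It needs Java 8 or later for `java.time`.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class AgeOffQuery {

    /**
     * Builds a Lucene query string matching cm:content nodes under pathPattern
     * whose date property falls on or before (today minus the given years).
     * All parameter names are illustrative, not part of any Alfresco API.
     */
    public static String build(String pathPattern, String dateProperty,
                               LocalDate today, int years) {
        LocalDate cutoff = today.minusYears(years);
        String cutoffIso = cutoff.format(DateTimeFormatter.ISO_LOCAL_DATE) + "T00:00:00";

        // The colon in a prefixed property name (cm:modified) must be escaped for Lucene.
        String field = "@" + dateProperty.replace(":", "\\:");

        return "PATH:\"" + pathPattern + "\""
             + " AND TYPE:\"cm:content\""
             + " AND " + field + ":[MIN TO " + cutoffIso + "]";
    }

    public static void main(String[] args) {
        // Prints the query for "not modified in the last year" as of today.
        System.out.println(build("/app:company_home/app:user_homes//*",
                                 "cm:modified", LocalDate.now(), 1));
    }
}
```

Passing an explicit cutoff date avoids relying on `NOW-1YEAR/DAY` style date math, which may not behave the same way on 3.4.8 and 4.2.e.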

--------------------------------------------------
Approach #2
--------------------------------------------------
I have this Java class that correctly finds the files that have not been accessed in a year when I run it from the contentstore root directory (alf_data/contentstore). According to https://wiki.alfresco.com/wiki/Custom_Actions, converting this class into a custom action should not be too hard.


AgeOff.java
<blockcode>
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;

// Find un-accessed files:
//   find ./ -atime +366 -type f -exec ls -l --time=atime {} \;

public class AgeOff {
   /**
    * Reads the given stream to the end and returns its contents as one string.
    */
   private static String loadStream(InputStream s) throws Exception {
      BufferedReader br = new BufferedReader(new InputStreamReader(s));
      StringBuilder sb = new StringBuilder();
      String line;
      while((line = br.readLine()) != null) {
         sb.append(line).append("\n");
      }
      return sb.toString();
   }

   public static void main(String[] args) {
      // Runs find in the current directory, which is expected to be alf_data/contentstore
      ProcessBuilder pb = new ProcessBuilder("/bin/bash", "-c", "find ./ -atime +366 -type f -exec ls -l --time=atime {} \\;");
      try {
         Process p = pb.start();
         String output = loadStream(p.getInputStream());
         String outerr = loadStream(p.getErrorStream());

         System.out.println(output);
         System.err.println("-------------ERRORS-------------");
         System.err.println(outerr);
      } catch(Exception e) {
         System.out.println("EXCEPTION: " + e.getMessage());
      }
   }
}
</blockcode>
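
The same check could probably be done without shelling out to `find`, which might be easier to carry into a custom action. The following is a minimal sketch under stated assumptions: `StaleFileFinder` and `findStale` are hypothetical names, it needs Java 8 or later, and it only approximates `find -atime +366` by comparing each file's last-access time with a cutoff instant. Like `find`, it depends on the filesystem actually recording access times (mount options such as `noatime` or `relatime` change that), and the access time reflects any process that read the file, not necessarily a user viewing it in Share.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class StaleFileFinder {

    /** Returns all regular files under root whose last-access time is before cutoff. */
    public static List<Path> findStale(Path root, final Instant cutoff) throws IOException {
        final List<Path> stale = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // attrs is supplied by the walk, so no extra stat call is needed per file
                if (attrs.isRegularFile()
                        && attrs.lastAccessTime().toInstant().isBefore(cutoff)) {
                    stale.add(file);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return stale;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java StaleFileFinder <contentstore root>; defaults to the current directory
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        Instant cutoff = Instant.now().minus(Duration.ofDays(366));
        for (Path p : findStale(root, cutoff)) {
            System.out.println(p);
        }
    }
}
```

Returning `Path` objects rather than `ls` output keeps the result easy to post-process, for example when trying to match each file back to a node.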



The problem with this approach is that I now have a list of files in the contentstore, and I need to somehow translate that to Alfresco nodes so that I can delete them or set sys:temporary. Is there a way to translate a contentstore path to an Alfresco node?
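
One possible first step for that translation, sketched under assumptions: with the default file-based content store, a node's content URL appears to be `store://` followed by the file's path relative to the contentstore root. If that holds on your instance (it can be checked by looking at a node's content property in the Node Browser), a contentstore path can be turned into a content URL as below. `ContentUrlMapper`, `toContentUrl`, and the example paths are hypothetical, and finding the node that owns a given content URL is a separate step this sketch does not cover.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ContentUrlMapper {

    /**
     * Converts a file path inside the contentstore into a "store://" content URL.
     * Assumes the URL is the path relative to the store root with '/' separators.
     */
    public static String toContentUrl(Path storeRoot, Path file) {
        Path root = storeRoot.toAbsolutePath().normalize();
        Path abs = file.toAbsolutePath().normalize();
        if (!abs.startsWith(root) || abs.equals(root)) {
            throw new IllegalArgumentException(file + " is not a file under " + storeRoot);
        }
        Path relative = root.relativize(abs);

        // Join the path elements with '/' regardless of the platform separator.
        StringBuilder url = new StringBuilder("store://");
        for (int i = 0; i < relative.getNameCount(); i++) {
            if (i > 0) {
                url.append('/');
            }
            url.append(relative.getName(i));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Hypothetical paths, for illustration only
        System.out.println(toContentUrl(
                Paths.get("/opt/alf_data/contentstore"),
                Paths.get("/opt/alf_data/contentstore/2016/2/17/9/30/abc.bin")));
    }
}
```

Because both paths are normalized first, the relative `./...` paths printed by `find` work as input when the store root is given as `.`.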

--------------------------------------------------
Summary
--------------------------------------------------
1) Is there a valid way to query for cm:accessed?
2) Is there a way to translate a contentstore path/id to an alf_node? (I hope that is the correct terminology)

I have looked into the Records Management module and it doesn't seem to be what I want. But maybe I am missing something.
This post (https://forums.alfresco.com/forum/end-user-discussions/alfresco-share/automatically-deleting-documents-after-xx-days-06192009) is relevant, but I want accessed, not created.

We have two Alfresco instances running for different purposes, 4.2.e and 3.4.8. Ideally I need something that works for both, but I really just want a push in the right direction. I have only tested the above with 4.2.e because it is Community and I can spin up a temporary instance to test with, so I don't touch production. If there is no common solution, I would rather have help with 4.2.e.
OS is CentOS.
