Document storage design (backup and restore)

Question asked by mgpa on Oct 24, 2008
Latest reply on Oct 28, 2008 by mgpa
Firstly, we are not currently using Alfresco.  Our evaluation version of the latest beta of v3 does not work at all (but that is off topic).

We currently have over 1.3 million documents, nearly all 30 kB to 80 kB in size.  Our main problem is that we use a tape backup solution.  Given the volume of data (c. 55 GB), it takes a disproportionately long time to back up all of these files compared with backing up similarly sized data sets for SQL Server and Exchange, for example.  We do not have a server replication solution at the moment.
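As a rough sanity check on those figures, the sketch below works out the average document size. It assumes decimal units (1 GB = 10^9 bytes) and rounds the counts to the numbers quoted above; the variable names are purely illustrative.

```python
# Rough sanity check on the figures quoted above (rounded, decimal units assumed).
total_bytes = 55 * 10**9      # c. 55 GB of documents
doc_count = 1_300_000         # just over 1.3 million documents

avg_kb = total_bytes / doc_count / 1000
print(f"average document size: {avg_kb:.1f} kB")
```

The average comes out at about 42 kB, inside the 30 kB to 80 kB range, so the data set really is a very large number of small files rather than a few big ones.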

We currently have a Windows 2003/2008 environment, and using CIFS to import our document store has huge appeal.  (We are not averse to a non-Windows environment, but the savings made must outweigh the costs of learning and supporting the new environment, and there must be no loss of resilience.  No flaming, please.)

From the forum and the wiki, it appears that Alfresco will give us version control, document indexing and potentially better user access control, but it does not appear to answer our concern: how do we back up and restore 1 million+ files as quickly as possible?  This is especially important if we have to implement a disaster recovery plan.

Alfresco seems to use a flat storage space that can be held either locally or on a SAN.  Has any thought been given to storing the files in another medium that can be backed up and restored more quickly?  Total elapsed time is the key here.

We could just replicate to a USB external hard disk, but I would like to consider more secure and more easily automated solutions.  I would be interested to know if anyone has thought about and overcome such an issue.
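To illustrate the kind of approach I have in mind, here is a minimal sketch that bundles a directory of small files into a single uncompressed tar archive, so that the tape job sees one large file instead of many small ones.  This is only an assumption about what might help, not something I know Alfresco to provide; the function names `bundle_directory` and `restore_archive` are hypothetical, and the archive must be written outside the directory being bundled.

```python
import tarfile
from pathlib import Path


def bundle_directory(source_dir: Path, archive_path: Path) -> int:
    """Pack every regular file under source_dir into one uncompressed tar.

    Returns the number of files added. archive_path must live outside
    source_dir, otherwise the archive would try to include itself.
    """
    count = 0
    with tarfile.open(archive_path, "w") as tar:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to source_dir so a restore can target any directory.
                tar.add(path, arcname=path.relative_to(source_dir).as_posix())
                count += 1
    return count


def restore_archive(archive_path: Path, target_dir: Path) -> None:
    """Unpack a trusted archive produced by bundle_directory into target_dir."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r") as tar:
        tar.extractall(target_dir)
```

The store could be split into several such archives (per folder or per date range, say) so that a restore can start with whichever batch matters most.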

Finally, how would Alfresco cope if the Alfresco software and the indexes were restored before the document store?  For example: rebuild a server with Alfresco (et al.) on it, restore the indexes, kick off the restoration of the data files and then start Alfresco without waiting for the restoration job to finish.  This would mean that Alfresco would be available to users, so that they could create new content whilst the restore job was still bringing back the old content.  Would we have to rebuild the indexes once the data restore job has completed?  Or would it merely be recommended?  (The obvious difference is that if a rebuild is only recommended, we could defer it until the next planned outage, whereas if it is required, we would face another outage much sooner than planned.)

For us, it is more important that our user base can create new content than that they can view old content.  This does not mean that the old content is unimportant; it just means that we must always be able to create new content, regardless of whether old content is available.