Backup and Restore

Document created by resplin Employee on Jun 6, 2015

This page is obsolete.

The official documentation is at: http://docs.alfresco.com





Important Note!


This document only describes the process for backing up the Alfresco content repository. It assumes that the various binaries and configuration files (operating system, database, JDK, application server, Alfresco, and so on) are being backed up independently of the process described here.

Also, no backup strategy is complete until the process has been tested end-to-end, including restoration of backups that were taken previously.  Please ensure you have adequately tested your backup scripts prior to deploying Alfresco to production.


Overview


Backing up an Alfresco repository involves backing up the directory pointed to by the dir.root setting AND the database Alfresco is configured to use. Backing up either one without the other results in a backup that cannot be successfully restored.

Similarly, when you restore an Alfresco backup you must restore both the dir.root directory AND the Alfresco database from the same backup set - restoring either one in isolation is guaranteed to corrupt your repository.

In versions up to and including 3.1, the dir.root directory is usually defined in <configRoot>/alfresco/extension/custom-repository.properties. In versions 3.2 and above, the dir.root directory is usually defined in <configRoot>/alfresco-global.properties.  By default this directory is named alf_data and is located within the directory where Alfresco is installed.
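
Before scripting a backup it helps to confirm which directory dir.root actually points to. The following is a minimal sketch, assuming a plain key=value properties file; the helper name read_property is hypothetical, and the parser deliberately ignores line continuations and escape sequences.

```python
from pathlib import Path


def read_property(properties_file, key, default=None):
    """Return the value of `key` from a Java-style .properties file.

    Simplified parser: understands `key=value` and `key:value` lines and
    skips comments. It does not handle line continuations or escapes.
    """
    result = default
    text = Path(properties_file).read_text(encoding="utf-8")
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        # Split on whichever separator appears first, so that a value such
        # as C:/Alfresco/alf_data is not cut at its drive-letter colon.
        positions = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not positions:
            continue
        split_at = min(positions)
        name = line[:split_at].strip()
        value = line[split_at + 1:].strip()
        if name == key:
            result = value  # as in Java, a later definition wins
    return result
```

For example, read_property("alfresco-global.properties", "dir.root") returns the configured path, or None if the property is not set in that file.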


Cold Backup Procedure


By default, dir.root contains both the content and the indexes. It is possible to back up just the content and then perform a full reindex when the backup is restored. However, a full reindex can be time-consuming, so the steps below include the indexes in the backup.


  1. Stop Alfresco.
  2. Back up the database Alfresco is configured to use, using your database vendor's backup tools.
  3. Back up the Alfresco dir.root directory in its entirety.
  4. Store both the database and Alfresco dir.root backups together as a single unit. For example, store the backups in the same directory or in a single compressed file.
  5. Start Alfresco.
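
Steps 3 and 4 can be sketched as follows. This is an illustration only: it assumes Alfresco is already stopped and that the database dump from step 2 already exists as a single file. The function name bundle_backup_set and the archive layout (database/ and dir.root/) are hypothetical choices, not an Alfresco convention.

```python
import tarfile
from pathlib import Path


def bundle_backup_set(db_dump, dir_root, archive_path):
    """Store a database dump and dir.root together in one .tar.gz archive.

    Assumes Alfresco is stopped and `db_dump` was produced by the database
    vendor's tools. `archive_path` must be outside `dir_root`.
    """
    db_dump, dir_root = Path(db_dump), Path(dir_root)
    if not db_dump.is_file():
        raise FileNotFoundError(f"database dump not found: {db_dump}")
    if not dir_root.is_dir():
        raise NotADirectoryError(f"dir.root not found: {dir_root}")
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # Both halves of the backup set go into the same archive so they
        # cannot be separated later.
        tar.add(str(db_dump), arcname="database/" + db_dump.name)
        tar.add(str(dir_root), arcname="dir.root")
    return Path(archive_path)
```

Keeping both parts in one archive makes it harder to restore a database dump against a dir.root copy from a different backup run.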

Hot Backup Procedure


Overview


It is absolutely critical that hot backups are performed in the following order:


  1. Make sure a 'backup-lucene-indexes' folder exists under 'dir.root' (see below for how those indexes are created by a scheduled job or triggered manually through the JMX console).
  2. Back up the database Alfresco is configured to use, using your database vendor's backup tools (see below for details).
  3. As soon as the database backup completes, back up specific subdirectories in the Alfresco dir.root (see below for details).
  4. Finally, store both the database and Alfresco dir.root backups together as a single unit. For example, store the backups in the same directory or in a single compressed file.

Do not store the database and dir.root backups independently, as that makes it unnecessarily difficult to reconstruct a valid backup set, should restoration become necessary.

Note: make sure that the job that generates 'backup-lucene-indexes' does not run while the SQL backup is in progress. The 'backup-lucene-indexes' generation must be finished before you start the SQL backup.


Refreshing the Backup Lucene Indexes (Optional)


An optional step prior to initiating a hot backup is to trigger a Lucene index backup via JMX.  This can be done several ways, including via VisualVM or JConsole (MBeans Tab -> Alfresco/Schedule/DEFAULT/MonitoredCronTrigger/indexBackupTrigger/Operations 'executeNow' button) as well as via the command line.  After completion of this operation the 'backup-lucene-indexes' folder contains an up-to-date cold copy of the Lucene indexes, ready to be backed up.

Important Note: during the creation of the backup Lucene indexes, the system is placed in read-only mode; that read-only phase could take several minutes depending on the size of the Lucene indexes.


Backing up the Database


In an Alfresco system, the ability to support hot backup is fundamentally dependent on the hot backup capabilities of the database product Alfresco is configured to use. Specifically, it requires a tool that can 'snapshot' a consistent version of the Alfresco database (that is, it must capture a transactionally consistent copy of all of the tables in the Alfresco database). In addition, to avoid serious performance problems in the running Alfresco system while the backup is in progress, this 'snapshot' operation should either operate without taking out locks in the Alfresco database or complete extremely quickly (within seconds).

Backup capabilities vary widely between relational database products, and you should ensure that any backup procedures that are instituted are validated by a qualified, experienced Database Administrator before being put into a production environment.
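
As one illustration, assuming the Alfresco database runs on PostgreSQL, the pg_dump tool produces a consistent dump while the database is in use and does not block readers or writers. The sketch below only builds and runs the command line. The function names are hypothetical, the database name and paths are placeholders, and other database products need their own vendor tools.

```python
import subprocess


def pg_dump_command(database, output_file, user=None, host=None):
    """Build a pg_dump command line for a custom-format dump.

    Assumes PostgreSQL; this is not applicable to other database products.
    """
    command = ["pg_dump", "--format=custom", f"--file={output_file}"]
    if user:
        command.append(f"--username={user}")
    if host:
        command.append(f"--host={host}")
    command.append(database)  # the database name is the final argument
    return command


def run_database_backup(database, output_file, **options):
    """Run pg_dump and raise CalledProcessError if it fails."""
    subprocess.run(pg_dump_command(database, output_file, **options), check=True)
```

Whatever tool is used, have a Database Administrator confirm that it yields a transactionally consistent snapshot before relying on it.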


Backing up the File system


Back up the following subdirectories of the Alfresco dir.root directory using a file copy tool of your choice (for example, rsync or xcopy):


  • contentstore
  • contentstore.deleted
  • audit.contentstore
  • backup-lucene-indexes

IMPORTANT NOTE: Never, under any circumstances, attempt to back up the lucene-indexes subdirectory while Alfresco is running. Doing so is almost certain to cause Lucene index corruption. Back up 'backup-lucene-indexes' instead.
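
A minimal sketch of this file system step, using Python's standard library in place of rsync or xcopy. The function name copy_hot_backup_dirs is hypothetical, and skipping subdirectories that do not exist is an assumption made for this sketch. The live lucene-indexes directory is never touched because it is not in the list.

```python
import shutil
from pathlib import Path

# Only these subdirectories are safe to copy while Alfresco is running.
HOT_BACKUP_SUBDIRS = (
    "contentstore",
    "contentstore.deleted",
    "audit.contentstore",
    "backup-lucene-indexes",
)


def copy_hot_backup_dirs(dir_root, destination):
    """Copy the hot-backup subdirectories of dir.root into `destination`.

    Returns the names of the subdirectories that were copied. The target
    subdirectories must not already exist under `destination`.
    """
    dir_root, destination = Path(dir_root), Path(destination)
    copied = []
    for name in HOT_BACKUP_SUBDIRS:
        source = dir_root / name
        if not source.is_dir():
            continue  # assumption: a missing subdirectory is simply skipped
        shutil.copytree(str(source), str(destination / name))
        copied.append(name)
    return copied
```

Run this only after the database backup has completed, so that the ordering described above is preserved.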


Notes


  • Alfresco includes a background job responsible for backing up the Lucene indexes that (by default) is configured to run at 3am each night. The hot backup process must not run concurrently with this background job, so you should either ensure that the hot backup completes by 3am, or wait until the index backup job has completed before initiating a hot backup.
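
The check below sketches how a backup script might refuse to start when it would overlap the index backup job. The 3am start time comes from the note above. The 30-minute job duration is purely an assumption for illustration and should be replaced with a value measured on your own system. The function name is hypothetical.

```python
from datetime import datetime, time, timedelta


def overlaps_index_job(backup_start, backup_duration,
                       job_time=time(3, 0),
                       job_duration=timedelta(minutes=30)):
    """Return True if the planned hot backup overlaps the index backup job.

    `job_duration` is an assumed value, not an Alfresco default. The check
    covers the job runs on the day before, the day of, and the day after
    `backup_start`, so it suits backups that last less than about a day.
    """
    backup_end = backup_start + backup_duration
    for day_offset in (-1, 0, 1):
        job_day = backup_start.date() + timedelta(days=day_offset)
        job_start = datetime.combine(job_day, job_time)
        job_end = job_start + job_duration
        # Two half-open intervals overlap when each starts before the
        # other one ends.
        if backup_start < job_end and job_start < backup_end:
            return True
    return False
```

A scheduler wrapper could call this with the current time and an estimated duration, and delay the hot backup if it returns True.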

Summary: Time ordering of data


To summarise the order of the operations:


  1. Lucene index backup, then
  2. SQL, then
  3. content files

Lucene then SQL: The Lucene indexes must be backed up before the database. If rows are added to the database after the Lucene backup completes, an incremental reindex (index.recovery.mode set to AUTO) can regenerate the missing index entries from the database transaction data.

SQL then files: The database must be backed up before the content files. A database node that points to a missing file is orphaned, whereas a file with no corresponding database node simply means that the file was added too late to be included in the backup.


Restore Procedure


  1. Stop Alfresco.
  2. Copy the existing dir.root to a temporary location.
  3. Restore dir.root.
  4. If you are restoring from a hot backup, rename <dir.root>/backup-lucene-indexes to <dir.root>/lucene-indexes.
  5. Restore the database from the database backups and update statistics for all tables in the Alfresco schema (consult your DBA for the details on how to do this, as it varies from database to database).
  6. Start Alfresco.

Lucene Index Restoration


Note that in addition to full restorations, the backup sets created via either the cold or hot backup procedures described above can also be used to restore just the Lucene indexes.  This is useful in cases where the repository itself does not need to be restored but for some reason the Lucene indexes are stale and rebuilding them from scratch is undesirable.

The Lucene index restoration process is as follows:


  1. Stop Alfresco.
  2. Move the existing <dir.root>/lucene-indexes directory out of the way.
  3. If you are performing cold backups, restore <dir.root>/lucene-indexes from the most recent backup set.
  4. If you are performing hot backups, restore <dir.root>/backup-lucene-indexes from the most recent backup set and rename it to <dir.root>/lucene-indexes.
  5. Restart Alfresco.
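
For the hot backup case, steps 2 and 4 can be sketched as below. The sketch assumes Alfresco is stopped and that backup-lucene-indexes has already been restored into dir.root from the backup set. The function name and the ".old" suffix for the set-aside directory are hypothetical choices.

```python
from pathlib import Path


def activate_restored_indexes(dir_root, aside_suffix=".old"):
    """Swap a restored backup-lucene-indexes directory into place.

    Must only be run while Alfresco is stopped. The existing
    lucene-indexes directory is moved aside, not deleted.
    """
    root = Path(dir_root)
    live = root / "lucene-indexes"
    restored = root / "backup-lucene-indexes"
    aside = root / ("lucene-indexes" + aside_suffix)
    if not restored.is_dir():
        raise FileNotFoundError(f"nothing to restore from: {restored}")
    if aside.exists():
        raise FileExistsError(f"set-aside directory already exists: {aside}")
    if live.exists():
        live.rename(aside)  # step 2: move the stale indexes out of the way
    restored.rename(live)   # step 4: the restored copy becomes the live one
    return live
```

Keeping the old indexes under a different name leaves a way back if the restored copy turns out to be unusable.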

Upon restarting, Alfresco will (by default) detect that the indexes are stale, and incrementally reindex just that content that has changed since the last backup was performed. As the size of your content set grows, the time savings from performing incremental reindexing rather than full reindexing will become greater and greater (incremental reindexing is typically measured in minutes, whereas full reindexing can take hours for large content sets).

Important note: for incremental reindexing to occur, set the index.recovery.mode property to 'AUTO'. This ensures that the restored Lucene indexes are brought up to date with any newer transactions in the database and contentstore. Setting this property to 'FULL' forces a full reindex even if incremental reindexing is possible (thereby negating any benefits from this procedure).


Disaster Recovery


Disaster recovery involves pushing your backup to a separate location that can be used in the event of a loss of the primary location. The most common disaster recovery process with Alfresco is to use a cold disaster recovery environment that is ready for boot-up when needed.


Activities on the Primary Site


The first part of the process is technically identical to the hot backup process described above:


  1. (Optional) Back up the backup Lucene indexes. The frequency should be increased to more often than nightly, perhaps hourly, to reduce DR instance startup time.
  2. Back up the database.
  3. Back up the content store.
  4. Replicate all of the backups as a single unit to the DR site.

This process would then be run regularly (perhaps continuously) so that the DR site tracks closely to the primary.

Note: as with hot backup, ordering is critical - performing these steps out of order will likely result in CONTENT INTEGRITY ERRORs if a restore is required.  The only invariant for the persistent state of the repo is that the contentstore is 'newer than' the database and that the database is 'newer than' the Lucene indexes. How you guarantee that invariant is completely open.

There is often confusion about how this can be a feasible DR strategy, given the (incorrect) belief that backup is time consuming and/or expensive.  The key is understanding that the invariant described above is an ordering invariant that in no way places constraints on how the backup sets are generated or replicated.  In fact, each of the backup steps described above has various efficient ways of being executed, from simply capturing deltas (e.g. transaction logs for the DB, rsync deltas for the contentstore and/or indexes) right through to sophisticated approaches involving snapshotting filesystems (such as those found in many SAN devices).

With the use of such mechanisms, as well as simple optimizations such as compression and queuing (e.g. not waiting for replication to complete before starting the next round of backups), it's entirely possible to keep the content of a DR site within seconds of the primary.


Activities on the DR Site


For restoring a backup set on the DR side, you have to be careful not to restore content that might have been corrupted during the disaster event. So the process would be:


  1. Verify the timestamp of the most recent valid contentstore.
  2. Select a db snapshot for that most recent valid contentstore (i.e. the most recent db snapshot that is no newer than that contentstore backup).
  3. Select an index backup for that db snapshot (i.e. the most recent index backup that is no newer than that db snapshot).
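
The selection steps above can be sketched as a small function over backup timestamps. This assumes each backup is identified by the time it was taken; the function name and the tuple it returns are hypothetical. If no index backup qualifies, the sketch returns None for it, on the assumption that the indexes can then be rebuilt by a full reindex.

```python
from datetime import datetime


def select_backup_set(contentstore_time, db_snapshot_times, index_backup_times):
    """Pick a consistent (contentstore, db, index) set of backup times.

    Chooses the newest db snapshot that is no newer than the contentstore
    backup, then the newest index backup that is no newer than that db
    snapshot. Returns None if no db snapshot qualifies.
    """
    db_time = max(
        (t for t in db_snapshot_times if t <= contentstore_time), default=None
    )
    if db_time is None:
        return None  # no usable database snapshot for this contentstore
    index_time = max(
        (t for t in index_backup_times if t <= db_time), default=None
    )
    return (contentstore_time, db_time, index_time)
```

The result respects the ordering invariant: the contentstore is at least as new as the database, and the database is at least as new as the indexes.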

Notes


Alfresco, by default, physically deletes orphaned files in the contentstore 2 weeks after they were deleted from the database.  This means that database and contentstore backups that are separated by more than 2 weeks (by default) do not constitute a valid backup set and should not be used to restore any other environment (DR or otherwise).  You can think of this as a sliding window of 2 weeks duration in which any two DB & contentstore backups will be valid, provided the DB backup is no newer than the contentstore backup (the ordering invariant).
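
The sliding window can be expressed as a simple check on a pair of backup timestamps. This is a sketch under the default 2 week retention described above; the names are hypothetical, and the retention value must be adjusted if your installation changes the default.

```python
from datetime import datetime, timedelta

# Default period before orphaned content files are physically deleted.
ORPHAN_RETENTION = timedelta(days=14)


def is_valid_pair(db_time, contentstore_time, retention=ORPHAN_RETENTION):
    """Return True if a DB backup and a contentstore backup can be combined.

    The DB backup must be no newer than the contentstore backup (the
    ordering invariant), and the two must be no more than `retention` apart.
    """
    return db_time <= contentstore_time <= db_time + retention
```

For example, a database backup from 1 June pairs with a contentstore backup from 10 June, but not with one from 20 June.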

The Records Management module changes these rules somewhat, due to the strict requirements around disposition mandated by DOD 5015.2. In a nutshell, the RM module is incompatible with hot backups, not due to anything specific to Alfresco, but due to the precise way that DOD 5015.2 mandates that physical deletions be performed.



