Implementing 'DocFlip' for FSRs

Blog Post created by pmonks2 on Oct 30, 2008
In my previous post I discussed how File System Receivers (FSRs) implement deployment transactions on top of non-transactional filesystems.  As discussed in that post, there is a window of time in which an application reading the content could see an inconsistent state; that is, while the FSR is in the middle of the commit phase.  The duration of this window varies with a number of factors, but where it's critical to keep the window as short as possible, a technique called 'docflip' can help.

I first heard about 'docflip' almost 10 years ago, and have seen it in use several times since then.  The basic approach is relatively simple:

  1. Two full copies of the target directory are maintained.

  2. A symlink is used that points to one of these directories.  All applications that are reading content use this symlink exclusively (they are unaware of the two underlying directories).

  3. At any point in time:

    1. One of the directories (the one pointed to by the symlink) is the 'live' copy.

    2. The other directory (that is not pointed to by anything) is the 'shadow' copy.

  4. A transaction involves:

    1. Writing all of the changes to the shadow copy.

    2. Either committing the transaction, which involves:

      1. Flipping the symlink from the current live directory to the (newly updated) shadow directory, effectively swapping which directory is live and which is the shadow.

      2. Re-running step 4.1 against the (new) shadow directory (the directory that was live up until step 4.2.1) – this can also be achieved by simply rsyncing from the (new) live to the (new) shadow directory, if rerunning the original set of content modifications is too difficult or expensive.

    3. Or rolling back the transaction, which involves replacing the (partially updated) shadow directory with the contents of the current live directory, without touching the symlink at all.
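The steps above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the function names (live_and_shadow, flip, mirror, run_transaction) are hypothetical, mirror() is a crude stand-in for an rsync of one copy onto the other, and the flip relies on POSIX rename semantics (creating a temporary symlink and renaming it over the old one) so that readers never find the symlink missing.

```python
import os
import shutil


def live_and_shadow(link, copy_a, copy_b):
    """Return (live, shadow), decided by where the symlink currently points."""
    live = os.path.realpath(link)
    a, b = os.path.realpath(copy_a), os.path.realpath(copy_b)
    if live == a:
        return a, b
    if live == b:
        return b, a
    raise RuntimeError(f"{link} points at neither copy: {live}")


def flip(link, new_target):
    """Repoint the symlink by renaming a temporary link over the old one.

    os.replace() maps to rename(2) on POSIX, which swaps the link in a
    single step instead of an unlink followed by a separate symlink call.
    """
    tmp = f"{link}.tmp-{os.getpid()}"
    os.symlink(new_target, tmp)
    os.replace(tmp, link)


def mirror(src, dst):
    """Make dst an exact copy of src (stand-in for an rsync with deletes)."""
    shutil.rmtree(dst, ignore_errors=True)
    shutil.copytree(src, dst, symlinks=True)


def run_transaction(link, copy_a, copy_b, apply_changes):
    """Run one docflip transaction; apply_changes(shadow_dir) does the writes."""
    live, shadow = live_and_shadow(link, copy_a, copy_b)
    try:
        apply_changes(shadow)      # step 4.1: write changes to the shadow
    except Exception:
        mirror(live, shadow)       # step 4.3: roll back, symlink untouched
        raise
    flip(link, shadow)             # step 4.2.1: commit by flipping the symlink
    mirror(shadow, live)           # step 4.2.2: bring the new shadow up to date
```

Here step 4.2.2 uses the "copy from the new live directory" variant rather than re-running the original modifications, since that is the simpler of the two to show.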

Note that there are some downsides to this approach, including:

  • It requires two full copies of the target directory, which can be problematic with large content sets.

  • It assumes that applications don't keep files open for extended periods of time - updates to a file are only visible when that file is (re)opened.

  • It doesn't work very well on Windows platforms due to Windows' unfortunate choice of using fully qualified paths for file handles instead of inodes, making it impossible to flip the symlink / junction if any files are currently held open by an application.
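The second downside (files held open across a flip) is easy to demonstrate on a POSIX system.  The following is a small sketch with a hypothetical helper name, read_across_flip; it opens a file through the symlink, flips the symlink, and then compares what the old handle returns with what a fresh open returns.

```python
import os


def read_across_flip(link, name, flip_to):
    """Open link/name, flip the symlink to flip_to, then read via both routes.

    Returns (stale, current): the content seen through the handle opened
    before the flip, and the content seen by re-opening after the flip.
    """
    path = os.path.join(link, name)
    with open(path) as held:          # opened before the flip
        tmp = f"{link}.tmp-{os.getpid()}"
        os.symlink(flip_to, tmp)
        os.replace(tmp, link)         # the flip itself
        stale = held.read()           # handle still refers to the old file
    with open(path) as fresh:         # path is re-resolved via the symlink
        current = fresh.read()
    return stale, current
```

An application that caches open handles would therefore keep serving the pre-flip content until it reopens the file.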

Regardless, 'docflip' greatly reduces the window of time in which the filesystem is in an inconsistent state - basically to the time it takes to rewrite a symlink.  That said, it doesn't completely eliminate non-repeatable reads: an application can read a file before a transaction, the transaction then commits (flipping the symlink), and when the application re-reads the file it finds that the file has changed.  However, without introducing read transactions (which would require changes to the applications reading the filesystem, along with some kind of transaction coordinator), it's probably impossible to obtain serialisable isolation on non-transactional filesystems.

So now that we have a technique for minimising the time for changes to commit, how would this be implemented with an Alfresco FSR?

Without enhancing the FSR in any way, the approach I've considered involves:

  1. Having 3 copies of the target filesystem - one managed by the FSR, the other two (the live and shadow copies) managed by the custom 'docflip' process.  As with vanilla 'docflip' a symlink would point to the currently live copy of the content, and all applications reading the content would read via that symlink.

    • It's not possible to use the FSR's own target directory as one of the live / shadow directories, since that would require the FSR itself to be dynamically reconfigurable so that it always writes to the shadow (which changes with every flip of the symlink).

  2. Configuring a ProgramRunnable that calls a 'docflip' shell script.  This shell script:

    1. Replicates the deployed delta from the FSR target directory to the shadow copy.

    2. Commits the transaction by flipping the symlink (i.e. swaps the shadow and live copies).

    3. Re-replicates the deployed delta from the FSR target directory to the (new) shadow copy.

  3. Rollback doesn't need to be considered, since by the time the ProgramRunnable is invoked, the FSR has already committed the deployed content to the target directory.  The only concern would be if step 2.3 fails - that would need to raise a critical administration alert of some kind, since it would require manual intervention to avoid throwing all subsequent deployments into disarray.  Forcibly shutting down the FSR in this case might be justified, just to ensure that no further deployment can occur until the issue is resolved.
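The three-directory arrangement above could look roughly like the following Python sketch.  Everything here is an assumption made for illustration: deploy_hook is a hypothetical entry point (how the ProgramRunnable actually invokes the script and what arguments it passes is not shown), mirror() is a brute-force stand-in for the replication step, and the "critical alert" is reduced to a log message.

```python
import logging
import os
import shutil

log = logging.getLogger("docflip")


def mirror(src, dst):
    """Brute-force replication: make dst an exact copy of src."""
    shutil.rmtree(dst, ignore_errors=True)
    shutil.copytree(src, dst, symlinks=True)


def flip(link, new_target):
    """Repoint the symlink by renaming a temporary link over it (POSIX)."""
    tmp = f"{link}.tmp-{os.getpid()}"
    os.symlink(new_target, tmp)
    os.replace(tmp, link)


def deploy_hook(fsr_target, link, copy_a, copy_b, sync=mirror):
    """Hypothetical hook run after the FSR has committed to fsr_target."""
    a, b = os.path.realpath(copy_a), os.path.realpath(copy_b)
    live = os.path.realpath(link)
    if live not in (a, b):
        raise RuntimeError(f"{link} points at neither copy: {live}")
    shadow = b if live == a else a

    sync(fsr_target, shadow)       # replicate the deployment to the shadow
    flip(link, shadow)             # commit: shadow becomes live
    try:
        sync(fsr_target, live)     # re-replicate to the new shadow
    except Exception:
        # The two copies now differ; later deployments are unsafe until
        # an administrator repairs the new shadow copy.
        log.critical("docflip: re-replication to %s failed", live)
        raise
```

The sync parameter is there so that the brute-force copy can be swapped for something more targeted without touching the flip logic.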

Replicating the changes made to the FSR's target directory to the 'docflip' directories (steps 2.1 and 2.3) could be done in a number of ways, including:

  1. Brute force rsync of the entire target directory.

  2. Directed rsync, using the manifest of changes that are sent to the shell script by the ProgramRunnable.

  3. By interpreting the manifest of changes that are sent to the shell script by the ProgramRunnable and executing equivalent cp / rm / mkdir / rmdir commands.

  4. Implementing the entire 'docflip' process in Java instead of a shell script, and directly interpreting the manifest of changes.

These are listed in what I believe is ascending order of both development effort and performance: option 1 needs the least effort but is the slowest, and option 4 needs the most effort but is the fastest.  The 'sweet spot' is likely to be a combination of options 2 and 3, where rsync is used for creates / updates and rm / mkdir / rmdir are used for file deletes and directory operations.  If performance trumps all else, option 4 is worth considering, possibly leveraging Java NIO and/or multi-threading techniques (being careful to preserve the order of any order-dependent operations in the manifest, e.g. create directory A, ..., ..., create file A/B.txt).
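To make the manifest-driven options a little more concrete, here is a minimal Python sketch of replaying a list of changes in order.  The manifest format is entirely hypothetical (a list of (operation, relative path) pairs with the operation names shown in the code); the real manifest passed by the ProgramRunnable would need to be parsed into something like it first.

```python
import os
import shutil


def apply_manifest(manifest, source_root, target_root):
    """Replay (op, relative_path) entries against target_root, in order.

    Hypothetical operations: 'mkdir', 'create', 'update', 'delete', 'rmdir'.
    Creates and updates copy the file from source_root.
    """
    for op, rel in manifest:
        src = os.path.join(source_root, rel)
        dst = os.path.join(target_root, rel)
        if op == "mkdir":
            os.makedirs(dst, exist_ok=True)
        elif op in ("create", "update"):
            shutil.copy2(src, dst)     # fails if the parent directory is missing
        elif op == "delete":
            os.remove(dst)
        elif op == "rmdir":
            os.rmdir(dst)              # only succeeds on an empty directory
        else:
            raise ValueError(f"unknown manifest operation: {op!r}")
```

Processing the entries strictly in sequence is what keeps order-dependent pairs (create directory A, then create file A/B.txt) safe; a multi-threaded variant would have to preserve that ordering explicitly.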

So there you have it - a (hopefully enlightening!) exploration of the intricacies of FSR deployment, as well as ways to mitigate some of the potential concerns with the default implementation.