Virtualization Server FAQ

Document created by resplin Employee on Jun 6, 2015

This page is obsolete.

The official documentation is at: http://docs.alfresco.com





What is the virtualization server for?


The primary role of the virtualization server is to support in-context preview of Alfresco managed content.  It is not, however, the only way to achieve this - the other options (along with a comparison between them) are described at WCM Preview.

Note that the other functions the virtualization server has been used for in the past (including dynamic Web Form includes) have been superseded by mechanisms that do not rely on the virtualization server, and those mechanisms are recommended going forward.  For that reason, if you're not using the virtualization server for in-context preview, it is redundant and should not be installed.




What does the virtualization server do?


The virtualization server's job is to allow data within certain portions of the AVM repository to be interpreted as a set of virtual websites, and to allow these websites to be browsed by end users (and QA tools) prior to deployment (or submission to a 'common staging area').  These websites may include virtualized servlets and JSPs, but completely static websites can be virtualized too (.html, .gif, .png, etc).




How do I configure the virtualization server?


See the wiki article: Configuring the Virtualization Server.


What's the proper way to start the virtualization server?


The virtualization server is a self-contained Tomcat 5.5 instance that listens on port 8180 and is part of the WCM tarball.  To get it running, download the Linux tarball for WCM, unpack it, and run virtual_alf.sh start.  The whole thing is self-contained under the virtual-server directory.

On Unix/Linux, first edit the startup script:

 vi /opt/Alfresco/virtual_alf.sh

  1. Change ALF_HOME=@@ALF_HOME@@ to ALF_HOME=/opt/Alfresco or wherever you installed Alfresco.
  2. Change export JAVA_HOME='@@JAVA_HOME@@' to export JAVA_HOME=/usr/java/jdk1.6.0_06/ or wherever your JDK is installed.

After these changes, the top of the script should look like this:

 #!/bin/sh
 # Start or stop Alfresco server
 # Set the following to where Tomcat is installed
 ALF_HOME=/opt/Alfresco
 # Double quotes let the shell expand $ALF_HOME; single quotes would not
 cd "$ALF_HOME"
 APPSERVER="$ALF_HOME"/virtual-tomcat
 export JAVA_HOME=/usr/java/jdk1.6.0_06/
 .........


Ideally, the virtualization server would be started as a service under /etc/init.d, but you'll have to write the init script yourself.

On Windows, the Alfresco 2.0 installer attempts to run the virtualization server as a console application; however, this can cause problems (e.g.: the server can hang because it is blocked on IO from logging).  While some have suggested introducing a pause during startup, this is also inappropriate, because it only postpones the inevitable hang.  The proper thing is to install the virtualization server as a Windows service, and start/stop it from the service manager console.

Here's how to set it up:


  • From a DOS prompt (signified by 'cmd>'), to install a service named 'alfrescoVirtualTomcat', do this:
             cmd> set JAVA_HOME=c:\java\jdk1.5.0_08
             cmd> cd c:\alfresco\virtual-tomcat
             cmd> bin\service.bat install alfrescoVirtualTomcat

  • You can adjust various service parameters (such as setting the 'Maximum Memory Pool' to 512 MB) by typing the following command and clicking on the 'Java' tab:
             cmd> bin\tomcat5w //ES//alfrescoVirtualTomcat

  • Because you'll be running virtual tomcat as a service, you must give SYSTEM full control over its home directory, c:\alfresco\virtual-tomcat. Therefore, go into Explorer (right click Start & select Explore), then navigate to c:\alfresco\virtual-tomcat, right click on it, select Properties, and then select the Security tab. From there, you can give SYSTEM 'full control'. Make the setting apply recursively.


  • Now from the Windows Services panel, you can start/stop the alfrescoVirtualTomcat service. You can use a similar technique to set up the Tomcat instance that runs the Alfresco webapp; however, be sure to do it within the 'tomcat' directory, not 'virtual-tomcat', and use a service name other than 'alfrescoVirtualTomcat'; a logical choice might be something like 'alfrescoTomcat'.  If you'd like, you can use the Windows 'Services' wizard to make it run automatically at system startup.  On XP see:
             Control Panel > Administrative Tools > Services

To modify a service setting, right click on the appropriate service, select 'Properties', click on the 'General' tab, then modify things such as 'Startup type'.  NOTE:  If you start the virtualization service automatically, be sure to make the Tomcat instance hosting the Alfresco webapp start automatically too; otherwise, set them both to 'Manual'. A bug has been filed on this (see:  WCM-422).

In some configurations, it was necessary to delay starting the virtualization server until after the Alfresco webapp was up (see  WCM-750).  This issue has been resolved in Alfresco 2.1E.


How do I run the virtualization server in debug mode?


Edit the configuration file:

$VIRTUAL_TOMCAT_HOME/conf/logging.properties 


Modify this file so that the following lines appear within it:

 org.alfresco.catalina.host.AVMHostConfig.level = FINE
 org.apache.catalina.startup.HostConfig.level = FINE

After you've saved this change, restart the virtualization server.




How do I create or import a totally minimal/static 'hello world' website just to get started?


Create index.html


In the WCM authoring environment, you can navigate to your author sandbox of an empty web project
that you've created, and click on the 'create' menu, then select 'create web content'. You can create
a file called 'index.html' of type HTML and then enter text 'hello world'.
Assuming your virtualization server is running, you can click on the icon that looks like an
eyeball (i.e. 'preview') and you'll see your one page 'website' virtualized from your sandbox.


Import war file


Alternatively, suppose you have a static website in your UNIX or Windows native file system.
Let's say this is in a directory called 'xxx', and it just has one
file called 'index.html'. This is about as simple a
website as you can get:

 % mkdir xxx
 % cd xxx
 % echo 'hello world' > index.html

Now, to import this website into Alfresco, you've got to bundle it up
as a war file. That's quite easy:

 % jar -cfv example.war *

Just for the fun of it, let's see what's in that war file:

 % jar -tvf example.war
      0 Thu Mar 22 12:51:20 EDT 2007 META-INF/
     71 Thu Mar 22 12:51:20 EDT 2007 META-INF/MANIFEST.MF
     12 Thu Mar 22 12:40:50 EDT 2007 index.html

Ok, now you've got a war file that contains the website you want
to import. Again, this war file (example.war) could have contained
lots of files & directories filled with servlets, JSPs, filters, and so
on, or it could have been something utterly mundane like a static
website. In the end, it's a war file -- that's all.

So now we have our war file, and it's time to import it using
'bulk import'. You can navigate to your author sandbox of a web project
that you've created, and click on the 'create' menu, then select 'bulk import'.
You'll be prompted to find the war file you just created in your native
UNIX/Windows file system.  Do the import, and that's it!  Again, assuming
your virtualization server is running, you can click on the icon that looks like an
eyeball (i.e. 'preview') and you'll see your website virtualized from your sandbox.

Put another way, just because you're importing a war file doesn't
mean the data *within* the war file has to be a full-blown webapp.
Simple websites can be virtualized too.


What's the 'ROOT' webapp, and how does it relate to the 'splash' page of my website?


The ROOT webapp is associated with the virtual host relative path '/'.
Suppose you only have a ROOT webapp, and you issue the request:

 http://mysite.www--sandbox.<whatever>:8180/foo/bar.html

The virtualization server knows you're talking about the AVM path

 <store-name>:/www/avm_webapps/ROOT/foo/bar.html

However, suppose you had a webapp named 'foo'. Now the same
request would be mapped to the AVM path:

 <store-name>:/www/avm_webapps/foo/bar.html

In other words, having a webapp named 'foo' makes /foo/bar.html
refer to the /bar.html file within foo (as opposed to the /foo/bar.html
within ROOT).  The fact that the 1st segment of the request path in the URL
either maps to a webapp name or a path within a webapp
(depending on whether or not you have a webapp by that name)
is something that I personally dislike, but that's the spec
(grumble grumble!).  To compound matters, servlet containers
get very unhappy in certain circumstances if you do not have a
ROOT webapp.
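
The mapping rule described above can be sketched in a few lines of Java. This is only an illustration of the rule, not the virtualization server's actual code: the class and method names (WebappPathMapper, toAvmPath) are hypothetical, the set of webapp names is supplied by hand, and treating ROOT as never addressable by name is an assumption based on how servlet containers usually handle the root context.

```java
import java.util.Set;

// Illustrative sketch only: names are hypothetical, not Alfresco APIs.
final class WebappPathMapper {

    // Maps a request path such as "/foo/bar.html" to an AVM path, given the
    // store name and the names of the webapps that exist in that store.
    static String toAvmPath(String store, Set<String> webapps, String requestPath) {
        String base = store + ":/www/avm_webapps/";
        // Strip the leading '/' and isolate the first path segment.
        String trimmed = requestPath.startsWith("/") ? requestPath.substring(1) : requestPath;
        int slash = trimmed.indexOf('/');
        String first = (slash < 0) ? trimmed : trimmed.substring(0, slash);
        String rest = (slash < 0) ? "" : trimmed.substring(slash);

        // If the first segment names a webapp other than ROOT (assumption:
        // ROOT is never addressed by name), it selects that webapp.
        if (!first.isEmpty() && !first.equals("ROOT") && webapps.contains(first)) {
            return base + first + rest;
        }
        // Otherwise the whole request path is resolved inside ROOT.
        return base + "ROOT" + (trimmed.isEmpty() ? "" : "/" + trimmed);
    }
}
```

With only a ROOT webapp, /foo/bar.html resolves inside ROOT; once a webapp named 'foo' exists, the same request resolves to /bar.html inside 'foo'.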

Therefore, here's what's done by the Alfresco GUI:


  • When you import your first webapp into a web project, the GUI forces you to make it the ROOT webapp. Thus, you've satisfied the most basic requirement of a well-configured virtual host.
  • If you want to import additional webapps into a web project, you are free to do so, and may name them whatever you'd like (typically, if your war file is named xyz.war, your webapp should be named xyz).



Unless you are doing something funky/snazzy/weird, you should
not be moving files into the avm_webapps dir directly. Instead,
just import your war normally via the web project's wizard.




What do the terms DNS, wildcard domain, website, virtual website, virtualization domain, and web project mean?


  • DNS

An acronym for 'Domain Name System'. The DNS protocol creates what amounts to a distributed telephone book for computer networks; however, instead of associating human names like 'John Q. Public' with telephone numbers (for example, 617-555-1212), DNS allows computer networks to associate domain names (for example, mail.yahoo.com) with IP addresses (for example, 192.168.1.5). Within Alfresco, the DNS Name you provide when creating a new web project is used to create a separate virtual website for each virtualized webapp.


  • Website

Everything reachable via the web at a given protocol/hostname/port. Each of the following URLs points to the splash page of a 'different' website:

       http://example.com/
       https://example.com/
       http://another-example.com/
       http://another.example.com/
       http://example.com:8080/

  • Virtual website

A website that shares the same IP address and port with other websites.  The server disambiguates which site is being accessed on the basis of the host name that the client used to reach it.  DNS allows arbitrarily many host names to be associated with a single IP address; there is no fixed limit on the number of virtual websites a server can host.


  • Wildcard domain

Within DNS it is possible to associate all subdomains of a domain with a common property, such as an IP address. For example, consider the following domains:

               example.com
           aaa.example.com
           bbb.example.com
       ccc.bbb.example.com

Note that aaa.example.com and bbb.example.com are two distinct subdomains of example.com. Further, ccc.bbb.example.com  is a subdomain of bbb.example.com. The following expression refers to all subdomains of bbb.example.com:

       *.bbb.example.com

Therefore, if we had a wildcard domain record within a nameserver that associated *.bbb.example.com with the IP address 192.168.1.5, then here are a few examples of domain names that would share this same IP address:

         ccc.bbb.example.com
     ddd.ccc.bbb.example.com
         eee.bbb.example.com

  • Virtualization domain

In Alfresco the virtualization domain is a wildcard domain that is used to associate a large number of virtual websites with a common IP address.


  • Web project

A set of virtual websites bound together in various ways to facilitate collaborative website development. Virtual websites that are part of the same web project share a common subdomain of www--sandbox.<virtualization-domain>.
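
The wildcard domain entry above boils down to a suffix test on the host name. The following is a minimal sketch of that test, assuming a plain string comparison is enough for illustration; it ignores finer points of real nameserver behaviour, and the class and method names (WildcardDomainSketch, matches) are hypothetical.

```java
import java.util.Locale;

// Illustrative sketch only: a simplified model of wildcard domain matching.
final class WildcardDomainSketch {

    // Returns true if host is a subdomain (at any depth) of the domain that
    // follows "*." in the wildcard, e.g. "*.bbb.example.com".
    static boolean matches(String wildcard, String host) {
        if (!wildcard.startsWith("*.")) {
            throw new IllegalArgumentException("Expected a wildcard of the form *.domain");
        }
        // Keep the leading dot so "xbbb.example.com" does not match.
        String suffix = wildcard.substring(1).toLowerCase(Locale.ROOT);
        String name = host.toLowerCase(Locale.ROOT);
        return name.length() > suffix.length() && name.endsWith(suffix);
    }
}
```

Under this model, ccc.bbb.example.com and ddd.ccc.bbb.example.com both match *.bbb.example.com, while bbb.example.com and aaa.example.com do not.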


Why are host names used to virtualize websites, rather than cookies, request paths, or query strings?


Most systems use one of the following methods to provide a 'virtual view' of a website:


  1. Cookies
    Issues:
    • All windows share the same set of cookies, so you can't have multiple views active at the same time.
    • It's easy to clear a cookie cache accidentally.
    • There's no easy way to send a URL plus its associated cookie to another person (e.g.: via email).


  2. Request path mangling
    Issues:
    • Complicated by the need to use proxies.
    • Security/DoS issues arising from the need to cache POST bodies to handle redirects.
    • Requires custom plugins to handle internal subrequests (e.g.: server side includes).


  3. Extra QUERY_STRING arguments
    Issues:
    • Integrates badly with 3rd party applications.
    • Requires taking special action to create 'bookmarkable' virtual links.
    • Namespace pollution of application-level POST/GET arguments.



Instead, Alfresco encodes the information required for virtualization within a 'virtual hostname'.  This approach has several advantages:


  • Side-by-side comparisons of different virtual views can be done within the same browser instance
  • The browser's built-in bookmarking mechanism works (no need for javascript 'URL constructors')
  • Virtualized hyperlinks can be sent via email
  • The virtual view is plainly visible in the URL itself



The general format of an Alfresco virtual hyperlink is:

 http://virtual-hostname.www--sandbox.virtualization-domain:port/request-path 

In order for this to work, virtualization-domain and all its subdomains must resolve in DNS to the IP address of the Virtualization Server.  This is sometimes known as a 'wildcard DNS' address mapping.   There are two ways to achieve this:


  1. Use the appropriate subdomain of ip.alfrescodemo.net
  2. Configure a nameserver, and set up a wildcard domain pointing at your Virtualization Server machine's IP address.

While the second method will let you use your Virtualization Server even when disconnected from the Internet, many people will find the first method easier.

It's fairly obvious to most people why using cookies and/or extra query string arguments is a bad way to do virtualization (see above).  However, the reasons against embedding virtualization information in the request path portion of a URI are more subtle.  For starters, this technique requires every integration for which you forward-proxy to embed a custom plugin that duplicates the same logic for internal subrequests (e.g.: SSIs). You also lose the contents of POSTs across redirects, unless you cache these arbitrarily long POST bodies across redirects for N clients in parallel, thereby exposing yourself to a range of scalability issues and potential security problems. Even then, you'd *still* have to rely upon cookies, because when pages contain frames, some browsers get confused and send back bogus 'Referer' headers, thereby busting virtualization. Path-based virtualization also uses network resources less efficiently, due to the large number of redirects required to keep requests in 'canonical request-path-mangled form'; if you don't do this, subsequent clicks end up with the wrong 'Referer' if you traverse a link of the form href='/...' (thereby propagating the wrong value into your cookie).

The only really 'tricky' problem created by embedding virtualization information into the hostname is how to make it easy for users to set up a DNS wildcard that resolves all subdomains of your 'virtualization domain' back to the same place.  The solution to this problem is the free/public EchoDNS service provided at ip.alfrescodemo.net.




What's EchoDNS?


EchoDNS is a special-purpose nameserver developed at Alfresco. It infers the IP address to return in response to a lookup request from the host label prior to the zone it's serving (and any subdomains of that host label).  This IP-bearing host label is expected to contain digits separated by hyphens.  The domain 'ip.alfrescodemo.net' is served by EchoDNS nameservers ('ns1.alfrescodemo.net' and 'ns2.alfrescodemo.net').  Thus, suppose EchoDNS is asked to translate the following FQDN into an IP address:

    alice.mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net

Its response would be:

    192.168.1.5

This is because the host label immediately prior to ip.alfrescodemo.net is:

    192-168-1-5
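
The lookup rule just described can be illustrated with a short Java sketch. It is not the EchoDNS source code, only a model of the idea: the class and method names (EchoDnsSketch, resolve) are hypothetical, the zone is passed in explicitly, and the validation rules (exactly four decimal parts, each in the range 0 to 255) are assumptions.

```java
import java.util.Locale;
import java.util.Optional;

// Illustrative sketch only: this is not the EchoDNS source code.
final class EchoDnsSketch {

    // Returns the dotted IPv4 address encoded in the host label that
    // immediately precedes the zone, or empty if the name does not fit.
    static Optional<String> resolve(String fqdn, String zone) {
        String name = fqdn.toLowerCase(Locale.ROOT);
        String suffix = "." + zone.toLowerCase(Locale.ROOT);
        if (!name.endsWith(suffix)) {
            return Optional.empty();
        }
        // Everything before the zone, e.g. "alice.mysite.www--sandbox.192-168-1-5".
        String prefix = name.substring(0, name.length() - suffix.length());
        // The last label of that prefix is the IP-bearing host label.
        String label = prefix.substring(prefix.lastIndexOf('.') + 1);

        String[] parts = label.split("-");
        if (parts.length != 4) {
            return Optional.empty();
        }
        for (String part : parts) {
            // Each part must be a decimal number in the range 0..255.
            if (!part.matches("\\d{1,3}") || Integer.parseInt(part) > 255) {
                return Optional.empty();
            }
        }
        return Optional.of(String.join(".", parts));
    }
}
```

Any labels to the left of the IP-bearing label are ignored, which is what makes every subdomain of 192-168-1-5.ip.alfrescodemo.net resolve to the same address.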

Alfresco virtualizes content on the basis of name-mangled hostnames (rather than using cookies or request-path mangling).  Therefore, it's important to be able to map a wildcard domain back to a specific IP address.  The problem is, which IP address?  The answer to that question really depends upon the IP address of the machine you've installed Alfresco's virtualization server on.  Ideally, people in a larger organization would have installed Tinydns, or BIND, or some other nameserver, and they could just configure things for themselves.  However, what about those who just don't want the hassle?  Note:  /etc/hosts is not a viable option because it lacks support for wildcarding;  you really do need to use a nameserver for such tasks.

The common way out of an install problem like this would be to use a nameserver on the Internet that allows wildcarding, and just take advantage of *their* setup.  Unfortunately (but understandably), companies like  http://dyndns.com require you to register with them, fill out a password, etc.  That's a hassle too... plus hosting something like this carries with it security issues due to the updates.

EchoDNS allows Alfresco to provide a wildcard DNS domain for all possible IP addresses, without requiring users to install and/or configure a name server themselves.  Further, there's no need to register a dynamic wildcard domain, because the 'answer' (e.g.: 192.168.1.5) is embedded in the question itself (e.g.:  alice.mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net).

Your virtualization domain can be configured within:

   $VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties

By default, this configuration file sets the virtualization domain to:

    alfresco.virtserver.domain=127-0-0-1.ip.alfrescodemo.net

However, you can (and typically should) change this to allow any browser in your LAN to access the virtualization server.  When the virtualization server starts up, it registers itself with the Alfresco webapp, and tells it the value of your alfresco.virtserver.domain property.  The Alfresco webapp then uses this value when creating clickable 'preview' links (cf. the 'eyeball' icon).
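
To make the relationship between the virtualization domain and a preview link concrete, here is a small sketch that assembles a link in the general format given earlier. It only models the string format; it is not how the Alfresco webapp builds its links, and the class and method names (PreviewLinkSketch, echoDnsDomain, previewUrl) are hypothetical.

```java
// Illustrative sketch only: class and method names are hypothetical.
final class PreviewLinkSketch {

    // Builds an EchoDNS-style virtualization domain from a dotted IPv4
    // address, e.g. "192.168.1.5" -> "192-168-1-5.ip.alfrescodemo.net".
    static String echoDnsDomain(String ipv4) {
        return ipv4.replace('.', '-') + ".ip.alfrescodemo.net";
    }

    // Assembles a link in the general format
    // http://virtual-hostname.www--sandbox.virtualization-domain:port/request-path
    static String previewUrl(String virtualHostname, String domain, int port, String requestPath) {
        return "http://" + virtualHostname + ".www--sandbox." + domain + ":" + port + requestPath;
    }
}
```

For a LAN machine at 192.168.1.5, the value produced by echoDnsDomain is what you would set alfresco.virtserver.domain to.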

For more details, see:  Configuring the Virtualization Server.


Can the virtualization server run 'stand alone' on a machine with no connection to the Internet?


Yes.  If you want to work on a machine with no access to the Internet (thus precluding using XXX-XXX-XXX-XXX.ip.alfrescodemo.net as your virtualization domain), then you should set up your own name server (or reconfigure your existing one) so that it includes a wildcard domain for virtualization. If you rely on a hack like editing /etc/hosts, then with some pain you'll be able to set things up for a static set of users & staging areas, but reviewers will never be able to preview content that comes to them via a workflow. This is because workflows use ephemeral AVM stores with ephemeral DNS names -- it's an ever-changing set of GUIDs. Unfortunately, there's no way to create a DNS wildcard in the 'hosts' file... you must use a nameserver (e.g.: Djbdns, or BIND).

As mentioned in Configuring the Virtualization Server, you can configure BIND on Windows or UNIX to create a wildcard DNS domain for virtualization. If you're on a UNIX platform, you have at least two free/common options: BIND and Djbdns.

Shameless plug:  Djbdns (tinydns + dnscache) is secure, stable, and well-designed.  No security flaws have been seen in Djbdns for the past few years -- despite a $500 reward for the first person to find one.  By contrast, a fresh set of horrible security issues is discovered with each new release of BIND.  If you're on Unix/Linux, and you don't already use Djbdns, consider the merits of switching.  On the down-side, the docs for Djbdns are only so-so, and the config may strike you as a little strange at first (Daniel Bernstein has a different approach to daemons than most folks in the world of UNIX). It may take you a day or two to feel like you've got the whole thing under your belt.  Once you've gotten over this initial hurdle, it's very smooth sailing.

There's no denying one thing: DJB's programs are well thought-through and executed with a level of care that has won him a devoted following.




Can the virtualization server be on a different machine than the one hosting the Alfresco webapp?


Yes.  Edit the configuration file:

  $VIRTUAL_TOMCAT_HOME/conf/alfresco-shared.properties  

Within this file, set the property alfresco.rmi.services.host to the name of the host running the AVM, and ensure that alfresco.rmi.services.port is correct.

When you restart the virtualization server, it will register itself with the Alfresco webapp at alfresco.rmi.services.host, and let WCM know that it can be contacted back at all subdomains of alfresco.virtserver.domain.


How can you simplify the URLs that end-users see?


URLs to virtualized assets embed important information that is needed to determine the store and version being accessed.  That said, if you want to create a pretty-looking facade for a single store/version, this can be accomplished using a reverse proxy.

For example, suppose we have a sandbox for the user alice within the web project 'mysite', which is hosted by a virt server on port 8180 of a machine whose IP address is 192.168.1.5.  Normally, you'd access it via your browser using a URL that looks something like this:

  http://alice.mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net:8180/

Instead though, suppose we want this to appear to the outside world as:

  http://example.com/

All you'd need to do is set up Apache2 as a reverse proxy on example.com that fetches its real data from the virt server. On example.com, your Apache2 config would look like this:

   ProxyRequests off
   ProxyPass           /  http://alice.mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net:8180/
   ProxyPassReverse    /  http://alice.mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net:8180/

Now, suppose a user in the outside world goes to:

   http://example.com/moo/cow.html

What happens is Apache 2 on example.com fetches data from:

   http://alice.mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net:8180/moo/cow.html

That data is then served back to the user. The idea is that you can hide a backend server (i.e.: the virt server) behind a front end (i.e.: apache2), and the end user's browser will never know. Thus you can make the URLs look however you wish.  Typically, you'd do this for the staging sandbox, not a user sandbox, so in the real world our config might look more like this:

   ProxyRequests off
   ProxyPass           /  http://mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net:8180/
   ProxyPassReverse    /  http://mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net:8180/

Once you get your mind around configuring Apache2,  there's a lot of power here!


Is the virtualization server's use of wildcard DNS compatible with HTTPS?


Yes. You need to use an X.509 wildcard cert.  They are easy to create/buy, and browsers work with them nicely.  If you want to experiment with this yourself, just install openssl; when you get to the step of creating the actual certificate request, do something like this:

  openssl req -new -x509 -keyout demoCA/private/cakey.pem -out demoCA/cacert.pem -days 3652

You'll be asked to specify the 'Common Name' (or 'CN') in the key signing request; just use a wildcard at the appropriate subdomain level (e.g.: *.your-virtualization-domain ). It's that simple.  It works like any other cert.




What is the difference between a native file system path, a JNDI path, and an AVM path?


The term 'native file system path' refers to a file or directory within the 'native file system' of the host computer. Absolute native file system paths on UNIX begin with a  '/' character.  For example:

 /etc/passwd

On Windows, absolute file system paths begin with a drive letter followed by a colon.  For example:

 c:/WINDOWS/win.ini

Java Naming and Directory Interface (JNDI) is a Java-based API that provides the ability to construct very generic/abstract 'naming and directory services'.  JNDI is just an interface;  it is the pluggable implementation behind this interface that determines exactly what is being 'named', and how the 'lookup' operation is performed.  JNDI can be used for naming files in a file system, users in LDAP, DNS, NDS, or even objects in an arbitrary data structure.

Rather than access files and directories via APIs specifically geared toward native file system paths, Tomcat uses abstract JNDI-based APIs, in combination with a 'pluggable' concrete implementation that interprets the 'names' given to it as native file system paths; from there, the filesystem-specific implementation uses normal file system APIs to fetch the result, and returns it back to the abstract JNDI-based interface invoked by the user.

The beauty of JNDI is that because the interface is abstract, the user-level code that relies upon it does not need to know or care how the actual 'lookup' is done.  Therefore, it's possible for Tomcat to combine the abstract JNDI interface with different concrete implementations for different purposes.  For example, one way Tomcat uses this generality is to allow it to fetch data from unexpanded '.war' files.

Alfresco's Tomcat-based virtualization server replaces the default concrete JNDI implementation (which accesses the local native file system) with one that accesses the AVM via calls to AVMRemote.  Because of the generality of JNDI, 'names' can really look like just about anything (e.g.: 'names' for LDAP don't look like file system paths at all).  However, the virtualization server uses a JNDI naming scheme that happens to look exactly like native file system paths.  Therefore, if you call the standard servlet function getRealPath() what you'll see is a JNDI name that looks like a valid file system path for whatever operating system is hosting the virtualization server.  For example, on Unix, you might see a path like:

 /media/alfresco/cifs/v/mysite--alice/VERSION/v-1/DATA/www/avm_webapps/ROOT/index.html

The interesting thing to note is this:  you can still preview this index.html file in your browser via the virtualization server even though there is no such 'file' visible in your native file system!  This is because getRealPath() returns a JNDI name (which is 'abstract'), and this name only looks like a native file system path.  The concrete JNDI binding used to fetch this 'name' employs AVMRemote, not the native filesystem-centric APIs.  Behind the scenes, this 'name' is translated into an AVMRemote call that fetches the data within index.html by transforming the JNDI path into an AVM path and a version number.  In this example, the associated AVM path is:

 mysite--alice:/www/avm_webapps/ROOT/index.html     

To look anything up within the AVM, you must always specify a version number and an AVM path. This is because the AVM is a versioning content repository.  The special version number '-1' refers to the 'latest' contents.  All other version numbers are non-negative integers.  The AVM path itself is composed of two components:  the virtual repository name, and the repository-relative path;  a colon is used to delimit these two pieces of information.  Therefore, in the previous example, the virtual repository (aka: 'store') being accessed is:

 mysite--alice

The repository-relative path is:

 /www/avm_webapps/ROOT/index.html
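
Splitting an AVM path at the colon, as described above, can be sketched like this. The class name AvmPathParts and its fields are hypothetical and exist only for illustration; they are not part of the Alfresco API.

```java
// Illustrative sketch only: names are hypothetical, not Alfresco APIs.
final class AvmPathParts {
    final String store;         // virtual repository (store) name
    final String relativePath;  // repository-relative path

    // Splits an AVM path of the form <store>:<repository-relative path>
    // at the first colon.
    AvmPathParts(String avmPath) {
        int colon = avmPath.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("Expected <store>:<path>, got: " + avmPath);
        }
        store = avmPath.substring(0, colon);
        relativePath = avmPath.substring(colon + 1);
    }
}
```

Remember that the path alone is not enough for a lookup: a version number (for example -1 for the latest contents) must always accompany it.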

Many webapps never call getRealPath() because it is considered a 'best practice' to avoid making assumptions about whether or not the application server has expanded your webapp's .war file; instead, such webapps merely generate pages with URLs, which are then handled via the built-in JNDI binding mechanisms.  However, because many people are not too familiar with JNDI, sometimes it's easier for them to use filesystem-centric functions directly.  In other cases, a webapp uses a library that makes direct calls to the file system, so it can't control what's being done.  In situations like this, you'd really like the JNDI name to correspond to an actual file in the native file system.

The JNDI names used for files and directories within the AVM have been designed so that they contain all the information needed by AVMRemote (the version number, the store name, and the store-relative path), while still conforming to the constraints imposed by the host operating system for native file system paths.   Alfresco has the ability to mount the AVM using a CIFS client.  If your CIFS mount point agrees with the value you've specified in $VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties, then your webapp can use the JNDI names returned by getRealPath() just as if they *were* 'native file system paths'.   CIFS handles the mapping to AVM assets for you. 

By default, the CIFS mount point on UNIX is:

  /media/alfresco/cifs/v

By default, the CIFS mount point on Windows is:

  v:

For more details, see the internal documentation within
$VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties

The utility class org.alfresco.util.JNDIPath allows you to manually translate between a JNDI name and its associated AVM version and path.  This allows webapps to use the full power of AVMRemote, and eliminates the need for setting up a CIFS mount.




Is the servlet method getRealPath() supported?


If you want to use the JNDI path you fetched from getRealPath() directly as a valid file system path, then you'll need to create a CIFS mount so that when Java goes to look for your file, that 'file system' will actually be there.  If you try to use the JNDI path you get from getRealPath() as if it were a valid native file system path without having created a CIFS mount in the location indicated by $VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties, you'll get null pointer exceptions.  While using a CIFS mount is a nice way of working around a webapp that relies on a file system, it has a few disadvantages. For one thing, you don't have access to the full AVMRemote API, and for another, it creates another out-of-band config to get right.

However, you could avoid all reliance on CIFS mounts if you translate the JNDI path you've gotten back from getRealPath() into an AVM path.  If you do this, you can then use AVMRemote to fetch whatever you want (and have the full power of the remote API at your disposal, not just what's exposed in CIFS).  The JNDI names used to reference all assets in the AVM begin with a string whose value you can fetch via the static method:

org.alfresco.jndi.AVMFileDirContext.getAVMFileDirMountPoint()

This mount point can then be used to parse the JNDI path you get from the servlet method getRealPath(), producing an AVM version & AVM path that can be passed as args to AVMRemote. The parsing is done by:

 org.alfresco.util.JNDIPath

This class is available within:

 $VIRTUAL_TOMCAT_HOME/common/lib/alfresco-jndi-client.jar

Therefore, it is already in your webapp's classpath, and you can use it from within your webapp. Here's the constructor:

 public JNDIPath(String mount_point, String jndi_path)

Therefore, you could say:

 // Inside a servlet; JNDIPath is org.alfresco.util.JNDIPath
 String mount_point = org.alfresco.jndi.AVMFileDirContext.getAVMFileDirMountPoint();
 String real_path   = getServletContext().getRealPath("/");  // or any other webapp-relative path
 JNDIPath p         = new JNDIPath(mount_point, real_path);

Now do whatever you want with 'p.getAvmVersion()' and 'p.getAvmPath()'.

For example, on UNIX, if the constructor args are:

 mount_point == /media/alfresco/cifs/v
jndi_path   == /media/alfresco/cifs/v/mysite/VERSION/v-1/DATA/www/avm_webapps/ROOT

Or in Windows, if the constructor args are:

 mount_point == v:
jndi_path   == v:/mysite/VERSION/v-1/DATA/www/avm_webapps/ROOT

Then:

 getAvmVersion() == -1
getAvmPath() == mysite:/www/avm_webapps/ROOT
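
To make the mapping in the example above concrete, here is a minimal sketch that reproduces it with plain string handling. It is not the implementation of org.alfresco.util.JNDIPath; the class name JndiPathSketch and its methods are hypothetical, and it assumes the path always has the form <mount point>/<store>/VERSION/v<N>/DATA/<path> shown above.

```java
final class JndiPathSketch {

    /** Splits the part of the JNDI path that follows the mount point into segments. */
    private static String[] segments(String mountPoint, String jndiPath) {
        if (!jndiPath.startsWith(mountPoint + "/")) {
            throw new IllegalArgumentException("Path is not under the mount point: " + jndiPath);
        }
        // e.g. "/mysite/VERSION/v-1/DATA/www/avm_webapps/ROOT"
        String rest = jndiPath.substring(mountPoint.length());
        // A limit of 6 keeps everything after "DATA/" together in the last element.
        String[] parts = rest.split("/", 6);
        boolean wellFormed = parts.length >= 5
                && "VERSION".equals(parts[2])
                && parts[3].startsWith("v")
                && "DATA".equals(parts[4]);
        if (!wellFormed) {
            throw new IllegalArgumentException("Unexpected path layout: " + jndiPath);
        }
        return parts;
    }

    /** Returns the number that follows the leading 'v', e.g. "v-1" gives -1. */
    static int avmVersion(String mountPoint, String jndiPath) {
        return Integer.parseInt(segments(mountPoint, jndiPath)[3].substring(1));
    }

    /** Returns "<store>:/<path below DATA>", e.g. "mysite:/www/avm_webapps/ROOT". */
    static String avmPath(String mountPoint, String jndiPath) {
        String[] parts = segments(mountPoint, jndiPath);
        String tail = parts.length == 6 ? parts[5] : "";
        return parts[1] + ":/" + tail;
    }

    public static void main(String[] args) {
        String mount = "/media/alfresco/cifs/v";
        String path = mount + "/mysite/VERSION/v-1/DATA/www/avm_webapps/ROOT";
        System.out.println(avmVersion(mount, path)); // -1
        System.out.println(avmPath(mount, path));    // mysite:/www/avm_webapps/ROOT
    }
}
```

Inside a real webapp you would use JNDIPath itself, since it is already on the classpath; the sketch only illustrates what the two getters return.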

From here, you can use the values returned by getAvmVersion() and getAvmPath() to query AVMRemote.  Note that the value for the JNDI mount point is configured in $VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties.

The property on UNIX is:

 alfresco.virtserver.cifs.avm.versiontree.unix

The property on Windows is:

 alfresco.virtserver.cifs.avm.versiontree.win




Is the virtualization server limited to Java-based websites?  What about php or .net?


There are two different senses of 'virtualization': low-level 'content virtualization' (provided by the AVM), and higher-level 'website virtualization' (provided by the Tomcat-based virtualization server). Currently, support for virtualizing .net & php websites does not include integration with the Alfresco GUI. However, if all you want to do is see the content within a specific set of workareas rendered within your webserver (e.g.: php pages served using Apache 2), then you can do the following:




  •   Create a CIFS mount to expose the contents of the AVM.  Now any webserver (not just Alfresco's virtualization server) can access this data just as it would any other set of files in an ordinary file system.
  •   For each AVM 'store' you want to be able to browse in your non-tomcat web server (e.g.: Apache 2), configure a virtual host and use the appropriate path within CIFS as your docroot for that virtual host.  For example, in Apache 2, you'd configure separate <VirtualHost> entries by hand for each area you want to be able to browse.
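
As a rough illustration of the second step, a single hand-written virtual host might look like the fragment below. The server name, the port, and the CIFS mount point /media/alfresco/cifs/v are assumptions made for this example, as is the choice of the 'mysite--alice' store; substitute the values from your own installation.

```apache
# Hypothetical example: one virtual host for the 'mysite--alice' store.
# ServerName and the CIFS mount point are assumptions, not Alfresco defaults.
<VirtualHost *:80>
    ServerName alice.mysite.example.com
    DocumentRoot "/media/alfresco/cifs/v/mysite--alice/HEAD/DATA/www/avm_webapps/ROOT"
</VirtualHost>
```

Depending on your Apache version and its base configuration, you may also need access-control directives for the docroot directory. You would repeat a block like this for every store you want to browse.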



This approach has some obvious limitations: all virtualized areas must be configured within your content server (e.g.: Apache 2) as a virtual host manually. It also won't scale as nicely as the virtualization server does in terms of having many different users automatically sharing the same underlying libraries (the virtualization server does some fancy footwork to share the jar files in the staging WEB-INF/lib directory with the user-level workareas when they're identical). Further, you won't be able to just click on a URL in the Alfresco GUI and see a window pop up with your website in it.

However, if you have a limited number of author workareas & staging areas, using CIFS to expose the contents of the AVM as a file system does work quite well -- it is surprisingly fast. If you don't want to go through the hassle of trying to customize the Alfresco GUI, there's a simple low-tech solution that might be adequate for you in the short term: bookmarks.

The Tomcat-based virtualization server can deal with static sites, and full-blown java-based websites (webapps/servlets/jsps, etc).  Providing a rich integration for virtualizing Apache 2 might be the next logical step; the goal would be to make it so that you'll end up with a different Apache 2 virtual host per area automatically, just like the current Tomcat-based virtualization server does for webapps.  Another goal would be to make it easier to configure an Apache 2 / Tomcat stack.  If this work gets underway in earnest, announcements will be made & this page will be updated.

Incidentally, the bias towards Apache 2 over 1.x is due to the fact that Apache 2 works nicely on both Windows & UNIX/UNIX-like operating systems, while Apache 1.x behaves poorly on Windows (it's slow & buggy).


Are files edited/created in CIFS immediately visible via the virtualization server?


Yes.  However, there are two potential issues you must be aware of:


  1. Client-side (browser) caching
  2. Modifications within the WEB-INF and META-INF directory won't be visible unless you reload the webapp

Most web developers are familiar with client-side caching issues.  A common solution is to just hit refresh (CTRL-R) within the browser.   A more drastic, and less efficient solution is to turn off the browser's cache entirely for some amount of time;  this is seldom a good idea, particularly because it's possible to configure the virtualization server to inject different Cache-Control headers into its responses for each type of workarea (author, preview, workflow, and staging).  For more details, see the 'Cache-Control parameters' section of
$VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties

Updates to WEB-INF and META-INF via CIFS are another matter.  When you make modifications to the contents of WEB-INF or META-INF via the browser-based interface provided by the Alfresco webapp, the webapp issues JMX messages to the virtualization server informing it that the webapp must be reloaded  (along with any other virtual webapps that are transparently layered over it).   However, if you modify the contents of your webapp's WEB-INF or META-INF directory via CIFS  (e.g.:  adding/changing a jar file), the virtualization server will have no idea it needs to refresh the set of classes loaded on behalf of your webapp.  Therefore, you must go back to the Alfresco webapp GUI and hit 'Refresh' manually on the appropriate sandbox in your web project.



EXAMPLE:

Given:


  • Your webapp war file is called alfresco-sample-website.war
  • Your web project is called 'test'
  • Your CIFS mount point is the 'v' drive
  • You want this to be the ROOT webapp of your 'test' web project



Navigate to:

 v:/test/HEAD/DATA/www/avm_webapps/ROOT

And say:

 jar xvf  alfresco-sample-website.war

If you've done this properly, you should see the WEB-INF and META-INF within:

 v:/test/HEAD/DATA/www/avm_webapps/ROOT

At this point, you can go back to the GUI and hit the 'Refresh'
control associated with the staging sandbox of your web project.
Once the virtualization server has reloaded it, you should be
able to click on the 'eyeball' icon in staging (or any of the
associated author sandboxes) and see your webapp virtualized.

Again, this is only necessary if you upload content via
CIFS that modifies the META-INF or WEB-INF directories.
Otherwise, you can modify content directly in CIFS and/or
the GUI and no special action needs to be taken by the user.



DEBUGGING:

If you really want to see what's going on at a deeper level, you can edit:

$VIRTUAL_TOMCAT_HOME/conf/logging.properties

Then set:

 org.alfresco.catalina.host.AVMHostConfig.level = FINE

However, changing the debug level of the virtualization server requires you to restart it;
you only need to restart the virt server, not the alfresco webapp, if you want to employ
this low-level debugging technique.




How fast is the virtualization server compared to a 'normal' version of Tomcat?


The virtualization server is slower at deploying web applications than a normal version of Tomcat, since each web application is read out of the Alfresco repository via RMI rather than being read directly off disk (as happens with a vanilla installation of Tomcat).  Once deployment is complete however, performance should be fairly comparable.  You can test this for yourself quite easily by doing a CIFS mount, copying the contents of a webapp to a native file system, and pointing a 'pristine' version of Tomcat at it.   From there, a cheap and cheerful way to do a comparison on your hardware is to just point a spider at the splash page of each server.


Is the virtualization server required in the deployed/production environment?


No.




Can the virtualization server be used in the deployed/production environment?


Technically yes, although this is not recommended by Alfresco.  The thing you've got to assess for yourself is the performance and scalability of the virtualization server in your environment, with your traffic and load characteristics. There are many ways to measure this, and many different kinds of load your site might experience depending upon the nature of your content, and the pattern of use experienced by the site. Some sites are very bandwidth heavy, some demand more CPU and/or memory, some tend to get more bursts of activity from users than others, and so on.

Here are some things to consider:


  •   HTTP caches can make a dramatic difference
    • Be sure you understand caching in HTTP
    • Apache2 has mod_cache but there are alternatives.
    • Use URLs consistently (i.e.: same content == same url, when possible)
    • Avoid replacing files on update when contents have not changed
    • Prefer GET over POST (some caches don't store POST responses)
    • Limit cookie use to dynamic pages
    • Avoid embedding user info in URLs
  • The features/performance trade-off of using AVM versus deploying to native file system
  • Separation of development environment from live environment



The last two points are probably worth discussing in more detail. While serving content directly out of an Alfresco AVM instance will offer you a wider set of features, nothing is going to beat the performance of deploying your webapp to a native file system and serving content directly off of that (e.g.: on UNIX, an ext3 file system; on NT, NTFS). Whether you want to serve content out of the AVM directly or off of a native file system, it's advisable to make your entire live/customer-facing infrastructure totally independent of the one you're using for development (e.g.:  machines, processes, databases, disks, subnets, etc.). The central idea is to arrange things so that no matter how hard the users or developers pound on things, no matter what security issues arise, what network traffic is generated, or what sort of maintenance/upgrade/reboot shenanigans are necessary from time to time, doing something to one system does not affect the other in any way.

If you are going to serve content directly out of the AVM, caching might be more important. There are so many variables when it comes to this stuff that the best advice is to test with a realistic load, measure, tune, and experiment with the many options available to you. Once you've done that, throw your system some curve-balls such as alternating periods of quiescence and high load, long sustained periods of activity, lots of simultaneous requests, fewer but heavy requests, and so forth.  The approach(es) that work best will depend upon the sheer scale of what you are doing, the audience you're serving, and your site's technical & business requirements.


Can links to external websites be virtualized?


If you are willing to suffer some performance penalty, you can map external links back into a sandbox. It might be nice to have a browser-specific plugin to do this (it would be faster and much more scalable), but there is a forward proxy solution based on Apache2 you could use right now.

For more details, see: WCM-128




Can the virtualization server help me detect dead links in my webapp?


Yes, it already does.

The link validation feature built into the Alfresco 2.1 webapp GUI uses the virtualization server to fetch pages in the website.  These pages contain hyperlinks, which are validated as well.  Note that because link validation is done by fetching the page as a browser would, both static and programmatically generated links are validated.  In order to build self-consistent link validation reports, the pages being virtualized within the staging area are all taken from the same snapshot.   Thus, the combination of AVM's built-in versioning feature and the virtualization server's ability to virtualize an archived edition makes it possible to construct validation information incrementally (from checkin to checkin), and eliminate all 'version skew' from the reports it creates.




What can't be virtualized in a fully automatic way?


There are some limits to what can be virtualized; for example, if you have a singleton of any sort (not just a java singleton), then it will be shared by all virtual webapps based off of the project's staging webapp.

For example, suppose your webapp writes to a database table.  If users Alice and Bob are viewing their 'separate' virtualized instances of the webapp within their own sandboxes, and Alice does something that makes this webapp modify the 'moo' table, then Bob will see the change Alice has made to the 'moo' table immediately.  In other words, the virtualization server can virtualize files but can't magically restructure arbitrary hard-coded programs.




How are XForms in the Alfresco webapp related to the virtualization server?


Currently, Alfresco's ability to collect XML data via browser-based XForms that are auto-generated from XSDs is tied to the Alfresco webapp.  The current integration also makes it possible to associate the data that's collected with one or more 'data rendering' templates (such as XSLT or Freemarker);  this allows you to do things like produce HTML output in one easy step.   From there, the current integration can also create a URL to these generated web pages that allows them to be previewed via the virtualization server.

Eventually, we'd like to repackage things in such a way that the websites you're building can use these auto-generated forms too.




Why do URLs to output derived from data captured by XForms begin with 'preview.' ?


Any data rendering template that is powerful enough to allow for programmatic extensions (e.g.:  XSLT, Freemarker) is also powerful enough to generate an arbitrary set of output.  For example, it is possible to create a Freemarker template that produces multi-paginated output (several html files connected via next/previous links).  It's also possible for the names of a template's output files to be based on arbitrary dynamic logic and/or manipulations of the XML data itself.

Note also that it's fairly common for someone who is exploring different visual tradeoffs to want to experiment with 'what if' scenarios.   Depending on the template, seemingly minor changes to the data or the template's internal logic could cause a very different set of output files to be created.   The problem then becomes allowing users the freedom to experiment with different visual possibilities while ensuring that none of their tests overwrite 'precious' data in their workarea  (or leave detritus behind afterwards).  The solution is this:


  • Each author sandbox consists of two 'stores':  its 'main' store and its 'preview' store.
  • The preview store is a transparent overlay on the main store.
  • The main store is a transparent overlay on the staging sandbox's main store.
  • Initially, the Alfresco webapp puts template output into the author's preview store.

Thus, you can consider the 'preview' store within an author's sandbox as a sort of 'scratch area'.  Suppose you have a web project named 'mysite', and an author sandbox named 'alice'.   A URL like the following one is like looking through a transparent 3-layer sandwich:

  http://preview.alice.mysite.www--sandbox.<virtualization-domain>.../

A URL like the following one will omit the contents of the 'preview' layer within the 'alice' sandbox entirely:

  http://alice.mysite.www--sandbox.<virtualization-domain>.../

This arrangement makes it possible for the user to 'accept' the output of the template (thereby moving it into the 'main' working layer of the 'alice' sandbox), or to reject it without fear that this experiment has polluted the sandbox's main working layer in any way.

Incidentally, you might also notice that even if your browser's cache time is long, by default it won't end up caching the contents of a preview layer for more than a few seconds.   This feature allows you to do multiple 'what-if' experiments in sequence on the same file without requiring the user to do an explicit browser refresh (CTRL-R).  See the 'Cache-Control parameters' section within $VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties for more details.




How can I configure the virtualization server to manage client-side caching?


The 'Cache-Control parameters' section within $VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties explains all this, but here's the same information in wiki form:



The virtualization server limits how long a browser may cache items received from different virtual hosts by injecting 'Cache-Control: max-age=...' HTTP headers in its responses.  The actual value used for max-age depends upon what the name of the virtual host is.  By default, low values are used for 'preview' hosts, and longer values are used for 'staging'.  You can tune these settings here.

For generic browsing on the Internet, a user might choose to configure  their browser to cache data for a very long time.  It's a personal preference:  the longer an item is permitted to be cached, the better the browser's performance is likely to be.  However, it also becomes more likely that 'stale' versions of files will be viewed.

When browsing content on the virtualization server, the trade-offs are not only different from 'generic' Internet browsing, they're also different for each major category of virtual host (i.e.:  'staging', 'preview', etc.).

Consider this:


  •   The probability of viewing 'stale' cached data is the probability that file XYZ changes multiplied by the probability you happen to look at XYZ within  some allotted time interval.
  •   Content in a virtual host devoted to 'preview' operations is almost certain to be viewed within a few seconds of being created/modified.  However, you do want a little bit of caching to handle multiple repeated images within a single page in an efficient manner.
  •   Content in a virtual host devoted to 'staging' may be changing, but the likelihood of viewing any given page is fairly low.  Staleness in 'staging' is also better tolerated because the user probably isn't working on these files, and hence won't know/care if they're a bit out of date.  Thus, the performance/freshness trade-off is different.

If the user does encounter 'stale' data, they can always hit their browser's 'refresh' button.   The goal of these settings is to help them avoid having to explicitly refresh *most* of the time.

If an administrator errs too much on the side of short max-age values, performance will suffer.  Browser caches are particularly important on slow links.

If an administrator errs too much on the side of lengthy max-age Cache-Control settings, users might end up turning off their browser's cache entirely.  This would hurt performance even more.  Again, it's a trade-off.

The $VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties file specifies the following default settings:

 # Time is given in seconds:
#
alfresco.virtserver.cache-control.max-age.preview=4
alfresco.virtserver.cache-control.max-age.workarea=1800
alfresco.virtserver.cache-control.max-age.staging=1800
alfresco.virtserver.cache-control.max-age.default=1800
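
To show how these properties relate to the header a browser eventually sees, here is a small sketch that turns a host category into a 'Cache-Control: max-age=...' line. It is not the virtualization server's own code: the class CacheControlSketch and its methods are hypothetical, and the idea that a category without its own entry falls back to the 'default' property is an assumption based on the property name.

```java
import java.util.Properties;

final class CacheControlSketch {

    static final String PREFIX = "alfresco.virtserver.cache-control.max-age.";

    /** Builds the header line for a host category such as "preview" or "staging". */
    static String headerFor(Properties props, String category) {
        // Assumption: categories without their own entry use the 'default' value
        // (and "0" if even that is missing).
        String fallback = props.getProperty(PREFIX + "default", "0");
        String seconds = props.getProperty(PREFIX + category, fallback);
        return "Cache-Control: max-age=" + seconds;
    }

    /** The default settings quoted above, in seconds. */
    static Properties shippedDefaults() {
        Properties props = new Properties();
        props.setProperty(PREFIX + "preview", "4");
        props.setProperty(PREFIX + "workarea", "1800");
        props.setProperty(PREFIX + "staging", "1800");
        props.setProperty(PREFIX + "default", "1800");
        return props;
    }

    public static void main(String[] args) {
        Properties props = shippedDefaults();
        System.out.println(headerFor(props, "preview")); // Cache-Control: max-age=4
        System.out.println(headerFor(props, "staging")); // Cache-Control: max-age=1800
    }
}
```

With the defaults, a preview host tells the browser to keep items for only 4 seconds, while every other category allows 30 minutes.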




How are virtualized websites related to virtualized content in the AVM?


The AVM is a very general-purpose content repository; some of its features are being used to support web content management (WCM), but it is capable of performing many other tasks. For example, we plan to create default configurations that will be suited to the needs of source code management (SCM).

In WCM, every virtual repository (aka: 'store') contains a top-level directory named 'www'.   By default, the 'www' directory of the 'main store' of each author and workflow sandbox is just a transparent overlay on the 'www' directory of the corresponding staging sandbox's 'main store'.

The Tomcat-based virtualization server transforms the virtualized webapp files in the AVM into a set of name-mangled virtual webapps.  If you're curious, you can see some of this in action if you look at the virtualization server's 'work' directory.  For example, on Unix/Linux, if you have two web projects ('mysite' and 'silly'), the 'work' directory will be:

$VIRTUAL_TOMCAT_HOME/work/Catalina/avm.alfresco.localhost/                                       

Its contents might look something like this:

  $-1$mysite--admin--preview$ROOT/       $-1$silly--admin--preview$ROOT/
  $-1$mysite--admin$ROOT/                $-1$silly--admin$ROOT/
  $-1$mysite--alice--preview$ROOT/       $-1$silly--alice--preview$ROOT/
  $-1$mysite--alice$ROOT/                $-1$silly--alice$ROOT/
  $-1$mysite--bob--preview$ROOT/         $-1$silly--preview$ROOT/
  $-1$mysite--bob$ROOT/                  $-1$silly$ROOT/
  $-1$mysite--preview$ROOT/              host-manager/
  $-1$mysite$ROOT/                       manager/
  $42$mysite$ROOT/

Caution:  this is being shown only to give you a peek under the covers.  The specifics of the name-mangling are subject to change at any time, without notice.

Now suppose that the virtualization server gets a request for:

   http://alice.mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net:8180/...

At this point, it reverse-proxies the request to the appropriate version of the appropriate name-mangled webapp.  In this example, the HEAD versions of all the webapps associated with the 'mysite' and 'silly' project have been transformed into virtual webapps  (each web project just has a single 'ROOT' webapp).  In addition, version 42 of the staging repository in 'mysite' has also been 'brought online' explicitly.




What do the directories seen via a CIFS mount of the AVM represent?


When you do a CIFS mount of the AVM, and you look at the top-level directories you might see a bunch of directories like this:

  mysite
  mysite--admin
  mysite--alice
  mysite--bob

Each of these directories corresponds to a store within the AVM.  A sandbox is a collection of stores.  Each store (aka: 'virtual repository') is very much like a Subversion (SVN) repository, only you can overlay them using a mechanism called 'transparency'.  See also: Collaborative Content Production.

When each user works in their own sandbox, they all enjoy the ability to isolate their development environment (e.g.: what files are changed) from everyone else's. The reason why sandboxes include both a 'main' working store, and a 'preview' store is so that templating operations can be 'previewed' without clobbering any of the precious data in the main working store of the user's sandbox.  If you like what you 'preview', you can accept it; otherwise, you can toss it away and be confident that *nothing* in your main workarea has been altered.

Just as the store used for the author's main workarea is an overlay on your web project's staging store, ephemeral 'preview' data is sent to a store that's an overlay on the author's main workarea.  A triple-decker transparent sandwich, if you will.

Just as a sandbox is a higher-level structure of AVM content stores bound together by metadata & workflows, a web project is an even higher-level structure of sandboxes bound together by metadata & workflows too. A web project is the high-level object that corresponds to *all* the various areas & objects used to create a multi-user collaborative environment devoted to building a website.

Suppose you do a directory listing in CIFS, and you see something like:

  mysite
  mysite--admin
  mysite--alice
  mysite--bob

Or even:

  mysite
  mysite--preview
  mysite--admin
  mysite--admin--preview
  mysite--alice
  mysite--alice--preview
  mysite--bob
  mysite--bob--preview

What you're seeing here are stores (low-level virtual repositories) within the AVM. The default naming scheme of these stores gives you a very good idea of how things are organized under the hood (unless someone has renamed them).  For example, all these stores probably belong to the web project named 'mysite'. Within the 'mysite' web project the user Alice has a sandbox that consists of two stores: mysite--alice and mysite--alice--preview. Note that the mysite project also has a sandbox for bob & admin. The 'staging area' of the mysite project is contained within a store named 'mysite'; note also that there's yet another store called 'mysite--preview' in case you want to do fancy stuff later like deferring the final promotion of submitted content to the main staging area. Currently, all sandboxes in Alfresco consist of just 2 stores (the 'main' store and the scratch 'preview' store), but this is something that will probably be made more flexible & tunable later on. As things stand, the CIFS projection avoids showing you all the stores by default: it omits most of the 'preview' stores because people typically don't want the visual clutter of seeing directories that just contain ephemeral/scratch data.

Another interesting thing to note is that the leading portion of the default DNS name corresponding to each store follows a very similar scheme to the one used by CIFS, only the order is reversed.  For example, the URL for the main working store of Alice's sandbox looks like:

 alice.mysite.www--sandbox.<virt-domain>:<port>/...

By making the CIFS name for this store look like 'mysite--alice' the users of the file system get to see the 'real' store names and everything lines up nicely when you do a directory listing (and works nicely with tab-completion). File systems are organized from least specific to most specific, so this makes perfect sense.

However, DNS is organized the opposite way: from most to least specific.  Thus, in order for stuff like wildcard cookies, wildcard X.509 certs, etc to work properly, 'alice' is a subdomain of 'mysite'. Similarly, when you're previewing ephemeral 'what if' data in templating, your URL will look like this:

  preview.alice.mysite.www--sandbox.<virt-domain>:<port>/...

By default, that will correspond to the mysite--alice--preview store.  Several important details are being glossed over in this discussion, because the real associations are determined by metadata values associated with the stores, not by the store names.  However, if you don't change the naming scheme by renaming stores, the simplified description provided here is valid. Relying on metadata is more robust because it allows us to do store renaming (if/when necessary), and still have all the workflow & webapp logic dealing with stores, sandboxes and web projects work.
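
The reversed ordering described above can be sketched in a few lines of Java. This only mirrors the simplified default naming scheme; as noted, the real associations come from store metadata. The class StoreNameSketch and its method names are hypothetical, and the sketch assumes that project and user names do not themselves contain '--' or '.'.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class StoreNameSketch {

    private static final String MARKER = ".www--sandbox.";

    /** "preview.alice.mysite.www--sandbox.<domain>" gives "mysite--alice--preview". */
    static String storeNameFromHost(String host) {
        int end = host.indexOf(MARKER);
        if (end < 0) {
            throw new IllegalArgumentException("Not a virtualization host name: " + host);
        }
        // DNS labels run from most to least specific, so reverse them.
        List<String> labels = new ArrayList<>(Arrays.asList(host.substring(0, end).split("\\.")));
        Collections.reverse(labels);
        return String.join("--", labels);
    }

    /** "mysite--alice" gives "alice.mysite.www--sandbox" (the leading part of the host name). */
    static String hostPrefixFromStoreName(String storeName) {
        List<String> parts = new ArrayList<>(Arrays.asList(storeName.split("--")));
        Collections.reverse(parts);
        return String.join(".", parts) + ".www--sandbox";
    }

    public static void main(String[] args) {
        System.out.println(storeNameFromHost("preview.alice.mysite.www--sandbox.example.test"));
        System.out.println(hostPrefixFromStoreName("mysite--alice"));
    }
}
```

The domain 'example.test' in main is a placeholder for your own virtualization domain.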




How is the virtualization server related to other parts of Alfresco?


Within the Alfresco webapp suppose you have a web project named 'mysite', and two users: Alice and Bob.   Within Alice's sandbox, the 'eyeball' icon will have URLs for Alice of the form:

  http://alice.mysite.www--sandbox.<virtualization-domain>:8180/...

The 'eyeball' icon within the Alfresco webapp will have URLs for Bob like this:

  http://bob.mysite.www--sandbox.<virtualization-domain>:8180/...

Let's say your virtualization server is listening on IP address 192.168.1.5, and that you're using the EchoDNS server at ip.alfrescodemo.net to deal with DNS wildcards. Therefore, *.192-168-1-5.ip.alfrescodemo.net will be resolved as the IP address 192.168.1.5. Consequently, the following URL will resolve to 192.168.1.5 (on port 8180):

  http://alice.mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net:8180/...
                              

Note that the same is true of the following url for Bob:

  http://bob.mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net:8180/...
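
The way these URLs are assembled can be sketched as follows. This is only an illustration of the naming pattern shown above, not code from Alfresco; the class EchoDnsSketch and its methods are hypothetical, and it assumes an IPv4 address and the EchoDNS domain ip.alfrescodemo.net.

```java
final class EchoDnsSketch {

    /** "192.168.1.5" gives "192-168-1-5.ip.alfrescodemo.net". */
    static String virtualizationDomain(String ipv4Address) {
        return ipv4Address.replace('.', '-') + ".ip.alfrescodemo.net";
    }

    /** Builds the URL for a user's sandbox in a web project. */
    static String sandboxUrl(String user, String project, String ipv4Address, int port) {
        return "http://" + user + "." + project + ".www--sandbox."
                + virtualizationDomain(ipv4Address) + ":" + port + "/";
    }

    public static void main(String[] args) {
        // http://alice.mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net:8180/
        System.out.println(sandboxUrl("alice", "mysite", "192.168.1.5", 8180));
    }
}
```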

The wiki page entitled Configuring the Virtualization Server covers this in greater detail.

In any event, using the area info embedded into the 'virtual host name' the virtualization server is able to figure out the associated AVM repository and area-ize the request path accordingly when it makes calls to AVMRemote behind the scenes. There's a lot of other stuff going on too, like how it knows when to reload a webapp, how it maps between JNDI/webapp/CIFS/URL namespaces, and so on.

The virtualization server is a modified version of Tomcat that knows how to interpret the area-izing information embedded within a URL ( *.www--sandbox.*:8180/...), and fetch data via AVMRemote. It also does some fancy footwork with classloaders to allow many users to have what appears to be a separate version of your webapp in the 'staging' area, and yet still share the jar files they have in common, when possible (this allows the system to scale).




How is the virtualization server programmatically connected to the Alfresco webapp?


At startup, the virtualization server attempts to connect to the Alfresco webapp every few seconds until it succeeds.   Once connected, it registers itself with the Alfresco webapp (to receive JMX event notifications later), and queries the AVM to determine which virtual repositories contain webapp data, and what DNS name has been associated with each of them.   Then, a classloader hierarchy for the virtual webapps is created that parallels the overlay structure of the AVM stores containing them (this is used whenever possible, to allow the virtualization server to scale).  From there, a webapp name mangling scheme is used to create a lexically unique name for every virtual webapp.   Reverse proxy logic within the virtualization server maps the DNS name provided by clickable 'eyeball' URLs in the Alfresco webapp into a request for the appropriate 'name mangled' webapp.  From there, Tomcat's normal servlet container logic takes over.  Access to the 'file system' is abstracted via JNDI, so Tomcat does not know or care that it is fetching 'file data' using AVMRemote, rather than 'native file system APIs'.   Certain events within the Alfresco webapp (such as the creation or destruction of a sandbox) cause JMX event notification messages to be sent to the virtualization server.  This allows the set of virtual webapps hosted by the virtualization server to remain in-sync with the virtualized file and directory data maintained by the AVM.

This is an over-simplification, and like most implementation-level details, it's subject to change without notice.  However, it's nice to have a rough idea how things work as an aid during trouble-shooting sessions.




Why are there constraints on the virtualization server's installation directory?


By default, neither the installation path of Alfresco's Tomcat-based virtualization server nor that of the Tomcat instance used to host the Alfresco webapp contains whitespace characters (or any other characters that require URL encoding).   If you customize your installation, you must abide by the same constraint.  This is due to a long-standing bug in the JRE.

If you're curious, you can read all about it on Sun's bug database:  http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4273532

There's also another related issue on Sun's bug database that's marked as a 'duplicate', but it's worth reading too:  http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4496398




Why is it that when I click on the 'eyeball' icon in the webapp, I just get a blank browser window?


Here's a helpful checklist:


  •   Make sure the virtualization server is actually running
  •   Make sure the security settings between the webapp & virt server match
  •   DNS issues (are you relying on ip.alfrescodemo.net, yet disconnected from the Internet?)
  •   Firewall issues (have you blocked the RMI ports?  Try turning your firewall off to verify).
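
For the firewall item, one quick check is to see whether a TCP connection to the RMI registry port can be opened from the virtualization server machine. The sketch below assumes the default port 50500 on the Alfresco webapp machine; the class PortCheckSketch and the 2-second timeout are choices made for this example. A successful connection only shows that the port is reachable, not that RMI itself is configured correctly.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortCheckSketch {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: treat all as "not reachable".
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the host name of the Alfresco webapp machine as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println("Port 50500 reachable on " + host + ": "
                + canConnect(host, 50500, 2000));
    }
}
```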




Why am I getting a repoRemoteTransportRMI error?


Make sure you didn't do anything illegal, like customize the installation of either the virtualization server or the Tomcat hosting the Alfresco webapp so they're on a path that includes a space character (or any other character that requires URL encoding).

See also: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4273532
and:  http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4496398




Why is the virtualization server being denied RMI access to the AVM?


Currently, if you change the password for the 'admin' user in the Alfresco webapp, you must also propagate that change to  $VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties  by hand.  You can find the relevant properties at the end of this file.  By default, they look like this:

 # Admin level user and password to connect to login to Alfresco server.
#
alfresco.server.user=admin
alfresco.server.password=admin




Why am I seeing 'WARNING: Retrying JNDI connection...'  in the virtualization server log file?


There are two cases:


  1. Recoverable
    If the virtualization server is started before the Alfresco webapp is fully initialized, you'll see this warning. Once the AVM comes online (i.e.: once the Alfresco webapp is fully up), the virtualization server connects to it, and no longer issues the warning.

  2. Non-recoverable
    If this warning is emitted for several minutes *after* the Alfresco webapp is fully up, then you've got a configuration or permission error. The most typical problems are that your virtualization server is not actually trying to connect to the right IP address/port, or that you've changed the admin password in the Alfresco webapp but you've forgotten to make a corresponding change to the $VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties file.  The relevant properties look like this:
# Admin level user and password to connect to login to Alfresco server.
#
alfresco.server.user=admin
alfresco.server.password=admin

Also, to rule out any possible browser caching issues, if you do get a blank page, be sure you hit refresh after the virtualization server stops warning about not being able to connect yet.




What ports does the virtualization server need?


In order for the virtualization server to work properly, firewalls must not block the ports it requires to operate.  Here are the ports that must be open by default:

 Port    |  Description
---------+----------------------------------------------------------------------
  50500  |  On Alfresco webapp machine.
         |  RMI registry port for all remote services.
         |  Within virt server config: conf/alfresco-shared.properties
         |  see alfresco.rmi.services.port property
---------+----------------------------------------------------------------------
  50510* |  On virt server machine.
         |  RMI registry port for the virtualization server, used to enable the
         |  Alfresco webapp to do callbacks.
         |  Within virt server config: conf/alfresco-virtserver.properties
         |  see alfresco.virtserver.jmxrmi.port property
---------+----------------------------------------------------------------------
   8180  |  On virt server machine.
         |  HTTP port for servicing requests of virtualized content (e.g.: html)
         |  See also:  conf/server.xml
         |  Within virt server config, see:
         |  conf/alfresco-virtserver.properties:alfresco.virtserver.http.port
---------+----------------------------------------------------------------------

  * Older versions of Alfresco use 50501 for the virtualization server RMI registry. This was changed to 50510 in V2.2.1 and 3.0.0 to avoid a clash with the AVM remote service.
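
A quick way to see whether a firewall is in the way is to attempt a plain TCP connection to each port. The sketch below uses only the Python standard library; the function name and the DEFAULT_PORTS table are illustrative, and the port numbers are the defaults listed above, so adjust them if you have changed your configuration.

```python
import socket

# Default ports from the table above (assumed unchanged in your install).
DEFAULT_PORTS = {
    50500: "RMI registry on the Alfresco webapp machine",
    50510: "RMI registry on the virtualization server machine",
    8180: "HTTP port on the virtualization server machine",
}


def port_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

For example, port_is_reachable("localhost", 8180) tells you whether something is listening on the virtualization server's HTTP port. A successful connection only shows that the port is reachable; it does not prove that the service behind it is working.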

What features are being planned?


Information regarding the development/availability of new features will be announced via Alfresco's normal channels, but in case you're wondering 'what's next?', the following sections should provide some idea.   As always, your feedback is greatly appreciated.




Security Enhancements


Currently, anybody can browse anything via the virtualization server; no Alfresco-specific login is required (though the webapp you're virtualizing may insist upon a login).   For intranet-based content development, that's often OK because webapps are mostly public anyhow, and services that are only exposed on an intranet are typically not subject to the same level of security threat that is seen on a live Internet site.

However, real threats do exist within intranets, and malicious (or careless) users should be thwarted by the virtualization server just as they are by the Alfresco webapp.   The plan is to make it so that if you access the virtualization server by clicking on an asset within the webapp, you won't be prompted to log in again (nor will you be challenged if you click on any of the links within this or any other page). However, if you just click on a link to an asset within the virtualization server *without* having authenticated at some point, then you'll get temporarily redirected to the login page; once you have provided proper credentials, you'll get to surf around again, as usual.  All actions on the virtualization server will then be done *as* the logged-in user.  Credentials will be bound to the connection internally, and maintained across connections on the client side via cookies.

This use of cookies does not conflict with the idea that a user should be able to see multiple virtualized views of *content* simultaneously, but it does mean that the system won't virtualize that user's credentials  (i.e.:  you won't be able to authenticate the same identity in two different ways on the same browser at once).   That does not seem like a very severe constraint, and the alternative (pushing opaque authentication tokens into the FQDN) would lead to other problems, such as excessively long FQDNs, and bookmarks with embedded 'identity tokens'.   Identity is a component of the runtime environment, not the 'location' specified by the URL... thus from a design perspective you are forced to use ephemeral data that is out-of-band with respect to the URL  (and cookies are the only mechanism that browsers afford).

As a nice side-benefit, this enhancement will eliminate the common configuration error of changing the admin password within the Alfresco webapp, but neglecting to update the alfresco.server.password property within  $VIRTUAL_TOMCAT_HOME/conf/alfresco-virtserver.properties.

Security will be turned on by default.




Support for virtualizing snapshots within staging


The AVM supports a very powerful feature that lets you take 'snapshots' of websites as they were at any given moment in time.   However, the 2.0.1 GUI does not expose the capability that the virtualization server has to virtualize these snapshots.  All that remains is to provide GUI support (everything else works).

Assuming you have a web project named 'mysite', and your virtualization domain looks like:

  <some-hyphen-encoded-address>.ip.alfrescodemo.net

For example:

  192-168-1-5.ip.alfrescodemo.net

Then if you're browsing the 'version 4'  snapshot in staging, the URL will look like:

  http://mysite.www--sandbox.version--v4.192-168-1-5.ip.alfrescodemo.net:8180/...

Note also that the following URLs will be equivalent:

  http://mysite.www--sandbox.version--v-1.192-168-1-5.ip.alfrescodemo.net:8180/...
  http://mysite.www--sandbox.192-168-1-5.ip.alfrescodemo.net:8180/...

This is because version '-1' and the 'HEAD' version are the same.
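
The host name pattern in these examples can be sketched as a small helper. This is only an illustration inferred from the URLs shown above: the function names are hypothetical, only the staging ('www--sandbox') form is covered, and the default domain and port are the ones used in the examples.

```python
def hyphen_encode(ip: str) -> str:
    """Turn a dotted IP address into the hyphen-encoded form, e.g. 192-168-1-5."""
    return ip.replace(".", "-")


def staging_preview_url(project, ip, version=None, port=8180,
                        domain="ip.alfrescodemo.net"):
    """Build a preview URL for a web project's staging area.

    Hypothetical helper based on the example URLs above. Pass
    version=None to address HEAD without a version label.
    """
    labels = [project, "www--sandbox"]
    if version is not None:
        labels.append(f"version--v{version}")
    labels.append(hyphen_encode(ip))
    return "http://" + ".".join(labels) + f".{domain}:{port}/"
```

Calling staging_preview_url("mysite", "192.168.1.5", 4) yields the 'version 4' URL shown earlier, and passing version=-1 or leaving the version out yields the two equivalent HEAD forms.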

Bringing a snapshot on/offline in terms of the virtualization server will probably be a privileged operation within the Alfresco webapp, because of the resources consumed; these settings will be persistent across restarts.  Just like when you're browsing around in a 'HEAD' version, you'll be able to surf around and set bookmarks (you'll also be able to browse multiple snapshots and/or HEAD versions simultaneously).




Multiple virtualization servers per webapp


It would be nice if the Alfresco webapp could be configured so that the load of servicing different virtualized webapps could be distributed amongst a number of servers on different machines.  A related feature is the ability to make finer-grained mappings between assets in the AVM and their associated content server using custom logic (e.g.: regex rules, logic encapsulated within beans, etc.).   For a bit more discussion, see this forum post.




Virtualizing other content servers (e.g.: Apache2)


Currently, Alfresco's only virtualized content server is Tomcat 5.5-based.  Deep integration of Apache2 would be the first step towards first-class support for an efficient collaborative environment for PHP developers.  As things stand, PHP developers can virtualize specific areas by creating virtual hosts 'by hand' within Apache, setting up CIFS, then making Apache act as a reverse proxy for Tomcat in all cases where the site needs to do something webapp-ish/jsp-ish.  Right now, PHP developers don't have a good solution for previewing items within a workflow 'in context'.  If Apache2 were to be virtualized and integrated, that (and more) should be possible.


What version of Tomcat is supported?


The virtualization server requires Tomcat 5.5.


ESB integration


The Alfresco webapp generates various JMX events to keep the virtualization server's set of virtual contexts in sync.  Ideally, these (and other) events should go over an ESB (perhaps Mule), so that as 3rd-party applications are integrated, there will be a common and maintainable framework for everyone to use.  The first client of this enhanced architecture will almost certainly be the virtualization server(s).

