All Places > Alfresco Content Services (ECM) > Blog > 2012 > June
2012
Alfresco Share has a number of features to protect against XSS (Cross Site Scripting) attacks, session hijacking and similar attacks. One of the most aggressive features is the automatic processing of 3rd party HTML to 'sanitise' or 'strip' out unwanted HTML tags and attributes before rendering in the page. By 3rd party HTML, I mean any HTML content displayed in Share that is sourced from a node content stream - such as a Wiki page, Blog post or Discussion post. In other words, any content that may be user edited or could come from any source (not just Share itself!).



This is a well tested feature that handles all commonly known XSS attack holes and many less well known ones - including all the attack vectors listed here: http://ha.ckers.org/xss.html



One of the downsides of this is that some otherwise useful HTML attributes and elements are stripped, mainly to guard against issues in legacy browsers such as IE6 and IE7. Consider the STYLE attribute - not a problem attribute, you would assume; how could setting a STYLE cause an XSS attack?! Well, in IE8, Firefox, Safari, Chrome etc. it can't. But in IE6/7 Microsoft in their wisdom allowed JavaScript to be inserted into a STYLE attribute (called 'CSS Expressions' - a better name would have been 'CSS Hacks'). This is a potential XSS hole that only affects those legacy browsers - but the HTML stripping process cannot rely on your browser agent (which of course could be faked) so must always assume the worst and strip those STYLE attributes.



For the majority of Alfresco users who discarded IE6 (or even just IE...) long ago, why should they be punished with this limitation? And it is an annoying limitation, as most of the in-line editing capabilities of TinyMCE and other in-line editors that can potentially be used with Alfresco use STYLE attributes to apply formatting to much of their generated content.



In Alfresco 3.4.9/4.0.2 and onwards, it is now possible to fully configure the black/white list of HTML tags and attributes that the HTML stripping process will use.



This is the default configuration that is applied OOTB (out of the box):

      <!-- the set of HTML tags considered safe for rendering when mixing with existing client-side output -->
      <!-- NOTE: define all tags in UPPER CASE only -->
      <property name='tagWhiteList'>
         <set>
            <value>!DOCTYPE</value>
            <value>HTML</value>
            <value>HEAD</value>
            <value>BODY</value>
            <value>META</value>
            <value>BASE</value>
            <value>TITLE</value>
            <value>LINK</value>
            <value>CENTER</value>
            <value>EM</value>
            <value>STRONG</value>
            <value>SUP</value>
            <value>SUB</value>
            <value>P</value>
            <value>B</value>
            <value>I</value>
            <value>U</value>
            <value>BR</value>
            <value>UL</value>
            <value>OL</value>
            <value>LI</value>
            <value>H1</value>
            <value>H2</value>
            <value>H3</value>
            <value>H4</value>
            <value>H5</value>
            <value>H6</value>
            <value>SPAN</value>
            <value>DIV</value>
            <value>A</value>
            <value>IMG</value>
            <value>FONT</value>
            <value>TABLE</value>
            <value>THEAD</value>
            <value>TBODY</value>
            <value>TR</value>
            <value>TH</value>
            <value>TD</value>
            <value>HR</value>
            <value>DT</value>
            <value>DL</value>
            <value>DT</value>
            <value>PRE</value>
            <value>BLOCKQUOTE</value>
            <value>BUTTON</value>
            <value>CODE</value>
            <value>FORM</value>
            <value>OPTION</value>
            <value>SELECT</value>
            <value>TEXTAREA</value>
         </set>
      </property>
      <!-- The set of HTML tag attributes that are to be removed before rendering -->
      <!-- NOTE: define all attributes in UPPER CASE only -->
      <!-- IMPORTANT: JavaScript event handler attributes starting with 'on' are always removed -->
      <property name='attributeBlackList'>
         <set>
            <value>STYLE</value>
         </set>
      </property>
      <!-- The set of HTML tag attributes that are considered for sanitisation i.e. script content removed -->
      <!-- NOTE: define all attributes in UPPER CASE only -->
      <property name='attributeGreyList'>
         <set>
            <value>SRC</value>
            <value>DYNSRC</value>
            <value>LOWSRC</value>
            <value>HREF</value>
            <value>BACKGROUND</value>
         </set>
      </property>


As you can see, it's quite a list. The important config for STYLE attribute processing is here:

      <property name='attributeBlackList'>
         <set>
            <value>STYLE</value>
         </set>
      </property>


So simply override the black list in the stringutils bean in your custom-slingshot-application-context.xml file - generally found in \tomcat\shared\classes\alfresco\web-extension - as detailed in previous blog posts:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans-2.0.dtd'>

<beans>

   <!-- Override HTML processing black list -->
   <bean id='webframework.webscripts.stringutils' parent='webframework.webscripts.stringutils.abstract'
         class='org.springframework.extensions.webscripts.ui.common.StringUtils'>
      <property name='attributeBlackList'>
         <set></set>
      </property>
   </bean>

</beans>


Restart the Share web-application and STYLE attributes will no longer be removed by Share.
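
To make the effect of the three lists more concrete, here is a minimal JavaScript sketch of how a filter of this kind might treat the attributes of a single element. It is not Share's implementation (the real processing is done server-side by the StringUtils bean configured above). The function name `filterAttributes`, the abbreviated tag list and the crude `javascript:` check are all assumptions made purely for illustration.

```javascript
// Hypothetical sketch only: NOT the Share implementation.
const TAG_WHITE_LIST = new Set(["P", "SPAN", "DIV", "A", "IMG"]); // abbreviated for the example
const ATTRIBUTE_GREY_LIST = new Set(["SRC", "DYNSRC", "LOWSRC", "HREF", "BACKGROUND"]);

// Returns the attributes that survive filtering, or null if the tag is not white-listed.
function filterAttributes(tag, attributes, attributeBlackList) {
  if (!TAG_WHITE_LIST.has(tag.toUpperCase())) {
    return null;
  }
  const kept = {};
  for (const [name, value] of Object.entries(attributes)) {
    const upper = name.toUpperCase();
    if (upper.startsWith("ON")) continue;        // event handlers are always removed
    if (attributeBlackList.has(upper)) continue; // black-listed attributes are removed
    // Assumption: a grey-listed attribute carrying script is simply dropped here.
    if (ATTRIBUTE_GREY_LIST.has(upper) && /^\s*javascript:/i.test(value)) continue;
    kept[name] = value;
  }
  return kept;
}

const attrs = {
  style: "color: red",
  onclick: "alert(1)",
  href: "javascript:alert(1)",
  title: "hi",
};

// Default black list: STYLE is stripped along with the unsafe attributes.
const defaults = filterAttributes("a", attrs, new Set(["STYLE"]));
// Overridden (empty) black list: STYLE now survives, the rest is still removed.
const overridden = filterAttributes("a", attrs, new Set());

console.log(Object.keys(defaults));
console.log(Object.keys(overridden));
```

Note that emptying the black list only affects STYLE in this sketch; event handlers and script-bearing grey-listed attributes are still removed, which matches the comments in the default configuration.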


tl;dr



Many open source contributor agreements (including the Alfresco Contributor Agreement) do not involve any reassignment of copyright - instead they grant the project maintainer a license to use, modify and distribute the contribution.

The Full Picture



I was recently chatting with Jennifer Venables (Alfresco's awesome general counsel) and she mentioned something in passing that I hadn't realised before, and that I'm guessing many of you may not know either.  I had always been under the impression that most, if not all, contributor agreements (whether for open source projects or not) involved the individual contributor handing over copyright on their contribution to the project maintainer (for example Alfresco Software Inc., in the case of contributions to the open source Alfresco content management system).  If you're like me, the thought of handing over 'your babies' to someone else is not particularly appealing, and in general I've made a point of not becoming involved in projects that require me to give up rights to my own creations.



So Jennifer's off-hand comment took me a little by surprise, and after she patiently explained how contributor agreements typically work, I'm looking at them in a more positive light.  Specifically, the Alfresco Contributor Agreement (like those of some other open source projects) does not involve any reassignment of copyright - you as the creator of a particular contribution retain full ownership of the copyright of that contribution.  Instead, the project maintainer is simply requesting that you license your intellectual property to them, so that they can also use, modify and distribute it - rights you've probably granted to the public anyway (at least if you've chosen one of the more popular open source licenses).



It's also worth noting that the Harmony project is an attempt by the wider open source community to try to standardise and clarify contributor agreements, as there seems to be a lot of confusion around them.  Jennifer has been keeping an eye on their progress on behalf of Alfresco, as their initiative would help to clarify what is often (and definitely was for me) a confusing legal mechanism.



<disclaimer aka IANAL>Now I'm about as far from a lawyer as it's possible to get, so everything I've said here you'd be strongly advised to double check with someone who has legal expertise in this area.</disclaimer>



What I can say with certainty is that I had completely misunderstood the intent of Alfresco's Contributor Agreement and (more importantly) the legal basis upon which it operates, and that misunderstanding has prevented me in the past from contributing to other open source projects.  I guess I'll chalk this up to 'sometimes, you just don't know what you don't know'!

Introduction

In the past three posts in this series I've introduced some of the new custom FreeMarker directives that we've added to Surf and how we're going to use them to make it easier to customize the client-side JavaScript widgets that are instantiated on each page in Share. In this post I'm going to try to explain some of the less obvious changes that we've made and some of the behaviour that should be expected as a result.

 

The “group” attribute

In the “documentlist.get.html.ftl” example of the new boiler-plate template you may have noticed the “group” attribute in the <@link>, <@script>, <@inlineScript> and <@createWidgets> directives.  This attribute has an important bearing upon the order in which dependency requests and JavaScript code are output into the rendered HTML page.

 

Surf now supports the ability to aggregate multiple files into a single resource to reduce the number of HTTP requests made by the client and so improve page loading performance. The “group” attribute is used to determine how dependencies are aggregated into the generated resources. Managing the groups is important because once generated a resource is cached on the server to improve response times to subsequent requests, so a balance must be struck to achieve optimum performance. If a single group were to be used then only one HTTP request would be made per page, but the performance gained through reduced requests would be lost to server side aggregation for each request.

 

PLEASE NOTE: At this moment in time aggregation is not a major concern as it is NOT recommended to be used (and may not fully work) until all of Share is converted to the boiler-plate code - this is definitely one for the future road-map.

 

However, in order for the same Share code to be able to support different Surf operation modes the “group” attribute is also applied when processing individual dependency requests.  The only thing that you need to know is that groups are output in the order they are requested and that all the dependency requests and code are output for each group in turn.

 

So the output on the HTML page for:

<@script src="/aaa.js" group="1"/>
<@script src="/bbb.js" group="2"/>
<@script src="/ccc.js" group="3"/>
<@script src="/ddd.js" group="2"/>
<@script src="/eee.js" group="1"/>

...will be:

<script src='/aaa.js'></script>
<script src='/eee.js'></script>
<script src='/bbb.js'></script>
<script src='/ddd.js'></script>
<script src='/ccc.js'></script>


Note that '/eee.js' is the second import output despite being requested last, and that '/ccc.js' is output last despite being requested 3rd. This is because all of group “1” is output before any of group “2”, and all of group “2” is output before group “3”.
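
The ordering rule can be sketched in a few lines of JavaScript. This is only an illustration of the rule described above, not Surf's code; the function name `orderByGroup` and the shape of the request objects are assumptions.

```javascript
// Hypothetical sketch: output requests group by group, with groups ordered
// by when they were first requested.
function orderByGroup(requests) {
  const groups = new Map(); // a Map iterates in insertion order
  for (const { src, group } of requests) {
    if (!groups.has(group)) {
      groups.set(group, []);
    }
    groups.get(group).push(src);
  }
  // Concatenate each group's requests in turn.
  return [...groups.values()].flat();
}

const ordered = orderByGroup([
  { src: "/aaa.js", group: "1" },
  { src: "/bbb.js", group: "2" },
  { src: "/ccc.js", group: "3" },
  { src: "/ddd.js", group: "2" },
  { src: "/eee.js", group: "1" },
]);

console.log(ordered.join(" "));
// /aaa.js /eee.js /bbb.js /ddd.js /ccc.js
```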

Mixing <@script> and <@inlineScript>

Let's say you have files 'A.js' and 'B.js'  and you have a WebScript template containing the following:

<@script src='${url.context}/res/A.js' group='calc'/>
<@inlineScript group='calc'>
// A comment between imports
</@>
<@script src='${url.context}/res/B.js' group='calc'/>

When the final page is rendered you might see output like this in the source:

<script type='text/javascript' src='/share/res/A.js'></script>
<script type='text/javascript'>//<![CDATA[
   // A comment between imports
//]]>
</script>
<script type='text/javascript' src='/share/res/B.js'></script>


Note that the JavaScript from the <@inlineScript> directive is placed between the two imports because they are in the same group (the same is true for any custom directive that outputs JavaScript, e.g. the <@createWidgets> directive).
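
As a rough illustration of that behaviour, the following JavaScript sketch renders the entries of a single group strictly in the order they were requested, whether they are imports or inline code. The function name `renderGroup` and the entry objects are hypothetical and are not part of Surf.

```javascript
// Hypothetical sketch: render one group's entries in request order.
function renderGroup(entries) {
  return entries
    .map((entry) =>
      entry.type === "script"
        ? `<script type="text/javascript" src="${entry.src}"></script>`
        : `<script type="text/javascript">//<![CDATA[\n${entry.code}\n//]]>\n</script>`
    )
    .join("\n");
}

const rendered = renderGroup([
  { type: "script", src: "/share/res/A.js" },
  { type: "inline", code: "   // A comment between imports" },
  { type: "script", src: "/share/res/B.js" },
]);

console.log(rendered);
```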

Configuring Surf to Aggregate Dependencies

Should you wish to enable the use of aggregate dependencies then you will need to make a Surf configuration change. By default the capability is disabled in Surf and is unlikely ever to be enabled by default in future releases of Alfresco. However, once all of the Share WebScripts have been converted to the 'boiler-plate' template then it will be possible to run in this mode - HOWEVER, any 3rd party modules or add-ons that have been applied may not support this feature.

 

To enable it you simply need to set the following line within the Surf configuration (this can be found in the “webapps/share/WEB-INF/surf.xml”):

<web-framework>
   …
   <aggregate-dependencies>true</aggregate-dependencies>
   …
</web-framework>

 

Aggregated Dependency Output

If you do enable this capability then you can expect the following behaviour to occur. If the  file 'A.js' contains:

var a = 1;

and the file 'B.js' contains:

var c = a + b;

...and you have a WebScript template containing the following:

<@script src='${url.context}/res/A.js' group='calc'/>
<@inlineScript group='calc'>
var b = 1;
</@>
<@script src='${url.context}/res/B.js' group='calc'/>

When the final page is rendered you might see an import like this in the source:

<script type='text/javascript' src='/share/res/20146f7250123ea2437a0d16d5c323.js'></script> <!-- Group Name: 'calc' -->

And if you viewed the source of that file you'd see:

var a = 1;
var b = 1;
var c = a + b;


(NOTE: The contents would actually be compressed, but there's not a lot of point in showing the compressed content in this blog!)

 

The resource name is an MD5 checksum generated from the combined source code (NOTE: I made this one up as an example purely to illustrate the point). The generated resource is cached on the server so that it doesn't need to be generated each time. If extra content is added to the group (even dynamically by a module) then the resource will be regenerated and the checksum will naturally change to ensure that the browser requests a different file.
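
The idea of naming a resource after a checksum of its content can be sketched with Node's built-in `crypto` module. This is a simplified illustration rather than Surf's implementation: the function name `aggregate` is an assumption, and compression and caching are left out.

```javascript
const crypto = require("crypto");

// Hypothetical sketch: combine the sources of a group and name the result
// after the MD5 checksum of the combined content.
function aggregate(sources) {
  const content = sources.join("\n");
  const checksum = crypto.createHash("md5").update(content, "utf8").digest("hex");
  return { name: `${checksum}.js`, content };
}

const before = aggregate(["var a = 1;", "var b = 1;", "var c = a + b;"]);
// Adding content to the group changes the checksum, and so the file name.
const after = aggregate(["var a = 1;", "var b = 1;", "var c = a + b;", "var d = c * 2;"]);

console.log(before.name);
console.log(after.name);
console.log(before.name !== after.name); // true
```

Because the name is derived from the content, a browser holding a cached copy of the old file will request the new one as soon as the group changes.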

Debugging

The '<client-debug>' setting (found in 'webapps/share/WEB-INF/classes/alfresco/share-config.xml') will still work when enabled. An aggregated resource will still be produced but each aggregated file will be separated by a comment similar to this:

/*Path=A.js*/


This will allow you to determine the source file in which errors are occurring whilst debugging.
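
As a sketch of how such markers help, the JavaScript below builds an aggregated resource with a path comment before each file and then works out which file a given line belongs to. The exact marker format Surf emits is only described above as "similar to" `/*Path=A.js*/`, so the format used here, along with the function names `aggregateWithMarkers` and `sourceOfLine`, should be treated as assumptions.

```javascript
// Hypothetical sketch: prefix each aggregated file with a comment naming its path.
function aggregateWithMarkers(files) {
  return files
    .map(({ path, source }) => `/*Path=${path}*/\n${source}`)
    .join("\n");
}

// Scan backwards from a 1-based line number to the nearest path marker.
function sourceOfLine(aggregated, lineNumber) {
  const lines = aggregated.split("\n");
  for (let i = lineNumber - 1; i >= 0; i--) {
    const match = /^\/\*Path=(.*)\*\/$/.exec(lines[i]);
    if (match) {
      return match[1];
    }
  }
  return null;
}

const aggregated = aggregateWithMarkers([
  { path: "A.js", source: "var a = 1;" },
  { path: "B.js", source: "var c = a + b;" },
]);

console.log(aggregated);
console.log(sourceOfLine(aggregated, 4)); // B.js
```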

The Output Directives

The current released version of Alfresco Share relies on the use of the '${head}' FreeMarker model property to output all the dependency requests generated on the first pass of all the WebScript '*.head.ftl' files. This property is populated during this first pass and then output in the <head> HTML element defined in the 'alfresco-template.ftl' template. If you look in the current committed version of that template you'll still see a reference to that property (which is still used to support legacy '*.head.ftl' files and any dependencies defined through any <dependencies> elements in extension module configuration) but also two new directives: <@outputJavaScript/> and <@outputCSS/>.

 

As their names suggest these directives are used to output the JavaScript and CSS dependency requests made via the <@link>, <@script>, <@inlineScript> and <@createWidgets> directives.  Without wanting to go into too much detail at this stage, the 'output' directives act as placeholders in the extensibility model and accumulate requests to output content as the remainder of the Surf page is processed - only when the page has been completely processed is their final content rendered into the output stream.
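
The placeholder idea can be illustrated with a small JavaScript sketch of deferred output. It is a conceptual model only and says nothing about how Surf's extensibility model is actually built; the `Page` class and its method names are hypothetical.

```javascript
// Hypothetical sketch of deferred output: a placeholder is recorded where the
// output directive appears and is only filled in once the whole page is processed.
const SCRIPT_PLACEHOLDER = Symbol("outputJavaScript");

class Page {
  constructor() {
    this.parts = [];   // HTML fragments and placeholders, in page order
    this.scripts = []; // dependency requests accumulated during processing
  }
  write(html) {
    this.parts.push(html);
  }
  outputJavaScript() {
    this.parts.push(SCRIPT_PLACEHOLDER);
  }
  requestScript(src) {
    this.scripts.push(src);
  }
  render() {
    const scriptHtml = this.scripts
      .map((src) => `<script type="text/javascript" src="${src}"></script>`)
      .join("\n");
    return this.parts
      .map((part) => (part === SCRIPT_PLACEHOLDER ? scriptHtml : part))
      .join("\n");
  }
}

const page = new Page();
page.write("<head>");
page.outputJavaScript();         // placeholder sits in the head
page.write("</head>");
page.write("<body>");
page.requestScript("/res/A.js"); // requested later, while processing the body
page.write("</body>");

const html = page.render();
console.log(html);
```

Even though '/res/A.js' is requested while the body is being processed, it is rendered where the placeholder was written, inside the head.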

 

Towards the end of the 'alfresco-template.ftl' file you will also see a commented out directive <@relocateJavaScript>. As its name suggests, the purpose of this directive is to change the location in the page where JavaScript output is rendered. It is suggested that moving JavaScript to the end of a page is desirable as it increases page performance.  It's only possible to use this directive if there are no hard-coded <script> elements on the page that depend on imported files or JavaScript dependencies output via the ${head} property. When uncommented though you will see that it produces a very clean source file for your page with all the JavaScript located at the end.  The <@relocateJavaScript> directive is something else we've created for the future and is unlikely to be used in the next release (although we'll probably make it configurable for those that wish to use it rather than needing to edit the template file!)
