Content Transformation Limits 4.0

Document created by resplin Employee on Jun 6, 2015
Version 1

Obsolete Pages

The official documentation is at: http://docs.alfresco.com



This is the historical version of the Content Transformation Limits page, which has been replaced by the 4.2d version.




Links


Content Transformation Limits (4.2 version)

Content Transformation Debug

Optional Thumbnail Creation


Introduction


In the 4.0 nightly community builds, extra transformation configuration was added (code from the 4.0.1 and 3.4.8 service packs) to add limits to transformations from one mimetype to another. This was to improve system performance and provide better feedback on the Share preview page.

Frequently there is more than one transformer available between two mimetypes. Initially all transformers are used in turn and then the one with the fastest average time is used. Depending on which files are initially transformed, it is possible to end up with the situation where a generally slower transformer is always used because its average is still lower than the time taken the first time by what we know to be the fastest transformer.

With limits it is possible to disable selected transformations where it is known there are better choices, to avoid doing transformations that are known to be too costly (such as previews of large files), or to perform only limited transformations (such as previews of the first few pages).



The initial configuration included the following. However, this is a general mechanism that may be used to limit other transformations.


  • Preview of plain text files is limited to 10 MB.
  • Preview of Microsoft Excel (xls and xlsx) files is limited to 1 MB.
  • Preview of PDF is limited to 1.25 MB.

Above these levels the Share preview page gives the option to download the file.


  • The creation of thumbnails for the Document Library from plain text is much faster as only the first page of the intermediate PDF format is created.
  • Indexing of plain text files is much faster, as OpenOffice and JOD transformers are not used to simply change the character encoding of text.
  • A general timeout on most (excluding OpenOffice) transformations after 2 minutes.



A separate Optional Thumbnail Creation mechanism allows one to disable the creation of thumbnails in general and to disable thumbnail creation for a given mimetype based on the size of the file. This second mechanism is based on a customisation that has been applied on a number of customer sites. Unless you have no need for thumbnails at all or already have this customisation, it is probably better to use the approach described in the rest of this page.



Content Transformation Debug describes how to monitor transformer activity. When adding new configuration it is best to use trace level logging, as this includes mimetype values needed in the configuration.


Detail


Transformation Limits


There are six possible limits. They form three pairs of values (time, size and page), and only one of each pair may be set on a transformer or thumbnail definition.


  • Maximum source size (KB): If a source file is larger than the specified size the transformation will not be attempted.
  • Timeout (ms): A timeout on reading data from the source file.
  • Maximum pages: The maximum number of pages that may be read (or created) before an Exception is thrown.
  • Read limit size (KB): A read limit in terms of size after which end-of-file (EOF) is returned.
  • Read limit time (ms): A read limit in terms of time after which end-of-file (EOF) is returned.
  • Page limit: A read (or creation) limit in terms of pages.

Timeout and Read limit time only work with transformers that don't bulk read their source data, as these limits are enforced by a modified InputStream that either throws an Exception or returns EOF early. The OpenOffice and JOD transformers cannot currently make use of these options as they create temporary files to work with, but the JOD transformers also have their own timeout, set to two minutes by default.

With Read limit size and Read limit time, the source file is effectively truncated, so these limits should only be used with mimetypes where it is acceptable to truncate the file without resulting in an invalid format error.
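
The idea of a stream that reports EOF early can be illustrated with a minimal sketch. The class below is hypothetical (it is not Alfresco's own InputStream wrapper) and only covers a size-based read limit counted in bytes; a time-based limit would follow the same pattern with a deadline check instead of a byte counter.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch of a size-based read limit; not Alfresco's own class.
class ReadLimitInputStream extends FilterInputStream {
    private final long limitBytes; // negative means "no limit"
    private long bytesRead;

    ReadLimitInputStream(InputStream in, long limitBytes) {
        super(in);
        this.limitBytes = limitBytes;
    }

    @Override
    public int read() throws IOException {
        if (limitBytes >= 0 && bytesRead >= limitBytes) {
            return -1; // pretend the source has ended
        }
        int b = super.read();
        if (b != -1) {
            bytesRead++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (limitBytes >= 0) {
            long remaining = limitBytes - bytesRead;
            if (remaining <= 0) {
                return -1; // limit reached: report EOF early
            }
            len = (int) Math.min(len, remaining);
        }
        int n = super.read(buf, off, len);
        if (n > 0) {
            bytesRead += n;
        }
        return n;
    }

    // Helper for the usage note: how many bytes a reader sees before EOF.
    // skip() and mark/reset are deliberately not handled in this sketch.
    static int countReadable(byte[] data, long limitBytes) {
        try (InputStream in = new ReadLimitInputStream(new ByteArrayInputStream(data), limitBytes)) {
            byte[] buf = new byte[3];
            int total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a 10 byte source, `ReadLimitInputStream.countReadable(new byte[10], 4)` returns 4, while a negative limit lets all 10 bytes through. A transformer that copies its source to a temporary file first would never read through such a wrapper, which is why the limit has no effect on it.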

The Maximum pages and Page limit values are only meaningful to transformers that know about pages. Initially this was only the TextToPdfContentTransformer. In such cases, when a transformer is asked if it can perform a transformation it will ignore the 'Maximum source size' value if either 'page' value is set. This allows the transformer to effectively truncate the source file, so the actual size is not meaningful.

Although the Page limit implies a number of pages, transformers may interpret the value in their own terms, such as characters, words, blocks or even pages.
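
The interaction between the 'Maximum source size' and the two 'page' values can be sketched as a small check. The class and method names are hypothetical, and the sketch assumes that a limit counts as "set" when its value is zero or greater (negative meaning unlimited).

```java
// Hypothetical names; this is an illustration, not Alfresco's own check.
final class TransformableCheck {

    // Assumption: a limit is "set" when it is >= 0; negative means unlimited.
    static boolean isSet(long value) {
        return value >= 0;
    }

    static boolean isTransformable(long sourceSizeKBytes,
                                   long maxSourceSizeKBytes,
                                   long maxPages,
                                   long pageLimit,
                                   boolean understandsPages) {
        // A page-aware transformer ignores the size limit when either page value is set,
        // because it will stop after the requested pages regardless of the source size.
        if (understandsPages && (isSet(maxPages) || isSet(pageLimit))) {
            return true;
        }
        // Otherwise the source must not be larger than the maximum source size.
        return !isSet(maxSourceSizeKBytes) || sourceSizeKBytes <= maxSourceSizeKBytes;
    }
}
```

For a 24.5 MB (25088 KB) text file with a 1 MB maximum source size and a page limit of 1, `isTransformable(25088, 1024, -1, 1, true)` is true for a page-aware transformer, whereas the same call with `understandsPages` set to false is rejected on size.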



The following debug output shows that a transformer is not available because a Maximum source size limit has been set (> 1 MB), and that EOF has been returned after a single page of PDF was created. It also shows the use of other transformers between intermediate mimetypes.



DEBUG 7         store://2012/2/13/15/18/07047bce-e523-4e78-88f5-17dc39434b35.bin
DEBUG 7         txt  png  24.5 MB ContentService.transform(...)
DEBUG 7         **a) transformer.complex.Text.Image<Complex>> 0 ms
DEBUG 7         --b) transformer.PdfToImage<Failover>>        > 1 MB
DEBUG 7.1       txt  pdf  24.5 MB transformer.PdfBox.TextToPdf
DEBUG 7.1       Page limit (1) reached. Returning EOF
DEBUG 7.2       pdf  png  1.8 KB transformer.complex.PDF.Image<Complex>>
DEBUG 7.2.1     pdf  png  1.8 KB transformer.PdfToImage<Failover>>
DEBUG 7.2.1.1   pdf  png  1.8 KB failover.transformer.PdfRenderer.PdfToImage
DEBUG 7.2.2     png  png  50.8 KB transformer.ImageMagick<Proxy>>
DEBUG 7         Finished in 664 ms

Setting Transformer Limits


Limits may be applied to a transformer as a whole by setting a property on the transformer's bean. The following example sets the maxSourceSizeKBytes property. Only one of each pair of limits may be set.



<bean id='transformer.PdfBox.TextToPdf'
      class='org.alfresco.repo.content.transform.TextToPdfContentTransformer'
      parent='baseContentTransformer' >
           . . .
  <property name='maxSourceSizeKBytes'>
    <value>${content.transformer.PdfBox.TextToPdf.maxSourceSizeKBytes}</value>
  </property>
</bean>

The property names are:


  • timeoutMs
  • readLimitTimeMs
  • maxSourceSizeKBytes
  • readLimitKBytes
  • pageLimit
  • maxPages

The parent bean baseContentTransformer (in content-services-context.xml) sets default values that may be overridden in individual transformer beans. The actual values set by baseContentTransformer come from the following Alfresco global properties. Note that the 2 minute timeout is the only one set.



content.transformer.default.timeoutMs=120000
content.transformer.default.readLimitTimeMs=-1
content.transformer.default.maxSourceSizeKBytes=-1
content.transformer.default.readLimitKBytes=-1
content.transformer.default.pageLimit=-1
content.transformer.default.maxPages=-1

Setting Transformer Limits by mimetype


Limits may be applied to a transformer so that they only apply to a given combination of source and target mimetypes. As the following example shows, this is done by setting a mimetypeLimits property. To simplify configuration, a mimetype may be replaced by '*' to indicate all other mimetypes. A lookup of the source mimetype takes place before the target mimetype. Any of the six limits may be set. The following example sets maxSourceSizeKBytes limits for:


  • text to pdf
  • text to text (to transform encoding) is disabled as a 0 bytes value is used.
  • xlsx to pdf

This also has an impact on complex transformers that go via pdf to swf as this transformer may not always be available depending on the source size.



<bean id='transformer.OpenOffice'
      class='org.alfresco.repo.content.transform.ProxyContentTransformer'
      parent='baseContentTransformer'>
     . . .
  <property name='mimetypeLimits'>
    <map>

      <entry key='text/plain'>
        <map>
          <entry key='application/pdf'>
            <bean class='org.alfresco.service.cmr.repository.TransformationOptionLimits'>
              <property name='maxSourceSizeKBytes'>
                <value>${content.transformer.OpenOffice.mimeTypeLimits.txt.pdf.maxSourceSizeKBytes}</value>
              </property>
            </bean>
          </entry>
          <entry key='text/plain'>
            <bean class='org.alfresco.service.cmr.repository.TransformationOptionLimits'>
              <property name='maxSourceSizeKBytes'><value>0</value></property>
            </bean>
          </entry>
        </map>
      </entry>

      <entry key='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'>
        <map>
          <entry key='application/pdf'>
            <bean class='org.alfresco.service.cmr.repository.TransformationOptionLimits'>
              <property name='maxSourceSizeKBytes'>
                <value>${content.transformer.OpenOffice.mimeTypeLimits.xlsx.pdf.maxSourceSizeKBytes}</value>
              </property>
            </bean>
          </entry>
        </map>
      </entry>
     . . .
    </map>
  </property>
</bean>
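
The source-then-target lookup with the '*' wildcard can be sketched with nested maps. Everything here is hypothetical: the class name, the numeric values (other than the 0 that disables text to text) and the wildcard entry are illustrative, and the sketch assumes that once a source entry matches, a missing target does not fall back to the '*' source entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a mimetypeLimits lookup; not Alfresco's implementation.
final class MimetypeLimitsLookup {
    static final String ANY = "*";
    static final long UNLIMITED = -1;

    static long maxSourceSizeKBytes(Map<String, Map<String, Long>> limits,
                                    String source, String target) {
        // The source mimetype is looked up first, falling back to '*'.
        Map<String, Long> byTarget = limits.get(source);
        if (byTarget == null) {
            byTarget = limits.get(ANY);
        }
        if (byTarget == null) {
            return UNLIMITED;
        }
        // Then the target mimetype, again falling back to '*'.
        Long value = byTarget.get(target);
        if (value == null) {
            value = byTarget.get(ANY);
        }
        return value == null ? UNLIMITED : value;
    }

    // Illustrative data only: sizes are made up, except 0 which disables text to text.
    static Map<String, Map<String, Long>> exampleLimits() {
        Map<String, Long> fromText = new HashMap<>();
        fromText.put("application/pdf", 5120L);
        fromText.put("text/plain", 0L);

        Map<String, Long> fromXlsx = new HashMap<>();
        fromXlsx.put("application/pdf", 1024L);

        Map<String, Long> fromAnyOther = new HashMap<>();
        fromAnyOther.put("application/pdf", 10240L);

        Map<String, Map<String, Long>> limits = new HashMap<>();
        limits.put("text/plain", fromText);
        limits.put("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", fromXlsx);
        limits.put(ANY, fromAnyOther);
        return limits;
    }
}
```

With the illustrative data, a lookup for text/plain to text/plain returns 0 (disabled), application/msword to application/pdf picks up the wildcard value 10240, and text/plain to image/png returns -1 (no limit).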

Setting Thumbnail Limits


The creation of thumbnails is a bit of a special transformation case. For example, there are extra rules about the size of images and how they may be cropped. Each class of thumbnails (such as the thumbnails used by Share's Document Library) has its own ThumbnailDefinition element in the thumbnail-service-context.xml configuration file. These extra rules are combined with the transformer's own options when creating a thumbnail. It is also possible to add Transformer Limits to a ThumbnailDefinition, and they too will be combined with the transformer's own limits. As a result it is possible to add limits for specific usages, for example setting a 'Page Limit' or 'Read Limit' (which restricts how much is read from the source file) when creating small thumbnails.

The following is part of the configuration for Share's Document Library thumbnails. As can be seen, any of the transformation limits may be specified.



<bean class='org.alfresco.repo.thumbnail.ThumbnailDefinition'>
  <property name='name' value='doclib' />
  <property name='mimetype' value='image/png'/>
  <property name='transformationOptions'>
    . . .
    <bean class='org.alfresco.repo.content.transform.magick.ImageTransformationOptions'>
      <property name='readLimitTimeMs' value='${system.thumbnail.definition.doclib.readLimitTimeMs}' />
      <property name='maxSourceSizeKBytes' value='${system.thumbnail.definition.doclib.maxSourceSizeKBytes}' />
      <property name='readLimitKBytes' value='${system.thumbnail.definition.doclib.readLimitKBytes}' />
      <property name='pageLimit' value='${system.thumbnail.definition.doclib.pageLimit}' />
      <property name='maxPages' value='${system.thumbnail.definition.doclib.maxPages}' />
      . . .
    </bean>
  </property>
  . . .
</bean>

The actual values are held in Alfresco global properties. Note that only the page limit is set. Only the first page of any intermediate format is created when creating an image, as all other pages are discarded anyway.



system.thumbnail.definition.doclib.readLimitTimeMs=-1
system.thumbnail.definition.doclib.maxSourceSizeKBytes=-1
system.thumbnail.definition.doclib.readLimitKBytes=-1
system.thumbnail.definition.doclib.pageLimit=1
system.thumbnail.definition.doclib.maxPages=-1

Handling Transformation failures


If there are no transformers available between two mimetypes, this may be handled in a variety of ways. For example, Share's Document Library may display an alternate placeholder, and the preview page will indicate that it was not possible to preview the content and offer a download link.

If an Exception is thrown by a transformer, this is handled as it would have been in previous versions. In the case of Share's preview page, a blank preview will be shown. This is why it is important to specify limits on a complex transformer if there are limits on any of the component transformers.

If a transformer fails for any reason (an exception has been thrown or there is a timeout), the failed transformation time is set at 60 seconds (considered to be a long time), in the hope of making the transformer's average time worse. This makes it less likely that the transformer will be selected again unless other transformers are failing too.
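
The effect of the 60 second penalty on a transformer's average can be sketched as follows. The class is hypothetical and assumes a simple running mean over all recorded times; the source does not state how the average is actually maintained.

```java
// Hypothetical sketch; assumes a plain running mean of recorded times.
final class TransformerTimes {
    static final long FAILURE_TIME_MS = 60_000L; // penalty recorded for a failed transformation

    private long count;
    private double averageMs;

    void recordSuccess(long elapsedMs) {
        record(elapsedMs);
    }

    void recordFailure() {
        // A failure is recorded as if the transformation had taken 60 seconds.
        record(FAILURE_TIME_MS);
    }

    private void record(long elapsedMs) {
        count++;
        averageMs += (elapsedMs - averageMs) / count; // incremental mean
    }

    double getAverageMs() {
        return averageMs;
    }

    // Convenience for the usage note: the average after two successes and one failure.
    static double averageAfterFailure(long firstMs, long secondMs) {
        TransformerTimes times = new TransformerTimes();
        times.recordSuccess(firstMs);
        times.recordSuccess(secondMs);
        times.recordFailure();
        return times.getAverageMs();
    }
}
```

Under this assumption, a transformer with two successful runs of 100 ms and 200 ms has an average of 150 ms; a single failure raises it to 20100 ms, so any competing transformer with a lower average would be preferred.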


Combining Limits


When limits for a transformer, any component transformers or thumbnail definition are combined, the smaller value of each limit pair (time, size or page) is used. The other value in the pair becomes unlimited, as in theory it should never be reached. An unlimited limit value represented by a negative number (-1 is normally used) also loses out to a value greater than or equal to 0.

For example, if we combine a Maximum Source Size of 1 MB with a Read Limit size value of 3 MB, the combined limits have a Maximum Source Size of 1 MB and an unlimited Read Limit size.
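
The combining rule for a single pair can be sketched as below. The class is hypothetical and not Alfresco's own; in particular, preferring the 'maximum' value when both values are equal is an assumption, as the source does not say how ties are resolved.

```java
// Hypothetical sketch of combining one limit pair (for example size:
// maxSourceSizeKBytes and readLimitKBytes). Negative values mean unlimited.
final class LimitPair {
    final long max;   // the "maximum" half of the pair
    final long limit; // the "read/page limit" half of the pair

    LimitPair(long max, long limit) {
        this.max = max < 0 ? -1 : max;
        this.limit = limit < 0 ? -1 : limit;
    }

    // The smaller of two values, where an unlimited value loses to any value >= 0.
    static long minSet(long a, long b) {
        if (a < 0) {
            return b;
        }
        if (b < 0) {
            return a;
        }
        return Math.min(a, b);
    }

    LimitPair combine(LimitPair that) {
        long combinedMax = minSet(this.max, that.max);
        long combinedLimit = minSet(this.limit, that.limit);
        // Keep only the smaller of the two; the other becomes unlimited.
        // Assumption: on a tie the maximum is kept.
        if (combinedLimit < 0 || (combinedMax >= 0 && combinedMax <= combinedLimit)) {
            return new LimitPair(combinedMax, -1);
        }
        return new LimitPair(-1, combinedLimit);
    }
}
```

Combining `new LimitPair(1024, -1)` with `new LimitPair(-1, 3072)` yields a maximum of 1024 KB and an unlimited read limit, matching the 1 MB and 3 MB example above.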



As already noted, when a transformer that 'understands' page limits is asked if it can perform a transformation it will ignore the 'Maximum source size' value if either 'page' value is set. It does this because the page limit is effectively truncating the source file, so the actual size is not meaningful.


Complex transformer Limits


It is generally better to set 'Maximum source sizes' on both individual and complex transformers so that resources are not wasted and to provide a better user experience. Complex transformers use other transformers to go between intermediate formats. If a 'Maximum source size' is not set on a complex transformer, there is a risk that a number of transformations will take place only to fail after one or more intermediate transformations. For example, the initial configuration included a 1.25 MB limit on PDF to SWF. As a result the complex transformers between Excel and SWF (which go via PDF) set a 1 MB limit on the Excel file, as the intermediate PDF is generally only a little larger than the original Excel file.

In addition to any limits set on a complex transformer, the 'Maximum source size' also takes the value from the first component transformer into account.


Failover transformer limits


In addition to any limits set on a failover transformer, the 'Maximum source size' of all component transformers is also taken into account. The highest component value is effectively combined with the failover transformer's own value.
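
The 'Maximum source size' rules for complex and failover transformers can be sketched together. The names are hypothetical, and two points are assumptions rather than statements from the source: that "combined" means taking the smaller set value, and that a failover transformer with any unlimited component has an unlimited highest component value.

```java
// Hypothetical sketch; negative values mean unlimited.
final class CompositeMaxSourceSize {

    // The smaller of two values, where unlimited loses to any value >= 0.
    static long minSet(long a, long b) {
        if (a < 0) {
            return b;
        }
        if (b < 0) {
            return a;
        }
        return Math.min(a, b);
    }

    // Complex transformer: its own value combined with that of its first component.
    static long complex(long ownKBytes, long firstComponentKBytes) {
        return minSet(ownKBytes, firstComponentKBytes);
    }

    // Failover transformer: its own value combined with the highest component value.
    static long failover(long ownKBytes, long... componentKBytes) {
        long highest = componentKBytes.length == 0 ? -1 : 0;
        for (long c : componentKBytes) {
            if (c < 0) {
                highest = -1; // assumption: one unlimited component makes the highest unlimited
                break;
            }
            highest = Math.max(highest, c);
        }
        return minSet(ownKBytes, highest);
    }
}
```

For example, `failover(-1, 1024, 2048)` gives 2048, since the larger component could still handle the file, while `failover(1500, 1024, 2048)` gives 1500 and `complex(-1, 1024)` gives 1024.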


Transformation Rules


Rules that include a transformation will fail if there are no transformers available or the transformer fails for some other reason. Historically, rules have failed if the transformation failed. Unless one is careful, setting limits may make these rules fail more frequently.

It should be noted that rules that run synchronously with the initial action, such as copying a file into a folder, run within the same transaction, so if the transformation fails because there are no transformers available, the file will not appear to be copied into the folder. Asynchronous rules, however, run in their own transaction, so the file will be copied to the folder but the transformation will not appear to take place.