MAGE-ML Metadata Extractor

Document created by resplin on Jun 6, 2015; last modified by alfresco-archivist on Aug 31, 2016.
Version 2

This page is obsolete.

The official documentation is at:


Project Description

The importance of the XML language in clinical-genetic domains

The importance of XML in clinical and genetic domains is widely recognized in the biomedical world. Adopting XML for biomedical information interoperability has brought major changes to how clinical and genomic data are represented, in two main respects:

  1. enabling interoperability among heterogeneous systems, so that they can exchange and share clinical and genetic information;
  2. enabling clinical and genetic information to be defined and represented in a structured way.

The MAGE-ML standard for microarray (genetic) data representation

Many international standards have been developed to define a common structure for clinical and genetic documents. For microarray genetic data, the most important standard, and the one best known to health care providers, labs and researchers, is MAGE-ML (for further details please visit
MAGE-ML (MicroArray and Gene Expression Markup Language) is an XML implementation of a UML model for microarray expression experiments. Microarray experiments are performed by genetic labs; they are a particular kind of genetic experiment used to evaluate gene expression profiles in individuals. MIAME (Minimum Information About a Microarray Experiment) identifies the set of information (metadata) that each lab conducting a microarray experiment has to provide so that other labs can replicate the experimental conditions.

An example of MAGE-ML metadata

An example of a MAGE-ML document follows (only a few lines):


Project Requirements/Objectives

Project requirements

The MAGE consortium has developed a set of open source Java APIs that allow manipulating XML documents validated against a specific MAGE-ML schema and extracting both experimental and clinical information. The MAGE-ML Java APIs are built on other open source Java packages that access and manipulate XML documents via DOM, SAX and XPath.
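To illustrate the kind of DOM and XPath access mentioned above, here is a minimal sketch that uses only the XML APIs bundled with the JDK (javax.xml.parsers, javax.xml.xpath), not the MAGE-ML Java APIs themselves. The class name, the metadata names and the simplified MAGE-ML-like fragment are hypothetical and are not validated against the MAGE-ML schema.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class name; uses only JDK XML APIs, not the MAGE-ML Java APIs.
public class MageMlXPathSketch {

    // Parses the XML into a DOM tree, then evaluates each XPath expression
    // against it. Returns metadata name -> string value ("" when nothing matches).
    public static Map<String, String> extract(String xml, Map<String, String> xpaths) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : xpaths.entrySet()) {
            result.put(entry.getKey(), xpath.evaluate(entry.getValue(), doc));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Simplified, hypothetical MAGE-ML-like fragment (not schema-valid).
        String xml = "<MAGE-ML identifier=\"MAGE-ML:example\">"
                + "<Experiment_package><Experiment_assnlist>"
                + "<Experiment identifier=\"E-EXAMPLE-1\" name=\"Example experiment\"/>"
                + "</Experiment_assnlist></Experiment_package>"
                + "</MAGE-ML>";
        // Hypothetical metadata names mapped to XPath expressions.
        Map<String, String> xpaths = new LinkedHashMap<>();
        xpaths.put("experimentId", "//Experiment/@identifier");
        xpaths.put("experimentName", "//Experiment/@name");
        System.out.println(extract(xml, xpaths));
        // {experimentId=E-EXAMPLE-1, experimentName=Example experiment}
    }
}
```

An XPath expression that matches nothing yields an empty string rather than an error, so a missing metadata item does not stop extraction of the others.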

Project objectives

The objectives of the project could be the following:

  1. Recognize MAGE-ML documents natively (automatic recognition based on specific metadata);
  2. Extract specific metadata from MAGE-ML documents. Extraction could be implemented both statically and dynamically (see Discussion of Design/Implementation Approach below);
  3. Associate specific metadata with MAGE-ML documents so that lab users can run advanced searches on MAGE-ML metadata.
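For objective 1, one simple recognition heuristic, assumed here rather than taken from the MAGE-ML specification, is to check whether the document's root element is named MAGE-ML. The sketch below uses the JDK's StAX API so that only the start of the file is read; the class and method names are hypothetical, and a real detector might also inspect the DOCTYPE or schema reference.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical recognizer: "MAGE-ML document" is assumed to mean
// "well-formed XML whose root element is named MAGE-ML".
public class MageMlRecognizerSketch {

    public static boolean looksLikeMageMl(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not try to load an external DTD referenced by a DOCTYPE.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                // Skip the prolog (XML declaration, comments, DOCTYPE) and stop
                // at the first element, so the rest of the file is never read.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return "MAGE-ML".equals(reader.getLocalName());
                    }
                }
                return false;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            return false; // not well-formed XML, so not treated as MAGE-ML
        }
    }

    public static void main(String[] args) {
        System.out.println(looksLikeMageMl("<MAGE-ML identifier=\"x\"/>")); // true
        System.out.println(looksLikeMageMl("<html><body/></html>"));      // false
    }
}
```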

Can you elaborate on what the advanced searches might be and, at a high level, what they involve?

Initial Project Scope

In general, the project scope is to extend the core Alfresco metadata extraction classes so that Alfresco lab users can associate MAGE-ML metadata with specific XML documents.
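Associating metadata with a document ultimately means storing the extracted values as properties of that document. Alfresco qualified names are commonly written in the form {namespace-uri}localName; the sketch below only builds such names as plain strings, without using any Alfresco class. The namespace URI and class name are hypothetical placeholders for a custom MAGE-ML content model.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PropertyMappingSketch {

    // Hypothetical namespace for a custom MAGE-ML content model.
    static final String MAGE_NS = "http://www.example.org/model/mageml/1.0";

    // Turns extracted MAGE-ML values (metadata name -> value) into document
    // properties keyed by "{namespace}localName" strings. Empty values are
    // skipped so that missing metadata does not create blank properties.
    public static Map<String, String> toDocumentProperties(Map<String, String> extracted) {
        Map<String, String> props = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : extracted.entrySet()) {
            String value = e.getValue();
            if (value != null && !value.isEmpty()) {
                props.put("{" + MAGE_NS + "}" + e.getKey(), value);
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("experimentId", "E-EXAMPLE-1");
        extracted.put("experimentName", "");
        System.out.println(toDocumentProperties(extracted));
        // {{http://www.example.org/model/mageml/1.0}experimentId=E-EXAMPLE-1}
    }
}
```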

As a first step, we could start with a predefined, static list of MAGE-ML metadata linked to a document.

Later, a more challenging goal could be to let users define, dynamically and flexibly through a configuration file, the list of MAGE-ML metadata they are interested in.

Discussion of Design/Implementation Approach

From the implementation point of view we could think about the following steps:

  1. Extending the Java classes of the Alfresco repository with a new class, similar to the existing metadata extractor classes, that will be able to extract a set of predefined metadata statically;
  2. Defining an XML configuration file that contains a predefined list of the MAGE-ML metadata a typical lab user would want to extract;
  3. Defining an initially empty XML configuration file in which lab users can list, dynamically, additional MAGE-ML metadata to extract on top of the predefined metadata.
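Steps 2 and 3 could share one configuration format. The sketch below assumes a hypothetical format in which each metadata element maps a metadata name to an XPath expression; the element and attribute names, the class name, and the rule that a user entry replaces a predefined entry with the same name are all assumptions, not part of any existing Alfresco or MAGE-ML configuration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MetadataConfigSketch {

    // Reads <metadata name="..." xpath="..."/> entries from the hypothetical
    // configuration format into an ordered name -> XPath map.
    public static Map<String, String> readConfig(String xml) throws Exception {
        Map<String, String> entries = new LinkedHashMap<>();
        if (xml.trim().isEmpty()) {
            return entries; // a completely empty user file contributes nothing
        }
        NodeList nodes = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getElementsByTagName("metadata");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            entries.put(e.getAttribute("name"), e.getAttribute("xpath"));
        }
        return entries;
    }

    // Predefined entries first (step 2), then the user's additions (step 3).
    public static Map<String, String> combined(String predefinedXml, String userXml) throws Exception {
        Map<String, String> all = readConfig(predefinedXml);
        all.putAll(readConfig(userXml));
        return all;
    }

    public static void main(String[] args) throws Exception {
        String predefined = "<mage-ml-metadata>"
                + "<metadata name=\"experimentId\" xpath=\"//Experiment/@identifier\"/>"
                + "</mage-ml-metadata>";
        String user = "<mage-ml-metadata/>"; // the initially empty user file
        System.out.println(combined(predefined, user));
        // {experimentId=//Experiment/@identifier}
    }
}
```

Keeping both lists in the same format means the static extractor from step 1 could later read its predefined list from the step 2 file instead of hard-coding it.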

The metadata extraction policy is still under evaluation, so we could discuss it to identify the most suitable strategy.
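One question the extraction policy has to answer is what happens when a document already has a value for a property that extraction would set. The sketch below compares three hypothetical policies (always overwrite, keep existing values, overwrite only with non-empty values); the enum names and their semantics are illustrative assumptions, not Alfresco's own policy definitions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractionPolicySketch {

    // Hypothetical policies for merging extracted values into existing properties.
    enum Policy { OVERWRITE, KEEP_EXISTING, NON_EMPTY_ONLY }

    public static Map<String, String> apply(Policy policy,
                                            Map<String, String> existing,
                                            Map<String, String> extracted) {
        Map<String, String> result = new LinkedHashMap<>(existing);
        for (Map.Entry<String, String> e : extracted.entrySet()) {
            String name = e.getKey();
            String value = e.getValue();
            boolean empty = value == null || value.isEmpty();
            switch (policy) {
                case OVERWRITE:
                    result.put(name, value); // always take the extracted value
                    break;
                case KEEP_EXISTING:
                    if (!result.containsKey(name) && !empty) {
                        result.put(name, value); // only fill properties that are unset
                    }
                    break;
                case NON_EMPTY_ONLY:
                    if (!empty) {
                        result.put(name, value); // replace, but never with a blank value
                    }
                    break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new LinkedHashMap<>();
        existing.put("experimentName", "Edited by user");
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("experimentName", "Example experiment");
        extracted.put("experimentId", "E-EXAMPLE-1");
        for (Policy p : Policy.values()) {
            System.out.println(p + " -> " + apply(p, existing, extracted));
        }
    }
}
```

With the sample data, only KEEP_EXISTING preserves the user-edited experiment name; all three policies add the new experiment identifier.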

sergio 10:29, 25 September 2006 (BST)

