Search Documentation

Document created by resplin Employee on Jun 6, 2015

This page is obsolete.

The official documentation is at: http://docs.alfresco.com



Search
The repository can be searched in two ways.


  1. Directly against the node service
  2. Against an index using the Searcher component




XPath search using the node service


The node service has two methods that support XPath style searches.

/**
* Select nodes using an xpath expression.
*
* @param contextNode - the context node for relative expressions etc
* @param XPath - the xpath string to evaluate
* @param parameters - parameters to bind in to the xpath expression
* @param namespacePrefixResolver - prefix to namespace mappings
* @param followAllParentLinks - if false '..' follows only the primary parent links, if true it follows all
* @return a list of all the child assoc relationships to the selected nodes
*/
public List<ChildAssocRef> selectNodes(NodeRef contextNode,
                                        String XPath,
                                        QueryParameterDefinition[] parameters,                                
                                        NamespacePrefixResolver namespacePrefixResolver,
                                        boolean followAllParentLinks);

/**
* Select properties using an xpath expression
*
* @param contextNode - the context node for relative expressions etc
* @param XPath - the xpath string to evaluate
* @param parameters - parameters to bind in to the xpath expression
* @param namespacePrefixResolver - prefix to namespace mappings
* @param followAllParentLinks - if false '..' follows only the primary parent links, if true it follows all
* @return a list of property values
* TODO: Should be returning a property object
*/
public List<Serializable> selectProperties(NodeRef contextNode,
                                            String XPath,
                                            QueryParameterDefinition[] parameters,
                                            NamespacePrefixResolver namespacePrefixResolver,
                                            boolean followAllParentLinks);

Overview


Jaxen is used to evaluate the XPath expression using a custom document navigator, so a full XPath implementation is available. The downside is that it uses the node service to navigate the node structure, essentially in the same way an XPath expression would be evaluated against an XML DOM model. So whilst it is complete, some queries will not perform well, particularly unconstrained full text search.



There are two methods so that selecting only properties (attributes) is kept distinct from selecting only nodes (elements). At the moment there is no support for selecting a mixture of properties and nodes. The XPath implementation supports the standard $namespace:name variable substitution and has the additional functions required by the JSR 170 specification (which hide some built-in functions such as contains()).



Function extensions


  1. like (SQL like pattern expressions using ? to match a single character and % to match a string)
  2. contains (Google-like full text search; in fact, the Lucene way)
  3. deref (to follow references from reference properties to nodes)



To support these functions the node service interface defines the like() and contains() methods, which are optionally supported by node service implementations. The indexing node service supports them fully.

Note: the functions contains() and like() can contain wild card elements at the start of the query but this may have performance issues.
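
As a rough illustration of the wildcard semantics described above, the following self-contained sketch translates a like() pattern into a Java regular expression. The class LikePatternSketch and its methods are hypothetical and not part of the repository API. Treating * as a synonym for % follows the like() examples on this page; the backslash escape is an assumption, since the escaping rules are still marked as a TODO in those examples.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the Alfresco API.
final class LikePatternSketch {

    // Translate a like() pattern into a regular expression:
    // '?' matches exactly one character, '%' and '*' match any run of
    // characters, and (by assumption) a backslash makes the next
    // character literal.
    static Pattern toRegex(String like) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            if (c == '\\' && i + 1 < like.length()) {
                regex.append(Pattern.quote(String.valueOf(like.charAt(++i))));
            } else if (c == '?') {
                regex.append('.');
            } else if (c == '%' || c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // True if the whole value matches the like() pattern.
    static boolean like(String value, String pattern) {
        return toRegex(pattern).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(like("spider monkey", "%monkey")); // leading wildcard
        System.out.println(like("monkfish", "monk*"));        // trailing wildcard
        System.out.println(like("monkey", "monk\\%"));        // escaped %, literal match only
    }
}
```

A pattern with a leading wildcard cannot use a sorted term prefix to narrow the candidates, which is one plausible reason for the performance warning above.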

The repository supports nodes with multiple parents. This is a much better alternative to reference nodes and can be used to avoid the limitations of the deref() function. It does raise the question of what '..' means in a path expression when a node has multiple parents: it could be 'find all parents' or 'find my primary parent'. This behaviour is controlled using the followAllParentLinks parameter on the method calls.
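
The two readings of '..' can be sketched with a small in-memory model. Everything here (ParentAxisSketch, Node, parents) is hypothetical and stands in for the node service only to show the effect of the followAllParentLinks flag; it is not the repository's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model; not the Alfresco node service.
final class ParentAxisSketch {

    static final class Node {
        final String name;
        Node primaryParent;                                   // null for the store root
        final List<Node> secondaryParents = new ArrayList<>(); // non-primary parent links

        Node(String name) {
            this.name = name;
        }
    }

    // What '..' would select for a node under each setting of the flag:
    // only the primary parent when false, every parent when true.
    static List<Node> parents(Node node, boolean followAllParentLinks) {
        List<Node> result = new ArrayList<>();
        if (node.primaryParent != null) {
            result.add(node.primaryParent);
        }
        if (followAllParentLinks) {
            result.addAll(node.secondaryParents);
        }
        return result;
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node folder = new Node("folder");
        Node document = new Node("document");
        folder.primaryParent = root;
        document.primaryParent = root;
        document.secondaryParents.add(folder);

        System.out.println(parents(document, false).size()); // primary parent only
        System.out.println(parents(document, true).size());  // primary plus secondary
    }
}
```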



XPath expressions are executed in the context of a given node - so that relative xpath expressions can be evaluated.
The store root node is '/'.

A namespace prefix resolver is always required. For any XPath implementation there needs to be a way to map from the prefixes used in the XPath expression to the actual URIs that they represent. For example, '/alf:space' would need to map 'alf' to the Alfresco URI. The namespace prefix resolver provides this support (as well as the information to navigate the namespace axis, if required).
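
A minimal sketch of what such a resolver has to do is shown here. PrefixResolverSketch is a hypothetical stand-in, not the repository's NamespacePrefixResolver, and the URI used in main is a placeholder rather than the real Alfresco namespace URI. The {uri}name output form is the fully qualified form used elsewhere on this page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a namespace prefix resolver.
final class PrefixResolverSketch {

    private final Map<String, String> prefixToUri = new HashMap<>();

    void addNamespace(String prefix, String uri) {
        prefixToUri.put(prefix, uri);
    }

    // Expand 'prefix:local' to '{uri}local'. Names without a prefix are
    // returned unchanged; an unmapped prefix is reported as an error.
    String expand(String prefixedName) {
        int colon = prefixedName.indexOf(':');
        if (colon < 0) {
            return prefixedName;
        }
        String prefix = prefixedName.substring(0, colon);
        String uri = prefixToUri.get(prefix);
        if (uri == null) {
            throw new IllegalArgumentException("No namespace mapped for prefix: " + prefix);
        }
        return "{" + uri + "}" + prefixedName.substring(colon + 1);
    }

    public static void main(String[] args) {
        PrefixResolverSketch resolver = new PrefixResolverSketch();
        // Placeholder URI, assumed for illustration only.
        resolver.addNamespace("alf", "http://example.org/alfresco");
        System.out.println(resolver.expand("alf:space"));
    }
}
```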



Parameters are provided as parameter definitions where the default value is used as the actual value. The definition identifies the fully qualified name and the type used to select the appropriate XPath type. Lists of nodes and attributes as parameters are not supported, as there are no property types to support this at the moment.

NOTE:


  • Property Objects
  • Collection types?

XPath Functions Available


Functions on Boolean Values


  • boolean
  • not
  • false
  • true

Functions on Numeric Values


  • number
  • ceiling
  • floor
  • round

Aggregate Functions


  • count
  • sum

Context Functions


  • last
  • position

Functions that Generate Sequences


  • id
  • document

Functions on Strings


  • string
  • concat
  • contains
  • normalize-space
  • starts-with
  • string-length
  • substring-after
  • substring-before
  • substring
  • translate

Functions on Nodes


  • name
  • namespace-uri
  • lang




Extension Functions


  • matrix-concat
  • evaluate
  • lower-case
  • upper-case
  • ends-with
  • subtypeOf
  • hasAspect
  • deref
  • like
  • contains
  • first




JCR Functions


  • jcr:like
  • jcr:score
  • jcr:contains
  • jcr:deref

Comparison with JSR 170


There are some differences between JSR 170 and what is provided here:


  1. This is a complete XPath implementation
  2. the like function currently also uses * meaning the same as % (this could be fixed by changing the Lucene implementation of wild card queries to use ? and %, not ? and *)
  3. the contains function will look at all attributes on a node and the full text representation of any content
  4. the contains function can be constrained to one attribute and the full text index
  5. the deref function can not be used in a path (as shown in the JSR 170 examples) - Jaxen does not support this
  6. the deref function must be given the full path of the attribute to dereference




Examples


Two simple examples to illustrate use in code.

// A name space resolver is required - this could be the name space service
DynamicNamespacePrefixResolver namespacePrefixResolver = new DynamicNamespacePrefixResolver(null);
namespacePrefixResolver.addDynamicNamespace(NamespaceService.ALFRESCO_PREFIX, NamespaceService.ALFRESCO_URI);
namespacePrefixResolver.addDynamicNamespace(NamespaceService.ALFRESCO_TEST_PREFIX, NamespaceService.ALFRESCO_TEST_URI);

// Select all nodes below the context node
List<ChildAssocRef> answer = searchService.selectNodes(rootNodeRef, "*", null, namespacePrefixResolver, false);
// Find all the property values for @alftest:animal
List<Serializable> attributes = searchService.selectProperties(rootNodeRef, "//@alftest:animal", null, namespacePrefixResolver, false);

Other xpath examples and explanations:

Find all nodes with an @alftest:animal property equal to 'monkey'
'//.[@alftest:animal='monkey']'

Find all nodes directly linked to the current node
'*'

Find all nodes with one node between them and the current node
'*/*'

Find all nodes with two nodes between them and the current node
'*/*/*'

Find all nodes with three nodes between them and the current node
'*/*/*/*'

Find the parents of all nodes with three nodes between them and the current node
'*/*/*/*/..'
This may not be the same as '*/*/*' as nodes can have multiple parents.
e.g. Going down we may follow a non primary child relationship and then navigate up the primary child relationship
     We may go up all parent relationships
     (We could control navigating only primary relationships)

Find all nodes below the context node (excluding the context node) 
'*//.'

Follow a named path from the current context node 
'alftest:root_p_n1'

Find all nodes below the context node (excluding the context node) that have an @alftest:animal property
'*//.[@alftest:animal]'

Find all nodes below the context node (excluding the context node) that have an @alftest:animal property equal to 'monkey'
'*//.[@alftest:animal='monkey']'

Find all nodes that have an @alftest:animal property equal to 'monkey'
(This will navigate to all nodes in the store and will have performance issues)
'//.[@alftest:animal='monkey']'

Find all nodes that have an @alftest:animal property equal to the value of the variable $alf:test
'//.[@alftest:animal=$alf:test]'

Find the principal parent or all parents of the current context node
'..'

Find the values of all properties @alftest:animal
Again this will have performance issues as it will visit all nodes and all properties
'//@alftest:animal'

Find the values of all properties @alftest:reference
Again this will have performance issues as it will visit all nodes and all properties
'//@alftest:reference'

Dereference the node identified by the attribute at /alftest:root_p_n1/alftest:n1_p_n3/@alftest:reference
The second attribute of the deref() function is not used at the moment
'deref(/alftest:root_p_n1/alftest:n1_p_n3/@alftest:reference, )'

Find all nodes in the store that have an attribute @alftest:animal ending with monkey
Again, this will visit all nodes in the repository.
'//*[like(@alftest:animal, '*monkey')]'

Find all nodes in the store that have an attribute @alftest:animal ending with monkey
Again, this will visit all nodes in the repository.
'//*[like(@alftest:animal, '%monkey')]'

Find all nodes in the store that have an attribute @alftest:animal starting with monk
Again, this will visit all nodes in the repository.
'//*[like(@alftest:animal, 'monk*')]'

Find all nodes in the store that have an attribute @alftest:animal starting with monk
Again, this will visit all nodes in the repository.
'//*[like(@alftest:animal, 'monk%')]'

Find all nodes in the store that have an attribute @alftest:animal equal to monk%
Again, this will visit all nodes in the repository.
TODO: check the requirements for escaping here
'//*[like(@alftest:animal, 'monk\%')]'

Find all the nodes with any attribute or content containing 'monkey'
This query will have the worst performance. It visits all nodes and searches an appropriate index for the full text  
and all attribute values. It is much more efficient to use the searcher API.
'//*[contains('monkey')]'

Find all the values of any attribute where the attribute or content contains 'monkey'
This query will have the worst performance. It visits all nodes and searches an appropriate index for the full text  
and all attribute values. It is much more efficient to use the searcher API.
'//@*[contains('monkey')]'

Find all the nodes with any attribute or content containing 'mon?ey' e.g. monkey, monaey, ...
This query will have the worst performance. It visits all nodes and searches an appropriate index for the full text  
and all attribute values. It is much more efficient to use the searcher API.
'//*[contains('mon?ey')]'

Find all the values of any attribute where the attribute or content contains 'mon?ey'
This query will have the worst performance. It visits all nodes and searches an appropriate index for the full text  
and all attribute values. It is much more efficient to use the searcher API.
'//@*[contains('mon?ey')]'

Similar pattern examples to the above
'//*[contains('m*y')]'
'//@*[contains('mon*')]'
'//*[contains('*nkey')]'
'//@*[contains('?onkey')]'
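
The path-navigation patterns earlier in this list ('*', '*/*', '*/*/*/..', '*//.') behave the same way on an ordinary XML DOM, which makes them easy to try out. The sketch below uses the JDK's javax.xml.xpath API on a small made-up document; it is an analogy only and uses neither Jaxen nor the node service. The class name, the sample XML, and the un-namespaced animal attribute are all assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration on a plain XML DOM; not the repository's XPath engine.
final class DomXPathSketch {

    // Made-up tree: root has children a and b; a/c/e and b/d hang below.
    static final String XML =
            "<root>"
          + "<a animal='monkey'><c animal='lemur'><e animal='monkey'/></c></a>"
          + "<b><d/></b>"
          + "</root>";

    // Evaluate the expression with <root> as the context node and
    // return how many nodes were selected.
    static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                    expression, doc.getDocumentElement(), XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "*", "*/*", "*/*/*", "*/*/*/..", "*//.", "//*[@animal='monkey']"
        };
        for (String expression : expressions) {
            System.out.println(expression + " selects " + count(expression) + " node(s)");
        }
    }
}
```

In a DOM every element has exactly one parent, so '*/*/*/..' always lands back on nodes that '*/*' selects; in the repository that need not hold, because a node can have several parents.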

Searching using the searcher component


The searcher component decides which indexing service to call for queries against a given store. Each store may support different indexing and different query languages. The default indexing store uses Lucene to provide indexing and query support. It supports two languages: Lucene and a very limited, but optimised, XPath implementation. The functionality of this XPath implementation will be expanded over time.


Optimised XPath language


This can be called using the 'xpath' language specifier (case insensitive)

The implementation currently supports the following axes:


  • child
  • descendant
  • descendant-or-self
  • parent
  • self

It does not currently support the attribute axis or predicates.
These are next on the road map.

Parameterisation using $namespace:name is not supported.

However text replacement is supported using ${namespace:name}

These queries can be canned in the query register.
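
Text replacement of this kind can be pictured as a plain string substitution performed before the query is parsed. The sketch below is one way to do it under that assumption; TextReplacementSketch and substitute are hypothetical names, and the real searcher may resolve parameters differently (for example via the default value of a parameter definition).

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; shows ${namespace:name} text replacement only.
final class TextReplacementSketch {

    // Matches ${...} and captures the parameter name between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace every ${name} in the query with the value bound to that name.
    static String substitute(String query, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(query);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            if (value == null) {
                throw new IllegalArgumentException("No value for parameter: " + matcher.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = Collections.singletonMap("alf:query", "//./*");
        System.out.println(substitute("${alf:query}", values));
    }
}
```

Because the replacement is textual, the substituted value becomes part of the query syntax, so values from untrusted input would need escaping first.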


Examples


The optimised xpath syntax is identical to that used for the PATH field in lucene queries.
Any PATH content below in the lucene query examples is also a valid xpath query.

Find all the attributes available for all nodes at any level (excluding the root node)
ResultSet results = searcher.query(storeRef, "xpath", "//*", null, null);

Generate a query entirely by variable substitution
QueryParameterDefinition paramDef = new QueryParameterDefImpl(QName.createQName("alf:query", namespacePrefixResolver), (PropertyTypeDefinition) null, true, "//./*");
ResultSet results = searcher.query(storeRef, "xpath", "${alf:query}", null, new QueryParameterDefinition[] { paramDef });

Lucene Language


This is the recommended language as it is supported by the recommended indexer.

The query language is described on the Lucene site http://lucene.apache.org/java/2_4_0/queryparsersyntax.html. The QueryParser has been modified to allow wild cards at the start of wild card query elements; otherwise the syntax is the same.

Note that certain characters need to be escaped in the query string. A static method on the LuceneQueryParser provides this.
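
For orientation, the escaping amounts to putting a backslash in front of each character that the Lucene query syntax treats as special. The sketch below is a hand-written approximation based on the special characters listed on the Lucene query syntax page; LuceneEscapeSketch is a hypothetical name, and real code should prefer the parser's own static method.

```java
// Hypothetical approximation of query-string escaping; prefer the
// query parser's own static method in real code.
final class LuceneEscapeSketch {

    // Characters with special meaning in the Lucene query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Prefix every special character with a backslash.
    static String escape(String text) {
        StringBuilder escaped = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                escaped.append('\\');
            }
            escaped.append(c);
        }
        return escaped.toString();
    }

    public static void main(String[] args) {
        // A fully qualified name contains braces, which must be escaped
        // when it is used as a field name.
        System.out.println(escape("{wibble}wobble"));
    }
}
```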

The following fields are available


  • ASPECT
    • All the aspects of the node
    • Tokenised as the fully qualified qname of each aspect
  • FTSSTATUS
    • Indicates if there are attributes waiting to be indexed in the background. Could be used to indicate that full text search matches may be out of date
  • ID
    • The id from the node reference - all nodes in the index are from the same store
    • A UUID
  • PARENT
    • All the parent IDs (UUIDs)
  • PATH
    • An XPATH expression used to select nodes
    • This should only be accessed via a phrase query (i.e. in "") as it requires special tokenisation
  • PRIMARYPARENT
    • The ID of the primary parent node
  • QNAME
    • All the QNames by which this node is known in its parents
    • Should be queried using phrases as it requires special tokenisation
  • TEXT
    • The full text representation of the node content
  • TYPE
    • The fully qualified type of the node

Attributes as fields


  • @{namespace-uri}name

Attributes should be searched using phrase expressions.
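
Since most of these fields are queried as phrases, a query string is largely a matter of quoting values and combining clauses. The sketch below builds such strings; LuceneFieldQuerySketch, phrase and allOf are hypothetical helpers, values are assumed to contain no double quotes, and field names are assumed to need no escaping (attribute field names would). The + prefix is Lucene's "required" operator.

```java
// Hypothetical string-building helpers; they only assemble query text.
final class LuceneFieldQuerySketch {

    // FIELD:"value", i.e. a phrase query against one field.
    // Assumes the value contains no double quotes.
    static String phrase(String field, String value) {
        return field + ":\"" + value + "\"";
    }

    // Mark every clause as required with '+' and join with spaces.
    static String allOf(String... clauses) {
        StringBuilder query = new StringBuilder();
        for (String clause : clauses) {
            if (query.length() > 0) {
                query.append(' ');
            }
            query.append('+').append(clause);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(phrase("PATH", "/one/five//*"));
        // "node-id" is a placeholder, not a real node reference.
        System.out.println(allOf(phrase("PARENT", "node-id"), phrase("QNAME", "one")));
    }
}
```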



The following fields are used internally


  • ANCESTOR
  • ISCONTAINER
  • ISROOT
  • ISNODE
  • TX

Examples


// Find all the nodes under the root node by QName namespace:one
// The prefix must be resolved to a URI
ResultSet results = searcher.query(rootNodeRef.getStoreRef(), "lucene", "PATH:\"/namespace:one\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/namespace:one/namespace:five\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/namespace:one/namespace:five/namespace:twelve\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/namespace:*\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/namespace:*/namespace:*\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/namespace:*/namespace:*/namespace:*\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/namespace:one/namespace:*\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/namespace:*/namespace:five/namespace:*\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/namespace:one/namespace:*/namespace:nine\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/*\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/*/*\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/*/namespace:five\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/*/*/*\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/namespace:one/*\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/*/namespace:five/*\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/namespace:one/*/namespace:nine\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"//.\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"//*\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"//*/.\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"//*/./.\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"//./*\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"//././*/././.\"", null, null);
// Examples using the default namespace
results = searcher.query(storeRef, "lucene", "PATH:\"//common\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/one//common\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/one/five//*\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/one/five//.\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/one//five/nine\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/one//thirteen/fourteen\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/one//thirteen/fourteen//.\"", null, null);
results = searcher.query(storeRef, "lucene", "PATH:\"/one//thirteen/fourteen//.//.\"", null, null);

Queries on attributes of each property type.

escapeQName uses a static QueryParser method to escape the string.

QName qname = QName.createQName(NamespaceService.ALFRESCO_URI, "int-ista");
results = searcher.query(storeRef, "lucene", "\\@" + escapeQName(qname) + ":\"01\"", null, null);

qname = QName.createQName(NamespaceService.ALFRESCO_URI, "long-ista");
results = searcher.query(storeRef, "lucene", "\\@" + escapeQName(qname) + ":\"2\"", null, null);

qname = QName.createQName(NamespaceService.ALFRESCO_URI, "float-ista");
results = searcher.query(storeRef, "lucene", "\\@" + escapeQName(qname) + ":\"3.4\"", null, null);

results = searcher.query(storeRef, "lucene", "\\@" + escapeQName(QName.createQName(NamespaceService.ALFRESCO_URI, "double-ista")) + ":\"5.6\"", null, null);

Date date = new Date();
String sDate = CachingDateFormat.getDateFormat().format(date);
results = searcher.query(storeRef, "lucene", "\\@" + escapeQName(QName.createQName(NamespaceService.ALFRESCO_URI, "date-ista")) + ":\"" + sDate + "\"", null, null);

results = searcher.query(storeRef, "lucene",
               "\\@" + escapeQName(QName.createQName(NamespaceService.ALFRESCO_URI, "datetime-ista")) + ":\"" + sDate + "\"", null, null);

results = searcher.query(storeRef, "lucene", "\\@" + escapeQName(QName.createQName(NamespaceService.ALFRESCO_URI, "boolean-ista")) + ":\"true\"", null,
               null);

results = searcher.query(storeRef, "lucene", "\\@" + escapeQName(QName.createQName(NamespaceService.ALFRESCO_URI, "qname-ista")) + ":\"{wibble}wobble\"",
               null, null);
results = searcher.query(storeRef, "lucene", "\\@" + escapeQName(QName.createQName(NamespaceService.ALFRESCO_URI, "guid-ista")) + ":\"My-GUID\"", null,
               null);

results = searcher.query(storeRef, "lucene", "\\@" + escapeQName(QName.createQName(NamespaceService.ALFRESCO_URI, "category-ista")) + ":\"CategoryId\"",
               null, null);

results = searcher.query(storeRef, "lucene", "\\@" + escapeQName(QName.createQName(NamespaceService.ALFRESCO_URI, "noderef-ista")) + ":\"" + n1 + "\"",
               null, null);

results = searcher.query(storeRef, "lucene", "\\@" + escapeQName(QName.createQName(NamespaceService.ALFRESCO_URI, "path-ista")) + ":\""
               + nodeService.getPath(n3) + "\"", null, null);
     

Queries based on type.

results = searcher.query(storeRef, "lucene", "TYPE:\"" + testType.toString() + "\"", null, null);

results = searcher.query(storeRef, "lucene", "TYPE:\"" + testSuperType.toString() + "\"", null, null);

results = searcher.query(storeRef, "lucene", "ASPECT:\"" + testAspect.toString() + "\"", null, null);

results = searcher.query(storeRef, "lucene", "ASPECT:\"" + testSuperAspect.toString() + "\"", null, null);
  

Full text search examples

results = searcher.query(storeRef, "lucene", "TEXT:\"fox\"", null, null);

QName queryQName = QName.createQName("alf:test1", namespacePrefixResolver);
results = searcher.query(storeRef, queryQName, null);
      

Canned queries and query parameters

queryQName = QName.createQName("alf:test2", namespacePrefixResolver);
results = searcher.query(storeRef, queryQName, null);

queryQName = QName.createQName("alf:test2", namespacePrefixResolver);
QueryParameter qp = new QueryParameter(QName.createQName("alf:banana", namespacePrefixResolver), "woof");
results = searcher.query(storeRef, queryQName, new QueryParameter[] { qp });

queryQName = QName.createQName("alf:test3", namespacePrefixResolver);
qp = new QueryParameter(QName.createQName("alf:banana", namespacePrefixResolver), "/one/five//*");
results = searcher.query(storeRef, queryQName, new QueryParameter[] { qp });

// TODO: should not have a null property type definition
QueryParameterDefImpl paramDef = new QueryParameterDefImpl(QName.createQName("alf:lemur", namespacePrefixResolver), (PropertyTypeDefinition) null, true, "fox");
results = searcher.query(storeRef, "lucene", "TEXT:\"${alf:lemur}\"", null, new QueryParameterDefinition[] { paramDef });

paramDef = new QueryParameterDefImpl(QName.createQName("alf:intvalue", namespacePrefixResolver), (PropertyTypeDefinition) null, true, "1");
qname = QName.createQName(NamespaceService.ALFRESCO_URI, "int-ista");
results = searcher.query(storeRef, "lucene", "\\@" + escapeQName(qname) + ":\"${alf:intvalue}\"", null, new QueryParameterDefinition[] { paramDef });

Other

results = searcher.query(rootNodeRef.getStoreRef(), "lucene", "PARENT:\"" + rootNodeRef.toString() + "\"", null, null);

results = searcher.query(rootNodeRef.getStoreRef(), "lucene", "+PARENT:\"" + rootNodeRef.toString() + "\" +QNAME:\"one\"", null, null);
