Search - Prototype 4

Document created by resplin Employee on Jun 6, 2015

Obsolete Pages

The official documentation is at: http://docs.alfresco.com



Search Prototype


Revised Index structure


All Nodes:


  • UUID (The unique identifier for the node)
  • FTS (The full text search entry for the node)
  • QNAMES (The fully qualified names of the node within each of its parents, ordered to match PARENTIDS)
  • PARENTIDS (The ids of the node's parents)
  • STOREREF (The id of the workspace)
  • Attributes as (Name=@ns:name Value=value)
    • Possibly better as Name=@ns:name only, building multi-term searches against different ns:name combinations
  • CATEGORIES (The ids of the categories the node belongs to; allows category intersections)
  • READACCESS (A simple list of allowed names; this needs to be finalised)

Container Nodes:


  • ANCESTORID (A repeated field containing the ids of all the node's ancestors, including the node itself)
  • PATH (The full path to the node. If there are multiple paths, this field can be repeated)

Category nodes will have a path that defines them *grin*
They just need to be treated specially in the search
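
The field layout above can be sketched as a plain mapping from field name to a list of values. This is only an illustration, not the prototype's code: the Node class, to_index_document and in_all_categories are hypothetical names, and an ordinary dict stands in for a real index document.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory node; attribute names are illustrative only."""
    uuid: str
    store_ref: str
    text: str
    links: list  # (parent_id, qname) pairs, one per parent link
    attributes: dict = field(default_factory=dict)  # "ns:name" -> value
    categories: list = field(default_factory=list)
    read_access: list = field(default_factory=list)
    is_container: bool = False
    ancestor_ids: list = field(default_factory=list)  # containers only
    paths: list = field(default_factory=list)         # containers only


def to_index_document(node):
    """Map a node to a dict of multi-valued index fields."""
    doc = {
        "UUID": [node.uuid],
        "FTS": [node.text],
        # PARENTIDS and QNAMES are built from the same pairs, so position i
        # in one list always corresponds to position i in the other.
        "PARENTIDS": [parent_id for parent_id, _ in node.links],
        "QNAMES": [qname for _, qname in node.links],
        "STOREREF": [node.store_ref],
        "CATEGORIES": list(node.categories),
        "READACCESS": list(node.read_access),
    }
    # One field per attribute, named "@ns:name".
    for name, value in node.attributes.items():
        doc["@" + name] = [value]
    if node.is_container:
        # ANCESTORID holds every ancestor plus the node itself.
        doc["ANCESTORID"] = list(node.ancestor_ids) + [node.uuid]
        doc["PATH"] = list(node.paths)
    return doc


def in_all_categories(doc, category_ids):
    """Category intersection: True if the document is in every given category."""
    return set(category_ids) <= set(doc["CATEGORIES"])
```

Building PARENTIDS and QNAMES from the same list of pairs is one way to keep the two fields in matching order, which the QNAMES description requires.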


The impact of renaming and moving on the index


Leaves


  • Renaming a leaf node
    • requires one existing document in the index to be deleted and one to be added.
    • This is removing and adding the link




  • Adding a leaf node
    • requires one document to be added to the index.
    • Creating a node and adding a link




  • Deleting a leaf node
    • requires one existing document in the index to be deleted.
    • Delete and unlink




  • Unreference a leaf node
    • requires one document to be reindexed
    • deleting a link
    • Does the last link do a cascade delete or do you have to do an explicit delete on the 'primary' link?




  • Updating a leaf node
    • requires one existing document in the index to be deleted and one to be added.
    • No parent relationships are changed




  • Moving a leaf node
    • Reindex (add and delete) the node
    • The same cost as renaming - removing and adding the link
  • Reference a leaf node
    • Reindex the leaf node

Containers


  • Renaming a container node
    • Requires the container node and all its children to be deleted and readded to the index
    • All leaf nodes are unaffected as they refer to the parent by id




  • Adding a container node
    • Requires one addition to the index (other new children will also have one addition)
    • The order of these will matter
  • Deleting
    • Delete the node and all of its children that had the node as the lone parent
  • Unreference - assuming there is no cascade delete
    • Update the index for the container (it must exist if the above assumption is true)
    • Update the index for all the containers that have this entry in the parent list
  • Update
    • Reindex the container node
  • Moving
    • As delete
    • Recursive add for all children
  • Link
    • Update the linked container and all those with this container in the parent list
    • They will appear under a new path
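
Why a container rename touches more than one document can be sketched as a walk over the container hierarchy. This is an illustration under assumptions: containers_to_reindex and the child_containers mapping are hypothetical, and "children" is read here as all descendant containers, since their PATH entries include the renamed container. Leaf nodes are left out because they refer to their parent by id.

```python
def containers_to_reindex(container_id, child_containers):
    """Return the ids of the container and all of its descendant containers.

    child_containers maps a container id to the ids of its direct child
    containers. A container reachable through several parents is visited once.
    """
    seen = set()
    stack = [container_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(child_containers.get(current, []))
    return seen


# Example hierarchy: "d" is linked under both "b" and "x".
tree = {"root": ["a", "x"], "a": ["b", "c"], "b": ["d"], "x": ["d"]}
affected = containers_to_reindex("a", tree)
```

The same walk applies to moving or linking a container, where every container below the changed link appears under a new path.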

Categories


No effect other than that described above.


General Query structure


Changes to Path Queries


Proposal for join queries


Simple PATH and CATEGORY parser


Performance test


  • Delete
  • Search multiple indexes
  • Parallel search
  • Individual operations as described above
  • Degradation of performance as indexes grow on disk
  • Optimisation of the index as it grows
  • The correct Lucene indexer parameters for large indexes
    • Do these depend on OS?
