
Additional DB table for search/retrieval of hundreds of millions of documents

Question asked by tgrozdek on Sep 3, 2015
Latest reply on Sep 4, 2015 by tgrozdek
Hi,

I wonder how to implement a solution for fast search/retrieval of documents when you might have hundreds of millions of documents in Alfresco 5.0 (One or Community version).

Alfresco is used primarily as a rudimentary document management system: import a document and search for/retrieve a document, nothing else.
Documents are mostly standard office documents with an average size of 100 kB. Tens of thousands of documents arrive every month (a few thousand per day).

The idea is that when a document arrives in Alfresco, its attributes and docID are copied to a DB table (either directly online or by an offline job). Searching for documents would be done with fast queries on the DB table, and retrieval would be done directly from Alfresco.
A solution for importing documents into Alfresco and fetching them already exists (web services) and could be changed.
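
To make the idea concrete, here is a minimal sketch of the external table, assuming SQLite as a stand-in for the real database. The table name `doc_index`, the attribute columns (`customer`, `doc_type`, `created_on`) and the function names are all hypothetical and only there for illustration; the returned docIDs would then be passed to the existing web services to fetch the content from Alfresco.

```python
import sqlite3

# Hypothetical schema: one row per Alfresco document, keyed by its docID.
# The attribute columns are illustrative assumptions, not real Alfresco fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS doc_index (
    doc_id     TEXT PRIMARY KEY,  -- ID used later to fetch the document from Alfresco
    customer   TEXT NOT NULL,     -- example attribute (assumption)
    doc_type   TEXT NOT NULL,     -- example attribute (assumption)
    created_on TEXT NOT NULL      -- ISO 8601 date string
);
-- Composite index covering the search pattern used in find_doc_ids().
CREATE INDEX IF NOT EXISTS idx_doc_customer_type
    ON doc_index (customer, doc_type, created_on);
"""


def open_index(path=":memory:"):
    """Open (or create) the external index database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def register_document(conn, doc_id, customer, doc_type, created_on):
    """Copy a document's attributes and docID into the table (online or from a batch job)."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO doc_index VALUES (?, ?, ?, ?)",
            (doc_id, customer, doc_type, created_on),
        )


def find_doc_ids(conn, customer, doc_type):
    """Return matching docIDs, oldest first; content is then fetched from Alfresco."""
    rows = conn.execute(
        "SELECT doc_id FROM doc_index "
        "WHERE customer = ? AND doc_type = ? ORDER BY created_on",
        (customer, doc_type),
    )
    return [row[0] for row in rows]
```

The same `register_document` call could be made either right after the import web service stores a document (online) or from a periodic job that scans for new documents (offline); which of the two is better is part of what I am asking.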

I'd like to know whether this is the right way to go and how to implement this solution with the best performance for search/retrieval of documents (import will work with these document numbers). Our general concern is how Alfresco will cope with this number of documents and what the possible bottlenecks in Alfresco are.
What would you use to copy the document attributes to the table (events, Java)? Would you do it online or offline? And how exactly would you retrieve the documents from Alfresco?
Currently, we have two virtual servers, one for the whole of Alfresco and the other as the index server.

Thanks in advance.
Tom

PS.
I know that it is possible to scale the server architecture horizontally and vertically with a number of servers, but I'm interested in this kind of solution with an external table.
