
Alfresco design for a huge repository

Question asked by ivan.plestina on Jun 17, 2013
Hi Alfresco team and community members,

We are long-time Community users, but we now have an opportunity to take on a much larger project: the client estimates they will add about 100 million records to the repository per year (Lucene search engine) and will have 200 concurrent and 2,000 occasional users. So we decided it would probably be a good idea to go Enterprise.

Talking to Alfresco partners and explaining the project size to them, we were repeatedly told that all that matters is that we get 2.5 GB of RAM and 8 CPU cores, and then we are good to go (which is copy/paste from the wiki). They refused to give us any advice on database sizing, or even on which vendor to choose or whether the MySQL + Alfresco combination we normally use can handle this (we estimate billions of rows in alf_node_properties). We were just told to "go get a DBA" and given the list of supported databases. We were also given no advice on infrastructure design: what we need to look out for, whether we need to partition the repository, use clusters, and so on.

So basically, from what we were told, we should get an 8-core server with 4 GB of RAM and a few TB of disk capacity, and we're good to go as long as we figure out the DB size? 2,000 users can insert their 275,000 documents per day (in one 8-hour shift, to be precise), all the while executing searches, workflows, etc.? I have lots of doubts here. I mean, I would be happy if we didn't have to pay much to get this performance, but come on, give us some real-life examples or whitepapers that support what is in the docs. Some database examples would also make sense. A DBA can't miraculously say "for a 250 GB DB running Alfresco, use XY memory, CPU, SSDs…". They need to measure how Alfresco performs in this environment, and I'd bet you guys have the best data on how your software performs and what it needs across the hardware/software stack to run properly. So why are you hiding it? We don't know what exactly we need to buy to run an Alfresco installation of this size. Have a look at your competition: <a href=""></a>
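
To put rough numbers on the load described above, here is a minimal back-of-envelope sketch. The 275,000 documents per day and the 8-hour shift come from the figures in this post. The 365 ingest days per year and the average of 20 property rows per node are assumptions made purely for illustration (the real ratio depends on the content model), so treat the results as orders of magnitude, not a sizing.

```python
# Back-of-envelope ingest arithmetic; not a sizing tool.
DOCS_PER_DAY = 275_000   # from the post
SHIFT_HOURS = 8          # from the post
DAYS_PER_YEAR = 365      # ASSUMPTION: documents arrive every calendar day
PROPS_PER_NODE = 20      # ASSUMPTION: hypothetical average rows in alf_node_properties per node


def ingest_rate_per_second(docs_per_day: int, shift_hours: int) -> float:
    """Sustained insert rate if the whole daily volume lands within one shift."""
    return docs_per_day / (shift_hours * 3600)


def docs_per_year(docs_per_day: int, days: int) -> int:
    """Yearly document count for a constant daily volume."""
    return docs_per_day * days


def property_rows(nodes: int, props_per_node: int) -> int:
    """Rows added to the property table for a given number of nodes."""
    return nodes * props_per_node


rate = ingest_rate_per_second(DOCS_PER_DAY, SHIFT_HOURS)
yearly = docs_per_year(DOCS_PER_DAY, DAYS_PER_YEAR)
rows = property_rows(yearly, PROPS_PER_NODE)

print(f"{rate:.1f} docs/s sustained")        # 9.5 docs/s sustained
print(f"{yearly:,} docs/year")               # 100,375,000 docs/year
print(f"{rows:,} property rows/year")        # 2,007,500,000 property rows/year
```

Under these assumptions the repository has to absorb roughly ten inserts per second for the whole shift, on top of searches and workflows, and the property table grows by about two billion rows per year, which is consistent with the "billions of rows" estimate above.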

So what I'm asking is whether any members here could post what they are running Alfresco on, how big the repository is, and how well they think it handles the load. What general advice do you have for designing large repositories like the one proposed above (10-100 million documents per year), mainly in terms of infrastructure but also on the software side?