
Need architecture advice on Kickstarter project

Question asked by mattsnyder on Jul 23, 2012
Latest reply on Aug 8, 2012 by mattsnyder
Hi all, I've been planning a Kickstarter project for some time that involves Alfresco, and it's time for me to do some architecture work. The project is called Youluh, and the catchphrase is "a service for people who accept EULAs without reading them."

The basic idea in 150 words is: whenever a user encounters a EULA, for example when updating iTunes, he invokes a bit of software on his device that logs in to Alfresco in the cloud and uploads the EULA. The EULA becomes part of the user's private document library, and at the same time, the service does some quick analysis and returns a report card to the user. The "quick analysis" consists of breaking the EULA into clauses, vectorizing the clauses, and submitting the vectorized clauses to a special instance of Solr, which does a particular kind of search and returns a list of similar or identical clauses in the index. The report card gives some statistics in buckets: how many clauses in this EULA the user has already agreed to, how many clauses are new to this user but relatively common in the population of the service, how many clauses are rare or totally new to the system, and how many clauses have comments associated with them. The report card, and especially the comments associated with clauses, is the main feature of the project. The idea of the service is to reduce the amount of reading you have to do while still being able to say you read the EULA.
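To make the report card concrete, here is a rough Python sketch of the bucketing step. It is only an illustration under stated assumptions: clauses are split naively on blank lines, exact matching on normalized text stands in for the vectorized Solr similarity search, and the names `split_clauses`, `report_card`, and `common_threshold` are made up for this example.

```python
import re

def split_clauses(eula_text):
    """Naive splitter (assumption): one clause per non-empty paragraph,
    lowercased with whitespace collapsed."""
    parts = re.split(r"\n\s*\n", eula_text)
    return [" ".join(p.split()).lower() for p in parts if p.strip()]

def report_card(clauses, user_seen, population_counts, commented,
                common_threshold=100):
    """Count clauses per bucket.

    user_seen: set of clauses this user has already agreed to
    population_counts: dict mapping clause -> number of users who uploaded it
    commented: set of clauses that have comments attached
    common_threshold: hypothetical cutoff for "relatively common"
    Exact-match lookups here stand in for the Solr similarity query.
    """
    card = {
        "already_agreed": 0,
        "new_to_user_but_common": 0,
        "rare_or_new": 0,
        "has_comments": 0,
    }
    for clause in clauses:
        if clause in user_seen:
            card["already_agreed"] += 1
        elif population_counts.get(clause, 0) >= common_threshold:
            card["new_to_user_but_common"] += 1
        else:
            card["rare_or_new"] += 1
        # Comments are counted independently of the other three buckets.
        if clause in commented:
            card["has_comments"] += 1
    return card
```

The first three buckets are mutually exclusive, while the comment count overlaps with them, which matches the way the statistics are described above.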

It's important for the EULAs to be stored in Alfresco for each user, for legal reasons, even though there will be an enormous amount of duplication.  Imagine an initial user community of 10,000, uploading a few EULAs every week.  Each EULA is a very small text file.  If possible, when a user uploads a new EULA that is very similar to one he uploaded before, the new EULA will be added as a revision to the older EULA.
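As a sketch of the "add as a revision" decision, the snippet below picks the most similar previously uploaded EULA for a user and returns its ID only when the similarity clears a threshold. It uses `difflib.SequenceMatcher` from the Python standard library purely for illustration; the function name `find_revision_target`, the document IDs, and the 0.9 threshold are assumptions, and a real service would presumably reuse the Solr clause matching instead.

```python
from difflib import SequenceMatcher

def find_revision_target(new_text, previous_eulas, threshold=0.9):
    """Return the ID of the user's most similar earlier EULA, or None.

    previous_eulas: dict mapping a (hypothetical) document ID -> EULA text
    threshold: assumed similarity cutoff in the range 0..1
    """
    best_id, best_ratio = None, 0.0
    for doc_id, old_text in previous_eulas.items():
        ratio = SequenceMatcher(None, old_text, new_text).ratio()
        if ratio > best_ratio:
            best_id, best_ratio = doc_id, ratio
    # Only treat the upload as a revision when it is close enough.
    return best_id if best_ratio >= threshold else None
```

If the function returns an ID, the upload would be stored as a new version of that document; if it returns `None`, it would be stored as a new document in the user's library.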

The main architectural question I have at this point is, given the way I'm using Alfresco, and the growth rate, what's the best way to host it? I don't think Alfresco in the Cloud is of any use to me, since I'm going to need API access. I initially thought Amazon EC2, but comparing EC2 with other options, it's not clear whether my application is in the sweet spot for EC2. I think it might be, because of the potential burstiness of user traffic. For example, Apple comes out with a new license agreement, and every user uploads a new copy in a 24-hour span.
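For a sense of scale, here is a back-of-the-envelope calculation of the upload rates implied by the numbers above. It assumes "a few EULAs every week" means three per user, and it spreads uploads evenly over each window, which real traffic would not do.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY

def uploads_per_second(uploads, window_seconds):
    """Average upload rate if uploads were spread evenly over the window."""
    return uploads / window_seconds

users = 10_000
eulas_per_user_per_week = 3  # assumption: "a few" taken as three

# Normal week: every user uploads a few EULAs.
baseline = uploads_per_second(users * eulas_per_user_per_week, SECONDS_PER_WEEK)
# Burst day: every user uploads one new EULA within 24 hours.
burst = uploads_per_second(users, SECONDS_PER_DAY)

print(f"baseline:  {baseline:.3f} uploads/s")
print(f"burst day: {burst:.3f} uploads/s ({burst / baseline:.1f}x baseline)")
```

Under these assumptions the baseline is about 0.05 uploads per second and the burst day averages about 0.12, or roughly 2.3 times the baseline. Actual peaks would likely be much sharper than the daily average, since uploads tend to cluster in the hours right after a release.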

Any and all ideas and comments are welcome.