This content has been marked as final. Show 4 replies
As always, the correct answer is "it depends".
- Will there be more reads (queries) or writes (INSERTs)?
- Will there be any UPDATEs?
- Does the use case require any of the ACID guarantees, or would "eventual consistency" be fine?
At first glance, Hadoop (HDFS + MapReduce) doesn't look like a viable option, since you require "interactive queries". Also, if you require any level of ACID guarantees or UPDATE capabilities the best (and arguably only) solution is a RDBMS. Also, keep in mind that Millions of rows is pocket change for modern RDBMSs on average hardware.
On the other hand, if there'll be a lot more queries than inserts, VERY few or no updates at all, and eventual consistency will not be a problem, I'd probably recommend you to test a Key-Value store (such as Oracle NoSQL Database). The idea would be to use (AttributeX,ValueY) as the Key, and a Sorted List of Names that have ValueY for their AttributeX. This way you only do as many reads as attributes you have in the WHERE clause, and then compute the intersection (very easy and fast with sorted lists).
Also, I'd do this computation manually. SQL may be comfortable, but I don't think It's Big Data ready yet (unless you chose the RDBMS way, of course).
I hope it helped,
Edited by: JPuig on Apr 23, 2013 1:45 AM