1 Reply Latest reply on Mar 9, 2012 5:43 PM by Jean-Pierre Dijcks-Oracle

    Hadoop / NoSql

      Reading the documentation about Hadoop and NoSQL, I don't understand when it is better to use one or the other, or whether the two products can be used together.

        • 1. Re: Hadoop / NoSql
          Jean-Pierre Dijcks-Oracle

          There is a lot that can be said based on your question. First, my assumption is that when you say "NoSQL" you are implying the use of Oracle NoSQL Database... (Hadoop is NoSQL as well in the broadest sense of the NoSQL term).

          With that, yes, you can use Oracle NoSQL Database and Hadoop together. For example, you can store your user profile information in NoSQL DB, then use a MapReduce input format to extract that data from NoSQL DB and work on it with MR.
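
          To make that flow concrete, here is a minimal sketch in plain Python. It does not use the Oracle NoSQL Database or Hadoop APIs: the `profiles` dict, `read_records`, `map_fn`, `reduce_fn` and `run_job` are all hypothetical stand-ins, and the `country=...;plan=...` value encoding is an assumption made up for the illustration. It only shows the shape of the idea: something plays the role of the input format and hands key/value pairs to a map step, and a reduce step aggregates the results.

```python
from collections import defaultdict

# Hypothetical stand-in for a key-value store: key -> opaque bytes.
# The "field=value;field=value" encoding is an assumption for this sketch.
profiles = {
    "user/1": b"country=NL;plan=free",
    "user/2": b"country=US;plan=paid",
    "user/3": b"country=NL;plan=paid",
}


def read_records(store):
    """Plays the role of the input format: yields (key, value) pairs."""
    for key, value in store.items():
        yield key, value


def map_fn(key, value):
    """Parse the opaque bytes in the application and emit (country, 1)."""
    fields = dict(item.split("=", 1) for item in value.decode("utf-8").split(";"))
    yield fields["country"], 1


def reduce_fn(key, values):
    """Sum the counts emitted for one country."""
    return key, sum(values)


def run_job(store):
    """Run map over every record, group by output key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in read_records(store):
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, v) for k, v in groups.items())


counts = run_job(profiles)  # number of user profiles per country
```

          In a real setup the store and the job would run on separate clusters and the grouping would be done by the framework; the sketch keeps everything in one process to show the data flow only.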

          Generally speaking (I'm consciously ignoring HBase for now), you would store large write-once, read-many data chunks in HDFS/Hadoop. Classic examples are weblogs and sensor data streams. These are written in large chunks onto HDFS (say, a weblog file is broken into 128 MB chunks). You write and read sequentially (hence the requirement for general disk) but do not update anything; least of all do you update any "records" based on keys.
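
          As a rough illustration of that write-once, read-sequentially pattern, the sketch below cuts a byte stream into fixed-size chunks and then answers a question by scanning everything front to back. It is plain Python, not HDFS code: `split_into_chunks` and `count_status` are hypothetical helper names, the log lines are made up, and the 16-byte chunk size is only a stand-in for something like 128 MB.

```python
import io


def split_into_chunks(stream, chunk_size):
    """Read a stream front to back and yield fixed-size chunks.

    The last chunk may be shorter than chunk_size.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def count_status(chunks, status):
    """Scan all chunks sequentially and count lines ending in `status`.

    Lines can straddle chunk boundaries, so this sketch simply rejoins
    the chunks before splitting into lines.
    """
    data = b"".join(chunks)
    return sum(1 for line in data.splitlines() if line.endswith(status))


# Made-up weblog data; nothing is ever updated in place after it is written.
weblog = b"GET /a 200\nGET /b 404\nGET /a 200\nGET /c 500\n"
chunks = list(split_into_chunks(io.BytesIO(weblog), chunk_size=16))
ok_requests = count_status(chunks, b"200")
```

          Note that there is no lookup by key anywhere: every question is answered by reading through the data in order.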

          NoSQL DB gives you transactions based on the (primary) key. You have random IO patterns: you read single records (I'm simplifying a little) and can update and delete them based on their key. This really leverages a B-tree-index-like structure. The key is what gets indexed; the value is a string of bytes that the DB does not interpret and simply passes to the Java program upon reading.
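
          The keyed access pattern can be sketched the same way. The class below is a hypothetical in-memory toy, not the Oracle NoSQL Database API: `TinyKeyValueStore` and its methods are invented for the example, it uses a Python dict where a real store would use a B-tree-like index on disk, and the JSON encoding of the value is purely the application's choice. The point it illustrates is that the store only sees bytes, and that reads, updates and deletes all address a single record by its key.

```python
import json


class TinyKeyValueStore:
    """Hypothetical in-memory stand-in for a key-value store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Insert or overwrite the record stored under `key`."""
        if not isinstance(value, bytes):
            raise TypeError("value must be bytes; the store does not interpret it")
        self._data[key] = value

    def get(self, key):
        """Return the stored bytes, or None if the key is absent."""
        return self._data.get(key)

    def delete(self, key):
        """Remove the record; return True if it existed."""
        return self._data.pop(key, None) is not None


store = TinyKeyValueStore()

# The application, not the store, decides how to encode the value.
store.put("user/42", json.dumps({"name": "Ann", "visits": 1}).encode("utf-8"))

# Random read of a single record by key, followed by an update of that record.
profile = json.loads(store.get("user/42").decode("utf-8"))
profile["visits"] += 1
store.put("user/42", json.dumps(profile).encode("utf-8"))
```

          Compare this with the HDFS case: here nothing is scanned, and a record can be changed or removed after it has been written.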

          So, generally speaking, look at the required access pattern. If it is random and single-record based, use a NoSQL DB (or HBase); if it is batch access for analysis, use HDFS.