ROracle vs. ORAAH

Kevin Zhang
Kevin Zhang Member Posts: 251 Bronze Badge
edited Oct 2, 2017 12:22PM in R Technologies

Hi All:

Can someone explain the key difference between ROracle and ORAAH (Oracle R Advanced Analytics for Hadoop)? As far as I know, ROracle is free, while ORAAH is part of Oracle Big Data Connectors and is not free.

Is it possible to use ROracle to connect to a Hive database? Or can ROracle only be used to connect R to Oracle Database?

Thanks!

Kevin

Answers

  • Christos Iraklis Tsatsoulis
    Christos Iraklis Tsatsoulis Member Posts: 85 Blue Ribbon
    edited Sep 20, 2017 1:07PM

    Hi Kevin,

    ROracle has nothing to do with Hadoop or Hive (and hence nothing to do with ORAAH). It is used for connectivity between R and Oracle Database, and it offers certain advantages over RJDBC, the general-purpose R package for JDBC database connectivity.

    So, the answer to your two questions should be no and yes, respectively.

    A more "fair" comparison might be ROracle vs ORE (Oracle R Enterprise), i.e. the free ROracle package vs the proprietary enterprise solution of ORE (both are used for connectivity between R and Oracle Database, and in fact ORE itself uses ROracle), but I guess this is another discussion...

    There might be some other options around for connecting to Hive from R, but ROracle is certainly not one of them.
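
    For readers who have not used ROracle, here is a minimal sketch of the R-to-Oracle-Database connectivity described above. The helper names (`oracle_connect_string`, `query_oracle`), the host, the service name, and the credentials are all hypothetical placeholders. The sketch assumes the DBI and ROracle packages and an Oracle client are installed, and that the database accepts an Easy Connect string.

```r
# Hypothetical helper: build an Oracle Easy Connect string ("//host:port/service").
oracle_connect_string <- function(host, port = 1521L, service) {
  sprintf("//%s:%d/%s", host, as.integer(port), service)
}

# Hypothetical helper: open a connection, run one query, always disconnect.
# Requires the DBI and ROracle packages plus an Oracle client installation.
query_oracle <- function(sql, user, password, host, service, port = 1521L) {
  con <- DBI::dbConnect(
    ROracle::Oracle(),                       # ROracle's DBI driver object
    username = user,
    password = password,
    dbname   = oracle_connect_string(host, port, service)
  )
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbGetQuery(con, sql)                  # returns a data.frame
}

# Example call (placeholder values, not run here):
# query_oracle("SELECT sysdate FROM dual", "scott", "tiger",
#              host = "dbhost.example.com", service = "orcl")
```

    Note that nothing in this sketch touches Hadoop or Hive: the driver only speaks to Oracle Database.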

    Hope this helps

    Christos

  • Kevin Zhang
    Kevin Zhang Member Posts: 251 Bronze Badge
    edited Sep 21, 2017 9:55AM

    Hi Christos:

    Thanks for the clarification. I am wondering whether Oracle has a "free" (open source) equivalent of ORAAH that lets customers connect R to Hadoop. The reason I am asking is that:

    - R is open source

    - Apache Hadoop is open source

    I wish there were an open source project or solution for integrating R with Apache Hadoop. Maybe there is one and I just don't know about it. If anyone knows of one, could you share it?

    Thanks!

    Kevin

  • Kevin Zhang
    Kevin Zhang Member Posts: 251 Bronze Badge
    edited Sep 21, 2017 10:02AM

    I came across this article: https://datascienceplus.com/integrating-r-with-apache-hadoop/

    I am wondering whether RHadoop can be installed and configured on Oracle Big Data Appliance to provide integration between R and Hadoop.

    I really wish Oracle would provide a "free"/open-source solution equivalent to ORAAH.

    Thanks!

    Kevin

  • Christos Iraklis Tsatsoulis
    Christos Iraklis Tsatsoulis Member Posts: 85 Blue Ribbon
    edited Sep 21, 2017 5:23PM

    Hi Kevin.

    If you think about it, Oracle had a strong incentive to offer ROracle, i.e. a connectivity package between a highly popular open source tool (R) and its own flagship product (Oracle Database); on the other hand, with both R and Hadoop being popular open source projects, one can hardly imagine why Oracle should bother offering a similar free package here...

    Indeed, what is truly remarkable is that the open source community itself has so far failed to offer such a (stable and well-maintained) package:

    • RHadoop, initiated with great expectations by Revolution Analytics (before it was acquired by Microsoft), is now dead.
    • RHive also looks abandoned; it was removed from CRAN back in 2015.
    • RHIPE, too, looks abandoned (although the datadr package seems to rely on it).

    My two pence: there is some relatively recent stuff out there using plain old JDBC connectivity between R & Hive and claiming decent performance:

    Hive and R Playing Nicely Together

    https://pygot.wordpress.com/2016/10/13/connecting-r-studio-to-hadoop-via-hive/

    Nevertheless, I have not tried any of them myself, and cannot offer any further advice...
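
    To illustrate the "plain old JDBC" route mentioned above, here is a rough sketch using the RJDBC package. It is not taken from either linked article. The helper names (`hive_jdbc_url`, `query_hive`), the host, the credentials, and the JAR paths are hypothetical; the sketch assumes a HiveServer2 endpoint on the default port 10000 and that the Hive JDBC driver JAR(s) are available locally.

```r
# Hypothetical helper: JDBC URL for a HiveServer2 endpoint (default port 10000).
hive_jdbc_url <- function(host, port = 10000L, database = "default") {
  sprintf("jdbc:hive2://%s:%d/%s", host, as.integer(port), database)
}

# Hypothetical helper: run one HiveQL query over JDBC and return a data.frame.
# Requires the DBI and RJDBC packages, a Java runtime, and the Hive JDBC JAR(s).
query_hive <- function(sql, host, user, password, driver_jars) {
  drv <- RJDBC::JDBC(
    driverClass = "org.apache.hive.jdbc.HiveDriver",
    classPath   = driver_jars              # path(s) to the driver JAR(s); assumed
  )
  con <- DBI::dbConnect(drv, hive_jdbc_url(host), user, password)
  on.exit(DBI::dbDisconnect(con), add = TRUE)
  DBI::dbGetQuery(con, sql)
}

# Example call (placeholder values, not run here):
# query_hive("SELECT * FROM some_table LIMIT 10", "hive.example.com",
#            "kevin", "secret", driver_jars = "/path/to/hive-jdbc-standalone.jar")
```

    Everything here goes through HiveServer2, so performance depends on the Hive side; the R session only receives the query result.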

    Christos

  • Christos Iraklis Tsatsoulis
    Christos Iraklis Tsatsoulis Member Posts: 85 Blue Ribbon
    edited Sep 22, 2017 4:59AM

    Having said the above, it is important to add that for Spark the situation regarding R is different, and certainly better; there are at least two open source packages offering Spark functionality through R and, as a by-product, both include access to HDFS and the Hive Metastore:

    Keep in mind that Cloudera still does not include SparkR in its own Hadoop distributions (hence it is not included in Oracle Big Data products that rely on CDH); nevertheless, it is still possible to use it in CDH, as I have described in a blog post of mine some time ago (just keep in mind that the post was written for Spark 1.6, and some details have probably changed since then).

    Christos

  • Kevin Zhang
    Kevin Zhang Member Posts: 251 Bronze Badge
    edited Oct 2, 2017 12:22PM

    Hi Christos:

    Thanks for all the comprehensive information! It is very useful.

    Kevin

This discussion has been closed.