Oracle R Enterprise Data Store

Version 4

    Oracle R Enterprise Data Store

     

    by Brendan  - Brendan Tierney, Oracle ACE Director,

     

    Introduction

     

    This purpose of this article is to give you an overview of the Oracle R Enterprise (ORE) Data  Store. The ORE Data Store is a repository in the Oracle Database that allows us to store and manage R objects that are created as part of our data science projects. The ORE Data Store is a very useful feature that allows for the safe storage of these R objects without interfering with the core database or application objects. These R objects that are stored in the ORE Data Store can be accessed from the client machine running R and ORE, and can also be accessed from within the database using the embedded R execution using SQL.

     

    In a multitenant, container database will have hold on redo/archive logs, so the same redo stream will share by all pluggable databases in that container database.

     

    1. Overview of ORE

     

    Oracle R Enterprise (ORE) is part of the Oracle Advanced Analytics option. Oracle R Enterprise allows you to use the R language to seamlessly integrate with the Oracle Database, allowing you to use the power and scalability of the Oracle Database. With Oracle R Enterprise you can quickly and easily migrate your R advanced analytics to use the Oracle Database with the minimum of code changes to your R scripts. By having the R language as a code element of the Oracle Database significantly expands the statistical, analytical and graphical capabilities of the Oracle Database.

     

    Oracle provides 4 main solutions that are based on the R language:

     

    • Oracle R Enterprise: This is version of the R language that has been built to run as part of the Oracle Database. This allows you to run your R code and scripts in the Oracle Database, utilizing the performance and scalability of the Oracle Database server.

    • R Oracle : An R package that allows you to connect to the Oracle Database. This package is specifically tuned to use Oracle Net to ensure the efficient communication with the Oracle Database and for the ultra fast movement of data between the client machine and the Oracle Database.

    • Oracle R Distribution: This is a customized version of the open source R language that is provided free by Oracle. With Oracle R Distribution Oracle has tuned specific packages and functions to work efficiently with the Oracle Database.

    • Oracle R Advanced Analytics for Hadoop: This allows you to run your R code to access and run on Hadoop utilizing the MapReduce programming framework for R users. This package is part of the Oracle Big Data Connectors Software Suite.

     

     

                

    2.The ORE Data Store

     

    When you are working on your data science projects you will typically create a number of temporary objects. When working with R these get saved in your local environment. But when you are working with your data (big or small) and using Oracle R Enterprise to work with the data in the database you have three main options open to you for saving these temporary objects. The first option is to store them on your local machine. But as your data volumes grows this can start to become a problem. Additionally it is also a data security issue, as you will end up with various parts of your enterprise data located on local machines. This might not be a problem for a small team but if a mid-sized to large them then this can become a problem. The second option is to store the temporary objects are tables in the database. This might not be ideal, as you will be mixing core data with temporary working data sets. Again as your data science team grows this can become a problem, as you will not know what is core and what is not. The third option is to use and ORE Data Store. This allows you to store these temporary objects together in the database but separate to the core data that you are working on. In addition to the various data sub sets you can also use the ORE Data Store to store a variety of other R objects.

     

    When working with temporary ORE objects these will only exist for the duration of your connections. So when you disconnect your ORE session all the temporary objects that were created in the database will also be dropped and removed from the database. It would be useful if we could save these temporary objects for later use without having to go through the extra steps of creating tables to store the data. Additionally with using ORE you can create other types of objects, like data mining models, that we will want to use later. What we don’t want to do is to go through the steps of creating these objects all over again.

     

    With Oracle R Enterprise we can create and ORE Data Store in our database. In this ORE Data Store we can store all of these ORE objects that we have created. We can share the ORE Data Store with other data analysts and data scientists, but perhaps the most important feature of the ORE Data Store is that we can use it when we are performing an embedded ORE execution in SQL.

     

    The examples given in this article cover the creation of an ORE Data Store, the saving of objects to it, the retrieval of objects, getting details of the ORE Data Store, deleting objects and finally deleting the  ORE Data Store.

     

    Table 1 outlines the set of functions available in Oracle R Enterprise that allows us to create and manage an ORE Data Store.

     

                                                            

    ore.Data Store

    Lists information about the ORE Data Store. This defaults to the current schema. When name is provided then it returns details of that ORE Data Store.

    ore.Data StoreSummary

    Provides detailed information about the objects in a specified ORE Data Store.

    ore.delete

    Deletes the ORE Data Store from the database. This will also remove all the ORE objects that were contained in the ORE Data Store

    ore.load

    Loads the objects from the ORE Data Store back into the R environment. These objects are available for use immediately.  Either all the objects can be retrieved or only the objects listed.

    ore.lazyLoad

    Loads the object from the ORE Data Store when they are first used by your R code. Objects are only restored one at a time.

    ore.save

    Creates an ORE Data Store and stores the listed objects in the ORE Data Store for future use.

    Table 1 : Functions to use the in-database ORE Data Store.

     

    With Oracle R Enterprise you can create any number of ORE Data Stores for your database connection. It would be advisable to have at least one ORE Data Store per project or part of project. But you might need to keep watch on the number of ORE Data Stores and try to avoid the number of these getting too large.

     

    3. The ore.save and ore.delete Functions

     

     

    The first of the functions from Table 1 that we will look at is the ore.save function. Apart from listing the objects you want to save in an ORE Data Store, you also need to decide on the name of the ORE Data Store and if objects should be appended to an existing ORE Data Store or if an existing ORE Data Store is to be overwritten. You can save individual objects or a group of objects or all available objects that are available in the ORE Data Store. In this case an object can be an ORE object or an object in your R environment

     

    The following example illustrates the saving of an individual object (CARS_DATA) in an ORE Data Store call ORE_DS.

     

    > ore.save(CARS_DATA, name="ORE_DS", description="Example of ORE Data Store")

     

           

    When creating a new ORE Data Store you should try to give a meaningful description. The description can be up to 2000 characters and this allows you to give a description that clearly explains what the objects in the data store are being used for.

     

    To add any additional objects to the ORE Data Store you will need to add append=TRUEto the end of the ore.save command.

    > ore.save(cars_ds, name="ORE_DS", append=TRUE)

     

    The append parameter allows you to add objects to an existing ORE Data Store. You will need to check to see if the object exists in the ORE Data Store before running the ore.save functions. If the object already exists then you will get an error.  To work around this you will need to first delete the object from the ORE Data Store before running the ore.save function, as illustrated in the following example.

     

    > ore.delete("ORE_DS", list="cars_ds") 
    > ore.save(cars_ds, name="ORE_DS", append=TRUE)

     

    The above example shows you how to delete one object from the ORE Data Store. If you wanted to delete multiple objects then you would form a list.

     

    > ore.delete("ORE_DS", list=c("cars_ds", "CARS_DATA"))

     

    If you need to save all the objects in your current R environment to the ORE Data Store you can use the list parameter to pass in the names of these objects. This will allow us to save these objects to the database, adding these to the data security provided by the database and to your backups.

     

    > ore.save(list=c("cars_ds", "CARS_DATAC"), name="ORE_DS")

     

      

    The following example illustrates adding the local R environment objects to an ORE Data Store called ORE_DS2.

     

    >ore.save(list=ls(), name="ORE_DS2", description="DS for all R Env Data")

     

    4.Getting details about the ORE Data Store

     

    Oracle R Enterprise has two functions that give details about ORE Data Stores. These functions are the ore.Data_Store and the ore.Data_StoreSummary.  The ore.Data_Store function lists all the ORE Data Stores that are available for the user.

     

    > ore.Data_Store()
    Data Store.name object.count     size       creation.date              description
    1         ORE_DS            2     5104 2016-04-04 10:33:25 Example of ORE Data Store
    2        ORE_DS2            5 51466509 2016-04-04 10:30:19    DS for all R Env Data

     

    The ore.Data_StoreSummary function displays the details of what objects are contained in a particular data store.

     

    > ore.Data_StoreSummary("ORE_DS2") 
      object.name      class     size length row.count col.count 
    1     cars_ds data.frame     3798     11        32        11 
    2 cars_ore_ds  ore.frame     1675     11        32        11 
    3    sales_ds data.frame 51455575      7    918843         7 
    4      usa_ds  ore.frame     2749     23     18520        23 
    5     usa_ds2  ore.frame     2712     23     18520        23
    

     

     

    Being able to save your various R and ORE objects to an in-database ORE Data Store assists you in securing your data and being able to utilize the performance and scalability of the database when you want to reuse these objects.

     

     

    Loading Data from an ORE Data Store

     

    As you build up your analytics environment, you will want to store data in various ORE Data Stores and to be able to retrieve these objects so that you can continue with your analytics at a later time. Oracle R Enterprise provides you with the ore.load and the ore.lazyLoad functions.

     

    The ore.load function restores objects to your R environment. The function requires the name of an ORE Data Store. In this case all the objects in the ORE Data Store will be restored as shown in the following example

     

    >ore.load("ORE_DS")

     

    As you add more and more objects to your ORE Data Store, when you run the ore.load function it may take some time to restore all the objects. In such scenarios it would be better to only restore the objects that you require. In this case you will provide a list of the objects you want restored.

     

    >ore.load("ORE_DS2", c("cars_ds", "sales_ds", "usa_ds"))

     

    An alternative approach to take is to only load the objects as and when they are required. This will require us using the ore.load function before each time the object is used for the first time. Alternatively you can use the ore.lazyLoad function. The ore.lazyLoad function does not immediately retrieve the specified objects from an ORE Data Store. Instead objects are retrieved when they are first referenced. The ore.lazyLoad function takes the same parameters as the ore.load function. You can pass the name of the ORE Data Store, or the ORE Data Store name and the list of objects. The following examples are the ore.lazyLoad alternatives to the ore.load examples given above.

     

    > ore.lazyLoad("ORE_DS2") 
    > ore.lazyLoad("ORE_DS2", c("cars_ds", "sales_ds", "usa_ds"))
    

     

    6. Deleting and ORE Data Store

     

    We have already seen an example of how to delete an object from an ORE Data Store. This example is repeated below. We can use the ore.delete function to delete objects within an ORE Data Store and to delete an ORE Data Store itself.

     

    > ore.delete("ORE_DS", list=c("cars_ds", "CARS_DATA"))

     

    To delete an ORE Data Store and all the objects within the ORE Data Store you will use the ore.delete function. To delete an ORE Data Store you just need to provide the name of the data store

     

    > ore.delete("ORE_DS") 

     

    7. Using SQL to Manage the ORE Data Store

     

    You can use SQL to query the what ORE Data Stores you have in your schema, that are in the database and you can also query what objects exist in the ORE Data Stores for your schema. Table 2 lists the data dictionary views that you can use to query the ORE Data Stores. These data dictionary views are created when ORE is installed.

     

                                   

    rqs$Data Store

    Lists the ORE Data Stores that are in the databases. Contains the schema names where the ORE Data Stores are located.

    quser_Data Storelist

    Lists the ORE Data Stores for the current schema.

    rquser_Data

    Storecontents

    Lists the R objects that are stored in the ORE Data Stores for the current schema.

    Table 2 :  Data Dictionary views for ORE Data Stores

     

    The following examples illustrate the use of these data dictionary views.

     

    The first example shown below illustrates the use of the first two dictionary views listed in Table 2. The first query will list all the ORE Data Stores in the database and the second query lists the ORE Data Stores for the current schema.

     

    SQL> select * from RQ$DATA STORE;
          DSID DSOWNER              DSNAME               CDATE     DESCRIPTION
    ---------- -------------------- -------------------- --------- ----------------------------      1089 ORE_USER             ORE_DS2              04-AUG-15 DS for all R Env Data     
           1095 ORE_USER             ORE_DS               04-AUG-15 Example of ORE Data Store
    

     

    Probably the better query to use the following one as it only lists the ORE Data Stores in your schema.

     

    SQL> select * from RQUSER_DATA STORELIST;
    DSNAME                     NOBJ     DSSIZE CDATE     DESCRIPTION
    -------------------- ---------- ---------- --------- -----------------------------ORE_DS                        2       5104 04-AUG-15 Example of ORE Data Store
    ORE_DS2                       5   51466509 04-AUG-15 DS for all R Env Data
    

     

    After identifying the ORE Data Stores the next task is to list the contents of the ORE Data Store. For this we use the RQUSER_DATA STORECONTENTS data dictionary view. The first query example below lists all the objects in all of our schema ORE Data Stores.

     

    SQL> select * from RQUSER_DATA STORECONTENTS;
    DSNAME     OBJNAME       CLASS            OBJSIZE     LENGTH       NROW       NCOL
    ---------- ------------- ------------- ---------- ---------- ---------- ----------
    ORE_DS     CARS_DATA     ore.frame           1306         11         32         11
    ORE_DS     cars_ds       data.frame          3798         11         32         11
    ORE_DS2    cars_ds       data.frame          3798         11         32         11
    ORE_DS2    cars_ore_ds   ore.frame           1675         11         32         11
    ORE_DS2    sales_ds      data.frame      51455575          7     918843          7
    ORE_DS2    usa_ds        ore.frame           2749         23      18520         23
    ORE_DS2    usa_ds2       ore.frame           2712         23      18520         23
    

     

    As our data science projects evolve and we start working on more and more projects, the number of ORE Data Stores will also grow and grow. The following example illustrates how to query the objects from one of the ORE Data Stores.

     

    SQL> select * from RQUSER_DATA STORECONTENTS
      2  where dsname = 'ORE_DS2';
    DSNAME     OBJNAME       CLASS            OBJSIZE     LENGTH       NROW       NCOL
    ---------- ------------- ------------- ---------- ---------- ---------- ----------
    ORE_DS2    cars_ds       data.frame          3798         11         32         11
    ORE_DS2    cars_ore_ds   ore.frame           1675         11         32         11
    ORE_DS2    sales_ds      data.frame      51455575          7     918843          7
    ORE_DS2    usa_ds        ore.frame           2749         23      18520         23
    ORE_DS2    usa_ds2       ore.frame           2712         23      18520         23
    

     

    8. Summary

     

    Oracle R Enterprise allows us to greatly expand the range of analytics that is available in the Oracle Database by allowing us to have an R engine as part of the Oracle Database. The ORE Data Store provides us with a repository that can be used to store the various R objects that we can create as part of out data science projects. These objects are available to us on our R client machine and also available to us inside the database. We can also access these objects using the embedded R execution in SQL. In this article a number of examples were given to illustrate how to create, manage, store and retrieve object in the ORE Data Store.

     

      

    About the Author

        

    Brendan Tierney [Brendan ] is an Oracle ACE Director and an independent consultant based in Dublin, Ireland. He has 22+ years of extensive experience working in the areas of Data Mining, Predictive Analytics, Data Warehousing and Data Architecture. Brendan regularly works on projects in Ireland, UK, Europe, Canada and USA. For these projects Brendan is based in his home office and remotely connects to each organization network. Brendan is the editor of the UKOUG Oracle Scene magazine and deputy chair of the OUG Ireland BI & Big Data SIG. Brendan is a regular speaker at conferences across Europe and the USA and has written technical articles for OTN, Oracle Scene, IOUG SELECT Journal and ODTUG Technical Journal.

    Brendan has published a number of books. These include the following books published by Oracle Press / McGraw-Hill:

    • Predictive Analytics using Oracle Data Miner (August 2014)
    • Oracle R Enterprise: Harnessing the Power of R in the Oracle Database (2016)
    • Real World SQL & PL/SQL: Advice from the Experts (2016), co-written with Arup Nada, Heli Helskyaho, Martin Widlake and Alex Nuijten.

     

    Visit Brendan's Community profile for links to Brendan's blog, and contact information.