How To Query Hive and Impala from Oracle using ODBC Heterogeneous Gateway

Version 2

    Introduction

     

    Big Data technologies have spread throughout organizations and the challenge of integration with legacy systems must be addressed somehow. There are several ways to connect Oracle or other RDBMS systems with the big data ecosystem, such as dedicated connectors, ETL, and custom ones. In the following article we will demonstrate one simple method to connect and query from Oracle directly to Apache Hive and Cloudera Impala using the ODBC heterogeneous gateway. Heterogeneous gateways were developed by Oracle to address the direct connectivity and type translations from non-Oracle to Oracle databases. There are specialized gateways, subjected to licensing and the generic ODBC gateway which is free. The ODBC heterogeneous gateway is a generic gateway that can be used practically with any data source that provides a functional ODBC driver.

     

    Apache Hive is a data warehouse framework built on top of Hadoop used for mostly for analytics. It is using a query-like language called HiveQL. All queries are translated into map reduce jobs and executed further in hadoop. It lacks transaction support and comes also with several limitations. More about Hive at https://hive.apache.org.

     

    Cloudera Impala is a more elaborated framework, starting to get wider acceptance and support by more and more vendors such as Amazon, Oracle, MapR, and others. It is implemented as a Massive Parallel Processing query engine with SQL ANSI compatibility, having low latency that completely bypasses map reduce operations. Also it has support for many different file formats.  More about Impala (http://www.cloudera.com/documentation/enterprise/latest/topics/impala.html). 

     

     

    Environmental Setup

    Oracle server

     

    Operating system: Oracle Virtual Box guest running OEL 6.5

    Database server: Oracle Enterprise Edition Ver. 12.1.0.2.0

    Hostname: Node1

    IP address: 192.168.1.20

     

    Cloudera QuickStart

     

    Operating system: Oracle Virtual Box guest running CentOS 6.7

    Cloudera version: 5.7

    Hostname: Cloudera

    IP address: 192.168.1.88

     

    To setup the Cloudera environment we used a Cloudera Quickstart image that can be downloaded from http://www.cloudera.com/downloads/quickstart_vms/5-7.html. The hive tables used in this article are created according to Cloudera tutorial that can be found here (http://www.cloudera.com/developers/get-started-with-hadoop-tutorial/exercise-1.html)

     

    Installing Cloudera Hive and Impala ODBC driver

     

    As a perquisite to install the Cloudera Hive and Impala ODBC drivers we need to have installed unixODBC or iODBC. We have opted for unixODBC and installed as follows:

     

     

    [root@node1 etc]# yum install unixODBC

    Loaded plugins: refresh-packagekit, security

    ..........................................................................................................

    Setting up Install Process

    Resolving Dependencies

    --> Running transaction check

    --> Package unixODBC.x86_64 0:2.2.14-14.el6 will be installed

    --> Finished Dependency Resolution

     

    Dependencies Resolved

    =============================================================

    Package                     Arch                     Version                                    Repository                               

    Size

    =============================================================

    Installing:

    unixODBC                x86_64                    2.2.14-14.el6                           ol6_latest                              

    377 k

     

    Transaction Summary

    ============================================================

    Install       1 Package(s)

     

    Total download size: 377 k

    Installed size: 1.1 M

    Is this ok [y/N]: y

    Downloading Packages:

    unixODBC-2.2.14-14.el6.x86_64.rpm 

    ….............................................................................................................................  

     

     

    Next, download the Cloudera Hive ODBC driver from the following address:

    http://www.cloudera.com/downloads/connectors/hive/odbc/2-5-12.html

     

     

    As root execute the following command to install the Cloudera Hive ODBC driver :

     

    [root@node1 kit]# rpm -Uhv ClouderaHiveODBC-2.5.19.1004-1.el6.x86_64.rpm

    Preparing...                                    ########################################### [100%]

       1:ClouderaHiveODBC                ########################################### [100%]

    [root@node1 kit]#

     

    Download the Cloudera Impala ODBC driver form the following location:

    (http://www.cloudera.com/downloads/connectors/impala/odbc/2-5-22.html)

     

    As root execute the following command to install the odbc cloudera impala driver :

     

    [root@node1 kit]# rpm -Uhv ClouderaImpalaODBC-2.5.33.1004-1.el6.x86_64.rpm

    Preparing...                                                           ###################### [100%]

       1:ClouderaImpalaODBC                                   ###################### [100%]

    [root@node1 kit]#

     

     

     

    Configuring the Cloudera Hive ODBC driver

    The driver installation directory will be located in /opt/cloudera/hiveodbc. In the /opt/cloudera/hiveodbc/Setup directory we will find sample configuration files that can be used with few modification for performing the setup.

    Create a directory as root /usr/local/odbc.This directory will be used as placement for odbcinst.ini:

     

    [root@node1 ~]# mkdir -p /usr/local/odbc

     

    Copy odbc.ini in /home/oracle directory as follows:

    [root@node1 Setup]# cp odbc.ini /home/oracle


    Next copy cloudera.hiveodbc.ini from /opt/cloudera/hiveodbc/lib/64 to /etc directory as follows:

    [root@node1 64]# cp cloudera.hiveodbc.ini /etc

     

     

    Next, we will configure the environment variables for Cloudera Hive ODBC driver.

    Open /home/oracle/.bash_profile for editing and add the following variables (marked with bold) as follows:

     

    Next add in the /home/oracle/.bash_profile the following variables.

    [oracle@node1 ~]$ vi .bash_profile

    # .bash_profile

     

    # Get the aliases and functions

    if [ -f ~/.bashrc ]; then

                . ~/.bashrc

    fi

     

    # User specific environment and startup programs

    export ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1

    export ORACLE_SID=ORCL

    export ODBCINI=~/odbc.ini

    export ODBCSYSINI=/usr/local/odbc

    export CLOUDERAHIVEINI=/etc/cloudera.hiveodbc.ini

    PATH=$PATH:$HOME/bin:$ORACLE_HOME/bin

    export LD_LIBRARY_PATH=$LIBRARY_PATH:$ORACLE_HOME/lib

     

    Source the profile to get the environment variables loaded as follows:

    [oracle@node1 ~]$ source .bash_profile

     

     

    The odbc.ini file copied under /home/oracle is containing two examples for configuring Data source names (DSN) for the 32 and 64 driver version. All parameters are explained with comments in the sample files, comments which are omitted from the current listing. If you are interested in learning more about these parameter then check the driver documentation at http://www.cloudera.com/documentation/other/connectors/hive-odbc/latest.html.

     

    We will change only the host, port, and UID under [Cloudera ODBC Driver for Apache Hive (64-bit) DSN] and rename the data source name to HIVEDSN as follows:

     

     

    [HiveDSN]

    Description     = Cloudera ODBC Driver for Apache Hive (64-bit) DSN

    Driver              = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so

    HOST                        = 192.168.1.88

    PORT             = 10000

    Schema            = default

    ServiceDiscoveryMode                      = 0

    ZKNamespace =

    HiveServerType          = 2

    AuthMech                  = 2

    ThriftTransport          = 1

    UseNativeQuery         = 0

    KrbHostFQDN          = [Hive Server 2 Host FQDN]

    KrbServiceName         = [Hive Server 2 Kerberos service name]

    KrbRealm                    = [Hive Server 2 Kerberos realm]

    SSL                              = 0

    TwoWaySSL               = 0

    ClientCert                   =

    ClientPrivateKey        =

    ClientPrivateKeyPassword                =

    UID=cloudera

     

     

    Next, test the connection as follows:

     

    [oracle@node1 ~]$ isql -v hivedsn

    +---------------------------------------+

    | Connected!                            |

    |                                                |

    | sql-statement                         |

    | help [tablename]                   |

    | quit                                        |

    |                                                |

    +---------------------------------------+

    SQL> 

     

    List the tables and try a count against the orders table to ensure that everything is working fine:

     

    SQL> show tables;

    +-----------------------------------+

    | tab_name |+-----------------------------------------+

    | categories || customers || departments || order_items || orders || products |+------------------------------------------+

    SQLRowCount returns -1

    6 rows fetched

     

    SQL> select count(*) from orders;

    +---------------------+

    | EXPR_1              |

    +---------------------+

    | 68883               |

    +---------------------+

    SQLRowCount returns -1

    1 rows fetched

     

    The connection and data retrieval seems to work fine.

     

     

    Oracle ODBC transparent gateway configuration for Cloudera Hive ODBC

    Under the directory $ORACLE_HOME/hs/admin we created a initialization parameters file named initHIVEDSN.ora (here we used the dg4odbc.ini file as a template) with the following contents:

     

    [oracle@node1 admin]$ vi initHIVEDSN.ora

     

    # This is a sample agent init file that contains the HS parameters that are needed for the Database Gateway for ODBC

     

    HS_FDS_CONNECT_INFO = "HIVEDSN"

    HS_FDS_TRACE_LEVEL = 0

    HS_FDS_SHAREABLE_NAME = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so

    #

    # ODBC specific environment variables

    #

    set ODBCINI=/home/oracle/odbc.ini

     

    Note: The name of the init file must be init<HS_FDS_CONNECT_INFO value>.ora otherwise the connection will fail.

     

     

    Listener configuration

    Open the listener.ora file

     

    [oracle@node1 admin]$ vi /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora

    # listener.ora Network Configuration File: /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora

    # Generated by Oracle configuration tools.

     

    SID_LIST_LISTENER =

      (SID_LIST =

        (SID_DESC =

          (PROGRAM = dg4odbc)

          (ARGS = ENVS=LD_LIBRARY_PATH=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so)

          (SID_NAME = HIVEDSN)

          (ORACLE_HOME = /u01/app/oracle/product/12.1.0/dbhome_1)

        )

      )

     

    LISTENER =

      (DESCRIPTION_LIST =

        (DESCRIPTION =

          (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))

        )

        (DESCRIPTION =

          (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))

        )

      )

     

    ADR_BASE_LISTENER = /u01/app/oracle

     

     

    Add a network service in tnsnames.ora, called HIVEDSN as follows :

     

    HIVEDSN =

      (DESCRIPTION=

    (ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521))

    (CONNECT_DATA=(SID=HIVEDSN))

          (HS=OK)

        )

     

     

    Reload the listener as follows:

     

    [oracle@node1 admin]$ lsnrctl reload

     

    LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:05:57

    Copyright (c) 1991, 2014, Oracle.  All rights reserved.

     

    Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))

    The command completed successfully

      [oracle@node1 admin]$

     

     

    Check the services availability:

     

    [oracle@node1 admin]$ lsnrctl status

     

    LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:06:32

    Copyright (c) 1991, 2014, Oracle.  All rights reserved.

     

    Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))

    STATUS of the LISTENER

    -----------------------------------------------------------------------------------------------------------------------

    Alias                            LISTENER

    Version                        TNSLSNR for Linux: Version 12.1.0.2.0 - Production

    Start Date                    19-JUN-2016 08:04:43

    Uptime                        1 days 0 hr. 1 min. 49 sec

    Trace Level                 off

    Security                       ON: Local OS Authentication

    SNMP                          OFF

    Listener Parameter File   /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora

    Listener Log File /u01/app/oracle/diag/tnslsnr/node1/listener/alert/log.xml

    Listening Endpoints Summary...

    (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521)))

      (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))

    (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=node1)(PORT=5500))

    (Security=(my_wallet_directory=/u01/app/oracle/product/12.1.0/dbhome_1/admin/ORCL/xdb_wallet))(Presentation=HTTP)(Session=RAW))

    Services Summary...

    Service "HIVEDSN" has 1 instance(s).

      Instance "HIVEDSN", status UNKNOWN, has 1 handler(s) for this service...

    Service "ORCL" has 1 instance(s).

      Instance "ORCL", status READY, has 1 handler(s) for this service...

    Service "ORCLXDB" has 1 instance(s).

      Instance "ORCL", status READY, has 1 handler(s) for this service...

    The command completed successfully

     

    Ping the network service HIVEDSN to check its availability as follows:

     

    [oracle@node1 admin]$ tnsping HIVEDSN

     

    TNS Ping Utility for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:07:56

    Copyright (c) 1997, 2014, Oracle.  All rights reserved.

     

    Used parameter files:

    /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/sqlnet.ora

     

    Used TNSNAMES adapter to resolve the alias

    Attempting to contact (DESCRIPTION= (ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521)) (CONNECT_DATA=(SID=HIVEDSN)) (HS=OK))

    OK (10 msec)

     

     

    Create a public database link called hivedsn as follows:

     

    [oracle@node1 admin]$ sqlplus / as sysdba

    SQL*Plus: Release 12.1.0.2.0 Production on Mon Jun 20 08:09:35 2016

    Copyright (c) 1982, 2014, Oracle.  All rights reserved.

     

    Connected to:

    Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

    With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

     

    SQL> create public database link hivedsn connect to cloudera identified by cloudera using 'HIVEDSN';

    Database link created.

     

    Next, try to fetch some data from hive using the database link:

     

    SQL> select * from customers@hivedsn;

    select * from customers@hivedsn

                            *

    ERROR at line 1:

    ORA-28500: connection from ORACLE to a non-Oracle system returned this message:

    [Cloudera][ODBC] (11560) Unable to locate SQLGetPrivateProfileString function.

    {HY000,NativeErr = 11560}

    ORA-02063: preceding 2 lines from HIVEDSN

     

    The SQLGetPrivateProfileString function is contained and exported from libodbcinst.so shared library. To solve this issue we have to add the complete path (with bold) into /etc/cloudera.hiveodbc.ini driver configuration file as follows.:

     

    [Driver]

    ODBCInstLib=/usr/lib64/libodbcinst.so

    ErrorMessagesPath=/opt/cloudera/hiveodbc/ErrorMessages/

    LogLevel=0

    LogPath=

    SwapFilePath=/tmp

     

    Also if the Hive ODBC driver is not linked with libodbcinst.so you should add the LD_PRELOAD variable to bash_profile or issue export LD_PRELOAD=/usr/lib64/libodbcinst.so.

     

     

    [oracle@node1 ~]$ export LD_PRELOAD=/usr/lib64/libodbcinst.so

     

    Verify that libodbcinst.so is loaded as follows:

     

    [oracle@node1 ~]$ ldd /opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so

                linux-vdso.so.1 =>  (0x00007fffad5ff000)

                libdl.so.2 => /lib64/libdl.so.2 (0x00007fd1af2b2000)

                libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd1af094000)

                libsasl2.so.2 => /usr/lib64/libsasl2.so.2 (0x00007fd1aee7a000)

                librt.so.1 => /lib64/librt.so.1 (0x00007fd1aec72000)

                libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fd1aea57000)

                libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007fd1ae854000)

                libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fd1ae54e000)

                libm.so.6 => /lib64/libm.so.6 (0x00007fd1ae2c9000)

                libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fd1ae0b3000)

                libc.so.6 => /lib64/libc.so.6 (0x00007fd1add20000)

                /lib64/ld-linux-x86-64.so.2 (0x00000033a1a00000)

                libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fd1adae8000)

                libfreebl3.so => /lib64/libfreebl3.so (0x00007fd1ad886000)

     

     

     

    Reload the listener, connect again and reissue the select as follows:

     

    [oracle@node1 admin]$ sqlplus / as sysdba

     

    SQL*Plus: Release 12.1.0.2.0 Production on Mon Jun 20 08:23:36 2016

    Copyright (c) 1982, 2014, Oracle.  All rights reserved.

     

    Connected to:

    Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

    With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

     

    SQL> select * from customers@hivedsn;

    select * from customers@hivedsn

                            *

    ERROR at line 1:

    ORA-28500: connection from ORACLE to a non-Oracle system returned this message:

    [Cloudera][ODBC] (11470) Transactions are not supported. {HYC00,NativeErr =

    11470}

    ORA-02063: preceding 2 lines from HIVEDSN

     

    The cause of this error is that  Hive is a non transactional/non logging data store, therefore we should issue our transactions in read only and non-logging mode. We can tweak this by using the heterogeneous service HS_TRANSACTION_MODEL initialization parameter changed to READ_ONLY_AUTOCOMMIT. More about initialization parameters used by ODBC heterogeneous gateway can be found at (https://docs.oracle.com/database/121/ODBCU/feature.htm#ODBCU763)

     

     

    Open the initHIVEDSN.ora parameters file for editing and add the HS_TRANSACTION_MODEL parameter as follows:

     

    # This is a sample agent init file that contains the HS parameters that are

    # needed for the Database Gateway for ODBC

    HS_FDS_CONNECT_INFO = "HIVEDSN"

    HS_FDS_TRACE_LEVEL = 0

    HS_FDS_SHAREABLE_NAME = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so

    HS_TRANSACTION_MODEL=READ_ONLY_AUTOCOMMIT

    #

    # ODBC specific environment variables

    #

    set ODBCINI=/home/oracle/odbc.ini

     

    Save, connect again and reissue the statement as follows:

     

    SQL> select * from customers@hivedsn

    -------------------------------------------------------------------------------------------

    1530 David        Smith           XXXXXXXXX         XXXXXXXXX                 

     

    The connection and data retrieval is fully functional.

     

     

    Configuring the Cloudera Impala ODBC driver

     

    The driver installation directory will be located in /opt/cloudera/impalaodbc. In the /opt/cloudera/impalaodbc/Setup directory we will find sample configuration files that can be used with few modification for performing the setup.

    Create a directory as root /usr/local/odbc as follows:

     

    [root@node1 ~]# mkdir -p /usr/local/odbc

     

    This directory will be used as placement for odbcinst.ini.

    Copy odbc.ini in /home/oracle directory as follows:

     

    [root@node1 Setup]# cp odbc.ini /home/oracle

    Next, copy cloudera.impalaodbc.ini from /opt/cloudera/hiveodbc/lib/64

    [root@node1 64]# cp cloudera.impalaodbc.ini /etc

     

     

    Next, we will configure the environment variables for Cloudera Impala ODBC driver. Open .bash_profile for editing and add the following variables (marked with bold) as follows:

     

    [oracle@node1 ~]$ vi .bash_profile

    # .bash_profile

     

    # Get the aliases and functions

    if [ -f ~/.bashrc ]; then

                . ~/.bashrc

    fi

     

    # User specific environment and startup programs

    export ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1

    export ORACLE_SID=ORCL

    export ODBCINI=~/odbc.ini

    export ODBCSYSINI=/usr/local/odbc

    export SIMBAINI=/etc/cloudera.impalaodbc.ini

    PATH=$PATH:$HOME/bin:$ORACLE_HOME/bin

      export LD_LIBRARY_PATH=$LIBRARY_PATH:$ORACLE_HOME/lib

     

     

    Source the profile to get the environment variables loaded:

     

    [oracle@node1 ~]$ source .bash_profile

     

    Similarly with the Hive driver, the odbc.ini file copied under /home/oracle is containing two examples for configuring DSN for the 32 and 64 driver version. In our case we are going to use the 64 version. All parameters are explained with comments in the sample file, which are omitted from the listing. For more information about parameters you can consult the driver installation and configuration manual located at http://www.cloudera.com/documentation/other/connectors/impala-odbc/2-5-22.html.

     

    We will change only the host, port, rename the data source to IMPLDSN as follows:

     

    [ImplDSN]

     

    # Description: DSN Description.

    # This key is not necessary and is only to give a description of the data source.

    Description=Cloudera ODBC Driver for Impala (64-bit) DSN

    # Driver: The location where the ODBC driver is installed to.

    Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so

    HOST=192.168.1.88

    PORT=21050

    Database=default

    AuthMech=0

     

     

    Test the connection as follows:

     

    [oracle@node1 ~]$ isql -v ImplDSN

    +---------------------------------------+

    | Connected!                   |

    |                                       |

    | sql-statement                |

    | help [tablename]          |

    | quit                               |

    |                                       |

    +---------------------------------------+

    SQL>

     

     

    List the tables and try a count against one table to ensure that everything is working fine:

     

    SQL> show tables;

    +------------------+

    |tab_name                                                                                                                                                                                                                                                 |+------------------+

    | categories

    || customers

    || departments

    || order_items

    || orders

    || products

    |+-------------------+

    SQLRowCount returns -1

    6 rows fetched

     

    SQL> select count(*) from orders;

    +---------------------+

    | EXPR_1              |

    +---------------------+

    | 68883               |

    +---------------------+

    SQLRowCount returns -1

    1 rows fetched

     

     

     

    Oracle ODBC Transparent Gateway Configuration for Cloudera Impala ODBC Driver

     

    Under directory $ORACLE_HOME/hs/admin we created an initialization parameters file named initIMPLDSN.ora (we used the dg4odbc.ini file as a template) with the following contents :

     

    [oracle@node1 admin]$ vi initHIVEDSN.ora

    # This is a sample agent init file that contains the HS parameters that are

    # needed for the Database Gateway for ODBC

    HS_FDS_CONNECT_INFO = "IMPLDSN"

    HS_FDS_TRACE_LEVEL = 0

    HS_FDS_SHAREABLE_NAME = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so

    HS_TRANSACTION_MODEL=READ_ONLY_AUTOCOMMIT

    #

    # ODBC specific environment variables

    #

    set ODBCINI=/home/oracle/odbc.ini

     

      Important : The name of the init file must be init<HS_FDS_CONNECT_INFO value>.ora otherwise the connection will fail.

     

    Add also libodbcinst.so path in /etc/cloudera.impalaodbc.ini driver configuration file as follows :

     

    [Driver]

    ODBCInstLib=/usr/lib64/libodbcinst.so

    ErrorMessagesPath=/opt/cloudera/impalaodbc/ErrorMessages/

    LogLevel=0

    LogPath=

    SwapFilePath=/tmp

     

     

    Listener configuration

     

    Open the listener.ora file for editing and add the following (marked with bold):

     

    [oracle@node1 admin]$ vi /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora

    # listener.ora Network Configuration File: /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora

    # Generated by Oracle configuration tools.

     

    SID_LIST_LISTENER =

      (SID_LIST =

        (SID_DESC =

          (PROGRAM = dg4odbc)

          (ARGS = ENVS=LD_LIBRARY_PATH=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so)

          (SID_NAME = IMPLDSN)

          (ORACLE_HOME = /u01/app/oracle/product/12.1.0/dbhome_1)

        )

      )

     

    LISTENER =

      (DESCRIPTION_LIST =

        (DESCRIPTION =

          (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))

        )

        (DESCRIPTION =

          (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))

        )

      )

     

    ADR_BASE_LISTENER = /u01/app/oracle

     

     

    Add a network service in tnsnames.ora as follows:

     

    IMPLDSN =

      (DESCRIPTION=

    (ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521))

          (CONNECT_DATA=(SID=IMPLDSN))

          (HS=OK)

        )

     

    Reload the listener as follows:

    [oracle@node1 admin]$ lsnrctl reload

     

    LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:05:57

    Copyright (c) 1991, 2014, Oracle.  All rights reserved.

     

    Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))

    The command completed successfully

    [oracle@node1 admin]$

     

     

    Check the services availability:

     

    [oracle@node1 admin]$ lsnrctl status

     

    LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:06:32

    Copyright (c) 1991, 2014, Oracle.  All rights reserved.

     

    Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))

    STATUS of the LISTENER

    ------------------------------------------------------------------------------------------------------------------------

    Alias                            LISTENER

    Version                        TNSLSNR for Linux: Version 12.1.0.2.0 - Production

    Start Date                    19-JUN-2016 08:04:43

    Uptime                        1 days 0 hr. 1 min. 49 sec

    Trace Level                 off

    Security                       ON: Local OS Authentication

    SNMP                          OFF

    Listener Parameter File /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora

    Listener Log File             /u01/app/oracle/diag/tnslsnr/node1/listener/alert/log.xml

    Listening Endpoints Summary...

    (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521)))

    (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))

    (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=node1)(PORT=5500))

    (Security=(my_wallet_directory=/u01/app/oracle/product/12.1.0/dbhome_1/admin/ORCL/xdb_wallet))

    (Presentation=HTTP)(Session=RAW))

    Services Summary...

    Service "IMPLDSN" has 1 instance(s).

      Instance "IMPLDSN", status UNKNOWN, has 1 handler(s) for this service...

    Service "ORCL" has 1 instance(s).

      Instance "ORCL", status READY, has 1 handler(s) for this service...

    Service "ORCLXDB" has 1 instance(s).

      Instance "ORCL", status READY, has 1 handler(s) for this service...

    The command completed successfully

     

     

    Ping the network service IMPLDSN to check its availability as follows:

     

    [oracle@node1 admin]$ tnsping IMPLDSN

    TNS Ping Utility for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:07:56

    Copyright (c) 1997, 2014, Oracle.  All rights reserved.

     

    Used parameter files:

    /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/sqlnet.ora

     

    Used TNSNAMES adapter to resolve the alias

    Attempting to contact (DESCRIPTION=

    (ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521))

    (CONNECT_DATA=(SID=IMPLDSN)) (HS=OK))

    OK (10 msec)

     

     

     

    Create a public database link to connect from oracle as follows:

     

    [oracle@node1 admin]$ sqlplus / as sysdba

     

    SQL*Plus: Release 12.1.0.2.0 Production on Mon Jun 20 08:09:35 2016

    Copyright (c) 1982, 2014, Oracle.  All rights reserved.

     

    Connected to:

    Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

    With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

     

    SQL> create public database link impldsn connect to cloudera identified by cloudera using 'IMPLDSN';

      Database link created.

     

     

    Test the database link as follows:

     

    SQL> select * from customers@impldsn

    -----------------------------------------------------------------------------------

    1530 David     Smith               XXXXXXXXX         XXXXXXXXX      

     

      Connection and data retrieval from Cloudera Impala seems fine.

     

     

    Observations

     

    Whenever possible, use Impala over Hive because of the several limitations in terms of speed and syntax. It is also possible to have several syntax differences when you query from Hive or Impala directly from Oracle. If some complex query should be executed that it is better to create a view locally in Hive or Impala and try to select that directly. Another benefit is that this will leverage query optimizations on the Impala or Hive side.

     

    One example, if you try to run the following query, you will get a warning that the count inside this query is not a supported feature :

    SQL> l

      1  select c."category_name", count("order_item_quantity") as count

      2  from order_items@impaladsn oi

      3  inner join products@impaladsn p on oi."order_item_product_id" = p."product_id"

      4  inner join categories@impaladsn c on c."category_id" = p."product_category_id"

      5  group by c."category_name"

      6* order by count("order_item_quantity") desc

    SQL> /

    select c."category_name", count("order_item_quantity") as count

     

     

    ERROR at line 1:

    ORA-02070: database IMPALADSN does not support in this context

     

     

    To solve this issue, create a view in Impala :

     

    [cloudera@quickstart Desktop]$ impala-shell

    Starting Impala Shell without Kerberos authentication

    Connected to quickstart.cloudera:21000

    Server version: impalad version 2.5.0-cdh5.7.0 RELEASE (build ad3f5adabedf56fe6bd9eea39147c067cc552703)

    ***********************************************************************************

    Welcome to the Impala shell. Copyright (c) 2015 Cloudera, Inc. All rights reserved.

    (Impala Shell v2.5.0-cdh5.7.0 (ad3f5ad) built on Wed Mar 23 11:33:33 PDT 2016)

    After running a query, type SUMMARY to see a summary of where time was spent.

    ***********************************************************************************

    [quickstart.cloudera:21000] >

    create view summaryvw as -- Most popular product categories

                                   > select c.category_name, count(order_item_quantity) as count

                                   > from order_items oi

                                   > inner join products p on oi.order_item_product_id = p.product_id

                                   > inner join categories c on c.category_id = p.product_category_id

                                   > group by c.category_name

                                   > order by count desc

                                   > limit 10;

    Query: create view summaryvw as -- Most popular product categories

    select c.category_name, count(order_item_quantity) as count

    from order_items oi

    inner join products p on oi.order_item_product_id = p.product_id

    inner join categories c on c.category_id = p.product_category_id

    group by c.category_name

    order by count desc

    limit 10

     

     

    Reissue the query this time against the sumaryvw view created in Impala:

     

      1* select * from summaryvw@impaladsn

     

    SQL> /

     

    category_name count

    -------------------------------------------------------------------------------- ----------

    Cleats 24551

    Men's Footwear 22246

    Women's Apparel                                                                                 21035

    Indoor/Outdoor Games 19298

    Fishing                                                                                                 17325

    Water Sports 15540

    Camping & Hiking 13729

    Cardio Equipment 12487

    Shop By Sport 10984

    Electronics 3156

     

    10 rows selected.

     

    Summary

    In this article we demonstrated how to use Cloudera Hive and Impala ODBC drivers to connect from Oracle using ODBC heterogeneous gateway. This approach could be very helpful when we want to retrieve small amount of data using relative simple queries and need to setup the connection very quickly. Also the method can be used for archive or backup of sensitive data from Hive or Impala.

     

    About the Authors

    Y V Ravi Kumar is an Oracle ACE and Oracle Certified Master (OCM) with 17 years of experience in BFSI vertical. He is also OCP in Oracle 8i, 9i, 10g, 11g &12c and Certified in Golden Gate, RAC, Performance Tuning & Oracle Exadata. He continuously motivates many DBAs and helps the Oracle Community by publishing his tips/ideas/suggestions/solutions in his blog. He has written 40+ OTN articles on Oracle Exadata, Oracle RAC and Oracle GoldenGate for OTN for Spanish, Portuguese and English and 17 articles for TOAD World, 2 Articles for UKOUG, 3 Articles for OTech Magazine and 2 Articles for Redgate. He is a frequent Oracle speaker in @OTN, AIOUG, Sangam and IOUG. Learn more from his profile at LaserSoft

     

    Adrian Neagu has over 15 years of experience as a database administrator, having expertise in various RDBMS systems. He has experience in many areas such as financial industry, the pharmaceutical industry, telecom and aviation. He is an Oracle Certified Master 10g and 11g, Oracle Certified Professional 9i, 10g, and 11g, Cloudera Certified Administrator for Apache Hadoop, IBM DB2 Certified Administrator version 8.1.2 and 9, IBM DB2 9 Advanced Certified Administrator 9, and Sun Certified System Administrator Solaris 10. He is an expert in many areas of database administration, BigData and Operating systems including high-performance tuning, high availability, replication, backup, and recovery.