Import of tables and views from Hive was introduced in Oracle SQL Developer Data Modeler 4.0.2 and probably some of you have seen that description https://blogs.oracle.com/datawarehousing/entry/oracle_sql_developer_data_modeler

I'm going to describe some detail on the process

1. The goal - there is no dedicated physical model for Hive so the goal was to extract metadata about tables defined in Hive and prepare them for use with  Oracle Big Data SQL | Oracle Database | Oracle

 

2. Setting the connection

If you use SQL Developer 4.0.3 then steps described here https://blogs.oracle.com/datawarehousing/entry/oracle_sql_developer_data_modeler will work for you, setting the connection in standalone version of Data Modeler looks different:

Connection_to_Hive.png

 

If you use a JDBC driver from another vendor you need to check driver class used (could be org.apache.hive.jdbc.HiveDriver ).

 

3. Mapping of Hive primitive types

There is no physical model for Hive thus Hive native primitive types cannot fit into rules around logical types to native types mapping:

-    Import – direct name match between hive primitive type and existing Logical data type is required. New “String “ logical data types is added in order to cover “String” primitive Hive data type and it’s mapped for all supported databases to logical type Varchar with default size set to “max” – max means it’ll take max possible value for each database version supported

-    DDL generation and complex type presentation in compare/merge – the name of logical data type is taken if it matches Hive primitive type otherwise data type is transformed to Oracle 12 c data type and then transformed to Hive type:

•    CHAR, NCHAR – to CHAR

•    VARCHAR2, NVARCHAR2 – to Varchar

•    LONGRAW, BLOB, RAW – to Binary

•    FLOAT, BINARY_FLOAT – to Float

•    BINARY_DOUBLE – to Double

•    Date – to Date

•    Timestamp – to Timestamp

•    LONG, CLOB, NCLOB and everything else – to STRING

 

4. Mapping of Hive complex types

Structured types and collection types in Data Types model are used to represent Object relational definitions in Oracle database. The same constructs are used to represent Hive complex type:

-    ARRAY – mapped to collection type

-    STRUCT – mapped to Structured type

-    MAP and UNIONTYPE – mapped to Structured type – one more property is added to structured type – “Hive Type” with possible values (empty – Hive Struct, MAP – Hive MAP, UNIONTYPE – Hive Uniontype

Examples:

CREATE TABLE emp1 (

  name         STRING,

  salary       FLOAT,

  subordinates ARRAY<STRING>,

  deductions   MAP<STRING, FLOAT>,

  address      STRUCT<street:STRUCT<name:STRING, number:INT>, city:STRING, state:STRING, zip:INT>

)

 

CREATE TABLE emp_addresses (

  name         STRING,

  salary       FLOAT,

  subordinates ARRAY<STRING>,

  deductions   MAP<STRING, FLOAT>,

  addresses      ARRAY<STRUCT<street:STRUCT<name:Varchar(30), number:INT>, city:STRING, state:varchar(2), zip:INT>>

)

 

create table union_test(

foo UNION_TYPE< int , double , aray <string> , struct< a: int , b : string>>);

 

Following types will be created in data types model after importing of these definitions from Hive:

Data_Types_Model.png

 

So Hive complex type - STRUCT<street:STRUCT<name:Varchar(30), number:INT>, city:STRING, state:varchar(2), zip:INT>

Will be presented in Data Types model :

Struct_address2.png

And column with same complex type definition will be based on the same structured type – equivalent definitions are recognized and reused

 

5. Imported objects

5.1 Tables - columns with data types and defined comments for table and columns are imported; also details about some Hive specific constructs are imported and stored in dynamic properties for table and columns – partitioned columns, clustering and sorting columns, table type (external or managed – in Hive terms), location …; Those dynamic properties can be included into reports or visualized on relational diagram using settings for existing dynamic properties. In DM 4.1 they can be promoted to User defined Properties for better visualization (color) and manipulation (type, list of values,default value and group)

5.2 Views- definitions are imported as they are defined in Hive – no translation is applied

 

6.Hive tables as external tables in Oracle 12c physical model

New functionality for presenting Oracle external tables in relational model is used to represent Hive tables in relational model (with set classification type as “External”) and in Oracle physical model (with organization set to “External”).

Size for column with primitive types can be changed using already existing in DM functionality for setting size, precision and scale.

By default complex types represented by structured types and collection types are represented in generated DDL (for Oracle external tables) with size 4000 (columns subordinates, deductions and address in the example below):

CREATE TABLE emp1_addresses

    (

     name         VARCHAR2 (4000) ,

     salary       FLOAT ,

     subordinates VARCHAR2(4000) ,

     deductions   VARCHAR2(4000) ,

     address      VARCHAR2(4000)

    )

    ORGANIZATION EXTERNAL

    (

    TYPE ORACLE_HIVE

    DEFAULT DIRECTORY Dir1

    ACCESS PARAMETERS

        (

com.oracle.bigdata.tablename:emp1_addresses

        )

    )

;

 

That size for complex types can be changed at structured type/collection type or at column level in physical model for Oracle.

New property is introduced for structured and collection types – “Max Size as String”

Complex_type_size.png

Complex type can be used in definitions of several  tables and if the size is different then the size can be redefined at column level in physical model:

Size_in_physical_model.png

So we can tune generated DDL to:

CREATE TABLE emp1_addresses

    (

     name         VARCHAR2 (50) ,

     salary       FLOAT ,

     subordinates VARCHAR2(160) ,

     deductions   VARCHAR2(70) ,

     address      VARCHAR2(70)

    )

    ORGANIZATION EXTERNAL

    (

    TYPE ORACLE_HIVE

    DEFAULT DIRECTORY Dir1

    ACCESS PARAMETERS

        (

com.oracle.bigdata.tablename:emp1_addresses

        )

    )

;

 

7. Expanded visualization of complex type

New option is added to relational model diagrams that allow seeing details of structured types and collection of structured types:

expand_complex_types.png

After "resize to visible" details of complex types can be seen on diagram:

expanded_types.png

UnionType (Hive type) has a specific presentation – its meaning in Hive is that column can accommodate value of one of declared types. In used structured type those possible types are represented by attributes (>attr_1, >attr_2..) each one with related type – column foo in table default.union_test3.

Map (Hive type) is represented by structured type with 2 attributes “key” and “value” each one with type related to Hive definition – column deductions in table emp2_changed_name.

Structured type has one more property – “Hive Type” and it can be set to Map or UNIONTYPE

 

8. Visibility of existing dynamic properties

Visibility can be set for dynamic properties already created on tables and entities. New page is added for that in design level settings (settings in design properties dialog

Visibility_of_props.png

And option to show them in View>details menu on diagram ("Dynamic properties" need to be chcked - the same is used for User Defined Properties in DM 4.1):

View_details.png

9. Hive DDl preview

Limited Hive DDL can be generated If table has dynamic property hiveDDL with value true - then it’s possible to have DDL preview for Hive.

(Data Modeler 4.1 provides possibility for user defined DDL using scripting that can replace DDL generated by built-in DDL generator - so it's possible customers to generate complete Hive DDL based on Dynamic/user defined properties set for tables and columns)

Hive_DDL.png

That also works on tables not imported from Hive – as the example above from OE schema

Hive_DDL_Preview.png

DDL statement:

 

CREATE TABLE OE.CUSTOMERS

    (

        CUSTOMER_ID       DECIMAL(6) ,

        CUST_FIRST_NAME   VARCHAR(20) ,

        CUST_LAST_NAME    VARCHAR(20) ,

        CUST_ADDRESS      STRUCT<STREET_ADDRESS:VARCHAR(40),POSTAL_CODE:VARCHAR(10),CITY:VARCHAR(30),STATE_PROVINCE:VARCHAR(10),COUNTRY_ID:CHAR(2)> ,

        PHONE_NUMBERS     ARRAY<VARCHAR(25)> ,

        NLS_LANGUAGE      VARCHAR(3) ,

        NLS_TERRITORY     VARCHAR(30) ,

        CREDIT_LIMIT      DECIMAL(9,2) ,

        CUST_EMAIL        VARCHAR(30) ,

        ACCOUNT_MGR_ID    DECIMAL(6) ,

        CUST_GEO_LOCATION STRUCT<SDO_GTYPE:DECIMAL,SDO_SRID:DECIMAL,SDO_POINT:STRUCT<X:DECIMAL,Y:DECIMAL,Z:DECIMAL>,SDO_ELEM_INFO:ARRAY<DECIMAL>,SDO_ORDINATES:ARRAY<DECIMAL>> ,

        DATE_OF_BIRTH     Date ,

        MARITAL_STATUS    VARCHAR(20) ,

        GENDER            VARCHAR(1) ,

        INCOME_LEVEL      VARCHAR(20)

    )

;

 

Philip Stoyanov