Forum Stats

  • 3,827,872 Users
  • 2,260,836 Discussions
  • 7,897,401 Comments

Discussions

Building a Dimension from two separate tables

Hello there,

I need some help trying to build a dimension / hierarchy in the Business Model layer using Administrator . The situation is the following:

I have two different logical tables DATE and TIME. I created a DATE Dimension from the DATE Table which goes like this:

DATE DIM: Year --> Quarter --> Month --> Day

and a TIME Dimension from the TIME Table :

TIME DIM: Hours --> Minutes

Now, I'm having problems trying to build a unique dimension that will allow me to drill down from Days (lowest level in DATE Dim) to Hours (Highest level in TIME Dim).

Is there a way to build a unique dimension having these to separate tables? (Drill down from Days to Hours).

The only solution that I found was to build a view from the DATE Table and TIME Table and then create a dimension on top of that view, but this might bring performance issues to my model.

Thanks for your Help.

Pablo.

Edited by: PabloC2 on Sep 11, 2008 10:29 AM

Answers

  • 14728
    14728 Member Posts: 633
    edited Sep 11, 2008 1:41PM
    Why don't you create a logical table that contains both DATE AND TIME (2 physical table mappings) and create a Dimension on top of this?


    Does your Time Dimension just have Time or does it have DATETIME?
  • PabloC2
    PabloC2 Member Posts: 59
    Thanks Matt for your reply. I'll try your approach building a logical table from 2 physical sources. Now,

    My Time Dimension does not have a DATETIME. I'm following Kimball's warehouse design approach, so my Time dimension has the following fields:

    Sample row: (csv format)

    TIME_DIM_ID,HOUR24,HOUR12,AMPM
    80052, 13, 1, PM
    80053, 14, 2, PM
    80054, 15, 3, PM

    ... and so on.

    Which means that.. there isn't a FK between the Date and Time dimension. They're independent from each other.
  • 14728
    14728 Member Posts: 633
    edited Sep 11, 2008 2:17PM
    What about the Physical Joins? Are they to attributes which are DATETIME? Ie. Order_DateTime, do you have specific TIME attributes?

    Do you have Time ID and Date ID as per Kimball split out into separate attributes on the Fact table?

    ie. For Order you'd have ORDER_DATE_TIME_ID, ORDER_DATE_ID?

    The design decision may depend on the type of Answers you are looking for

    ie. When would you want to use just the Date Dimension?
    When would you want to use just the Time Dimension?
    When would you want to use the full Date/Time Dimension?
  • PabloC2
    PabloC2 Member Posts: 59
    Its a little hard to explain all this thru the forum but I'll try:

    Just to show you part of my warehouse schema:


    DATE DIMENSION ----------------------< FACT TABLE >----------------------------- TIME DIMENSION


    So, for each row in my fact table there is a FK for my DATE DIM and a FK for my TIME DIM. Which means that, the only way that DATE and TIME dimensions are connected is thru the FACT TABLE (all this is created thru ETL process).

    So, Sometimes we might use just dates in our reports or maybe go down to the minute. I just want the user to be able to drill down from Days to Hours and then minutes (Combining both dimensions):

    YEAR ---> QUARTER ---> MONTH ---> DAY ----> HOURS ----> MINUTES

    ******** DATE DIMENSION************** | *** TIME DIMENSION ***


    Hope this answers your question Matt.

    Thanks for your time.
  • 14728
    14728 Member Posts: 633
    Yes I think the issue you will have is to get the full Dimension drill you will effectively need a Cartesian between Date + Time, and specify a unique key which is DATE_ID||TIME_ID. Obviously performance would suffer unless you look at creating some function based indexes with this method(Oracle).

    The alternative would be to have a physical table for both levels, with the Time Dimension having actual DATETIME information in there, it would increase the number of records significantly, but you'd have a natural key, and a canonical time dimension.

    From an Oracle performance point of view I'd be interested to see the optimum way to setup something like you have described (we stop at DATE level), it will take me a little while to produce but I suspect it will point to going down a non-Kimball route and use natural keys to achieve the requirements you have specified.

    I'll let you know.
  • PabloC2
    PabloC2 Member Posts: 59
    Thanks for your help Matt, I'll check the forums later on.
  • 14728
    14728 Member Posts: 633
    edited Sep 11, 2008 4:09PM
    Had a little play, by combining the two separate tables into a single logical table, and setting the key as a two part TIME_ID, DATE_ID column then I can get it to work as you require with a simple example.

    I have included the DB Objects at the bottom if you wish to replicate...



    Here is a picture of the basic model

    <img src="http://farm4.static.flickr.com/3267/2848422877_fa8f4dbdc4_b.jpg" width="1024" height="360" alt="Screen 1" />


    The Time Dimension is defined with Time_ID and Date_ID as the key, these appear in the Dimension but are not drillable.

    <img src="http://farm4.static.flickr.com/3084/2849255506_e37fae4b1b_o.jpg" width="433" height="359" alt="Screen 2" />


    The result is I can select the Date and Fact Values

    <img src="http://farm4.static.flickr.com/3015/2849255580_51d4c0f36f_o.jpg" width="273" height="580" alt="Screen 3" />

    <pre>

    select T54.DATE_KEY as c1,
    sum(T43.PRICE) as c2,
    count(T43.ORDER_ID) as c3,
    sum(T43.QUANTITY) as c4
    from
    DIM_DATES T54 /* Dim - Dates */ ,
    ORDERS T43 /* Fact - Orders */
    where ( T43.ORDER_DATE_ID = T54.DATE_ID )
    group by T54.DATE_KEY
    order by c1

    </pre>


    and then drill into the Time part of the Dimension

    <img src="http://farm4.static.flickr.com/3254/2848423095_6482576453_o.jpg" width="303" height="177" alt="Screen 4" />

    <pre>

    select T54.DATE_KEY as c1,
    T51.TIME_KEY as c2,
    sum(T43.PRICE) as c3,
    count(T43.ORDER_ID) as c4,
    sum(T43.QUANTITY) as c5,
    T51.TIME_ID as c6,
    T54.DATE_ID as c7
    from
    DIM_TIME T51 /* Dim - Time */ ,
    DIM_DATES T54 /* Dim - Dates */ ,
    ORDERS T43 /* Fact - Orders */
    where ( T43.ORDER_DATE_ID = T54.DATE_ID and T43.ORDER_TIME_ID = T51.TIME_ID and T54.DATE_KEY = TIMESTAMP '2003-03-25 00:00:00' )
    group by T51.TIME_ID, T51.TIME_KEY, T54.DATE_ID, T54.DATE_KEY
    order by c1, c2


    </pre>

    The SQL looks good, no Cartesian in sight. From a performance point of view the only issue, and its a known one discussed on here before is caused by T54.DATE_KEY = TIMESTAMP '2003-03-25 00:00:00' which stops the Unique index being used on Dim - Date.

    <pre>
    Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop

    SELECT STATEMENT Optimizer Mode=ALL_ROWS 109 24
    SORT GROUP BY 109 3 K 24
    NESTED LOOPS 109 3 K 23
    HASH JOIN 109 2 K 20
    TABLE ACCESS FULL BEYOND.DIM_DATES 40 440 5
    TABLE ACCESS FULL BEYOND.ORDERS 10 K 156 K 14
    TABLE ACCESS BY INDEX ROWID BEYOND.DIM_TIME 1 9 1
    INDEX UNIQUE SCAN BEYOND.SYS_C005476 1 0

    </pre>



    <pre>

    CREATE TABLE dim_dates ( date_id NUMBER NOT NULL PRIMARY KEY,
    date_key DATE NOT NULL );

    INSERT
    INTO dim_dates
    SELECT ROWNUM,
    TRUNC(ROWNUM - 2000 + SYSDATE )
    FROM dual
    CONNECT BY LEVEL <= 4000;

    CREATE UNIQUE INDEX dd_uk ON dim_dates( date_key );

    CREATE TABLE dim_time( time_id NUMBER NOT NULL PRIMARY KEY,
    time_key VARCHAR2(5) NOT NULL );

    CREATE UNIQUE INDEX dt_uk ON dim_time( time_key );

    INSERT INTO dim_time
    SELECT ROWNUM,
    TO_CHAR(TRUNC(SYSDATE) + ( ROWNUM / 24/60 ),'HH24:MI')
    FROM dual
    CONNECT BY LEVEL <= 60 * 24;

    CREATE TABLE orders( order_id NUMBER NOT NULL PRIMARY KEY,
    order_date DATE NOT NULL,
    order_date_id NUMBER,
    order_time_id NUMBER,
    product_id NUMBER NOT NULL,
    quantity NUMBER,
    price NUMBER );
    </pre>

    <pre>
    INSERT
    INTO orders
    ( order_id,
    order_date,
    order_date_id,
    order_time_id,
    product_id,
    quantity,
    price )
    WITH data AS
    (
    SELECT ROWNUM order_id,
    TRUNC(SYSDATE + dbms_random.value(-1999,2000),'MI') order_date,
    TRUNC( dbms_random.value(1,10)) product_id,
    TRUNC(dbms_random.value(1,200)) quantity
    FROM dual
    CONNECT BY LEVEL <=10000
    )
    SELECT order_id,
    order_date,
    date_id,
    time_id,
    product_id,
    quantity,
    CASE WHEN product_id <=3 THEN 4.99 ELSE 5.99 END
    FROM data,
    dim_dates,
    dim_time
    WHERE 1=1
    AND TRUNC(order_date) = date_key
    AND TO_CHAR(order_date,'HH24:MI') = time_key;

    </pre>
    14728
  • PabloC2
    PabloC2 Member Posts: 59
    Hey Matt, I had a quick solution for this situation, but I'll try yours to see which one gives a better performance.

    I created a view from my DATE Dimension and my TIME Dimension including the PK from each one and joined this view in my model to the fact table using "AND" operator in my join.

    FACT_DATE_ID = DATE_DIM_ID AND FACT_TIME_ID = TIME_DIM_ID

    Which allows to have all the data in one table. Drill down work fine.

    Thanks for your help.
  • PabloC2
    PabloC2 Member Posts: 59
    edited Sep 11, 2008 5:58PM
    Hey Matt, for some reason your approach is not working for me.

    This is the actual message that I get from Answers:

    State: HY000. Code: 10058. [NQODBC] [SQL_STATE: HY000] [nQSError: 10058] A general error has occurred. [nQSError: 14044] Missing join between logical tables: Market Local Period and Market Local Period. [nQSError: 14045] There must be at least one physical join link between the underlying physical tables 3001:11604 T11604 and 3001:11631 T11631. (HY000)

    So, its asking for a join between my two physical tables, in this case Date Dim and Time Dim. I don't know how you got it working.

    I wanted to attach a Screenshot from my model but I can't see the way to do it in this forum.

    Edited by: PabloC2 on Sep 11, 2008 2:50 PM

    Edited by: PabloC2 on Sep 11, 2008 2:58 PM
  • PabloC2
    PabloC2 Member Posts: 59
    Hey Matt,

    After all this I got it working.

    I created two Dimensions ..one for Date and One for Time ( as I have them before). Then when I click on the Days tag and option opens to do a "Preferred Drill Path" and it "jumps" from Date Dimension (Days) to Time Dimension (Hours).
This discussion has been closed.