This content has been marked as final. Show 8 replies
What is not-so-smart about creating two dimensions: BOOKING_TIME and SHOW_TIME
and assigning both to the cube. Its a small effort to create and populate these two dimensions.
If the data has to be analyzed by BOOKING_TIME and by SHOW_TIME, then you will need two separate time dimensions.
Are you concerned about size of cube OR aggregation times OR something else?
Thanks Nasar for your quick response!
My concern comes from:
1) Actually we have 4 time dimensions in this one cube, booking time, show time, booking biz time, show biz time. The difference between biz time (or biz date) and time is the date of biz time starts from 7am. With this approach I will need to create 4 time dimensions. Yes I can create one and then export it, modify the XML and then import to create the other 3, but it doesn't seem efficient to me to create and maintain 4 dimensions, when only difference between them is the name.
2) I do have concerns about the time required to maintain the dimensions, because of I noticed it took Oracle significant longer time to maintain a time dimension the very first time. I have only less than 10 years date which is about less than 3650 records in the underlying time dimension table so I'm not too much concern about the size.
3) In the underlying star schema, we have only one time dimension table, in the ticket booking fact table, we have 4 dimension columns referencing the same time dimension table. So it's intuitive to expect similar design in the cube. Current approach seems like replicating the dimension table for each dimension column.
Edited by: user11955172 on 2012-12-10 下午3:07
Edited by: user11955172 on 2012-12-10 下午3:17
It seems you are thinking from a relational mindset where SELECT...GROUP BY... happens at query time. One of the reasons for using any OLAP engine is to "pre-aggregate" the data so that user query response times are sub-seconds. Think of a cell in a cube which has to be pre-populated and it is dimensioned by all the dimensions, which exist physically (not virtually).
OLAP cubes are created for reporting. So if your ticket booking fact table has four dimension columns referencing the same time dimension table, the question will be "How many of those time dimensions are needed for reporting/analysis".
If only two are needed (BOOKING_TIME and SHOW_TIME) then you will have to create two time dimensions in OLAP and assign to the cube. If three are needed for reporting/analysis then you will create three time dimensions in OLAP ... and so on. Not every dimensions from fact table has to exist in olap, if end users do not need it for reporting.
I am getting the impression that you are concerned about refresh of time dimension in AW. It is common for an olap dimension to have tens of thousands of members. Generally we create source sql views which are "mapped" to olap dimensions or cubes in AWM. One best practice is to add an extra step in the olap loading process where you first create tables from those source views and then "map" those tables (instead of sql-views) to dimensions and cubes. Use parallelism or any other relational techniques to create/populate those "source" tables. Sometimes, loading records into a cube or into a dimension are faster if the source is a table instead of a sql-view.
Later on, if your cube aggregation times are more than expected time (or SLA) then you can ask question about "dynamic" aggregation over certain hierarchies or dimensions - which have been discussed in the past on this forum.
Edited by: Nasar on Dec 11, 2012 8:03 AM
Edited by: Nasar on Dec 11, 2012 8:04 AM
Many thanks Nasar.
I understand cube contains aggregated data. From cube perspective, whether a dimension can be referenced multiple times or not will have no impact to the data eventually exist in the cube, but I think it does have impact to the dimension side. For my case all four time dimensions are required for analysis (otherwise we won't even bother to have them in the fact table). Now we need to maintain 4 time dimensions while the only difference is the name, and these 4 time dimensions will become 12 views in the database (there are 2 hierarchies in the dimension, Oracle creates one dimension view and 2 hierarchy view for each time dimension, so we get 4 x 3 = 12 views). I can understand these redundance have no impact to query performance, just feel it's not so clean.
Even though it turns out there's no 'smarter' approach as I originally expected, you've helped us confirmed the approach we are taking is correct. Very much appreciated!
It makes sense that OLAP generates one sql view per hierarchy. If you use any reporting tool like OBIEE, BO, Cognos etc. then to populate the repository of these tools, you do need a sql view per hierarchy, which is where these automatically generated CUBE_TABLE views help.
Maybe you are not using any of these reporting tools.
You can create one custom sql view on top of your 8 hierarchy views, and use that for your queries:
(1). If all the 8 hierarchy names are unique for the 4 time dimensions, then Using UNION ALL you can create a sql view where you can combine all the automatically generated 8 hierarchy views into one view and add an extra column to identify hierarchy name.
(2). If the hierarchy names are not unique across 4 time dimensions (and you are reusing the same two hierarchy names), then you can create a similar type of sql view and add two extra columns - one to identify dimension name and one for hierarchy name.
I'm totally fine with one view per hierarchy. I'm just not very comfortable with duplicated views. Suppose the dimension view and hierarchy views of first time dimension are named as v_time1_d, v_time1_h1, v_time1_h2, now I have:
v_time1_d = v_time2_d = v_time3_d = v_time4_d
v_time1_h1 = v_time2_h1 = v_time3_h1 = v_time4_h1
v_time1_h2 = v_time2_h2 = v_time3_h2 = v_time4_h2
The equals above means the structure and content of the views are identical.
Because of structure and content are identitcal, I can use any one set of them when querying from the cube, there's no need to create a view on top of them with UNION ALL.
BTW the BI tool we are using is Tableau.
For hand written/custom sql going against the Cube:
You're right that you can use one of the 3 time dimension based views (duplicated 4 times for time1/2/3/4 dimensions).
If you need to use 2 out of 4 time dimensions in your query (by applying an application based filter/restriction) then you would need to alias the relevant view(s) 2 times and get your result.
If you needed to use all 4 time attribute dimensions, then you need to alias the time dimension view 4 times.
12 views represent the full spectrum of possible report/query usage scenarios available to users via these 4 time dimension views. A typical session with various reports/queries is unlikely to refer to all of them (possible but not likely).
In different scenarios, any one of the 3 views - dimension view, h1, h2 view - comes into play for each dimension.
In every query/report scenario, only 1 out of the 3 applicable views would be needed or suitable based on the query/report specifics.
* If your query is at lowest level and is also referencing attributes/descriptions which pertain to >1 hierarchies, then the view v_time<N>d is the appropriate view to use. Neither vtime<N>h1 or vtime<N>_h2 will be able to give you the results you require if you have attributes defined exclusively in either hierarchy h1 or h2.
* Setting the default hierarchy to either h1 or h2 determines which hierarchy view is expected to be used/preferred in cases where dimension is not referenced or where Top level of dimension is desired in query/report. E.g: Top level is same for both hierarchies h1 and h2 but read the Top Level values from h1 view as h1 is marked as the default hierarchy for dimension.
* The level of the query is generally determined by the lowest level of the attributes/fields in your query. If you have attributes like Long_Desc, Short_Desc (available at all levels/all hiers) and you have 2 attributes from Hierarchy H1 and from levels L1 (one level above leaf) and L3 (higher than L1 by 2 levels)... then the query uses v_time<N>h1 and will apply a filter to choose/constraint levelname field = 'L1'.
* Reporting tools like OBIEE etc. have security filters defined to ensure that the joins to missing dimensions and the appropriate dimension level filters appended to the query in a transparent "behind the scenes" manner.
NOTE: By modeling the time based fields as different dimensions, you get the benefit of time series analysis along any of the 4 dimensions. You can create a Month-to-Date Revenue measure along "Show Biz Time" dimension and also create a Year-to-Date Revenue Measure based on "Booking Time" dimension. Also compare % variance of "MTD - Revenue - Show Biz Time" with corresponding value for Last year "MTD LY - Revenue - Show Biz Time". This is the benefit you get by breaking out the 4 columns in relational fact and modeling them as explicit time dimensions. You pay a small price in terms of cube space expansion, time dimension load increase but as acknowledged earlier, this dimension can be loaded incrementally, the full load process is done only initially. It is an offline process (batch) and does not hamper query performance.
3) In the underlying star schema, we have only one time dimension table, in the ticket booking fact table, we have 4 dimension columns referencing the same time dimension table. So it's intuitive to expect similar design in the cube. Current approach seems like replicating the dimension table for each dimension column.Do you have parentage along h1/h2 laid out in terms of different parent columns ... 06:30am should have parent D1 in h1 hierarchy (regular day) and parent=D1-1 in h2 hierarchy (business day)?
h1=regular day hierarchy
h2=biz day hierarchy
In that case, perhaps you want "Booking Time" (Time1) to use view: v_time1_h1 by default and "Booking Biz Time" (Time3) to use v_time3_h2 by default.
Using h2 hierarchy for Time1 or h1 hierarchy for Time3 would be illogical/invalid. Maybe you can simplify things by removing the hierarchies from olap model which dont apply to the dimension.
Thanks Shankar, your post answered my questions regarding when to use which hierarchy view, appreciated.
FYI, my h1 and h2 are both applicable for regular day and business day, the difference is h1 contains week (all, year, week, day) and h2 contains quarter and month (all, year, quarter, month, day).