This content has been marked as final. Show 16 replies
Cube loading is a two step process:
(1). Loading relational data into the cube (i.e., stored measures)
(2). Aggregating that data in the cube
You have to find out which step is taking more time now.
Assuming you are using 11.2 version of db, take a look at the CUBE_BUILD_LOG table and compare the records for an earlier BUILD_ID vs. a recent one.
CUBE_BUILD_LOG contains very useful information.
One more thing.
I noticed that you are doing AVG aggregation.
Can you explain why is it being done, and for which dimension?
The reason I am asking this is because SUM aggregation is quicker and in one case I was able to achieve Average aggregation of financial ADB (Average-Daily-Balance) data over TIME dimension hierarchy by only using SUM aggregation for TIME.
That is why I want to know if that technique can help you or not.
ADB data in the source fact table was at MONTH level. It was mapped to the ADB cube in olap.
FACT1 column mapped to MEAS1
FACT2 column mapped to MEAS2
Required aggregation over time dimension was AVERAGE.
Quarter = ((Period1 x its time_span) + (Period2 x its time_span) + (Period3 x its time_span)) / (Quarter time_span)
Q1 = ( (Jan ADB data) x 31 + (Feb ADB data) x 28 + (Mar ADB data) x 31) / 90
To achieve AVG time aggregation, the fact columns in the ADB cube source view were multiplied with timespan and then mapped to olap stored measures. I had to join the FACT table with TIME dimension source sql table, as timespan column was in the TIME table and not in fact table.
FACT1 x time_span mapped to MEAS1
FACT2 x time_span mapped to MEAS2 etc.
TIME aggregation for the cube was then set to SUM in olap.
Now here is the trick.+
A calculated measure was created corresponding to each stored measure.
CALC_MEAS1 = CUBE.MEAS1 / "TIME".TIME_SPAN
CALC_MEAS2 = CUBE.MEAS2 / "TIME".TIME_SPAN
This will show correct data at MONTH, QUARTER and YEAR levels.
So it was simple and fast.
Forgot to mention that YTD calculation for the above ADB data required the sum of timespan.
OLAP Expression Syntax is shown below.
SUM(CUBE.MEAS1) OVER (HIERARCHY "TIME".TIMEHIER BETWEEN UNBOUNDED PRECEDING AND CURRENT MEMBER WITHIN ANCESTOR AT DIMENSION LEVEL "TIME".FISCAL_YEAR) / SUM("TIME".TIME_SPAN) OVER HIERARCHY ("TIME".TIMEHIER BETWEEN UNBOUNDED PRECEDING AND CURRENT MEMBER WITHIN ANCESTOR AT DIMENSION LEVEL "TIME".FISCAL_YEAR)
SUM(CUBE.MEAS2) OVER (HIERARCHY "TIME".TIMEHIER BETWEEN UNBOUNDED PRECEDING AND CURRENT MEMBER WITHIN ANCESTOR AT DIMENSION LEVEL "TIME".FISCAL_YEAR) / SUM("TIME".TIME_SPAN) OVER HIERARCHY ("TIME".TIMEHIER BETWEEN UNBOUNDED PRECEDING AND CURRENT MEMBER WITHIN ANCESTOR AT DIMENSION LEVEL "TIME".FISCAL_YEAR)
Interesting! Nice to know abt this!
In our case, the scenario is bit different as we have got the measure to be averaged across all the dimensions.
Another thing to be checked with you is, how the log entries get into CUBE_BUILD_LOG table. What we see when we refresh the cube through AWM or through the script directly in a database session it logs, however when we refresh by invoking a shell script (which includes the same cube build script) we dont see the log entries. Is there anything needs to be turned on to get all the log entries ..any thoughts??
Records are logged to CUBE_BUILD_LOG whether you build from DBMS_CUBE.BUILD or from AWM. (The situation for CUBE_REJECTED_RECORDS is different.)
Perhaps you are building from another schema? Records are sent to the CUBE_BUILD_LOG table in the invoker's schema, not the AW's schema. As an example, if you have an AW in the SCOTT schema and you execute the DBMS_CUBE.BUILD call logged in as SCOTT, then all log records will be inserted into SCOTT.CUBE_BUILD_LOG. But if you execute DBMS_CUBE.BUILD logged in as some master schema, OLAPDBA say, then the records would be inserted into OLAPDBA.CUBE_BUILD_LOG. In either case if the table doesn't exist, then the build will continue without error (and without logging).
On the general build slowdown, have you tried clearing the cube and rebuilding it from scratch? If you have been running incremental loads for a while, then cube build performance can slow down. Think about disk-fragmentation and you get the basic idea.
Do you have a test environment where you can run a complete build on the same data to see how long it takes?
There are no hard and fast rules as to when to run a "complete build" (i.e. clear then load all records and aggregate from scratch) instead of an "incremental build". I certainly can't say something like "run a complete build once a month" since it depends on too many factors -- how often you update, initial cube size, how many records are added each build, number of dimensions, structure of hierarchies, types of modification, hardware, disk performance, etc. But as a common sense rule I would say you should run a complete build at the point where the incremental build becomes more expensive than a complete build would be. This requires trial and error.
Take a look at DBMS_CUBE.BUILD documentation at http://download.oracle.com/docs/cd/E11882_01/appdev.112/e16760/d_cube.htm#ARPLS218 and DBMS_CUBE_LOG documentation at http://download.oracle.com/docs/cd/E11882_01/appdev.112/e16760/d_cube_log.htm#ARPLS72789
You can also search this forum for more questions/examples about DBMS_CUBE.BUILD
David Greenfield has covered many Cube loading topics in the past on this forum.
Mapping to Relational tables
Re: Enabling materialized view for fast refresh method
DBMS CUBE BUILD
CUBE_DFLT_PARTITION_LEVEL in 11g?
Reclaiming space in OLAP 22.214.171.124
Re: During a cube build how do I use an IN list for dimension hierarchy?