This content has been marked as final. Show 9 replies
The behaviour you describe is expected. Specifically if you run a 'fast solve', then the server will reload every partition in the cube, whether or not it has changed data. What I am less clear about, since you didn't give enough detail, is whether the aggregation (a.k.a. SOLVE) is fired in each partition.
There are many tricks you can play to reduce the amount of work performed by a build, but the precise approach will depend on the amount of data you want to load each time. If you are loading a relatively small amount of data each time, and especially if it corresponds to only one or two partitions, then you could try the following.
The SERIAL keyword instructs the server to load all values from the mapped table at once (i.e. not by partition). This is a big win if your table contains data for only a few partitions or if it has very few rows. The SOLVE step will be parallelised, but will only fire for partitions changed by the load step. Obviously you should set the parallelism to the level you want. If only one partition will be affected, then you may choose parallelism = 0.
exec dbms_cube.build('my_cube using (load serial, solve)', parallelism=>2, add_dimensions=>false)
The suggested mechanism of LOAD SERIAL,SOLVE has worked perfectly when the partitioning dimension(i.e. date in our case) is the only dimension that has changed after refresh. But if any other dimension has new values(eg. customer,ip etc), the cube seems to rebuild all the partitions (in our case all the date partitions of last two years).
Now my question is
Is this an expected behaviour?
a)If yes then, is there any way to avoid refresh of all the partitions of all the dates.
b)If no then, are we missing something
Our Maintenance Script
LOAD NO SYNCH,
LOAD NO SYNCH,
LOAD NO SYNCH,
LOAD NO SYNCH,
'CCCCS', -- refresh method
false, -- refresh after errors
1, -- parallelism
true, -- atomic refresh
true, -- automatic order
false); -- add dimensions
Edited by: user11183570 on Apr 18, 2011 7:41 AM
Edited by: user11183570 on Apr 18, 2011 7:42 AM
Sadly this is expected behaviour. The OLAP option (up to and including release 188.8.131.52) solves a leaf partition if
If you look in the OUTPUT column of the cube_build_log where command = 'SOLVE' and status = 'COMPLETED' then you will see entries like this
(it is not empty) AND ( (new values have been loaded into the partition) OR (any (non-partitioning) dimension has changed) )
This tells you why a particular partition was processed. In this example it was because new values were loaded into the partition. Below is a case where the data was unchanged, but one of the dimensions has changed.
<SolveStatistics IS_EMPTY="no" CHANGED_VALUES="yes" CHANGED_RELATIONS="no"/>
The next question is what happens to a partition if the dimensions have changed? This will depend in part on the type of hierarchy change. There is a specific list of changes that will cause a partition to be completely reaggregated. From memory I believe they include
<SolveStatistics IS_EMPTY="no" CHANGED_VALUES="no" CHANGED_RELATIONS="yes"/>
(1) Deleting a dimension member
(2) Changing the parent of an existing dimension member.
(3) Adding a new non-leaf member
If all you do is add a new leaf member and assign it to an existing parent, then the aggregation should be incremental. An example would be adding a new customer to a customer dimension within an existing location. I say should here because that is the intent of the code, but bugs are always possible.
As a special case of this, if you add a new leaf member as above, then partitions that don't contain any data for the new member should be processed quickly. (e.g. less than 30 seconds). An example of this is adding a new product to the product dimension. This would cause the partition for last year to be re-processed, but since the new product was not sold last year (by definition), the partition should be processed quickly. Again, if you are not seeing this behaviour, then it could be a bug.
In release 184.108.40.206 you can specify aggregation for only a single partition using DBMS_CUBE.BUILD, but there is no way to do this in 220.127.116.11.
For the record, in 18.104.22.168 you can specify a single partition as in the following example. It assumes that UNITS_CUBE is partitioned by the CALENDAR_QUARTER level in the CALENDAR hierarchy of the TIME dimension.
The syntax is accepted in earlier versions, but it will not work properly.
-- Build the single partition representing calendar_quarter '14'. begin dbms_cube.build(q'! UNITS_CUBE USING ( FOR "TIME" WHERE HIER_ANCESTOR(WITHIN TIME.CALENDAR LEVEL TIME.CALENDAR_QUARTER) = '14' BUILD ( LOAD, SOLVE ) ) !', parallelism=>1); end; /
This may work for this customer's particular situation to load just a single partition. However, we are loading cubes that have cube materialized views with query rewrite enabled. The documentation for dbms_cube.build states: "A cube with a rewrite materialized view cannot have a USING clause, except for the ANALYZE command. It uses the default build options." And sure enough, the build fails for us, using the syntax above.
In our environment, the following is what we observe. We are partitioned on time (by month). Prior partitions never change, so we just add new partitions. When we load the newest month of data, it does its load and solve, but then it goes to every prior partition and does a Clear Leaves and a SQL Load. There is no solve--it completely skips solve. It appears to be doing a complete load of every historical partition.
Granted, the dimensions change as new members are added, so we expect it to briefly touch each partition. But considering we have materialized view logs and such, we are surprised to see a complete reload of the leaves. As the cube grows, this has the potential to become the most time-consuming part of our refresh. We will spend 4 to 6 hours loading the star schema and refreshing the most recent cube partition, and then 15+ hours waiting for it to reload every historical cube partition even though they have not changed.
How do we avoid what seems to be all this unnecessary processing of unchanged partitions during cube maintenance?
PCT may not work for us because the dimension tables are not partitioned (only the fact tables are partitioned), and PCT will not work for a materialized view that is defined with joins to a non-partitioned table. This is unfortunate, because it seems like this eliminates PCT as an option for the most common designs: partitioned fact tables joined to non-partitioned dimension tables.
We can use straight inserts, and then we could just do a fast refresh relying on our materialized view logs. We tried this option, but the load+solve of the newest partition is much slower using "fast" than it is with complete. The only benefit of fast in this situation is that only the latest partition needs to be loaded (unlike a complete). A fast refresh on one partition is still faster than a complete refresh of all partitions (but for a single partition, complete is faster).
Here's what we ended up doing. We partitioned our dimension tables so we could use PCT. That's right, every one of our dimension tables is now partitioned on its primary key, but there is only one partition per dimension table. (We range partitioned on ID and created one partition for all values less than maxvalue.) Our refresh process looks like this now:
Load all the dimensions tables
Refresh all dimensions
Do a "consider fresh" on the dimension MVs
Load fact tables for the new partition
Refresh cubes with PCT
We have no materialized view logs. PCT is faster than a fast refresh in our situation, plus we have the added benefit of no MV log overhead. Other implementations may benefit from fast refreshes and the use of MV logs, but for this project we have no small incremental updates.
It is too bad we had to create the bogus dimension partitions, however. Aside from that, this solution seems to work for us.
Suggestion to Oracle: allow PCT on non-partitioned tables (treat a non-partitioned table as a table with only one partition).
When we use either PCT or Fast refreshes, we see that our newest partition loads and solves much slower than it did during complete refreshes. Why is that?
The solve times range from 1 hour per partition during a complete refresh to 7 hours per partition during a PCT refresh, using the exact same data sets. We only load one partition at a time either way. It is not due to deletes either, as this happens with new partitions as well.