This discussion is archived
9 Replies Latest reply: May 11, 2011 1:31 PM by AdamMartin

Partition rebuild of cubes

user11183570 Newbie
I would appreciate some help with the following.

I made a cube partitioned on a particular level of a particular dimension (say, the day level of the date dimension). While maintaining the cube with the dbms_cube.build procedure, I have observed that every partition of the cube for which there are records in the dimension gets rebuilt/maintained, including partitions for which no updates or inserts have occurred in my fact table (on which the cube is built). Is this behaviour normal? The maintain time is unnecessarily high: even a single new record in my fact table triggers maintenance across the full two years of retention in the date dimension.

The cube is actually built on an incremental view of the fact table based on update time. I am using the complete refresh mechanism for the dimensions and the fast solve mechanism for the cube. The AWM version is 11.1.0.7A and the database is Oracle 11.1.0.7.

Any response will be highly appreciated.
Thanks in advance,
Mainak
  • 1. Re: Partition rebuild of cubes
    DavidGreenfield Expert
    The behaviour you describe is expected. Specifically, if you run a 'fast solve', the server will reload every partition in the cube, whether or not its data has changed. What I am less clear about, since you didn't give enough detail, is whether the aggregation (a.k.a. SOLVE) is fired in each partition.

    There are many tricks you can play to reduce the amount of work performed by a build, but the precise approach will depend on the amount of data you want to load each time. If you are loading a relatively small amount of data each time, and especially if it corresponds to only one or two partitions, then you could try the following.
    exec dbms_cube.build('my_cube using (load serial, solve)', parallelism=>2, add_dimensions=>false)
    The SERIAL keyword instructs the server to load all values from the mapped table at once (i.e. not by partition). This is a big win if your table contains data for only a few partitions or if it has very few rows. The SOLVE step will be parallelised, but will only fire for partitions changed by the load step. Obviously you should set the parallelism to the level you want. If only one partition will be affected, then you may choose parallelism = 0.
  • 2. Re: Partition rebuild of cubes
    user11183570 Newbie
    Hi,

    The suggested mechanism of LOAD SERIAL, SOLVE has worked perfectly when the partitioning dimension (i.e. date in our case) is the only dimension that has changed after a refresh. But if any other dimension has new values (e.g. customer, ip, etc.), the cube seems to rebuild all the partitions (in our case, all the date partitions for the last two years).

    Now my question is:

    Is this expected behaviour?
    a) If yes, is there any way to avoid refreshing the partitions for all the dates?
    b) If no, are we missing something?

    Our Maintenance Script

    BEGIN
      DBMS_CUBE.BUILD(
      '
      DATE_DIM_OLAP USING
      (
        LOAD NO SYNCH,
        COMPILE
      ),
      ISP_DIM_OLAP USING
      (
        LOAD NO SYNCH,
        COMPILE
      ),
      PRODUCT_DIM_OLAP USING
      (
        LOAD NO SYNCH,
        COMPILE
      ),
      URL_DIM_OLAP USING
      (
        LOAD NO SYNCH,
        COMPILE
      ),
      URL_CUBE_OLAP USING
      (
        LOAD SERIAL,
        SOLVE
      )',
      'CCCCS', -- refresh method
      false,   -- refresh after errors
      1,       -- parallelism
      true,    -- atomic refresh
      true,    -- automatic order
      false);  -- add dimensions
    END;
    /

    Many Thanks
    Mainak

    Edited by: user11183570 on Apr 18, 2011 7:41 AM

    Edited by: user11183570 on Apr 18, 2011 7:42 AM
  • 3. Re: Partition rebuild of cubes
    user11183570 Newbie
    Reopening, as we faced some unexpected behaviour.

    Edited by: user11183570 on Apr 20, 2011 1:12 AM
  • 4. Re: Partition rebuild of cubes
    DavidGreenfield Expert
    Sadly this is expected behaviour. The OLAP option (up to and including release 11.2.0.2) solves a leaf partition if
    (it is not empty)
    AND
    (
       (new values have been loaded into the partition)
       OR
       (any (non-partitioning) dimension has changed)
    )
    If you look in the OUTPUT column of the cube_build_log where command = 'SOLVE' and status = 'COMPLETED' then you will see entries like this
    <SolveStatistics
      IS_EMPTY="no"
      CHANGED_VALUES="yes"
      CHANGED_RELATIONS="no"/>
    This tells you why a particular partition was processed. In this example it was because new values were loaded into the partition. Below is a case where the data was unchanged, but one of the dimensions has changed.
    <SolveStatistics
      IS_EMPTY="no"
      CHANGED_VALUES="no"
      CHANGED_RELATIONS="yes"/>
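    A query along these lines pulls the relevant entries (a sketch; it assumes only the COMMAND, STATUS, and OUTPUT columns of cube_build_log mentioned above):

```sql
-- Sketch: list the SolveStatistics output for completed SOLVE steps,
-- to see why each partition was (or was not) reaggregated.
SELECT output
  FROM cube_build_log
 WHERE command = 'SOLVE'
   AND status  = 'COMPLETED';
```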
    The next question is what happens to a partition if the dimensions have changed? This will depend in part on the type of hierarchy change. There is a specific list of changes that will cause a partition to be completely reaggregated. From memory I believe they include

    (1) Deleting a dimension member.
    (2) Changing the parent of an existing dimension member.
    (3) Adding a new non-leaf member.

    If all you do is add a new leaf member and assign it to an existing parent, then the aggregation should be incremental. An example would be adding a new customer to a customer dimension within an existing location. I say 'should' here because that is the intent of the code, but bugs are always possible.

    As a special case of this, if you add a new leaf member as above, then partitions that don't contain any data for the new member should be processed quickly (e.g. in less than 30 seconds). An example of this is adding a new product to the product dimension. This would cause the partition for last year to be re-processed, but since the new product was not sold last year (by definition), the partition should be processed quickly. Again, if you are not seeing this behaviour, then it could be a bug.

    In release 11.2.0.2 you can specify aggregation for only a single partition using DBMS_CUBE.BUILD, but there is no way to do this in 11.1.0.7.
  • 5. Re: Partition rebuild of cubes
    DavidGreenfield Expert
    For the record, in 11.2.0.2 you can specify a single partition as in the following example. It assumes that UNITS_CUBE is partitioned by the CALENDAR_QUARTER level in the CALENDAR hierarchy of the TIME dimension.
    -- Build the single partition representing calendar_quarter '14'.
    begin
     dbms_cube.build(q'!
    UNITS_CUBE USING 
    (
     FOR 
      "TIME" WHERE HIER_ANCESTOR(WITHIN TIME.CALENDAR LEVEL TIME.CALENDAR_QUARTER) = '14'
     BUILD 
     (
      LOAD, 
      SOLVE
     )
    )
    !', parallelism=>1);
    end;
    /
    The syntax is accepted in earlier versions, but it will not work properly.
  • 6. Re: Partition rebuild of cubes
    AdamMartin Pro
    David,

    This may work in this customer's particular situation to load just a single partition. However, we are loading cubes that have cube materialized views with query rewrite enabled. The documentation for dbms_cube.build states: "A cube with a rewrite materialized view cannot have a USING clause, except for the ANALYZE command. It uses the default build options." And sure enough, the build fails for us with the syntax above.

    In our environment, this is what we observe. We are partitioned on time (by month). Prior partitions never change, so we just add new partitions. When we load the newest month of data, it does its load and solve, but then it goes to every prior partition and does a Clear Leaves and a SQL Load. There is no solve; it skips the solve step entirely. It appears to be doing a complete load of every historical partition.

    Granted, the dimensions change as new members are added, so we expect it to briefly touch each partition. But considering we have materialized view logs and such, we are surprised to see a complete reload of the leaves. As the cube grows, this has the potential to become the most time-consuming part of our refresh. We will spend 4 to 6 hours loading the star schema and refreshing the most recent cube partition, and then 15+ hours waiting for it to reload every historical cube partition even though they have not changed.

    How do we avoid what seems to be all this unnecessary processing of unchanged partitions during cube maintenance?
  • 7. Re: Partition rebuild of cubes
    DavidGreenfield Expert
    If your fact table is also partitioned by month, then you should be able to run a 'Partition Change Tracking' (PCT) refresh for this kind of situation. This means that you would specify a refresh method of 'P'. It should reload only those partitions that have changed.
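    A minimal sketch of such a PCT call (it reuses the URL_CUBE_OLAP cube name from the script earlier in the thread; the other parameter values are illustrative):

```sql
-- Sketch: PCT refresh of a single cube. The 'P' method asks the server to
-- use Partition Change Tracking, so only cube partitions whose source
-- fact-table partitions changed since the last refresh are reloaded.
BEGIN
  DBMS_CUBE.BUILD(
    'URL_CUBE_OLAP', -- cube name only; default build options
    'P',             -- refresh method: Partition Change Tracking
    false,           -- refresh after errors
    2,               -- parallelism
    true,            -- atomic refresh
    true,            -- automatic order
    false);          -- add dimensions
END;
/
```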
  • 8. Re: Partition rebuild of cubes
    AdamMartin Pro
    PCT may not work for us because the dimension tables are not partitioned (only the fact tables are partitioned), and PCT will not work for a materialized view that is defined with joins to a non-partitioned table. This is unfortunate, because it seems like this eliminates PCT as an option for the most common designs: partitioned fact tables joined to non-partitioned dimension tables.

    We can use straight inserts, and then we could just do a fast refresh relying on our materialized view logs. We tried this option, but the load+solve of the newest partition is much slower using "fast" than it is with complete. The only benefit of fast in this situation is that only the latest partition needs to be loaded (unlike a complete). A fast refresh on one partition is still faster than a complete refresh of all partitions (but for a single partition, complete is faster).

    Edit (5/10/2011):

    Here's what we ended up doing. We partitioned our dimension tables so we could use PCT. That's right, every one of our dimension tables is now partitioned on its primary key, but there is only one partition per dimension table. (We range partitioned on ID and created one partition for all values less than maxvalue.) Our refresh process looks like this now:

    1. Load all the dimension tables
    2. Refresh all dimensions
    3. Do a "consider fresh" on the dimension MVs
    4. Load fact tables for the new partition
    5. Refresh cubes with PCT

    We have no materialized view logs. PCT is faster than a fast refresh in our situation, plus we have the added benefit of no MV log overhead. Other implementations may benefit from fast refreshes and the use of MV logs, but for this project we have no small incremental updates.

    It is too bad we had to create the bogus dimension partitions, however. Aside from that, this solution seems to work for us.

    Suggestion to Oracle: allow PCT on non-partitioned tables (treat a non-partitioned table as a table with only one partition).
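    The single-partition dimension trick described above can be sketched as follows (the table, column, and MV names here are hypothetical, not from our actual project):

```sql
-- Hypothetical dimension table, range-partitioned on its primary key with
-- a single catch-all partition. The table is "partitioned" only so that
-- PCT refresh is permitted on materialized views that join to it.
CREATE TABLE customer_dim (
  customer_id   NUMBER PRIMARY KEY,
  customer_name VARCHAR2(100)
)
PARTITION BY RANGE (customer_id)
(
  PARTITION p_all VALUES LESS THAN (MAXVALUE)
);

-- After reloading the dimension table, mark its MV fresh without
-- rebuilding it (step 3 of the refresh process above).
ALTER MATERIALIZED VIEW customer_dim_mv CONSIDER FRESH;
```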
  • 9. Re: Partition rebuild of cubes
    AdamMartin Pro
    When we use either PCT or Fast refreshes, we see that our newest partition loads and solves much slower than it did during complete refreshes. Why is that?

    The solve times range from 1 hour per partition during a complete refresh to 7 hours per partition during a PCT refresh, using the exact same data sets. We only load one partition at a time either way. It is not due to deletes either, as this happens with new partitions as well.
