We are testing a new strategy of Do-It-Yourself sub-partitioning of cubes. OLAP cubes support one level of partitioning, but sub-partitioning is not supported, so we are testing the idea of making multiple cubes. We made one cube per month, partitioned by a business key that has a few possible values.
What we end up with is over 80 cubes. Attach time slows down considerably (it takes a few minutes to attach.) Is there anything we can do to improve the attach time?
The attach time is known to increase with the total number of partitions in all cubes. So if you have 80 cubes, each with a hundred partitions, say, then the total number of partitions would be 8000 and I would expect a slow attach time. I would expect the time to be slow even if the cubes themselves are empty since it is the definitions of the partitions that take time to load at attach time, not the data itself. What is the total number of partitions in your case? Have you gathered statistics on the AW$ table since defining the cubes? I will ask around to see if there is anything that can be done to improve things.
For the record, OLAP does support multi-level partitioning. (e.g. partition at month, quarter, and year of the same time dimension.) Contact me directly if you are interested in this.
We actually have two cubes per time period, and 44 time periods. So 88 cubes in all.
There are 9 partitions each in half of the cubes, and 18 partitions each in the other half.
This gives us (44 cubes x 9 partitions) + (44 cubes x 18 partitions) = 1188 partitions total in the workspace.
I am not really interested in subpartitioning along multiple levels of the same dimension. What would really be great, however, is if it were possible to partition along a level of one dimension and subpartition using a level of a different dimension.
I did go back and gather fresh statistics on the AW$ table. That seemed to help, but it still takes over 40 seconds to attach.
We use materialized views with query rewrite, so there is really nothing we need to do to tie the cubes together. All of our queries point to the base tables or views, and the queries are rewritten to the cubes where applicable.
The objective is to reduce refresh time. Breaking the cubes into smaller pieces allows us to take fuller advantage of simultaneously solving multiple partitions of a cube in separate processes. This, in turn, saves time.
Also, the partitions have been broken along the lines of logically grouped data. The divisions are mutually exclusive and aggregates across partitions make no business sense. For example, our top-level partition (actually a cube) is monthly data. Users cannot combine more than one month of data. The way our data is structured makes it so that performing aggregates across multiple months is meaningless. Likewise, the "subpartition" (really cube partitions) are like lines of business. They cannot be mixed and matched or aggregated together.
So it makes sense to have these logical groupings to divide and conquer in order to achieve faster load and solve times. So far, our cube loads do not seem to present a significant strain on our hardware resources. We have plenty of CPU, memory, and IO bandwidth to take advantage of, so running a single solve process in one CPU thread seemed like a waste of resources. (On the other hand, we have also experienced an issue where running 4 or more solve processes at once is problematic even though the hardware was not maxed out, and we have opened a service request with Oracle on this issue.)
DB Version: 184.108.40.206
Edit: Yes, we have considered making one cube per line of business, partitioned by month instead of monthly cubes partitioned by line of business. But either way, the total number of partitions remains the same.
It's an unusual use case (the use of so many cubes as a partitioning strategy), but I understand what you are trying to do and that it makes sense from a requirements point of view. I'll take Dave's word that so many partitions will slow attach times. If that's the case, the only thing to do is to reduce the overall number of partitions. Is there a different dimension that you could partition on that would allow you to take advantage of the hardware? Also, if you are
I'm curious about the problem with running more than one solve process. I regular run 30 or 60 processes (on large hardware). I'm happy to take a look at the SR if you want me to send it directly to firstname.lastname@example.org (do not put it in this forum thread).
Hopefully the attach time issue will not be much of a problem in the user environment, since we use connection pooling to attach from an OBIEE front end. The sessions in the connection pool should remain attached to the analytic workspace, I believe.
It is an option to reduce the number of partitions, but that is done at the expense of having larger partitions that take longer to refresh. We are moving towards more partitions as one solution to achieve better refresh times.
I will send an email with the SR on the multiple solve processes. We can run 2 or 3 solves simultaneously without a problem, but running 4 or more causes them all to slow way down. But this is with large partitions. I'm sure that with smaller volumes of data we could probably do more in parallel.