Hi, I have a dumb question on cube build performance. When choosing the % to precalc, is build time linear (or nearly linear) with respect to this? i.e., is selecting 10% going to be roughly 3 times faster than selecting 30%? Also, is it fair to assume that if only 10% of the values are precalculated, end user queries will on average have to hit 3 times as much data and therefore be about 3 times slower?
Sorry, I'm doing this on a VM on a laptop, so testing various build configs is taking forever (I have yet to have a cube load actually finish). I guess I shouldn't be trying a 15 dim cube on a VM on a laptop, but I'm trying to sell DBAs on the fact that it could enhance performance of our data mart.
Also, as a follow up: some of my 15 dimensions don't significantly add to the number of rows (which means removing them doesn't significantly reduce the row count either). But my assumption would be that with fewer dimensions there are fewer "total" values to calculate, so that solving the same 35 million rows could be a lot faster if I remove a few of the lesser used dimensions. Does that sound correct?
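To make sure I'm reasoning about this right, here's my back-of-envelope version (the dimension names and member counts below are completely made up): the potential aggregate space is the product of the dimension sizes, so dropping a dimension divides that space by its member count even when the loaded row count barely changes.

```python
# Toy illustration with invented dimension sizes: the space of
# potential cube cells is the product of the dimension sizes,
# so every dimension multiplies the aggregation workload.
from math import prod

dims = {"TIME": 48, "GEOG": 500, "PRODUCT": 2000, "CHANNEL": 4}

all_dims = prod(dims.values())  # potential cells with every dimension
without_channel = prod(v for k, v in dims.items() if k != "CHANNEL")

# Dropping CHANNEL shrinks the potential space by a factor of 4,
# even if the cube's loaded (leaf) rows stay roughly the same.
print(all_dims, without_channel)
```

Of course the real cube is sparse, so actual stored cells are far fewer than this upper bound, but the multiplicative effect on the aggregate space is the point.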
Sorry, old Express / IRI guy, been way too long since I've played around in this world.
I would not say that it is linear. When you are using cost-based aggregation, the precompute number is more like a cost threshold than a percentage of the data that is cubed. So in essence, if you pick 20%, it's not like it just selects 1/5 of the cube to aggregate. Instead, it looks at the top 20% (or so) most costly aggregates to compute (the queries that would take the longest to run if they were not cubed) and pre-aggregates them.
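The selection idea can be pictured with a little sketch (this is an illustration of the concept only, not Oracle's actual algorithm; the aggregate names and cost figures are invented):

```python
# Illustration only: cost-based aggregation as "take the most
# expensive aggregates first" -- NOT Oracle's actual algorithm.
# All names and cost estimates here are made up for the sketch.

aggregates = {
    # aggregate: estimated cost to compute on the fly
    "TIME_YEAR x GEOG_REGION":    900,
    "TIME_QUARTER x GEOG_REGION": 400,
    "TIME_MONTH x GEOG_COUNTRY":  150,
    "TIME_MONTH x GEOG_CITY":      40,
}

def choose_precompute(aggregates, percent):
    """Pick roughly the top `percent` most costly aggregates."""
    ranked = sorted(aggregates, key=aggregates.get, reverse=True)
    n = max(1, round(len(ranked) * percent / 100))
    return ranked[:n]

# A 25% setting does not grab 1/4 of the data -- it grabs the few
# aggregates that would be slowest to compute dynamically.
print(choose_precompute(aggregates, 25))
```

Because the costliest aggregates are handled first, each additional percent buys less and less query-time benefit, which is exactly the diminishing-returns behaviour described below.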
So as you can see, there are diminishing returns the higher you go. Choosing 40% instead of 10% might make a big difference, but choosing 90% instead of 60% may be insignificant in terms of response time. As the number goes up you trade more storage space for response time, and your cube maintenance time goes up too. At some point, which differs for each application, higher cost percentage numbers will no longer help your response time much (queries are no faster, or only imperceptibly faster, than calculating on the fly). Or you may reach a number where the cube maintenance time is unacceptable, even if going higher would improve response time.
So you pick a number that is acceptable in terms of cube solve time and query performance, and (to a lesser degree of concern) disk storage.
Cost-based aggregation (aka "precompute percent") was introduced in 11.1 as a simpler alternative to level-based aggregation. The product management dream was for a linear parameter, but the complexities soon became apparent. What would the linearity be measured against? Build Time? Query Time? Total Disk Size? The result balances all these factors, but is linear against none of them. Luckily the behaviour of the precompute percent level has been fairly consistent across cubes and schemas in our experience, so I can give you a rough characterisation. But bear in mind that this is a guide only -- you must experiment on your own schema and system to see what works for you. In particular you must balance your own requirements on build time, query time, and disk size.
*0%* - This means no precomputation at all, so all data access will be dynamic. It is the recommended setting for the top partition of a cube. If, for some reason, you want to use it for the leaf partitions as well, then I would advise you to switch to a non-compressed cube.
*1%* - This precomputes the smallest portion of the cube allowed by the algorithm and would certainly take longer than 1% of the time taken by a 100% build. For leaf partitions it is usually better to increase the amount since you will get much better query response time for not much more cost in terms of disk size and build time. It may be a good level for the top partition of a cube, but should be used with caution since top partitions are often too large to precompute.
*2%-19%* - These levels don't seem to give much benefit: the build time and total disk size are almost the same as for a 20% build, but the queries are slower.
*20%-50%* - This range is probably the best compromise in terms of build versus query time. The AWM default is 35%, which is a good place to start. Lower it towards 20% if you want a faster build and raise it towards 50% if you want faster queries. The setting is closer to linear within this range than outside it.
*51%-99%* - You should probably avoid these levels, although I have seen 60% used in practice. The reason is that while the size of the cube and the length of the build increase rapidly, the queries do not get correspondingly faster. Indeed you may find that the queries are slower because the code spends more time swapping in pages from disk.
*100%* - This will precompute all (non-NULL) cells in the cube. It may come as a surprise after my advice about 51%-99%, but 100% is a reasonable level to choose. This is because the code is much simpler when it knows that everything is precomputed and hence stored in some page on disk.