This content has been marked as final. Show 7 replies
I have some suggestions here, in no particular order of importance.
1. Track the build progress of the cube by querying the CUBE_BUILD_LOG table. You can discover which phase of the build is taking the longest. Chances are, the solve phase is the bulk of the time.
2. You can reduce solve time by lowering the solve % lower than 50%. Try 30% or so to see if the solve time is acceptable. You might discover that anywhere from 20% to 40% is sufficient for fast response times. Going to 50% will probably not give a noticeable gain over 40%.
3. Set a large olap_page_pool_size in the session running the cube build (using alter session). This should help cut down the solve time, but you will need to submit the build process with pl/sql rather than via AWM.
4. Consider using manual work area sizes if the load is taking a long time. These are also alter session commands for workarea_size_policy, sort_area_size, and hash_area_size.
5. Double-check your star schema design. Are all fact table columns indexed? With single-column bitmap indexes? Do all dimension tables have a primary key? Is there a foreign key relationship between facts and dimensions? Are dimension attribute columns indexed? Is the fact table compressed? Does the fact table have a degree of parallelism set?
6. Check the AW tablespace. Is it set to nologging?
If the refresh time is still unacceptable after trying some of these suggestions, we can explore other options such as mapping some dimensions to the cube at levels that are higher than the lowest level in a dimension hierarchy, to cut back on the size of the cube. This comes with its trade-offs.
But 31 million rows in your fact table, does not sound like a whole lot. That is small enough that you should be able to run this fairly quickly. How big are your dimensions? Do you have some deep hierarchies (7+ levels)? Do you have some dimensions with over a million members?
1) Yes. Issue is with Solve phase.
2) Response times are also very important for us as we do web based reporting(OBIEE). We initially ran some tests with 35 vs 50 and thought 50% would be better for response time. Is there any document that can help me to understand 40% vs 50% comparison?
3) I have already asked this question before. Suggestion was in 11.2 no need to change olap_page_pool_size. It seems it will be set dynamic.
Olap Page Pool Cache
4) Are these still need to set manually ?
5) NA (Issue is with Solve phase)
7) Dimension sizes are :
->show statlen(dim1) 38,808 levels: 4 hierarchy:1
->show statlen(dim2) 57 levels: 5 hierarchy:1
->show statlen(dim3) 5 levels:2 hierarchy:1
->show statlen(dim4) 160 levels:2 hierarchy:1
->show statlen(dim5) 319 levels:3 hierarchy:1
->show statlen(dim6) 2,089 levels:3 hierarchy:1
->show statlen(dim7) 17 parentchild hierarchy:1
->show statlen(dim8) 17 parentchild hierarchy:1
->show statlen(dim9) 17 parentchild hierarchy:1
One more observation I had is I changed to level based aggregation from cost based aggregation. Solve step which took 3 days and did not complete in cost based aggregation finished in 1 day in level based aggregation. I know cost based aggregation is recommended in 11g but during my testing I noticed level based aggregation is performing better than cost based( I know which levels to pre-aggregate and users query only on those levels). If I choose level based over cost based do I miss any performance optimizations ? Which one is recommended(Cost vs Level) when and why?
The chief difference is that with level based precompute you get dense looping in your queries. You should test queries at different hierarchy depths to see if level based is acceptable everywhere.
Is there something different about this cube from those you have built in the past? Have you tried moving the partition level down? (e.g. WEEK instead of MONTH)
What if I explicitly limit all my dimensions to a single value? Does it matter if it loops densely or sparsely as I am limiting all my dimensions members explicitly. I assume in that case it is a direct lookup and it does not have to do any looping internally.
I ask above question because we ran some kind of stress test (using DML commands/sql queries) and level based cube seems to be performed better than cost based. (we limit all dimensions explicitly)
Only difference with this cube from previous cubes is this is a 9 dimensional cube where as all our other cubes are 6 dimensional. Other than that I do not see any difference.
Suggestion was in 11.2 no need to change olap_page_pool_size. It seems it will be set dynamic.Regarding olap_page_pool_size, setting it to dynamic at the database level is great for a run time setting. I am not suggesting you change that parameter. However, setting it to a larger value just for the session doing the cube build could prove to be helpful. That has made a big difference for me in the past. What do you have set for pga_aggregate_target? How much memory does your server have overall?
4) Are these still need to set manually ?Yes, that is a manual setting (just for the session doing the cube build). But setting these would probably only impact load times, and have very little impact on solve times.
5) NA (Issue is with Solve phase)Actually, please do not write-off deficiencies in the base data model, thinking they will have no impact on solve times. You must be thinking that if the data is already past the LOAD phase and into SOLVE, then it should not matter how the base tables are designed. I think it still matters. Could you describe your data model in more detail? Because based on the small amount of data you have to aggregate and the simple dimensions you described, this should definitely not be a 3-day process. I am especially interested in the indexes you have defined on the fact and dim tables.