8 Replies Latest reply: Apr 25, 2014 7:56 AM by PBizme RSS

    Partitioning strategy for OBIEE query performance

    PBizme

      I'm using partitioning for the first time and I'm having trouble determining if I can partition my fact table in a way that will allow partition pruning to work with the queries that OBIEE generates.  I've put together a simple example using queries that I wrote to illustrate my problem.  In this example I have a star schema with a fact table and I'm joining in two dimensions.  My fact table is LIST-RANGE partitioned on JOB_ID and TIME_ID and those are the keys that tie to the two dimensions I'm using in this query.


      select sum(boxbase)

      from TEST_RESPONSE_COE_JOB_QTR A     

           join DIM_STUDY C on A.job_id = C.job_id     

            join DIM_TIME B on A.response_time_id = B.time_id

      where C.job_name = 'FY14 CSAT'

      and B.fiscal_quarter_name = 'Quarter 1';


      From what I can tell though, because the query is actually filtering on columns in the dimensions instead of they fact table columns, the pruning isn't actually taking place.  I'm actually seeing slightly better performance from a non-partitioned table even though I wrote this query specifically targeted at the partitioning strategy that is in place now.


      If I run the statement below, it runs much faster and an explain plan is very simple and looks to me like it is pruning down to one sub partition like I hoped.  This isn't how any query generated by OBIEE is going to look though.


      select sum(boxbase)

      from TEST_RESPONSE_COE_JOB_QTR

      where job_id = 101123480

      and response_time_id < 20000000;


      Any suggestions?  I do get some benefit from partition exchange using this setup, but if I'm going to sacrifice reporting performance then that may not be worthwhile or at a minimum I'd need to get rid of my sub partitions if they aren't providing any benefit.


      Here are the explain plans I got for the two queries in my original post:

       

      Operation

      Object Name

      Rows

      Bytes

      Cost

      Object Node

      In/Out

      PStart

      PStop

      SELECT STATEMENT Optimizer Mode=ALL_ROWS


      1


      20960





        SORT AGGREGATE


      1

      13






          VIEW

      SYS.VW_ST_5BC3A99F

      101 K

      1 M

      20960





            NESTED LOOPS


      101 K

      3 M

      20950





              PARTITION LIST SUBQUERY


      101 K

      2 M

      1281



      KEY(SUBQUERY)

      KEY(SUBQUERY)

                PARTITION RANGE SUBQUERY


      101 K

      2 M

      1281



      KEY(SUBQUERY)

      KEY(SUBQUERY)

                  BITMAP CONVERSION TO ROWIDS


      101 K

      2 M

      1281





                    BITMAP AND









                      BITMAP MERGE









                        BITMAP KEY ITERATION









                          BUFFER SORT









                            INDEX SKIP SCAN

      CISCO_SYSTEMS.DIM_STUDY_UK

      1

      17

      1





                          BITMAP INDEX RANGE SCAN

      CISCO_SYSTEMS.FACT_RESPONSE_JOB_ID_BMID_12






                  KEY

      KEY

                      BITMAP MERGE









                        BITMAP KEY ITERATION









                          BUFFER SORT









                            VIEW

      CISCO_SYSTEMS.index$_join$_052

      546

      8 K

      9





                              HASH JOIN









                                INDEX RANGE SCAN

      CISCO_SYSTEMS.DIM_TIME_QUARTER_IDX

      546

      8 K

      2





                                INDEX FAST FULL SCAN

      CISCO_SYSTEMS.TIME_ID_PK

      546

      8 K

      8





                          BITMAP INDEX RANGE SCAN

      CISCO_SYSTEMS.FACT_RESPONSE_TIME_ID_BMIDX_11






                  KEY

      KEY

              TABLE ACCESS BY USER ROWID

      CISCO_SYSTEMS.TEST_RESPONSE_COE_JOB_QTR

      1

      15

      19679



                  ROWID

      ROW L









      Operation

      Object Name

      Rows

      Bytes

      Cost

      Object Node

      In/Out

      PStart

      PStop

      SELECT STATEMENT Optimizer Mode=ALL_ROWS


      1


      1641





        SORT AGGREGATE


      1

      13






          PARTITION LIST SINGLE


      198 K

      2 M

      1641



                  KEY

      KEY

            PARTITION RANGE SINGLE


      198 K

      2 M

      1641



      1

      1

              TABLE ACCESS FULL

      CISCO_SYSTEMS.TEST_RESPONSE_COE_JOB_QTR

      198 K

      2 M

      1641



      36

      36


       

        • 1. Re: Partitioning strategy for OBIEE query performance
          rp0428
          From what I can tell though, because the query is actually filtering on columns in the dimensions instead of they fact table columns, the pruning isn't actually taking place.

          The plan you posted shows possible partition pruning. See those KEY specs in the plan? Those are the start/stop partitions that will be used. Oracle expects to prune dynamically at run-time based on 'quarter' value' in the query.

          My fact table is LIST-RANGE partitioned on JOB_ID and TIME_ID and those are the keys that tie to the two dimensions I'm using in this query.

          Ok - so Oracle links that TIME_ID to the dimension table and, based on the 'quarter' that you provide expects to be able to prune partitions.. If you want Oracle to 'prune' one of the 'JOB_ID' partitions you need to use a query that lets Oracle determine that the partition is NOT needed.

           

          It can't do that with your original query.

          select sum(boxbase)

          from TEST_RESPONSE_COE_JOB_QTR

          where job_id = 101123480

          and response_time_id < 20000000;

          Ok - now you told Oracle you don't need ANY partition if JOB_ID of 101123480 is not in it. So Oracle can 'prune' the partitions that cannot possibly be needed.

           

          What is it you don't understand?

          • 2. Re: Partitioning strategy for OBIEE query performance
            PBizme

            What I'm trying to understand is if there is a way for me to set up partitioning or indexing on my tables so that the original query can cause pruning to happen.  Everything I've tried indicates to me that the only way I can make it happen is to actually filter directly on the partitioned column in the fact table so I'm just trying to confirm whether that is true or if I'm missing something.

             

            Seems like you'd never get any pruning advantage from any BI request if you have a star schema set up because all your filtering is gonna be based on dimensions.

            • 3. Re: Partitioning strategy for OBIEE query performance
              rp0428
              What I'm trying to understand is if there is a way for me to set up partitioning or indexing on my tables so that the original query can cause pruning to happen.  Everything I've tried indicates to me that the only way I can make it happen is to actually filter directly on the partitioned column in the fact table so I'm just trying to confirm whether that is true or if I'm missing something.

              What am I missing? As I said before the plan shows that Oracle IS pruning. Did you see those lines saying 'PARTITION' with KEY in the plan

                     PARTITION LIST SUBQUERY

               

              101 K

              2 M

              1281

               

               

              KEY(SUBQUERY)

              KEY(SUBQUERY)

                        PARTITION RANGE SUBQUERY

               

              101 K

              2 M

              1281

               

               

              KEY(SUBQUERY)

              KEY

               

              There are two types of pruning: static and dynamic. Static pruning can be done when there is enough info in the query predicates to let Oracle know which partitions are needed.

               

              Your second query example shows that for JOB_ID for a table partitioned on JOB_ID.

               

              Dynamic pruning is pruning done at run-time.

               

              See the VLDB and partitioning doc sections on static and dynamic pruning

              http://docs.oracle.com/cd/E11882_01/server.112/e25523/part_avail.htm

              Partition Pruning

              Partition pruning is an essential performance feature for data warehouses. In partition pruning, the optimizer analyzes FROM and WHERE clauses in SQL statements to eliminate unneeded partitions when building the partition access list. This functionality enables Oracle Database to perform operations only on those partitions that are relevant to the SQL statement.

              This section contains the following topics:

               

              You might find this white paper helpful:

               

              Best Practices for a Data Warehouse on Oracle

               

              Database 11g

              http://www.oracle.com/technetwork/database/bi-datawarehousing/twp-dw-best-practies-11g11-2008-09-132076.pdf

               

              • 4. Re: Partitioning strategy for OBIEE query performance
                PBizme

                I guess I'm kind of running myself in circles.  At one point I was seeing that KEY stuff in the explain plans and thought that meant pruning was being used and partitioning was helping my BI performance.  From what you say in your last post it sounds like the first part of that is correct anyway.

                 

                What brought me to posting this question was doing some testing with different partitioning strategies and finding that I got better performance out of a non-partitioned table from requests that filter on dimensions (and therefore would potentially use dynamic pruning right?).  This was the case even on requests that I tried to build specifically targeted at my partitioning strategy.

                 

                Does it seem reasonable to think that relying on our indexing in a non-partitioned table (or partitioned in a way only focused on helping ETL) might actually perform better than partitioning in a way that we might get some dynamic pruning, but never static pruning?

                • 5. Re: Partitioning strategy for OBIEE query performance
                  rp0428

                  Does it seem reasonable to think that relying on our indexing in a non-partitioned table (or partitioned in a way only focused on helping ETL) might actually perform better than partitioning in a way that we might get some dynamic pruning, but never static pruning?

                  Yes - standard tables with indexes can often outperform partitioned tables. It all depends on the types of queries and query predicates that are typically used and the numbers of rows that are typically returned.

                   

                  Partition pruning eliminates ENTIRE partitions - regardless of the number of rows in the partition or table. An index, on the other hand, may not be used if the query predicate needs a substantial number of rows since Oracle may determine that the cost is lower to just  use multi-block reads and do a full table scan.

                   

                  A table with 1 million rows and a query predicate that wants 100K of them likely won't use an index at all. But the same table with two partitions could easily have one of the partitions pruned making the 'effective row count' only 500k or even less.

                   

                  If you are partitioning for performance you need to test your critical queries to make sure that partitioning/pruning is effective for them.

                   

                  select sum(boxbase)

                  from TEST_RESPONSE_COE_JOB_QTR A    

                       join DIM_STUDY C on A.job_id = C.job_id    

                        join DIM_TIME B on A.response_time_id = B.time_id

                  where C.job_name = 'FY14 CSAT'

                  and B.fiscal_quarter_name = 'Quarter 1';

                  So what is a typical value for 'A.response_time_id'? What does a 'B.time_id' represent?

                   

                  Because one way to provide explicit partition keys could be to use a range of 'response_time_id' values from the FACT table instead of a 'fiscal_quarter_name' value from the DIMENSION table.

                   

                  So 'Quarter 1' might correspond to a date range of '01/01/yyyy' to '03/31/yyyy'.

                   

                  You also said this about the partitioning: JOB_ID and TIME_ID

                   

                  But if your queries are mostly about DATES/TIMES you might be better using TIME_ID for the PARTITIONS and JOB_ID, if needed at all, for the subpartitioning.

                   

                  Date-Range partitioning is one of the most common around.and is used for both performance and ease of maintenance (remove/archive old data).

                  • 6. Re: Partitioning strategy for OBIEE query performance
                    PBizme

                    Thanks for all your responses.  I think I've got a much better understanding of what's going on and whether or not our current setup is performing as expected.  I think we had an unrealistic expectation that partitioning was going to be a magical solution to cutting response times.

                     

                    We've got several things complicating our setup so it may be difficult to come up with anything that will be worthwhile without hurting performance in a large percentage of queries.  I think I have more knowledge to evaluate the options now though.

                     

                    I can give a little more background in case you are curious.  The reason I am partitioning on JOB_ID is that I'm using partition exchange to load the fact table one job at a time.  I know that Date would be a much more typical setup, but in our case we will very frequently be changing rows that are several years old so that won't help us.  The changes that cause this are based on jobs, not anything specific to the date (meaning rows from 1/1/2012 may need to be reloaded for one job, but not for another on any given night).

                     

                    The specific query I posted was just a really simple query I wrote and not something that would likely be running against our warehouse.  We're using OBIEE on top of this WH and we're up to nearly 50 dimensions with several dashboards that allow users to filter on 20+ different dimensions if they want to.  So our queries are generated by OBIEE and normally several hundred lines long involving many tables.

                     

                    I don't know if we'll be able to come up with anything more useful than just partitioning on JOB_ID for the ETL and eliminating any sub-partitioning.  It may be that our focus for request performance will have to be more focused on good indexing along with a series of aggregate tables or Essbase/OLAP.

                    • 7. Re: Partitioning strategy for OBIEE query performance
                      rp0428
                      The reason I am partitioning on JOB_ID is that I'm using partition exchange to load the fact table one job at a time.  I know that Date would be a much more typical setup, but in our case we will very frequently be changing rows that are several years old so that won't help us.  The changes that cause this are based on jobs, not anything specific to the date (meaning rows from 1/1/2012 may need to be reloaded for one job, but not for another on any given night).

                      The comments I made above about using ranges from the FACT table instead of values from a DIMENSION apply to JOB_ID also.

                      join DIM_STUDY C on A.job_id = C.job_id   

                      . . .

                      where C.job_name = 'FY14 CSAT'

                      To prune that DIMENSION 'job_name' needs to correlate to specific FACT 'job_id' values that Oracle can use to prune partitions. That is the key to get pruning - to make sure that correlation is done.

                       

                      You should be getting the pruning now but you will have to test to make sure since you didn't really post the actual queries you are using.

                       

                      Other options are to use JOB_NAME in the fact table instead of, or in addition to, JOB_ID. If the data was static you could possibly use a BITMAP JOIN index which would actually index the fact table based on JOB_NAME in the dimension table. Such indexes can't be recommended for tables with more than minimal DML.

                       

                      See 'Using Bitmap Join Indexes in Data Warehouses' in the Data Warehousing doc

                      http://docs.oracle.com/cd/B28359_01/server.111/b28313/indexes.htm#sthref103

                      Using Bitmap Join Indexes in Data Warehouses

                      In addition to a bitmap index on a single table, you can create a bitmap join index, which is a bitmap index for the join of two or more tables. In a bitmap join index, the bitmap for the table to be indexed is built for values coming from the joined tables. In a data warehousing environment, the join condition is an equi-inner join between the primary key column or columns of the dimension tables and the foreign key column or columns in the fact table.

                       

                      A bitmap join index can improve the performance by an order of magnitude. By storing the result of a join, the join can be avoided completely for SQL statements using a bitmap join index. Furthermore, since it is most likely to have a much smaller number of distinct values for a bitmap join index compared to a regular bitmap index on the join column, the bitmaps compress better, yielding to less space consumption than a regular bitmap index on the join column.

                       

                      Bitmap join indexes are much more efficient in storage than materialized join views, an alternative for materializing joins in advance. This is because the materialized join views do not compress the rowids of the fact tables

                      • 8. Re: Partitioning strategy for OBIEE query performance
                        PBizme

                        Well JOB_ID is the PK of DIM_STUDY and JOB_NAME is unique to DIM_STUDY.  There is also an index on JOB_ID, JOB_NAME in DIM_STUDY.  I do see the stuff in explain plans of my real queries that indicate the dynamic pruning is happening.  My confusion just came in when I realized that my partitioned tables were slower even when I saw that stuff after we were expecting performance gains from it.

                         

                        I've thought about the potential for moving a value out of a dimension and into the fact tables, but I think we have some valid reasons that we probably wouldn't want to do that because of what it would mean to our BI setup.

                         

                        I've tested a little bit with bitmap join indexes on top of our existing indexes (every FK in the fact table has a bitmap index already) and in my limited testing they weren't used by my queries.  It's something to keep in mind though.