6 Replies Latest reply on Nov 28, 2013 4:16 PM by Mohamed Houri

    Bloom filter and performance

    Mohamed Houri

      I have the following query:

       

      SELECT 

                      obst.obst_id obstructionId

                     ,oost.comment_clob CommentClob

                     ,chp1.ptcar_no StartPtcar

                     ,chp2.ptcar_no EndPtcar

                     ,oost.track_code Track

                     ,oost.start_date StartPeriod

                     ,oost.end_date EndPeriod

                     ,oost.doc_no RelaasId

                     ,obst.status_code Status

              FROM   T1 obst

                   , T2 oost

                   , T3 chp1

                   , T3 chp2

                where obst.oost_id      = oost.oost_id

                and oost.chp_id_start = chp1.chp_id

                and oost.chp_id_end   = chp2.chp_id

                and obst.obs_stat     = 2

         and add_months(obst.mod_date,13) >= :ld_curr_date

         and oost.start_date between :ld_from_date and :ld_to_date         

                and exists (select 1

                              from T4  obsv

                                 , T5  veh

                              where  obsv.veh_id = veh.veh_id

                              and (:ln_opr_id is null

                                     OR veh.opr_id = :ln_opr_id)

                              and  obsv.obst_id = obst.obst_id

                              and  veh.ac_number between :ln_min_number and :ln_max_number

                              and  obsv.start_date > obst.start_date

                            )         

               order by obst.obst_id;

       

      Giving the following execution plan where a bloom filter has been used.

       

      ----------------------------------------------------------------------------------------------------------------------------------

      | Id  | Operation                           | Name                    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |

      ----------------------------------------------------------------------------------------------------------------------------------

      |   0 | SELECT STATEMENT                    |                         |      1 |        |     10 |00:02:13.09 |     128K|  94376 |

      |   1 |  SORT ORDER BY                      |                         |      1 |      8 |     10 |00:02:13.09 |     128K|  94376 |

      |   2 |   NESTED LOOPS                      |                         |      1 |        |     10 |00:02:13.06 |     128K|  94376 |

      |   3 |    NESTED LOOPS                     |                         |      1 |      8 |     10 |00:02:13.06 |     128K|  94376 |

      |   4 |     NESTED LOOPS                    |                         |      1 |      8 |     10 |00:02:13.06 |     128K|  94376 |

      |*  5 |      HASH JOIN SEMI                 |                         |      1 |      8 |     10 |00:02:13.06 |     128K|  94376 |

      |   6 |       JOIN FILTER CREATE            | :BF0000                 |      1 |        |   1469 |00:00:02.06 |   12430 |     76 |

      |   7 |        NESTED LOOPS                 |                         |      1 |        |   1469 |00:00:00.17 |   12430 |     76 |

      |   8 |         NESTED LOOPS                |                         |      1 |    307 |   5522 |00:00:00.11 |   10545 |     52 |

      |*  9 |          TABLE ACCESS BY INDEX ROWID| T2                      |      1 |    316 |   5522 |00:00:00.07 |    3383 |     46 |

      |* 10 |           INDEX RANGE SCAN          | T2_OOST_START_DATE_I    |      1 |   1033 |   8543 |00:00:00.03 |      33 |      3 |

      |* 11 |          INDEX RANGE SCAN           | T1_OBST_OOST_DK_I       |   5522 |      1 |   5522 |00:00:00.08 |    7162 |      6 |

      |* 12 |         TABLE ACCESS BY INDEX ROWID | T1                      |   5522 |      1 |   1469 |00:00:00.13 |    1885 |     24 |

      |  13 |       VIEW                          | VW_SQ_1                 |      1 |  64027 |   1405 |00:00:07.82 |     115K|  94300 |

      |* 14 |        FILTER                       |                         |      1 |        |   1405 |00:00:07.82 |     115K|  94300 |

      |  15 |         JOIN FILTER USE             | :BF0000                 |      1 |  64027 |   1405 |00:00:07.82 |     115K|  94300 |

      |  16 |          PARTITION REFERENCE ALL    |                         |      1 |  64027 |  64027 |00:01:48.22 |     115K|  94300 |

      |* 17 |           HASH JOIN                 |                         |     52 |  64027 |  64027 |00:02:03.37 |     115K|  94300 |

      |  18 |            TABLE ACCESS FULL        | T4                      |     52 |  64027 |  64027 |00:00:00.34 |    1408 |    280 |

      |* 19 |            TABLE ACCESS FULL        | T5                      |     41 |    569K|   5555K|00:02:08.32 |     114K|  94020 |

      |  20 |      TABLE ACCESS BY INDEX ROWID    | T3                      |     10 |      1 |     10 |00:00:00.01 |      22 |      0 |

      |* 21 |       INDEX UNIQUE SCAN             | T3_CHP_PK               |     10 |      1 |     10 |00:00:00.01 |      12 |      0 |

      |* 22 |     INDEX UNIQUE SCAN               | T3_CHP_PK               |     10 |      1 |     10 |00:00:00.01 |      12 |      0 |

      |  23 |    TABLE ACCESS BY INDEX ROWID      | T3                      |     10 |      1 |     10 |00:00:00.01 |      10 |      0 |

      ----------------------------------------------------------------------------------------------------------------------------------

       

       

      Predicate Information (identified by operation id):

      ---------------------------------------------------

         5 - access("ITEM_1"="OBST"."OBST_ID")

             filter("ITEM_2">"OBST"."START_DATE")

         9 - filter(("OOST"."CHP_ID_START" IS NOT NULL AND "OOST"."CHP_ID_END" IS NOT NULL))

        10 - access("OOST"."START_DATE">=:LD_FROM_DATE AND "OOST"."START_DATE"<=:LD_TO_DATE)

        11 - access("OBST"."OOST_ID"="OOST"."OOST_ID")

        12 - filter(("OBST"."OBS_STAT"=2 AND ADD_MONTHS(INTERNAL_FUNCTION("OBST"."MOD_DATE"),13)>=:LD_CURR_DATE))

        14 - filter((:LN_MIN_ac_number<=:ln_max_number AND TO_DATE(:LD_FROM_DATE)<=TO_DATE(:LD_TO_DATE)))

        17 - access("OBSV"."VEH_ID"="VEH"."VEH_ID")

        19 - filter(((:LN_OPR_ID IS NULL OR "VEH"."OPR_ID"=:LN_OPR_ID) AND "VEH"."ac_number">=:LN_MIN_ac_number AND

                    "VEH"."ac_number"<=:ln_max_number))

        21 - access("OOST"."CHP_ID_START"="CHP1"."CHP_ID")

        22 - access("OOST"."CHP_ID_END"="CHP2"."CHP_ID")

       

      This query completes in a not acceptable response time of more than 2 minutes and this is why it has been handled to me for possible improvement.

       

      As you might have already pointed it out from the above execution plan, the operation number 19 is the most time consuming operation (TABLE ACCESS FULL of T5). The query without the EXISTS part represents our data. This part of the query is very fast. It gives about 1500 rows. However, when we want to limit those 1500 records to only records satisfying the exists clause (10 records), the query starts not performing very well because of this full table scan on T5 (5,5 millions).

       

      The ideal situation would have been to :

       

      (a) first join table T5(alias veh) with the table T4(alias obs) using the join condition (predicate 17) limiting the number of record from T5 table

      (b) and then filter from T5 table (predicate 19)

       

      But in this case the CBO is starting by a costly full scan of the entire T5 table (5,5 millions of rows operation 19) before sending this enormous amount of data to the HASH JOIN operation 17 which finally throw away the majority of T5 rows not satisfying the join condition : access("OBSV"."VEH_ID"="VEH"."VEH_ID")

       

      I have already verified that the table statistics and their join column statistics are up-to-date and are reflecting the reality. But failed to know how to let the CBO automatically doing what I want it to do i.e. Join first and then filter.

       

      When I discussed with the client we agreed to re-factor a little bit this query so that it is now completing in few seconds.

       

      The new plan is not using bloom filter and the veh table is accessed via its unique index on VEH_ID (18 - access("OBSV"."VEH_ID"="VEH"."VEH_ID")) before applying the filter operation. Exactly as I wanted it to be in the original query

       

      Have you any idea on how to change this join method/order so that the full table scan is post poned? of course without refactoring the query

       

      Do you think that bloom filter is the culprit here?

       

      Thanks in advance

       

      Mohamed Houri

       

      PS : and  veh.ac_number between :ln_min_number and :ln_max_number

      These two imput parameters are beyound the known low and high value. But even when I used min and max value taken from the table (min = low_value from stats and max = high value from stat), the plan didn't changed

        • 1. Re: Bloom filter and performance
          Jonathan Lewis

          Hard to see exactly why you get your problem - it's an example of where the predicted cost is helpful in explaining the choice of plan.

           

          It sounds as if you need Oracle either to do a nested loop semi-join, or not unnest and push the filter subquery.  The optimizer may calculate that doing the join/subquery the relevant number of times is more expensive than doing the scan and hash join once.

           

          I'd start with a couple of hints in the subquery:  /*+ no_unnest push_subq */ to see what happened - this might push the subquery too early, though.

           

          Regards

          Jonathan Lewis

          1 person found this helpful
          • 2. Re: Bloom filter and performance
            Mohamed Houri

            Jonathan,

             

            I got the plan I was looking for.

             

            It is not exactly the same as the execution plan that corresponds to the re-factored query but it is what I wanted to have.

             

             

            SELECT  /*+ gather_plan_statistics */

                           obst.obst_id obstructionId

                           ,oost.comment_clob CommentClob

                           ,chp1.ptcar_no StartPtcar

                           ,chp2.ptcar_no EndPtcar

                           ,oost.track_code Track

                           ,oost.start_date StartPeriod

                           ,oost.end_date EndPeriod

                           ,oost.doc_no RelaasId

                           ,obst.status_code Status

                    FROM   T1 obst

                         , T2 oost

                         , T3 chp1

                         , T3 chp2

                    where obst.oost_id = oost.oost_id

                      and oost.chp_id_start = chp1.chp_id

                      and oost.chp_id_end = chp2.chp_id

                      and obst.obs_stat = 2 -- only reeel       

                            and exists (select /*+ no_unnest push_subq */ 1

                                    from T4 obsv

                                       , T5 veh

                                    where obsv.veh_id = veh.veh_id

                                      and (null is null

                                           OR veh.opr_id = null

                                          )

                                      and  obsv.obst_id = obst.obst_id

                                      and veh.ac_number between 1 and 99999

                                      and obsv.start_date > obst.start_date

                                  )

                      and oost.start_date between to_date ('20/11/2013','DD/MM/YYYY') and to_date ('27/11/2013','DD/MM/YYYY')

                      and add_months(obst.mod_date,13) >= sysdate

                     order by obst.obst_id

                    ;

             

            ---------------------------------------------------------------------------------------------------------------------

            | Id  | Operation                                 | Name                    | Starts | E-Rows | A-Rows |   A-Time   |

            ---------------------------------------------------------------------------------------------------------------------

            |   0 | SELECT STATEMENT                          |                         |      1 |        |      6 |00:00:00.56 |

            |   1 |  SORT ORDER BY                            |                         |      1 |    254 |      6 |00:00:00.56 |

            |*  2 |   HASH JOIN                               |                         |      1 |    254 |      6 |00:00:00.11 |

            |   3 |    TABLE ACCESS FULL                      | T3                      |      1 |   2849 |   2849 |00:00:00.01 |

            |*  4 |    HASH JOIN                              |                         |      1 |    254 |      6 |00:00:00.11 |

            |   5 |     TABLE ACCESS FULL                     | T3                      |      1 |   2849 |   2849 |00:00:00.01 |

            |   6 |     NESTED LOOPS                          |                         |      1 |        |      6 |00:00:00.10 |

            |   7 |      NESTED LOOPS                         |                         |      1 |    254 |   5012 |00:00:00.09 |

            |*  8 |       TABLE ACCESS BY INDEX ROWID         | T2                      |      1 |    262 |   5012 |00:00:00.06 |

            |*  9 |        INDEX RANGE SCAN                   | T2_OOST_START_DATE_I    |      1 |    857 |   7722 |00:00:00.01 |

            |* 10 |       INDEX RANGE SCAN                    | T1_OBST_OOST_DK_I       |   5012 |      1 |   5012 |00:00:00.03 |

            |* 11 |      TABLE ACCESS BY INDEX ROWID          | T1                      |   5012 |      1 |      6 |00:00:00.48 |

            |  12 |       NESTED LOOPS                        |                         |   1277 |        |      6 |00:00:00.46 |

            |  13 |        NESTED LOOPS                       |                         |   1277 |      2 |      6 |00:00:00.46 |

            |  14 |         PARTITION REFERENCE ALL           |                         |   1277 |      4 |      6 |00:00:00.46 |

            |* 15 |          TABLE ACCESS BY LOCAL INDEX ROWID| T4                      |  66380 |      4 |      6 |00:00:00.43 |

            |* 16 |           INDEX RANGE SCAN                | T4_OBSV_OBST_FK_I       |  66380 |     86 |      6 |00:00:00.28 |

            |* 17 |         INDEX UNIQUE SCAN                 | T5_VEH_PK               |      6 |      1 |      6 |00:00:00.01 |

            |* 18 |        TABLE ACCESS BY GLOBAL INDEX ROWID | T5                      |      6 |      1 |      6 |00:00:00.01 |

            ---------------------------------------------------------------------------------------------------------------------

             

             

            Predicate Information (identified by operation id):

            ---------------------------------------------------

               2 - access("OOST"."CHP_ID_END"="CHP2"."CHP_ID")

               4 - access("OOST"."CHP_ID_START"="CHP1"."CHP_ID")

               8 - filter(("OOST"."CHP_ID_START" IS NOT NULL AND "OOST"."CHP_ID_END" IS NOT NULL))

               9 - access("OOST"."START_DATE">=TO_DATE(' 2013-11-20 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "OOST"."START_DATE"<=TO_DATE(' 2013-11-27

                          00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

              10 - access("OBST"."OOST_ID"="OOST"."OOST_ID")

              11 - filter(("OBST"."OBS_STAT"=2 AND ADD_MONTHS(INTERNAL_FUNCTION("OBST"."MOD_DATE"),13)>=SYSDATE@! AND  IS NOT NULL))

              15 - filter("OBSV"."START_DATE">:B1)

              16 - access("OBSV"."OBST_ID"=:B1)

              17 - access("OBSV"."VEH_ID"="VEH"."VEH_ID")

              18 - filter(("VEH"."ac_number">=1 AND "VEH"."ac_number"<=99999))


            Spot how the predicate 17 and 18 are now in the position I wanted them to be i.e probe first the T5 unique index using the join condition and then filter the table T5


            But, there is always a but, why am I obliged to hint?


            Where did I got something wrong (statistics, optimizer parameter, etc...) so that the CBO did not opted for this execution plan without a hint?


            Thanks


            Mohamed Houri


            • 3. Re: Bloom filter and performance
              Jonathan Lewis

              Interesting plan - the optimizer did what I thought it might but the presentation isn't what I was expecting (the Filter subquery starting at line 12 is obscured in a way that would make OEM - every other GUI I know of - report the execution order incorrectly) and I wasn't expecting the nested loops to T3 to turn into hash joins with swapped join order.

               

              Regards

              Jonathan Lewis

              • 4. Re: Bloom filter and performance
                Aliyev Chinar

                Hi Mohamed.

                What is your oracle version?

                Did you post all predicate options of the last plan here? (Because i can not see filter-(EXISTS) part).

                 

                MohamedHouri wrote:


                But, there is always a but, why am I obliged to hint?


                Where did I got something wrong (statistics, optimizer parameter, etc...) so that the CBO did not opted for this execution plan without a hint?


                Thanks


                Mohamed Houri


                 

                CBO select access paths (join methods) based on cost. If here oracle do not select as your desired access method(or join) then i believe here is cardinality( selectivity estimation) is a matter. Also your second plan(without unnesting) i do not think is optimal. Here is estimated and actual rows are significant different.

                Can you post execution plan of both queries(with and without unnesting) including cost column.

                It can be very good if you post(upload) here 10053 trace file of both queries(then we can say something more correctly).

                • 5. Re: Bloom filter and performance
                  Jonathan Lewis

                  The existence subquery appears from line 12 onwards of the plan. There is no visible FILTER operation because that's what happens when subquery pushing takes place; but the predicate at line 11 shows the link to the subquery "AND   IS NOT NULL", although the subquery text itself is missing because that's a defect of the way that subquery filter predicates are (not) stored properly in v$sql_plan.

                   

                  Regards

                  Jonathan Lewis

                  • 6. Re: Bloom filter and performance
                    Mohamed Houri

                    @Chinar

                    My oracle release is :

                     

                    Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production

                     

                    @jonathan,

                     

                    I have added the suggested hint and promoted the package in production. The web service calling this query is working perfectly since then. Thanks for that.

                     

                    The table T5 has a single column primary key index on veh_id (join column) and a single index on the ac_number (the filtering column). I am thinking for a final solution which consists of gathering extended stats on those two columns and may be creating also an index on both columns also.

                     

                    I will try it and keep the thread updated.

                     

                    Best regards

                    Mohamed Houri