Forum Stats

  • 3,826,750 Users
  • 2,260,703 Discussions
  • 7,897,071 Comments

Discussions

Bloom filter and performance

Mohamed Houri
Mohamed Houri Member Posts: 1,252 Bronze Trophy
edited Nov 28, 2013 11:16AM in General Database Discussions

I have the following query:

SELECT 

                obst.obst_id obstructionId

               ,oost.comment_clob CommentClob

               ,chp1.ptcar_no StartPtcar

               ,chp2.ptcar_no EndPtcar

               ,oost.track_code Track

               ,oost.start_date StartPeriod

               ,oost.end_date EndPeriod

               ,oost.doc_no RelaasId

               ,obst.status_code Status

        FROM   T1 obst

             , T2 oost

             , T3 chp1

             , T3 chp2

          where obst.oost_id      = oost.oost_id

          and oost.chp_id_start = chp1.chp_id

          and oost.chp_id_end   = chp2.chp_id

          and obst.obs_stat     = 2

   and add_months(obst.mod_date,13) >= :ld_curr_date

   and oost.start_date between :ld_from_date and :ld_to_date         

          and exists (select 1

                        from T4  obsv

                           , T5  veh

                        where  obsv.veh_id = veh.veh_id

                        and (:ln_opr_id is null

                               OR veh.opr_id = :ln_opr_id)

                        and  obsv.obst_id = obst.obst_id

                        and  veh.ac_number between :ln_min_number and :ln_max_number

                        and  obsv.start_date > obst.start_date

                      )         

         order by obst.obst_id;

Giving the following execution plan where a bloom filter has been used.

----------------------------------------------------------------------------------------------------------------------------------

| Id  | Operation                           | Name                    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |

----------------------------------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                    |                         |      1 |        |     10 |00:02:13.09 |     128K|  94376 |

|   1 |  SORT ORDER BY                      |                         |      1 |      8 |     10 |00:02:13.09 |     128K|  94376 |

|   2 |   NESTED LOOPS                      |                         |      1 |        |     10 |00:02:13.06 |     128K|  94376 |

|   3 |    NESTED LOOPS                     |                         |      1 |      8 |     10 |00:02:13.06 |     128K|  94376 |

|   4 |     NESTED LOOPS                    |                         |      1 |      8 |     10 |00:02:13.06 |     128K|  94376 |

|*  5 |      HASH JOIN SEMI                 |                         |      1 |      8 |     10 |00:02:13.06 |     128K|  94376 |

|   6 |       JOIN FILTER CREATE            | :BF0000                 |      1 |        |   1469 |00:00:02.06 |   12430 |     76 |

|   7 |        NESTED LOOPS                 |                         |      1 |        |   1469 |00:00:00.17 |   12430 |     76 |

|   8 |         NESTED LOOPS                |                         |      1 |    307 |   5522 |00:00:00.11 |   10545 |     52 |

|*  9 |          TABLE ACCESS BY INDEX ROWID| T2                      |      1 |    316 |   5522 |00:00:00.07 |    3383 |     46 |

|* 10 |           INDEX RANGE SCAN          | T2_OOST_START_DATE_I    |      1 |   1033 |   8543 |00:00:00.03 |      33 |      3 |

|* 11 |          INDEX RANGE SCAN           | T1_OBST_OOST_DK_I       |   5522 |      1 |   5522 |00:00:00.08 |    7162 |      6 |

|* 12 |         TABLE ACCESS BY INDEX ROWID | T1                      |   5522 |      1 |   1469 |00:00:00.13 |    1885 |     24 |

|  13 |       VIEW                          | VW_SQ_1                 |      1 |  64027 |   1405 |00:00:07.82 |     115K|  94300 |

|* 14 |        FILTER                       |                         |      1 |        |   1405 |00:00:07.82 |     115K|  94300 |

|  15 |         JOIN FILTER USE             | :BF0000                 |      1 |  64027 |   1405 |00:00:07.82 |     115K|  94300 |

|  16 |          PARTITION REFERENCE ALL    |                         |      1 |  64027 |  64027 |00:01:48.22 |     115K|  94300 |

|* 17 |           HASH JOIN                 |                         |     52 |  64027 |  64027 |00:02:03.37 |     115K|  94300 |

|  18 |            TABLE ACCESS FULL        | T4                      |     52 |  64027 |  64027 |00:00:00.34 |    1408 |    280 |

|* 19 |            TABLE ACCESS FULL        | T5                      |     41 |    569K|   5555K|00:02:08.32 |     114K|  94020 |

|  20 |      TABLE ACCESS BY INDEX ROWID    | T3                      |     10 |      1 |     10 |00:00:00.01 |      22 |      0 |

|* 21 |       INDEX UNIQUE SCAN             | T3_CHP_PK               |     10 |      1 |     10 |00:00:00.01 |      12 |      0 |

|* 22 |     INDEX UNIQUE SCAN               | T3_CHP_PK               |     10 |      1 |     10 |00:00:00.01 |      12 |      0 |

|  23 |    TABLE ACCESS BY INDEX ROWID      | T3                      |     10 |      1 |     10 |00:00:00.01 |      10 |      0 |

----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   5 - access("ITEM_1"="OBST"."OBST_ID")

       filter("ITEM_2">"OBST"."START_DATE")

   9 - filter(("OOST"."CHP_ID_START" IS NOT NULL AND "OOST"."CHP_ID_END" IS NOT NULL))

  10 - access("OOST"."START_DATE">=:LD_FROM_DATE AND "OOST"."START_DATE"<=:LD_TO_DATE)

  11 - access("OBST"."OOST_ID"="OOST"."OOST_ID")

  12 - filter(("OBST"."OBS_STAT"=2 AND ADD_MONTHS(INTERNAL_FUNCTION("OBST"."MOD_DATE"),13)>=:LD_CURR_DATE))

  14 - filter((:LN_MIN_ac_number<=:ln_max_number AND TO_DATE(:LD_FROM_DATE)<=TO_DATE(:LD_TO_DATE)))

  17 - access("OBSV"."VEH_ID"="VEH"."VEH_ID")

  19 - filter(((:LN_OPR_ID IS NULL OR "VEH"."OPR_ID"=:LN_OPR_ID) AND "VEH"."ac_number">=:LN_MIN_ac_number AND

              "VEH"."ac_number"<=:ln_max_number))

  21 - access("OOST"."CHP_ID_START"="CHP1"."CHP_ID")

  22 - access("OOST"."CHP_ID_END"="CHP2"."CHP_ID")

This query completes in a not acceptable response time of more than 2 minutes and this is why it has been handled to me for possible improvement.

As you might have already pointed it out from the above execution plan, the operation number 19 is the most time consuming operation (TABLE ACCESS FULL of T5). The query without the EXISTS part represents our data. This part of the query is very fast. It gives about 1500 rows. However, when we want to limit those 1500 records to only records satisfying the exists clause (10 records), the query starts not performing very well because of this full table scan on T5 (5,5 millions).

The ideal situation would have been to :

(a) first join table T5(alias veh) with the table T4(alias obs) using the join condition (predicate 17) limiting the number of record from T5 table

(b) and then filter from T5 table (predicate 19)

But in this case the CBO is starting by a costly full scan of the entire T5 table (5,5 millions of rows operation 19) before sending this enormous amount of data to the HASH JOIN operation 17 which finally throw away the majority of T5 rows not satisfying the join condition : access("OBSV"."VEH_ID"="VEH"."VEH_ID")

I have already verified that the table statistics and their join column statistics are up-to-date and are reflecting the reality. But failed to know how to let the CBO automatically doing what I want it to do i.e. Join first and then filter.

When I discussed with the client we agreed to re-factor a little bit this query so that it is now completing in few seconds.

The new plan is not using bloom filter and the veh table is accessed via its unique index on VEH_ID (18 - access("OBSV"."VEH_ID"="VEH"."VEH_ID")) before applying the filter operation. Exactly as I wanted it to be in the original query

Have you any idea on how to change this join method/order so that the full table scan is post poned? of course without refactoring the query

Do you think that bloom filter is the culprit here?

Thanks in advance

Mohamed Houri

PS : and  veh.ac_number between :ln_min_number and :ln_max_number

These two imput parameters are beyound the known low and high value. But even when I used min and max value taken from the table (min = low_value from stats and max = high value from stat), the plan didn't changed

Answers

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,975 Blue Diamond

    Hard to see exactly why you get your problem - it's an example of where the predicted cost is helpful in explaining the choice of plan.

    It sounds as if you need Oracle either to do a nested loop semi-join, or not unnest and push the filter subquery.  The optimizer may calculate that doing the join/subquery the relevant number of times is more expensive than doing the scan and hash join once.

    I'd start with a couple of hints in the subquery:  /*+ no_unnest push_subq */ to see what happened - this might push the subquery too early, though.

    Regards

    Jonathan Lewis

    Jonathan Lewis
  • Mohamed Houri
    Mohamed Houri Member Posts: 1,252 Bronze Trophy
    edited Nov 28, 2013 11:16AM

    Jonathan,

    I got the plan I was looking for.

    It is not exactly the same as the execution plan that corresponds to the re-factored query but it is what I wanted to have.

    SELECT  /*+ gather_plan_statistics */

                   obst.obst_id obstructionId

                   ,oost.comment_clob CommentClob

                   ,chp1.ptcar_no StartPtcar

                   ,chp2.ptcar_no EndPtcar

                   ,oost.track_code Track

                   ,oost.start_date StartPeriod

                   ,oost.end_date EndPeriod

                   ,oost.doc_no RelaasId

                   ,obst.status_code Status

            FROM   T1 obst

                 , T2 oost

                 , T3 chp1

                 , T3 chp2

            where obst.oost_id = oost.oost_id

              and oost.chp_id_start = chp1.chp_id

              and oost.chp_id_end = chp2.chp_id

              and obst.obs_stat = 2 -- only reeel       

                    and exists (select /*+ no_unnest push_subq */ 1

                            from T4 obsv

                               , T5 veh

                            where obsv.veh_id = veh.veh_id

                              and (null is null

                                   OR veh.opr_id = null

                                  )

                              and  obsv.obst_id = obst.obst_id

                              and veh.ac_number between 1 and 99999

                              and obsv.start_date > obst.start_date

                          )

              and oost.start_date between to_date ('20/11/2013','DD/MM/YYYY') and to_date ('27/11/2013','DD/MM/YYYY')

              and add_months(obst.mod_date,13) >= sysdate

             order by obst.obst_id

            ;

    ---------------------------------------------------------------------------------------------------------------------

    | Id  | Operation                                 | Name                    | Starts | E-Rows | A-Rows |   A-Time   |

    ---------------------------------------------------------------------------------------------------------------------

    |   0 | SELECT STATEMENT                          |                         |      1 |        |      6 |00:00:00.56 |

    |   1 |  SORT ORDER BY                            |                         |      1 |    254 |      6 |00:00:00.56 |

    |*  2 |   HASH JOIN                               |                         |      1 |    254 |      6 |00:00:00.11 |

    |   3 |    TABLE ACCESS FULL                      | T3                      |      1 |   2849 |   2849 |00:00:00.01 |

    |*  4 |    HASH JOIN                              |                         |      1 |    254 |      6 |00:00:00.11 |

    |   5 |     TABLE ACCESS FULL                     | T3                      |      1 |   2849 |   2849 |00:00:00.01 |

    |   6 |     NESTED LOOPS                          |                         |      1 |        |      6 |00:00:00.10 |

    |   7 |      NESTED LOOPS                         |                         |      1 |    254 |   5012 |00:00:00.09 |

    |*  8 |       TABLE ACCESS BY INDEX ROWID         | T2                      |      1 |    262 |   5012 |00:00:00.06 |

    |*  9 |        INDEX RANGE SCAN                   | T2_OOST_START_DATE_I    |      1 |    857 |   7722 |00:00:00.01 |

    |* 10 |       INDEX RANGE SCAN                    | T1_OBST_OOST_DK_I       |   5012 |      1 |   5012 |00:00:00.03 |

    |* 11 |      TABLE ACCESS BY INDEX ROWID          | T1                      |   5012 |      1 |      6 |00:00:00.48 |

    |  12 |       NESTED LOOPS                        |                         |   1277 |        |      6 |00:00:00.46 |

    |  13 |        NESTED LOOPS                       |                         |   1277 |      2 |      6 |00:00:00.46 |

    |  14 |         PARTITION REFERENCE ALL           |                         |   1277 |      4 |      6 |00:00:00.46 |

    |* 15 |          TABLE ACCESS BY LOCAL INDEX ROWID| T4                      |  66380 |      4 |      6 |00:00:00.43 |

    |* 16 |           INDEX RANGE SCAN                | T4_OBSV_OBST_FK_I       |  66380 |     86 |      6 |00:00:00.28 |

    |* 17 |         INDEX UNIQUE SCAN                 | T5_VEH_PK               |      6 |      1 |      6 |00:00:00.01 |

    |* 18 |        TABLE ACCESS BY GLOBAL INDEX ROWID | T5                      |      6 |      1 |      6 |00:00:00.01 |

    ---------------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):

    ---------------------------------------------------

       2 - access("OOST"."CHP_ID_END"="CHP2"."CHP_ID")

       4 - access("OOST"."CHP_ID_START"="CHP1"."CHP_ID")

       8 - filter(("OOST"."CHP_ID_START" IS NOT NULL AND "OOST"."CHP_ID_END" IS NOT NULL))

       9 - access("OOST"."START_DATE">=TO_DATE(' 2013-11-20 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "OOST"."START_DATE"<=TO_DATE(' 2013-11-27

                  00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

      10 - access("OBST"."OOST_ID"="OOST"."OOST_ID")

      11 - filter(("OBST"."OBS_STAT"=2 AND ADD_MONTHS(INTERNAL_FUNCTION("OBST"."MOD_DATE"),13)>[email protected]! AND  IS NOT NULL))

      15 - filter("OBSV"."START_DATE">:B1)

      16 - access("OBSV"."OBST_ID"=:B1)

      17 - access("OBSV"."VEH_ID"="VEH"."VEH_ID")

      18 - filter(("VEH"."ac_number">=1 AND "VEH"."ac_number"<=99999))


    Spot how the predicate 17 and 18 are now in the position I wanted them to be i.e probe first the T5 unique index using the join condition and then filter the table T5


    But, there is always a but, why am I obliged to hint?


    Where did I got something wrong (statistics, optimizer parameter, etc...) so that the CBO did not opted for this execution plan without a hint?


    Thanks


    Mohamed Houri


  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,975 Blue Diamond

    Interesting plan - the optimizer did what I thought it might but the presentation isn't what I was expecting (the Filter subquery starting at line 12 is obscured in a way that would make OEM - every other GUI I know of - report the execution order incorrectly) and I wasn't expecting the nested loops to T3 to turn into hash joins with swapped join order.

    Regards

    Jonathan Lewis

  • Hi Mohamed.

    What is your oracle version?

    Did you post all predicate options of the last plan here? (Because i can not see filter-(EXISTS) part).

    MohamedHouri wrote:
    
    But, there is always a but, why am I obliged to hint?
    Where did I got something wrong (statistics, optimizer parameter, etc...) so that the CBO did not opted for this execution plan without a hint?
    Thanks
    Mohamed Houri

    CBO select access paths (join methods) based on cost. If here oracle do not select as your desired access method(or join) then i believe here is cardinality( selectivity estimation) is a matter. Also your second plan(without unnesting) i do not think is optimal. Here is estimated and actual rows are significant different.

    Can you post execution plan of both queries(with and without unnesting) including cost column.

    It can be very good if you post(upload) here 10053 trace file of both queries(then we can say something more correctly).

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,975 Blue Diamond

    The existence subquery appears from line 12 onwards of the plan. There is no visible FILTER operation because that's what happens when subquery pushing takes place; but the predicate at line 11 shows the link to the subquery "AND   IS NOT NULL", although the subquery text itself is missing because that's a defect of the way that subquery filter predicates are (not) stored properly in v$sql_plan.

    Regards

    Jonathan Lewis

  • Mohamed Houri
    Mohamed Houri Member Posts: 1,252 Bronze Trophy

    @Chinar

    My oracle release is :

    Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production

    @jonathan,

    I have added the suggested hint and promoted the package in production. The web service calling this query is working perfectly since then. Thanks for that.

    The table T5 has a single column primary key index on veh_id (join column) and a single index on the ac_number (the filtering column). I am thinking for a final solution which consists of gathering extended stats on those two columns and may be creating also an index on both columns also.

    I will try it and keep the thread updated.

    Best regards

    Mohamed Houri

This discussion has been closed.