      • 15. Re: query re-write
        AndrewSayer

        Jonathan Lewis wrote:

         

        Andrew Sayer wrote:

         

        I don't know if the hash join is implied by this; the optimizer seems to expect few rows from workflow_report, and these could drive a cheapish nested loop using just the report_id index.

         

        You may be right, but it's a little more complex than just "a few rows".

         

        Because the current path involves an existence subquery, the optimizer will have used first_rows(1) optimization on the subquery, so the rows value will always be 2 (it should really be 1) however many rows actually match the subquery predicate. Factor in the cost of the subquery full tablescan, though, which will be pro-rated because of the first_rows(1) optimisation, and we see that Oracle expects to do (on average) a lot of work before finding those 2 (1) row(s) - which does, indeed, suggest that there may be a very small number of rows matching the subquery predicate. (A cost of 14,100 when there are 1.4M rows in the table looks very high even if the rows are quite large.)
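        (For illustration - a generic existence subquery of the shape under discussion, not the OP's actual statement:)

        -- Hedged sketch: with an EXISTS (semi-join) subquery like this the
        -- optimizer costs the subquery under first_rows(1), so its Rows
        -- estimate in the plan stays tiny no matter how many rows match.
        select count(*)
        from   smp_dbuser2.workflow_step_report wsr
        where  exists (select null
                       from   smp_dbuser2.workflow_report wr
                       where  wr.id = wsr.report_id);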

         

        And just to be three-handed: the row counts for the two tables suggest an average of 50 rows in workflow_step_report for every row in workflow_report - so the costs suggest that there'd have to be 10s of thousands of driving rows and a really bad clustering_factor (which may be the case, depending on the partitioning column) before Oracle would think that a hash join was a better bet than a nested loop.

         

         

        Regards

        Jonathan Lewis

        Thanks for that follow-up, Jonathan. I've not heard of first_rows(1) optimization being used for existence subqueries before, but it makes sense.

         

        It could well be the case that, because the subquery predicate is on a date that I would expect to be at the upper end of the stored values, the value is out of range of the statistics, so the optimizer doesn't believe it will ever find matching rows. Having said that, I had another read of the first query: the plans we have were filtering on a function of the date column and an equality on user_id, which could easily combine to give a really low selectivity. Of course, such low selectivity is typical of where an index could help - but there may be no need (for now).
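        (As an aside, one way to check whether the predicate value lies beyond the range the statistics know about is to compare it with the column's recorded low/high values - a hedged sketch, using the schema and table names from this thread:)

        -- low_value/high_value are RAW and need decoding, but this shows
        -- whether the stats have seen dates as recent as the one queried.
        select column_name, low_value, high_value, last_analyzed
        from   dba_tab_col_statistics
        where  owner       = 'SMP_DBUSER2'
        and    table_name  = 'WORKFLOW_REPORT'
        and    column_name = 'START_TIME';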

        • 16. Re: query re-write
          Jonathan Lewis

          I've not heard of first_rows(1) optimization being used for existence subqueries before, but it makes sense.

           

          I've mentioned it a couple of times on my blog - here's one example, with a worked demonstration: https://jonathanlewis.wordpress.com/2010/12/29/existence/

           

          Regards

          Jonathan Lewis

          • 17. Re: query re-write
            Nimish Garg

            try creating the following 2 indexes:

            1. smp_dbuser2.workflow_step_report(report_ID)

            2. smp_dbuser2.workflow_report(trunc(start_time), user_id)

                or

              smp_dbuser2.workflow_report(trunc(start_time), user_id, report_id)

            and modify your condition to

            trunc(start_time) = to_date('28-Apr-2017','dd-Mon-YYYY')
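            (Expressed as DDL, that suggestion might look like this - the index names are invented for illustration:)

            create index wsr_report_id_ix
              on smp_dbuser2.workflow_step_report(report_id);

            create index wr_day_user_ix
              on smp_dbuser2.workflow_report(trunc(start_time), user_id);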

            • 18. Re: query re-write
              AndrewSayer

              Nimish Garg wrote:

               

              try creating the following 2 indexes:

              1. smp_dbuser2.workflow_step_report(report_ID)

              2. smp_dbuser2.workflow_report(trunc(start_time), user_id)

              or

              smp_dbuser2.workflow_report(trunc(start_time), user_id, report_id)

              and modify your condition to

              trunc(start_time) = to_date('28-Apr-2017','dd-Mon-YYYY')

              What's wrong with keeping the start_time filter as-is, and therefore not requiring this FBI?

              (user_id, start_time) would provide a suitable path easily.

               

              If an FBI is used then you're constrained to using that expression if you want the index to be used - what if you know the exact start_time value, or (more likely) know it has to be between certain times on a certain day? The other problem with an FBI is that the hidden column it creates doesn't get any stats until you gather them, and I find it's too easy to forget to do that.
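              (A sketch of that alternative - index name and bind variables are invented for illustration:)

              -- Keep start_time as a plain column: a half-open date range on it
              -- can use an ordinary (user_id, start_time) index, with no FBI
              -- needed and no hidden column to gather stats on.
              create index wr_user_time_ix
                on smp_dbuser2.workflow_report(user_id, start_time);

              select wr.id
              from   smp_dbuser2.workflow_report wr
              where  wr.user_id    = :user_id
              and    wr.start_time >= :day_start
              and    wr.start_time <  :day_start + 1;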

              • 19. Re: query re-write
                Jonathan Lewis

                Frd,

                 

                Thanks for the update. I didn't notice it yesterday or I'd have mentioned it sooner.

                 

                Just to satisfy my curiosity (and Andrew's as well, perhaps), could you post the new execution plan? I'm guessing it's either a hash join, or a nested loop join with workflow_report appearing first in the execution plan. I also suspect that Andrew's probably right and that it will be a nested loop into an index-only access via the (report_id) index on workflow_step_report.

                 

                Regards

                Jonathan Lewis

                • 20. Re: query re-write
                  Jonathan Lewis

                  Nimish,

                   

                  The OP hasn't marked the question as answered - but it has been, and the problem was that there was an error in the choice of column name in the subquery.

                   

                  Regards

                  Jonathan Lewis

                  • 21. Re: query re-write
                    3191197

                    Hi Jonathan,

                     

                    Please find below the explain plan after changing my query according to your / John's suggestions.

                     

                    Plan hash value: 1974594066

                     

                    ----------------------------------------------------------------------------------------------------------------------------

                    | Id  | Operation                 | Name                           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |

                    ----------------------------------------------------------------------------------------------------------------------------

                    |   0 | SELECT STATEMENT          |                                |     1 |    31 | 22760   (1)| 00:04:34 |       |       |

                    |   1 |  SORT AGGREGATE           |                                |     1 |    31 |            |          |       |       |

                    |   2 |   NESTED LOOPS            |                                | 34458 |  1043K| 22760   (1)| 00:04:34 |       |       |

                    |   3 |    PARTITION HASH ALL     |                                |    72 |  1800 | 22759   (1)| 00:04:34 |     1 |    32 |

                    |*  4 |     TABLE ACCESS FULL     | WORKFLOW_REPORT                |    72 |  1800 | 22759   (1)| 00:04:34 |     1 |    32 |

                    |   5 |    PARTITION HASH ITERATOR|                                |   477 |  2862 |     1   (0)| 00:00:01 |   KEY |   KEY |

                    |*  6 |     INDEX RANGE SCAN      | WORKFLOW_STEP_REPORT_REPORT_ID |   477 |  2862 |     1   (0)| 00:00:01 |   KEY |   KEY |

                    ----------------------------------------------------------------------------------------------------------------------------

                     

                    Index range scan, as you said - but workflow_report is still doing a full table scan.

                     

                    Regards

                    Frd

                    • 22. Re: query re-write
                      AndrewSayer

                      3191197 wrote:

                       

                      Hi Jonathan,

                       

                      Please find below the explain plan after changing my query according to your / John's suggestions.

                       

                      Plan hash value: 1974594066

                       

                      ----------------------------------------------------------------------------------------------------------------------------
                      | Id  | Operation                 | Name                           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
                      ----------------------------------------------------------------------------------------------------------------------------
                      |   0 | SELECT STATEMENT          |                                |     1 |    31 | 22760   (1)| 00:04:34 |       |       |
                      |   1 |  SORT AGGREGATE           |                                |     1 |    31 |            |          |       |       |
                      |   2 |   NESTED LOOPS            |                                | 34458 |  1043K| 22760   (1)| 00:04:34 |       |       |
                      |   3 |    PARTITION HASH ALL     |                                |    72 |  1800 | 22759   (1)| 00:04:34 |     1 |    32 |
                      |*  4 |     TABLE ACCESS FULL     | WORKFLOW_REPORT                |    72 |  1800 | 22759   (1)| 00:04:34 |     1 |    32 |
                      |   5 |    PARTITION HASH ITERATOR|                                |   477 |  2862 |     1   (0)| 00:00:01 |   KEY |   KEY |
                      |*  6 |     INDEX RANGE SCAN      | WORKFLOW_STEP_REPORT_REPORT_ID |   477 |  2862 |     1   (0)| 00:00:01 |   KEY |   KEY |
                      ----------------------------------------------------------------------------------------------------------------------------

                       

                      Index range scan, as you said - but workflow_report is still doing a full table scan.

                       

                      Regards

                      Frd

                      Is the query fast enough?

                      Is the full tablescan the cause of it being too slow?

                       

                      If you want to avoid it then you can try creating an index like:

                      create index workflow_rep_idx on workflow_report (user_id, start_time)

                       

                      (assuming you're still using start_time and not trunc(start_time))


                      If your statistics tell Oracle the index is useful then it will use it.

                      • 23. Re: query re-write
                        Jonathan Lewis

                        Frd,

                         

                        Thanks for that.

                         

                        (It looks like the nested loop was partly due to the low volume of data, then aided by the choice to hash partition workflow_step_report on report_id (at least, that's my guess based on the numbers)).

                         

                        You're bound to get a full tablescan on workflow_report because you don't have an index that could support any other path given the text of the query:

                         

                        SELECT COUNT(*)
                        FROM   smp_dbuser2.workflow_step_report wsr
                        WHERE  wsr.report_ID IN (
                                   SELECT wr.id
                                   FROM   smp_dbuser2.workflow_report wr
                                   WHERE  wr.start_time >= to_date('28-Apr-2017','dd-mon-yyyy')
                                   AND    wr.start_time <  to_date('28-Apr-2017','dd-mon-yyyy') + 1
                                   AND    wr.user_id = 'nbi_ssc'
                               );

                         

                        If you want an indexed access path into workflow_report you'd need an index starting with user_id, or start_time.

                         

                        If you want to avoid visiting the table at all you'd need an index that included both those columns and id - and (user_id, start_time, id) would be the best column order (probably compress 1 and local).
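                        (As a sketch, with an invented index name:)

                        create index wr_user_time_id_ix
                          on smp_dbuser2.workflow_report(user_id, start_time, id)
                          compress 1 local;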

                        On the other hand: (a) you said the query takes only 5 seconds, so perhaps that's good enough already; (b) as a general principle you should avoid creating indexes just to address a single query.

                         

                        Regards

                        Jonathan Lewis

                         

                         

                        P.S.  A couple of extra thoughts.  The cost of the tablescan (of 32 partitions) looks rather high, but you complete it in just 5 seconds. It's possible that the time was good because you've been experimenting, testing, and running the wrong query so much in the recent past that the whole table was buffered - so production run times might not be so good.

                         

                        On the other hand the table is only 1.4M rows - and a cost of 22,759 looks suspiciously high for querying a table with such a small number of rows. Unless your rows are rather long (avg_row_len in the order of 100s of bytes) this cost may be a very bad estimate anyway. Normally I wouldn't worry about a tablescan of a table with that small number of rows, but I'd look at the oddity of the costing if it were my system.
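                        (One hedged way to sanity-check that costing, using the dictionary views:)

                        -- If avg_row_len and blocks turn out to be modest, a cost of
                        -- 22,759 for the scan would look like a bad estimate.
                        select num_rows, blocks, avg_row_len
                        from   dba_tables
                        where  owner      = 'SMP_DBUSER2'
                        and    table_name = 'WORKFLOW_REPORT';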