1 2 Previous Next 23 Replies Latest reply on May 2, 2017 2:31 PM by Jonathan Lewis

    query re-write

    3191197

      Hi Masters,

       

      I have a sql query which is taking ages of time. Need to re-write this query in a optimized mode. Please help me.

       

      SELECT COUNT(*) FROM smp_dbuser2.workflow_step_report

      WHERE report_ID IN ( SELECT report_id

                                               FROM smp_dbuser2.workflow_report

                                               WHERE trunc(start_time) = '28-Apr-2017'

                                                    AND user_id = 'nbi_ssc');

       

      Workflow_step_report info.

      --------------------------------------

      There are 2 indexes on this table (id, report_id) . 737160 978  records in the table. Please help me to write this query in a optimized mode.

       

      Workflow_report info

      ------------------------------

      rows : 1472969

       

      Indexs  : EVENT_LOG_ID, ID , WORKFLOW_EXEC_ID 

       

       

      Plan hash value: 2962753634

       

      --------------------------------------------------------------------------------------------------------------

      | Id  | Operation             | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |

      --------------------------------------------------------------------------------------------------------------

      |   0 | SELECT STATEMENT      |                      |     1 |     6 |    10T  (1)|999:59:59 |       |       |

      |   1 |  SORT AGGREGATE       |                      |     1 |     6 |            |          |       |       |

      |*  2 |   FILTER              |                      |       |       |            |          |       |       |

      |   3 |    PARTITION HASH ALL |                      |   731M|  4187M|  5363K  (1)| 17:52:47 |     1 |   128 |

      |   4 |     TABLE ACCESS FULL | WORKFLOW_STEP_REPORT |   731M|  4187M|  5363K  (1)| 17:52:47 |     1 |   128 |

      |*  5 |    FILTER             |                      |       |       |            |          |       |       |

      |   6 |     PARTITION HASH ALL|                      |     2 |    38 | 14161   (1)| 00:02:50 |     1 |    32 |

      |*  7 |      TABLE ACCESS FULL| WORKFLOW_REPORT      |     2 |    38 | 14161   (1)| 00:02:50 |     1 |    32 |

      --------------------------------------------------------------------------------------------------------------

       

      Predicate Information (identified by operation id):

      ---------------------------------------------------

       

         2 - filter( EXISTS (SELECT 0 FROM "SMP_DBUSER2"."WORKFLOW_REPORT" "WORKFLOW_REPORT" WHERE :B1=:B2

                    AND "USER_ID"='nbi_ssc' AND TRUNC(INTERNAL_FUNCTION("START_TIME"))=TO_DATE(' 2017-04-28 00:00:00',

                    'syyyy-mm-dd hh24:mi:ss')))

         5 - filter(:B1=:B2)

         7 - filter("USER_ID"='nbi_ssc' AND TRUNC(INTERNAL_FUNCTION("START_TIME"))=TO_DATE(' 2017-04-28

                    00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

       

       

       

       

      Regards

      Frd

        • 1. Re: query re-write
          John Thorton

          How many rows are returned by query?

           

          post CREATE INDEX for all indexes on all tables involved.

           

          Are statistics current for all table & indexes?

           

          3. How to  improve the performance of my query? / My query is running slow.

          • 2. Re: query re-write
            3191197

            Hi John,

             

            Yes statistics are updated by current date. I have given indexes info. and explain plan for that query. Both tables are going to FULL TABLE SCAN.

            Please help me.

             

            Regards

            Frd

            • 3. Re: query re-write
              John Thorton

              3191197 wrote:

               

              Hi John,

               

              Yes statistics are updated by current date. I have given indexes info. and explain plan for that query. Both tables are going to FULL TABLE SCAN.

              Please help me.

               

              Regards

              Frd

              Since you won't answer my questions, then I won't waste me time with yours.

              • 4. Re: query re-write
                3191197

                Hi John

                 

                Since morning 9:30 a.m I ran this select query. Still it is running. now 1 PM. that is the reason I may not able to provide how many rows this query is processed. Please find below the index scripts.

                 

                CREATE UNIQUE INDEX SMP_DBUSER2.WORKFLOW_REPORT_PKEY ON SMP_DBUSER2.WORKFLOW_REPORT(ID)

                 

                 

                ALTER TABLE SMP_DBUSER2.WORKFLOW_REPORT ADD (

                  CHECK ("ID" IS NOT NULL),

                  CHECK ("WORKFLOW_ID" IS NOT NULL),

                  CHECK ("WORKFLOW_EXEC_ID" IS NOT NULL),

                  CONSTRAINT WORKFLOW_REPORT_PKEY

                  PRIMARY KEY

                  (ID)

                  USING INDEX LOCAL);

                 

                  -------------------------------------------------------------------------------------------------------

                   

                  CREATE UNIQUE INDEX SMP_DBUSER2.WORKFLOW_STEP_REPORT_PKEY ON SMP_DBUSER2.WORKFLOW_STEP_REPORT(ID);

                   

                  ALTER TABLE SMP_DBUSER2.WORKFLOW_STEP_REPORT ADD (

                  CHECK ("ID" IS NOT NULL),

                  CHECK ("REPORT_ID" IS NOT NULL),

                  CHECK ("STEP_NAME" IS NOT NULL),

                  CONSTRAINT WORKFLOW_STEP_REPORT_PKEY

                  PRIMARY KEY

                  (ID)

                  USING INDEX LOCAL);

                 

                 

                ALTER TABLE SMP_DBUSER2.WORKFLOW_STEP_REPORT ADD (

                  CONSTRAINT WORKFLOW_REPORT_ID_FKEY

                  FOREIGN KEY (REPORT_ID)

                  REFERENCES SMP_DBUSER2.WORKFLOW_REPORT (ID));

                • 5. Re: query re-write
                  Jonathan Lewis

                  According to the execution plan you do not have a report_id in table workflow_report. Looking at your latest post the column referenced in the subquery should probably be ID.

                   

                  Regards

                  Jonathan Lewis

                  1 person found this helpful
                  • 6. Re: query re-write
                    3191197

                    Hi Jonathan

                     

                    This the info. I have on this select query .. Please advsie.

                     

                    Regards

                    FRd

                    • 7. Re: query re-write
                      Andrew Sayer

                      In addition to Jonathan's comment, the two tables to be constrained by a foreign key, so every report_id in one table should exist as an id in the othe, this would allow Oracle to eliminate the join completely and just count the distinct report_id values in workflow_step_report. If Oracle doesn't eliminate the join for you when you write it correctly, you can rewrite it yourself.

                       

                      trunc(start_time) = '28-Apr-2017'

                       

                      is comparing a function of a date column to a string, not good. Try instead

                      start_time >= to_date( '28-Apr-2017','dd-mon-yyyy')

                      and start_time < to_date( '28-Apr-2017','dd-mon-yyyy') +1

                       

                       

                      As your tables are partitioned, it's always useful to explain how and why. Both seem to be partitioned on their own id columns, which doesn't really help you on this query.

                      • 8. Re: query re-write
                        Jonathan Lewis

                        3191197 wrote:

                         

                         

                        This the info. I have on this select query .. Please advsie.

                         

                         

                        Find the person who wrote the report and ask them if they think it will give them the result that they really want. It probably isn't doing what they intended.

                         

                        As a general guideline, give each table in the query a short (3 or 4 letter) alias, and apply the alias to each column used in the query that way you can avoid the risk of the "column capture" type of error that this one appears to be.

                         

                        Regards

                        Jonathan Lewis

                         

                         

                        NOTE: given my observation about the absence of a column called report_id in the workflow_report table I would expect a reply that included a comment saying that there was or wasn't a column of that name in that table.

                        1 person found this helpful
                        • 9. Re: query re-write
                          Jonathan Lewis

                          Andrew,

                           

                          Nice thought, but you've got the parent and child tables the wrong way round. You can eliminate the parent from the join if the only thing reference primary key.

                          In this case the workflow_report table is the parent and it has two non-key columns referenced.

                           

                          With the (assumed) corrected column in the subquery the optimizer should unnest and convert to a hash join, but it can't eliminate the join.

                           

                          Regards

                          Jonathan Lewis

                          • 10. Re: query re-write
                            Andrew Sayer

                            Jonathan Lewis wrote:

                             

                            Andrew,

                             

                            Nice thought, but you've got the parent and child tables the wrong way round. You can eliminate the parent from the join if the only thing reference primary key.

                            In this case the workflow_report table is the parent and it has two non-key columns referenced.

                             

                            With the (assumed) corrected column in the subquery the optimizer should unnest and convert to a hash join, but it can't eliminate the join.

                             

                            Regards

                            Jonathan Lewis

                            Doh! Yes, I must have read it the wrong way round.

                             

                            I don't know if the hash join is implied by this, the optimizer seems to expect few rows from workflow_report, these could drive a cheapish nested loop using just the report_id index.

                            • 11. Re: query re-write
                              3191197

                              Hi Jonathan,

                               

                              Yes, your observation is correct . There is no Report_id in workflow_report table. Only Id, workflow_id columns are there. Please advise.

                               

                              Regards

                              Frd

                              • 12. Re: query re-write
                                Andrew Sayer

                                3191197 wrote:

                                 

                                Hi Jonathan,

                                 

                                Yes, your observation is correct . There is no Report_id in workflow_report table. Only Id, workflow_id columns are there. Please advise.

                                 

                                Regards

                                Frd

                                Jonathan - "Looking at your latest post the column referenced in the subquery should probably be ID."

                                 

                                So make the fix to your code.

                                1 person found this helpful
                                • 13. Re: query re-write
                                  3191197

                                  Hi Jonathan

                                   

                                  Got it ..  I change the query according your suggestions. Thanks alot. it returned rows in 5 secs.

                                   

                                  SELECT COUNT(*) FROM smp_dbuser2.workflow_step_report wsr

                                  WHERE wsr.report_ID IN ( SELECT wr.id

                                                                                  FROM smp_dbuser2.workflow_report wr

                                                                                  WHERE wr.start_time >= to_date( '28-Apr-2017','dd-mon-yyyy') and wr.start_time < to_date( '28-Apr-2017','dd-mon-yyyy') +1

                                                                                  AND wr.user_id = 'nbi_ssc');

                                   

                                  Regards

                                  Frd

                                  • 14. Re: query re-write
                                    Jonathan Lewis

                                    Andrew Sayer wrote:

                                     

                                    I don't know if the hash join is implied by this, the optimizer seems to expect few rows from workflow_report, these could drive a cheapish nested loop using just the report_id index.

                                     

                                    You may be right, but it's a little more complex than just "a few rows".

                                     

                                    Because the current path involves an existence subquery the optimizer will have used first_rows(1) optimization on the subquery so the rows value will always be (2 - should be 1) however many rows there are that match the subquery predicate. Factor in the cost of the subquery full tablescan, though, which will be pro-rated because of the first_rows(1) optimisation and we see that Oracle expects to do (on average) a lot of work before finding those 2 (1) row(s) - which does, indeed, suggest that there may be a very small number of rows matching the subquery predicate.  (A cost of 14,100 when there are 1.4M rows in the table looks very high even if the rows are quite large).

                                     

                                    And just to be three-handed: the row counts for the two tables suggest an average of 50 [correction: 500] rows in workflow_step_report for every row in workflow_report - so the costs suggest that there'd have to 10s of thousands of driving rows and a really bad clustering_factor (which may be the case depending on the partitioning column) before Oracle would think that a hash join was a better bet than a nested loop.

                                     

                                     

                                    Regards

                                    Jonathan Lewis

                                    1 2 Previous Next