1 2 3 Previous Next 32 Replies Latest reply on Apr 28, 2015 8:17 PM by jihuyao

    How to tune this simple query?

    Mark1970

      Hi,

      I've got this query that looks very simple.

       

      select count(*)
      from MY_TABLE w1
      where not exists (select 1
              from MY_TABLE w2
              where W2.X = W1.X
              and W2.Y <> W1.Y);
      

       

      The table contains about 2,000,000 records and it gets deleted and reloaded every day with about the same amount of records.

      The query works for more than 2 hours and I can't undestand why.

      Could you help me?

      I don't think it would be usefull to create an index 'cause there's a <> and so the query performs a full scan, but 2,000,000 records don't seem too many, am I right?

       

      Thanks in advance!

        • 1. Re: How to tune this simple query?
          Saubhik

          Please post the execution plan (with dbms_xplan.display). It will be best of you execute this with gather_plan_statistics and post the output wuth dbms_xplan.display_cursor.

          • 2. Re: How to tune this simple query?
            Mark1970

            Here is the explain plan. The name of the table is different but it doesn't matter

             

            plan.jpg

             

            Thanks!

            • 3. Re: How to tune this simple query?
              Martin Preiss

              as already suggested it would be helpful if you could add the execution plan. Furthermore the release version would be of interest since there are some new Anti-Join operations available in current releases.

               

              Regarding the indexes: if your tables have many additional columns an index could be useful even if there are no limiting conditions: to act as a smaller version of the table (using INDEX FAST FULL SCANs).

              • 4. Re: How to tune this simple query?
                Martin Preiss

                the plan looks plausible.

                Some additional questions: How many blocks does the segement contain and what is the avg_row_len in user_tables? And do you use compression?

                • 5. Re: How to tune this simple query?
                  Chris Hunt

                  So you want to count the number of rows in MY_TABLE that have values of X that only have one value of Y? Maybe recast the query something like this:

                  SELECT COUNT(*)

                  FROM my_table w1

                  WHERE w1.x IN (SELECT w2.x

                                 FROM   (SELECT DISTINCT x,y FROM my_table) w2

                                 GROUP BY w2.x

                                 HAVING COUNT(*) = 1)

                  • 6. Re: Re: How to tune this simple query?
                    Chris Hunt

                    Another way:

                    WITH counts AS

                       (SELECT x,y,count(*) xy_count

                        FROM   my_table

                        GROUP BY x,y)

                    SELECT SUM(x_count)

                    FROM  (SELECT x, SUM(xy_count) x_count

                           FROM   counts

                           GROUP BY x

                           HAVING COUNT(*) = 1)

                    • 7. Re: How to tune this simple query?
                      Etbin

                      You might be better off rewriting it instead of tuning, but we have to guess about the purpose of this query.

                      When X are unique (there may be several null values),the query would return the count of rows in the table as Y (even if null) cannot be different (as the query uses <> as the comparison operator)

                      My guess is the query is used to find out rows having the same X but at least two different Y - in that case the count returned by the query differs from the count of rows in the table.

                      If I'm not wrong, you might try something as

                       

                      select col1,count(*) cnt

                        from t

                      group by col1

                      having count(distinct col2) > 1

                       

                      Regards

                       

                      Etbin

                      • 8. Re: How to tune this simple query?
                        chris227

                        Mark1970 wrote:

                         

                        Hi,

                        I've got this query that looks very simple.

                         

                        1. select count(*) 
                        2. from MY_TABLE w1 
                        3. where not exists (select
                        4.         from MY_TABLE w2 
                        5.         where W2.X = W1.X 
                        6.         and W2.Y <> W1.Y); 

                         

                        The table contains about 2,000,000 records and it gets deleted and reloaded every day with about the same amount of records.

                        The query works for more than 2 hours and I can't undestand why.

                        Could you help me?

                        I don't think it would be usefull to create an index 'cause there's a <> and so the query performs a full scan, but 2,000,000 records don't seem too many, am I right?

                         

                        Thanks in advance!

                        As your plan shows the query as written accesses my_table tow times.

                        You may try to rewrite it in order to reduce that to one access.

                         

                        You are asking for the number of rows being identical in x and y:

                         

                        select count(*) from (

                        select

                        count(distinct y)  over (partition by x) cnt

                        from my_table

                        )

                        where cnt = 1

                        • 9. Re: How to tune this simple query?
                          chris227

                          Etbin wrote:

                           

                          You might be better off rewriting it instead of tuning, but we have to guess about the purpose of this query.

                          When X are unique (there may be several null values),the query would return the count of rows in the table as Y (even if null) cannot be different (as the query uses <> as the comparison operator)

                          My guess is the query is used to find out rows having the same X but at least two different Y - in that case the count returned by the query differs from the count of rows in the table.

                          If I'm not wrong, you might try something as

                           

                          select col1,count(*) cnt

                            from t

                          group by col1

                          having count(distinct col2) > 1

                           

                          Regards

                           

                          Etbin

                          I am a little bit confused now:

                           

                          Is he not asking for = 1?

                           

                          To get one row the result has to be summed up at last, hasnt it?

                           

                          select sum(cnt) from (

                          select count(*) cnt

                            from t

                          group by col1

                          having count(distinct col2) = 1

                          )

                          • 10. Re: Re: How to tune this simple query?
                            Etbin

                            Sorry for the confusion.

                            You're right as (at the present) the OP needs one row to be returned to figure out the situation.

                            My guessing got me one step further - find the rows causing the (bad) situation - somehow out of context.

                             

                            Regards

                             

                            Etbin

                            • 11. Re: How to tune this simple query?
                              Mark1970

                              It seems I've solved the problem  by using the NO_UNNEST hint

                               

                              1. select count(*) 
                              2. from MY_TABLE w1 
                              3. where not exists (select/*+ no_unnest */
                              4. from MY_TABLE w2 
                              5. where W2.X = W1.X 
                              6. and W2.Y <> W1.Y);

                               

                              I wonder why but it works ... :-)

                              • 12. Re: How to tune this simple query?
                                chris227

                                Mark1970 wrote:

                                 

                                It seems I've solved the problem  by using the NO_UNNEST hint

                                 

                                1. select count(*) 
                                2. from MY_TABLE w1 
                                3. where not exists (select/*+ no_unnest */
                                4. from MY_TABLE w2 
                                5. where W2.X = W1.X 
                                6. and W2.Y <> W1.Y);

                                 

                                I wonder why but it works ... :-)

                                Because of the same reason as the rewrites will do. The table is accessed only one time

                                Another hint will work too:

                                 

                                1. select /*+ no_query_transformation */
                                2.   count(*) 
                                3. from MY_TABLE w1 
                                4. where not exists (select 1
                                5. from MY_TABLE w2 
                                6. where W2.X = W1.X 
                                7. and W2.Y <> W1.Y);

                                 

                                Anyway i would usually prefer rewriting to hinting as a kind of more "robust" approach.

                                1 person found this helpful
                                • 13. Re: How to tune this simple query?
                                  Martin Preiss

                                  just out of curiosity: could you add the plan for the no_unnest version?

                                  • 14. Re: How to tune this simple query?
                                    Jonathan Lewis

                                    Two hours seems far too long for the job - but we don't know how long the columns are, or what there type is, nor the pattern of the data, nor the version of Oracle that you're running.

                                     

                                    In principle if the number of distinct sets of (x,y) pairs was relatively small and most of these high repetition count pairs didn't have a candidate that would fail the test then you could be doing close to 2M

                                    probes of the hash table that had to walk some very long hash chains - this could use a lot of CPU.

                                     

                                    If we also assume that the average column lengths of the X and Y columns is fairly high, and the available memory for the query is relatively small so that the hash table spills to disc (perhaps with a multipass hash join rather than just a single pass) then you could spend a lot of time on re-reading the temporary tablespace.

                                     

                                    The only other thing that could add  significant time is contention - either simple disk contention on the tablescans (but not 2 hours worth) - or some side-effects of read-consistency or delayed block cleanout (but again 2 hours seems extreme).

                                     

                                    You don't say anything about how fast the query ran after you changed the plan with the unnest hint; but if you had an index on (x,y) by that time, and my comments about a small number of repetitive pairs is correct then you could get a massive advantage from scalar subquery caching.

                                     

                                    If you want to find out more about where the time is going:

                                    connect to Oracle

                                    execute the query

                                     

                                    set linesize 180

                                    set trimspool on

                                    set pagesize 60

                                    column event format a40

                                     

                                    select sn.name, ms.value from v$mystat ms, v$statname sn where sn.statistic# = ms.statistic# and ms.value != 0;

                                    select event, total_waits, time_waited from V$session_event where sid = (select sid from v$mystat where rownum = 1) order by time_waited

                                     

                                     

                                    This assumes that you have privileges to access those dynamic performance views.

                                     

                                     

                                    Regards

                                    Jonathan Lewis

                                    1 2 3 Previous Next