1 2 Previous Next 27 Replies Latest reply: Aug 11, 2009 2:35 PM by 666352 RSS

    analytic sql question

    715939
      Dear All,
      I have the following problem of getting the output table from the input table below.
      Do you have any idea of doing this with the help of analytic sql?
      p.s. I have done this by using pure plsql block which is too slow to run with high amount of data. Below data is just a sample, in real I have millions rows of data.

      Input table:
      TIME     USER     VALUE     

      1     A     X
      2     A     X
      3     B     Y
      4     B     Y
      5     A     X
      5     B     X
      6     A     Y
      7     B     Y
      7     A     Y



      Output table:
      START_TIME     END_TIME     USER     VALUE
      1          2          A     X
      5          5          A     X
      6          7          A     Y
      3          4          B     Y
      5          5          B     X
      7          7          B     Y
        • 1. Re: analytic sql question
          548849
          Hi,
          Welcome to the forum.
          row_number () over (partition by user,value)
          Regards,
          Jacob
          • 2. Re: analytic sql question
            Boneist
            I feel sure I've over-complicated things here, and that there's an easier way of doing it, but here's one solution:
            with my_tab as (select 1 col1, 'A' col2, 'X' col3 from dual union all
                            select 2 col1, 'A' col2, 'Y' col3 from dual union all
                            select 3 col1, 'B' col2, 'Y' col3 from dual union all
                            select 4 col1, 'B' col2, 'X' col3 from dual union all
                            select 5 col1, 'A' col2, 'X' col3 from dual union all
                            select 5 col1, 'B' col2, 'X' col3 from dual union all
                            select 6 col1, 'A' col2, 'Y' col3 from dual union all
                            select 7 col1, 'B' col2, 'Y' col3 from dual union all
                            select 7 col1, 'A' col2, 'Y' col3 from dual)
            select distinct start_col1,
                            end_col1,
                            first_value(col2) over (partition by col2, start_col1, end_col1 order by col1, start_col1, end_col1) col2,
                            first_value(col3) over (partition by col2, start_col1, end_col1 order by col1, start_col1, end_col1) col3
            from   (select col1,
                           col2,
                           col3,
                           last_value(start_col1 ignore nulls) over (order by col1, col2) start_col1,
                           last_value(end_col1 ignore nulls) over (order by col1 desc, col2 desc) end_col1
                    from   (select col1, 
                                   col2, 
                                   col3,
                                   case when lag(col2, 1, '{NULL}') over (order by col1, col2) <> col2
                                             then col1
                                   end start_col1,
                                   case when lead(col2, 1, '{NULL}') over (order by col1, col2) <> col2
                                             then col1
                                   end end_col1 
                            from   my_tab))
            order by col2,
                     start_col1,
                     end_col1;
            
            START_COL1   END_COL1 C C
            ---------- ---------- - -
                     1          2 A X
                     5          5 A X
                     6          7 A Y
                     3          4 B Y
                     5          5 B X
                     7          7 B Y
            • 3. Re: analytic sql question
              715939
              I'm getting this error ORA-00920
              What is the reason for..
              • 4. Re: analytic sql question
                548849
                Hi

                Post the query you are executing.
                • 5. Re: analytic sql question
                  715939
                  when I change your code's inner most part as the following

                  with my_tab as (select 1 col1, 'A' col2, 'X' col3 from dual union all
                  select 2 col1, 'A' col2, 'Y' col3 from dual union all
                  select 3 col1, 'B' col2, 'Y' col3 from dual union all
                  select 4 col1, 'B' col2, 'X' col3 from dual union all
                  select 5 col1, 'A' col2, 'X' col3 from dual union all
                  select 5 col1, 'B' col2, 'X' col3 from dual union all
                  select 6 col1, 'A' col2, 'Y' col3 from dual union all
                  select 7 col1, 'B' col2, 'Y' col3 from dual union all
                  select 7 col1, 'A' col2, 'Y' col3 from dual)
                  select distinct start_col1,
                  end_col1,
                  first_value(col2) over (partition by col2, start_col1, end_col1 order by col1, start_col1, end_col1) col2,
                  first_value(col3) over (partition by col2, start_col1, end_col1 order by col1, start_col1, end_col1) col3
                  from (select col1,
                  col2,
                  col3,
                  last_value(start_col1 ignore nulls) over (order by col1, col2) start_col1,
                  last_value(end_col1 ignore nulls) over (order by col1 desc, col2 desc) end_col1
                  from (select col1,col2,col3,case when start_col1 =col2 then col1 end start_col1,case when end_col1 =col2 then col1 end end_col1 from (
                  select col1,
                  col2,
                  col3,
                  lag(col2, 1, '{NULL}') over (order by col1, col2) start_col1,
                  lead(col2, 1, '{NULL}') over (order by col1, col2) end_col1
                  from my_tab)))
                  order by col2,
                  start_col1,
                  end_col1;



                  here is my result

                  START_COL1,END_COL1,COL2,COL3
                  2,3,A,Y
                  4,6,A,X
                  7,,A,Y
                  ,1,A,X
                  2,3,B,Y
                  4,6,B,X
                  7,,B,Y
                  • 6. Re: analytic sql question
                    189821
                    Boneist, you got typos in your sample data.

                    Here comes my try:
                    SQL>WITH my_tab AS (
                      2          SELECT 1 col1, 'A' col2, 'X' col3 FROM DUAL UNION ALL
                      3          SELECT 2 col1, 'A' col2, 'X' col3 FROM DUAL UNION ALL
                      4          SELECT 3 col1, 'B' col2, 'Y' col3 FROM DUAL UNION ALL
                      5          SELECT 4 col1, 'B' col2, 'Y' col3 FROM DUAL UNION ALL
                      6          SELECT 5 col1, 'A' col2, 'X' col3 FROM DUAL UNION ALL
                      7          SELECT 5 col1, 'B' col2, 'X' col3 FROM DUAL UNION ALL
                      8          SELECT 6 col1, 'A' col2, 'Y' col3 FROM DUAL UNION ALL
                      9          SELECT 7 col1, 'B' col2, 'Y' col3 FROM DUAL UNION ALL
                     10          SELECT 7 col1, 'A' col2, 'Y' col3 FROM DUAL),
                     11       start_stop_tab AS (
                     12          SELECT col1, col2, col3, ROW_NUMBER() OVER(PARTITION BY col2, col3 ORDER BY col1) AS rn
                     13            FROM my_tab)
                     14  SELECT   start_tab.col1 AS "START_TIME", NVL(stop_tab.col1, start_tab.col1) AS "END_TIME",
                     15           start_tab.col2 AS "USER", start_tab.col3 AS "VALUE"
                     16      FROM start_stop_tab start_tab, start_stop_tab stop_tab
                     17     WHERE MOD(start_tab.rn, 2) = 1 AND stop_tab.col2(+) = start_tab.col2 AND stop_tab.col3(+) = start_tab.col3
                     18           AND stop_tab.rn(+) = start_tab.rn + 1
                     19  ORDER BY 3, 1;
                    
                    START_TIME   END_TIME U V
                    ---------- ---------- - -
                             1          2 A X
                             5          5 A X
                             6          7 A Y
                             3          4 B Y
                             5          5 B X
                             7          7 B Y
                    Urs
                    • 7. Re: analytic sql question
                      Rob van Wijk
                      Using only one full table scan:
                      SQL> create table mytable (time,myuser,value)
                        2  as
                        3  select 1 col1, 'A' col2, 'X' col3 from dual union all
                        4  select 2 col1, 'A' col2, 'X' col3 from dual union all
                        5  select 3 col1, 'B' col2, 'Y' col3 from dual union all
                        6  select 4 col1, 'B' col2, 'Y' col3 from dual union all
                        7  select 5 col1, 'A' col2, 'X' col3 from dual union all
                        8  select 5 col1, 'B' col2, 'X' col3 from dual union all
                        9  select 6 col1, 'A' col2, 'Y' col3 from dual union all
                       10  select 7 col1, 'B' col2, 'Y' col3 from dual union all
                       11  select 7 col1, 'A' col2, 'Y' col3 from dual
                       12  /
                      
                      Tabel is aangemaakt.
                      
                      SQL> select min(time) start_time
                        2       , max(time) end_time
                        3       , myuser
                        4       , value
                        5    from ( select time
                        6                , myuser
                        7                , value
                        8                , sum(new_group_ind) over (partition by myuser,value order by time) grp
                        9             from ( select time
                       10                         , myuser
                       11                         , value
                       12                         , case lag(time) over (partition by myuser,value order by time) when time-1 then 0 else 1 end new_group_ind
                       13                      from mytable
                       14                  )
                       15         )
                       16   group by grp
                       17       , myuser
                       18       , value
                       19   order by myuser
                       20       , start_time
                       21  /
                      
                      START_TIME   END_TIME M V
                      ---------- ---------- - -
                               1          2 A X
                               5          5 A X
                               6          7 A Y
                               3          4 B Y
                               5          5 B X
                               7          7 B Y
                      
                      6 rijen zijn geselecteerd.
                      Regards,
                      Rob.
                      • 8. Re: analytic sql question
                        189821
                        Good point!
                        Even simpler:
                        SQL>r
                          1  SELECT   time AS "START_TIME", NVL(next_time, time) AS "END_TIME", myuser AS "USER", value AS "VALUE"
                          2    FROM (SELECT time, myuser, value, ROW_NUMBER() OVER(PARTITION BY myuser, value ORDER BY time) AS rn,
                          3                 LEAD(time) OVER(PARTITION BY myuser, value ORDER BY time) AS next_time
                          4            FROM mytable)
                          5     WHERE MOD(rn, 2) = 1
                          6* ORDER BY 3, 1
                        
                        START_TIME   END_TIME U V
                        ---------- ---------- - -
                                 1          2 A X
                                 5          5 A X
                                 6          7 A Y
                                 3          4 B Y
                                 5          5 B X
                                 7          7 B Y
                        
                        6 rows selected.
                        
                        
                        Execution Plan
                        ----------------------------------------------------------
                           0      SELECT STATEMENT Optimizer=CHOOSE
                           1    0   SORT (ORDER BY)
                           2    1     VIEW
                           3    2       WINDOW (SORT)
                           4    3         TABLE ACCESS (FULL) OF 'MYTABLE'
                        • 9. Re: analytic sql question
                          Boneist
                          metzguar wrote:
                          Boneist, you got typos in your sample data.
                          Oh yeah! Whoops...!
                          • 10. Re: analytic sql question
                            Rob van Wijk
                            metzguar wrote:
                            Even simpler:
                            That query only works when a group will never contain more than two rows.
                            • 11. Re: analytic sql question
                              189821
                              The group 'A', 'X' does contain more than two rows.
                              And it works.
                              • 12. Re: analytic sql question
                                715939
                                Rob means that it does not work when three consecutive rows in the same group exist.
                                By the way a full scan takes time. When it is run over a a couple of millions of data it takes considerable amount of time. (running for an hour and not complete yet :) )
                                Thanks for your help. It seems that it is working correctly..
                                • 13. Re: analytic sql question
                                  Rob van Wijk
                                  emrahsaglik@gmail.com wrote:
                                  Rob means that it does not work when three consecutive rows in the same group exist.
                                  That only applies to metzguar's solution.
                                  By the way a full scan takes time. When it is run over a a couple of millions of data it takes considerable amount of time. (running for an hour and not complete yet :) )
                                  Of course. But it takes less time than two full table scans :-).

                                  Regards,
                                  Rob.
                                  • 14. Re: analytic sql question
                                    189821
                                    still unable to get why this would not be correct:
                                    SQL>select * from mytable order by myuser, value, time;
                                    
                                          TIME M V
                                    ---------- - -
                                             1 A X
                                             2 A X
                                             3 A X
                                             4 A X
                                             5 A X
                                             6 A Y
                                             7 A Y
                                             5 B X
                                             3 B Y
                                             4 B Y
                                             7 B Y
                                    
                                    11 rows selected.
                                    
                                    SQL>set autotrace on
                                    SQL>SELECT   time AS "START_TIME", NVL(next_time, time) AS "END_TIME", myuser AS "USER", value AS "VALUE"
                                      2        FROM (SELECT time, myuser, value, ROW_NUMBER() OVER(PARTITION BY myuser, value ORDER BY time) AS rn,
                                      3                     LEAD(time) OVER(PARTITION BY myuser, value ORDER BY time) AS next_time
                                      4                FROM mytable)
                                      5         WHERE MOD(rn, 2) = 1
                                      6      ORDER BY 3, 1;
                                    
                                    START_TIME   END_TIME U V
                                    ---------- ---------- - -
                                             1          2 A X
                                             3          4 A X
                                             5          5 A X
                                             6          7 A Y
                                             3          4 B Y
                                             5          5 B X
                                             7          7 B Y
                                    
                                    7 rows selected.
                                    
                                    
                                    Execution Plan
                                    ----------------------------------------------------------
                                       0      SELECT STATEMENT Optimizer=CHOOSE
                                       1    0   SORT (ORDER BY)
                                       2    1     VIEW
                                       3    2       WINDOW (SORT)
                                       4    3         TABLE ACCESS (FULL) OF 'MYTABLE'
                                    
                                    
                                    
                                    
                                    Statistics
                                    ----------------------------------------------------------
                                              0  recursive calls
                                              0  db block gets
                                              7  consistent gets
                                              0  physical reads
                                              0  redo size
                                            385  bytes sent via SQL*Net to client
                                            235  bytes received via SQL*Net from client
                                              2  SQL*Net roundtrips to/from client
                                              2  sorts (memory)
                                              0  sorts (disk)
                                              7  rows processed
                                    There are now five consecutive rows in group 'A', 'X'.
                                    1 2 Previous Next