This discussion is archived
1 2 Previous Next 27 Replies Latest reply: Aug 11, 2009 2:35 PM by 666352 RSS

analytic sql question

715939 Newbie
Currently Being Moderated
Dear All,
I have the following problem of getting the output table from the input table below.
Do you have any idea of doing this with the help of analytic sql?
p.s. I have done this by using pure plsql block which is too slow to run with high amount of data. Below data is just a sample, in real I have millions rows of data.

Input table:
TIME     USER     VALUE     

1     A     X
2     A     X
3     B     Y
4     B     Y
5     A     X
5     B     X
6     A     Y
7     B     Y
7     A     Y



Output table:
START_TIME     END_TIME     USER     VALUE
1          2          A     X
5          5          A     X
6          7          A     Y
3          4          B     Y
5          5          B     X
7          7          B     Y
  • 1. Re: analytic sql question
    548849 Journeyer
    Currently Being Moderated
    Hi,
    Welcome to the forum.
    row_number () over (partition by user,value)
    Regards,
    Jacob
  • 2. Re: analytic sql question
    Boneist Guru
    Currently Being Moderated
    I feel sure I've over-complicated things here, and that there's an easier way of doing it, but here's one solution:
    with my_tab as (select 1 col1, 'A' col2, 'X' col3 from dual union all
                    select 2 col1, 'A' col2, 'Y' col3 from dual union all
                    select 3 col1, 'B' col2, 'Y' col3 from dual union all
                    select 4 col1, 'B' col2, 'X' col3 from dual union all
                    select 5 col1, 'A' col2, 'X' col3 from dual union all
                    select 5 col1, 'B' col2, 'X' col3 from dual union all
                    select 6 col1, 'A' col2, 'Y' col3 from dual union all
                    select 7 col1, 'B' col2, 'Y' col3 from dual union all
                    select 7 col1, 'A' col2, 'Y' col3 from dual)
    select distinct start_col1,
                    end_col1,
                    first_value(col2) over (partition by col2, start_col1, end_col1 order by col1, start_col1, end_col1) col2,
                    first_value(col3) over (partition by col2, start_col1, end_col1 order by col1, start_col1, end_col1) col3
    from   (select col1,
                   col2,
                   col3,
                   last_value(start_col1 ignore nulls) over (order by col1, col2) start_col1,
                   last_value(end_col1 ignore nulls) over (order by col1 desc, col2 desc) end_col1
            from   (select col1, 
                           col2, 
                           col3,
                           case when lag(col2, 1, '{NULL}') over (order by col1, col2) <> col2
                                     then col1
                           end start_col1,
                           case when lead(col2, 1, '{NULL}') over (order by col1, col2) <> col2
                                     then col1
                           end end_col1 
                    from   my_tab))
    order by col2,
             start_col1,
             end_col1;
    
    START_COL1   END_COL1 C C
    ---------- ---------- - -
             1          2 A X
             5          5 A X
             6          7 A Y
             3          4 B Y
             5          5 B X
             7          7 B Y
  • 3. Re: analytic sql question
    715939 Newbie
    Currently Being Moderated
    I'm getting this error ORA-00920
    What is the reason for..
  • 4. Re: analytic sql question
    548849 Journeyer
    Currently Being Moderated
    Hi

    Post the query you are executing.
  • 5. Re: analytic sql question
    715939 Newbie
    Currently Being Moderated
    when I change your code's inner most part as the following

    with my_tab as (select 1 col1, 'A' col2, 'X' col3 from dual union all
    select 2 col1, 'A' col2, 'Y' col3 from dual union all
    select 3 col1, 'B' col2, 'Y' col3 from dual union all
    select 4 col1, 'B' col2, 'X' col3 from dual union all
    select 5 col1, 'A' col2, 'X' col3 from dual union all
    select 5 col1, 'B' col2, 'X' col3 from dual union all
    select 6 col1, 'A' col2, 'Y' col3 from dual union all
    select 7 col1, 'B' col2, 'Y' col3 from dual union all
    select 7 col1, 'A' col2, 'Y' col3 from dual)
    select distinct start_col1,
    end_col1,
    first_value(col2) over (partition by col2, start_col1, end_col1 order by col1, start_col1, end_col1) col2,
    first_value(col3) over (partition by col2, start_col1, end_col1 order by col1, start_col1, end_col1) col3
    from (select col1,
    col2,
    col3,
    last_value(start_col1 ignore nulls) over (order by col1, col2) start_col1,
    last_value(end_col1 ignore nulls) over (order by col1 desc, col2 desc) end_col1
    from (select col1,col2,col3,case when start_col1 =col2 then col1 end start_col1,case when end_col1 =col2 then col1 end end_col1 from (
    select col1,
    col2,
    col3,
    lag(col2, 1, '{NULL}') over (order by col1, col2) start_col1,
    lead(col2, 1, '{NULL}') over (order by col1, col2) end_col1
    from my_tab)))
    order by col2,
    start_col1,
    end_col1;



    here is my result

    START_COL1,END_COL1,COL2,COL3
    2,3,A,Y
    4,6,A,X
    7,,A,Y
    ,1,A,X
    2,3,B,Y
    4,6,B,X
    7,,B,Y
  • 6. Re: analytic sql question
    189821 Expert
    Currently Being Moderated
    Boneist, you got typos in your sample data.

    Here comes my try:
    SQL>WITH my_tab AS (
      2          SELECT 1 col1, 'A' col2, 'X' col3 FROM DUAL UNION ALL
      3          SELECT 2 col1, 'A' col2, 'X' col3 FROM DUAL UNION ALL
      4          SELECT 3 col1, 'B' col2, 'Y' col3 FROM DUAL UNION ALL
      5          SELECT 4 col1, 'B' col2, 'Y' col3 FROM DUAL UNION ALL
      6          SELECT 5 col1, 'A' col2, 'X' col3 FROM DUAL UNION ALL
      7          SELECT 5 col1, 'B' col2, 'X' col3 FROM DUAL UNION ALL
      8          SELECT 6 col1, 'A' col2, 'Y' col3 FROM DUAL UNION ALL
      9          SELECT 7 col1, 'B' col2, 'Y' col3 FROM DUAL UNION ALL
     10          SELECT 7 col1, 'A' col2, 'Y' col3 FROM DUAL),
     11       start_stop_tab AS (
     12          SELECT col1, col2, col3, ROW_NUMBER() OVER(PARTITION BY col2, col3 ORDER BY col1) AS rn
     13            FROM my_tab)
     14  SELECT   start_tab.col1 AS "START_TIME", NVL(stop_tab.col1, start_tab.col1) AS "END_TIME",
     15           start_tab.col2 AS "USER", start_tab.col3 AS "VALUE"
     16      FROM start_stop_tab start_tab, start_stop_tab stop_tab
     17     WHERE MOD(start_tab.rn, 2) = 1 AND stop_tab.col2(+) = start_tab.col2 AND stop_tab.col3(+) = start_tab.col3
     18           AND stop_tab.rn(+) = start_tab.rn + 1
     19  ORDER BY 3, 1;
    
    START_TIME   END_TIME U V
    ---------- ---------- - -
             1          2 A X
             5          5 A X
             6          7 A Y
             3          4 B Y
             5          5 B X
             7          7 B Y
    Urs
  • 7. Re: analytic sql question
    RobvanWijk Oracle ACE
    Currently Being Moderated
    Using only one full table scan:
    SQL> create table mytable (time,myuser,value)
      2  as
      3  select 1 col1, 'A' col2, 'X' col3 from dual union all
      4  select 2 col1, 'A' col2, 'X' col3 from dual union all
      5  select 3 col1, 'B' col2, 'Y' col3 from dual union all
      6  select 4 col1, 'B' col2, 'Y' col3 from dual union all
      7  select 5 col1, 'A' col2, 'X' col3 from dual union all
      8  select 5 col1, 'B' col2, 'X' col3 from dual union all
      9  select 6 col1, 'A' col2, 'Y' col3 from dual union all
     10  select 7 col1, 'B' col2, 'Y' col3 from dual union all
     11  select 7 col1, 'A' col2, 'Y' col3 from dual
     12  /
    
    Tabel is aangemaakt.
    
    SQL> select min(time) start_time
      2       , max(time) end_time
      3       , myuser
      4       , value
      5    from ( select time
      6                , myuser
      7                , value
      8                , sum(new_group_ind) over (partition by myuser,value order by time) grp
      9             from ( select time
     10                         , myuser
     11                         , value
     12                         , case lag(time) over (partition by myuser,value order by time) when time-1 then 0 else 1 end new_group_ind
     13                      from mytable
     14                  )
     15         )
     16   group by grp
     17       , myuser
     18       , value
     19   order by myuser
     20       , start_time
     21  /
    
    START_TIME   END_TIME M V
    ---------- ---------- - -
             1          2 A X
             5          5 A X
             6          7 A Y
             3          4 B Y
             5          5 B X
             7          7 B Y
    
    6 rijen zijn geselecteerd.
    Regards,
    Rob.
  • 8. Re: analytic sql question
    189821 Expert
    Currently Being Moderated
    Good point!
    Even simpler:
    SQL>r
      1  SELECT   time AS "START_TIME", NVL(next_time, time) AS "END_TIME", myuser AS "USER", value AS "VALUE"
      2    FROM (SELECT time, myuser, value, ROW_NUMBER() OVER(PARTITION BY myuser, value ORDER BY time) AS rn,
      3                 LEAD(time) OVER(PARTITION BY myuser, value ORDER BY time) AS next_time
      4            FROM mytable)
      5     WHERE MOD(rn, 2) = 1
      6* ORDER BY 3, 1
    
    START_TIME   END_TIME U V
    ---------- ---------- - -
             1          2 A X
             5          5 A X
             6          7 A Y
             3          4 B Y
             5          5 B X
             7          7 B Y
    
    6 rows selected.
    
    
    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE
       1    0   SORT (ORDER BY)
       2    1     VIEW
       3    2       WINDOW (SORT)
       4    3         TABLE ACCESS (FULL) OF 'MYTABLE'
  • 9. Re: analytic sql question
    Boneist Guru
    Currently Being Moderated
    metzguar wrote:
    Boneist, you got typos in your sample data.
    Oh yeah! Whoops...!
  • 10. Re: analytic sql question
    RobvanWijk Oracle ACE
    Currently Being Moderated
    metzguar wrote:
    Even simpler:
    That query only works when a group will never contain more than two rows.
  • 11. Re: analytic sql question
    189821 Expert
    Currently Being Moderated
    The group 'A', 'X' does contain more than two rows.
    And it works.
  • 12. Re: analytic sql question
    715939 Newbie
    Currently Being Moderated
    Rob means that it does not work when three consecutive rows in the same group exist.
    By the way a full scan takes time. When it is run over a a couple of millions of data it takes considerable amount of time. (running for an hour and not complete yet :) )
    Thanks for your help. It seems that it is working correctly..
  • 13. Re: analytic sql question
    RobvanWijk Oracle ACE
    Currently Being Moderated
    emrahsaglik@gmail.com wrote:
    Rob means that it does not work when three consecutive rows in the same group exist.
    That only applies to metzguar's solution.
    By the way a full scan takes time. When it is run over a a couple of millions of data it takes considerable amount of time. (running for an hour and not complete yet :) )
    Of course. But it takes less time than two full table scans :-).

    Regards,
    Rob.
  • 14. Re: analytic sql question
    189821 Expert
    Currently Being Moderated
    still unable to get why this would not be correct:
    SQL>select * from mytable order by myuser, value, time;
    
          TIME M V
    ---------- - -
             1 A X
             2 A X
             3 A X
             4 A X
             5 A X
             6 A Y
             7 A Y
             5 B X
             3 B Y
             4 B Y
             7 B Y
    
    11 rows selected.
    
    SQL>set autotrace on
    SQL>SELECT   time AS "START_TIME", NVL(next_time, time) AS "END_TIME", myuser AS "USER", value AS "VALUE"
      2        FROM (SELECT time, myuser, value, ROW_NUMBER() OVER(PARTITION BY myuser, value ORDER BY time) AS rn,
      3                     LEAD(time) OVER(PARTITION BY myuser, value ORDER BY time) AS next_time
      4                FROM mytable)
      5         WHERE MOD(rn, 2) = 1
      6      ORDER BY 3, 1;
    
    START_TIME   END_TIME U V
    ---------- ---------- - -
             1          2 A X
             3          4 A X
             5          5 A X
             6          7 A Y
             3          4 B Y
             5          5 B X
             7          7 B Y
    
    7 rows selected.
    
    
    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE
       1    0   SORT (ORDER BY)
       2    1     VIEW
       3    2       WINDOW (SORT)
       4    3         TABLE ACCESS (FULL) OF 'MYTABLE'
    
    
    
    
    Statistics
    ----------------------------------------------------------
              0  recursive calls
              0  db block gets
              7  consistent gets
              0  physical reads
              0  redo size
            385  bytes sent via SQL*Net to client
            235  bytes received via SQL*Net from client
              2  SQL*Net roundtrips to/from client
              2  sorts (memory)
              0  sorts (disk)
              7  rows processed
    There are now five consecutive rows in group 'A', 'X'.
1 2 Previous Next

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points