Forum Stats

  • 3,836,735 Users
  • 2,262,175 Discussions
  • 7,900,088 Comments

Discussions

analytic sql question

715939
715939 Member Posts: 5
edited Aug 11, 2009 5:35PM in SQL & PL/SQL
Dear All,
I have the following problem of getting the output table from the input table below.
Do you have any idea of doing this with the help of analytic sql?
p.s. I have done this by using pure plsql block which is too slow to run with high amount of data. Below data is just a sample, in real I have millions rows of data.

Input table:
TIME USER VALUE

1 A X
2 A X
3 B Y
4 B Y
5 A X
5 B X
6 A Y
7 B Y
7 A Y



Output table:
START_TIME END_TIME USER VALUE
1 2 A X
5 5 A X
6 7 A Y
3 4 B Y
5 5 B X
7 7 B Y

Best Answer

  • Aketi Jyuuzou
    Aketi Jyuuzou Member Posts: 1,072 Bronze Badge
    Answer ✓
    create table mytable (time,myuser,value) as
    select 1 col1, 'A' col2, 'X' col3 from dual union all
    select 2 col1, 'A' col2, 'X' col3 from dual union all
    select 3 col1, 'B' col2, 'Y' col3 from dual union all
    select 4 col1, 'B' col2, 'Y' col3 from dual union all
    select 5 col1, 'A' col2, 'X' col3 from dual union all
    select 5 col1, 'B' col2, 'X' col3 from dual union all
    select 6 col1, 'A' col2, 'Y' col3 from dual union all
    select 7 col1, 'B' col2, 'Y' col3 from dual union all
    select 7 col1, 'A' col2, 'Y' col3 from dual union all
    select 8 col1, 'A' col2, 'Y' col3 from dual;
    
    select min(time),max(time),myuser,value
    from (select time,myuser,value,
           dense_rank() over(order by time)
          -Row_Number() over(partition by myuser,value order by time)
          as distance
           from mytable)
    group by myuser,value,distance
    order by myuser,min(time);
    
    MIN(TIME)  MAX(TIME)  M  V
    ---------  ---------  -  -
            1          2  A  X
            5          5  A  X
            6          8  A  Y
            3          4  B  Y
            5          5  B  X
            7          7  B  Y
    I used sense of Tabibitosan B-)
    920572
«13

Answers

  • 548849
    548849 Member Posts: 1,083
    Hi,
    Welcome to the forum.
    row_number () over (partition by user,value)
    Regards,
    Jacob
  • Boneist
    Boneist Member Posts: 4,983 Gold Trophy
    I feel sure I've over-complicated things here, and that there's an easier way of doing it, but here's one solution:
    with my_tab as (select 1 col1, 'A' col2, 'X' col3 from dual union all
                    select 2 col1, 'A' col2, 'Y' col3 from dual union all
                    select 3 col1, 'B' col2, 'Y' col3 from dual union all
                    select 4 col1, 'B' col2, 'X' col3 from dual union all
                    select 5 col1, 'A' col2, 'X' col3 from dual union all
                    select 5 col1, 'B' col2, 'X' col3 from dual union all
                    select 6 col1, 'A' col2, 'Y' col3 from dual union all
                    select 7 col1, 'B' col2, 'Y' col3 from dual union all
                    select 7 col1, 'A' col2, 'Y' col3 from dual)
    select distinct start_col1,
                    end_col1,
                    first_value(col2) over (partition by col2, start_col1, end_col1 order by col1, start_col1, end_col1) col2,
                    first_value(col3) over (partition by col2, start_col1, end_col1 order by col1, start_col1, end_col1) col3
    from   (select col1,
                   col2,
                   col3,
                   last_value(start_col1 ignore nulls) over (order by col1, col2) start_col1,
                   last_value(end_col1 ignore nulls) over (order by col1 desc, col2 desc) end_col1
            from   (select col1, 
                           col2, 
                           col3,
                           case when lag(col2, 1, '{NULL}') over (order by col1, col2) <> col2
                                     then col1
                           end start_col1,
                           case when lead(col2, 1, '{NULL}') over (order by col1, col2) <> col2
                                     then col1
                           end end_col1 
                    from   my_tab))
    order by col2,
             start_col1,
             end_col1;
    
    START_COL1   END_COL1 C C
    ---------- ---------- - -
             1          2 A X
             5          5 A X
             6          7 A Y
             3          4 B Y
             5          5 B X
             7          7 B Y
  • 715939
    715939 Member Posts: 5
    I'm getting this error ORA-00920
    What is the reason for..
  • 548849
    548849 Member Posts: 1,083
    Hi

    Post the query you are executing.
  • 715939
    715939 Member Posts: 5
    when I change your code's inner most part as the following

    with my_tab as (select 1 col1, 'A' col2, 'X' col3 from dual union all
    select 2 col1, 'A' col2, 'Y' col3 from dual union all
    select 3 col1, 'B' col2, 'Y' col3 from dual union all
    select 4 col1, 'B' col2, 'X' col3 from dual union all
    select 5 col1, 'A' col2, 'X' col3 from dual union all
    select 5 col1, 'B' col2, 'X' col3 from dual union all
    select 6 col1, 'A' col2, 'Y' col3 from dual union all
    select 7 col1, 'B' col2, 'Y' col3 from dual union all
    select 7 col1, 'A' col2, 'Y' col3 from dual)
    select distinct start_col1,
    end_col1,
    first_value(col2) over (partition by col2, start_col1, end_col1 order by col1, start_col1, end_col1) col2,
    first_value(col3) over (partition by col2, start_col1, end_col1 order by col1, start_col1, end_col1) col3
    from (select col1,
    col2,
    col3,
    last_value(start_col1 ignore nulls) over (order by col1, col2) start_col1,
    last_value(end_col1 ignore nulls) over (order by col1 desc, col2 desc) end_col1
    from (select col1,col2,col3,case when start_col1 =col2 then col1 end start_col1,case when end_col1 =col2 then col1 end end_col1 from (
    select col1,
    col2,
    col3,
    lag(col2, 1, '{NULL}') over (order by col1, col2) start_col1,
    lead(col2, 1, '{NULL}') over (order by col1, col2) end_col1
    from my_tab)))
    order by col2,
    start_col1,
    end_col1;



    here is my result

    START_COL1,END_COL1,COL2,COL3
    2,3,A,Y
    4,6,A,X
    7,,A,Y
    ,1,A,X
    2,3,B,Y
    4,6,B,X
    7,,B,Y
  • 189821
    189821 Member Posts: 656
    Boneist, you got typos in your sample data.

    Here comes my try:
    SQL>WITH my_tab AS (
      2          SELECT 1 col1, 'A' col2, 'X' col3 FROM DUAL UNION ALL
      3          SELECT 2 col1, 'A' col2, 'X' col3 FROM DUAL UNION ALL
      4          SELECT 3 col1, 'B' col2, 'Y' col3 FROM DUAL UNION ALL
      5          SELECT 4 col1, 'B' col2, 'Y' col3 FROM DUAL UNION ALL
      6          SELECT 5 col1, 'A' col2, 'X' col3 FROM DUAL UNION ALL
      7          SELECT 5 col1, 'B' col2, 'X' col3 FROM DUAL UNION ALL
      8          SELECT 6 col1, 'A' col2, 'Y' col3 FROM DUAL UNION ALL
      9          SELECT 7 col1, 'B' col2, 'Y' col3 FROM DUAL UNION ALL
     10          SELECT 7 col1, 'A' col2, 'Y' col3 FROM DUAL),
     11       start_stop_tab AS (
     12          SELECT col1, col2, col3, ROW_NUMBER() OVER(PARTITION BY col2, col3 ORDER BY col1) AS rn
     13            FROM my_tab)
     14  SELECT   start_tab.col1 AS "START_TIME", NVL(stop_tab.col1, start_tab.col1) AS "END_TIME",
     15           start_tab.col2 AS "USER", start_tab.col3 AS "VALUE"
     16      FROM start_stop_tab start_tab, start_stop_tab stop_tab
     17     WHERE MOD(start_tab.rn, 2) = 1 AND stop_tab.col2(+) = start_tab.col2 AND stop_tab.col3(+) = start_tab.col3
     18           AND stop_tab.rn(+) = start_tab.rn + 1
     19  ORDER BY 3, 1;
    
    START_TIME   END_TIME U V
    ---------- ---------- - -
             1          2 A X
             5          5 A X
             6          7 A Y
             3          4 B Y
             5          5 B X
             7          7 B Y
    Urs
  • Rob van Wijk
    Rob van Wijk Member Posts: 5,856 Bronze Trophy
    Using only one full table scan:
    SQL> create table mytable (time,myuser,value)
      2  as
      3  select 1 col1, 'A' col2, 'X' col3 from dual union all
      4  select 2 col1, 'A' col2, 'X' col3 from dual union all
      5  select 3 col1, 'B' col2, 'Y' col3 from dual union all
      6  select 4 col1, 'B' col2, 'Y' col3 from dual union all
      7  select 5 col1, 'A' col2, 'X' col3 from dual union all
      8  select 5 col1, 'B' col2, 'X' col3 from dual union all
      9  select 6 col1, 'A' col2, 'Y' col3 from dual union all
     10  select 7 col1, 'B' col2, 'Y' col3 from dual union all
     11  select 7 col1, 'A' col2, 'Y' col3 from dual
     12  /
    
    Tabel is aangemaakt.
    
    SQL> select min(time) start_time
      2       , max(time) end_time
      3       , myuser
      4       , value
      5    from ( select time
      6                , myuser
      7                , value
      8                , sum(new_group_ind) over (partition by myuser,value order by time) grp
      9             from ( select time
     10                         , myuser
     11                         , value
     12                         , case lag(time) over (partition by myuser,value order by time) when time-1 then 0 else 1 end new_group_ind
     13                      from mytable
     14                  )
     15         )
     16   group by grp
     17       , myuser
     18       , value
     19   order by myuser
     20       , start_time
     21  /
    
    START_TIME   END_TIME M V
    ---------- ---------- - -
             1          2 A X
             5          5 A X
             6          7 A Y
             3          4 B Y
             5          5 B X
             7          7 B Y
    
    6 rijen zijn geselecteerd.
    Regards,
    Rob.
    Rob van Wijk
  • 189821
    189821 Member Posts: 656
    Good point!
    Even simpler:
    SQL>r
      1  SELECT   time AS "START_TIME", NVL(next_time, time) AS "END_TIME", myuser AS "USER", value AS "VALUE"
      2    FROM (SELECT time, myuser, value, ROW_NUMBER() OVER(PARTITION BY myuser, value ORDER BY time) AS rn,
      3                 LEAD(time) OVER(PARTITION BY myuser, value ORDER BY time) AS next_time
      4            FROM mytable)
      5     WHERE MOD(rn, 2) = 1
      6* ORDER BY 3, 1
    
    START_TIME   END_TIME U V
    ---------- ---------- - -
             1          2 A X
             5          5 A X
             6          7 A Y
             3          4 B Y
             5          5 B X
             7          7 B Y
    
    6 rows selected.
    
    
    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE
       1    0   SORT (ORDER BY)
       2    1     VIEW
       3    2       WINDOW (SORT)
       4    3         TABLE ACCESS (FULL) OF 'MYTABLE'
  • Boneist
    Boneist Member Posts: 4,983 Gold Trophy
    metzguar wrote:
    Boneist, you got typos in your sample data.
    Oh yeah! Whoops...!
  • Rob van Wijk
    Rob van Wijk Member Posts: 5,856 Bronze Trophy
    metzguar wrote:
    Even simpler:
    That query only works when a group will never contain more than two rows.
This discussion has been closed.