Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Data Joins and Pivot Help

684479Jan 22 2010 — edited Jan 27 2010
Hi, I have one fact table and one huge history table that goes really deep with scores stored with month end dates.

I am looking for a way to manipulate the month and year of DateChanged in table A along with joins on the acct no to retrieve scores from previous months, resulting in a de-normalized data format as in the result table.

Any ideas?? Thanks!


Table A

Acct	DateChanged
-----------------------------------
123	1/12/2008
456	8/25/2008
 


Table B:
 
Acct           DateUpload        Score
-----------------------------------------
123           11/30/2007        620
123           12/31/2007        650
123           01/31/2008        712
123           02/29/2008        800
456           05/31/2008        520  
456           06/30/2008        630
456           07/31/2008        680
456           08/31/2008        710
456           09/30/2008        780
456           10/31/2008        786
456           11/30/2008        823



Result Table:
		        
Acct      Score2MonthPrior     Score1MonthPrior
-------------------------------------------------------
123            620                     650
456            630                     680
This post has been answered by Frank Kulash on Jan 22 2010
Jump to Answer

Comments

Hoek
Hi Wateenmooiedag,

it's cloudy over here btw ;)

Have you tried using LAG:
MHO%xe> select t
  2  ,      userid
  3  ,      case when trunc( mod( (t-lag(t) over ( partition by userid order by userid, t))*24*60, 60 ) ) >= 1
  4              then 1
  5              else 0
  6         end cnt
  7  from   clickstream;

T                       USERID        CNT
------------------- ---------- ----------
05-09-2009 10:00:00          2          0
05-09-2009 10:00:24          2          0
05-09-2009 10:01:23          2          0
05-09-2009 10:02:40          2          1
05-09-2009 00:58:24          7          0
05-09-2009 02:30:33          7          1

6 rijen zijn geselecteerd.
( It reminded me of this link:http://asktom.oracle.com/tkyte/Misc/DateDiff.html )
486393
It is not exactly what I had in mind.

When I add two new rows:

insert into clickstream values (to_date('5-9-2009 10:02:50','dd-mm-yyyy hh24:mi:ss') , 2);
insert into clickstream values (to_date('5-9-2009 10:05:40','dd-mm-yyyy hh24:mi:ss') , 2);

It should return:
T                       USERID        CNT
------------------- ---------- ----------
05-09-2009 10:00:00          2          0
05-09-2009 10:00:24          2          0
05-09-2009 10:01:23          2          0
05-09-2009 10:02:40          2          1
05-09-2009 10:02:50          2          1
05-09-2009 10:05:40          2          2
05-09-2009 00:58:24          7          0
05-09-2009 02:30:33          7          1
Because user 2 started a new session on 10:02:40 and another one on 10:05:40 .


By the way it's cloudy here too but that should not suprise you! :) (wateenmooiedag means what a beautiful day in Dutch)

Edited by: wateenmooiedag on Sep 6, 2009 3:33 AM
Hoek
Aha, OK, that calls for some adjustment , indeed.

One little thing:
Shouldn't
insert into clickstream values (to_date('5-9-2009 10:02:40','dd-mm-yyyy hh24:mi:ss') , 2);
(from your extra inserts)
be
insert into clickstream values (to_date('5-9-2009 10:02:50','dd-mm-yyyy hh24:mi:ss') , 2);
when looking at your desired output?

And what exactly makes the requirement for CNT = 2 on 10:05:40?
And how does 10:02:50 being 1 relate to that (since it differs only 10 secs from the previous t)?
Because I interpreted the requirement like:
"For at least a difference of 60 seconds (one minute) between the current and previous t we will set CNT to 1."
But that apparently isn't enough.
486393
It should indeed be: insert into clickstream values (to_date('5-9-2009 10:02:_50_'...

Every session should have a unique number (SESSIONID) assigned (unique per userid).

I should have renamed column CNT to SESSIONID.
Hoek
If the number has to be unique but it has no further meaning or purpose then maybe using row_number() is OK here:
MHO%xe> select t
  2  ,      userid
  3  ,      nvl(last_value(nullif(sessionid, 0) ignore nulls) over ( partition by userid order by userid, t),0) sessionid
  4  from ( select t
  5         ,      userid 
  6  --       ,      trunc( mod( (t-lag(t) over ( partition by userid order by userid, t))*24*60, 60 ) ) minutes
  7         ,      case when trunc( mod( (t-lag(t) over ( partition by userid order by userid, t))*24*60, 60 )) >= 1
  8                     then row_number() over ( partition by userid order by userid, t)
  9                     else 0
 10                end sessionid
 11         from   clickstream
 12       );

T                       USERID  SESSIONID
------------------- ---------- ----------
05-09-2009 10:00:00          2          0
05-09-2009 10:00:24          2          0
05-09-2009 10:01:23          2          0
05-09-2009 10:02:40          2          4
05-09-2009 10:02:50          2          4
05-09-2009 10:05:40          2          6
05-09-2009 00:58:24          7          0
05-09-2009 02:30:33          7          2

8 rijen zijn geselecteerd.
I've made a few assumptions here regarding setting sessionid to 4 on 10:02:50:
Once a session is identified, then keep that session id for the next records, regardless of the difference in minutes/seconds until the difference is at least a minute again.
486393
That looks very good!! I added a dense_rank() to improve the sessionid:
select t
,      userid
,      -1 + (dense_rank() over (partition by userid order by sessionid)) sessionid
from
(
select t
,      userid
,      nvl(last_value(nullif(sessionid, 0) ignore nulls) over 
       ( partition by userid order by userid, t),0) sessionid
from ( select t
     ,      userid 
     ,      case when trunc( mod( (t-lag(t) over ( partition by userid order by userid, t))*24*60, 60 )) >= 1
                    then row_number() over ( partition by userid order by userid, t)
                    else 0
            end sessionid
     from   clickstream
)
)
/ 
You can read here more: http://www.asterdata.com/resources/downloads/whitepapers/sqlmr.pdf , chapter 1.2 .

Edited by: wateenmooiedag on Sep 6, 2009 5:27 AM
Hoek
Yes ofcourse, dense_rank() tops it off nicely, didn't think of that one.

Interesting link by the way, thanks Tuinstoel.
It states SQL needs 'an expensive self-join' but by using analytics we don't need a self-join at all.
MHO%xe> select t
  2  ,      userid
  3  ,      -1 + (dense_rank() over (partition by userid order by sessionid)) sessionid
  4  from ( select t
  5         ,      userid
  6         ,      nvl(last_value(nullif(sessionid, 0) ignore nulls) over 
  7                ( partition by userid order by userid, t),0) sessionid
  8         from ( select t
  9                ,      userid 
 10                ,      case when trunc( mod( (t-lag(t) over ( partition by userid order by userid, t))*24*60, 60 )) >= 1
 11                               then dense_rank() over ( partition by userid order by userid, t)
 12                               else 0
 13                       end sessionid
 14                from   clickstream
 15              )
 16       );
Verstreken: 00:00:00.25

Uitvoeringspan
----------------------------------------------------------
Plan hash value: 1413192182

--------------------------------------------------------------------------------------
| Id  | Operation              | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |             |     8 |   280 |     5  (40)| 00:00:01 |
|   1 |  WINDOW SORT           |             |     8 |   280 |     5  (40)| 00:00:01 |
|   2 |   VIEW                 |             |     8 |   280 |     4  (25)| 00:00:01 |
|   3 |    WINDOW BUFFER       |             |     8 |   280 |     4  (25)| 00:00:01 |
|   4 |     VIEW               |             |     8 |   280 |     4  (25)| 00:00:01 |
|   5 |      WINDOW SORT       |             |     8 |   176 |     4  (25)| 00:00:01 |
|   6 |       TABLE ACCESS FULL| CLICKSTREAM |     8 |   176 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement
It would be very interesting to see how this query would look like when using the MODEL-clause or some XML function, by the way...
I think this requirement can be met using less code that way.

hopes a few other well-known users will drop in here ;)
MichaelS
Answer
Looks to me you can do a bit simpler:
SQL>  select userid, t, nvl (sum (s) over (partition by userid order by t), 0) sessionid
from (select userid, 
             t, 
             case when t - lag(t) over (partition by userid order by t) > 1 / (24 * 60) then 1 end s
      from clickstream
      order by userid, t)
/
    USERID T                      SESSIONID
---------- --------------------- ----------
         2 05.09.2009 10:00:00            0
         2 05.09.2009 10:00:24            0
         2 05.09.2009 10:01:23            0
         2 05.09.2009 10:02:40            1
         2 05.09.2009 10:02:50            1
         2 05.09.2009 10:05:40            2
         7 05.09.2009 00:58:24            0
         7 05.09.2009 02:30:33            1

8 rows selected.
Marked as Answer by 486393 · Sep 27 2020
Hoek
Nice and very elegant, Michael!
It looks so easy, but it just never came to my mind...
memorizing the '> 1 / (24 * 60)' - part
Aketi Jyuuzou
I like using interVal ;-)
create table streamT(USERID,TimeVal) as
select 2,to_date('2009-09-05 10:00:00','YYYY/MM/DD HH24:MI:SS') from dual union
select 2,to_date('2009-09-05 10:00:24','YYYY/MM/DD HH24:MI:SS') from dual union
select 2,to_date('2009-09-05 10:01:23','YYYY/MM/DD HH24:MI:SS') from dual union
select 2,to_date('2009-09-05 10:02:40','YYYY/MM/DD HH24:MI:SS') from dual union
select 2,to_date('2009-09-05 10:02:50','YYYY/MM/DD HH24:MI:SS') from dual union
select 2,to_date('2009-09-05 10:05:40','YYYY/MM/DD HH24:MI:SS') from dual union
select 7,to_date('2009-09-05 00:58:24','YYYY/MM/DD HH24:MI:SS') from dual union
select 7,to_date('2009-09-05 02:30:33','YYYY/MM/DD HH24:MI:SS') from dual;

select USERID,to_char(TimeVal,'yyyy-mm-dd hh24:mi:ss') as TimeVal,
count(willCnt) over(partition by USERID order by TimeVal) as SESSIONID
from (select USERID,TimeVal,
      case when TimeVal-interVal '1' minute >=
                Lag(TimeVal) over(partition by USERID order by TimeVal)
           then 1 end as willCnt
        from streamT)
order by USERID,TimeVal;

 USERID  TimeVal              SESSIONID
-------  -------------------  ---------
      2  2009-09-05 10:00:00          0
      2  2009-09-05 10:00:24          0
      2  2009-09-05 10:01:23          0
      2  2009-09-05 10:02:40          1
      2  2009-09-05 10:02:50          1
      2  2009-09-05 10:05:40          2
      7  2009-09-05 00:58:24          0
      7  2009-09-05 02:30:33          1
1 - 10
Locked Post
New comments cannot be posted to this locked post.

Post Details

Locked on Feb 24 2010
Added on Jan 22 2010
14 comments
2,201 views