Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Interested in getting your voice heard by members of the Developer Marketing team at Oracle? Check out this post for AppDev or this post for AI focus group information.

Is there a more efficient way to do this conditional cross-join?

rwsBertmanFeb 25 2015 — edited May 9 2015

Abstracting the situation: we are monitoring the condition of a variety of units, looking at each unit for a recorded period of time. At any single point in time, there may be nearly 8000 units currently being monitored.

Every unit has an ID (required), a start_date (required), and end_date (may be null, i.e. still being monitored), and the date of the most recent info collected for the unit (latest_data; equal to some date later than start_date).

Every unit gets a periodic check (review), which classifies it into a group, identified by group_ID.

If the unit has not been checked yet, then review_date and group_ID are both null.

What I need to do (for the data warehouse) is create a record of the details for each unit for every day of its life in the system, including its most recent review date and current group_ID.

If the unit is still being monitored, i.e. does not have an end_date, then the daily records should continue up to the date of the most recent info collected for the unit, or sysdate if no data was received yet.

  select
     CR.ID, CR.start_date, CR.end_date,
     cal.day, cal.month,
     max(CR.rd) as review_date, --get the most recent review date
     max(CR.group_ID) keep(dense_rank last order by rd nulls first) as group_ID --get the most recent group_ID
  from (
     select
          C.ID, C.start_date, C.end_date, C.latest_data,
          R.rd, R.group_ID
     from table_1 C
     left outer join table_2 R
     on R.ID = C.ID
     ) CR 
  cross join (
     --create a calendar of days from 1-Jan-2012 to sysdate
     select
          day, trunc(day,'month') as month
     from (
          select rownum-1 + to_date('1-Jan-2012','dd-Mon-yyyy') as day
          from (select 1 from dual)
          connect by level <= trunc(sysdate) - to_date('1-Jan-2012','dd-Mon-yyyy')
          )
     ) cal
  where cal.day >= CR.start_date 
  and cal.day <= least(nvl(CR.end_date, CR.latest_data), CR.latest_data, sysdate) 
  --if the review date hasn't happened yet, then show null 
  and (CR.rd <= cal.day or CR.rd is null) 
  group_ID by CR.ID, CR.start_date, CR.end_date, cal.day, cal.month 
  ;

Table_1 (C) has about 17k records, and is fully indexed (=all units ever monitored; about +500 turnaround per month)

Table_2 (R) has about 330k records, and is indexed on ID and group_ID (=all checks on all units ever done; about 4000+ new per month)

When table_1 and table_2 are left-outer-joined, the result set is about 85k records.

The entire data set and query are stored and running on Oracle 10.2.0.3.

I have tried a number of different configurations for this query, and this one seems to be the most efficient, but it still runs rather slowly -- about 10 minutes per month of data / 2 hours per (fiscal) year of data.

I am currently perusing various Oracle docs and other resources on query tuning, but it's a lot to wade through, and I'm not finding much that directly relates to this kind of situation.

Any suggestions, preferably with explanations and/or references would be very appreciated.

Thanks!

Robert

EDIT: Please see reply 18 and 19 for corrections, and reply 23 for sample data and output.

This post has been answered by Stew Ashton on May 9 2015
Jump to Answer

Comments

Processing
Locked Post
New comments cannot be posted to this locked post.

Post Details

Locked on Jun 6 2015
Added on Feb 25 2015
42 comments
13,878 views