Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Is there a more efficient way to do this conditional cross-join?

rwsBertmanFeb 25 2015 — edited May 9 2015

Abstracting the situation: we are monitoring the condition of a variety of units, looking at each unit for a recorded period of time. At any single point in time, there may be nearly 8000 units currently being monitored.

Every unit has an ID (required), a start_date (required), and end_date (may be null, i.e. still being monitored), and the date of the most recent info collected for the unit (latest_data; equal to some date later than start_date).

Every unit gets a periodic check (review), which classifies it into a group, identified by group_ID.

If the unit has not been checked yet, then review_date and group_ID are both null.

What I need to do (for the data warehouse) is create a record of the details for each unit for every day of its life in the system, including its most recent review date and current group_ID.

If the unit is still being monitored, i.e. does not have an end_date, then the daily records should continue up to the date of the most recent info collected for the unit, or sysdate if no data was received yet.

  select
     CR.ID, CR.start_date, CR.end_date,
     cal.day, cal.month,
     max(CR.rd) as review_date, --get the most recent review date
     max(CR.group_ID) keep(dense_rank last order by rd nulls first) as group_ID --get the most recent group_ID
  from (
     select
          C.ID, C.start_date, C.end_date, C.latest_data,
          R.rd, R.group_ID
     from table_1 C
     left outer join table_2 R
     on R.ID = C.ID
     ) CR 
  cross join (
     --create a calendar of days from 1-Jan-2012 to sysdate
     select
          day, trunc(day,'month') as month
     from (
          select rownum-1 + to_date('1-Jan-2012','dd-Mon-yyyy') as day
          from (select 1 from dual)
          connect by level <= trunc(sysdate) - to_date('1-Jan-2012','dd-Mon-yyyy')
          )
     ) cal
  where cal.day >= CR.start_date 
  and cal.day <= least(nvl(CR.end_date, CR.latest_data), CR.latest_data, sysdate) 
  --if the review date hasn't happened yet, then show null 
  and (CR.rd <= cal.day or CR.rd is null) 
  group_ID by CR.ID, CR.start_date, CR.end_date, cal.day, cal.month 
  ;

Table_1 (C) has about 17k records, and is fully indexed (=all units ever monitored; about +500 turnaround per month)

Table_2 (R) has about 330k records, and is indexed on ID and group_ID (=all checks on all units ever done; about 4000+ new per month)

When table_1 and table_2 are left-outer-joined, the result set is about 85k records.

The entire data set and query are stored and running on Oracle 10.2.0.3.

I have tried a number of different configurations for this query, and this one seems to be the most efficient, but it still runs rather slowly -- about 10 minutes per month of data / 2 hours per (fiscal) year of data.

I am currently perusing various Oracle docs and other resources on query tuning, but it's a lot to wade through, and I'm not finding much that directly relates to this kind of situation.

Any suggestions, preferably with explanations and/or references would be very appreciated.

Thanks!

Robert

EDIT: Please see reply 18 and 19 for corrections, and reply 23 for sample data and output.

This post has been answered by Stew Ashton on May 9 2015
Jump to Answer

Comments

Locked Post
New comments cannot be posted to this locked post.

Post Details

Locked on Jun 6 2015
Added on Feb 25 2015
42 comments
14,076 views