Abstracting the situation: we are monitoring the condition of a variety of units, looking at each unit for a recorded period of time. At any single point in time, there may be nearly 8000 units currently being monitored.
Every unit has an ID (required), a start_date (required), and end_date (may be null, i.e. still being monitored), and the date of the most recent info collected for the unit (latest_data; equal to some date later than start_date).
Every unit gets a periodic check (review), which classifies it into a group, identified by group_ID.
If the unit has not been checked yet, then review_date and group_ID are both null.
What I need to do (for the data warehouse) is create a record of the details for each unit for every day of its life in the system, including its most recent review date and current group_ID.
If the unit is still being monitored, i.e. does not have an end_date, then the daily records should continue up to the date of the most recent info collected for the unit, or sysdate if no data was received yet.
select
CR.ID, CR.start_date, CR.end_date,
cal.day, cal.month,
max(CR.rd) as review_date, --get the most recent review date
max(CR.group_ID) keep(dense_rank last order by rd nulls first) as group_ID --get the most recent group_ID
from (
select
C.ID, C.start_date, C.end_date, C.latest_data,
R.rd, R.group_ID
from table_1 C
left outer join table_2 R
on R.ID = C.ID
) CR
cross join (
--create a calendar of days from 1-Jan-2012 to sysdate
select
day, trunc(day,'month') as month
from (
select rownum-1 + to_date('1-Jan-2012','dd-Mon-yyyy') as day
from (select 1 from dual)
connect by level <= trunc(sysdate) - to_date('1-Jan-2012','dd-Mon-yyyy')
)
) cal
where cal.day >= CR.start_date
and cal.day <= least(nvl(CR.end_date, CR.latest_data), CR.latest_data, sysdate)
--if the review date hasn't happened yet, then show null
and (CR.rd <= cal.day or CR.rd is null)
group_ID by CR.ID, CR.start_date, CR.end_date, cal.day, cal.month
;
Table_1 (C) has about 17k records, and is fully indexed (=all units ever monitored; about +500 turnaround per month)
Table_2 (R) has about 330k records, and is indexed on ID and group_ID (=all checks on all units ever done; about 4000+ new per month)
When table_1 and table_2 are left-outer-joined, the result set is about 85k records.
The entire data set and query are stored and running on Oracle 10.2.0.3.
I have tried a number of different configurations for this query, and this one seems to be the most efficient, but it still runs rather slowly -- about 10 minutes per month of data / 2 hours per (fiscal) year of data.
I am currently perusing various Oracle docs and other resources on query tuning, but it's a lot to wade through, and I'm not finding much that directly relates to this kind of situation.
Any suggestions, preferably with explanations and/or references would be very appreciated.
Thanks!
Robert
EDIT: Please see reply 18 and 19 for corrections, and reply 23 for sample data and output.