      • 30. Re: Is there a more efficient way to do this conditional cross-join?
        Jonathan Lewis

        Randolf,

         

        Thanks for that; for some reason I always manage to get it wrong - even when I remember to think very carefully about how I got it wrong previously.

         

        Regards

        Jonathan Lewis

        • 31. Re: Is there a more efficient way to do this conditional cross-join?
          Jonathan Lewis

          It looks like Oracle has an error in the times for the output from the ALLSTATS call - I think it's lost the time spent in line 14, the line that probes the sorted data 94,000 times.  (Note it doesn't really SORT the data that many times: it sorts it once on the first pass, then revisits the sorted data in memory.)  It looks like you've changed your earliest date to 152 days ago to be able to run the query in a reasonable time (judging from the A-rows = 153 in line 15 - note also that 94,000 * 153 = 14M, the A-rows in line 14).
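 
          Rowsource statistics of this kind are typically captured with the gather_plan_statistics hint (or statistics_level = all) and reported with dbms_xplan.display_cursor and the ALLSTATS LAST format option - a minimal sketch, using a simple count against table_1 as a stand-in for the query under test:
 
          alter session set statistics_level = all;
 
          select /*+ gather_plan_statistics */ count(*) from table_1;
 
          select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));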

           

          The key feature to note is that your plan joins t1 and t2 first, scaling up from 17,000 rows to 94,000 rows before joining to the date range; James' solution aims to join t1 to the date ranges first, using the smaller number of probes. Chris' suggestion, then, is trying to find a way to aggregate early rather than generating 3.6M rows and aggregating them (I haven't looked at his suggestion, though, so I don't know if it would have any effect).

           

          There is an oddity, though - the 3.6M rows are aggregated down to 250, but Oracle claims that it needs to write 717,000 blocks (5.6GB) and then only reads 1,322 blocks.  Whatever the time missed in line 14, it seems likely that a lot of time is being spent on this surprising amount of writing. Since you're on 10.2.0.3, and there's some use of analytic functions involved, it's possible that this is related to an issue I wrote about a few years ago:  https://jonathanlewis.wordpress.com/2009/09/07/analytic-agony/

           

          If I were in your position, I'd probably be working to test James' solution; but I'd also enable the 10032, 10033, and 10046 traces (as shown in the article) to find out exactly what all the I/O was, and I'd also check the session activity statistics and wait events. This might then encourage me to test the effect of manually setting a much larger sort_area_size to see how that affects the performance.
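 
          A sketch of that session setup (trace levels as typically used; the 100MB sort area is just an illustrative figure):
 
          alter session set events '10032 trace name context forever, level 10';  -- sort statistics
          alter session set events '10033 trace name context forever, level 10';  -- sort run (I/O) details
          alter session set events '10046 trace name context forever, level 8';   -- SQL trace including waits
 
          -- sort_area_size is only honoured when workarea sizing is manual
          alter session set workarea_size_policy = manual;
          alter session set sort_area_size = 104857600;                           -- e.g. 100MB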

           

          Regards

          Jonathan Lewis

          • 32. Re: Is there a more efficient way to do this conditional cross-join?
            rwsBertman

            Jonathan,

            Thanks so much for taking the time to analyze my query.

            Yes, I did reduce the calendar subquery to the last quarter (i.e. since 1-Oct-2014) simply to reduce the exec time for getting stats.

            I am going to test James' and Chris' suggested solutions in more detail as I have time over the next few weeks.

             

            Much thanks to all, this really has been very educational.

            I will mark someone's post as the correct answer when I have had sufficient time to determine my optimal solution; thanks for your patience until then. :-)

            • 33. Re: Is there a more efficient way to do this conditional cross-join?
              chris227

              Don't consider my solution; I probably misread your data and requirements.

              • 34. Re: Re: Is there a more efficient way to do this conditional cross-join?
                Stew Ashton

                James Su wrote:

                 

                Without that sys_guid it will raise this error:

                ORA-01436: CONNECT BY loop in user data

                 

                This is because Oracle doesn't see any difference between the prior and current row.

                There is more to it than that. Blushadow explained it to me a while back: https://community.oracle.com/thread/2601745?start=15&tstart=0

                Yes, "id = prior id" will cause a connect by loop, but you can avoid that by saying "nocycle": the exception will not be raised, but the "connect by" will stop because, as you say, "prior" and "current" are no different.

                Here we not only want to avoid the error, but we want to force the "connect by" to keep going, so we add an artificial unique value to make "prior" and "current" different.

                 

                Here is a simple example:

                 

                > with data as (
                  select 1 n from dual
                  union all
                  select 2 from dual
                )
                select n, level lvl from data
                connect by n = prior n and level <= 2;
                ...
                SQL Error: ORA-01436: CONNECT BY loop in user data
                ...
                > with data as (
                  select 1 n from dual
                  union all
                  select 2 from dual
                )
                select n, level lvl from data
                connect by nocycle n = prior n and level <= 2;

                         N        LVL
                ---------- ----------
                         1          1
                         2          1

                > with data as (
                  select 1 n from dual
                  union all
                  select 2 from dual
                )
                select n, level lvl from data
                connect by n = prior n and level <= 2
                and prior sys_guid() is not null;

                         N        LVL
                ---------- ----------
                         1          1
                         1          2
                         2          1
                         2          2

                • 35. Re: Re: Is there a more efficient way to do this conditional cross-join?
                  Stew Ashton

                  I'm coming late to this party; however, I think James' solution can be improved on: it does two full scans of each table.

                   

                  My approach goes like this:

                  • TABLE_1 contains one row per ID, with a date range;
                  • TABLE_2 contains 0 to N rows per ID, with date ranges that can be calculated;
                  • I join two rows when the date ranges intersect;
                  • After the join, I expand the result to get one row per date.

                  I calculate the date ranges for TABLE_2 using the lead() analytic function.

                  There are special cases when TABLE_2 has 0 rows, or when the TABLE_1 date range starts before the first TABLE_2 date range. I use the lag() analytic function to identify the first TABLE_2 date range.

                  Here is the code:


                  select id, start_date, end_date,
                    column_value day,
                    trunc(column_value,'MM') month,
                    case when column_value < rd then null else rd end review_date,
                    case when column_value < rd then null else group_id end group_id
                  from (
                    select id, start_date, end_date, final_date, group_id, rd,
                      greatest(start_date, nvl2(lag_rd, rd, start_date)) range_start,
                      least(final_date, coalesce(lead_rd, final_date)) range_end
                    from (
                      select id, start_date, end_date,
                        least(
                          coalesce(end_date, latest_data, start_date),
                          coalesce(latest_data, start_date)
                        ) + 1 final_date
                      from table_1
                    )
                    left join (
                      select id, group_id, rd,
                        lag(rd) over(partition by id order by rd) lag_rd,
                        lead(rd) over(partition by id order by rd) lead_rd
                      from table_2
                    ) using (id)
                    where start_date < coalesce(lead_rd, final_date)
                      and final_date > nvl2(lag_rd, rd, start_date)
                  ),
                  table(cast(multiset(
                    select range_start - 1 + level from dual
                    connect by range_start - 1 + level < range_end
                  ) as sys.odcidatelist));
                  
                  

                   

                  And here are the details of the execution plans with 1000 IDs. Notice I do one full scan per table and use much less memory.

                  James:

                  ----------------------------------------------------------------------------------------------------------------------------------
                  | Id  | Operation                         | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
                  ----------------------------------------------------------------------------------------------------------------------------------
                  |   0 | SELECT STATEMENT                  |         |      1 |        |    154K|00:00:01.38 |     166 |       |       |          |
                  |   1 |  VIEW                             | VBEFORE |      1 |   1000 |    154K|00:00:01.38 |     166 |       |       |          |
                  |   2 |   SORT ORDER BY                   |         |      1 |   1000 |    154K|00:00:01.31 |     166 |    11M|  1285K|    9M (0)|
                  |   3 |    WINDOW SORT                    |         |      1 |   1000 |    154K|00:00:03.20 |     166 |    13M|  1384K|   11M (0)|
                  |*  4 |     HASH JOIN OUTER               |         |      1 |   1000 |    154K|00:00:01.41 |     166 |    13M|  2774K|   15M (0)|
                  |   5 |      VIEW                         |         |      1 |   1000 |    154K|00:00:01.95 |      90 |       |       |          |
                  |*  6 |       CONNECT BY WITHOUT FILTERING|         |      1 |        |    154K|00:00:01.00 |      90 | 99328 | 99328 |88064  (0)|
                  |*  7 |        HASH JOIN RIGHT OUTER      |         |      1 |   1000 |   1000 |00:00:00.04 |      90 |  1393K|  1393K| 1475K (0)|
                  |   8 |         VIEW                      |         |      1 |    990 |    432 |00:00:00.02 |      83 |       |       |          |
                  |   9 |          SORT GROUP BY            |         |      1 |    990 |    432 |00:00:00.02 |      83 | 43008 | 43008 |38912  (0)|
                  |* 10 |           HASH JOIN               |         |      1 |    990 |   8385 |00:00:00.11 |      83 |  1557K|  1557K| 1567K (0)|
                  |  11 |            TABLE ACCESS FULL      | TABLE_1 |      1 |   1000 |   1000 |00:00:00.01 |       7 |       |       |          |
                  |  12 |            TABLE ACCESS FULL      | TABLE_2 |      1 |  19795 |  19480 |00:00:00.03 |      76 |       |       |          |
                  |  13 |         TABLE ACCESS FULL         | TABLE_1 |      1 |   1000 |   1000 |00:00:00.01 |       7 |       |       |          |
                  |  14 |      TABLE ACCESS FULL            | TABLE_2 |      1 |  19795 |  19480 |00:00:00.03 |      76 |       |       |          |
                  ----------------------------------------------------------------------------------------------------------------------------------
                  
                  Predicate Information (identified by operation id):
                  ---------------------------------------------------
                  
                     4 - access("R"."RD"=INTERNAL_FUNCTION("T1"."DAY") AND "R"."ID"="T1"."ID")
                     6 - access("C"."ID"=PRIOR NULL)
                     7 - access("C"."ID"="T2"."ID")
                    10 - access("R"."ID"="C"."ID")
                         filter("R"."RD"<"C"."START_DATE")
                  

                   

                  Stew:

                  ------------------------------------------------------------------------------------------------------------------------------------
                  | Id  | Operation                           | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
                  ------------------------------------------------------------------------------------------------------------------------------------
                  |   0 | SELECT STATEMENT                    |         |      1 |        |    154K|00:00:00.94 |      83 |       |       |          |
                  |   1 |  NESTED LOOPS                       |         |      1 |    161M|    154K|00:00:00.94 |      83 |       |       |          |
                  |*  2 |   FILTER                            |         |      1 |        |   3673 |00:00:00.29 |      83 |       |       |          |
                  |*  3 |    HASH JOIN OUTER                  |         |      1 |  19795 |  20034 |00:00:00.18 |      83 |  1229K|  1229K| 1452K (0)|
                  |   4 |     TABLE ACCESS FULL               | TABLE_1 |      1 |   1000 |   1000 |00:00:00.01 |       7 |       |       |          |
                  |   5 |     VIEW                            |         |      1 |  19795 |  19480 |00:00:00.10 |      76 |       |       |          |
                  |   6 |      WINDOW SORT                    |         |      1 |  19795 |  19480 |00:00:00.08 |      76 |   903K|   523K|  802K (0)|
                  |   7 |       TABLE ACCESS FULL             | TABLE_2 |      1 |  19795 |  19480 |00:00:00.02 |      76 |       |       |          |
                  |   8 |   COLLECTION ITERATOR SUBQUERY FETCH|         |   3673 |   8168 |    154K|00:00:00.73 |       0 |       |       |          |
                  |   9 |    CONNECT BY WITHOUT FILTERING     |         |   3673 |        |    154K|00:00:00.48 |       0 |  2048 |  2048 | 2048  (0)|
                  |  10 |     FAST DUAL                       |         |   3673 |      1 |   3673 |00:00:00.01 |       0 |       |       |          |
                  ------------------------------------------------------------------------------------------------------------------------------------
                  
                  Predicate Information (identified by operation id):
                  ---------------------------------------------------
                  
                     2 - filter(("START_DATE"<COALESCE("from$_subquery$_005"."LEAD_RD",LEAST(COALESCE("END_DATE","LATEST_DATA","START_DATE"),C
                                OALESCE("LATEST_DATA","START_DATE"))+1) AND LEAST(COALESCE("END_DATE","LATEST_DATA","START_DATE"),COALESCE("LATEST_DATA","ST
                                ART_DATE"))+1>NVL2("from$_subquery$_005"."LAG_RD","from$_subquery$_005"."RD","START_DATE")))
                     3 - access("ID"="from$_subquery$_005"."ID")
                  
                  • 36. Re: Is there a more efficient way to do this conditional cross-join?
                    rwsBertman

                    Stew,

                    Thanks for adding your input, always welcome. It looks interesting, I will investigate it.

                     

                    chris227,

                    I will consider your solution anyway, even if it needs modification to comply with the requirements. Thanks for your input.

                     

                    My delay in getting back to this thread is due to a server upgrade and an Oracle version upgrade (to 11gR2), so it might be a little while longer yet...

                    • 37. Re: Is there a more efficient way to do this conditional cross-join?
                      rwsBertman

                      Well, I still don't have permissions to do performance analysis on the new server yet, but in the meantime, I had a brainstorm.

                       

                      I did some research on managing historical and temporal data and, among the plethora of resources, I came across this blog: https://www.salvis.com/blog/2012/12/24/joining-temporal-intervals/ (and following pages). Yes, it's a bit dated, but his example shed new light on how to approach my query.

                       

                      Essentially, what I'm trying to do is a temporal join, where dates in one set need to coincide with dates in the other set.

                      If I consider the start and end dates of each unit as one interval, and the times between each unit's reviews as another set of intervals, then I can build a temporal join the same way Philipp did in his blog example:

                       

                        SELECT
                          t1.ID, t1.start_date, t1.end_date, t1.latest_data,
                          t2.rd, t2.next_rd, t2.group_ID,
                          --the first valid_from date from the temporal join is the start of each unit's valid interval ( = start_date from table_1)
                          MIN(v.valid_from) AS valid_from,
                          --calculate the end of the unit's valid interval based on my business rules
                          --( = the earliest of next_RD, end_date, and latest_data, for whichever ones exist)
                          LEAST(COALESCE(next_RD, end_date, latest_data), nvl(end_date, latest_data), latest_data) AS valid_to
                        FROM table_1 t1
                      
                        JOIN (
                          --collect all start_dates and review dates from both tables, as interval start dates
                          SELECT ID, start_date as valid_from FROM table_1
                          UNION ALL
                          SELECT ID, rd as valid_from FROM table_2
                          ) v
                          --for each unit,
                          ON v.ID = t1.ID
                          --find all the interval start dates for the unit (across both data sets) that occur while the unit is active
                          AND v.valid_from BETWEEN t1.start_date AND NVL(t1.end_date, DATE '9999-12-31')
                      
                        --if a review exists, find all the interval start dates when it was valid
                        LEFT OUTER JOIN (
                          select id, rd, group_id,
                            --look ahead to the day before the next review date for the unit to create a valid date range for each review
                            lead(rd,1) over(partition by ID order by rd)-1 as next_rd
                          from table_2
                          ) t2
                          --for each unit,
                          ON v.ID = t2.ID
                          --find all the interval start dates for the unit (across both data sets) that occur between each review date
                          AND v.valid_from BETWEEN t2.rd AND NVL(t2.next_rd, DATE '9999-12-31')
                      
                        GROUP BY
                          t1.ID, t1.start_date, t1.end_date, t1.latest_data,
                          t2.rd, t2.next_rd, t2.group_ID
                        ORDER BY ID, start_date, rd, valid_from, valid_to;
                      

                       

                      The result is a kind of history table, with sequential valid_from and valid_to dates for each record for each unit, based on start, end, review, and latest_data dates.

                      The most recent review date before a unit's start date is automatically selected, along with the correct group ID for each unit's review date, without requiring any (additional) analytic functions. :-)

                       

                      Note that Philipp's example depends on there being no gaps between intervals in either data set.

                      Since each unit in my table_1 has only one interval, this condition is not a problem. In table_2, the condition is easily accommodated by using LEAD to find the next review date for each unit, creating adjacent sequential review intervals.
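 
                      In isolation, that interval-building step amounts to this (a minimal sketch, reusing the lead() call from the query above; valid_to is null for each unit's latest review):
 
                        select id, rd as valid_from,
                          lead(rd,1) over(partition by ID order by rd)-1 as valid_to
                        from table_2;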

                       

                      Now this history table can be joined to a generated calendar table without needing any grouping or sorting or analytic functions at the calendar join level, which saves a bleepload of time and processing resources:

                        SELECT /*+ PARALLEL(cal) */
                          ID, start_date, end_date, latest_data, rd, group_ID,
                          cal.day
                        FROM (
                          --create a calendar of days from 1-Jan-2012 to sysdate
                          SELECT TO_DATE('1-Jan-2012')+level-1 as day from dual
                          CONNECT BY LEVEL <= (TRUNC(SYSDATE) - TO_DATE('1-Jan-2012'))+1
                          ) cal
                        JOIN (
                          --all of the above query, except without the ORDER BY clause
                          ) h
                        ON cal.day BETWEEN h.valid_from AND h.valid_to;
                      

                       

                      Implementation of this query structure (with minor tweaks) using 3 years' worth of production data improved the performance of the original query from well over 3.5 hours down to about 50 minutes (or less, if you don't count the time to rebuild indexes).

                      It also enabled processing of more than just 3 months of data at a time without choking on out-of-temp-tablespace errors. (I haven't tried, but it probably would be able to do the whole table of 3 years at once.)
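 
                      (As an aside, temp consumption during a long run can be watched from a second session with something like this - a sketch, assuming the usual v$ view privileges:)
 
                        SELECT s.sid, u.tablespace, u.segtype,
                          ROUND(u.blocks * t.block_size / 1024 / 1024) AS mb
                        FROM v$tempseg_usage u
                        JOIN v$session s ON s.saddr = u.session_addr
                        JOIN dba_tablespaces t ON t.tablespace_name = u.tablespace;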

                       

                      EXPLAIN PLAN on the production-data query:

                      Plan hash value: 4260020065
                      ------------------------------------------------------------------------------------------------------
                      | Id  | Operation                      | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
                      ------------------------------------------------------------------------------------------------------
                      |   0 | SELECT STATEMENT               |             |     2 |  4206 |       |   255   (2)| 00:00:04 |
                      |   1 |  NESTED LOOPS                  |             |     2 |  4206 |       |   255   (2)| 00:00:04 |
                      |   2 |   VIEW                         |             |     1 |     6 |       |     2   (0)| 00:00:01 |
                      |*  3 |    CONNECT BY WITHOUT FILTERING|             |       |       |       |            |          |
                      |   4 |     FAST DUAL                  |             |     1 |       |       |     2   (0)| 00:00:01 |
                      |*  5 |   VIEW                         |             |     2 |  4194 |       |  3645   (2)| 00:00:44 |
                      |   6 |    SORT GROUP BY               |             |   886 | 53160 |       |  3645   (2)| 00:00:44 |
                      |*  7 |     HASH JOIN OUTER            |             |   886 | 53160 |       |  3644   (2)| 00:00:44 |
                      |*  8 |      HASH JOIN                 |             |   886 | 33668 |       |  1191   (1)| 00:00:15 |
                      |   9 |       MAT_VIEW ACCESS FULL     | TEMP_1_LIVE | 17599 |   429K|       |   141   (1)| 00:00:02 |
                      |  10 |       VIEW                     |             |   303K|  3856K|       |  1047   (1)| 00:00:13 |
                      |  11 |        UNION-ALL               |             |       |       |       |            |          |
                      |  12 |         MAT_VIEW ACCESS FULL   | TEMP_1_LIVE | 17599 |   223K|       |   141   (1)| 00:00:02 |
                      |  13 |         MAT_VIEW ACCESS FULL   | TEMP_2_LIVE |   286K|  3633K|       |   913   (2)| 00:00:11 |
                      |  14 |      VIEW                      |             |   286K|  6149K|       |  2450   (2)| 00:00:30 |
                      |  15 |       WINDOW SORT              |             |   286K|  4472K|    15M|  2450   (2)| 00:00:30 |
                      |  16 |        MAT_VIEW ACCESS FULL    | TEMP_2_LIVE |   286K|  4472K|       |   913   (2)| 00:00:11 |
                      ------------------------------------------------------------------------------------------------------
                      Predicate Information (identified by operation id):
                      ---------------------------------------------------
                         3 - filter(LEVEL<=TRUNC(SYSDATE@!)-TO_DATE('2012-01-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss')+1)
                         5 - filter("H"."VALID_FROM"<=INTERNAL_FUNCTION("CAL"."DAY") AND
                                    "H"."VALID_TO">=INTERNAL_FUNCTION("CAL"."DAY"))
                         7 - access("V"."ID"="T2"."ID"(+))
                             filter("V"."VALID_FROM"<=NVL("T2"."NEXT_RD"(+),TO_DATE('9999-12-31 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))
                                    AND "V"."VALID_FROM">="T2"."RD"(+))
                         8 - access("V"."ID"="T1"."ID")
                             filter("V"."VALID_FROM">="START_DATE" AND
                                    "V"."VALID_FROM"<=NVL("END_DATE",TO_DATE('9999-12-31 00:00:00', 'yyyy-mm-dd hh24:mi:ss')))
                      

                       

                      I expect there are still a number of other optimization techniques and tweaks and such that could be done to improve it some more...

                      Any thoughts?

                      • 38. Re: Re: Is there a more efficient way to do this conditional cross-join?
                        Randolf Geist

                        rwsBertman wrote:

                         

                        I expect there are still a number of other optimization techniques and tweaks and such that could be done to improve it some more...

                        Any thoughts?

                        I haven't spent much time looking into the logic, but looking at the execution plan posted, there is a Nested Loop join driven by your cal(endar) subquery based on DUAL - however, the optimizer doesn't have a clue how many rows (days) your subquery will produce: it estimates one row, so it assumes it will iterate over the complex inline view exactly once.

                         

                        Therefore you could try to provide more information to the optimizer, by using a suitable CARDINALITY hint in the "cal" subquery:

                         

                        FROM (
                            --create a calendar of days from 1-Jan-2012 to sysdate
                            SELECT /*+ CARDINALITY(1000) NO_MERGE */ TO_DATE('1-Jan-2012')+level-1 as day from dual
                            CONNECT BY LEVEL <= (TRUNC(SYSDATE) - TO_DATE('1-Jan-2012'))+1
                            ) cal

                         

                        where 1000 is of course just a rough ballpark figure but much better than the estimate of 1 row. The CARDINALITY in that form is only relevant as long as the query block isn't merged, so you could add an explicit NO_MERGE hint to make this more obvious (and safe in case of upgrades), but at present this subquery cannot be merged anyway, so this is optional.

                         

                        This way the Nested Loop join might turn into a Merge/Sort Join, which sorts both row sources and then performs an index-like lookup from one row source into the other. That is not much different from the iterations the Nested Loop join performs, but it is possibly more efficient, because the complex view is evaluated only once and the lookups go against pre-sorted data held in the PGA.
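 
                        If the improved estimate alone doesn't change the join method, the idea can also be tested directly - a sketch that requests the merge join with hints, keeping the cal/h aliases from the query above:
 
                        SELECT /*+ LEADING(cal) USE_MERGE(h) */
                          ID, start_date, end_date, latest_data, rd, group_ID,
                          cal.day
                        FROM (
                          --the calendar subquery, including the CARDINALITY hint
                          ) cal
                        JOIN (
                          --the history subquery, as before
                          ) h
                        ON cal.day BETWEEN h.valid_from AND h.valid_to;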

                         

                        Whether this really is going to improve performance depends on a number of things, but it's at least worth a try.

                         

                        Randolf

                        • 39. Re: Re: Is there a more efficient way to do this conditional cross-join?
                          Stew Ashton

                          Please don't do that.

                           

                          You don't need to join to an artificial calendar table. I showed you another way using

                           

                          table(cast(multiset(  
                            select range_start - 1 + level from dual  
                            connect by range_start - 1 + level < range_end  
                          ) as sys.odcidatelist));  
                          
                          

                           

                          But that's not the big problem. Your first query produces some overlapping date ranges: that can't be what you want.

                           

                          We all need to start over from the same place.

                          • Please provide CREATE TABLE statements with constraints (NOT NULL, primary and foreign keys) so we know the data is consistent
                          • Please provide minimal test data that shows one example of each condition (for example first review date earlier than start date, or later, or the same)
                          • Show the expected output.
                          • As a bonus, say exactly what is wrong with James' and my solutions.

                          I guarantee you a much faster solution with fewer lines of SQL, provided you share everything needed and give feedback about our solutions.

                          You have world-renowned experts analyzing your execution plans to see what is going on. That is awesome, but it may be a sign that your SQL needs more than tuning or adjusting - it may just need to be rewritten from scratch.

                           

                          P.S. The solution I posted earlier does not produce the results I intended. There is a bug in the JOIN condition. I'll post something I hope is correct once new input data is available.

                          • 40. Re: Is there a more efficient way to do this conditional cross-join?
                            rwsBertman

                            Stew,

                             

                            Thank you for your insights, I appreciate your help.

                            I do realize the quality of the help and analysis provided so far on this topic, and I am very grateful. I have learned a lot from this.

                             

                            There is nothing really "wrong" with any of the solutions provided so far; they all have their merits, and they are all capable of providing the required output. I haven't had the means to test and compare the solutions on the new server yet, but I will certainly post my findings when I do.

                             

                            My last post was simply an attempt to solve the problem from a different angle, and it seems to be a significant improvement on the original query, so why do you say "Please don't do that"? What specifically are you referring to that I shouldn't do? (If you are referring to the calendar table, I didn't see any significant performance difference in the query when using your sys.odcidatelist suggestion... is it more efficient in some way? I am more than willing to use it if it is.)

                             

                            Please accept my apologies for the delays between posts, and lack of timely feedback, as I have had other priorities to attend to. I will provide object SQL, minimal test data, and expected results when I have the time and the means to do so again.

                            • 41. Re: Is there a more efficient way to do this conditional cross-join?
                              Stew Ashton

                              rwsBertman wrote:

                               

                              ...why do you say "Please don't do that"? What specifically are you referring to that I shouldn't do?

                              • As I said in my previous post: "Your first query produces some overlapping date ranges: that can't be what you want." Right now I'm not sure which bits of your query need to be changed to fix this.
                              • It is better to full scan each table once, not twice. My solution aims for this.
                              • It seems more efficient to get the answer in date range form, then expand - rather than expand and then get the answer, especially if expanding generates unneeded rows.
                              • My specific use of TABLE(CAST(MULTISET(... is not quite as efficient as James Su's START WITH...CONNECT BY technique, though either one will work (see the sketch below).
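 
                              Side by side, on a hypothetical ranges table holding one (id, start_dt, end_dt) row per id, the two expansion techniques look like this (a sketch, not tested against your data):
 
                                --(a) collection approach: one CONNECT BY execution per input row
                                select r.id, c.column_value as day
                                from ranges r,
                                     table(cast(multiset(
                                       select r.start_dt - 1 + level from dual
                                       connect by r.start_dt - 1 + level < r.end_dt
                                     ) as sys.odcidatelist)) c;
 
                                --(b) a single CONNECT BY over the whole row source, in the spirit of James Su's technique
                                select id, start_dt + level - 1 as day
                                from ranges
                                connect by id = prior id
                                       and start_dt + level - 1 < end_dt
                                       and prior sys_guid() is not null;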

                              Please accept my apologies for the delays between posts, and lack of timely feedback, as I have had other priorities to attend to. I will provide object SQL, minimal test data, and expected results when I have the time and the means to do so again.

                              No apologies needed. I appreciate your coming back to dialog; it allows me to discuss ideas that interest me and get feedback from real people.

                              If you're interested, I wrote a blog post explaining my general approach to your question: https://stewashton.wordpress.com/2015/03/21/join-tables-on-date-ranges/

                              • 42. Re: Re: Is there a more efficient way to do this conditional cross-join?
                                Stew Ashton

                                Since you were kind enough to reply, I am going to try to create test conditions myself. Please correct me as I go along.

                                 

                                First, here are James Su's CREATE TABLE statements with added constraints to make my assumptions clear. Please let me know if there is too much or too little.

                                 

                                drop table table_1 cascade constraints purge;
                                
                                create table table_1 (
                                  ID number primary key,
                                  start_date date not null,
                                  end_date date,
                                  latest_data date not null,
                                  check(end_date >= start_date),
                                  check(latest_data >= start_date)
                                );
                                
                                drop table table_2 purge;
                                
                                create table table_2 (
                                  ID references table_1,
                                  rd date,
                                  group_ID varchar2(3) not null,
                                  primary key(id, rd)
                                );
                                

                                 

                                I believe in table_1 you use INCLUSIVE end_date and latest_data, that is you want a row in the output for that day.

                                 

                                Next, here is test data that covers different conditions. See the blog post I referenced above for the list of conditions. When ID = 0, that is a special condition where there is no row in TABLE_2 at all.

                                 

                                insert into table_1
                                select rownum-1 id,
                                trunc(sysdate,'YY') start_date,
                                case mod(level,3)
                                  when 0 then trunc(sysdate,'YY')+5
                                  when 1 then trunc(sysdate,'YY')+6
                                end end_date,
                                case mod(level,3)
                                  when 0 then trunc(sysdate,'YY')+6
                                  else trunc(sysdate,'YY')+5
                                end latest_data
                                from dual
                                connect by level <= 14;
                                
                                insert into table_2
                                with t1 as (
                                  select id, start_date s,
                                  least(nvl(end_date, latest_data), latest_data, trunc(sysdate))+1 f
                                  from table_1
                                  where id > 0
                                  order by id
                                )
                                select id,
                                case id
                                  when 1 then f+1
                                  when 2 then f
                                  when 3 then f-2
                                  when 4 then f-2
                                  when 5 then f-4
                                  when 6 then s
                                  when 7 then s
                                  when 8 then s
                                  when 9 then s-2
                                  when 10 then s-2
                                  when 11 then s-1
                                  when 12 then s-2
                                  when 13 then s-4
                                end rd,
                                case id
                                  when 1 then 'PRa'
                                  when 2 then 'MEa'
                                  when 3 then 'OVa'
                                  when 4 then 'FIa'
                                  when 5 then 'COa'
                                  when 6 then 'STa'
                                  when 7 then 'EQa'
                                  when 8 then 'SBa'
                                  when 9 then 'DUa'
                                  when 10 then 'FBa'
                                  when 11 then 'OBa'
                                  when 12 then 'MTa'
                                  when 13 then 'PBa'
                                end group_id
                                from t1
                                union all
                                select id,
                                case id
                                  when 1 then f+3
                                  when 2 then f+2
                                  when 3 then f+2
                                  when 4 then f
                                  when 5 then f-2
                                  when 6 then f+2
                                  when 7 then f
                                  when 8 then s+2
                                  when 9 then f+2
                                  when 10 then f
                                  when 11 then s+2
                                  when 12 then s
                                  when 13 then s-2
                                end rd,
                                case id
                                  when 1 then 'PRb'
                                  when 2 then 'MEb'
                                  when 3 then 'OVb'
                                  when 4 then 'FIb'
                                  when 5 then 'COb'
                                  when 6 then 'STb'
                                  when 7 then 'EQb'
                                  when 8 then 'SBb'
                                  when 9 then 'DUb'
                                  when 10 then 'FBb'
                                  when 11 then 'OBb'
                                  when 12 then 'MTb'
                                  when 13 then 'PBb'
                                end group_id
                                from t1;
                                

                                 

                                Finally, here is my new proposal.

                                • T1 is simply TABLE_1 with the logic about what the final date in the date range should be. I add 1 to make the final date EXCLUSIVE.
                                • T2 is TABLE_2 with a date range: end_rd is an EXCLUSIVE end date found using the LEAD() function. I make the last end_rd a real date to avoid dealing with NULL in later comparisons.
                                • RANGES is the result of a left join of T1 and T2 when the date ranges overlap.
                                • In the final SELECT, I use James Su's technique to generate a row for each day.
                                • As for efficiency:
                                  • each table is scanned only once;
                                  • the join creates a minimum of rows;
                                  • the final SELECT generates only the needed rows.

                                alter session set nls_date_format = 'RR-MM-DD';
                                
                                with t1 as (
                                  select id, start_date,
                                  least(nvl(end_date, latest_data), latest_data, trunc(sysdate))+1 final_date
                                  from table_1
                                )
                                , t2 as (
                                  select id, rd, group_id,
                                  lead(rd,1,date '9999-12-31') over (partition by id order by rd) end_rd,
                                  lag(rd) over (partition by id order by rd) prev_rd
                                  from table_2
                                )
                                , ranges as (
                                  select T1.*, rd, group_id, prev_rd,
                                  case when prev_rd is null
                                    then start_date
                                    else greatest(start_date, rd)
                                  end range_start,
                                  case when end_rd is null
                                    then final_date
                                    else least(final_date, end_rd)
                                  end range_end
                                  from t1 left join t2
                                  on t1.id = t2.id and t1.start_date < t2.end_rd and t1.final_date > t2.rd
                                )
                                select id, start_date, final_date,
                                range_start+level-1 thisday,
                                case when range_start+level-1 >= rd then rd end display_rd,
                                case when range_start+level-1 >= rd then group_id end display_group_id
                                from ranges a
                                connect by id = prior id and range_start = prior range_start
                                and range_start+level-1 < range_end
                                and prior sys_guid() is not null
                                order siblings by id, range_start;
                                

                                 

                                  ID START_DATE FINAL_DATE THISDAY  DISPLAY_RD DISPLAY_GROUP_ID
                                 --- ---------- ---------- -------- ---------- ----------------
                                   0 15-01-01   15-01-07   15-01-01
                                   0 15-01-01   15-01-07   15-01-02
                                   0 15-01-01   15-01-07   15-01-03
                                   0 15-01-01   15-01-07   15-01-04
                                   0 15-01-01   15-01-07   15-01-05
                                   0 15-01-01   15-01-07   15-01-06
                                   1 15-01-01   15-01-07   15-01-01
                                   1 15-01-01   15-01-07   15-01-02
                                   1 15-01-01   15-01-07   15-01-03
                                   1 15-01-01   15-01-07   15-01-04
                                   1 15-01-01   15-01-07   15-01-05
                                   1 15-01-01   15-01-07   15-01-06
                                   2 15-01-01   15-01-07   15-01-01
                                   2 15-01-01   15-01-07   15-01-02
                                   2 15-01-01   15-01-07   15-01-03
                                   2 15-01-01   15-01-07   15-01-04
                                   2 15-01-01   15-01-07   15-01-05
                                   2 15-01-01   15-01-07   15-01-06
                                   3 15-01-01   15-01-07   15-01-01
                                   3 15-01-01   15-01-07   15-01-02
                                   3 15-01-01   15-01-07   15-01-03
                                   3 15-01-01   15-01-07   15-01-04
                                   3 15-01-01   15-01-07   15-01-05 15-01-05   OVa
                                   3 15-01-01   15-01-07   15-01-06 15-01-05   OVa
                                   4 15-01-01   15-01-07   15-01-01
                                   4 15-01-01   15-01-07   15-01-02
                                   4 15-01-01   15-01-07   15-01-03
                                   4 15-01-01   15-01-07   15-01-04
                                   4 15-01-01   15-01-07   15-01-05 15-01-05   FIa
                                   4 15-01-01   15-01-07   15-01-06 15-01-05   FIa
                                   5 15-01-01   15-01-07   15-01-01
                                   5 15-01-01   15-01-07   15-01-02
                                   5 15-01-01   15-01-07   15-01-03 15-01-03   COa
                                   5 15-01-01   15-01-07   15-01-04 15-01-03   COa
                                   5 15-01-01   15-01-07   15-01-05 15-01-05   COb
                                   5 15-01-01   15-01-07   15-01-06 15-01-05   COb
                                   6 15-01-01   15-01-07   15-01-01 15-01-01   STa
                                   6 15-01-01   15-01-07   15-01-02 15-01-01   STa
                                   6 15-01-01   15-01-07   15-01-03 15-01-01   STa
                                   6 15-01-01   15-01-07   15-01-04 15-01-01   STa
                                   6 15-01-01   15-01-07   15-01-05 15-01-01   STa
                                   6 15-01-01   15-01-07   15-01-06 15-01-01   STa
                                   7 15-01-01   15-01-07   15-01-01 15-01-01   EQa
                                   7 15-01-01   15-01-07   15-01-02 15-01-01   EQa
                                   7 15-01-01   15-01-07   15-01-03 15-01-01   EQa
                                   7 15-01-01   15-01-07   15-01-04 15-01-01   EQa
                                   7 15-01-01   15-01-07   15-01-05 15-01-01   EQa
                                   7 15-01-01   15-01-07   15-01-06 15-01-01   EQa
                                   8 15-01-01   15-01-07   15-01-01 15-01-01   SBa
                                   8 15-01-01   15-01-07   15-01-02 15-01-01   SBa
                                   8 15-01-01   15-01-07   15-01-03 15-01-03   SBb
                                   8 15-01-01   15-01-07   15-01-04 15-01-03   SBb
                                   8 15-01-01   15-01-07   15-01-05 15-01-03   SBb
                                   8 15-01-01   15-01-07   15-01-06 15-01-03   SBb
                                   9 15-01-01   15-01-07   15-01-01 14-12-30   DUa
                                   9 15-01-01   15-01-07   15-01-02 14-12-30   DUa
                                   9 15-01-01   15-01-07   15-01-03 14-12-30   DUa
                                   9 15-01-01   15-01-07   15-01-04 14-12-30   DUa
                                   9 15-01-01   15-01-07   15-01-05 14-12-30   DUa
                                   9 15-01-01   15-01-07   15-01-06 14-12-30   DUa
                                  10 15-01-01   15-01-07   15-01-01 14-12-30   FBa
                                  10 15-01-01   15-01-07   15-01-02 14-12-30   FBa
                                  10 15-01-01   15-01-07   15-01-03 14-12-30   FBa
                                  10 15-01-01   15-01-07   15-01-04 14-12-30   FBa
                                  10 15-01-01   15-01-07   15-01-05 14-12-30   FBa
                                  10 15-01-01   15-01-07   15-01-06 14-12-30   FBa
                                  11 15-01-01   15-01-07   15-01-01 14-12-31   OBa
                                  11 15-01-01   15-01-07   15-01-02 14-12-31   OBa
                                  11 15-01-01   15-01-07   15-01-03 15-01-03   OBb
                                  11 15-01-01   15-01-07   15-01-04 15-01-03   OBb
                                  11 15-01-01   15-01-07   15-01-05 15-01-03   OBb
                                  11 15-01-01   15-01-07   15-01-06 15-01-03   OBb
                                  12 15-01-01   15-01-07   15-01-01 15-01-01   MTb
                                  12 15-01-01   15-01-07   15-01-02 15-01-01   MTb
                                  12 15-01-01   15-01-07   15-01-03 15-01-01   MTb
                                  12 15-01-01   15-01-07   15-01-04 15-01-01   MTb
                                  12 15-01-01   15-01-07   15-01-05 15-01-01   MTb
                                  12 15-01-01   15-01-07   15-01-06 15-01-01   MTb
                                  13 15-01-01   15-01-07   15-01-01 14-12-30   PBb
                                  13 15-01-01   15-01-07   15-01-02 14-12-30   PBb
                                  13 15-01-01   15-01-07   15-01-03 14-12-30   PBb
                                  13 15-01-01   15-01-07   15-01-04 14-12-30   PBb
                                  13 15-01-01   15-01-07   15-01-05 14-12-30   PBb
                                  13 15-01-01   15-01-07   15-01-06 14-12-30   PBb