1 2 3 Previous Next 42 Replies Latest reply on May 9, 2015 11:20 AM by Stew Ashton

    Is there a more efficient way to do this conditional cross-join?

    rwsBertman

      Abstracting the situation: we are monitoring the condition of a variety of units, looking at each unit for a recorded period of time. At any single point in time, there may be nearly 8000 units currently being monitored.

      Every unit has an ID (required), a start_date (required), and end_date (may be null, i.e. still being monitored), and the date of the most recent info collected for the unit (latest_data; equal to some date later than start_date).

      Every unit gets a periodic check (review), which classifies it into a group, identified by group_ID.

      If the unit has not been checked yet, then review_date and group_ID are both null.

       

      What I need to do (for the data warehouse) is create a record of the details for each unit for every day of its life in the system, including its most recent review date and current group_ID.

      If the unit is still being monitored, i.e. does not have an end_date, then the daily records should continue up to the date of the most recent info collected for the unit, or sysdate if no data was received yet.

       

        select
           CR.ID, CR.start_date, CR.end_date,
           cal.day, cal.month,
           max(CR.rd) as review_date, --get the most recent review date
           max(CR.group_ID) keep(dense_rank last order by rd nulls first) as group_ID --get the most recent group_ID
        from (
           select
                C.ID, C.start_date, C.end_date, C.latest_data,
                R.rd, R.group_ID
           from table_1 C
           left outer join table_2 R
           on R.ID = C.ID
           ) CR 
        cross join (
           --create a calendar of days from 1-Jan-2012 to sysdate
           select
                day, trunc(day,'month') as month
           from (
                select rownum-1 + to_date('1-Jan-2012','dd-Mon-yyyy') as day
                from (select 1 from dual)
                connect by level <= trunc(sysdate) - to_date('1-Jan-2012','dd-Mon-yyyy')
                )
           ) cal
        where cal.day >= CR.start_date 
        and cal.day <= least(nvl(CR.end_date, CR.latest_data), CR.latest_data, sysdate) 
        --if the review date hasn't happened yet, then show null 
        and (CR.rd <= cal.day or CR.rd is null) 
        group_ID by CR.ID, CR.start_date, CR.end_date, cal.day, cal.month 
        ;
      

       

      Table_1 (C) has about 17k records, and is fully indexed (=all units ever monitored; about +500 turnaround per month)

      Table_2 (R) has about 330k records, and is indexed on ID and group_ID (=all checks on all units ever done; about 4000+ new per month)

      When table_1 and table_2 are left-outer-joined, the result set is about 85k records.

      The entire data set and query are stored and running on Oracle 10.2.0.3.

       

      I have tried a number of different configurations for this query, and this one seems to be the most efficient, but it still runs rather slowly -- about 10 minutes per month of data / 2 hours per (fiscal) year of data.

      I am currently perusing various Oracle docs and other resources on query tuning, but it's a lot to wade through, and I'm not finding much that directly relates to this kind of situation.

       

      Any suggestions, preferably with explanations and/or references would be very appreciated.

      Thanks!

       

      Robert

       

       

      EDIT: Please see reply 18 and 19 for corrections, and reply 23 for sample data and output.

        • 1. Re: Is there a more efficient way to do this conditional cross-join?
          James Su

          How about: cross join table_1 to cal first, then left join to table_2, then use LAST_VALUE(group_ID) OVER(PARTITION BY ID ORDER BY RD NULLS FIRST IGNORE NULLS)

          This way you don't need the aggregation.

           

           

          Please post create table and some sample data.

          • 2. Re: Is there a more efficient way to do this conditional cross-join?
            rwsBertman

            James,

            Yes, i have tried that way in the past, and it does not appear to be nearly as efficient. Using production data, the revised query below takes several times longer to execute than my first query.

            Not entirely sure why that is...

             

            Here is the revised query based on your suggestion:

             

            select
                C_cal.ID, C_cal.start_date, C_cal.end_date,
                C_cal.day, C_cal.month,
                last_value(R.rd) over(partition by C_cal.ID order by rd nulls first) as review_date, --get the most recent review date
                last_value(R.group_ID) over(partition by C_cal.ID order by rd nulls first) as group_ID --get the most recent group_ID
            from (
                select
                     C.ID, C.start_date, C.end_date, C.latest_data,
                     cal.day, cal.month
                from table_1 C
                cross join (
                    --create a calendar of days from 1-Jan-2012 to sysdate
                    select
                        day, trunc(day,'month') as month
                    from (
                        select rownum-1 + to_date('1-Jan-2012','dd-Mon-yyyy') as day
                        from (select 1 from dual)
                        connect by level <= trunc(sysdate) - to_date('1-Jan-2012','dd-Mon-yyyy')
                        )
                    ) cal
                where cal.day >= C.start_date 
                and cal.day <= least(nvl(C.end_date, C.latest_data), C.latest_data, sysdate) 
                ) C_cal
            left outer join table_2 R
            on R.ID = C_cal.ID
            --if the review date hasn't happened yet, then show null 
            and (R.rd <= C_cal.day or R.rd is null) 
            ;

             

            You can compare the EXPLAIN PLANs for the two versions of the query as posted above, they are essentially identical.

            It is difficult for me to get SQL Trace output files for TKPROF (because the server is remote, file system remote access controls, whatnot else...), but I will try. I'm still trying to understand EXPLAIN PLAN and TKPROF output, anyway.

             

             

            Here are the create tables and a bit of sample data:

             

            create table table_1 (
              ID number,
              start_date date,
              end_date date,
              latest_data date
              --other things that just come along for the ride
              );
             
            ID     START_DATE     END_DATE     LATEST_DATA
            1423246     20/06/2012          11/09/2013
            1567463     24/01/2014          16/12/2014
            2429636     14/01/2013          01/02/2013
            6137889     17/09/2013     07/11/2014     09/07/2014
            6138178     30/04/2013          01/05/2013
            6395894     20/11/2014          
            8688230     23/02/2013          21/07/2014
            9213833     06/02/2014          06/02/2014
            9317441     27/11/2012     14/09/2014     27/04/2014
            ...
             
            create table table_2 (
              ID number,
              rd date,
              group_ID varchar2(3)
              --other details that aren't used here
              );
             
            ID     RD     GROUP_ID
            1423246     03/07/2012     HJ8
            1423246     10/08/2012     YC6
            1423246     14/08/2012     DZ5
            1423246     25/05/2013     QK7
            1423246     22/07/2013     LJ1
            1423246     17/12/2013     HJ8
            1423246     05/02/2014     AW2
            1423246     06/07/2014     LJ1
            1567463     23/04/2014     FW5
            1567463     14/06/2014     FW5
            1567463     08/09/2014     FW5
            1567463     29/11/2014     FW5
            2429636     28/01/2013     GD5
            2429636     22/01/2014     PO4
            2429636     29/03/2014     VX8
            4444057     31/10/2014     QL4
            6137889     13/10/2013     QL4
            6137889     02/11/2013     LJ1
            6137889     25/11/2013     TQ5
            6137889     23/01/2014     SM7
            6137889     09/08/2014     EL1
            6137889     10/08/2014     AB4
            6138178     17/05/2013     QK7
            6138178     14/09/2014     LJ1
            6138178     22/10/2014     LJ1
            6138178     17/12/2014     GT3
            8688230     24/02/2013     VR1
            8688230     10/09/2013     AW2
            9213833     19/02/2014     WE8
            9213833     24/04/2014     DR5
            9213833     14/09/2014     LJ1
            9213833     24/12/2014     HX4
            9317441     03/05/2013     QK7
            9317441     09/05/2013     GT3
            9317441     04/08/2013     SM7
            9317441     20/10/2013     ZP3
            9317441     04/08/2014     DZ5
            9317441     08/08/2014     LJ1
            ...
            • 3. Re: Is there a more efficient way to do this conditional cross-join?
              rwsBertman

              BTW, I also noticed a small bug in these queries: The line

               

              left outer join table_2 R

               

              in each query should be

               

              left outer join (
                 select id, rd, group_id from table_2
                 union --this union makes the dates for each ID start with their start_date, not their RD date
                 select distinct id, null, null from table_2
                 ) R
              • 4. Re: Is there a more efficient way to do this conditional cross-join?
                James Su

                When left join to table_2 why not r.rd=cal. day ?

                you can also try connect by on table_1 to generate rows between two dates. I don't have oracle now will play with it tomorrow.

                • 5. Re: Is there a more efficient way to do this conditional cross-join?
                  Bhavin Mamtora

                  Create a materialized view for the calendar table which refreshes once daily (which will add the new date to the view daily)

                  Create an index on the DAY column of Mview

                  use exists clause in the below mentkioned part of the query


                  Add least(NVL(CR.end_date, CR.latest_data), CR.latest_data, sysdate) column in the subquery "CR" and then reference it in below query and try to use "EXISTS"

                  cal.day >= CR.start_date
                  and cal.day <= least(nvl(CR.end_date, CR.latest_data), CR.latest_data, sysdate)

                  • 6. Re: Is there a more efficient way to do this conditional cross-join?
                    Jonathan Lewis

                    The first thing you need to do is generate the execution plan for the query; preferably using a method that generates rowsource execution stats so that we can see where the work goes.

                    Apart from this - a couple of ideas that may help

                     

                    a) Oracle may not be able to see how large the calendar set it (perhaps assuming it is one row), so adding the hints /*+ no_merge cardinality(1200) */ to the select that generates the calendar may help the optimizer to choose a better plan.

                     

                    b) Any "fill the gaps" query might be a candidate for the "partition outer join" syntax, which has the following type of appearance:

                        t2 partition by (alarm_cd) right join t1 on (t2.time_id = t1.time_id)

                    I haven't considered how this might be applied to your query, but it's possible that a merge join on a targetted calendar date (cal.date > start_date) may do better than a cross join, especially if many of the dates are in the near past rather than the far past.

                     

                    BTW:

                       "group_ID by CR.ID, CR.start_date, CR.end_date, cal.day, cal.month"

                    Should this be "group by CR.ID .." or "group by "group_id, cr.id ..."

                     

                    Regards

                    Jonathan Lewis

                     

                    Update: Here's a reference to getting the plan with rowsource execution statistics: https://jonathanlewis.wordpress.com/2006/11/09/dbms_xplan-in-10g/

                    1 person found this helpful
                    • 7. Re: Is there a more efficient way to do this conditional cross-join?
                      James Su

                      You didn't provide the insert so I had to create one for you:

                       

                      create table table_1 (

                        ID number,

                        start_date date,

                        end_date date,

                        latest_data date

                        --other things that just come along for the ride

                        );

                       

                       

                      insert into table_1 values(1423246, date '2012-06-20',     NULL,          date '2013-09-11');

                      insert into table_1 values(1567463, date '2014-01-24',     NULL,          date '2014-12-16');

                      insert into table_1 values(2429636, date '2013-01-14',     NULL,          date '2013-02-01');

                      insert into table_1 values(6137889, date '2013-09-17',     DATE '2014-11-7',date '2014-07-09');

                      insert into table_1 values(6138178, date '2013-04-30',     NULL,          date '2013-05-01');

                      insert into table_1 values(6395894, date '2014-11-20',     NULL,          NULL);

                      insert into table_1 values(8688230, date '2013-02-23',     NULL,          date '2014-07-21');

                      insert into table_1 values(9213833, date '2014-02-06',     NULL,          date '2014-02-06');

                      insert into table_1 values(9317441, date '2012-11-27',     DATE '2014-9-14', date '2014-04-27');

                       

                       

                       

                       

                      create table table_2 (

                        ID number,

                        rd date,

                        group_ID varchar2(3)

                        --other details that aren't used here

                        );

                       

                       

                       

                       

                      INSERT INTO TABLE_2 VALUES (1423246, DATE '2012-07-03','HJ8');

                      INSERT INTO TABLE_2 VALUES (1423246, DATE '2012-08-10','YC6');

                      INSERT INTO TABLE_2 VALUES (1423246, DATE '2012-08-14','DZ5');

                      INSERT INTO TABLE_2 VALUES (1423246, DATE '2013-05-25','QK7');

                      INSERT INTO TABLE_2 VALUES (1423246, DATE '2013-07-22','LJ1');

                      INSERT INTO TABLE_2 VALUES (1423246, DATE '2013-12-17','HJ8');

                      INSERT INTO TABLE_2 VALUES (1423246, DATE '2014-02-05','AW2');

                      INSERT INTO TABLE_2 VALUES (1423246, DATE '2014-07-06','LJ1');

                      INSERT INTO TABLE_2 VALUES (1567463, DATE '2014-04-23','FW5');

                      INSERT INTO TABLE_2 VALUES (1567463, DATE '2014-06-14','FW5');

                      INSERT INTO TABLE_2 VALUES (1567463, DATE '2014-09-08','FW5');

                      INSERT INTO TABLE_2 VALUES (1567463, DATE '2014-11-29','FW5');

                      INSERT INTO TABLE_2 VALUES (2429636, DATE '2013-01-28','GD5');

                      INSERT INTO TABLE_2 VALUES (2429636, DATE '2014-01-22','PO4');

                      INSERT INTO TABLE_2 VALUES (2429636, DATE '2014-03-29','VX8');

                      INSERT INTO TABLE_2 VALUES (4444057, DATE '2014-10-31','QL4');

                      INSERT INTO TABLE_2 VALUES (6137889, DATE '2013-10-13','QL4');

                      INSERT INTO TABLE_2 VALUES (6137889, DATE '2013-11-02','LJ1');

                      INSERT INTO TABLE_2 VALUES (6137889, DATE '2013-11-25','TQ5');

                      INSERT INTO TABLE_2 VALUES (6137889, DATE '2014-01-23','SM7');

                      INSERT INTO TABLE_2 VALUES (6137889, DATE '2014-08-09','EL1');

                      INSERT INTO TABLE_2 VALUES (6137889, DATE '2014-08-10','AB4');

                      INSERT INTO TABLE_2 VALUES (6138178, DATE '2013-05-17','QK7');

                      INSERT INTO TABLE_2 VALUES (6138178, DATE '2014-09-14','LJ1');

                      INSERT INTO TABLE_2 VALUES (6138178, DATE '2014-10-22','LJ1');

                      INSERT INTO TABLE_2 VALUES (6138178, DATE '2014-12-17','GT3');

                      INSERT INTO TABLE_2 VALUES (8688230, DATE '2013-02-24','VR1');

                      INSERT INTO TABLE_2 VALUES (8688230, DATE '2013-09-10','AW2');

                      INSERT INTO TABLE_2 VALUES (9213833, DATE '2014-02-19','WE8');

                      INSERT INTO TABLE_2 VALUES (9213833, DATE '2014-04-24','DR5');

                      INSERT INTO TABLE_2 VALUES (9213833, DATE '2014-09-14','LJ1');

                      INSERT INTO TABLE_2 VALUES (9213833, DATE '2014-12-24','HX4');

                      INSERT INTO TABLE_2 VALUES (9317441, DATE '2013-05-03','QK7');

                      INSERT INTO TABLE_2 VALUES (9317441, DATE '2013-05-09','GT3');

                      INSERT INTO TABLE_2 VALUES (9317441, DATE '2013-08-04','SM7');

                      INSERT INTO TABLE_2 VALUES (9317441, DATE '2013-10-20','ZP3');

                      INSERT INTO TABLE_2 VALUES (9317441, DATE '2014-08-04','DZ5');

                      INSERT INTO TABLE_2 VALUES (9317441, DATE '2014-08-08','LJ1');

                       

                       

                       

                       

                      --------- option #1: cross join first then left join

                      WITH t1 AS (

                      SELECT *

                        FROM table_1 cr

                           cross join (

                           --create a calendar of days from 1-Jan-2012 to sysdate

                           select

                                day, trunc(day,'month') as month

                           from (

                                select rownum-1 + to_date('1-Jan-2012','dd-Mon-yyyy') as day

                                from (select 1 from dual)

                                connect by level <= trunc(sysdate) - to_date('1-Jan-2012','dd-Mon-yyyy')

                                )

                           ) cal

                      where cal.day >= CR.start_date

                            and cal.day <= least(nvl(CR.end_date, CR.latest_data), CR.latest_data, sysdate)

                      )

                      SELECT T1.ID, T1.start_date, T1.end_date,

                           T1.day, T1.month,

                           max(R.rd) OVER(PARTITION BY t1.id ORDER BY t1.day) review_date, --get the most recent review date

                           LAST_VALUE(R.GROUP_ID IGNORE NULLS) OVER(PARTITION BY t1.id ORDER BY t1.day) group_ID --get the most recent review date

                      FROM t1

                            LEFT JOIN table_2 R

                               on R.ID = t1.ID AND r.rd=t1.day

                              

                              

                      --------- option #2: use connect by to fill the missing dates

                      WITH t1 AS (  ----- if both end_date and latest_data are null then this will return 1 row for the id anyway

                      SELECT t.*,start_date+LEVEL-1 day

                        FROM (select c.*

                                   ,least(nvl(c.end_date, c.latest_data), c.latest_data, TRUNC(sysdate)) AS dt2

                               from table_1 C

                             ) t

                      CONNECT BY id=PRIOR id

                           AND LEVEL<=dt2-start_date+1

                           AND PRIOR SYS_GUID() IS NOT NULL

                      )

                      SELECT T1.ID, T1.start_date, T1.end_date,

                           T1.day, T1.month,

                           max(R.rd) OVER(PARTITION BY t1.id ORDER BY t1.day) review_date, --get the most recent review date

                           LAST_VALUE(R.GROUP_ID IGNORE NULLS) OVER(PARTITION BY t1.id ORDER BY t1.day) group_ID --get the most recent review date

                      FROM t1

                            LEFT JOIN table_2 R

                               on R.ID = t1.ID AND r.rd=t1.day

                      1 person found this helpful
                      • 8. Re: Is there a more efficient way to do this conditional cross-join?
                        rwsBertman

                        James,

                        Thanks, I will try your suggestions either tomorrow or Monday, when I have time.

                         

                        Bhavin,

                        I'm familiar with WHERE EXISTS () clauses, but can you be more specific?

                         

                        Jonathan,

                        a) Thanks, I'll try using the no_merge hint.

                        b) I'm not familiar with the "partition outer join" syntax, I'll look into it.

                        BTW: yes, typo. :-)

                        And thanks for the reference, I'll have to read that tomorrow or Monday, too.

                        • 9. Re: Re: Is there a more efficient way to do this conditional cross-join?
                          Jonathan Lewis

                          I tried your code - got 2125 rows for the first version, 2126 for the second - though I had to delete the reference to t1.month from the second to get the SQL to execute.

                           

                          1st plan

                          -----------------------------------------------------------------------------------------------------------------------------------

                          | Id  | Operation                          | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |

                          -----------------------------------------------------------------------------------------------------------------------------------

                          |   0 | SELECT STATEMENT                   |         |      1 |        |   2125 |00:00:00.12 |    3459 |       |       |          |

                          |   1 |  WINDOW SORT                       |         |      1 |      1 |   2125 |00:00:00.12 |    3459 |   214K|   214K|  190K (0)|

                          |*  2 |   HASH JOIN OUTER                  |         |      1 |      1 |   2125 |00:00:00.10 |    3459 |  1021K|  1021K| 1389K (0)|

                          |   3 |    VIEW                            |         |      1 |      1 |   2125 |00:00:00.08 |    3456 |       |       |          |

                          |   4 |     NESTED LOOPS                   |         |      1 |      1 |   2125 |00:00:00.06 |    3456 |       |       |          |

                          |   5 |      VIEW                          |         |      1 |      1 |   1152 |00:00:00.02 |       0 |       |       |          |

                          |   6 |       COUNT                        |         |      1 |        |   1152 |00:00:00.01 |       0 |       |       |          |

                          |   7 |        CONNECT BY WITHOUT FILTERING|         |      1 |        |   1152 |00:00:00.01 |       0 |       |       |          |

                          |   8 |         FAST DUAL                  |         |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |

                          |*  9 |      TABLE ACCESS FULL             | TABLE_1 |   1152 |      1 |   2125 |00:00:00.02 |    3456 |       |       |          |

                          |  10 |    TABLE ACCESS FULL               | TABLE_2 |      1 |     38 |     38 |00:00:00.01 |       3 |       |       |          |

                          -----------------------------------------------------------------------------------------------------------------------------------

                           

                          Predicate Information (identified by operation id):

                          ---------------------------------------------------

                             2 - access("R"."RD"=INTERNAL_FUNCTION("T1"."DAY") AND "R"."ID"="T1"."ID")

                             9 - filter(("CR"."START_DATE"<=INTERNAL_FUNCTION("DAY") AND

                                        INTERNAL_FUNCTION("DAY")<=LEAST(NVL("CR"."END_DATE","CR"."LATEST_DATA"),"CR"."LATEST_DATA",SYSDATE@!)))

                           

                           

                          2nd plan

                          --------------------------------------------------------------------------------------------------------------------------------

                          | Id  | Operation                       | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |

                          --------------------------------------------------------------------------------------------------------------------------------

                          |   0 | SELECT STATEMENT                |         |      1 |        |   2126 |00:00:00.07 |       6 |       |       |          |

                          |   1 |  WINDOW SORT                    |         |      1 |      9 |   2126 |00:00:00.07 |       6 |   196K|   196K|  174K (0)|

                          |*  2 |   HASH JOIN OUTER               |         |      1 |      9 |   2126 |00:00:00.05 |       6 |  1096K|  1096K| 1346K (0)|

                          |   3 |    VIEW                         |         |      1 |      9 |   2126 |00:00:00.02 |       3 |       |       |          |

                          |*  4 |     CONNECT BY WITHOUT FILTERING|         |      1 |        |   2126 |00:00:00.01 |       3 |       |       |          |

                          |   5 |      TABLE ACCESS FULL          | TABLE_1 |      1 |      9 |      9 |00:00:00.01 |       3 |       |       |          |

                          |   6 |    TABLE ACCESS FULL            | TABLE_2 |      1 |     38 |     38 |00:00:00.01 |       3 |       |       |          |

                          --------------------------------------------------------------------------------------------------------------------------------

                           

                          Predicate Information (identified by operation id):

                          ---------------------------------------------------

                             2 - access("R"."RD"=INTERNAL_FUNCTION("T1"."DAY") AND "R"."ID"="T1"."ID")

                             4 - access("C"."ID"=PRIOR NULL)

                           

                          There's not really a lot to choose between them.

                           

                          Obviously a cardinality error means Oracle has chosen a "hash join outer" at operation 2 rather than swapping the join inputs and using a "hash join right outer", but that's a detail that can be fixed.

                          In the first plan the Nested Loop join is a result of the same cardinality error  - but that too can be fixed to give a hash join.

                          It's hard to tell whether the hash join (dates/table_1) or the connect by on just table_1 will be the more resource intensive - generally I would be cautious about the recursive approach, but it's comparing the two.

                           

                          The hard work probably comes in at operation 1 where you have to do the WINDOW sort of something that (I guess ) is going to be a very large data set for the production data.  The sample set turns 9 + 38 rows into 2125 - how much data will come from the original thousands of rows, I wonder ?  (The start point of 01-Jan-2012 is likely to be a significant point in the processing, of course - part of optimising the query would entail finding out the most recent valid starting date for the calendar)

                           

                           

                          Regards

                          Jonathan Lewis

                          • 10. Re: Is there a more efficient way to do this conditional cross-join?
                            James Su

                            Thanks Jonathan, I did change t1.month to TRUNC(T1.day,'MM') in my test but I forgot to post the correct version.

                             

                             

                            As I mentioned above, this row causes the extra returned row in the second query:

                            insert into table_1 values(6395894, date '2014-11-20',     NULL,          NULL);

                             

                             

                            Because it's selected in level one in connect by. This can be easily filtered out by adding a "start with" condition but I have a feeling that this row is needed because op used a left join in the first place.

                             

                             

                            The cardinality is hard to predict because it depends on start_date,end_date, latest_data at each row. The whole point of the rewrite is to replace the left join on (R.rd <= C_cal.day or R.rd is null) with an equal join condition, and get rid of the aggregation at the end. Hopefully the window for each id is not too big.

                            • 11. Re: Is there a more efficient way to do this conditional cross-join?
                              Jonathan Lewis

                              James Su wrote:

                               

                              Thanks Jonathan, I did change t1.month to TRUNC(T1.day,'MM') in my test but I forgot to post the correct version.

                               

                               

                              As I mentioned above, this row causes the extra returned row in the second query:

                              insert into table_1 values(6395894, date '2014-11-20',     NULL,          NULL);

                               

                               

                              Because it's selected in level one in connect by. This can be easily filtered out by adding a "start with" condition but I have a feeling that this row is needed because op used a left join in the first place.

                               

                               

                              The cardinality is hard to predict because it depends on start_date,end_date, latest_data at each row. The whole point of the rewrite is to replace the left join on (R.rd <= C_cal.day or R.rd is null) with an equal join condition, and get rid of the aggregation at the end. Hopefully the window for each id is not too big.

                               

                              Okay, I didn't connect the comment you'd put in the second statement with the one extra row that appeared.

                               

                              It's going to be interesting to see how well the strategy works - particularly, perhaps, when the full row lengths are included in the query. You've got rid of the range-based join (which is very elegant) , but it's still hiding (sort of) inside the connect by pump.as the data volume grows from 9 rows to 2125; and we still come to point where the window sort at the end has to sort the entire data set. Strangely the plan for code similar other OPs manages to get the result without doing a window sort (though it IS doing a sort group by - so it may be the multiplication of data that is then aggregated late that is the biggest performance problem).

                               

                              Regards

                              Jonathan Lewis

                               

                              P.S..  What's the significance of introducing the sys_op_guid() call ?

                              • 12. Re: Is there a more efficient way to do this conditional cross-join?
                                James Su

                                Without that sys_guid it will raise this error:

                                ORA-01436: CONNECT BY loop in user data

                                 

                                This is because Oracle doesn't see any difference between the prior and current row.

                                • 13. Re: Is there a more efficient way to do this conditional cross-join?
                                  Saubhik

                                  I think Jonathan Lewis knows what sys_guid does . Cost-Based Oracle Fundamentals is one of my favorite book (although i do not understand all in that fully). I used to visit his blog  Oracle Scratchpad occasionally. He may want to point out something else!

                                  • 14. Re: Is there a more efficient way to do this conditional cross-join?
                                    James Su

                                    Me too, I'm a huge fan of Jonathan's and I learn a lot from him. I translated some of his articles into Chinese and people like them.

                                    1 2 3 Previous Next