2 Replies Latest reply: Aug 13, 2014 5:16 PM by jeff.blankenbiller RSS

    Counting days in a range while grouping

    jeff.blankenbiller

      I'm using Oracle 11.1 and I have the following table and data.

       

      CREATE TABLE DELAYS
      (
         DELAY_ID     NUMBER (5) NOT NULL,
         CASE_ID      NUMBER (9) NOT NULL,
         TYPE_ID      NUMBER (2) NOT NULL,
         START_DATE   DATE NOT NULL,
         END_DATE     DATE
      );
      
      ALTER TABLE DELAYS ADD (PRIMARY KEY (DELAY_ID, CASE_ID, TYPE_ID));
      
      INSERT INTO DELAYS VALUES (1, 3770, 1, DATE '2014-08-01', DATE '2014-08-02');
      --INSERT INTO DELAYS VALUES (2, 3770, 1, DATE '2014-08-02', DATE '2014-08-02');
      --INSERT INTO DELAYS VALUES (3, 3770, 2, DATE '2014-07-01', DATE '2014-07-10');
      --INSERT INTO DELAYS VALUES ( 4, 3770, 5, DATE '2014-7-1'  , DATE '2014-7-10' );
      --INSERT INTO DELAYS VALUES ( 5, 5430, 2, DATE '2014-8-5'  , DATE '2014-8-5'  );
      --INSERT INTO DELAYS VALUES ( 6,    1, 2, DATE '2014-8-1'  , DATE '2014-8-5'  );
      --INSERT INTO DELAYS VALUES ( 7,    2, 2, DATE '2014-8-1'  , DATE '2014-9-1'  );
      --INSERT INTO DELAYS VALUES ( 8,    3, 2, DATE '2014-1-1'  , DATE '2014-8-5'  );
      --INSERT INTO DELAYS VALUES ( 9,    4, 2, DATE '2014-8-1'  , DATE '2014-8-5'  );
      --INSERT INTO DELAYS VALUES (10,    5, 2, DATE '2014-7-1'  , DATE '2014-8-5'  );
      
      COMMIT;
      

       

      I need to return the number of days with delays (END_DATE - START_DATE + 1), grouped by the CASE_ID and TYPE_ID.

       

      Since I need to ensure that days are not counted multiple times, I am expanding the date range from START_DATE to END_DATE to rows of dates using the form:

       

          SELECT D.CASE_ID, D.TYPE_ID, D.START_DATE + LEVEL - 1 DELAY_DAYS
            FROM DELAYS D
      CONNECT BY LEVEL <= (D.END_DATE - D.START_DATE + 1);
      

       

       

      The following query returns the number of delay days for each row (all the delay days are in the cursor expression for verification).

      But I can't group this query together by CASE_ID and TYPE_ID and aggregate the delay days to remove the duplicates.

      If I SUM the COUNT_DELAY_DAYS, then the duplicate days are counted multiple times.

       

      SELECT D.*,
             CURSOR ( (    SELECT (D.START_DATE + LEVEL - 1)
                             FROM DUAL
                       CONNECT BY LEVEL <= (D.END_DATE - D.START_DATE + 1)))
                DELAY_DAYS,
             (    SELECT COUNT (D.START_DATE + LEVEL - 1)
                    FROM DUAL
              CONNECT BY LEVEL <= (D.END_DATE - D.START_DATE + 1))
                COUNT_DELAY_DAYS
        FROM DELAYS D;
      

       

      I tried using this query to group by the CASE_ID and TYPE_ID and to count the delay days, but the counts are wrong as soon as the second row of data is added to the table.

       

          SELECT CASE_ID, TYPE_ID, COUNT (D.START_DATE + LEVEL - 1) COUNT_DELAY_DAYS
            FROM DELAYS D
      CONNECT BY LEVEL <= (D.END_DATE - D.START_DATE + 1)
      GROUP BY CASE_ID, TYPE_ID;
      

       

      If I change the COUNT to be COUNT(DISTINCT ...), then I get the correct number of delay days grouped by the CASE_ID and TYPE_ID, but the performance suffers as more rows of data are added to the table.

       

      This needs to perform well for thousands of CASES and TYPES that cover delays of days, weeks, months, or even years.

      And multiple users need to be able to run this concurrently if that makes a difference.

        • 1. Re: Counting days in a range while grouping
          Solomon Yakobson

          Use start-of-group method modification for combining overlapping/adjacent intervals:

           

          with t1 as (

                      select  case_id,

                              type_id,

                              start_date,

                              end_date,

                              case sign(

                                        start_date - max(end_date) over(

                                                                        partition by case_id,

                                                                                     type_id

                                                                        order by start_date,

                                                                                 end_date

                                                                        rows between unbounded preceding

                                                                                 and 1 preceding

                                                                       )

                                      )

                                when 1 then 1

                                else 0

                              end start_of_group

                        from  delays

                    ),

              t2 as (

                      select  case_id,

                              type_id,

                              start_date,

                              end_date,

                              sum(start_of_group) over(

                                                       partition by case_id,

                                                                    type_id

                                                       order by start_date,

                                                                end_date

                                                      ) grp

                        from  t1

                    ),

              t3 as (

                      select  case_id,

                              type_id,

                              max(end_date) - min(start_date) + 1 grp_delay_days

                        from  t2

                        group by case_id,

                                type_id,

                                grp

                    )

          select  case_id,

                  type_id,

                  sum(grp_delay_days) delay_days

            from  t3

            group by case_id,

                    type_id

            order by case_id,

                    type_id

          /

           

          SY.

          • 2. Re: Counting days in a range while grouping
            Frank Kulash

            Hi,

             

            Thanks for posting the CREATE TABLE and INSERT statements.  Are there 10 rows of sample data, or just 1?

             

            Either way, what are the exact results you want from the given sample data?