1 2 Previous Next 17 Replies Latest reply: Jul 18, 2008 7:05 AM by MichaelS RSS

    Numbers range overlap

    636120
      Range overlapping
      ------------------

      the problem I'd like to solve :

      given a table ranges (range_id#,external_key, from, to)

      I need to detect overlapping grouping by external key, ie:

      range_id external_key from to
      ------------------------+----
      1 A 1 5
           2 A 8 10
           3          A 12 15
           4 A 3 9
           5 B 2 12
           6 B 13 24
           7 B 25 50
           

      I need to detect the overlap in group A because the row #4 overlaps #1

      The table contains 10 millions rows and I would like it to be fast, I think a good solution could be coding a aggregation function and use it as follows:

      select external_key
      from ranges r
      group by external_key
      having overlaps(from,to) = 1
      ;


      I am in the process of coding function overlaps and I'll post it here , any ideas about the function overlaps? can you think about a different approach that would perform better?

      thanks you for reading.

      Miguel Corrales.
        • 1. Re: Numbers range overlap
          94799
          There is already an undocumented 'overlaps' function which can be used to check for overlapping date (but not number) ranges. It may be advisable to choose a different name.
          • 2. Re: Numbers range overlap
            566473
            with
              ranges as
                (
                  select 1 as range_id, 'A' as external_key,  1 as val_from,  5 as val_to from dual union all
                  select 2 as range_id, 'A' as external_key,  8 as val_from, 10 as val_to from dual union all
                  select 3 as range_id, 'A' as external_key, 12 as val_from, 15 as val_to from dual union all
                  select 4 as range_id, 'A' as external_key,  3 as val_from,  9 as val_to from dual union all
                  select 5 as range_id, 'B' as external_key,  2 as val_from, 12 as val_to from dual union all
                  select 6 as range_id, 'B' as external_key, 13 as val_from, 24 as val_to from dual union all
                  select 7 as range_id, 'B' as external_key, 25 as val_from, 50 as val_to from dual
                )
            select t1.*, t2.*
              from ranges t1, ranges t2
            where t1.external_key = t2.external_key
               and t1.range_id != t2.range_id
               and t1.val_to > t2.val_from
               and t1.val_from < t2.val_to
            • 3. Re: Numbers range overlap
              450441
              http://oraclesponge.wordpress.com/2008/06/12/the-overlaps-predicate/

              You can use it on numbers by adding it to a default date.

              The format would be

              WHERE (date1, expression2) overlaps (date3,expression4)

              expression 2/4 can be a date or a number but in this case should be a date again.
              v_date := date '2000-01-01';
              
              select r1.external_key, r1.from, r1.to, r2.from, r2.to
              from ranges r1, ranges r2
              where (v_date+r1.from, v_date+r1.to) OVERLAPS (v_date+r2.from, v_date+r2.to)
              and r1.range_id != r2.range_id
              and r1.external_key = r2.external_key;
              Message was edited by:
              Dave Hemming
              forgot the other conditions
              • 4. Re: Numbers range overlap
                623666
                with
                  ranges as
                    (
                      select 1 as range_id, 'A' as external_key,  1 as val_from,  5 as val_to from dual union all
                      select 2 as range_id, 'A' as external_key,  8 as val_from, 10 as val_to from dual union all
                      select 3 as range_id, 'A' as external_key, 12 as val_from, 15 as val_to from dual union all
                      select 4 as range_id, 'A' as external_key,  3 as val_from,  9 as val_to from dual union all
                      select 5 as range_id, 'B' as external_key,  2 as val_from, 12 as val_to from dual union all
                      select 6 as range_id, 'B' as external_key, 13 as val_from, 24 as val_to from dual union all
                      select 7 as range_id, 'B' as external_key, 25 as val_from, 50 as val_to from dual
                    )
                select t1.*, t2.*
                  from ranges t1, ranges t2
                where t1.external_key = t2.external_key
                   and t1.range_id != t2.range_id
                   and (sysdate+t1.val_from,sysdate+t1.val_to) overLaps
                       (sysdate+t2.val_from,sysdate+t2.val_to);
                 RANGE_ID  E   VAL_FROM     VAL_TO   RANGE_ID  E   VAL_FROM     VAL_TO
                ---------  -  ---------  ---------  ---------  -  ---------  ---------
                        4  A          3          9          1  A          1          5
                        4  A          3          9          2  A          8         10
                        2  A          8         10          4  A          3          9
                        1  A          1          5          4  A          3          9
                • 5. Re: Numbers range overlap
                  450441
                  In this case it probably doesn't matter, but I'd not be happy about sysdate appearing 4 times without a trunc.
                  • 6. Re: Numbers range overlap
                    566473
                    In this case it probably doesn't matter, but I'd not
                    be happy about sysdate appearing 4 times without a
                    trunc.
                    "don't worry, be happy" (c) :) :) :)
                    select count(1), 
                           to_char(min(dt1),'dd.mm.yyyy hh24:mi:ss') as min_dt1,
                           to_char(max(dt1),'dd.mm.yyyy hh24:mi:ss') as max_dt1,
                           to_char(min(dt2),'dd.mm.yyyy hh24:mi:ss') as min_dt2,
                           to_char(max(dt2),'dd.mm.yyyy hh24:mi:ss') as max_dt2,
                           to_char(min(dt3),'dd.mm.yyyy hh24:mi:ss') as min_dt3,
                           to_char(max(dt3),'dd.mm.yyyy hh24:mi:ss') as max_dt3,
                           to_char(min(dt4),'dd.mm.yyyy hh24:mi:ss') as min_dt4,
                           to_char(max(dt4),'dd.mm.yyyy hh24:mi:ss') as max_dt4
                    from (select sysdate as dt1, sysdate as dt2, sysdate as dt3, sysdate as dt4
                             from dual connect by level <= 1000000)
                    • 7. Re: Numbers range overlap
                      Rob van Wijk
                      Good one Sergey.

                      To make the concept of read consistency even more clear:
                      SQL> select count(*),
                        2         to_char(min(dt1),'dd.mm.yyyy hh24:mi:ss.ff') as min_dt1,
                        3         to_char(max(dt1),'dd.mm.yyyy hh24:mi:ss.ff') as max_dt1,
                        4         to_char(min(dt2),'dd.mm.yyyy hh24:mi:ss.ff') as min_dt2,
                        5         to_char(max(dt2),'dd.mm.yyyy hh24:mi:ss.ff') as max_dt2,
                        6         to_char(min(dt3),'dd.mm.yyyy hh24:mi:ss.ff') as min_dt3,
                        7         to_char(max(dt3),'dd.mm.yyyy hh24:mi:ss.ff') as max_dt3,
                        8         to_char(min(dt4),'dd.mm.yyyy hh24:mi:ss.ff') as min_dt4,
                        9         to_char(max(dt4),'dd.mm.yyyy hh24:mi:ss.ff') as max_dt4
                      10   from (select systimestamp as dt1, systimestamp as dt2, systimestamp as dt3, systimestamp as dt4
                      11           from dual connect by level <= 1000000)
                      12  /

                                                    COUNT(*)
                      --------------------------------------
                      MIN_DT1
                      -----------------------------
                      MAX_DT1
                      -----------------------------
                      MIN_DT2
                      -----------------------------
                      MAX_DT2
                      -----------------------------
                      MIN_DT3
                      -----------------------------
                      MAX_DT3
                      -----------------------------
                      MIN_DT4
                      -----------------------------
                      MAX_DT4
                      -----------------------------
                                                     1000000
                      18.07.2008 11:24:49.029804
                      18.07.2008 11:24:49.029804
                      18.07.2008 11:24:49.029804
                      18.07.2008 11:24:49.029804
                      18.07.2008 11:24:49.029804
                      18.07.2008 11:24:49.029804
                      18.07.2008 11:24:49.029804
                      18.07.2008 11:24:49.029804


                      1 row selected.
                      Regards,
                      Rob.
                      • 8. Re: Numbers range overlap
                        636120
                        Hi All,

                        thank you so much for your answers.

                        Well, it makes not sense to write an agregration function given the clean and neat examples provided, I cannot think performance wise could be better either.


                        thanks again
                        • 9. Re: Numbers range overlap
                          94799
                          To make the concept of read consistency even more clear.
                          Interestingly the addition of the 'projection' column in explain plan in 10g makes it clear that something 'magic' takes place with SYSDATE.
                          SQL> EXPLAIN PLAN
                            2  FOR
                            3     SELECT COUNT (SYSDATE)
                            4     FROM   DUAL;

                          Explained.

                          SQL> SELECT projection
                            2  FROM   plan_table
                            3  WHERE  projection IS NOT NULL;

                          PROJECTION
                          --------------------------------------------------------------------------------
                          (#keys=0) COUNT(SYSDATE@!)[22]

                          SQL> SELECT SYSDATE@!
                            2  FROM   dual;

                          SYSDATE@!
                          ---------
                          18-JUL-08

                          SQL>
                          • 10. Re: Numbers range overlap
                            MichaelS
                            Yet another option, in case performance is an issue:
                            SQL>  with
                              t as 
                                (
                                  select 1 as range_id, 'A' as external_key,  1 as val_from,  5 as val_to from dual union all
                                  select 2 as range_id, 'A' as external_key,  8 as val_from, 10 as val_to from dual union all
                                  select 3 as range_id, 'A' as external_key, 12 as val_from, 15 as val_to from dual union all
                                  select 4 as range_id, 'A' as external_key,  3 as val_from,  9 as val_to from dual union all
                                  select 5 as range_id, 'B' as external_key,  2 as val_from, 12 as val_to from dual union all
                                  select 6 as range_id, 'B' as external_key, 13 as val_from, 24 as val_to from dual union all
                                  select 7 as range_id, 'B' as external_key, 25 as val_from, 50 as val_to from dual
                                )
                            select range_id, external_key, val_from, val_to
                              from (select range_id,
                                           external_key,
                                           val_from, val_to,
                                           lead (val_from) over (partition by external_key order by val_from) lead_val_from,
                                           lag (val_to) over (partition by external_key order by val_to) lag_val_to
                                      from t)    
                             where lead_val_from between val_from and val_to
                                or lag_val_to between val_from and val_to
                            
                              RANGE_ID E   VAL_FROM     VAL_TO
                            ---------- - ---------- ----------
                                     1 A          1          5
                                     4 A          3          9
                                     2 A          8         10
                            Message was edited by:
                            michaels
                            • 11. Re: Numbers range overlap
                              Rob van Wijk
                              Yes I've seen this before in one of your investigations about the differences in arguments of the count function, but I am not getting the point you are making here. Is it that sysdate always produces the same output in one query, not because of read consistency, but because of something else?

                              Regards,
                              Rob.
                              • 12. Re: Numbers range overlap
                                94799
                                Not sure to be honest, just thought it was interesting.

                                I don't believe SYSDATE is truly read-consistent, if it was then references to SYSDATE within a serializable (or read-only) transaction would return the same value, it is trivial to show that they do not.

                                However It appears that SYSDATE has been made to behave as if it is read-consistent at the statement level and this implementation appears to involve re-write as previously shown. Possibly (and this is pure speculation) it is persisted somewhere at the start of the statement and this value is referenced by SYSDATE@!. This might give benefits in terms of both performance and statement level read-consistent behaviour (you might see the latter as a side-effect). I notice a similar rewrite is used with the USER function.

                                Of course the implementation is largely irrelevant as long as we clearly understand the behaviour.
                                • 13. Re: Numbers range overlap
                                  Aketi Jyuuzou
                                  create table t as
                                  select 1 as range_id, 'A' as external_key,  1 as val_from,  5 as val_to from dual union
                                  select 2 as range_id, 'A' as external_key,  8 as val_from, 10 as val_to from dual union
                                  select 3 as range_id, 'A' as external_key, 12 as val_from, 15 as val_to from dual union
                                  select 4 as range_id, 'A' as external_key,  3 as val_from,  9 as val_to from dual union
                                  select 5 as range_id, 'B' as external_key,  2 as val_from, 12 as val_to from dual union
                                  select 6 as range_id, 'B' as external_key, 13 as val_from, 24 as val_to from dual union
                                  select 7 as range_id, 'B' as external_key, 25 as val_from, 50 as val_to from dual;
                                  I used famous method.
                                  select *
                                    from t t1,t t2
                                  where t1.external_key = t2.external_key
                                     and t1.range_id != t2.range_id
                                     and t1.val_from <= t2.val_to
                                     and t1.val_to   >= t2.val_from;
                                  RANGE_ID  E   VAL_FROM  VAL_TO   RANGE_ID  E   VAL_FROM  VAL_TO
                                  --------  -  ---------  ------  ---------  -  ---------  ------
                                         4  A          3       9          1  A          1       5
                                         4  A          3       9          2  A          8      10
                                         2  A          8      10          4  A          3       9
                                         1  A          1       5          4  A          3       9
                                  I used greatest and Least
                                  select *
                                    from t t1,t t2
                                  where t1.external_key = t2.external_key
                                     and t1.range_id != t2.range_id
                                     and greatest(t1.val_from,t2.val_from) <= Least(t1.val_to,t2.val_to);
                                  • 14. Re: Numbers range overlap
                                    Rob van Wijk
                                    > I don't believe SYSDATE is truly read-consistent, if
                                    it was then references to SYSDATE within a
                                    serializable (or read-only) transaction would return
                                    the same value, it is trivial to show that they do not.

                                    Good point. I just checked it to be really sure and they show different times indeed.

                                    > However It appears that SYSDATE has been made to
                                    behave as if it is read-consistent at the statement
                                    level and this implementation appears to involve
                                    re-write as previously shown. Possibly (and this is
                                    pure speculation) it is persisted somewhere at the
                                    start of the statement and this value is referenced
                                    by SYSDATE@!. This might give benefits in terms of
                                    both performance and statement level read-consistent
                                    behaviour (you might see the latter as a
                                    side-effect).

                                    This sounds very plausible. All evidences support this theory. And, like you said, the important part to remember here is that sysdate behaves as read consistent at statement level.

                                    Thanks for your thoughts on this one.

                                    Regards,
                                    Rob.
                                    1 2 Previous Next