Forum Stats

  • 3,852,509 Users
  • 2,264,110 Discussions
  • 7,905,089 Comments

Discussions

Reg: <Performance comparison> Index on NUMBER and DATE fields -

ranit B
ranit B Member Posts: 3,268 Silver Trophy
edited Oct 3, 2013 10:01AM in SQL & PL/SQL

Hi Experts,

I'm just trying to do a comparison between Index on a NUMBER field and a DATE field.

But my experiments are giving me a mixed kind of response.

My assumption was that - Index on a NUMBER field would be faster coz it avoids the datatype conversion of DATE fields. (as seen in the 'Predicate section' of the explain plan)

Below I'm trying 2 scenarios -

1. Composite index on fields (NUMBER, DATE) 

2. Composite index on fields (NUMBER, NUMBER) 

CREATE TABLE table_x2
AS
       SELECT LEVEL rid, SYSDATE + LEVEL dt, 'aaaaa-bbbb' name
       FROM dual
       CONNECT BY LEVEL<=1000000;


ALTER TABLE table_x2
ADD CONSTRAINT table_x2_pk PRIMARY KEY(rid, dt);


SQL> exec dbms_stats.gather_table_stats('schema_x','table_x2');

PL/SQL procedure successfully completed.

 
SQL> SELECT *   2  FROM table_x2   3  WHERE rid >= 500 AND dt <= To_Date('04-08-2020 05:21:00','dd-mm-yyyy hh24:mi:ss'); 2001 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1079754163 ------------------------------------------------------------------------------ | Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     | ------------------------------------------------------------------------------ |   0 | SELECT STATEMENT  |          |  2499 | 59976 |  1121   (2)| 00:00:14 | |*  1 |  TABLE ACCESS FULL| TABLE_X2 |  2499 | 59976 |  1121   (2)| 00:00:14 | ------------------------------------------------------------------------------ Predicate Information (identified by operation id): ---------------------------------------------------    1 - filter("DT"<=TO_DATE(' 2020-08-04 05:21:00', 'syyyy-mm-dd               hh24:mi:ss') AND "RID">=500) Statistics ----------------------------------------------------------           1  recursive calls           0  db block gets        4155  consistent gets           0  physical reads           0  redo size       45894  bytes sent via SQL*Net to client        1797  bytes received via SQL*Net from client         135  SQL*Net roundtrips to/from client           0  sorts (memory)           0  sorts (disk)        2001  rows processed
=====================================================================================================================
SCENARIO - 2

CREATE TABLE table_x4 AS        SELECT LEVEL rid, To_Number(To_Char(SYSDATE + LEVEL, 'yyyymmddhh24miss')) dt, 'aaaaa-bbbb' name        FROM dual        CONNECT BY LEVEL<=1000000; ALTER TABLE table_x4 ADD CONSTRAINT table_x4_pk PRIMARY KEY(rid, dt); BEGIN     Dbms_Stats.gather_table_stats('schema_x','table_x4'); END;
SQL> SELECT *   2  FROM table_x4   3  WHERE rid >= 500 AND dt <= 20200805060302; 2001 rows selected. ------------------------------------------------------------------------------ | Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     | ------------------------------------------------------------------------------ |   0 | SELECT STATEMENT  |          |  2547 | 63675 |  1157   (2)| 00:00:14 | |*  1 |  TABLE ACCESS FULL| TABLE_X4 |  2547 | 63675 |  1157   (2)| 00:00:14 | ------------------------------------------------------------------------------ Predicate Information (identified by operation id): ---------------------------------------------------    1 - filter("DT"<=20200805060302 AND "RID">=500) Statistics ----------------------------------------------------------           1  recursive calls           0  db block gets        4221  consistent gets           0  physical reads           0  redo size       24143  bytes sent via SQL*Net to client        1060  bytes received via SQL*Net from client          68  SQL*Net roundtrips to/from client           0  sorts (memory)           0  sorts (disk)        1001  rows processed

Also, not sure why it is not selecting the Index here.

I might be completely wrong... Can you please guide me on this?

Thanks and Regards

-- Ranit

(on Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production)

Tagged:
«1

Answers

  • Karthick2003
    Karthick2003 Member Posts: 13,711 Bronze Badge
    ranitB wrote:
    
    My assumption was that - Index on a NUMBER field would be faster coz it avoids the datatype conversion of DATE fields. (as seen in the 'Predicate section' of the explain plan)
    

    A date is internally stored as a number. So I don't think a comparison between NUMBER and DATE data type columns with respect to INDEX is is much of a valid scenario. Its more of comparing Apples with Oranges. They are there for specific purpose and the correct data type should be used in the correct place.

    Not sure what data type conversion you are talking about for the DATE data type. Nothing of that sort happens. Check your predicate

    Predicate Information (identified by operation id):

    ---------------------------------------------------

       1 - filter("DT"<=TO_DATE(' 2020-08-04 05:21:00', 'syyyy-mm-dd

                  hh24:mi:ss') AND "RID">=500)

    I don't see anything like that. The TO_DATE is the one that is given by you as the input value. Its not something that Oracle is implicitly doing.

  • ranit B
    ranit B Member Posts: 3,268 Silver Trophy
    edited Oct 3, 2013 3:22AM

    Thanks for the response, Karthick. Got your point.

    But since - 

    A date is internally stored as a number.

    So, if in the WHERE clause of a query we use a DATE column, an additional overhead (I guess) needs to be done by Oracle to compare it with NUMBER value stored internally.

    My doubt -

    Will there be any performance difference when Index is built on a NUMBER field to that when on a DATE field?

    Could you please extend some help on this?

  • Karthick2003
    Karthick2003 Member Posts: 13,711 Bronze Badge
    edited Oct 3, 2013 3:32AM
    So, if in the WHERE clause of a query we use a DATE column, an additional overhead (I guess) needs to be done by Oracle to compare it with NUMBER value stored internally.
    

    There is no conversion. When you pass a date value still its internally represented as a number. So there is no need to perform any conversion.

  • ranit B
    ranit B Member Posts: 3,268 Silver Trophy
    My doubt -
    Will there be any performance difference when Index is built on a NUMBER field to that when on a DATE field?
     

    Any pointers to this??

  • Hoek
    Hoek Member Posts: 16,087 Gold Crown

    Here's what I got, also on 11.2.0.3.0, the only difference is the way I gathered the stats and I used dbms_xplan:

    SQL> CREATE TABLE table_x2 AS
      2         SELECT LEVEL rid, SYSDATE + LEVEL dt, 'aaaaa-bbbb' name
      3         FROM dual
      4         CONNECT BY LEVEL<=1000000;
    
    Table created.
    
    SQL> ALTER TABLE table_x2 ADD CONSTRAINT table_x2_pk PRIMARY KEY(rid, dt);
    
    Table altered.
    
    SQL> exec dbms_stats.gather_table_stats(user,'table_x2', cascade => true );
    
    PL/SQL procedure successfully completed.
    
    SQL> SELECT /*+ gather_plan_statistics */
      2         *
      3  FROM   table_x2
      4  where  rid >= 500 AND dt <= To_Date('04-08-2020 05:21:00','dd-mm-yyyy hh24:mi:ss');
    
           RID DT                  NAME
    ---------- ------------------- ----------
    ...
    snip
    ...
    
    1997 rows selected.
    
    SQL> select * from table (dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
    
    PLAN_TABLE_OUTPUT
    -----------------
    SQL_ID  dtbjf0rs83rf4, child number 0
    -------------------------------------
    SELECT /*+ gather_plan_statistics */        * FROM   table_x2 where
    rid >= 500 AND dt <= To_Date('04-08-2020 05:21:00','dd-mm-yyyy
    hh24:mi:ss')
    
    Plan hash value: 3536046022
    
    -----------------------------------------------------------------------------------------------------
    | Id  | Operation                   | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
    -----------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |             |      1 |        |   1997 |00:00:00.01 |    3482 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| TABLE_X2    |      1 |   2496 |   1997 |00:00:00.01 |    3482 |
    |*  2 |   INDEX RANGE SCAN          | TABLE_X2_PK |      1 |   2496 |   1997 |00:00:00.01 |    3339 |
    -----------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - access("RID">=500 AND "DT"<=TO_DATE(' 2020-08-04 05:21:00', 'syyyy-mm-dd hh24:mi:ss'))
           filter("DT"<=TO_DATE(' 2020-08-04 05:21:00', 'syyyy-mm-dd hh24:mi:ss'))
    
    
    22 rows selected.
    
    SQL> CREATE TABLE table_x4 AS
      2     SELECT LEVEL rid, To_Number(To_Char(SYSDATE + LEVEL, 'yyyymmddhh24miss')) dt, 'aaaaa-bbbb' name
      3     FROM dual
      4     CONNECT BY LEVEL<=1000000;
    
    Table created.
    
    SQL> ALTER TABLE table_x4 ADD CONSTRAINT table_x4_pk PRIMARY KEY(rid, dt);
    
    Table altered.
    
    SQL> exec Dbms_Stats.gather_table_stats(user,'table_x4', cascade => true);
    
    PL/SQL procedure successfully completed.
    
    SQL> SELECT /*+ gather_plan_statistics */
      2         *
      3  FROM table_x4
      4  WHERE rid >= 500 AND dt <= 20200805060302;
    
           RID         DT NAME
    ---------- ---------- ----------
    ...
    snip
    ...
    
    1998 rows selected.
    
    SQL> select * from table (dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
    
    PLAN_TABLE_OUTPUT
    -----------------
    SQL_ID  c80wn579548jv, child number 0
    -------------------------------------
    SELECT /*+ gather_plan_statistics */        * FROM table_x4 WHERE rid
    >= 500 AND dt <= 20200805060302
    
    Plan hash value: 472343109
    
    -----------------------------------------------------------------------------------------------------
    | Id  | Operation                   | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
    -----------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |             |      1 |        |   1998 |00:00:00.01 |    3623 |
    |*  2 |   INDEX RANGE SCAN          | TABLE_X4_PK |      1 |   2549 |   1998 |00:00:00.01 |    3479 |
    -----------------------------------------------------------------------------------------------------
                                                                                                        
    Predicate Information (identified by operation id):
    ---------------------------------------------------
                                                      
       2 - access("RID">=500 AND "DT"<=20200805060302)
           filter("DT"<=20200805060302)               
                                                      
    
    21 rows selected.
    
  • Karthick2003
    Karthick2003 Member Posts: 13,711 Bronze Badge
    ranitB wrote:
    
    My doubt -
    Will there be any performance difference when Index is built on a NUMBER field to that when on a DATE field?
    
    Any pointers to this??
    

    My basic point is that the item that you are comparing is not something that should be compared. Beyond that I don't think such a detail is documented by oracle. I have also never come across any legitimate article by oracle experts in this regards.

  • Chris Hunt
    Chris Hunt Member Posts: 2,066 Gold Trophy

    I suppose it will take a tiny fraction of a second to convert "2020-08-04 05:21:00" into the internal date format - which it'll do so just once when it parses the query - but beyond that I wouldn't expect there to be any difference in performance. This is not a reason to store dates in number columns in the interests of "efficiency" - you'll lose far more than you gain.

    If you have a date, store it in a DATE column (or a TIMESTAMP if you need subsecond precision). If you have a number, store it in a NUMBER column. If you need to search on those columns, build indexes on them. Don't get cute and use different datatypes in the hope of "speeding things up" - the things the Oracle guys can do to speed up the indexing of date values internally to the database will dwarf anything you can cludge together by using the wrong datatype.

  • padders
    padders Member Posts: 1,081 Silver Trophy

    > Any pointers to this??

    The only reason I can think of that you would be testing this was to try and justify storing DATEs in a NUMBER column, is that correct?

  • Chris Hunt
    Chris Hunt Member Posts: 2,066 Gold Trophy

    I ran a test to make a more direct comparison between the speed of number indexes vs date indexes:

    CREATE TABLE xxx
    AS
    SELECT LEVEL num_key,
           TRUNC(SYSDATE) + LEVEL date_key,
           99 value
    FROM dual
    CONNECT BY LEVEL<=1000000;
    create index xxx_ix1 on xxx (num_key);
    create index xxx_ix2 on xxx (date_key);
    DECLARE
       nk       NUMBER;
       dk       DATE;
       x        NUMBER;
       i        NUMBER;
       t_start  TIMESTAMP;
       t_end    TIMESTAMP;
       diff     INTERVAL DAY TO SECOND;
    BEGIN
       t_start := systimestamp;
       FOR i IN 1..100000 LOOP
          nk := TRUNC(DBMS_RANDOM.VALUE(1,1000000));
          dk := TRUNC(sysdate) + nk;
         
          SELECT value
          INTO   x
          FROM   xxx
          WHERE  num_key = nk;
       END LOOP;
       t_end := systimestamp;    diff := t_end - t_start;
       DBMS_OUTPUT.PUT_LINE('Test 1: Number Index - '||diff);
       t_start := systimestamp;
       FOR i IN 1..100000 LOOP
          nk := TRUNC(DBMS_RANDOM.VALUE(1,1000000));
          dk := TRUNC(sysdate) + nk;
         
          SELECT value
          INTO   x
          FROM   xxx
          WHERE  date_key = dk;
       END LOOP;
       t_end := systimestamp;    diff := t_end - t_start;
       DBMS_OUTPUT.PUT_LINE('Test 2: Date Index - '||diff);    DBMS_OUTPUT.PUT_LINE(' ');
    END;

    Running the test a few times I get this output:

     

    Test 1: Number Index - +00 00:00:04.288427
    
    Test 2: Date Index - +00 00:00:03.289358
    
     
    
    Test 1: Number Index - +00 00:00:04.233118
    
    Test 2: Date Index - +00 00:00:04.582399
    
     
    
    Test 1: Number Index - +00 00:00:06.016131
    
    Test 2: Date Index - +00 00:00:04.893847
    
     
    
    Test 1: Number Index - +00 00:00:05.380649
    
    Test 2: Date Index - +00 00:00:05.512358
    
     
    
    Test 1: Number Index - +00 00:00:03.868854
    
    Test 2: Date Index - +00 00:00:04.672450
      
    Test 1: Number Index - +00 00:00:05.247808
    
    Test 2: Date Index - +00 00:00:04.380884
    

    So the two types indexes take about the same amount of time to fetch data, with neither type establishing clear mastery over the other.

    As I said before - and as has been said many times before in this forum - use DATEs to store date values!

  • BluShadow
    BluShadow Member, Moderator Posts: 42,318 Red Diamond
    ranitB wrote:
    
    My doubt -
    Will there be any performance difference when Index is built on a NUMBER field to that when on a DATE field?
    
    Any pointers to this??
    
    

    Not really a noticable difference.

    Internally the byte storage of numbers and dates differs.

    Dates are always stored as 7 bytes, going from high order to low order (i.e. century, year, month, day, hours, minutes, seconds), so a sequence of dates is stored in a sequential order of bytes... e.g.

    D           DUMP_D
    ----------- ----------------------------------------
    04-Oct-2013 Typ=12 Len=7: 120,113,10,4,10,23,23
    05-Oct-2013 Typ=12 Len=7: 120,113,10,5,10,23,23
    06-Oct-2013 Typ=12 Len=7: 120,113,10,6,10,23,23
    07-Oct-2013 Typ=12 Len=7: 120,113,10,7,10,23,23
    08-Oct-2013 Typ=12 Len=7: 120,113,10,8,10,23,23
    09-Oct-2013 Typ=12 Len=7: 120,113,10,9,10,23,23
    10-Oct-2013 Typ=12 Len=7: 120,113,10,10,10,23,23
    11-Oct-2013 Typ=12 Len=7: 120,113,10,11,10,23,23
    12-Oct-2013 Typ=12 Len=7: 120,113,10,12,10,23,23
    13-Oct-2013 Typ=12 Len=7: 120,113,10,13,10,23,23
    

    So, in terms of binary storage and bytewise comparison, dates are definitely stored as numbers (with gaps as you obviously wouldn't have more than 60 seconds represented in the last byte etc.)

    Numbers on the other hand take on a more unusual storage format.  Bytewise you may expect them to be something stored in something like a 32 bit (4 byte) representation (or 64 bit 8 byte, but let's stick to 4 bytes for demonstrative purposes)...

    num    bytes
    0      0,0,0,0
    1      0,0,0,1
    2      0,0,0,2
    3      0,0,0,3
    4      0,0,0,4
    5      0,0,0,5
    6      0,0,0,6
    7      0,0,0,7
    8      0,0,0,8
    9      0,0,0,9
    .      .
    .      .
    .      .
    256    0,0,1,0
    257    0,0,1,1
    258    0,0,1,2
    259    0,0,1,3

    but if we look how oracle actually stores them...

             N DUMP_N
    ---------- ------------------------------
             1 Typ=2 Len=2: 193,2
             2 Typ=2 Len=2: 193,3
             3 Typ=2 Len=2: 193,4
             4 Typ=2 Len=2: 193,5
             5 Typ=2 Len=2: 193,6
             6 Typ=2 Len=2: 193,7
             7 Typ=2 Len=2: 193,8
             8 Typ=2 Len=2: 193,9
             9 Typ=2 Len=2: 193,10
    . . .        256 Typ=2 Len=3: 194,3,57
           257 Typ=2 Len=3: 194,3,58
           258 Typ=2 Len=3: 194,3,59
           259 Typ=2 Len=3: 194,3,60

    It's still kind of got a sequential order to it, but the number of bytes is not consistent, which is even more noticable as the numbers get larger...

             N DUMP_N
    ---------- ------------------------------
    100000000 Typ=2 Len=2: 197,2
    123456789 Typ=2 Len=6: 197,2,24,46,68,90

    If the numbers were just stored as pure binary then 123456789 would only really take 4 bytes, not the 6 that oracle uses, but then a number like 1000000000 would also take 4 bytes, whereas Oracle only uses 2.

    So, how does this effect indexing... well, mostly it doesn't.  The number of bytes is hardly relevant, but the sequential nature of the data is.

    There are different types on indexes, but in general if we understand something like a B-Tree style index (I'll let you look that up if you don't know)... then these rely on breaking down data of a sequential nature into small groups within other groups etc. so that binary division can be used to quickly access the data on the leaf nodes.  It doesn't matter that the data is a DATE or a NUMBER or even VARCHAR2 etc., the index optimizes the access to data by using the sequential nature of the byte storage used for it.

This discussion has been closed.