Reg: <Performance comparison> Index on NUMBER and DATE fields -

ranit B
ranit B Member Posts: 3,268 Silver Trophy
edited Oct 3, 2013 10:01AM in SQL & PL/SQL

Hi Experts,

I'm just trying to compare an index on a NUMBER field with an index on a DATE field.

But my experiments are giving me mixed results.

My assumption was that an index on a NUMBER field would be faster because it avoids the datatype conversion of DATE fields (as seen in the 'Predicate Information' section of the explain plan).

Below I'm trying 2 scenarios -

1. Composite index on fields (NUMBER, DATE) 

2. Composite index on fields (NUMBER, NUMBER) 

CREATE TABLE table_x2
AS
       SELECT LEVEL rid, SYSDATE + LEVEL dt, 'aaaaa-bbbb' name
       FROM dual
       CONNECT BY LEVEL<=1000000;


ALTER TABLE table_x2
ADD CONSTRAINT table_x2_pk PRIMARY KEY(rid, dt);


SQL> exec dbms_stats.gather_table_stats('schema_x','table_x2');

PL/SQL procedure successfully completed.

 
SQL> SELECT *
  2  FROM table_x2
  3  WHERE rid >= 500 AND dt <= To_Date('04-08-2020 05:21:00','dd-mm-yyyy hh24:mi:ss');

2001 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1079754163

------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |  2499 | 59976 |  1121   (2)| 00:00:14 |
|*  1 |  TABLE ACCESS FULL| TABLE_X2 |  2499 | 59976 |  1121   (2)| 00:00:14 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("DT"<=TO_DATE(' 2020-08-04 05:21:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "RID">=500)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       4155  consistent gets
          0  physical reads
          0  redo size
      45894  bytes sent via SQL*Net to client
       1797  bytes received via SQL*Net from client
        135  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2001  rows processed
=====================================================================================================================
SCENARIO - 2

CREATE TABLE table_x4
AS
       SELECT LEVEL rid, To_Number(To_Char(SYSDATE + LEVEL, 'yyyymmddhh24miss')) dt, 'aaaaa-bbbb' name
       FROM dual
       CONNECT BY LEVEL<=1000000;


ALTER TABLE table_x4
ADD CONSTRAINT table_x4_pk PRIMARY KEY(rid, dt);


BEGIN
    Dbms_Stats.gather_table_stats('schema_x','table_x4');
END;
/
SQL> SELECT *
  2  FROM table_x4
  3  WHERE rid >= 500 AND dt <= 20200805060302;

2001 rows selected.

------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |  2547 | 63675 |  1157   (2)| 00:00:14 |
|*  1 |  TABLE ACCESS FULL| TABLE_X4 |  2547 | 63675 |  1157   (2)| 00:00:14 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("DT"<=20200805060302 AND "RID">=500)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       4221  consistent gets
          0  physical reads
          0  redo size
      24143  bytes sent via SQL*Net to client
       1060  bytes received via SQL*Net from client
         68  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       1001  rows processed

Also, I'm not sure why it is not using the index here.

I might be completely wrong... Can you please guide me on this?

Thanks and Regards

-- Ranit

(on Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production)


Answers

  • Karthick2003
    Karthick2003 Member Posts: 13,711 Bronze Badge
    ranitB wrote:
    
    My assumption was that an index on a NUMBER field would be faster because it avoids the datatype conversion of DATE fields (as seen in the 'Predicate Information' section of the explain plan).
    

    A date is internally stored as a number. So I don't think a comparison between NUMBER and DATE data type columns with respect to an INDEX is much of a valid scenario. It's more like comparing apples with oranges. Each type exists for a specific purpose, and the correct data type should be used in the correct place.

    Not sure what data type conversion you are talking about for the DATE data type. Nothing of that sort happens. Check your predicate:

    Predicate Information (identified by operation id):

    ---------------------------------------------------

       1 - filter("DT"<=TO_DATE(' 2020-08-04 05:21:00', 'syyyy-mm-dd

                  hh24:mi:ss') AND "RID">=500)

    I don't see anything like that. The TO_DATE is the input value that you supplied yourself. It's not something that Oracle is doing implicitly.
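
    You can see that for yourself with DUMP against your own table - the DT column is stored as 7 bytes of internal type 12 (a DATE), so comparing it with the DATE value returned by your TO_DATE needs no conversion of the column. A quick illustrative check (output will obviously vary):

    SELECT dt, DUMP(dt) dump_dt
    FROM   table_x2
    WHERE  ROWNUM <= 3;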

  • ranit B
    ranit B Member Posts: 3,268 Silver Trophy
    edited Oct 3, 2013 3:22AM

    Thanks for the response, Karthick. Got your point.

    But since - 

    A date is internally stored as a number.

    So, if we use a DATE column in the WHERE clause of a query, I guess Oracle needs to do some additional work to compare it with the NUMBER value stored internally.

    My doubt -

    Will there be any performance difference between an index built on a NUMBER field and one built on a DATE field?

    Could you please extend some help on this?

  • Karthick2003
    Karthick2003 Member Posts: 13,711 Bronze Badge
    edited Oct 3, 2013 3:32AM
    So, if we use a DATE column in the WHERE clause of a query, I guess Oracle needs to do some additional work to compare it with the NUMBER value stored internally.
    

    There is no conversion. When you pass a date value, it is still represented internally as a number, so there is no need to perform any conversion.

  • ranit B
    ranit B Member Posts: 3,268 Silver Trophy
    My doubt -
    Will there be any performance difference between an index built on a NUMBER field and one built on a DATE field?
     

    Any pointers to this??

  • Hoek
    Hoek Member Posts: 16,076 Gold Crown

    Here's what I got, also on 11.2.0.3.0; the only differences are the way I gathered the stats, and that I used dbms_xplan:

    SQL> CREATE TABLE table_x2 AS
      2         SELECT LEVEL rid, SYSDATE + LEVEL dt, 'aaaaa-bbbb' name
      3         FROM dual
      4         CONNECT BY LEVEL<=1000000;
    
    Table created.
    
    SQL> ALTER TABLE table_x2 ADD CONSTRAINT table_x2_pk PRIMARY KEY(rid, dt);
    
    Table altered.
    
    SQL> exec dbms_stats.gather_table_stats(user,'table_x2', cascade => true );
    
    PL/SQL procedure successfully completed.
    
    SQL> SELECT /*+ gather_plan_statistics */
      2         *
      3  FROM   table_x2
      4  where  rid >= 500 AND dt <= To_Date('04-08-2020 05:21:00','dd-mm-yyyy hh24:mi:ss');
    
           RID DT                  NAME
    ---------- ------------------- ----------
    ...
    snip
    ...
    
    1997 rows selected.
    
    SQL> select * from table (dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
    
    PLAN_TABLE_OUTPUT
    -----------------
    SQL_ID  dtbjf0rs83rf4, child number 0
    -------------------------------------
    SELECT /*+ gather_plan_statistics */        * FROM   table_x2 where
    rid >= 500 AND dt <= To_Date('04-08-2020 05:21:00','dd-mm-yyyy
    hh24:mi:ss')
    
    Plan hash value: 3536046022
    
    -----------------------------------------------------------------------------------------------------
    | Id  | Operation                   | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
    -----------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |             |      1 |        |   1997 |00:00:00.01 |    3482 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| TABLE_X2    |      1 |   2496 |   1997 |00:00:00.01 |    3482 |
    |*  2 |   INDEX RANGE SCAN          | TABLE_X2_PK |      1 |   2496 |   1997 |00:00:00.01 |    3339 |
    -----------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - access("RID">=500 AND "DT"<=TO_DATE(' 2020-08-04 05:21:00', 'syyyy-mm-dd hh24:mi:ss'))
           filter("DT"<=TO_DATE(' 2020-08-04 05:21:00', 'syyyy-mm-dd hh24:mi:ss'))
    
    
    22 rows selected.
    
    SQL> CREATE TABLE table_x4 AS
      2     SELECT LEVEL rid, To_Number(To_Char(SYSDATE + LEVEL, 'yyyymmddhh24miss')) dt, 'aaaaa-bbbb' name
      3     FROM dual
      4     CONNECT BY LEVEL<=1000000;
    
    Table created.
    
    SQL> ALTER TABLE table_x4 ADD CONSTRAINT table_x4_pk PRIMARY KEY(rid, dt);
    
    Table altered.
    
    SQL> exec Dbms_Stats.gather_table_stats(user,'table_x4', cascade => true);
    
    PL/SQL procedure successfully completed.
    
    SQL> SELECT /*+ gather_plan_statistics */
      2         *
      3  FROM table_x4
      4  WHERE rid >= 500 AND dt <= 20200805060302;
    
           RID         DT NAME
    ---------- ---------- ----------
    ...
    snip
    ...
    
    1998 rows selected.
    
    SQL> select * from table (dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
    
    PLAN_TABLE_OUTPUT
    -----------------
    SQL_ID  c80wn579548jv, child number 0
    -------------------------------------
    SELECT /*+ gather_plan_statistics */        * FROM table_x4 WHERE rid
    >= 500 AND dt <= 20200805060302
    
    Plan hash value: 472343109
    
    -----------------------------------------------------------------------------------------------------
    | Id  | Operation                   | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
    -----------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |             |      1 |        |   1998 |00:00:00.01 |    3623 |
    |*  2 |   INDEX RANGE SCAN          | TABLE_X4_PK |      1 |   2549 |   1998 |00:00:00.01 |    3479 |
    -----------------------------------------------------------------------------------------------------
                                                                                                        
    Predicate Information (identified by operation id):
    ---------------------------------------------------
                                                      
       2 - access("RID">=500 AND "DT"<=20200805060302)
           filter("DT"<=20200805060302)               
                                                      
    
    21 rows selected.
    
  • Karthick2003
    Karthick2003 Member Posts: 13,711 Bronze Badge
    ranitB wrote:
    
    My doubt -
    Will there be any performance difference between an index built on a NUMBER field and one built on a DATE field?
    
    Any pointers to this??
    

    My basic point is that the things you are comparing are not really comparable. Beyond that, I don't think such a detail is documented by Oracle, and I have never come across any authoritative article by Oracle experts in this regard.

  • Chris Hunt
    Chris Hunt Member Posts: 2,066 Gold Trophy

    I suppose it will take a tiny fraction of a second to convert "2020-08-04 05:21:00" into the internal date format - which it will do just once, when it parses the query - but beyond that I wouldn't expect any difference in performance. This is not a reason to store dates in number columns in the interests of "efficiency" - you'll lose far more than you gain.

    If you have a date, store it in a DATE column (or a TIMESTAMP if you need subsecond precision). If you have a number, store it in a NUMBER column. If you need to search on those columns, build indexes on them. Don't get cute and use different datatypes in the hope of "speeding things up" - the things the Oracle developers can do to speed up the indexing of date values internally to the database will dwarf anything you can kludge together by using the wrong datatype.
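
    To make that trade-off concrete, a small illustration of what NUMBER "dates" give up - validation and date arithmetic come for free with the proper type:

    -- with a real DATE the database knows the calendar:
    SELECT ADD_MONTHS(DATE '2020-01-31', 1) next_month,   -- 29-FEB-2020 (leap year handled)
           DATE '2020-01-31' + 1            next_day      -- 01-FEB-2020
    FROM   dual;

    -- with a NUMBER, 20200131 + 1 = 20200132, and 20200230 is a perfectly "valid" value.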

  • padders
    padders Member Posts: 1,045 Bronze Trophy

    > Any pointers to this??

    The only reason I can think of for testing this would be to try to justify storing DATEs in a NUMBER column - is that correct?

  • Chris Hunt
    Chris Hunt Member Posts: 2,066 Gold Trophy

    I ran a test to make a more direct comparison between the speed of number indexes vs date indexes:

    CREATE TABLE xxx
    AS
    SELECT LEVEL num_key,
           TRUNC(SYSDATE) + LEVEL date_key,
           99 value
    FROM dual
    CONNECT BY LEVEL<=1000000;
    create index xxx_ix1 on xxx (num_key);
    create index xxx_ix2 on xxx (date_key);
    DECLARE
       nk       NUMBER;
       dk       DATE;
       x        NUMBER;
       t_start  TIMESTAMP;
       t_end    TIMESTAMP;
       diff     INTERVAL DAY TO SECOND;
    BEGIN
       -- Test 1: 100,000 random single-row lookups via the index on NUM_KEY.
       -- dk is computed in both loops (even though Test 1 only uses nk) so the
       -- per-iteration overhead of the two loops stays comparable.
       t_start := systimestamp;
       FOR i IN 1..100000 LOOP
          nk := TRUNC(DBMS_RANDOM.VALUE(1,1000000));
          dk := TRUNC(sysdate) + nk;

          SELECT value
          INTO   x
          FROM   xxx
          WHERE  num_key = nk;
       END LOOP;
       t_end := systimestamp;
       diff  := t_end - t_start;
       DBMS_OUTPUT.PUT_LINE('Test 1: Number Index - '||diff);

       -- Test 2: the same 100,000 random lookups via the index on DATE_KEY.
       t_start := systimestamp;
       FOR i IN 1..100000 LOOP
          nk := TRUNC(DBMS_RANDOM.VALUE(1,1000000));
          dk := TRUNC(sysdate) + nk;

          SELECT value
          INTO   x
          FROM   xxx
          WHERE  date_key = dk;
       END LOOP;
       t_end := systimestamp;
       diff  := t_end - t_start;
       DBMS_OUTPUT.PUT_LINE('Test 2: Date Index - '||diff);
       DBMS_OUTPUT.PUT_LINE(' ');
    END;
    /

    Running the test a few times I get this output:

     

    Test 1: Number Index - +00 00:00:04.288427
    Test 2: Date Index - +00 00:00:03.289358

    Test 1: Number Index - +00 00:00:04.233118
    Test 2: Date Index - +00 00:00:04.582399

    Test 1: Number Index - +00 00:00:06.016131
    Test 2: Date Index - +00 00:00:04.893847

    Test 1: Number Index - +00 00:00:05.380649
    Test 2: Date Index - +00 00:00:05.512358

    Test 1: Number Index - +00 00:00:03.868854
    Test 2: Date Index - +00 00:00:04.672450

    Test 1: Number Index - +00 00:00:05.247808
    Test 2: Date Index - +00 00:00:04.380884

    So the two types of indexes take about the same amount of time to fetch data, with neither type establishing a clear advantage over the other.

    As I said before - and as has been said many times before in this forum - use DATEs to store date values!

  • BluShadow
    BluShadow Member, Moderator Posts: 40,989 Red Diamond
    ranitB wrote:
    
    My doubt -
    Will there be any performance difference between an index built on a NUMBER field and one built on a DATE field?
    
    Any pointers to this??
    
    

    Not really a noticeable difference.

    Internally the byte storage of numbers and dates differs.

    Dates are always stored as 7 bytes, going from high order to low order (i.e. century, year, month, day, hours, minutes, seconds), so a sequence of dates is stored in a sequential order of bytes... e.g.

    D           DUMP_D
    ----------- ----------------------------------------
    04-Oct-2013 Typ=12 Len=7: 120,113,10,4,10,23,23
    05-Oct-2013 Typ=12 Len=7: 120,113,10,5,10,23,23
    06-Oct-2013 Typ=12 Len=7: 120,113,10,6,10,23,23
    07-Oct-2013 Typ=12 Len=7: 120,113,10,7,10,23,23
    08-Oct-2013 Typ=12 Len=7: 120,113,10,8,10,23,23
    09-Oct-2013 Typ=12 Len=7: 120,113,10,9,10,23,23
    10-Oct-2013 Typ=12 Len=7: 120,113,10,10,10,23,23
    11-Oct-2013 Typ=12 Len=7: 120,113,10,11,10,23,23
    12-Oct-2013 Typ=12 Len=7: 120,113,10,12,10,23,23
    13-Oct-2013 Typ=12 Len=7: 120,113,10,13,10,23,23
    

    So, in terms of binary storage and bytewise comparison, dates are definitely stored as numbers (with gaps as you obviously wouldn't have more than 60 seconds represented in the last byte etc.)

    Numbers, on the other hand, take on a more unusual storage format. Bytewise, you might expect them to be stored in something like a 32-bit (4 byte) representation (or 64-bit / 8 byte, but let's stick to 4 bytes for demonstration purposes)...

    num    bytes
    0      0,0,0,0
    1      0,0,0,1
    2      0,0,0,2
    3      0,0,0,3
    4      0,0,0,4
    5      0,0,0,5
    6      0,0,0,6
    7      0,0,0,7
    8      0,0,0,8
    9      0,0,0,9
    .      .
    .      .
    .      .
    256    0,0,1,0
    257    0,0,1,1
    258    0,0,1,2
    259    0,0,1,3

    but if we look how oracle actually stores them...

             N DUMP_N
    ---------- ------------------------------
             1 Typ=2 Len=2: 193,2
             2 Typ=2 Len=2: 193,3
             3 Typ=2 Len=2: 193,4
             4 Typ=2 Len=2: 193,5
             5 Typ=2 Len=2: 193,6
             6 Typ=2 Len=2: 193,7
             7 Typ=2 Len=2: 193,8
             8 Typ=2 Len=2: 193,9
             9 Typ=2 Len=2: 193,10
    . . .
           256 Typ=2 Len=3: 194,3,57
           257 Typ=2 Len=3: 194,3,58
           258 Typ=2 Len=3: 194,3,59
           259 Typ=2 Len=3: 194,3,60

    It's still kind of got a sequential order to it, but the number of bytes is not consistent, which is even more noticeable as the numbers get larger...

             N DUMP_N
    ---------- ------------------------------
    100000000 Typ=2 Len=2: 197,2
    123456789 Typ=2 Len=6: 197,2,24,46,68,90

    If the numbers were just stored as pure binary then 123456789 would only really take 4 bytes, not the 6 that Oracle uses, but then a number like 1000000000 would also take 4 bytes, whereas Oracle only uses 2.
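
    Dumps like these can be generated with a simple DUMP query along these lines (just a sketch):

    SELECT n, DUMP(n) dump_n
    FROM  (SELECT LEVEL n FROM dual CONNECT BY LEVEL <= 9
           UNION ALL
           SELECT 255 + LEVEL FROM dual CONNECT BY LEVEL <= 4
           UNION ALL
           SELECT 100000000 FROM dual
           UNION ALL
           SELECT 123456789 FROM dual);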

    So, how does this affect indexing... well, mostly it doesn't. The number of bytes is hardly relevant, but the sequential nature of the data is.

    There are different types of indexes, but in general, if we consider something like a B-tree style index (I'll let you look that up if you don't know it)... these rely on breaking data of a sequential nature down into small groups within other groups, so that binary division can be used to quickly reach the data in the leaf nodes. It doesn't matter whether the data is a DATE, a NUMBER or even a VARCHAR2; the index optimizes access to the data by using the sequential nature of its byte storage.
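
    If you want to look at the shape of such a B-tree yourself, one option (a sketch - note that VALIDATE STRUCTURE blocks DML on the table while it runs, so don't do it casually on a busy system) is to validate the primary key index from the examples above and then query INDEX_STATS in the same session:

    ANALYZE INDEX table_x2_pk VALIDATE STRUCTURE;

    SELECT height, br_blks, lf_blks, distinct_keys
    FROM   index_stats
    WHERE  name = 'TABLE_X2_PK';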

  • Martin Preiss
    Martin Preiss Member Posts: 2,381 Gold Trophy

    I also think that the decision for a data type should be based on the data: if you have dates to store, then you should use a date type. The main reason for this is that the CBO knows that 01.01.2014 follows 31.12.2013, and not 32.12.2013 or something else.

    Regarding your question about the use of an FTS (full table scan) instead of an index scan: it is just a question of costing. The CBO calculates the cost of the FTS by dividing the number of segment blocks by a value for multiblock reads (the details depend on the flavor of system statistics you use). For the index scan, the CBO takes into account the part of the index that has to be accessed and the part of the table to be read. The formula is:

    basic index range scan cost = index blevel + ceil(index selectivity x leaf blocks) + ceil(table selectivity x clustering factor)
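
    A CBO trace like the one below is usually generated with optimizer event 10053, which can be enabled per session like this (the resulting trace file location is shown in V$DIAG_INFO):

    ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

    -- hard-parse the statement of interest (e.g. the SELECT against table_x2), then:

    ALTER SESSION SET EVENTS '10053 trace name context off';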
    

    Using your example a CBO trace contains the details on this part of the calculation:

    ***************************************
    BASE STATISTICAL INFORMATION
    ***********************
    Table Stats::
      Table: TABLE_X2  Alias: TABLE_X2
        #Rows: 1000000  #Blks:  4087  AvgRowLen:  24.00
    Index Stats::
      Index: TABLE_X2_PK  Col#: 1 2
        LVLS: 2  #LB: 3204  #DK: 1000000  LB/K: 1.00  DB/K: 1.00  CLUF: 4015.00
    Access path analysis for TABLE_X2
    ***************************************
    SINGLE TABLE ACCESS PATH 
      Single Table Cardinality Estimation for TABLE_X2[TABLE_X2] 
    
    
      Table: TABLE_X2  Alias: TABLE_X2
        Card: Original: 1000000.000000  Rounded: 2496  Computed: 2495.53  Non Adjusted: 2495.53
      Access Path: TableScan
        Cost:  2045.05  Resp: 2045.05  Degree: 0
          Cost_io: 2045.00  Cost_cpu: 249280084
          Resp_io: 2045.00  Resp_cpu: 249280084
    kkofmx: index filter:"TABLE_X2"."DT"<=TO_DATE(' 2020-08-04 05:21:00', 'syyyy-mm-dd hh24:mi:ss')
    
    
      Access Path: index (skip-scan)
        SS sel: 0.002496  ANDV (#skips): 999502.000000
        SS io: 999502.000000 vs. index scan io: 2045.000000
        Skip Scan rejected
      Access Path: index (RangeScan)
        Index: TABLE_X2_PK
        resc_io: 3216.00  resc_cpu: 223277191
        ix_sel: 0.999502  ix_sel_with_filters: 0.002496 
     ***** Logdef predicate Adjustment ****** 
     Final IO cst 0.00 , CPU cst 50.00
     ***** End Logdef Adjustment ****** 
        Cost: 3216.05  Resp: 3216.05  Degree: 1
      Best:: AccessPath: TableScan
             Cost: 2045.05  Degree: 1  Resp: 2045.05  Card: 2495.53  Bytes: 0
    

    So the TableScan is considered to be cheaper than the index scan.
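
    For what it's worth, plugging the trace figures into the formula above seems to reproduce the index cost exactly (ix_sel feeds the leaf-block part, ix_sel_with_filters the clustering-factor part):

    blevel                                        =                          2
    ceil(ix_sel x leaf blocks)                    = ceil(0.999502 x 3204) = 3203
    ceil(ix_sel_with_filters x clustering factor) = ceil(0.002496 x 4015) =   11
                                                                            ----
    basic index range scan cost                                           = 3216   (resc_io: 3216.00)

    which is well above the table scan cost of 2045, so the CBO picks the full table scan.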

  • Hoek
    Hoek Member Posts: 16,076 Gold Crown

    However, I got the index range scans in my test run, and as far as I know this has to do with the way the statistics were gathered (I set the cascade parameter explicitly to true, whereas Ranit took the default).

  • Martin Preiss
    Martin Preiss Member Posts: 2,381 Gold Trophy

    As far as I know, the cascade option should make no difference in this case, since the index creation already includes statistics gathering:

    drop TABLE table_x2;
    
    CREATE TABLE table_x2
    AS
           SELECT LEVEL rid, SYSDATE + LEVEL dt, 'aaaaa-bbbb' name
           FROM dual
           CONNECT BY LEVEL<=1000000;
    
    ALTER TABLE table_x2
    ADD CONSTRAINT table_x2_pk PRIMARY KEY(rid, dt);
    
    select index_name, blevel, leaf_blocks, last_analyzed from user_indexes where index_name = upper('table_x2_pk');
    
    INDEX_NAME                         BLEVEL LEAF_BLOCKS LAST_ANALYZED
    ------------------------------ ---------- ----------- -------------------
    TABLE_X2_PK                             2        3204 03.10.2013 11:49:28

    exec dbms_stats.gather_table_stats(user,'table_x2', cascade => true );

    select index_name, blevel, leaf_blocks, last_analyzed from user_indexes where index_name = upper('table_x2_pk');

    INDEX_NAME                         BLEVEL LEAF_BLOCKS LAST_ANALYZED
    ------------------------------ ---------- ----------- -------------------
    TABLE_X2_PK                             2        3204 03.10.2013 11:49:30

    So I would expect that you would also get the index scan without the cascade option.

  • ranit B
    ranit B Member Posts: 3,268 Silver Trophy

    Thanks a lot for all your efforts guys. Much appreciated!

    Just skimmed through all the replies (but I have to re-read the worked examples thoroughly once more)... and got the point:

    1. Performance when fetching data via an index on a NUMBER field is the same as via an index on a DATE field.

    2. The DATE datatype is the best option for storing dates.

    Please correct me if I'm wrong.

    Thanks

    -- Ranit

  • BluShadow
    BluShadow Member, Moderator Posts: 40,989 Red Diamond

    1. Yes, pretty much. Obviously the number of bytes used can vary, so the number of data blocks read from indexes etc. can vary, but this is generally not noticeable and certainly not something you should worry about.

    2. Yes, always store dates as DATE or TIMESTAMP as appropriate.

  • ranit B
    ranit B Member Posts: 3,268 Silver Trophy

    Thanks again, Blu.

    BluShadow wrote:
    
    1. Yes, pretty much. Obviously the number of bytes used can vary, so the number of data blocks read from indexes etc. can vary, but this is generally not noticeable and certainly not something you should worry about.
    2. Yes, always store dates as DATE or TIMESTAMP as appropriate.
    

    This reply was the icing on the cake.

    You are always so helpful... an asset to the Oracle community. It's really appreciated.


This discussion has been closed.