This discussion is archived
8 Replies Latest reply: Jan 5, 2010 3:35 AM by 744334 RSS

Datatype best practice and plan cardinality

user503699 Expert
Currently Being Moderated
Hi,

I have a scenario where I need to store the data in the format YYYYMM (e.g. 201001 which means January, 2010).
I am trying to evaluate what is the most appropriate datatype to store this kind of data. I am comparing 2 options, NUMBER and DATE.
As the data is essentially a component of oracle date datatype and experts like Tom Kyte have proved (with examples) that using right
datatype is better for optimizer. So I was expecting that using DATE datatype will yield (at least) similar (if not better) cardinality estimates
than using NUMBER datatype. However, my tests show that when using DATE the cardinality estimates are way off from actuals whereas
using NUMBER the cardinality estimates are much closer to actuals.
My questions are:
1) What should be the most appropriate datatype used to store YYYYMM data?
2) Why does using DATE datatype yield estimates that are way off from actuals than using NUMBER datatype?
SQL> select * from V$VERSION ;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE     10.2.0.1.0     Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

SQL>  create table a nologging as select to_number(to_char(add_months(to_date('200101','YYYYMM'),level - 1), 'YYYYMM')) id from dual connect by level <= 289 ;

Table created.

SQL> create table b (id number) ;

Table created.

SQL> begin
  2  for i in 1..8192
  3  loop
  4     insert into b select * from a ;
  5  end loop;
  6  commit;
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> alter table a add dt date ;

Table altered.

SQL> alter table b add dt date ;

Table altered.

SQL> select to_date(200101, 'YYYYMM') from dual ;

TO_DATE(2
---------
01-JAN-01

SQL> update a set dt = to_date(id, 'YYYYMM') ;

289 rows updated.

SQL> update b set dt = to_date(id, 'YYYYMM') ;

2367488 rows updated.

SQL> commit ;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(user, 'A', estimate_percent=>NULL) ;

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(user, 'B', estimate_percent=>NULL) ;     

SQL> explain plan for select count(*) from b where id between 200810 and 200903 ;

Explained.

SQL> select * from table(dbms_xplan.display) ;

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 749587668

---------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time       |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |     5 |   824   (4)| 00:00:10 |
|   1 |  SORT AGGREGATE    |       |     1 |     5 |            |       |
|*  2 |   TABLE ACCESS FULL| B       | 46604 |   227K|   824   (4)| 00:00:10 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
---------------------------------------------------

   2 - filter("ID"<=200903 AND "ID">=200810)

14 rows selected.

SQL> explain plan for select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;

Explained.

SQL> select * from table(dbms_xplan.display) ;

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
Plan hash value: 749587668

---------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time       |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |     5 |   825   (4)| 00:00:10 |
|   1 |  SORT AGGREGATE    |       |     1 |     5 |            |       |
|*  2 |   TABLE ACCESS FULL| B       |  5919 | 29595 |   825   (4)| 00:00:10 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
---------------------------------------------------

   2 - filter("DT">=TO_DATE('2008-10-01 00:00:00', 'yyyy-mm-dd
           hh24:mi:ss') AND "DT"<=TO_DATE('2009-03-01 00:00:00', 'yyyy-mm-dd
           hh24:mi:ss'))

16 rows selected.
  • 1. Re: Datatype best practice and plan cardinality
    CharlesHooper Expert
    Currently Being Moderated
    You might be experiencing a bug in the unpatched 10.2.0.1, or there might be a histogram automatically generated on the table's columns.

    One of the problems with putting date values in number columns is this - if you select the range from 200810 to 200903, the optimizer will likely make the assumption that 200810 is just as likely of a number as 200808, 200812, 200814, 200816, 200820, 200890, 200900, etc. Some of those year/month combinations are simply not possible. In such a case, the optimizer should over-estimate the number of rows returned from that range when the column data type is NUMBER, and should be reasonably close when the column data type is DATE, since the optimizer knows that 200814 (14/1/2008), 200816 (16/1/2008), 200820 (20/1/2008), 200890 (90/1/2008), 200900 (0/1/2009), etc. could never be dates (and would be completely out of the serial sequence of dates).

    These are my results from Oracle 11.1.0.7:
    SQL> set autotrace traceonly explain
    SQL> select count(*) from b where id between 200810 and 200903 ;
     
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 749587668
     
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |     1 |     5 |  4715   (1)| 00:00:57 |
    |   1 |  SORT AGGREGATE    |      |     1 |     5 |            |          |
    |*  2 |   TABLE ACCESS FULL| B    |   108K|   527K|  4715   (1)| 00:00:57 |
    ---------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       2 - filter("ID"<=200903 AND "ID">=200810)
     
    SQL> select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;
     
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 749587668
     
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |     1 |     8 |  4718   (2)| 00:00:57 |
    |   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
    |*  2 |   TABLE ACCESS FULL| B    | 57166 |   446K|  4718   (2)| 00:00:57 |
    ---------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       2 - filter("DT"<=TO_DATE(' 2009-03-01 00:00:00', 'syyyy-mm-dd
                  hh24:mi:ss') AND "DT">=TO_DATE(' 2008-10-01 00:00:00', 'syyyy-mm-dd
                  hh24:mi:ss'))
     
    SQL> set autotrace off
    SQL> select count(*) from b where id between 200810 and 200903 ;
     
      COUNT(*)
    ----------
         49152
     
    SQL> select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;
     
      COUNT(*)
    ----------
         49152
    From a 10053 trace capture during the above test:
    ******************************************
    ----- Current SQL Statement for this session (sql_id=7uk18xj0z9uxf) -----
    select count(*) from b where id between 200810 and 200903 
    *******************************************
    ...
    ***************************************
    SINGLE TABLE ACCESS PATH 
      Single Table Cardinality Estimation for B[B] 
    
      Table: B  Alias: B
        Card: Original: 2367488.000000  Rounded: 108124  Computed: 108124.16  Non Adjusted: 108124.16
      Access Path: TableScan
        Cost:  4714.53  Resp: 4714.53  Degree: 0
          Cost_io: 4670.00  Cost_cpu: 636216240
          Resp_io: 4670.00  Resp_cpu: 636216240
      Best:: AccessPath: TableScan
             Cost: 4714.53  Degree: 1  Resp: 4714.53  Card: 108124.16  Bytes: 0
    
    ***************************************
    ...
    ******************************************
    ----- Current SQL Statement for this session (sql_id=2ac0k15zjdg5x) -----
    select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') 
    *******************************************
    ...
    
    ***************************************
    SINGLE TABLE ACCESS PATH 
      Single Table Cardinality Estimation for B[B] 
    
      Table: B  Alias: B
        Card: Original: 2367488.000000  Rounded: 57166  Computed: 57165.51  Non Adjusted: 57165.51
      Access Path: TableScan
        Cost:  4717.89  Resp: 4717.89  Degree: 0
          Cost_io: 4670.00  Cost_cpu: 684264079
          Resp_io: 4670.00  Resp_cpu: 684264079
      Best:: AccessPath: TableScan
             Cost: 4717.89  Degree: 1  Resp: 4717.89  Card: 57165.51  Bytes: 0
    
    ***************************************
    As you can see from the above, the cardinality estimates are closer when the date data type is used.

    Charles Hooper
    Co-author of "Expert Oracle Practices: Oracle Database Administration from the Oak Table"
    http://hoopercharles.wordpress.com/
    IT Manager/Oracle DBA
    K&M Machine-Fabricating, Inc.
  • 2. Re: Datatype best practice and plan cardinality
    sb92075 Guru
    Currently Being Moderated
    1) What should be the most appropriate datatype used to store YYYYMM data?
    DATE datatype
    In the future you will be glad you did so.
  • 3. Re: Datatype best practice and plan cardinality
    user503699 Expert
    Currently Being Moderated
    Charles,

    Thanks for your response.
    I did not think of the possibilitty of histograms. When I ran the tests on 10.2.0.4, I could get the results as you have shown.
    So I thought it might be due to some bug in 10.2.0.1. But interestingly, when I ran the test after collecting statistics using 'FOR ALL COLUMNS SIZE 1'
    option, I got the cardinalities that match my 10.2.0.1 results (where METHOD_OPT was default i.e. 'FOR ALL COLUMNS SIZE AUTO').
    So I carried out the tests again on 10.2.0.1 but the results did not look consistent to me. When there were no histograms on DATE column, the cardinality
    was quite close to actuals but when I collected stats using 'FOR ALL COLUMNS SIZE SKEWONLY', it generated histograms on DATE column but
    the cardinality was not quite close to actuals.
    So I am bit confused about whether this is due to a bug or due to combined effect of optimizer's "intelligence" while collecting statistics using default option
    values and the way table is queried (COL_USAGE$ data).
    Here is my test:
    SQL> select * from v$version ;
    
    BANNER
    ----------------------------------------------------------------
    Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
    PL/SQL Release 10.2.0.1.0 - Production
    CORE     10.2.0.1.0     Production
    TNS for Linux: Version 10.2.0.1.0 - Production
    NLSRTL Version 10.2.0.1.0 - Production
    
    SQL> exec dbms_stats.delete_table_stats(user, 'B') ;
    
    PL/SQL procedure successfully completed.
    
    SQL> select column_name, num_distinct, num_buckets, histogram from user_tab_col_statistics where table_name = 'B' ;
    
    no rows selected
    
    SQL> exec dbms_stats.gather_table_stats(user, 'B') ;
    
    PL/SQL procedure successfully completed.
    
    SQL> select column_name, num_distinct, num_buckets, histogram from user_tab_col_statistics where table_name = 'B' ;
    
    COLUMN_NAME                    NUM_DISTINCT NUM_BUCKETS HISTOGRAM
    ------------------------------ ------------ ----------- ---------------
    ID                                      289         254 HEIGHT BALANCED
    DT                                      289         254 HEIGHT BALANCED
    
    SQL> explain plan for select count(*) from b where b.id between 200810 and 200903 ;
    
    Explained.
    
    SQL> select * from table(dbms_xplan.display) ;
    
    PLAN_TABLE_OUTPUT
    -----------------------------------------------------------------------------------------------------------------------------------
    Plan hash value: 749587668
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |     1 |     5 |  3691   (1)| 00:00:45 |
    |   1 |  SORT AGGREGATE    |      |     1 |     5 |            |          |
    |*  2 |   TABLE ACCESS FULL| B    | 38218 |   186K|  3691   (1)| 00:00:45 |
    ---------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - filter("B"."ID"<=200903 AND "B"."ID">=200810)
    
    14 rows selected.
    
    SQL> explain plan for select count(*) from b where b.dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;
    
    Explained.
    
    SQL> select * from table(dbms_xplan.display) ;
    
    PLAN_TABLE_OUTPUT
    -----------------------------------------------------------------------------------------------------------------------------------
    Plan hash value: 749587668
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |     1 |     8 |  3693   (1)| 00:00:45 |
    |   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
    |*  2 |   TABLE ACCESS FULL| B    | 38218 |   298K|  3693   (1)| 00:00:45 |
    ---------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - filter("B"."DT"<=TO_DATE('2009-03-01 00:00:00', 'yyyy-mm-dd
                  hh24:mi:ss') AND "B"."DT">=TO_DATE('2008-10-01 00:00:00', 'yyyy-mm-dd
                  hh24:mi:ss'))
    
    16 rows selected.
    
    SQL> connect sys as sysdba ;
    Connected.
    SQL> delete from sys.col_usage$ where obj# in (select object_id from all_objects where owner = 'HR' and object_name in ('A','B')) ;
    
    4 rows deleted.
    
    SQL> commit ;
    
    Commit complete.
    
    SQL> connect hr/hr ;
    Connected.
    SQL> set serveroutput on size 10000
    SQL> exec dbms_stats.delete_table_stats(user, 'B') ;
    
    PL/SQL procedure successfully completed.
    
    SQL> exec dbms_stats.gather_table_stats(user, 'B') ;
    
    PL/SQL procedure successfully completed.
    
    SQL> select column_name, num_distinct, num_buckets, histogram from user_tab_col_statistics where table_name = 'B' ;
    
    COLUMN_NAME                    NUM_DISTINCT NUM_BUCKETS HISTOGRAM
    ------------------------------ ------------ ----------- ---------------
    ID                                      289           1 NONE
    DT                                      289           1 NONE
    
    SQL> explain plan for select count(*) from b where b.id between 200810 and 200903 ;
    
    Explained.
    
    SQL> select * from table(dbms_xplan.display) ;
    
    PLAN_TABLE_OUTPUT
    -----------------------------------------------------------------------------------------------------------------------------------
    Plan hash value: 749587668
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |     1 |     5 |  3691   (1)| 00:00:45 |
    |   1 |  SORT AGGREGATE    |      |     1 |     5 |            |          |
    |*  2 |   TABLE ACCESS FULL| B    |   110K|   541K|  3691   (1)| 00:00:45 |
    ---------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - filter("B"."ID"<=200903 AND "B"."ID">=200810)
    
    14 rows selected.
    
    SQL> explain plan for select count(*) from b where b.dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;
    
    Explained.
    
    SQL> select * from table(dbms_xplan.display) ;
    
    PLAN_TABLE_OUTPUT
    -----------------------------------------------------------------------------------------------------------------------------------
    Plan hash value: 749587668
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |     1 |     8 |  3693   (1)| 00:00:45 |
    |   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
    |*  2 |   TABLE ACCESS FULL| B    | 58680 |   458K|  3693   (1)| 00:00:45 |
    ---------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - filter("B"."DT"<=TO_DATE('2009-03-01 00:00:00', 'yyyy-mm-dd
                  hh24:mi:ss') AND "B"."DT">=TO_DATE('2008-10-01 00:00:00', 'yyyy-mm-dd
                  hh24:mi:ss'))
    
    16 rows selected.
    
    SQL> exec dbms_stats.gather_table_stats(user, 'B') ;
    
    PL/SQL procedure successfully completed.
    
    SQL> select column_name, num_distinct, num_buckets, histogram from user_tab_col_statistics where table_name = 'B' ;
    
    COLUMN_NAME                    NUM_DISTINCT NUM_BUCKETS HISTOGRAM
    ------------------------------ ------------ ----------- ---------------
    ID                                      289         254 HEIGHT BALANCED
    DT                                      289           1 NONE
    
    SQL> explain plan for select count(*) from b where b.id between 200810 and 200903 ;
    
    Explained.
    
    SQL> select * from table(dbms_xplan.display) ;
    
    PLAN_TABLE_OUTPUT
    -----------------------------------------------------------------------------------------------------------------------------------
    Plan hash value: 749587668
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |     1 |     5 |  3690   (1)| 00:00:45 |
    |   1 |  SORT AGGREGATE    |      |     1 |     5 |            |          |
    |*  2 |   TABLE ACCESS FULL| B    | 46303 |   226K|  3690   (1)| 00:00:45 |
    ---------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - filter("B"."ID"<=200903 AND "B"."ID">=200810)
    
    14 rows selected.
    
    SQL> explain plan for select count(*) from b where b.dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;
    
    Explained.
    
    SQL> select * from table(dbms_xplan.display) ;
    
    PLAN_TABLE_OUTPUT
    -----------------------------------------------------------------------------------------------------------------------------------
    Plan hash value: 749587668
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |     1 |     8 |  3692   (1)| 00:00:45 |
    |   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
    |*  2 |   TABLE ACCESS FULL| B    | 56797 |   443K|  3692   (1)| 00:00:45 |
    ---------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - filter("B"."DT"<=TO_DATE('2009-03-01 00:00:00', 'yyyy-mm-dd
                  hh24:mi:ss') AND "B"."DT">=TO_DATE('2008-10-01 00:00:00', 'yyyy-mm-dd
                  hh24:mi:ss'))
    
    16 rows selected.
    
    SQL> exec dbms_stats.gather_table_stats(user, 'B') ;
    
    PL/SQL procedure successfully completed.
    
    SQL> select column_name, num_distinct, num_buckets, histogram from user_tab_col_statistics where table_name = 'B' ;
    
    COLUMN_NAME                    NUM_DISTINCT NUM_BUCKETS HISTOGRAM
    ------------------------------ ------------ ----------- ---------------
    ID                                      289         254 HEIGHT BALANCED
    DT                                      289           1 NONE
    
    SQL> exec dbms_stats.gather_table_stats(user, 'B', method_opt=>'FOR ALL COLUMNS SIZE SKEWONLY') ;
    
    PL/SQL procedure successfully completed.
    
    SQL> select column_name, num_distinct, num_buckets, histogram from user_tab_col_statistics where table_name = 'B' ;
    
    COLUMN_NAME                 NUM_DISTINCT NUM_BUCKETS HISTOGRAM
    ------------------------------ ------------ ----------- ---------------
    ID                         289         254 HEIGHT BALANCED
    DT                         289         254 HEIGHT BALANCED
    
    SQL> explain plan for select count(*) from b where b.dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;
    
    Explained.
    
    SQL> select * from table(dbms_xplan.display) ;
    
    PLAN_TABLE_OUTPUT
    -----------------------------------------------------------------------------------------------------------------------------------
    Plan hash value: 749587668
    
    ---------------------------------------------------------------------------
    | Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time       |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |       |     1 |     8 |  3692   (1)| 00:00:45 |
    |   1 |  SORT AGGREGATE    |       |     1 |     8 |            |       |
    |*  2 |   TABLE ACCESS FULL| B       | 27862 |   217K|  3692   (1)| 00:00:45 |
    ---------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - filter("B"."DT"<=TO_DATE('2009-03-01 00:00:00', 'yyyy-mm-dd
               hh24:mi:ss') AND "B"."DT">=TO_DATE('2008-10-01 00:00:00', 'yyyy-mm-dd
               hh24:mi:ss'))
    
    16 rows selected.
    
    SQL> explain plan for select count(*) from b where id between 200810 and 200903 ;
    
    Explained.
    
    SQL> select * from table(dbms_xplan.display) ;
    
    PLAN_TABLE_OUTPUT
    -----------------------------------------------------------------------------------------------------------------------------------
    Plan hash value: 749587668
    
    ---------------------------------------------------------------------------
    | Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time       |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |       |     1 |     5 |  3690   (1)| 00:00:45 |
    |   1 |  SORT AGGREGATE    |       |     1 |     5 |            |       |
    |*  2 |   TABLE ACCESS FULL| B       | 32505 |   158K|  3690   (1)| 00:00:45 |
    ---------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - filter("ID"<=200903 AND "ID">=200810)
    
    14 rows selected.
  • 4. Re: Datatype best practice and plan cardinality
    user503699 Expert
    Currently Being Moderated
    sb92075 wrote:
    1) What should be the most appropriate datatype used to store YYYYMM data?
    DATE datatype
    In the future you will be glad you did so.
    I believe that can not be my ONLY argument. :)
    I am of the opinion that any theory should be proved, even "best practices", when it comes to Oracle.
    :)
  • 5. Re: Datatype best practice and plan cardinality
    sb92075 Guru
    Currently Being Moderated
    using DATE datatype 100% guarentees only valid date exist & precludes having data similar to 201013.
    using DATE supports native arithmetic operations
  • 6. Re: Datatype best practice and plan cardinality
    CharlesHooper Expert
    Currently Being Moderated
    I agree with sb92075's most recent comments - by putting the date type data into a DATE column, you have essentially added a constraint to the database to prevent invalid dates from being added. Additionally, date math, such as finding the number of days between 200802 and 200803 (compared to 200702 and 200703) is very simple - the answer is not 1 in both cases, but rather 29 and 28, respectively.

    Keep in mind that EXPLAIN PLAN can, and will, show an incorrect plan from time to time.

    Here is a test, using your test tables, from Oracle 10.2.0.2 with optimizer_features_enable set to 10.2.0.1:
    ALTER SESSION SET TRACEFILE_IDENTIFIER = 'DateTest';
    ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
     
    select count(*) from b where id between 200810 and 200903 ;
     
    select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;
     
    ALTER SESSION SET EVENTS '10053 trace name context off';
     
    set autotrace traceonly explain
     
    select count(*) from b where id between 200810 and 200903 ;
     
    select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;
     
    set autotrace off
     
    select count(*) from b where id between 200810 and 200903 ;
    The output follows:
    SQL> select count(*) from b where id between 200810 and 200903 ;
     
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 749587668
     
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |     1 |     5 |   919  (11)| 00:00:05 |
    |   1 |  SORT AGGREGATE    |      |     1 |     5 |            |          |
    |*  2 |   TABLE ACCESS FULL| B    |   108K|   527K|   919  (11)| 00:00:05 |
    ---------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       2 - filter("ID"<=200903 AND "ID">=200810)
     
    SQL> select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;
     
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 749587668
     
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |     1 |     8 |   926  (12)| 00:00:05 |
    |   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
    |*  2 |   TABLE ACCESS FULL| B    | 57166 |   446K|   926  (12)| 00:00:05 |
    ---------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       2 - filter("DT"<=TO_DATE('2009-03-01 00:00:00', 'yyyy-mm-dd
                  hh24:mi:ss') AND "DT">=TO_DATE('2008-10-01 00:00:00', 'yyyy-mm-dd
                  hh24:mi:ss'))
     
    SQL> set autotrace off
     
    SQL> select count(*) from b where id between 200810 and 200903 ;
     
      COUNT(*)
    ----------
         49152
    From the 10053 trace file:
    ******************************************
    Current SQL statement for this session:
    select count(*) from b where id between 200810 and 200903 
    *******************************************
    ...
      PARAMETERS WITH ALTERED VALUES
      ******************************
      optimizer_features_enable           = 10.2.0.1
      *********************************
    ...
    BASE STATISTICAL INFORMATION
    ***********************
    Table Stats::
      Table:  B  Alias:  B
        #Rows: 2367488  #Blks:  16725  AvgRowLen:  13.00
    ***************************************
    SINGLE TABLE ACCESS PATH
      Column (#1): ID(NUMBER)
        AvgLen: 5.00 NDV: 289 Nulls: 0 Density: 0.0034602 Min: 200101 Max: 202501
      Table:  B  Alias: B     
        Card: Original: 2367488  Rounded: 108124  Computed: 108124.16  Non Adjusted: 108124.16
      Access Path: TableScan
        Cost:  918.67  Resp: 918.67  Degree: 0
          Cost_io: 819.00  Cost_cpu: 632570063
          Resp_io: 819.00  Resp_cpu: 632570063
      Best:: AccessPath: TableScan
             Cost: 918.67  Degree: 1  Resp: 918.67  Card: 108124.16  Bytes: 0
    ***************************************
    ...
    ...
    ...
    ******************************************
    Current SQL statement for this session:
    select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') 
    *******************************************
    ...
      *************************************
      PARAMETERS WITH ALTERED VALUES
      ******************************
      optimizer_features_enable           = 10.2.0.1
      *********************************
    ...
    BASE STATISTICAL INFORMATION
    ***********************
    Table Stats::
      Table:  B  Alias:  B
        #Rows: 2367488  #Blks:  16725  AvgRowLen:  13.00
    ***************************************
    SINGLE TABLE ACCESS PATH
      Column (#2): DT(DATE)
        AvgLen: 8.00 NDV: 289 Nulls: 0 Density: 0.0034602 Min: 2451911 Max: 2460677
      Table:  B  Alias: B     
        Card: Original: 2367488  Rounded: 57166  Computed: 57165.51  Non Adjusted: 57165.51
      Access Path: TableScan
        Cost:  926.24  Resp: 926.24  Degree: 0
          Cost_io: 819.00  Cost_cpu: 680617902
          Resp_io: 819.00  Resp_cpu: 680617902
      Best:: AccessPath: TableScan
             Cost: 926.24  Degree: 1  Resp: 926.24  Card: 57165.51  Bytes: 0
    From the above, it seems that Oracle 10.2.0.2 produces roughly the same results as 11.1.0.7. Note also that there are no histograms created, based on the 10053 trace file.

    Now, a second test:
    exec dbms_stats.gather_table_stats(user, 'B', estimate_percent=>NULL,method_opt=>'FOR ALL COLUMNS SIZE 254',no_invalidate=>false) ;
     
    ALTER SESSION SET TRACEFILE_IDENTIFIER = 'DateTest2';
    ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
     
    select count(*) from b where id between 200810 and 200903 ;
     
    select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;
     
    ALTER SESSION SET EVENTS '10053 trace name context off';
     
    set autotrace traceonly explain
     
    select count(*) from b where id between 200810 and 200903 ;
     
    select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') ;
     
    set autotrace off
    The output of the above follows:
    SQL> select count(*) from b where id between 200810 and 200903 ;
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 749587668
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |     1 |     5 |   919  (11)| 00:00:05 |
    |   1 |  SORT AGGREGATE    |      |     1 |     5 |            |          |
    |*  2 |   TABLE ACCESS FULL| B    | 46604 |   227K|   919  (11)| 00:00:05 |
    ---------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       2 - filter("ID"<=200903 AND "ID">=200810)
    
    SQL>
    SQL> select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_da
    te(200903, 'YYYYMM') ;
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 749587668
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |     1 |     8 |   926  (12)| 00:00:05 |
    |   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
    |*  2 |   TABLE ACCESS FULL| B    | 46604 |   364K|   926  (12)| 00:00:05 |
    ---------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       2 - filter("DT"<=TO_DATE('2009-03-01 00:00:00', 'yyyy-mm-dd
                  hh24:mi:ss') AND "DT">=TO_DATE('2008-10-01 00:00:00', 'yyyy-mm-dd
                  hh24:mi:ss'))
    Note that in both cases the optimizer estimates that 46,604 rows will be returned. From the 10053 trace file:
    ******************************************
    Current SQL statement for this session:
    select count(*) from b where id between 200810 and 200903 
    *******************************************
    ...
    BASE STATISTICAL INFORMATION
    ***********************
    Table Stats::
      Table:  B  Alias:  B
        #Rows: 2367488  #Blks:  16725  AvgRowLen:  13.00
    ***************************************
    SINGLE TABLE ACCESS PATH
      Column (#1): ID(NUMBER)
        AvgLen: 5.00 NDV: 289 Nulls: 0 Density: 0.0034602 Min: 200101 Max: 202501
        Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 255
      Table:  B  Alias: B     
        Card: Original: 2367488  Rounded: 46604  Computed: 46604.09  Non Adjusted: 46604.09
      Access Path: TableScan
        Cost:  918.76  Resp: 918.76  Degree: 0
          Cost_io: 819.00  Cost_cpu: 633149246
          Resp_io: 819.00  Resp_cpu: 633149246
      Best:: AccessPath: TableScan
             Cost: 918.76  Degree: 1  Resp: 918.76  Card: 46604.09  Bytes: 0
    ...
    ...
    ...
    ******************************************
    Current SQL statement for this session:
    select count(*) from b where dt between to_date(200810, 'YYYYMM') and to_date(200903, 'YYYYMM') 
    *******************************************
    ...
    BASE STATISTICAL INFORMATION
    ***********************
    Table Stats::
      Table:  B  Alias:  B
        #Rows: 2367488  #Blks:  16725  AvgRowLen:  13.00
    ***************************************
    SINGLE TABLE ACCESS PATH
      Column (#2): DT(DATE)
        AvgLen: 8.00 NDV: 289 Nulls: 0 Density: 0.0034602 Min: 2451911 Max: 2460677
        Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 255
      Table:  B  Alias: B     
        Card: Original: 2367488  Rounded: 46604  Computed: 46604.09  Non Adjusted: 46604.09
      Access Path: TableScan
        Cost:  926.22  Resp: 926.22  Degree: 0
          Cost_io: 819.00  Cost_cpu: 680499006
          Resp_io: 819.00  Resp_cpu: 680499006
      Best:: AccessPath: TableScan
             Cost: 926.22  Degree: 1  Resp: 926.22  Card: 46604.09  Bytes: 0
    Both columns now have a height balanced histogram with 254 buckets. The accuracy of the estimates may be very different as the number of distinct values increases above 289.

    Charles Hooper
    Co-author of "Expert Oracle Practices: Oracle Database Administration from the Oak Table"
    http://hoopercharles.wordpress.com/
    IT Manager/Oracle DBA
    K&M Machine-Fabricating, Inc.
  • 7. Re: Datatype best practice and plan cardinality
    user503699 Expert
    Currently Being Moderated
    Charles / sb92075,

    I have no doubts in my mind about the benifits of using the "most appropriate" data types but...
    There is always a BUT. :)
    My experience is....when the debate includes "performance" (or other similar "magical" terms like Scalability, response time, storage efficiency etc.), I wanted to be sure that what I propose is not only logically correct but also does not compromise on "performance".
    The (sad) fact is I would have been able to taken the same stand as you guys (and I still do personally) if only I had been an "IT Manager" or a "DBA" or a "Solution Architect" or a "Data Architect". However, it may take a lot more than just stating "best practices" to convince your audience if your designation is relatively insignificant like "Analyst programmer". :)
    In this particular example, the cardinality variance is not preventing optimizer to choose "right" plan to execute the query. Hence, in theory, (possibly) no performance argument comes into picture. But then, this was just a test case. All I want to do is when I put forward my argument (of using DATE datatype instead of NUMBER), I do not want to end up in a situation where I focus on something like the constraint advantage (suggested by sb92075) and somebody (with above mentioned higher designations) countering it with "performance" card (How often do people say "We can always manage constraining the data going into database from outside the database" ? :) )

    I want to understand if this behaviour is due to some valid reasons (like presense or absense of histograms, usage of table data etc.) or due to "optimizer limitations" (a glorified term for bug :) ) in the particular version. The issue does not appear to be deterministic in 10.2.0.1 (but fairly reproducible and deterministic on 10.2.0.4 and 11.1.0.7 as Charles showed).

    Thanks again for your responses.
  • 8. Re: Datatype best practice and plan cardinality
    744334 Newbie
    Currently Being Moderated
    I think you can use INTERVAL YEAR TO MONTH data type and function based index can be used to convert those types to numeric during the SQL query. Improving performance with NUMERIC is true in most cases but does not accountable for date processing

    Edited by: user8939723 on Jan 5, 2010 3:35 AM

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points