This site is currently read-only as we are migrating to Oracle Forums for an improved community experience. You will not be able to initiate activity until January 30th, when you will be able to use this site as normal.

    Forum Stats

  • 3,889,831 Users
  • 2,269,775 Discussions
  • 7,916,823 Comments

Discussions

Partition by combination of several (correlated) date columns.

user3897193
user3897193 Member Posts: 16 Blue Ribbon
edited Dec 10, 2017 6:50AM in Database Ideas - Ideas

Many systems have several date columns which could be used as partitioning base.
E.g. in ordering system there is ORDER_DATE, LOADING_DATE, DESIRED_DELIVERY_DATE, DELIVERY_DATE
or in banking system GENERAL_LEDGER_DATE, TRANSACTION_DATE, INTEREST_DATE.
Typically these dates  are correlated and they do not differ a lot. However there are exceptionally cases and
one cannot make assumptions that they do not differ more than a week etc.

Also there are quite a lot applications where rows have BEGIN_DATE, END_DATE combination.
These also reguire some feasible partitioning method.

Typically queries are referencing to recent data. E.g.
where ORDER_DATE >= trunc(sysdate, 'MM').


Current (tested on Oracle 12.1.0.2) Oracle version does support some interesting partitioning methods.
E.g.
DROP TABLE LH_D;
CREATE TABLE LH_D
(
  KEY_COLUMN             NUMBER(9)              NOT NULL,
  BEGIN_DATE             DATE                   NOT NULL,
  END_DATE               DATE,
  INSERTED               date           NOT NULL,
  UPDATED                date,
  VALUE_COLUMN           CHAR(100 CHAR),
  MAX_PARTITIONING_DATE  date INVISIBLE GENERATED ALWAYS AS (GREATEST("INSERTED",CAST("BEGIN_DATE" AS TIMESTAMP),CAST(COALESCE("END_DATE","BEGIN_DATE") AS TIMESTAMP),COALESCE("UPDATED","INSERTED"))) NOT NULL
)
NOCOMPRESS
TABLESPACE USERS
PCTUSED    0
PCTFREE    10
INITRANS   1
MAXTRANS   255
STORAGE    (
            BUFFER_POOL      DEFAULT
           )
PARTITION BY RANGE (MAX_PARTITIONING_DATE)
INTERVAL( NUMTOYMINTERVAL(1,'MONTH'))

  PARTITION DEFAULT_PARTITION VALUES LESS THAN (TIMESTAMP' 2010-01-01 00:00:00')
);

declare
TYPE ARRAY IS TABLE OF lh_d%ROWTYPE;
l_data ARRAY;
w_array_size pls_integer := 20000;
cursor c is
select 
     level,
     sysdate + level,
     sysdate + level + 50,
     sysdate,
     null,
     'something'
from
                             dual connect by level < 100;
begin
    open c;
    loop
        fetch c bulk collect into l_data limit w_array_size;
        forall i in 1..l_data.count
            insert into lh_d values l_data(i);
        exit when c%notfound;
    end loop;
    close c;
end;
/


                            
ALTER TABLE LH_D ADD (
  CONSTRAINT BEGIN_MAX_D
  CHECK (begin_date <= max_partitioning_date)
  ENABLE VALIDATE,
  CONSTRAINT END_MAX_D
  CHECK (end_date <= max_partitioning_date)
  ENABLE VALIDATE,
-- CONSTRAINT inserted_updated_MAX_D
-- CHECK (inserted <= updated) -- beware of summertime
-- ENABLE VALIDATE,
  CONSTRAINT INSERTED_MAX_D
  CHECK (inserted <= max_partitioning_date)
  ENABLE VALIDATE,
  CONSTRAINT UPDATED_MAX_D
  CHECK (updated <= max_partitioning_date)
  ENABLE VALIDATE);

Select * from LH_D where inserted >= trunc(sysdate) + 50;
select * from table(dbms_xplan.display_cursor);
SQL_ID  0jdnuvdtwuyr7, child number 0
-------------------------------------
Select * from LH_D where inserted >= trunc(sysdate) + 50

Plan hash value: 1161958654

-------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |       |       |   211 (100)|          |       |       |
|   1 |  PARTITION RANGE ITERATOR|      |     1 |   460 |   211   (1)| 00:00:01 |   KEY |1048575|
|*  2 |   TABLE ACCESS FULL      | LH_D |     1 |   460 |   211   (1)| 00:00:01 |   KEY |1048575|
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("INSERTED">=TRUNC([email protected]!)+50 AND
              GREATEST("INSERTED",CAST(INTERNAL_FUNCTION("BEGIN_DATE") AS
              TIMESTAMP),CAST(COALESCE("END_DATE","BEGIN_DATE") AS
              TIMESTAMP),COALESCE("UPDATED","INSERTED"))>=TRUNC([email protected]!)+50))

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

select * from LH_D where end_date >= trunc(sysdate) + 50;
select * from table(dbms_xplan.display_cursor);
SQL_ID  bh60cr1tdm2yp, child number 0
-------------------------------------
select * from LH_D where end_date >= trunc(sysdate) + 50

Plan hash value: 1161958654

-------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |       |       |   211 (100)|          |       |       |
|   1 |  PARTITION RANGE ITERATOR|      |     1 |   460 |   211   (1)| 00:00:01 |   KEY |1048575|
|*  2 |   TABLE ACCESS FULL      | LH_D |     1 |   460 |   211   (1)| 00:00:01 |   KEY |1048575|
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("END_DATE">=TRUNC([email protected]!)+50 AND
              GREATEST("INSERTED",CAST(INTERNAL_FUNCTION("BEGIN_DATE") AS
              TIMESTAMP),CAST(COALESCE("END_DATE","BEGIN_DATE") AS
              TIMESTAMP),COALESCE("UPDATED","INSERTED"))>=TRUNC([email protected]!)+50))

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold


So Oracle does partition pruning with two separate columns without any changes to sql-statements.



However there is bug and  missing functionalities which do prevent practical utilization of this technique.

Enhancements request is about correcting these bugs and making enhancements to allow partitioning by several (correlated) date and timestamp columns.

Here are listed some currently known problems.

'array insert does not work with constraints and interval partitioning'

drop table lh_d;
CREATE TABLE LH_D
(
  KEY_COLUMN             NUMBER(9)              NOT NULL,
  BEGIN_DATE             DATE                   NOT NULL,
  END_DATE               DATE,
  INSERTED               date           NOT NULL,
  UPDATED                date,
  VALUE_COLUMN           CHAR(100 CHAR),
  MAX_PARTITIONING_DATE  date INVISIBLE GENERATED ALWAYS AS (GREATEST("INSERTED",CAST("BEGIN_DATE" AS TIMESTAMP),CAST(COALESCE("END_DATE","BEGIN_DATE") AS TIMESTAMP),COALESCE("UPDATED","INSERTED"))) NOT NULL
)
NOCOMPRESS
TABLESPACE USERS
PCTUSED    0
PCTFREE    10
INITRANS   1
MAXTRANS   255
STORAGE    (
            BUFFER_POOL      DEFAULT
           )
PARTITION BY RANGE (MAX_PARTITIONING_DATE)
INTERVAL( NUMTOYMINTERVAL(1,'MONTH'))

  PARTITION DEFAULT_PARTITION VALUES LESS THAN (TIMESTAMP' 2010-01-01 00:00:00')
);

                            
ALTER TABLE LH_D ADD (
  CONSTRAINT BEGIN_MAX_D
  CHECK (begin_date <= max_partitioning_date)
  ENABLE VALIDATE,
  CONSTRAINT END_MAX_D
  CHECK (end_date <= max_partitioning_date)
  ENABLE VALIDATE,
-- CONSTRAINT inserted_updated_MAX_D
-- CHECK (inserted <= updated) -- beware of summertime...
-- ENABLE VALIDATE,
  CONSTRAINT INSERTED_MAX_D
  CHECK (inserted <= max_partitioning_date)
  ENABLE VALIDATE,
  CONSTRAINT UPDATED_MAX_D
  CHECK (updated <= max_partitioning_date)
  ENABLE VALIDATE);


declare
TYPE ARRAY IS TABLE OF lh_d%ROWTYPE;
l_data ARRAY;
w_array_size pls_integer := 20000;
cursor c is
select 
     level,
     sysdate + level,
     sysdate + level + 50,
     sysdate,
     null,
     'something'
from
                             dual connect by level < 100;
begin
    open c;
    loop
        fetch c bulk collect into l_data limit w_array_size;
        forall i in 1..l_data.count
            insert into lh_d values l_data(i);
        exit when c%notfound;
    end loop;
    close c;
end;
/

ORA-02290: check constraint (END_MAX_D) violated
ORA-06512: at line 19

This partitioning technique would be usefull in datawarehouse systems where interval partitioning and array inserts would be used.

One could however choose not to use constraints and add criterias from constraints into where clauses but then
changing existing systems whould require sql-statement changes.


'Partition pruning is not occuring where parameters are used.'
declare
cursor c1(p_date date) is
select /*+ as_parameter*/ * from LH_D where inserted >= p_date;

w_date date := trunc(sysdate) + 50;
w_row c1%rowtype;
begin
     open c1(w_date);
     loop
         fetch c1 into w_row;
         exit when c1%notfound;
     end loop;
end;
/

select sql_id, sql_text from v$sql where sql_text like '%/*+ as_parameter*/%';
select * from table(dbms_xplan.display_cursor('152vpkutv2fss'));
SQL_ID  152vpkutv2fss, child number 0
-------------------------------------
SELECT /*+ as_parameter*/ * FROM LH_D WHERE INSERTED >= :B1

Plan hash value: 2756421308

--------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |    43 (100)|          |       |       |
|   1 |  PARTITION RANGE ALL|      |     1 |   451 |    43   (0)| 00:00:01 |     1 |1048575|
|*  2 |   TABLE ACCESS FULL | LH_D |     1 |   451 |    43   (0)| 00:00:01 |     1 |1048575|
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("INSERTED">=:B1)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

'partition pruning is not working when mixed date and timestamp columns are involved'
drop table lh_ts;
CREATE TABLE LH_TS
(
  KEY_COLUMN             NUMBER(9)              NOT NULL,
  BEGIN_DATE             DATE                   NOT NULL,
  END_DATE               DATE,
  INSERTED               TIMESTAMP(6)           NOT NULL,
  UPDATED                TIMESTAMP(6),
  VALUE_COLUMN           CHAR(100 CHAR),
  MAX_PARTITIONING_DATE  TIMESTAMP(6) INVISIBLE GENERATED ALWAYS AS (GREATEST("INSERTED",CAST("BEGIN_DATE" AS TIMESTAMP),CAST(COALESCE("END_DATE","BEGIN_DATE") AS TIMESTAMP),COALESCE("UPDATED","INSERTED"))) NOT NULL
)
NOCOMPRESS
TABLESPACE USERS
PCTUSED    0
PCTFREE    10
INITRANS   1
MAXTRANS   255
STORAGE    (
            BUFFER_POOL      DEFAULT
           )
PARTITION BY RANGE (MAX_PARTITIONING_DATE)
INTERVAL( NUMTOYMINTERVAL(1,'MONTH'))

  PARTITION DEFAULT_PARTITION VALUES LESS THAN (TIMESTAMP' 2010-01-01 00:00:00')
)
enable row movement;


declare
TYPE ARRAY IS TABLE OF lh_ts%ROWTYPE;
l_data ARRAY;
w_array_size pls_integer := 20000;
cursor c is
select 
     level,
     sysdate + level,
     sysdate + level + 50,
     sysdate,
     null,
     'something'
from
                             dual connect by level < 100;
begin
    open c;
    loop
        fetch c bulk collect into l_data limit w_array_size;
        forall i in 1..l_data.count
            insert into lh_ts values l_data(i);
        exit when c%notfound;
    end loop;
    close c;
end;
/


                            
ALTER TABLE LH_ts ADD (
  CONSTRAINT BEGIN_MAX_ts
  CHECK (begin_date <= max_partitioning_date)
  ENABLE VALIDATE,
  CONSTRAINT END_MAX_ts
  CHECK (end_date <= max_partitioning_date)
  ENABLE VALIDATE,
-- CONSTRAINT inserted_updated_MAX_ts
-- CHECK (inserted <= updated) -- beware of summertime...
-- ENABLE VALIDATE,
  CONSTRAINT INSERTED_MAX_ts
  CHECK (inserted <= max_partitioning_date)
  ENABLE VALIDATE,
  CONSTRAINT UPDATED_MAX_ts
  CHECK (updated <= max_partitioning_date)
  ENABLE VALIDATE);
 
select * from LH_ts where end_date >= trunc(sysdate) + 50;
select * from table(dbms_xplan.display_cursor);
-------------------------------------
SQL_ID  gccbddhubqapn, child number 0
-------------------------------------
select * from LH_ts where end_date >= trunc(sysdate) + 50

Plan hash value: 4152623163

---------------------------------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |       |       |   211 (100)|          |       |       |
|   1 |  PARTITION RANGE ALL|       |     1 |   459 |   211   (1)| 00:00:01 |     1 |1048575|
|*  2 |   TABLE ACCESS FULL | LH_TS |     1 |   459 |   211   (1)| 00:00:01 |     1 |1048575|
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("END_DATE">=TRUNC([email protected]!)+50)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

One could arque that use of this method is preventing effective way to fetch data from some specific period from history.
This is so, if period can be defined with partitioning column. However
- vast majority of queries do concentrate on recent data
- this enables use of several different data columns
- if oracle in sometimes would enable subpartitioning by interval ranges, then creating subpartitions by MIN_PARTITIONING_DATE, would effectively help to this.

user3897193
4 votes

Active · Last Updated

Comments

  • Mike Kutz
    Mike Kutz Member Posts: 6,317 Gold Crown

    I believe Oracle is [not] doing something that you assume it is[n't]

    Query 1 vs Query 2

    Because you are using a static value, Oracle can assume:

    greatest( "INSERTED",.....,... ) >= "INSERTED" >= (static value)

    Which becomes

    greatest( "INSERTED", ...,... ) >= (static value) AND "INSERTED" >= (static value)

    When you use a BIND variable, ORACLE can not make that assumption because the plan has to handle cases where the BIND variable IS NULL.

    Your assumption that "Query 1 and Query 2 should produce identical PLANS" is, therefore, incorrect.

    Query 1 vs Query 3

    From the "5 steps of the scientific process", you have adjusted too many variables and you have ran too few tests to make the conclusion you have stated.

    You need to

    • Rerun Query 1 with the same WHERE clause as Query 3
    • Rerun Query 3 with the same WHERE clause as Query 1

    Additionally, you should rerun Query 1 and Query 3 with all of the other DATE/TIMESTAMP columns.

    Other

    Your GREATEST() contains CAST(COALESCE("END_DATE","BEGIN_DATE") as timestamp)

    I suspect that the ability to have rows where END_DATE IS NULL is preventing the usage of partition pruning.

    This could be true for all columns that are NULLABLE.

    I suggest you post this in the   section so that you can get help finding out why Oracle is doing what it is doing.

  • William Robertson
    William Robertson London, UKMember Posts: 9,579 Bronze Crown

    I don't see any suggestion for new functionality here. We can already partition by multiple columns, correlated or otherwise.

  • user3897193
    user3897193 Member Posts: 16 Blue Ribbon

    I don't see any suggestion for new functionality here. We can already partition by multiple columns, correlated or otherwise.

    Hi

    Oracle is currently implementing this functionality in some limited cases. However Oracle is so complicated system, that all features do not work well with each other.

    Effectively using this idea is almost impossible because these limitations. These limitations have been noticed also in other contexts

    (e.g.

    Bug 12646509 : VIRTUAL COLUMN DEFINITION DOES NOT GENERATE TRANSITIVE PREDICATE

    ENH 3761303 - PARTITION PRUNING SHOULD OCCUR WHEN TIMESTAMP IS USED FOR A PARTITION KEY

    ENH 22487986 - SUPPORT PARTITION PRUNING CAPABILITIES ON PREDICATES ON F(PARTITION_KEY_COL)

    ).

    Currently one cannot use interval partitioning by several date columns.

    This technique could enable it.