Forum Stats

  • 3,733,808 Users
  • 2,246,823 Discussions
  • 7,856,882 Comments

Discussions

Partition Exchange strange behaviour - including indexes vs excluding indexes

jhall_uk_redux
jhall_uk_redux Member Posts: 133 Blue Ribbon
edited February 2016 in General Database Discussions

Good morning,

We are using Oracle Database 11.2.0.3 running on Oracle Linux in production and 11.2.0.4 in development

We have a data warehouse that has been running in production for approximately a year. We are using partition exchange extensively and it has been on the whole very reliable.

Then a few weeks ago we have had an ETL failure which stopped the ETL logic part way through the preparation activities for partition exchange, i.e., the data had been prepared and staged into our exchange table and the primary key recreated in 'disable validate' mode and all the bitmap join indexes rebuilt after previously being marked unusable prior to data staging. But the outage occurred just before the next step to do partition exchange.

The team restarted the job as this would normally just reload the data into to the exchange table but it failed with:

SQL Error: ORA-14098: index mismatch for tables in ALTER TABLE EXCHANGE PARTITION

14098. 00000 -  "index mismatch for tables in ALTER TABLE EXCHANGE PARTITION"

*Cause:    The two tables specified in the EXCHANGE have indexes which are

           not equivalent

*Action:   Ensure that the indexes for the two tables have indexes which

           follow this rule

           For every non partitioned index for the non partitioned table,

           there has to be an identical LOCAL index on the partitioned

           table and vice versa. By identical, the column position, type

           and size have to be the same.

Our partition exchange syntax is as follows:

alter table [schema].[exchange table]

exchange partition [partition] with table [schema].[partitioned table]

including indexes

without validation

update global indexes;

A pretty standard error when indexes are not in alignment between the two tables and usually a 2 min fix. However, our indexes are valid and the constraints are as they should be - everything looks ok.

Manually, we can get an alternative version of the exchange partition syntax to work fine - this very strange - either the structure of the exchange and partitioned table are in alignment or they are not!

alter table [schema].[exchange table]

exchange partition [partition] with table [schema].[partitioned table]

excluding indexes

without validation

update global indexes;

which we can then finish off the equivalent of the ETL job by rebuilding the index partitions and the global primary key index in the target table.

Why would the SQL using "including indexes" fail, but the SQL using "excluding indexes" succeed?

Following on from using "excluding indexes" we thought something may have not been synchronised properly in the dictionary and then repeated the partition exchange process again from the beginning - stage, validate, rebuild indexes/PK constraint etc and the original syntax consistently fails to work. No matter how many times I repeat the process "include indexes" fails and "exclude indexes" succeeds. No other table in our prod environment is affected - they all use the same approach to partition exchange.

We even have a two schemas with identical copies of the table structures - as we always load into one whilst reporting against another, using synonyms to redirect the report queries. So, identical code runs against both schemas. The code still works using "including indexes" against the other schema copy.

I have not tried dropping all the local indexes and rebuilding them completely as I wanted to understand what has gone wrong first in case it is a known issue that could affect any of our partitioned tables at any time we have another ETL failure or outage.

I have not considered modifying the code to use "excluding indexes" followed by a index rebuild as I want some evidence first about the cause. if I change the syntax for one table, then logic means I should modify the ETL for every partitioned table to follow the same approach. Anyway, the syntax we have been using has worked for 14 months without issue and now only affects one table following an failure.

I have reproduced the issue by failing the ETL at the same point in our DEV environment (which has been patched to 11.2.0.4), but if I create a standalone test case schema using the DDL from the dictionary to recreate the affected tables then I cannot reproduce the issues - "including indexes" works fine!

This is very strange - does anyone have any ideas?

cheers,

John

«1

Answers

  • Martin Preiss
    Martin Preiss Member Posts: 2,381 Gold Trophy
    edited January 2016

    though the error message is not exactly the same I could imagine that there is a problem with different hakan factors: https://oracleoddsandends.wordpress.com/2012/08/26/bitmap-indexes-and-the-hakan-factor/.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,607 Gold Crown
    edited January 2016

    I think I've seen this once before when the build table was created (for some ad hoc accidental reason) with a column of type INTEGER when the main table had type NUMBER; using describe these showed up as NUMBER and NUMBER(38) respectively but it took some time for anyone to notice that these were actually different.

    Regards

    Jonathan Lewis

    UPDATE:  I've just run up a little test - the error was 14097:

    SQL> desc t1

    Name                                                                     Null?    Type

    ------------------------------------------------------------------------ -------- -------------------------------------------------

    ID                                                                       NOT NULL NUMBER(38)

    GRP                                                                               NUMBER

    SMALL_VC                                                                          VARCHAR2(40)

    PADDING                                                                           VARCHAR2(100)

    SQL> desc pt_range

    Name                                                                     Null?    Type

    ------------------------------------------------------------------------ -------- -------------------------------------------------

    ID                                                                       NOT NULL NUMBER(38)

    GRP                                                                               NUMBER

    SMALL_VC                                                                          VARCHAR2(40)

    PADDING                                                                           VARCHAR2(100)

    SQL> alter table pt_range exchange partition p200 with table t1

      2  including indexes with validation

      3  ;

    alter table pt_range exchange partition p200 with table t1

    *

    ERROR at line 1:

    ORA-14097: column type or size mismatch in ALTER TABLE EXCHANGE PARTITION

  • jhall_uk_redux
    jhall_uk_redux Member Posts: 133 Blue Ribbon
    edited January 2016

    Hi Martin,

    Thanks for the reply - yes, I thought of the Hakan factors as I've come across this before with CTAS operations being used as a shortcut to create the exchange table, thus making the Hakan values different - in this case they have identical values.  A very useful suggestion though as many people are not aware of it and it will cause the same error as described above.

    cheers,

    John

  • jhall_uk_redux
    jhall_uk_redux Member Posts: 133 Blue Ribbon
    edited January 2016

    Thanks Jonathan too to take the time to reply. In this case, the tables seem to be identical. I'm loathe to do it, but I may drop and recreate the indexes and constraints to see if that resolve the problem. Something corrupted somewhere perhaps - just weird that it has happened in both DEV (11.2.0.4) and PROD (11.2.0.3) and all was ok for 14 months with no structural changes to the table taking place.

    cheers,

    John

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,607 Gold Crown
    edited January 2016

    Check the constraint and index definitions - are you using deferrable constraints with non-unique indexes; I may have emulated your problem:

    Regards

    Jonathan Lewis

    Update: No, my error, I got an ORA-14098 but in the course of my tests I had lost a "local" so I was trying to exchange a base-table index with a global index

  • jhall_uk_redux
    jhall_uk_redux Member Posts: 133 Blue Ribbon
    edited January 2016

    Thanks again, Jonathan.

    My primary key constraints are as follows:

    [schema] PK_EXCH_[TABLE]     P EXCH_[TABLE]          DISABLED NOT DEFERRABLE IMMEDIATE VALIDATED USER NAME

    [schema] PK_[TABLE]                 P FCT_[TABLE]             ENABLED  NOT DEFERRABLE IMMEDIATE VALIDATED USER NAME

    No unique index on exchange table and global index on partitioned table as partition key different to PK column.

    No deferred indexes.

    All indexes are VALID

    PK index on partitioned table is unique

    All foreign  key columns in both exchange and partitioned table have the same number of referential integrity constraints - they are all have the same properties: ENABLED NOT DEFERRABLE IMMEDIATE NOT VALIDATED

    Both exchange table and partitioned table contain 33 bitmap join indexes to parent dimension tables - 9 dimension tables in total

    - on partitioned table they are obviously LOCAL, and ALL_IND_PARTITIONS show all status USABLE.

    - on exchange table ALL_INDEXES show the status as VALID

    cheers,

    John

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,607 Gold Crown
    edited January 2016

    When you create a bitmap join indexes it is supported by an invisible global temporary table owned by SYS.

    I don't know the internal details of how the tables, indexes, and these GTTs are connected, but possibly some data dictionary information about WHICH gtt is supposed to go with which index has become corrupted. The only thing I can think of checking that you might have missed so far is user_join_ind_columns.

    If you do end up dropping and recreating all the bitmap join indexes, you could try doing the exchange after each matching pair of indexes to see if the problem is related to just one of them.  You could also start by attemptint the exchange with tracing enabled at level 4 so that you can see where in the process the error occurs as this might give you a better clue of why the process is failing.

    One last thought - a few years ago I wrecked one of my test databases because I was cleaning out garbage from the SYS schema. At the time, these GTTs - which are named L${number} - didn't get dropped when the bitmap join index was dropped, so I had about 30 of them in the SYS schema which I dropped, without realising I hadn't dropped one of the bitmap join indexes. After that I couldn't do  anything with the containing schema, and ended up creating a new database. So if you get the faintest hint that the problem is with Oracle's attempt to exchange the bitmap join indexes you might want to raise an SR before you try any drops.

    Regards

    Jonathan Lewis

  • Unknown
    edited January 2016
    When you create a bitmap join indexes it is supported by an invisible global temporary table owned by SYS.
    I don't know the internal details of how the tables, indexes, and these GTTs are connected, but possibly some data dictionary information about WHICH gtt is supposed to go with which index has become corrupted.
    

    Would that GTT be partitioned if the bitmap join indexes were local on a partitioned table? Or if the 'joined' table was also partitioned?

    So if you get the faintest hint that the problem is with Oracle's attempt to exchange the bitmap join indexes you might want to raise an SR before you try any drops.
    

    My 'guess' is OPs issue IS that 'hint' - maybe there was some overlap between the rebuilding of the indexes on the main table and the exchange process causing different data_object_ids to be used for some of the partitions.

  • Unknown
    edited January 2016
    Then a few weeks ago we have had an ETL failure which stopped the ETL logic part way through the preparation activities for partition exchange, i.e., the data had been prepared and staged into our exchange table and the primary key recreated in 'disable validate' mode and all the bitmap join indexes rebuilt after previously being marked unusable prior to data staging.
    

    Ok - can you be more precise about the EXACT order each of those steps were done - I'm not clear on just which tables you are referring to above:

    1. What table's primary key was 'recreated' and why? The 'main' table? Or the work table used for the exchange?

    2. What table's bitmap join indexes were rebuilt? The 'main' table? Or the work table? Why were they marked 'unusable'? What 'data staging' are you talking about?

    In particular were ANY of the 'dimension' tables (i.e. the tables used in the bitmap joins) also involved in partition exchange?

    If so is it possible that any of those dimension table exchanges overlapped with exchanges of any of the fact tables that depended on those bitmap join indexes?

    Both exchange table and partitioned table contain 33 bitmap join indexes to parent dimension tables - 9 dimension tables in total
    

    That statement is what makes me think there could be some overlap between the processing of dimension tables and the work tables.

    1. a dimension table/partition was referenced by one work table and an exchange done

    2. the dimension table itself then underwent an operation - perhaps it was itself involved in an exchange operation

    3. that same dimension table/partition was then referenced (via a join index) by another work table being exchanged.

    The above may not describe the concern sufficiently but the premise is this:

    Each segment has a unique DATA_OBJECT_ID. That value changed for one of those dimension tables sometime during the entire process. That caused a later exchange (the one that failed) to reference the WRONG value for the dimension table segment - either because it didn't exist anymore at all or, more likely because there is more than one value for that id now for the dimension table.

    So in step #1 above one DATA_OBJECT_ID value was used but in step #3 what should have been a reference to that same value was really different causing Oracle to detect a mismatch.

    The short version? I think the problem may be due to an overlap due to the relative position/timing of the processes you are using.

    It is commonly known that PARENT/CHILD info needs to be processed in the correct order. The use of bitmap join indexes has some possible dependencies also. Especially if the target table of the join (dimension table) is partitioned and is itself being exchanged as part of updating.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,607 Gold Crown
    edited January 2016

    Just checked, there are two GTTs created with the bitmap join index - but they're used during data maintenance to hold intermediate results; as such they don't need to be partitioned in anyway.  My thought was simply that somewhere in the infrastructure there has to be something that associates L$ tables with real tables - but I really don't have any idea about how.

    Just one extra detail, though - I've just created a local bitmap join index on a partitioned table with dimension and got GTTs L$108 and L$109; then I create the corresponding bitmap join index on a simple heap table with the same dimension (i.e. getting ready for the exchange) - and L$109 disappeared, while L$110 and L$111 appeared.

    L$108 and L$110 have the same structure; L$111 has a different structure that matches the structure that L$109 had - so I'm guessing that there's a moment when L$111 has to take on the role of L$109, and maybe something broke at that critical point.

    Regards

    Jonathan Lewis

  • Unknown
    edited February 2016
    Just checked, there are two GTTs created with the bitmap join index - but they're used during data maintenance to hold intermediate results; as such they don't need to be partitioned in anyway.  My thought was simply that somewhere in the infrastructure there has to be something that associates L$ tables with real tables - but I really don't have any idea about how.
    
    

    Interesting - for years I was never able to figure out what these two tables were for and now this one 'oddball' thread may have given us the answer. Might make for an interesting blog.

    Check out the SYS.JIJOIN$ table

    select obj#,
    (select o.object_name from dba_objects o
         where o.object_id = j.obj#) obj_name,
      tab1obj#,
    (select o.object_name from dba_objects o
         where o.object_id = j.tab1obj#) tab1obj_name,
      tab2obj#,
    (select o.object_name from dba_objects o
         where o.object_id = j.tab2obj#) tab2obj_name
    from sys.jijoin$ j
    
    OBJ#,OBJ_NAME,TAB1OBJ#,TAB1OBJ_NAME,TAB2OBJ#,TAB2OBJ_NAME
    155099,MGR_NAME_BIT_JOIN_NDX,87814,EMP_TEMP,73194,DEPT
    155103,MGR_NAME_BIT_JOIN_NDX1,151402,EMP_SAVE,73194,DEPT
    
    

    That table has rows that reference the tables I used to replicate what you did. The EMP_TEMP table is partitioned, he EMP_SAVE is the non-partitioned work table and DEPT is the standard DEPT.

    There is a row for each of the bitmap join indexes.

    Then check the JIREFRESHSQL$ table - it has the code being used for the maintenance/query

    select iobj#, tobj#, sqltext
    from sys.jirefreshsql$
    
    IOBJ#,TOBJ#,SQLTEXT
    155099,87814,UPD_JOININDEX "SCOTT"."MGR_NAME_BIT_JOIN_NDX" AS SELECT T73194."DNAME", T87814.L$ROWID FROM "SCOTT"."DEPT" T73194, SYS.L$46 T87814 WHERE T87814."DEPTNO" = T73194."DEPTNO"
    155103,73194,UPD_JOININDEX "SCOTT"."MGR_NAME_BIT_JOIN_NDX1" AS SELECT T73194."DNAME", T151402.ROWID FROM SYS.L$47 T73194, "SCOTT"."EMP_SAVE" T151402 WHERE T151402."DEPTNO" = T73194."DEPTNO"
    155103,151402,UPD_JOININDEX "SCOTT"."MGR_NAME_BIT_JOIN_NDX1" AS SELECT T73194."DNAME", T151402.L$ROWID FROM "SCOTT"."DEPT" T73194, SYS.L$48 T151402 WHERE T151402."DEPTNO" = T73194."DEPTNO"
    155099,73194,UPD_JOININDEX "SCOTT"."MGR_NAME_BIT_JOIN_NDX" AS SELECT T73194."DNAME", T87814.ROWID FROM SYS.L$47 T73194, "SCOTT"."EMP_TEMP" T87814 WHERE T87814."DEPTNO" = T73194."DEPTNO"
    
    

    The 'T' references appear to be like bind variables for the object with the same number: T73194 => object_id = 73194

    And there are the L$ references (L$46, L$47, L$48) for the maintenance.

    Just one extra detail, though - I've just created a local bitmap join index on a partitioned table with dimension and got GTTs L$108 and L$109; then I create the corresponding bitmap join index on a simple heap table with the same dimension (i.e. getting ready for the exchange) - and L$109 disappeared, while L$110 and L$111 appeared.
    
    L$108 and L$110 have the same structure; L$111 has a different structure that matches the structure that L$109 had - so I'm guessing that there's a moment when L$111 has to take on the role of L$109, and maybe something broke at that critical point.
    
    

    The object that 'disappears' is the one with the referenced column name in it. My hypotheses is still that the OBJECT_ID associated with that 'disappeared' object is the one being used and Oracle can't find it anymore.

    That is if one of those queries above from JIREFRESHSQL$ queries SYS.L$48 but L48 'disappeared' that would cause a problem.

    Since those objects disappear when the needed/matching index is created it suggests to me that the EXACT order, or overlapping, of events might be causing the problem.

    Maybe OP's process is using parallel executions (as opposed to Oracle's PARALLEL) to rebuild two more tables/indexes at the same time.

  • jhall_uk_redux
    jhall_uk_redux Member Posts: 133 Blue Ribbon
    edited February 2016

    Interesting...

    The first query that looks at DBA_OBJECTS and JIJOIN$ shows the following characteristics:

    In both my PROD and DEV environment, on the problematic schema (SCHEMA A) there is a difference in the ordering of the bitmap join indexes of the exchange table versus the target partitioned table - this is noticed when sorted by OBJ_ID.

    Row NumberObj#Obj_NameObj_OwnerTab1_Obj#Tab1_OwnerTab1_Object_NameTab2_Obj#Tab2_Object_NameTab2_Owner
    7454507956FCT_TABLE_NAME_BJ1DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539238DIM_TABLE1DW_SCHEMA_A
    7464508043FCT_TABLE_NAME_BJ2DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539225DIM_TABLE2DW_SCHEMA_A
    7474508130FCT_TABLE_NAME_BJ3DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539229DIM_TABLE3DW_SCHEMA_A
    7484508217FCT_TABLE_NAME_BJ4DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539229DIM_TABLE3DW_SCHEMA_A
    7494508304FCT_TABLE_NAME_BJ5DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539229DIM_TABLE3DW_SCHEMA_A
    7504508391FCT_TABLE_NAME_BJ6DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539229DIM_TABLE3DW_SCHEMA_A
    7514508478FCT_TABLE_NAME_BJ7DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539229DIM_TABLE3DW_SCHEMA_A
    7524508565FCT_TABLE_NAME_BJ8DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539229DIM_TABLE3DW_SCHEMA_A
    7534508652FCT_TABLE_NAME_BJ9DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539232DIM_TABLE4DW_SCHEMA_A
    7544508739FCT_TABLE_NAME_BJ10DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539232DIM_TABLE4DW_SCHEMA_A
    7554508826FCT_TABLE_NAME_BJ11DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539232DIM_TABLE4DW_SCHEMA_A
    7564508913FCT_TABLE_NAME_BJ12DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    7574509000FCT_TABLE_NAME_BJ13DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    7584509087FCT_TABLE_NAME_BJ14DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    7594509174FCT_TABLE_NAME_BJ15DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    7604509261FCT_TABLE_NAME_BJ16DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    7614509348FCT_TABLE_NAME_BJ17DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    7624509435FCT_TABLE_NAME_BJ18DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    7634509522FCT_TABLE_NAME_BJ19DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    7644509609FCT_TABLE_NAME_BJ20DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    7654509696FCT_TABLE_NAME_BJ21DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    7664509783FCT_TABLE_NAME_BJ22DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    7674509870FCT_TABLE_NAME_BJ23DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A4471301DIM_TABLE6DW_SCHEMA_A
    7684509957FCT_TABLE_NAME_BJ24DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A4471301DIM_TABLE6DW_SCHEMA_A
    7694510044FCT_TABLE_NAME_BJ25DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A4471301DIM_TABLE6DW_SCHEMA_A
    7704510131FCT_TABLE_NAME_BJ26DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539230DIM_TABLE7DW_SCHEMA_A
    7714510218FCT_TABLE_NAME_BJ27DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539230DIM_TABLE7DW_SCHEMA_A
    7724510305FCT_TABLE_NAME_BJ28DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A2539230DIM_TABLE7DW_SCHEMA_A
    7734510392FCT_TABLE_NAME_BJ29DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A4471715DIM_TABLE8DW_SCHEMA_A
    7744510479FCT_TABLE_NAME_BJ30DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A4471716DIM_TABLE9DW_SCHEMA_A
    7754510566FCT_TABLE_NAME_BJ31DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A4471716DIM_TABLE9DW_SCHEMA_A
    7764510653FCT_TABLE_NAME_BJ32DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A4471716DIM_TABLE9DW_SCHEMA_A
    7774510740FCT_TABLE_NAME_BJ33DW_SCHEMA_A4507869FCT_TABLE_NAMEDW_SCHEMA_A4471716DIM_TABLE9DW_SCHEMA_A
    9067657636EXCH_FCT_TABLE_NAME_BJ1DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539238DIM_TABLE1DW_SCHEMA_A
    9077657639EXCH_FCT_TABLE_NAME_BJ10DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539232DIM_TABLE4DW_SCHEMA_A
    9087657642EXCH_FCT_TABLE_NAME_BJ11DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539232DIM_TABLE4DW_SCHEMA_A
    9097657645EXCH_FCT_TABLE_NAME_BJ12DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    9107657648EXCH_FCT_TABLE_NAME_BJ13DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    9117657651EXCH_FCT_TABLE_NAME_BJ14DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    9127657654EXCH_FCT_TABLE_NAME_BJ15DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    9137657657EXCH_FCT_TABLE_NAME_BJ16DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    9147657660EXCH_FCT_TABLE_NAME_BJ17DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    9157657663EXCH_FCT_TABLE_NAME_BJ18DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    9167657666EXCH_FCT_TABLE_NAME_BJ19DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    9177657669EXCH_FCT_TABLE_NAME_BJ2DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539225DIM_TABLE2DW_SCHEMA_A
    9187657672EXCH_FCT_TABLE_NAME_BJ20DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    9197657675EXCH_FCT_TABLE_NAME_BJ21DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    9207657678EXCH_FCT_TABLE_NAME_BJ22DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539234DIM_TABLE5DW_SCHEMA_A
    9217657681EXCH_FCT_TABLE_NAME_BJ23DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A4471301DIM_TABLE6DW_SCHEMA_A
    9227657684EXCH_FCT_TABLE_NAME_BJ24DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A4471301DIM_TABLE6DW_SCHEMA_A
    9237657687EXCH_FCT_TABLE_NAME_BJ25DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A4471301DIM_TABLE6DW_SCHEMA_A
    9247657690EXCH_FCT_TABLE_NAME_BJ26DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539230DIM_TABLE7DW_SCHEMA_A
    9257657693EXCH_FCT_TABLE_NAME_BJ27DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539230DIM_TABLE7DW_SCHEMA_A
    9267657696EXCH_FCT_TABLE_NAME_BJ28DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539230DIM_TABLE7DW_SCHEMA_A
    9277657699EXCH_FCT_TABLE_NAME_BJ29DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A4471715DIM_TABLE8DW_SCHEMA_A
    9287657702EXCH_FCT_TABLE_NAME_BJ3DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539229DIM_TABLE3DW_SCHEMA_A
    9297657705EXCH_FCT_TABLE_NAME_BJ30DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A4471716DIM_TABLE9DW_SCHEMA_A
    9307657708EXCH_FCT_TABLE_NAME_BJ31DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A4471716DIM_TABLE9DW_SCHEMA_A
    9317657711EXCH_FCT_TABLE_NAME_BJ32DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A4471716DIM_TABLE9DW_SCHEMA_A
    9327657714EXCH_FCT_TABLE_NAME_BJ33DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A4471716DIM_TABLE9DW_SCHEMA_A
    9337657717EXCH_FCT_TABLE_NAME_BJ4DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539229DIM_TABLE3DW_SCHEMA_A
    9347657720EXCH_FCT_TABLE_NAME_BJ5DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539229DIM_TABLE3DW_SCHEMA_A
    9357657723EXCH_FCT_TABLE_NAME_BJ6DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539229DIM_TABLE3DW_SCHEMA_A
    9367657726EXCH_FCT_TABLE_NAME_BJ7DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539229DIM_TABLE3DW_SCHEMA_A
    9377657729EXCH_FCT_TABLE_NAME_BJ8DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539229DIM_TABLE3DW_SCHEMA_A
    9387657732EXCH_FCT_TABLE_NAME_BJ9DW_SCHEMA_A7657534EXCH_FCT_TABLE_NAMEDW_SCHEMA_A2539232DIM_TABLE4DW_SCHEMA_A

    Notice the difference in ordering (object names renamed to protect the innocent)

    When looking at the other working schema (SCHEMA B) which contains a copy of the same tables in terms of structure, and where data gets loaded into on alternative loads I do not see such characteristics - the index names are ordered the same way for both the exchange table and the partitioned table.

    On my "test case" schema (XX_TEST_CASE) where I created the DDL for the affected tables in isolation to try and reproduce the problem, the ordered of the objects by OBJ_ID shows the same ordering of bitmap join indexes for both the exchange table and the partitioned table.

    I see similar characteristics when I query JIREFRESHSQL$. When I filter by table object id (tobj#) i see for the working schema a similar ordering of SQL by bitmap join index, but the ordering differs for the non-working schema.

    This is a strange coincidence that the only schema we have partition exchange failures is the schema that seems to have the internal ordering of bitmap join indexes different between the exchange table and the partitioned table.

    cheers,

    John

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,607 Gold Crown
    edited February 2016

    I don't think the ordering of the rows should make a difference - but you could modify the query to check that the definitions were consistent.

    I'm assuming that BJ1 on the fact table should have the same definition as BJ1 on the exchange table, and given your naming convention you might compare the definitions fairly easily with something like the following:

    select

            obj#,

            (select name from obj$ o where o.obj# = j.obj#) obj_name,

            tab1obj#, tab1col#,

            (select name from obj$ o where o.obj# = j.tab1obj#) tab1obj_name,

            tab2obj#, tab2col#,

            (select name from obj$ o where o.obj# = j.tab2obj#) tab2obj_name

    from

            sys.jijoin$ j

    where

            (select name from obj$ o where o.obj# = j.obj#) like '%FCT_TABLE_NAME_BJ%'

    order by

            substr(obj_name,-4),

            obj_name desc,

            obj#

    /

    I haven't tried to make it efficient, and you seem to have a lot of bitmap join indexes, so don't be surprised if it takes some time to run.

    I've tried to pick out the bitmap indexes on your partitioned and exchange table, ordering them by index suffix so that you can check off matching pairs fairly easily.  Index names are not significant, of course, but I'm assuming that most of the _nn should be matchable in pairs so if there are any inconsistencies they will be easy to spot after you've eliminated in pairs.

    Regards

    Jonathan Lewis

  • Unknown
    edited February 2016

    Ok - but is there some reason you didn't answer ANY of the questions I ask you in reply #9?

  • jhall_uk_redux
    jhall_uk_redux Member Posts: 133 Blue Ribbon
    edited February 2016

    On our DEV environment I have recreated the bitmap join indexes and re-executed the partiion exchange syntax using the "including indexes" clause. This time it worked.

    Looking at my query of DBA_OBJECTS and JIJOIN$ the only difference I can see is the ordering of the index name by OBJ#. Before I recreated the indexes the ordering of the indexes by obj# was different between the exchange table and partition table.

    In the working schema you can see they are ordered like:

    OBJ#  Object name

    1         INDEX_BI1

    2         INDEX_BI2

    3         INDEX_BI3

    4         INDEX_BI4

    but in the non-working schema they are ordered like:

    OBJ#  Object name

    1         INDEX_BI1

    2         INDEX_BI10

    3         INDEX_BI11

    4         INDEX_B2

    etc.

    Once I recreated the bitmap join indexes for the exchange table in the same order used in the partition table suddenly the partition exchange "including indexes" syntax started working properly again.

    This seems very strange.ora-14098 doesn't indicate the ordering of the indexes matter - only number, size, type, and order of index columns (for a multi-column index)

    I'm going to try and reproduce this in my test case schema by creating the indexes in a random order  to see if that will allow the issue to be reproduced.

    Incidentally, I also noticed the created date of the exchange table on the two environments that are having problems with partition exchange is very recent. It looks like the exchange table was recreated at some point after the initial partition exchange process failed. This may be coincidental, but all the bitmap join indexes were created at the same time of this recent recreate - and recreated in a different order from how they were previously - which I have confirmed, as I'm lucky enough to have access to a clone of PROD from a few weeks before the issue occurred and I can see the OBJ# and created dates differ for this one exchange table and all the indexes if ordered by OBJ# are the same between the exchange table and partition.  So, that explains why something that was working stopped working - but the cause being the index order feels not right.

    cheers,

    John


  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,607 Gold Crown
    edited February 2016

    I considered that and did a very small test (only two indexes) even though I thought it unlikely to be the problem. I didn't see any problem, but maybe with a couple of dozen indexes you might see a different result.

    Regards

    Jonathan Lewis

  • Unknown
    edited February 2016
     Once I recreated the bitmap join indexes for the exchange table in the same order used in the partition table suddenly the partition exchange "including indexes" syntax started working properly again.
    
    This seems very strange.ora-14098 doesn't indicate the ordering of the indexes matter - only number, size, type, and order of index columns (for a multi-column index)
    . . .
    So, that explains why something that was working stopped working - but the cause being the index order feels not right.
    

    It seems right to me. As Jonathan first showed, and I reproduced, even with just ONE bitmap join index the creation of the second one (e.g. on the work table) causes one of the hidden GTTs to disappear.

    This is what I said in my reply to what Jonathan posted - perhaps you missed it

    The object that 'disappears' is the one with the referenced column name in it. My hypotheses is still that the OBJECT_ID associated with that 'disappeared' object is the one being used and Oracle can't find it anymore.
    
    That is if one of those queries above from JIREFRESHSQL$ queries SYS.L$48 but L48 'disappeared' that would cause a problem.
    
    Since those objects disappear when the needed/matching index is created it suggests to me that the EXACT order, or overlapping, of events might be causing the problem.
    

    See that clause in the last sentence? Where I said 'the EXACT order, or overlapping, of events might be causing the problem'?

    And you still haven't provided the DETAILS of what steps your process takes, what order it takes them in and whether any of it is done in parallel. If multiple processes are working in parallel to do bits and pieces of the work (e.g. some indexes being rebuilt by one process while others are rebuilt by other processes) then the actual execution order will be indeterminate.

    If the build order is Indeterminate it means you can NOT control, or predict, the order things will be started or completed. Which means you may NEVER be able to reproduce any particular execution.

  • jhall_uk_redux
    jhall_uk_redux Member Posts: 133 Blue Ribbon
    edited February 2016

    Rp0428,

    Ok - can you be more precise about the EXACT order each of those steps were done - I'm not clear on just which tables you are referring to above:

    1. What table's primary key was 'recreated' and why? The 'main' table? Or the work table used for the exchange?

    [REPLY] The primary key that was recreate was the PK constraint against the exchange table. It is dropped prior to truncating the exchange table (which if there was a successful partition exchange taking place previously would be empty anyway, but the truncate adds some robustness to the ETL logic). It gets recreated afterwards.

    Order of steps: 1. drop exchange table PK; 2. mark all exchange table bitmap join indexes unusable; 3. stage data into exchange table; 4. rebuild unusable bitmap join indexes in exchange table; 5. recreate exchange table PK in disable validate mode; 6. truncate partition in target table relating to the data being loaded (as we can replace the data); 7. perform partition exchange (using "including indexes without validation update global indexes" clauses).

    Why [was the PK recreated]? you cannot truncate a table containing a constraint disabled and validated - ORA-25128

    2. What table's bitmap join indexes were rebuilt? The 'main' table? Or the work table? Why were they marked 'unusable'? What 'data staging' are you talking about?

    [REPLY]

    "What table's bitmap join indexes were rebuilt?" - in normal execution, just the exchange table (the indexes are never dropped and recreated - we mark unusable, stage the data into the exchange table and do an alter index [index] rebuild;

    "Why were they marked 'unusable'?" - Only because partition exchange using "including indexes" clause has started failing did we revert to using "excluding indexes" - which means the indexes become unusable in the target partition after the partition exchange command is executed, so we have to rebuild the indexes in the affected partition

    "What 'data staging' are you talking about?" - in our data warehouse, we hold full transactional data including history in a 3NF foundation layer. as part of loading the data into our "Kimball" dimensional model we stage the new and updated records into a staging table - let's call it "STG_[target fact table]". here we do all the dimension lookups with various bits of set based SQL so that we reach a state we are ready for partition exchange. The STG table is structurally different to the target fact as it contains both source and target ID columns and is aggregated differently, to the target fact. So we have to load the aggregated data complete with all dimension lookup values into the exchange table - let's call it "EXCH_[target fact table]". That's what I mean by data staging.

    "were any of the 'dimension tables' ... also involved in the partition exchange" - confused... The fact table is exchanged. the dimensions are not and they are not partitioned either. The indexes would get exchanged and these indexes are bitmap joins - so there may be some internal stuff going on in the database that I'm not aware that interacts with the parent dimension table. [actually that would not surprise me - I've seen internal serialization take place with bitmap join indexes in the past that means I could not trust loading two target fact tables that shared the same dimensions at the same time without risking an ETL timeout failure]. Therefore, I don't think your follow up point is relevant. Also, to be clear - we load one fact table at a time - see my bracketed comments earlier in this paragraph to set some context for why we do it that way.

    "That statement is what makes me think there could be some overlap between the processing of the dimension table and the work tables"

    [REPLY] see above and especially my comment about potential serialization of bitmap join indexes. There could be some "overlap" - but if you read the follow up comments from the other contributors you may see that I have spotted a strange difference in the ordering of object ids of the bitmap join indexes from the dictionary, which may be a cause. I need to prove it in an isolated test case.

    I hope this set's the context a little more. Like I said previously, the partition exchange syntax was running fine for 14 months and it now seems that apart from the order that the exchange table indexes were created, everything seems ok. The comments re looking at the internal tables sounded very relevant in terms of driving my investigation and what with the weekend and all, I have limited time to investigate and reply to every comment as quickly as I would like to. Hopefully, this answers your questions and thanks for your input.

    cheers,

    John

  • jhall_uk_redux
    jhall_uk_redux Member Posts: 133 Blue Ribbon
    edited February 2016

    Jonathan,

    I've reproduced the problem in an isolated schema. Yes, the order of bitmap index creation does matter.

    I have confirmed this by running partition exchange once with both tables having the 33 bitmap join indexes created in the same order (partition exchange "including indexes" works)

    ..and running it once again after I dropped a few random indexes from the EXCH_[fact table] table and recreated them (in my test case I dropped indexes BJ10 through to BJ15 and added them again). This time it failed.

    Running the query against DBA_OBJECTS and JIJOIN$ shows the new OBJ# assigned to these indexes and the overall set of exchange table indexes are not in the same order as the partition table. So, whilst the OBJ# doesn't have to match - the net order of the indexes have to be the same regardless.

    Thinking about how partition exchange works, I know this is just a dictionary update to effectively swap the segments around - there must be some limitation in how it determines if the indexes are the same. Perhaps it is specific to bitmap joins - a quick test case will soon show that. Really, the database ought to be able to deal with that as without resorting to these $ tables which I presume a unsupported it would be impossible to know if the indexes are the same order.

    So, to avoid an ORA-14098 you must for bitmap join indexes (at least) ensure the indexes are of the same size, type, column order and are created in the same CREATE INDEX DDL execution order in both your exchange (work) table and the target table partition.

    I will update my SR I have open and suggest this is a new bug in 11.2.0.3 and 11.2.0.4.

    Anyway, this is something to bear in mind for anyone else heavily using partition exchange.

    cheers,

    John

  • jhall_uk_redux
    jhall_uk_redux Member Posts: 133 Blue Ribbon
    edited February 2016

    RP0428,

    Not sure about the GTTs you suggested (out of ignorance of which GTTs and which table specifically you are referring to - I'm not disputing their actual relevance to the issue).

    Frankly, I don't understand how the internal tables work (I'll plead ignorance on that front). However, the manifestation in terms of index ordering when querying JIJOIN$ and DBA_OBJECTS does illustrate the issue clearly. Your example SQL on this has clearly allowed this issue to be progressed also will plead ignorance on whether the JIJOIN$ table is actually a global temporary table or not. When I extended that SQL to include details on object creation dates it became clear some objects had been recreated recently and that the order of index creation had been different second time around. This has guided me to the believe that exact order as you mentioned does matter, but more importantly it matters for the order in which the bitmap join indexes are created. I'm not sure if had the order in which you create each index in mind when you mentioned it - or if you were thinking of some other element of the process (Jonathan suggested he was doubtful that index create ordered mattered in one of his comments above).

    cheers,

    John

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,607 Gold Crown
    edited February 2016

    John,.

    Interesting - I may kick this around a bit; two things stand out that may have made my very simple test irrelevant:

    a) you have 33 bitmap join indexes on the one table

    b) you have far fewer dimension tables than bitmap indexes

    It may be that (b) is more significant than (a) - but if I had free access to your spare system I'd test to see what happened if I got rid of a couple of indexes (i.e. dropped below 32 because powers of 2 often make funny things happen) before doing the "rebuild in different order" test, and I'd also pick the "out of order" indexes so that I had no more than one from any one dimension (I think you had 2 based on one dimension and 4 based on another).

    If I have a go I'll create at least 33 indexes against a small number of dimensions and let you know what happens.

    Regards

    Jonathan Lewis

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,607 Gold Crown
    edited February 2016

    John,

    Couldn't resist temptation.

    Created a scripts that generated 36 bitmap indexes based on 6 dimension tables

    Builds in about 20 seconds.

    Dropping and rebuilding a couple of sets of 6 indexes results in

    Index created.

            exchange partition p8001 with table t1

                                                *

    ERROR at line 2:

    ORA-14098: index mismatch for tables in ALTER TABLE EXCHANGE PARTITION

    Dropping and rebuilding just those in "the right order" allows the exchange to run.

    I haven't done any further testing to find a minimal example yet.

    If you're working with your real data rather than a small scripted variant I can publish or email my script to reference in your SR.

    Regards

    Jonathan Lewis

    P.S.  The problem reproduces on 12.1.0.2

  • Unknown
    edited February 2016
    If you're working with your real data rather than a small scripted variant I can publish or email my script to reference in your SR.
    

    Can you publish it?

    My hypothesis would be that it doesn't matter how many indexes there are (so even just a couple should suffice) and that is shouldn't really matter if there is any data. It might matter that the segments are actually created so if segment creation is deferred that could mask the issue.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,607 Gold Crown
    edited February 2016

    rp0428,

    I don't know what I did wrong on my earlier quick test when I said the problem didn't reproduce with just 2 indexes because I just stripped by 6 dimensions and 36 indexes back to two dimensions with one index each and got the problem by switch the order of index creation.

    I now suspect that it's something to do with the virtual / hidden columns that get created to support the bitmap indexes - perhaps they're flagged incorrectly in some way; however the same problem doesn't appear with function-based indexes (which also require supporting virtual columns) or declared virtual columns; in fact as far as "real" virtual columns are concerned the simple table and the partitioned table are allowed to have different virtual column definitions and still exchange properly.

    I may try tracing the exchange tomorrow to see if the problem reveals itself in some recursive query.

    I'll also do a brief blog note.

    Regards

    Jonathan Lewis

  • jhall_uk_redux
    jhall_uk_redux Member Posts: 133 Blue Ribbon
    edited February 2016

    Thanks both. I've added this additional information to the SR I've had open with Oracle Support for the last month. I'm glad that you've managed to reproduce the issue (yes, generating bitmap join indexes is definitely the way to go to reduce typing pain - that's how I defined mine across my entire data model).

    For reference, I have 9 dimension tables referenced by my fact table - but due to differences in the number of hierarchy levels there are significant difference in the number of bitmap join indexes per dimension table.

    cheers,

    John

This discussion has been closed.