Good morning,
We are using Oracle Database 11.2.0.3 running on Oracle Linux in production and 11.2.0.4 in development
We have a data warehouse that has been running in production for approximately a year. We are using partition exchange extensively and it has been on the whole very reliable.
Then a few weeks ago we have had an ETL failure which stopped the ETL logic part way through the preparation activities for partition exchange, i.e., the data had been prepared and staged into our exchange table and the primary key recreated in 'disable validate' mode and all the bitmap join indexes rebuilt after previously being marked unusable prior to data staging. But the outage occurred just before the next step to do partition exchange.
The team restarted the job as this would normally just reload the data into to the exchange table but it failed with:
SQL Error: ORA-14098: index mismatch for tables in ALTER TABLE EXCHANGE PARTITION
14098. 00000 - "index mismatch for tables in ALTER TABLE EXCHANGE PARTITION"
*Cause: The two tables specified in the EXCHANGE have indexes which are
not equivalent
*Action: Ensure that the indexes for the two tables have indexes which
follow this rule
For every non partitioned index for the non partitioned table,
there has to be an identical LOCAL index on the partitioned
table and vice versa. By identical, the column position, type
and size have to be the same.
Our partition exchange syntax is as follows:
alter table [schema].[exchange table]
exchange partition [partition] with table [schema].[partitioned table]
including indexes
without validation
update global indexes;
A pretty standard error when indexes are not in alignment between the two tables and usually a 2 min fix. However, our indexes are valid and the constraints are as they should be - everything looks ok.
Manually, we can get an alternative version of the exchange partition syntax to work fine - this very strange - either the structure of the exchange and partitioned table are in alignment or they are not!
alter table [schema].[exchange table]
exchange partition [partition] with table [schema].[partitioned table]
excluding indexes
without validation
update global indexes;
which we can then finish off the equivalent of the ETL job by rebuilding the index partitions and the global primary key index in the target table.
Why would the SQL using "including indexes" fail, but the SQL using "excluding indexes" succeed?
Following on from using "excluding indexes" we thought something may have not been synchronised properly in the dictionary and then repeated the partition exchange process again from the beginning - stage, validate, rebuild indexes/PK constraint etc and the original syntax consistently fails to work. No matter how many times I repeat the process "include indexes" fails and "exclude indexes" succeeds. No other table in our prod environment is affected - they all use the same approach to partition exchange.
We even have a two schemas with identical copies of the table structures - as we always load into one whilst reporting against another, using synonyms to redirect the report queries. So, identical code runs against both schemas. The code still works using "including indexes" against the other schema copy.
I have not tried dropping all the local indexes and rebuilding them completely as I wanted to understand what has gone wrong first in case it is a known issue that could affect any of our partitioned tables at any time we have another ETL failure or outage.
I have not considered modifying the code to use "excluding indexes" followed by a index rebuild as I want some evidence first about the cause. if I change the syntax for one table, then logic means I should modify the ETL for every partitioned table to follow the same approach. Anyway, the syntax we have been using has worked for 14 months without issue and now only affects one table following an failure.
I have reproduced the issue by failing the ETL at the same point in our DEV environment (which has been patched to 11.2.0.4), but if I create a standalone test case schema using the DDL from the dictionary to recreate the affected tables then I cannot reproduce the issues - "including indexes" works fine!
This is very strange - does anyone have any ideas?
cheers,
John