- 3,722,118 Users
- 2,244,225 Discussions
- 7,849,643 Comments
Forum Stats
Discussions
Categories
- 16 Data
- 362.2K Big Data Appliance
- 7 Data Science
- 2K Databases
- 585 General Database Discussions
- 3.7K Java and JavaScript in the Database
- 32 Multilingual Engine
- 494 MySQL Community Space
- 7 NoSQL Database
- 7.7K Oracle Database Express Edition (XE)
- 2.8K ORDS, SODA & JSON in the Database
- 418 SQLcl
- 54 SQL Developer Data Modeler
- 185K SQL & PL/SQL
- 21.1K SQL Developer
- 2.3K Development
- 3 Developer Projects
- 32 Programming Languages
- 135.5K Development Tools
- 12 DevOps
- 3K QA/Testing
- 321 Java
- 10 Java Learning Subscription
- 12 Database Connectivity
- 70 Java Community Process
- 2 Java 25
- 11 Java APIs
- 141.2K Java Development Tools
- 7 Java EE (Java Enterprise Edition)
- 153K Java Essentials
- 135 Java 8 Questions
- 86.2K Java Programming
- 270 Java Lambda MOOC
- 65.1K New To Java
- 1.7K Training / Learning / Certification
- 13.8K Java HotSpot Virtual Machine
- 14 Java SE
- 13.8K Java Security
- 3 Java User Groups
- 22 JavaScript - Nashorn
- 18 Programs
- 144 LiveLabs
- 34 Workshops
- 9 Software
- 3 Berkeley DB Family
- 3.5K JHeadstart
- 5.7K Other Languages
- 2.3K Chinese
- 4 Deutsche Oracle Community
- 15 Español
- 1.9K Japanese
- 2 Portuguese
Choosing right Index

Hi We are using version 11.2.0.4.0 version of oracle. We have two similar kind of index in a big transaction table -T1 in which ~200million records inserted daily in a conventional method from Java in multiple threads. We want to remove one of the duplicate local index to make it better along with get back ~2TB of storage space. Here table T1 is range partitioned(daily partitioned) on column "c3_dt" which is a truncated date column with no time component in it. so it means in as single partition there would be only one distinct value for column c3_dt and a single partition holds ~200million records.
As index idx1 is already having first two columns in it so its look safe and obvious to drop index idx2 in first place. Btw i see , In current queries ~90% of the time index idx1 is getting used. so i have few questions here , as we have third column in the composite index IDX1 is a truncated date column which is partition key, so i was thinking it doesn't make sense to have that column included in the local index, as for that specific partition it only have one distinct value in it. So we should drop index IDX1 keeping IDX2 intact, so we would get rid of a much bulkier index with no performance hit. Need expert suggestions regarding same if my understanding is correct or i am missing anything?
I do see that it has index IDX1 as unique which is a unique constraint maintaining data quality/integrity intact, so we may have to justify the need to drop unique index and find tradeoff between these two option.
also we are planning to first make the index invisible and then drop after sometime of observation, as in existing queries where we have some profiles in place(something as below)and idx1 is getting used, does this change will make or invalidate the profiled path or it will silently pick the index idx2 in place of idx1? Need expert suggestion about this.
q'[INDEX(@SEL$1 "T1"@SEL$1 ("T1"."C1" "T1"."C2" "T1"."C3_DT"))]',
Index idx1 is unique index on- (c1,c2,c3_dt) and idx2 is on (c1,c2).
INDEX Idx2 is non unique index on (c1,c2)
Index Idx1 is having total size for all partitions ~2.5 TB
Index IXD2 is having total size for all partitions - 2 TB
Here column c1 is of data type VARCHAR2 and c2 is of data type Number and c3_dt is date datatype. Below are the statistics for the columns
Global statistics:-
column_name, data_type,num_distinct,density,num_nulls
C3_dt DATE 211 0.004739336492891 0
c1 VARCHAR2 1597649 6.2591971077502E-7 0
c2 NUMBER 1996800 5.00801282051282E-7 0
Partition level stats:-
column_name,num_distinct,density,num_nulls
C1 8754 0.00127388535031847 0
c2 1398016 7.1529939571507E-7 0
C3_dt 1 2.43469886548297E-9 0
Best Answer
-
The index (c1, c2, c3_date) makes the index (c1, c2) redundant from a mechanical perspective - every query that might use (c1, c2) could just as well use (c1, c2, c3_date), so that's the argument for dropping the index on (c1, c2).
You can't drop the index (c1, c2, c3_date) (probably) because the uniqueness is probably a deliberate choice made to enforce uniqueness of the combination of columns and Oracle won't be able to enforce uniqueness as the data changes unless you have a valid index that starts with the correct columns.
If you make the index invisible that means (most precisely) invisible to the optimizer. In fact it would still be visible for purposes of the parent/child locking problem if this index were on the child end of a foreign key constraint (but that's irrelevant in your case as both indexes start with the same columns in the same order). This may have some side effects on the choice of execution plan because (a) currently Oracle has some information about the number of combinations of (c1, c2) which would disappear if you dropped the index - you could bypass this issue by creating a column group (extended stats) on the combination - and (b) when two indexes start with the same columns the longer index is larger in leaf block count and has a larger clustering_factor and the clustering_factor is a very significant contributor to the optimizer's estimate of the cost of using the index - in your case, since the c3_date column has the same value for all rows in the index the presence or absence of the column shouldn't affect the clustering_factor, at least at the partition level.
It has to be said, though, that there are a couple of anomalies visible in the index stats you've reported, which make it harder to predict with any confidence exactly what might happen when you drop the index - so marking it invisible and then creating a column group on the (c1, c2) combination is a good starting point.
Regards
Jonathan Lewis
Answers
-
You need to look at your queries. For example, if a query is selecting and projecting only the columns in the index key, then it will not need to look at the table at all. That won't work any more if you remove columns from the index.
-
I would just drop it on the testing system and see what happens under normal load. It means you need to have a testing environment and can replicate load, but that sort Of testing is a normal requirement anyway.
Dont forget that even if an index isn’t being directly used, it’s statistics could be influencing your execution plans.
-
The right index is idx1 is unique index on- (c1,c2,c3_dt)
-
Thank you so much. Agree to all your point. Actually in a default scenario i should go for dropping index IDX2 as because IDX1 is covering all the column of IDX2 or its kind of super-set. But something actually making little more inclined towards removing index IDX1, as its ~500GB more bulkier and having additional column C3_DT as the last column which is a truncated date column(partition key) with 1 distinct value for each partition, so it seem to me IDX1 is really not helping anyway better than IDX2 also its consuming much more space than IDX2. Additional overhead for DML must be more in case IDX1 as compared to IDX2. Can you please let me know if its really foolish to think of choosing index idx1 over idx2? I do agree that the benefit of idx1 is that it ensure data integrity as because of its uniqueness in nature, but i was trying to reach to a better decision, considering all pros and cons. Appreciate your response.
Is it correct to say that "stats or influence for optimizer" associated with IDX1 and IDX2 would be closely similar as they just differ by one column which is a partition key(truncate date) and both are local index in nature.
Below is the global stats for the two indexes.
index_name,uniqueness,leaf_blocks,distinct_keys,num_rows
IDX1 UNIQUE 183573250 34784425080 34784425080
IDX2 NONUNIQUE 148872779 226946615 34944244080
I understand that if only index scan is being used and table is not being accessed , removing column from that may impact bit negative way, but as this system mostly involves batch queries running >5minutes so was thinking considering the last column being a very less distinct one the table access will not affect the query much. Please suggest.
Edited-
Additionally as you said, "Dont forget that even if an index isn’t being directly used, it’s statistics could be influencing your execution plans." if we make the index invisible will optimizer still use its stats during execution plan even it will not use the index directly for fetching data? or i can say if by making invisible my application will work fine, then it means its safe to drop? as the index are very big in nature so i was kind of preferring invisible option before dropping permanently as that will take long time and resource if at all we need to create it back.
-
The index (c1, c2, c3_date) makes the index (c1, c2) redundant from a mechanical perspective - every query that might use (c1, c2) could just as well use (c1, c2, c3_date), so that's the argument for dropping the index on (c1, c2).
You can't drop the index (c1, c2, c3_date) (probably) because the uniqueness is probably a deliberate choice made to enforce uniqueness of the combination of columns and Oracle won't be able to enforce uniqueness as the data changes unless you have a valid index that starts with the correct columns.
If you make the index invisible that means (most precisely) invisible to the optimizer. In fact it would still be visible for purposes of the parent/child locking problem if this index were on the child end of a foreign key constraint (but that's irrelevant in your case as both indexes start with the same columns in the same order). This may have some side effects on the choice of execution plan because (a) currently Oracle has some information about the number of combinations of (c1, c2) which would disappear if you dropped the index - you could bypass this issue by creating a column group (extended stats) on the combination - and (b) when two indexes start with the same columns the longer index is larger in leaf block count and has a larger clustering_factor and the clustering_factor is a very significant contributor to the optimizer's estimate of the cost of using the index - in your case, since the c3_date column has the same value for all rows in the index the presence or absence of the column shouldn't affect the clustering_factor, at least at the partition level.
It has to be said, though, that there are a couple of anomalies visible in the index stats you've reported, which make it harder to predict with any confidence exactly what might happen when you drop the index - so marking it invisible and then creating a column group on the (c1, c2) combination is a good starting point.
Regards
Jonathan Lewis
-
Thank you Jonathan.
fyi, The index stats which i had published was at global level, and some of the historical partition are compressed so not sure if that has impacted the net global stats anyway. Below is the stats for one of the local partition which is not compressed.
index_name,leaf_blocks,distinct_keys,clustering_factor,num_rows
IDX2 1007062 195246354 133507958 195246354
IDX1 1284091 206224738 141103155 206224738
I was almost thinking to just drop the invisible index without column group being created on (c1,c2) as the usage of index IDX2 was on few(2 to 3) occasion and tested those queries working fine with IDX1 too. My understanding was if the index is invisible then the optimizer will also not use the stats of that index to decide/reach out to any execution path. but as per your comment it seems even after making index invisible , optimizer will still use the stats of this index under the scene, though it may not use the index as a choice in the final execution path.
Again as this table is big one so creating extended stats would not require any space like index as it has no physical structures in it, but will it increase the stats gathering time significantly? Actually we are currently using incremental stats option for gathering stats and i hope in first attempt oracle will try to create the column group/extended stats in all the individual partitions and that may take some time and we may need some leisure time to have this implemented on production. Please correct if my understanding is wrong. And is there any other thing associated with column group/extended stats which we need to take care.Please suggest.
-
but as per your comment it seems even after making index invisible , optimizer will still use the stats of this index under the scene
No, that is exactly 100% the opposite of what I said
1 "If you make the index invisible that means (most precisely) invisible to the optimizer",
2 "currently Oracle has some information about the number of combinations of (c1, c2) which would disappear if you dropped the index",
3 "so marking it invisible and then creating a column group on the (c1, c2) combination is a good starting point"
1 How could the optimizer use the stats of the index if it's invisible to the optimizer ?
2 If the stats disappear when you drop the index how could the optimizer use them ?
3 Why would I suggest that you recreate the index's statistics through a column group if the optimizer could use them anyway ?
Regarding your other questions:
Collecting one set of column group stats may result in virtually no change to the workload of gathering stats - but the relative impact does depend on the number of columns and number of partitions in the table, and whether or not you are simply using Oracle's default strategy. I have to say that I don't know exactly what Oracle will do and how long it will take if you simply add the column group and then leave Oracle to do whatever it fancies in the automatic stats collection job; it may be best to add the column group and then do an immediate gather for just that column group and I think your thought about Oracle having to gather (at least) that column across all partitions on the first run is sound - so it would be a good idea to test this on a smaller data set first tracing the calls to dbms_stats to see what SQL Oracle runs in response to various different strategies. (The worst possible case would be Oracle gathering everything for every column on every partition because you've added one column group - and I have to say I haven't ever tested that option.)
I've written a few notes about undesirable behaviour and strange overheads relating to extended stats - the best starting point is the following note: https://jonathanlewis.wordpress.com/2012/04/11/extended-stats/ If you read that and follow the links you may spot something that's relevant to your system.
Regards
Jonathan Lewis
-
My Apology. I interpreted wrongly.
So I hope our testing strategy would be safe as below
1) Just mark the index invisible, if all application worked as expected for certain monitoring period with normal BAU workload without any issue. so its safe to simply drop the index.
2)If any issue found with variation in plan after making index invisible then we will create the column group(c1,c2) , so that will fix the plan and should make the plan back on track. and will monitor it for certain time with normal BAU workload without any issue. And then its safe to drop the index.
Edited- In above i am opting for column group as a second attempt rather immediate one.Just in case its very much required then only i am planning to have the column group. I hope that would be fare enough.
-
Edited- In above i am opting for column group as a second attempt rather immediate one.Just in case its very much required then only i am planning to have the column group. I hope that would be fare enough.
That's a perfectly reasonably strategy - particularly in your case where the c3_date column holds exactly one value per partition.
I'm always in favour not making the database do any work it doesn't need to do.
Regards
Jonathan Lewis