Forum Stats

  • 3,722,118 Users
  • 2,244,225 Discussions
  • 7,849,643 Comments

Discussions

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Choosing right Index

User_OCZ1T
User_OCZ1T Member Posts: 1,921 Gold Badge

Hi We are using version 11.2.0.4.0 version of oracle. We have two similar kind of index in a big transaction table -T1 in which ~200million records inserted daily in a conventional method from Java in multiple threads. We want to remove one of the duplicate local index to make it better along with get back ~2TB of storage space. Here table T1 is range partitioned(daily partitioned) on column "c3_dt" which is a truncated date column with no time component in it. so it means in as single partition there would be only one distinct value for column c3_dt and a single partition holds ~200million records.

As index idx1 is already having first two columns in it so its look safe and obvious to drop index idx2 in first place. Btw i see , In current queries ~90% of the time index idx1 is getting used. so i have few  questions here , as we have third column in the composite index IDX1 is a truncated date column which is partition key, so i was thinking it doesn't make sense to have that column included in the local index, as for that specific partition it only have one distinct value in it. So we should drop index IDX1 keeping IDX2 intact, so we would get rid of a much bulkier index with no performance hit. Need expert suggestions regarding same if my understanding is correct or i am missing anything?

I do see that it has index IDX1 as unique which is a unique constraint maintaining data quality/integrity intact, so we may have to justify the need to drop unique index and find tradeoff between these two option.

also we are  planning to first make the index invisible and then drop after sometime of observation, as in existing queries where we have some profiles in place(something as below)and idx1 is getting used, does this change will make or invalidate the profiled path or it will silently pick the index idx2 in place of idx1? Need expert suggestion about this.

q'[INDEX(@SEL$1 "T1"@SEL$1 ("T1"."C1" "T1"."C2" "T1"."C3_DT"))]',

Index idx1 is unique index on- (c1,c2,c3_dt) and idx2 is on (c1,c2).

INDEX Idx2 is non unique index on (c1,c2)

Index Idx1 is having total size for all partitions ~2.5 TB

Index IXD2 is having total size for all partitions - 2 TB

Here column c1 is of data type VARCHAR2 and c2 is of data type Number and c3_dt is date datatype. Below are the statistics for the columns

Global statistics:-

column_name, data_type,num_distinct,density,num_nulls

C3_dt    DATE    211    0.004739336492891    0

c1    VARCHAR2    1597649    6.2591971077502E-7    0

c2    NUMBER    1996800    5.00801282051282E-7    0

Partition level stats:-

column_name,num_distinct,density,num_nulls

C1    8754    0.00127388535031847    0

c2    1398016    7.1529939571507E-7    0

C3_dt    1    2.43469886548297E-9    0

Best Answer

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,553 Gold Crown
    edited February 2018 Accepted Answer

    The index (c1, c2, c3_date) makes the index (c1, c2) redundant from a mechanical perspective - every query that might use (c1, c2) could just as well use (c1, c2, c3_date), so that's the argument for dropping the index on (c1, c2).

    You can't drop the index (c1, c2, c3_date) (probably) because the uniqueness is probably a deliberate choice made to enforce uniqueness of the combination of columns and Oracle won't be able to enforce uniqueness as the data changes unless you have a valid index that starts with the correct columns.

    If you make the index invisible that means (most precisely) invisible to the optimizer. In fact it would still be visible for purposes of the parent/child locking problem if this index were on the child end of a foreign key constraint (but that's irrelevant in your case as both indexes start with the same columns in the same order). This may have some side effects on the choice of execution plan because (a) currently Oracle has some information about the number of combinations of (c1, c2) which would disappear if you dropped the index - you could bypass this issue by creating a column group (extended stats) on the combination - and (b) when two indexes start with the same columns the longer index is larger in leaf block count and has a larger clustering_factor and the clustering_factor is a very significant contributor to the optimizer's estimate of the cost of using the index - in your case, since the c3_date column has the same value for all rows in the index the presence or absence of the column shouldn't affect the clustering_factor, at least at the partition level.

    It has to be said, though, that there are a couple of anomalies visible in the index stats you've reported, which make it harder to predict with any confidence exactly what might happen when you drop the index - so marking it invisible and then creating a column group on the (c1, c2) combination is a good starting point.

    Regards

    Jonathan Lewis

    User_OCZ1T

Answers

  • JohnWatson2
    JohnWatson2 Member Posts: 4,209 Bronze Crown
    edited February 2018

    You need to look at your queries. For example, if a query is selecting and projecting only the columns in the index key, then it will not need to look at the table at all. That won't work any more if you remove columns from the index.

    User_OCZ1T
  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited February 2018

    I would just drop it on the testing system and see what happens under normal load. It means you need to have a testing environment and can replicate load, but that sort Of testing is a normal requirement anyway.

    Dont forget that even if an index isn’t being directly used, it’s statistics could be influencing your execution plans.

    User_OCZ1T
  • John Thorton
    John Thorton Member Posts: 14,493 Silver Crown
    edited February 2018

    The right index is idx1 is unique index on- (c1,c2,c3_dt)

  • User_OCZ1T
    User_OCZ1T Member Posts: 1,921 Gold Badge
    edited February 2018

    Thank you so much. Agree to all your point. Actually in a default scenario i should go for dropping index IDX2 as because IDX1 is covering all the column of IDX2 or its kind of super-set. But something actually making little more inclined towards removing index IDX1, as its ~500GB more bulkier and having additional column C3_DT as the last column which is  a truncated date column(partition key) with 1 distinct value for each partition, so it seem to me IDX1 is really not helping anyway better than IDX2 also its consuming much more space than IDX2. Additional  overhead for DML must be more in case IDX1 as compared to IDX2. Can you please let me know if its really foolish to think of choosing index idx1 over idx2? I do agree that the benefit of idx1 is that it ensure data integrity as because of its uniqueness in nature, but i was trying to reach to a better decision, considering all pros and cons. Appreciate your response.

    Is it correct to say that "stats or influence for optimizer" associated with IDX1 and IDX2 would be closely similar as they just differ by one column which is a partition key(truncate date) and both are local index in nature.

    Below is the global stats for the two indexes.

    index_name,uniqueness,leaf_blocks,distinct_keys,num_rows

    IDX1    UNIQUE    183573250    34784425080    34784425080

    IDX2    NONUNIQUE    148872779    226946615    34944244080

    I understand that if only index scan is being used and table is not being accessed , removing column from that may impact bit negative way, but as this system mostly involves batch queries running >5minutes so was thinking considering the last column being a very less distinct one the table access will not affect the query much. Please suggest.

    Edited-

    Additionally as you said, "Dont forget that even if an index isn’t being directly used, it’s statistics could be influencing your execution plans." if we make the index invisible will optimizer still use its stats during execution plan even it will not use the index directly for fetching data? or i can say if by making invisible my application will work fine, then it means its safe to drop? as the index are very big in nature so i was kind of preferring invisible option before dropping permanently as that will take long time and resource if at all we need to create it back.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,553 Gold Crown
    edited February 2018 Accepted Answer

    The index (c1, c2, c3_date) makes the index (c1, c2) redundant from a mechanical perspective - every query that might use (c1, c2) could just as well use (c1, c2, c3_date), so that's the argument for dropping the index on (c1, c2).

    You can't drop the index (c1, c2, c3_date) (probably) because the uniqueness is probably a deliberate choice made to enforce uniqueness of the combination of columns and Oracle won't be able to enforce uniqueness as the data changes unless you have a valid index that starts with the correct columns.

    If you make the index invisible that means (most precisely) invisible to the optimizer. In fact it would still be visible for purposes of the parent/child locking problem if this index were on the child end of a foreign key constraint (but that's irrelevant in your case as both indexes start with the same columns in the same order). This may have some side effects on the choice of execution plan because (a) currently Oracle has some information about the number of combinations of (c1, c2) which would disappear if you dropped the index - you could bypass this issue by creating a column group (extended stats) on the combination - and (b) when two indexes start with the same columns the longer index is larger in leaf block count and has a larger clustering_factor and the clustering_factor is a very significant contributor to the optimizer's estimate of the cost of using the index - in your case, since the c3_date column has the same value for all rows in the index the presence or absence of the column shouldn't affect the clustering_factor, at least at the partition level.

    It has to be said, though, that there are a couple of anomalies visible in the index stats you've reported, which make it harder to predict with any confidence exactly what might happen when you drop the index - so marking it invisible and then creating a column group on the (c1, c2) combination is a good starting point.

    Regards

    Jonathan Lewis

    User_OCZ1T
  • User_OCZ1T
    User_OCZ1T Member Posts: 1,921 Gold Badge
    edited February 2018

    Thank you Jonathan.

    fyi, The index stats which i had published was at global level, and some of the historical partition are compressed so not sure if that has impacted the net global stats anyway. Below is the stats for one of the local  partition which is not compressed.

    index_name,leaf_blocks,distinct_keys,clustering_factor,num_rows

    IDX2    1007062    195246354    133507958    195246354

    IDX1    1284091    206224738    141103155    206224738

    I was almost thinking to just drop the invisible index without column group being created on (c1,c2) as the usage of index IDX2 was on few(2 to 3) occasion and tested those queries working fine with IDX1 too. My understanding was if the index is invisible then the optimizer will also not use the stats of that index to decide/reach out to any execution path. but as per your comment it seems even after making index invisible , optimizer will still use the stats of this index under the scene, though it may not use the index as a choice in the final execution path.

    Again as this table is big one so creating extended stats would not require any space like index as it has no physical structures in it, but will it increase the stats gathering time significantly? Actually we are currently using incremental stats option for gathering stats and i hope in first attempt oracle will try to create the column group/extended stats in all the individual partitions and that may take some time and we may need some leisure time to have this implemented on production. Please correct if my understanding is wrong. And is there any other thing associated with column group/extended stats which we need to take care.Please suggest.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,553 Gold Crown
    edited February 2018
    but as per your comment it seems even after making index invisible , optimizer will still use the stats of this index under the scene

    No, that is exactly 100% the opposite of what I said

    1 "If you make the index invisible that means (most precisely) invisible to the optimizer",

    2 "currently Oracle has some information about the number of combinations of (c1, c2) which would disappear if you dropped the index",

    3 "so marking it invisible and then creating a column group on the (c1, c2) combination is a good starting point"

    1 How could the optimizer use the stats of the index if it's invisible to the optimizer ?

    2 If the stats disappear when you drop the index how could the optimizer use them ?

    3 Why would I suggest that you recreate the index's statistics through a column group if the optimizer could use them anyway ?

    Regarding your other questions:

    Collecting one set of column group stats may result in virtually no change to the workload of gathering stats - but the relative impact does depend on the number of columns and number of partitions in the table, and whether or not you are simply using Oracle's default strategy. I have to say that I don't know exactly what Oracle will do and how long it will take if you simply add the column group and then leave Oracle to do whatever it fancies in the automatic stats collection job; it may be best to add the column group and then do an immediate gather for just that column group and I think your thought about Oracle having to gather (at least) that column across all partitions on the first run is sound - so it would be a good idea to test this on a smaller data set first tracing the calls to dbms_stats to see what SQL Oracle runs in response to various different strategies.  (The worst possible case would be Oracle gathering everything for every column on every partition because you've added one column group - and I have to say I haven't ever tested that option.)

    I've written a few notes about undesirable behaviour and strange overheads relating to extended stats - the best starting point is the following note: https://jonathanlewis.wordpress.com/2012/04/11/extended-stats/  If you read that and follow the links you may spot something that's relevant to your system.

    Regards

    Jonathan Lewis

    User_OCZ1T
  • User_OCZ1T
    User_OCZ1T Member Posts: 1,921 Gold Badge
    edited February 2018

    My Apology. I interpreted wrongly.

    So I hope our testing strategy would be safe as below

    1) Just mark the  index  invisible, if all application worked as expected for certain monitoring period with normal BAU workload without any issue. so its safe to simply drop the index.

    2)If any issue found with variation in plan after making index invisible then we will create the column group(c1,c2) , so that will fix the  plan and should make  the plan back on track. and will monitor it for certain time with normal BAU workload without any issue. And then its safe to drop the index.

    Edited- In above i am opting for column group as a second attempt rather immediate one.Just in case  its  very much required then only i am planning to have  the  column group. I hope that would be fare enough.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,553 Gold Crown
    edited February 2018
    Edited- In above i am opting for column group as a second attempt rather immediate one.Just in case  its  very much required then only i am planning to have  the  column group. I hope that would be fare enough.

    That's a perfectly reasonably strategy - particularly in your case where the c3_date column holds exactly one value per partition.

    I'm always in favour not making the database do any work it doesn't need to do.

    Regards

    Jonathan Lewis

This discussion has been closed.