Forum Stats

  • 3,734,269 Users
  • 2,246,935 Discussions
  • 7,857,216 Comments

Discussions

Creating index on table

User910243567
User910243567 Member Posts: 620 Silver Badge
edited Mar 8, 2017 11:25PM in SQL & PL/SQL

We have below table with columns,

Table T1

Columns:

-----------

Col_1, Col_2, Col_3, Col_4, Col_5, Col_6, Col_7, Col_8, Col_9, Col_10, Col_11, Col_12, Col_13, Col_14, Col_15

on which below indexes are created.

XXTEST_Col_1    Col_1

XXTEST_Col_2    Col_2

XXTEST_Col_3    Col_3

XXTEST_Col_5    Col_5

XXTEST_Col_6    Col_6

XXTEST_Col_7    Col_7

XXTEST_Col_8    Col_8

XXTEST_Col_8    (Col_4, Col_10, Col_11)

I have requirement to update table T1 as below and its taking really long.

UPDATE T1

SET Col_3= x_value,

    Col_6 = y_value

where Col_4='N'

and Col_3 IS NULL;

To improve performance, created below index and it improved the performance of the update.

New Index:  XXTEST_Col_4    (Col_4, Col_3)

My question is it good idea to create the above index with already existing indexes. Is there any other better way to handle this.

Thanks for your time.

Tagged:

Best Answer

  • Nimish Garg
    Nimish Garg Member Posts: 3,185 Gold Trophy
    edited Jan 23, 2017 5:43AM Accepted Answer

    Depends on your data.

    Check count of both queries

    select count(*) from T1 where Col_4='N' and Col_3 IS NULL;

    select count(*) from T1 where Col_4='N';

    If there is a significant difference in count, and delete is executed frequently, it might be good to have.

    User910243567

Answers

  • Billy Verreynne
    Billy Verreynne Member Posts: 28,280 Red Diamond
    edited Jan 23, 2017 12:40AM

    Indexes provides a faster I/O path at the expense of space, and a performance overhead on row/table maintenance.

    You need to weigh these factors and then decide whether the cost of an index is more or less than the benefit of that index.

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Jan 23, 2017 2:14AM

    Il assume the duplicate index name is an error from obfuscation and your column names are really sensible.

    Indexes are useful when they limit the amount of blocks you'll be visiting in the table enough to make it cheaper to use the index than a full table scan (or alternative index).

    We have no idea how selective each of your columns woupld be as a filter. You haven't shared any execution plans so we can't have any idea of what Oracle did do to achieve your filter. There are some possibilities:

    The only Index that could be first used was the one on 4,8,10 (As null index keys dont exist in singular column indexes). Oracle thought that as you are filtering on an exact value of col 4 it would read 1/n of the index to determine which rowids it needs to visit  (where n is the number of different values of col 4), it can then visit those rows via the index. Maybe this takes a long time because that  col_4 ='N' is highly selective due to skew. Or maybe Oracle believed that the filter would be highly selective and so opted for a full table scan, reading the entire table segment.

    When you added that new index you gave Oracle a way of applying both filters to it, now Oracle can identify only the rows that fit both filters, hopefully visiting less blocks in the process (What if all rows with col_4 = 'N' had a null value for col_3?) This may have tipped the decision over to using an index, which may be faster than the full table scan if many table blocks were cached or if it meant reading much less of the table.

    The execution plans would tell you exactly what Oracle attempted to do and the costs it associated with them.

    But of course, adding this index is adding to the list of things that Oracle needs to maintain when it updates rows. The impact is often hard to predict but you can run your own benchmarks to make sure your critical statements complete in the  required times.

    User910243567
  • Nimish Garg
    Nimish Garg Member Posts: 3,185 Gold Trophy
    edited Jan 23, 2017 5:43AM Accepted Answer

    Depends on your data.

    Check count of both queries

    select count(*) from T1 where Col_4='N' and Col_3 IS NULL;

    select count(*) from T1 where Col_4='N';

    If there is a significant difference in count, and delete is executed frequently, it might be good to have.

    User910243567
  • Martin Preiss
    Martin Preiss Member Posts: 2,381 Gold Trophy
    edited Jan 23, 2017 6:16AM

    in addition to the explanations already given: it is alsolutely possible that the existing indexes are indeed well chosen - but when I see lots of single column indexes I begin to wonder if they are really useful. Oracle rarely combines multiple (B*Tree) indexes in its data access (though that's a usual approach with bitmap indexes). So maybe some of the existing indexes are not valuable (since they are not very selective) and could be removed. But I have to add that removing existing indexes is not a simple task - since it is hard to determine if they have a vaild reason to be.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,607 Gold Crown
    edited Jan 23, 2017 6:17AM

    As the other posters have pointed out, there's a cost/benefit calculation: "how much do you benefit how often vs what do you pay" and the basic question of the underlying statistics of the data.

    IF you have only a small number of rows at any time which match the condition

    where Col_4='N'

    and Col_3 IS NULL;

    AND you control the SQL used

    THEN it's a good idea to create the smallest cheapest possible index that matches the requirement. In your case:

    create index xxtext_fbi1 on t1(case when col_4='N' and col_3 is null then 0 end);

    and code:

    update t1 set

            col_3 = 123,

            col_6 = 456

    where

            case when col_4='N' and col_3 is null then 0 end = 0

    ;

    Regards

    Jonathan Lewis

    P.S.  For a slightly more advanced analysis of risk, here's a reference to a note I wrote a few years ago about this type of strategy - generally it's good, occasionally there's a side effect to worry about:  https://jonathanlewis.wordpress.com/2010/07/30/scalability-conflict/

    User910243567
  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Jan 23, 2017 7:55AM
    Jonathan Lewis wrote:As the other posters have pointed out, there's a cost/benefit calculation: "how much do you benefit how often vs what do you pay" and the basic question of the underlying statistics of the data.IF you have only a small number of rows at any time which match the conditionwhere Col_4='N' and Col_3 IS NULL;AND you control the SQL usedTHEN it's a good idea to create the smallest cheapest possible index that matches the requirement. In your case:create index xxtext_fbi1 on t1(case when col_4='N' and col_3 is null then 0 end);and code:update t1 set col_3 = 123, col_6 = 456where case when col_4='N' and col_3 is null then 0 end = 0;RegardsJonathan LewisP.S. For a slightly more advanced analysis of risk, here's a reference to a note I wrote a few years ago about this type of strategy - generally it's good, occasionally there's a side effect to worry about: https://jonathanlewis.wordpress.com/2010/07/30/scalability-conflict/ 

    Jonathan, that's an interesting strategy, I've always tended towards the opposite - if I am to make a new index I want it to be useful for more than just one query.

    I'd have gone for something like

    virtual column on nvl2(col_3,null,col_4) and name it something explanatory like ready_status (let's say that null col_3 means that the row is ready for something and col_4 is some other status).

    Index the virtual column (and gather stats)

    A review of the performance of queries that touch the table would show if the index has caused anything unwanted. I'd trust that as the statistics are there (and if they needed a histogram you would gather one), it's unlikely it could go too pear shaped.

    It'll be useful for this query as we can do where ready_status= 'N'. If we also wanted to know the rows that have col_4='Y' and col_3 is null, we could just use ready_status ='Y'.

    Obviously, the meaning behind the columns will give suggestions as to whether that would be useful to use.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,607 Gold Crown
    edited Jan 23, 2017 8:57AM
    Andrew Sayer wrote:Jonathan, that's an interesting strategy, I've always tended towards the opposite - if I am to make a new index I want it to be useful for more than just one query. I'd have gone for something like virtual column on nvl2(col_3,null,col_4) and name it something explanatory like ready_status (let's say that null col_3 means that the row is ready for something and col_4 is some other status).Index the virtual column (and gather stats)A review of the performance of queries that touch the table would show if the index has caused anything unwanted. I'd trust that as the statistics are there (and if they needed a histogram you would gather one), it's unlikely it could go too pear shaped.

    Andrew,

    There really isn't a significant difference between your suggestion and mine. The virtual column is a nicety, of course, especially since 12c would allow you to declare it as invisible, thus defending against code that failed to list columns explicitly, but the principle is pretty much the same.The key difference is in the potential for wasted resources.

    Like mine, your index will only be used for code that mimics the virtual column definition - so nothing other than the code the OP explicitly engineers will use the index anyway: but your index will be maintained for all rows where col_3 is (or changes through) null which (based on current requirements) is a waste of space and a waste of resources. It may also mean that the OP has to create a histogram on the virtual column and (until 12c) that could lead to instability in plan generation unless the histogram is coded rather than gathered.

    Of course the OP may say that actually they've also got 'A', 'X', and 'Z' for the col_4 values which they're interested in for the special case of col_3 is null - but 99.99% of the data has some other value(s). In that case I'd look at either 4 indexes (or virtual columns) one for each value or one index (or virtual column) defined as:

    create index t2_f2 on t2(

            case when col4 in ('A','N','X','Z') and col3 is null then 0 end

    )

    ;

    Regards

    Jonathan Lewis

    User910243567
This discussion has been closed.