Forum Stats

  • 3,824,977 Users
  • 2,260,447 Discussions
  • 7,896,368 Comments

Discussions

Reg: Clustering Factor and index usage.

ranit B
ranit B Member Posts: 3,268 Silver Trophy
edited Jun 6, 2016 2:35PM in SQL & PL/SQL

Hi Experts,

I have a table "X" and below are its index details. Just want to know why is the clustering factor so high for indexes - IDX1 and IDX2 ?

Is there any way CF can be brought down by say ordering data in the indexes ?

Table has 188K records.

INDEX

NAME

DISTINCT

KEYS

LEAF

BLOCKS

CLUSTERING

FACTOR

NUM

ROWS

INDEX COLUMNS
T_PK1881113573887188111ORD_ID
T_IDX2188341699187887188341EXT_ID, ORD_ID
T_IDX1187504762187661188111EXT_ID

Possibly, because of this one of my query is not opting for a index scan, rather FTS.

MERGE INTO gtt_ord t1

    USING X t2 ON (t1.global_ext_id = t2.ext_id)

    WHEN MATCHED THEN

    UPDATE SET t1.ord_id = t2.ord_id;

-------------------------------------------------------------------------------------------

| Id  | Operation            | Name               | Rows  | Bytes | Cost (%CPU)| Time     |

-------------------------------------------------------------------------------------------

|   0 | MERGE STATEMENT      |                    |       |       |   832 (100)|          |

|   1 |  MERGE               | GTT_ORD            |       |       |            |          |

|   2 |   VIEW               |                    |       |       |            |          |

|*  3 |    HASH JOIN         |                    |  1156 |   706K|   832   (2)| 00:00:01 |

|   4 |     TABLE ACCESS FULL| GTT_ORD            |  1152 |   589K|    36   (0)| 00:00:01 |

|   5 |     TABLE ACCESS FULL| X                  |   188K|    18M|   794   (2)| 00:00:01 |

-------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):

---------------------------------------------------

   3 - access("t1"."GLOBAL_EXT_ID"="t2"."EXT_ID")

Thanks and Regards,

-Ranit

( on Oracle 11.2.0.4.0 )

gdanbyKarthick2003
«1

Answers

  • gdanby
    gdanby Member Posts: 221 Bronze Badge
    edited Jun 2, 2016 6:15AM

    Your clustering factor is low for T_PK probably because the data was inserted into the table in ORD_ID order. Most of the index lookups from each leaf block on T_PK will go to the same table data block.

    The opposite is probably true for the other indexes, EXT_ID is likely distributed across the table blocks, so all the index lookups from a single leaf block will go to many table data blocks.

    The data in your indexes will always be "ordered" as you've said. You can rebuild the table using CTAS , inserting in EXT_ID order to improve the clustering factor of T_IDX1/2.

    I don't think you can have it both ways, your clustering factor can be always be low for either T_PK, or for T_IDX1/2, but not both.

  • ranit B
    ranit B Member Posts: 3,268 Silver Trophy
    edited Jun 2, 2016 6:14AM

    I just created another index -

    create index T_IDX3 on X (ORD_ID, EXT_ID);

    Column order just the reverse of T_IDX2, and below are the index details -

    PATIENT_IDX3  188341  699  3924  188341
  • gdanby
    gdanby Member Posts: 221 Bronze Badge
    edited Jun 2, 2016 6:19AM

    That's how I would expect it, the index is still primarily ordered by ORD_ID, so the CF is low.

  • Tom321
    Tom321 Member Posts: 342
    edited Jun 2, 2016 6:28AM

    Hi Ranit,

    if you want to improve the clustering factor of your 2 indexes,  you must reorder the data in the table. Data in the indexes can't be reordered!

    They are already ordered and the CF depends on the data distribution in the table.

    You can do this by creating a new table with:

    create table x2 as select * from X order by ext_id,ord_id;

    + creation of your indexes, etc

    + rename of the tables to make it "visible" to your application

    You can do this either in a maintenance window or during workhours using online redefinition.

    Doing this will however ruin the CF on your PK Index.

    Which one is more important depends on your access pattern to that table. Which index is frequently used in range scans? If your PK is only used for unique scans and not for range scans, then it's CF won't matter.

    But keep in mind, that since your data seems to be inserted in PK order, you will see the CF getting worse over time.

    You will therefore have to repeat this process from time to time.

    Regards

    Thomas

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,974 Blue Diamond
    edited Jun 2, 2016 7:52AM

    The ordering of the ext_id as you walk through the table is at least a little random, and by default the clustering_factor is likely to misinterpret a small amount of randomness as a very large degree of randomness.  If your ext_id values are actually fairly well clustered then you may be able to improve the situation by setting the "table_cached_blocks" preference for the table table and re-gathering stats on the indexes.

    See Richard Foote's blog on the topic - it's a new feature in 12c, backported to 11.2.0.4: https://richardfoote.wordpress.com/category/table_cached_blocks/

    Regards

    Jonathan Lewis

    gdanbyKarthick2003
  • Unknown
    edited Jun 2, 2016 5:02PM
     I have a table "X" and below are its index details. Just want to know why is the clustering factor so high for indexes - IDX1 and IDX2 ?
    

    Well you've posted enough to know that you need to SHOW US all of the info needed to help you.

    I don't see ANY table or index DDL yet the answers to your questions are entirely dependent on just how the tables and indexes were created.

    So post the DDL.

    Also I'm skeptical that the index info you posted is really accurate and even for the same table.

    You show an index name of T_PK which suggests it the 'PK' means primary key. Is that index a primary key?

    If so then since primary keys don't allow null values why do the other two indexes show MORE KEYS than the primary key shows?

    If a table has 100 rows the primary key will show 100 keys. Since Oracle only creates ONE index entry per row other indexes (e.g. non-unique indexes) won't show more keys than the primary key - they will show the same number of fewer..

    Please explain the discrepancy.

    And for other responders - please don't provide an explanation but let the OP provide it.

  • John Thorton
    John Thorton Member Posts: 14,493 Silver Crown
    edited Jun 2, 2016 5:13PM
    ranit B wrote:
    
    Hi Experts,
    
    I have a table "X" and below are its index details. Just want to know why is the clustering factor so high for indexes - IDX1 and IDX2 ?
    Is there any way CF can be brought down by say ordering data in the indexes ?
    
    
    

    why does CF value matter?

    What problem are you trying to solve?

    At what value does CF go from being OK to being high?

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,974 Blue Diamond
    edited Jun 3, 2016 2:08AM
    John Thorton wrote:
    
    why does CF value matter?
    What problem are you trying to solve?
    At what value does CF go from being OK to being high?
    

    Why did you bother with creating a new account with a pseudonym if all you're going to do is behave in the same churlish way?

    No Regards

    Jonathan Lewis

    gdanby
  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,974 Blue Diamond
    edited Jun 3, 2016 2:14AM

    I know that Ranit B has a history of asking irritating questions and not supplying relevant information, but this question is far better directed with better background information than many we see on OTN, so I find your response as unnecessarily harsh.

    It highly likely that the names of the table and its indexes have been camouflaged for reasons of security, but you have no good basis for thinking that the indexes don't belong to the table - there's an obvious answer to your question about the num_rows, and it's even indicated by the names of the indexes (making a fairly modest assumption that it's only a table prefex that has changed in the name).

    If you want to complain about some surprising behaviour you might make some comment about why the plan is a tablescan and not an index fast full scan - with the implication that the SET command has probably been simplified from the original.

    Regards

    Jonathan Lewis

  • Unknown
    edited Jun 3, 2016 4:13PM
    I know that Ranit B has a history of asking irritating questions and not supplying relevant information
    

    I've never considered ANY of the questions posted in the forums 'irritating'. People are free to ask whatever questions, and in whatever manner, they choose.

    But if they really want help they need to be 'supplying relevant information' and this OP has been reminded of that repeatedly.

    While I applaud the attempt to provide that info the info he posted is at best misleading and at worst just plain wrong.

     It highly likely that the names of the table and its indexes have been camouflaged for reasons of security, 
    

    The actual names are irrelevant as long as the names used are consistent with the rest of the info posted.

    but you have no good basis for thinking that the indexes don't belong to the table
    

    I respectfully disagree. Consider both T_PK and T_IDX2. They show a different value for NUM_ROWS. But each of them shows their same respective value for DISTINCT_KEYS.

    Per the Oracle doc

    https://docs.oracle.com/cd/B28359_01/server.111/b28320/statviews_1096.htm

    DISTINCT_KEYS*NUMBER Number of distinct indexed values. For indexes that enforce UNIQUE and PRIMARY KEY constraints, this value is the same as the number of rows in the table (*_TABLES.NUM_ROWS)

    So the dictionary info posted suggests that each set of values is 'For indexes that enforce UNIQUE and PRIMARY KEY constraints' since the value for distinct_keys is the same as num_rows.

    But since num_rows is different for each index is suggests that the indexes are for different tables since a table can NOT have but ONE number of rows at any given moment.

    I consider the above to be a 'good basis for thinking that the indexes don't belong to the table' and that is why I said this:

    Also I'm skeptical that the index info you posted is really accurate and even for the same table.
    

    The info posted, IF ACCURATE, isn't consistent with only one table.

    there's an obvious answer to your question about the num_rows, and it's even indicated by the names of the indexes (making a fairly modest assumption that it's only a table prefex that has changed in the name). 
    

    Yes - there is (although I don't see any connection to the index names). Which is why I ask OP to explain the discrepancy and ask others NOT to until OP has had a chance to do so. I appreciate you respecting that.

    But since OP hasn't responded you and others should feel free to comment.

    So yes, OP made an effort this time to provide info. Unfortunately the info provided is NOT consistent.

    The issue is that OPs question is specifically about the values (clustering factor) in that inconsistent info they posted.

    It seems rather pointless to me to try to conjecture about why two values are so different when it appears they don't relate to the same object at all.

    In my experience it is a waste of time to try to work with missing or poor data especially when obtaining the correct data is so easy. 

This discussion has been closed.