This discussion is archived
7 Replies Latest reply: Dec 2, 2013 4:42 AM by NightWing RSS

Does clustering factor vary for each index?

NightWing Journeyer
Currently Being Moderated

Dear experts,

 

Today when I was reading an article about bitmap index and b-tree index I have learned a new thing called clustering factor. I read more about it and I am just wondering that does it change for each index?

 

For example lets say that I got table like following

 

create table emp(

emp_no number primary key,

name varchar2(50),

surname varchar2(50)

);

 

create index emp_name_indx on emp(name);

 

We got two indexed columns one is emp_no and the other one is name. My question is, suppose that we inserted 100 million rows with sequential order by emp_no. However, names can be different. for example;

 

1 michael jackson

2 toni blair

3 zack son

......

1000000 mike tyson

 

I mean, names are inserted scattered. And I think this is very normal because you can only insert one order. Any way so, the clustering factor for emp_no might be low. However, for name it might be so high.

 

If this is true, so this doesn't depend on us? we can't do anything for this factor, right? Because we can only insert rows according to one column's order, isn't it?

 

 

My last question is also very important, lets say that we got 20 tables and these tables are populated with sequential order with hundred millions of rows. The clustering factor for each indexes in these tables are also very very high. Because the tables are populated with sequential order and if the rows are added sequential in the same data blocks, that means tables become very scattered in datafiles. It is very bad for clustering factor?

 

Thanks in advance.

  • 1. Re: Does clustering factor vary for each index?
    Nikolay Savvinov Guru
    Currently Being Moderated

    Hi,

     

    yes your understanding is correct. You cannot have perfect clustering by all columns. in fact achieving high clustering by even one column is difficult - you can rebuild the table using CTAS ordered by, but after that clustering will start to degrade as data is modified in the table, so you'll have to do it periodically. If you really need clustering, consider making the table into an IOT.

     

    Best regards,

    Nikolay

  • 2. Re: Does clustering factor vary for each index?
    NightWing Journeyer
    Currently Being Moderated

    Thanks a lot NikolaySavvinov, Could you please mention about my last question below?

     

    My last question is also very important, lets say that we got 20 tables and these tables are populated with sequential order with hundred millions of rows. The clustering factor for each indexes in these tables are also very very high. Because the tables are populated with sequential order and if the rows are added sequential in the same data blocks, that means tables become very scattered in datafiles. It is very bad for clustering factor?


    Thanks in advance.

  • 3. Re: Does clustering factor vary for each index?
    Hemant K Chitale Oracle ACE
    Currently Being Moderated

    A bad clustering factor is an issue when you have an execution plan that attempts to use an index to retrieve (what it thinks is) very many rows.  The optimizer will estimate that the number of blocks to be read would be high because of the clustering factorr indicating that the rows are scattered across the table blocks.

     

    If you are using an index to retrieve (what the optimizer thinks) is very few rows, the clustering factor does not play a significant role.

     

    Hemant K Chitale

  • 4. Re: Does clustering factor vary for each index?
    Aman.... Oracle ACE
    Currently Being Moderated

    NightWing wrote:

     

    Thanks a lot NikolaySavvinov, Could you please mention about my last question below?

     

    My last question is also very important, lets say that we got 20 tables and these tables are populated with sequential order with hundred millions of rows. The clustering factor for each indexes in these tables are also very very high. Because the tables are populated with sequential order and if the rows are added sequential in the same data blocks, that means tables become very scattered in datafiles. It is very bad for clustering factor?


    Thanks in advance.

    May be it's early morning and I am not really able to see something obvious but how does the sequential inserts makes the data very scattered? If the data is truly scattered among the blocks, the CF would be nearly equal to the rows of the table and that would be an indication to the optimizer that using the index would be costly . But the thing is, this is going to be a recurring issue as the data in the table would keep on changing and so would be the distribution of it in the data blocks.

     

    Aman....

  • 5. Re: Does clustering factor vary for each index?
    Girish Sharma Guru
    Currently Being Moderated

    NightWing wrote:

    My last question is also very important, lets say that we got 20 tables and these tables are populated with sequential order with hundred millions of rows. The clustering factor for each indexes in these tables are also very very high. Because the tables are populated with sequential order and if the rows are added sequential in the same data blocks, that means tables become very scattered in datafiles. It is very bad for clustering factor?

    See below link :

    Ask Tom "Dba_indexes"

     

    i.e. if you wish to have CF of indexes near to number of blocks, you have to insert the data order by indexed column; i.e. before bulk insert you should tell order by that column which is most important to you/indexed column.

     

    In the above link Tom has given two examples:

    1.insert into .... order by id column, created indexes on id and name column, so CF of the index for id column is near to the number of blocks. Because order by column is the main factor for CF.

    2.insert into ... order by name column,  created indexes on id and name column, so CF of the index for name column is near to the number of blocks.  Because order by column is the main factor for CF.

     

    Regards

    Girish Sharma

  • 6. Re: Does clustering factor vary for each index?
    Nikolay Savvinov Guru
    Currently Being Moderated

    Hi,

     

    I'm not sure what you meant by "sequential inserts" but the bottom line is simple: regular tables in Oracle are also known as heap tables. As the name suggests, data is not expected to be ordered by any key, so it's normal for a clustering factor to be "bad" (high). If you want to store data ordered by some key, then heap tables is not the right solution for you: use IOTs or clusters.

     

    In a way, clustering is like caching in sense that it can have a very big effect on performance, but the developer has virtually no control over that effect. If you get lucky, your data will come from cache and this would make your query very fast. But if not, there's not much you can do. If your data happens to be clustered around a key you often use in queries, great. But if not, there's not much you can do about it (with heap tables anyway).

     

    So I would say that clustering factors are for understanding performance and not for improving it.

     

    Best regards,

    Nikolay

  • 7. Re: Does clustering factor vary for each index?
    NightWing Journeyer
    Currently Being Moderated

    Thank you everybody.

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points