What about your index, John Brady, that led you to 0.5% as a rule of thumb?
Using this test case, I get a switch from an index range scan to a full table scan at 14% for a perfectly clustered index and at 0.06% for an unclustered index:
I am not suggesting that this test is in any way scientific. The fact that it ran on my laptop with an SSD makes it irrelevant to the real world, never mind system stats and serial versus parallel, direct versus indirect reads. But it does show that making assumptions about a percentage crossover point is problematic.
My tests got a cutover point as low as 0.2% of the data in the table. All of the SQL is in the blog post I linked to in a previous post, including the execution plans that were used, so anyone can replicate the results or try variations. The index is a normal b-tree index on a numeric column with values evenly distributed across a limited range of possible values. I used 0.5% in my reply because I know that 0.2% is specific to my test case, which had relatively small rows, so in reality the actual cutover could be higher.
One of the big assumptions I made was that Clustering Factor would be close in value to the number of rows in the table. When this is not true, when you have highly clustered data and the Clustering Factor is much lower, then of course the "cutover point" will be much higher. You have shown how wide the range can be - 14% for a perfectly clustered index and 0.06% for an unclustered index. Jonathan achieved 24.5% for an index range scan, which is higher still.
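For anyone wanting to see where their own index sits on that spectrum, a dictionary query along these lines compares the clustering factor to the table's row and block counts (T1 and T1_IX are placeholder names, not from the test case above):

```sql
-- Compare an index's clustering factor to the table's row and block counts.
-- CF near BLOCKS   => well clustered, crossover point will be high
-- CF near NUM_ROWS => unclustered, crossover point can be tiny
SELECT i.index_name,
       i.clustering_factor,
       t.num_rows,
       t.blocks,
       ROUND(i.clustering_factor / t.num_rows, 3) AS cf_per_row
FROM   user_indexes i
       JOIN user_tables t ON t.table_name = i.table_name
WHERE  i.index_name = 'T1_IX';
```

Run after a fresh stats gather, of course, or the clustering factor may not reflect the current data layout.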
As a result of this feedback I'm not going to use the "0.5% cutover point" phrase any more, as it is not always true, and has too many assumptions built into it. It might be true in many cases, but it is not true in all cases.
Hi Subhajit,
You may have generated the plans from Prod (the 1st) and Dev (the 2nd one). Have you verified that the tables and indexes have up-to-date stats? Please check that first.
Shivendra Narain Nirala
I completely agree with Jonathan Lewis about these percentages. It is far more complex than just a percentage of rows. It is more complex than including the pre-12c clustering factor. It is more complex than including the 12c clustering factor. It is more complex than including optimizer parameters.
Yes, we could probably use all the statistics and parameters to work out what this cutoff would be, for the optimizer, for a given index and column on a table.
But does that make it the actual cutoff point where it would be faster? Are there things that not even the optimizer can figure out (yet)?
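A back-of-envelope version of that calculation can be sketched with the well-known simplified I/O cost model (index cost ≈ blevel + selectivity × leaf blocks + selectivity × clustering factor, versus full scan cost ≈ blocks ÷ multiblock read count). All the figures below are invented for illustration, and the real optimizer includes many more factors:

```sql
-- Simplified I/O cost comparison for one candidate selectivity.
-- Every number here is illustrative, not from any real system.
SELECT blevel
         + CEIL(sel * leaf_blocks)
         + CEIL(sel * clustering_factor) AS index_range_scan_cost,
       CEIL(blocks / mbrc)               AS full_table_scan_cost
FROM  (SELECT 2       AS blevel,
              2000    AS leaf_blocks,
              1000000 AS clustering_factor,  -- unclustered: CF ~ num_rows
              0.005   AS sel,                -- 0.5% selectivity
              10000   AS blocks,
              8       AS mbrc
       FROM dual);
```

Plugging in a clustering factor close to the block count instead of the row count shows immediately why the crossover selectivity can swing by two orders of magnitude, which is the point being made above.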
The general sentiment that full scans aren't bad is a good one. I still would not expect to see them in a normal OLTP process; one would suggest I'm probably missing filters or an actually useful index.
Looking at the original post, I'd say this argument is almost irrelevant. The OP merely stated that there were different numbers of rows in the two versions of the table; I do not see it mentioned whether the filters being applied reduce the result set more or less (as a percentage of the table) on the differing servers.
As mentioned by others, the predicates probably don't help the optimizer, and it's likely that it is not getting correct cardinalities from them. I suspect that the column-level stats on the tables in the two versions are different, so the optimizer is guessing at different selectivities. It's also possible that the available indexes have different clustering factors between the two servers due to the way the data was populated. Also, the skip scan in the second plan is usually an indicator that there are very few distinct values for the leading columns of that index, something highly dependent on statistics.
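One way to compare the two servers on these points would be to run a dictionary query like this on each and diff the output (T1 is a placeholder table name):

```sql
-- Column-level stats that drive the optimizer's selectivity guesses.
-- Differences in NUM_DISTINCT, DENSITY, or HISTOGRAM between the two
-- servers would explain different plans for the same query.
SELECT column_name,
       num_distinct,
       density,
       num_nulls,
       histogram
FROM   user_tab_col_statistics
WHERE  table_name = 'T1'
ORDER  BY column_name;
```

The same comparison against `user_indexes` (clustering_factor, leaf_blocks) would cover the clustering-factor and skip-scan points.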
Yes, I have updated the stats on both tables and re-run the explain plan.
I have checked the indexes; they are all the same.
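For completeness, a sketch of the kind of check meant here: confirming when stats were last gathered and re-gathering them with index stats included (T1 is a placeholder table name):

```sql
-- When were the table's stats last gathered?
SELECT table_name, num_rows, last_analyzed
FROM   user_tables
WHERE  table_name = 'T1';

-- Re-gather table stats, cascading to the table's indexes.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'T1',
    cascade => TRUE);
END;
/
```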