This content has been marked as final. Show 3 replies
gogol wrote:Congratulations for reading something and taking the time to come up with a thought experiment to question what you've read. In fact your scenario is quite common and not a silly question at all.
Now a situation came in my mind (may be its a very silly question but I am not getting it, please help me)...
Suppose in a table of eighty thousand rows there are several columns and one column has total five distinct values (which is very less and we should not use index). But on the other hand, among those five distinct values, three of them appeared for very less number of times (in 80000 rows, they appear only for 10 to 20 times) and rest two distinct values appeared a lot of times.80,000 isn't a particularly large table - the last time I saw this the table held several million rows with 2 common values and 4 or 5 values which tended to hold 10 to 100 rows each, and only temporarily. (See http://jonathanlewis.wordpress.com/2009/04/23/histogram-change/ )
Now in this case, an index on that column will be beneficial if we access those three values which appeared less number of times and the same index will be disastrous for the other two values.
The histogram with literals in the SQL is one (perfectly valid) approach - but if you do that you have to be very careful with the histogram in case Oracle manages to sample a histogram that only sees the two popular values. The solution to that would be to ensure that you could create a representative histogram using some pl/sql ( See http://jonathanlewis.wordpress.com/2009/05/28/frequency-histograms/ )
(Updated - I hit "post" when I meant to hit "Preview")
An alternative that is much nicer if you can modify the application code is to create a function-based index (or a virtual column with an index in 11g) to hide the popular values, e.g.:
(I may have got the syntax a bit wrong - I'm not near a database at present.)
create index t1_f1 on t1( (case flag_col when 'X' then null when 'Y' then null else flag_col end ) )
This index will be tiny as it holds only the rare values that you'd typically be interested in - which also means it reduces the index maintenance costs, of course.
The side effect is that your client code has to know that queries for X or Y have to use predicates of the form flag_col = 'Y', while queries for all other values should use "case ... end = value".