Skip to Main Content

Oracle Database Discussions

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Interested in getting your voice heard by members of the Developer Marketing team at Oracle? Check out this post for AppDev or this post for AI focus group information.

Extended column statistics for Join order and Join type

Sekar_BLUE4EVERMar 6 2018 — edited Mar 8 2018

I was reading up on extended statistics(although I have never actually used them) and I see that they should be helpful in estimating the cardinality when the query contains multiple columns in WHERE caluse. Is anyone using extended column statistics in their database? Now my main doubt is will extended column statistics help in deciding the join order and join type when there are multiple tables joined together and join type ?

  select a.*,b.*,c.* from a,b,c where a.id=b.id and a.id1=b.id1 and a.id=c.id and b.id2=c.id2

  and a.id4=66 and b.id7=44 and c.id88=88

Would it make sense to gather extended statistics on columns

  table A - id,id1

  table B - id,id1

  table C - id,id2

Will this help in deciding the join order? Is there a way to gather statistics specifically for particular joins if we know a particular set columns will be joined frequently or should we go for a cluster in that case?

Comments

AndrewSayer

Sekar_BLUE4EVER wrote:

I was reading up on extended statistics(although I have never actually used them) and I see that they should be helpful in estimating the cardinality when the query contains multiple columns in WHERE caluse. Is anyone using extended column statistics in their database? Now my main doubt is will extended column statistics help in deciding the join order and join type when there are multiple tables joined together and join type ?

  1. selecta.*,b.*,c.*froma,b,cwherea.id=b.idanda.id1=b.id1anda.id=c.idandb.id2=c.id2
  2. anda.id4=66andb.id7=44andc.id88=88

Would it make sense to gather extended statistics on columns

table A - id,id1

table B - id,id1

table C - id,id2

Will this help in deciding the join order? Is there a way to gather statistics specifically for particular joins if we know a particular set columns will be joined frequently or should we go for a cluster in that case?

Maybe.

Extended statistics help you when columns are related to eachother, e.g. if you had 10 values of col_1 and 10 values of col_2, filtering on col_1 =:y and col_2 = :z could be a selectivity of 1/100 if they're unrelated or 1/10 if they're completely related.

You get the same sort of statistic when you index two columns together, as you now have distinct keys.

Try it and see?

Do you actually have a problem with what the optimizer has come up with?

SeánMacGC

Yes, I have used them, and they can be beneficial: they'll help the optimizer to estimate the correct cardinality where multiple related columns are involved as WHERE clause predicates (as Andrew has explained) -- check out these URLs for an overview:

https://blogs.oracle.com/optimizer/extended-statistics

https://blogs.oracle.com/optimizer/how-do-i-know-what-extended-statistics-are-needed-for-a-given-workload

JohnWatson2

Extended statistics are often of critical importance to get correct join order. So much so that Uncle Oracle has provided a way to detect the need for them automatically, see the dbms_stats.seed_col usage / dbms_stats.report_col_usage /  dbms_stats.create_extended_stats routine.

Anywhere you see filter or join predicates across correlated columns, you need them. Predicates on columns to do with dates or addesses are common culprits: this type of information is often denormalized. Such as

...where cust_city='Passau' and cust_state_province='Bayern'

in the sh.customers demo table.

An alternative that can get you out of trouble quickly is to set optimizer_dynamic_sampling=4

Jonathan Lewis

Others have already commented on the benefits of column groups for getting better cardinality estimates and plans (including join order) when multiple columns from the same table are used in the where clause  - there are a few articles on limitations, surprises and defects (and benefits) on my blog.

For the example you've given you may find some benefit if you create a column group on both a (id, id1) and b (id, id1) ... though if one table has an index on just those two columns you need only create a column group on the other table to get the benefit. I am fairly sure that you won't see any change for that query if you create a column group on c(id, id2) because you're not comparing them with two columns from the SAME table, you're comparing with one column from a and one from b and the "product of predicates" will probably mean that the a and b single selectivities will override the c column group selectivity.

On the other hand - if you change the predicate "a.id = c.id" to "b.id = c.id", or change the predicate "b.id1 = c.id1" to "a.id1 = c.id1" [Updated to remove error: the b/c predicate is on id2, not id1] then (before you've created any column groups) you may find that the cardinality figures change and the join order changes.  (Transitive closure that the human eye can see but the optimizer isn't coded to handle makes either modification legal.)  If you do make the code change then a column group on c (id, id2) plus a column group on b(id, id2) [Updated to get correct colukmn groups relating to id2] could then have an effect.

Regards

Jonathan Lewis

Sekar_BLUE4EVER

Hi Jonathan,

     Thanks for the reply. I went through the post in your blog .Posting it here for future readers

https://jonathanlewis.wordpress.com/2018/03/08/column-groups-4/

I have another question on how the optimizer would decide the join type based on extended stats.

Let us Assume that

join order is A ->B->C

a.id4=66 returns 9999 rows

b,id7=44 returns 9999 rows

Total rows in A and B = 10000

After further filtering with join

a.id=b.id and a.id1=b.id1 satisfies  3800 rows

We have extended statistics on

(a.id,a.id1)

(b.id,b.id1)

Considering that  there is a histogram on these extended stats like

How do these stats help in deciding the join type? We do not have the filter condition included in the extended stats i.e a.id4 and b.id7

The data distribution is in such a way that a.id,a.id1 has 9999 distinct values with 1000  row for each  combination and similary for b.id,b.id1 .With this statistics optimizer may think that i have might have to return a large number of records but in fact if we include the condition a.id4=66 returns only 1 row from each distinct combination of id,id1 i.e in total  9999 rows and lets say the same case for table B. Wouldn't adding the condition in where clause to extended stats help ?

1 - 5
Locked Post
New comments cannot be posted to this locked post.

Post Details

Locked on Apr 5 2018
Added on Mar 6 2018
5 comments
1,550 views