5 Replies Latest reply on Mar 8, 2018 9:35 AM by Sekar_BLUE4EVER

    Extended column statistics for Join order and Join type

    Sekar_BLUE4EVER

      I was reading up on extended statistics(although I have never actually used them) and I see that they should be helpful in estimating the cardinality when the query contains multiple columns in WHERE caluse. Is anyone using extended column statistics in their database? Now my main doubt is will extended column statistics help in deciding the join order and join type when there are multiple tables joined together and join type ?

       

        select a.*,b.*,c.* from a,b,c where a.id=b.id and a.id1=b.id1 and a.id=c.id and b.id2=c.id2 
        and a.id4=66 and b.id7=44 and c.id88=88
      

       

       

      Would it make sense to gather extended statistics on columns

       

        table A - id,id1

        table B - id,id1

        table C - id,id2

      Will this help in deciding the join order? Is there a way to gather statistics specifically for particular joins if we know a particular set columns will be joined frequently or should we go for a cluster in that case?

        • 1. Re: Extended column statistics for Join order and Join type
          AndrewSayer

          Sekar_BLUE4EVER wrote:

           

          I was reading up on extended statistics(although I have never actually used them) and I see that they should be helpful in estimating the cardinality when the query contains multiple columns in WHERE caluse. Is anyone using extended column statistics in their database? Now my main doubt is will extended column statistics help in deciding the join order and join type when there are multiple tables joined together and join type ?

           

          1. selecta.*,b.*,c.*froma,b,cwherea.id=b.idanda.id1=b.id1anda.id=c.idandb.id2=c.id2
          2. anda.id4=66andb.id7=44andc.id88=88

           

           

          Would it make sense to gather extended statistics on columns

           

          table A - id,id1

          table B - id,id1

          table C - id,id2

          Will this help in deciding the join order? Is there a way to gather statistics specifically for particular joins if we know a particular set columns will be joined frequently or should we go for a cluster in that case?

          Maybe.

           

          Extended statistics help you when columns are related to eachother, e.g. if you had 10 values of col_1 and 10 values of col_2, filtering on col_1 =:y and col_2 = :z could be a selectivity of 1/100 if they're unrelated or 1/10 if they're completely related.

          You get the same sort of statistic when you index two columns together, as you now have distinct keys.

           

          Try it and see?

          Do you actually have a problem with what the optimizer has come up with?

          • 2. Re: Extended column statistics for Join order and Join type
            SeánMacGC

            Yes, I have used them, and they can be beneficial: they'll help the optimizer to estimate the correct cardinality where multiple related columns are involved as WHERE clause predicates (as Andrew has explained) -- check out these URLs for an overview:

             

             

            https://blogs.oracle.com/optimizer/extended-statistics

             

            https://blogs.oracle.com/optimizer/how-do-i-know-what-extended-statistics-are-needed-for-a-given-workload

            • 3. Re: Extended column statistics for Join order and Join type
              JohnWatson2

              Extended statistics are often of critical importance to get correct join order. So much so that Uncle Oracle has provided a way to detect the need for them automatically, see the dbms_stats.seed_col usage / dbms_stats.report_col_usage /  dbms_stats.create_extended_stats routine.

               

              Anywhere you see filter or join predicates across correlated columns, you need them. Predicates on columns to do with dates or addesses are common culprits: this type of information is often denormalized. Such as

              ...where cust_city='Passau' and cust_state_province='Bayern'

              in the sh.customers demo table.

               

              An alternative that can get you out of trouble quickly is to set optimizer_dynamic_sampling=4

              • 4. Re: Extended column statistics for Join order and Join type
                Jonathan Lewis

                Others have already commented on the benefits of column groups for getting better cardinality estimates and plans (including join order) when multiple columns from the same table are used in the where clause  - there are a few articles on limitations, surprises and defects (and benefits) on my blog.

                 

                For the example you've given you may find some benefit if you create a column group on both a (id, id1) and b (id, id1) ... though if one table has an index on just those two columns you need only create a column group on the other table to get the benefit. I am fairly sure that you won't see any change for that query if you create a column group on c(id, id2) because you're not comparing them with two columns from the SAME table, you're comparing with one column from a and one from b and the "product of predicates" will probably mean that the a and b single selectivities will override the c column group selectivity.

                 

                On the other hand - if you change the predicate "a.id = c.id" to "b.id = c.id", or change the predicate "b.id1 = c.id1" to "a.id1 = c.id1" [Updated to remove error: the b/c predicate is on id2, not id1] then (before you've created any column groups) you may find that the cardinality figures change and the join order changes.  (Transitive closure that the human eye can see but the optimizer isn't coded to handle makes either modification legal.)  If you do make the code change then a column group on c (id, id2) plus a column group on b(id, id2) [Updated to get correct colukmn groups relating to id2] could then have an effect.

                 

                Regards

                Jonathan Lewis

                1 person found this helpful
                • 5. Re: Extended column statistics for Join order and Join type
                  Sekar_BLUE4EVER

                  Hi Jonathan,

                       Thanks for the reply. I went through the post in your blog .Posting it here for future readers

                  https://jonathanlewis.wordpress.com/2018/03/08/column-groups-4/

                   

                  I have another question on how the optimizer would decide the join type based on extended stats.

                   

                  Let us Assume that

                  join order is A ->B->C

                  a.id4=66 returns 9999 rows
                  b,id7=44 returns 9999 rows 
                  

                   

                  Total rows in A and B = 10000

                  After further filtering with join

                  a.id=b.id and a.id1=b.id1 satisfies  3800 rows 
                  

                   

                  We have extended statistics on

                  (a.id,a.id1) 
                  (b.id,b.id1)
                  

                   

                  Considering that  there is a histogram on these extended stats like

                   

                   

                  How do these stats help in deciding the join type? We do not have the filter condition included in the extended stats i.e a.id4 and b.id7

                  The data distribution is in such a way that a.id,a.id1 has 9999 distinct values with 1000  row for each  combination and similary for b.id,b.id1 .With this statistics optimizer may think that i have might have to return a large number of records but in fact if we include the condition a.id4=66 returns only 1 row from each distinct combination of id,id1 i.e in total  9999 rows and lets say the same case for table B. Wouldn't adding the condition in where clause to extended stats help ?