1 Reply Latest reply: Jan 17, 2013 1:51 PM by Nikolay Savvinov RSS

    Cardinality calculation

    985446
      Hello everybody

      I'm new on this forum

      I have a test case to show you, I try to resolve it since a long time unsucessfully

      The query is very simple

      Here is the test case

      create table seba (
      id1 number,
      id2 number
      );

      begin
      for i in 1..300 loop
      insert into seba values ( i, i );
      end loop;
      end;
      /

      begin
      for i in 301..310 loop
      for a in 1..50 loop
      insert into seba values ( i, i );
      end loop;
      end loop;
      end;
      /

      create table sebb as select * from seba;

      execute dbms_stats.gather_table_stats('SYSTEM','SEBA', method_opt=>'for all columns size 254');
      execute dbms_stats.gather_table_stats('SYSTEM','SEBB', method_opt=>'for all columns size 254');


      At this step, we have 2 tables with histogramme on column id1 and id2

      This is the query:

      select *
      from seba a, sebb b
      where a.id1=b.id1;

      ---------------------------------------------------------------------------
      | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
      ---------------------------------------------------------------------------
      | 0 | SELECT STATEMENT | | 27773 | 433K| 5 (20)| 00:00:01 |
      |* 1 | HASH JOIN | | 27773 | 433K| 5 (20)| 00:00:01 |
      | 2 | TABLE ACCESS FULL| SEBA | 800 | 6400 | 2 (0)| 00:00:01 |
      | 3 | TABLE ACCESS FULL| SEBB | 800 | 6400 | 2 (0)| 00:00:01 |
      ---------------------------------------------------------------------------

      What I don't know to do is how to calculate the part of cardinality of unpopular values in the selectivity of the join

      I begin my calculation:

      Statistics are identical for SEBA and SEBB

      You can see:

      Cardinality of SEBA = 800
      ndv id1 310
      Number of popular value from id1 = 10
      Sum of selectivity of popular value from id1 167/254

      This query give us selectivity of the join for popular value:

      with ssq_a as (
      select endpoint_number-nvl(lag(endpoint_number,1) over (partition by owner, table_name, column_name order by endpoint_number ),0) bucket_count,
      endpoint_value
      from dba_tab_histograms where table_name='SEBA' and column_name='ID1' order by endpoint_number asc ),
      ssq_b as (
      select endpoint_number-nvl(lag(endpoint_number,1) over (partition by owner, table_name, column_name order by endpoint_number ),0) bucket_count,
      endpoint_value
      from dba_tab_histograms where table_name='SEBB' and column_name='ID1' order by endpoint_number asc )
      select endpoint_value, a.bucket_count bucket_count_a, b.bucket_count bucket_count_b, a.bucket_count/254*b.bucket_count/254 sel,
      sum(a.bucket_count/254*b.bucket_count/254) over ( order by endpoint_value ) cumulative_sel
      from ssq_a a inner join ssq_b b using(endpoint_value)
      where a.bucket_count>1 and
      b.bucket_count>1;

      NB: There is no null value

      On the last line: 0.043260586521173

      At the end:

      card = 800 x 800 x 0.043260586521173 = 27686

      Optimier show: 27773


      I Know, join selectivity of unpopular value is missing, this is the number I try to find !!!

      By reverse calculation:

      1/(27773/800/800-0.043260586521173) = 7422

      It missing 1/7422 to add to 0.043260586521173, I'm going to spell it DELTA



      Without histogramme:

      card = card(a)xcard(b)xsel where sel=1/(max( ndv(seba,id1), ndv(sebb,id1) )

      I I try to do the same thing:
      Number of unpopular value 300
      Sum of selectivity of unpopular value 87/254
      Selectuvity of one unpopular value = 87/254/300 = 1/876

      We can imagine to reduce the volume card(a)xcard(b) with only unpopular value:

      card = 800 x 800 x ( 1 - 0.043260586521173 ) = 612313

      With formula witout histogram:

      card = 612313 x 1/300 = 2041

      And sel(val impop) = 2041 / ( 800 x 800 ) = 0,0031890625 = 1/313

      1/313 is not 1/7422

      I have tried different scenarii unsucessfully

      Is there somebody which can help me ?

      Sory for my english, is there something not clear I can retry to explain it :)

      Thanks
        • 1. Re: Cardinality calculation
          Nikolay Savvinov
          Hi,

          Welcome to he forum!

          The formula for selectivity becomes quite complicated in the presence of a histogram, and it's version-dependent. You can find some information about that in Jonathan Lewis' blog: http://jonathanlewis.wordpress.com/category/oracle/statistics/histograms/

          Best regards,
          Nikolay