# Cardinality calculation

**985446**Jan 17, 2013 4:39 AM

Hello everybody

I'm new on this forum

I have a test case to show you, I try to resolve it since a long time unsucessfully

The query is very simple

Here is the test case

create table seba (

id1 number,

id2 number

);

begin

for i in 1..300 loop

insert into seba values ( i, i );

end loop;

end;

/

begin

for i in 301..310 loop

for a in 1..50 loop

insert into seba values ( i, i );

end loop;

end loop;

end;

/

create table sebb as select * from seba;

execute dbms_stats.gather_table_stats('SYSTEM','SEBA', method_opt=>'for all columns size 254');

execute dbms_stats.gather_table_stats('SYSTEM','SEBB', method_opt=>'for all columns size 254');

At this step, we have 2 tables with histogramme on column id1 and id2

This is the query:

select *

from seba a, sebb b

where a.id1=b.id1;

---------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

---------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 27773 | 433K| 5 (20)| 00:00:01 |

|* 1 | HASH JOIN | | 27773 | 433K| 5 (20)| 00:00:01 |

| 2 | TABLE ACCESS FULL| SEBA | 800 | 6400 | 2 (0)| 00:00:01 |

| 3 | TABLE ACCESS FULL| SEBB | 800 | 6400 | 2 (0)| 00:00:01 |

---------------------------------------------------------------------------

What I don't know to do is how to calculate the part of cardinality of unpopular values in the selectivity of the join

I begin my calculation:

Statistics are identical for SEBA and SEBB

You can see:

Cardinality of SEBA = 800

ndv id1 310

Number of popular value from id1 = 10

Sum of selectivity of popular value from id1 167/254

This query give us selectivity of the join for popular value:

with ssq_a as (

select endpoint_number-nvl(lag(endpoint_number,1) over (partition by owner, table_name, column_name order by endpoint_number ),0) bucket_count,

endpoint_value

from dba_tab_histograms where table_name='SEBA' and column_name='ID1' order by endpoint_number asc ),

ssq_b as (

select endpoint_number-nvl(lag(endpoint_number,1) over (partition by owner, table_name, column_name order by endpoint_number ),0) bucket_count,

endpoint_value

from dba_tab_histograms where table_name='SEBB' and column_name='ID1' order by endpoint_number asc )

select endpoint_value, a.bucket_count bucket_count_a, b.bucket_count bucket_count_b, a.bucket_count/254*b.bucket_count/254 sel,

sum(a.bucket_count/254*b.bucket_count/254) over ( order by endpoint_value ) cumulative_sel

from ssq_a a inner join ssq_b b using(endpoint_value)

where a.bucket_count>1 and

b.bucket_count>1;

NB: There is no null value

On the last line: 0.043260586521173

At the end:

card = 800 x 800 x 0.043260586521173 = 27686

Optimier show: 27773

I Know, join selectivity of unpopular value is missing, this is the number I try to find !!!

By reverse calculation:

1/(27773/800/800-0.043260586521173) = 7422

It missing 1/7422 to add to 0.043260586521173, I'm going to spell it DELTA

Without histogramme:

card = card(a)xcard(b)xsel where sel=1/(max( ndv(seba,id1), ndv(sebb,id1) )

I I try to do the same thing:

Number of unpopular value 300

Sum of selectivity of unpopular value 87/254

Selectuvity of one unpopular value = 87/254/300 = 1/876

We can imagine to reduce the volume card(a)xcard(b) with only unpopular value:

card = 800 x 800 x ( 1 - 0.043260586521173 ) = 612313

With formula witout histogram:

card = 612313 x 1/300 = 2041

And sel(val impop) = 2041 / ( 800 x 800 ) = 0,0031890625 = 1/313

1/313 is not 1/7422

I have tried different scenarii unsucessfully

Is there somebody which can help me ?

Sory for my english, is there something not clear I can retry to explain it :)

Thanks

I'm new on this forum

I have a test case to show you, I try to resolve it since a long time unsucessfully

The query is very simple

Here is the test case

create table seba (

id1 number,

id2 number

);

begin

for i in 1..300 loop

insert into seba values ( i, i );

end loop;

end;

/

begin

for i in 301..310 loop

for a in 1..50 loop

insert into seba values ( i, i );

end loop;

end loop;

end;

/

create table sebb as select * from seba;

execute dbms_stats.gather_table_stats('SYSTEM','SEBA', method_opt=>'for all columns size 254');

execute dbms_stats.gather_table_stats('SYSTEM','SEBB', method_opt=>'for all columns size 254');

At this step, we have 2 tables with histogramme on column id1 and id2

This is the query:

select *

from seba a, sebb b

where a.id1=b.id1;

---------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

---------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 27773 | 433K| 5 (20)| 00:00:01 |

|* 1 | HASH JOIN | | 27773 | 433K| 5 (20)| 00:00:01 |

| 2 | TABLE ACCESS FULL| SEBA | 800 | 6400 | 2 (0)| 00:00:01 |

| 3 | TABLE ACCESS FULL| SEBB | 800 | 6400 | 2 (0)| 00:00:01 |

---------------------------------------------------------------------------

What I don't know to do is how to calculate the part of cardinality of unpopular values in the selectivity of the join

I begin my calculation:

Statistics are identical for SEBA and SEBB

You can see:

Cardinality of SEBA = 800

ndv id1 310

Number of popular value from id1 = 10

Sum of selectivity of popular value from id1 167/254

This query give us selectivity of the join for popular value:

with ssq_a as (

select endpoint_number-nvl(lag(endpoint_number,1) over (partition by owner, table_name, column_name order by endpoint_number ),0) bucket_count,

endpoint_value

from dba_tab_histograms where table_name='SEBA' and column_name='ID1' order by endpoint_number asc ),

ssq_b as (

select endpoint_number-nvl(lag(endpoint_number,1) over (partition by owner, table_name, column_name order by endpoint_number ),0) bucket_count,

endpoint_value

from dba_tab_histograms where table_name='SEBB' and column_name='ID1' order by endpoint_number asc )

select endpoint_value, a.bucket_count bucket_count_a, b.bucket_count bucket_count_b, a.bucket_count/254*b.bucket_count/254 sel,

sum(a.bucket_count/254*b.bucket_count/254) over ( order by endpoint_value ) cumulative_sel

from ssq_a a inner join ssq_b b using(endpoint_value)

where a.bucket_count>1 and

b.bucket_count>1;

NB: There is no null value

On the last line: 0.043260586521173

At the end:

card = 800 x 800 x 0.043260586521173 = 27686

Optimier show: 27773

I Know, join selectivity of unpopular value is missing, this is the number I try to find !!!

By reverse calculation:

1/(27773/800/800-0.043260586521173) = 7422

It missing 1/7422 to add to 0.043260586521173, I'm going to spell it DELTA

Without histogramme:

card = card(a)xcard(b)xsel where sel=1/(max( ndv(seba,id1), ndv(sebb,id1) )

I I try to do the same thing:

Number of unpopular value 300

Sum of selectivity of unpopular value 87/254

Selectuvity of one unpopular value = 87/254/300 = 1/876

We can imagine to reduce the volume card(a)xcard(b) with only unpopular value:

card = 800 x 800 x ( 1 - 0.043260586521173 ) = 612313

With formula witout histogram:

card = 612313 x 1/300 = 2041

And sel(val impop) = 2041 / ( 800 x 800 ) = 0,0031890625 = 1/313

1/313 is not 1/7422

I have tried different scenarii unsucessfully

Is there somebody which can help me ?

Sory for my english, is there something not clear I can retry to explain it :)

Thanks

- 30 Views
- Tags: none (add)