19 Replies Latest reply on Dec 16, 2013 2:54 PM by NightWing

# Join cardinality without statistics interesting result

Hello Experts,

I have just read an article about Oracle join cardinality, Table Functions And Join Cardinality Estimates. Without gathered statistics (i.e. with dynamic sampling) the optimizer calculates the join cardinality incorrectly. Why? I am trying to understand join cardinality: does it depend on statistics? Although I did not gather statistics, as you can see below the optimizer estimates the table cardinalities correctly, but the join cardinality is completely wrong. What do you think about this behaviour? Does dynamic sampling mislead the optimizer?

drop table t1 purge;

drop table t2 purge;

create table t1

as

select rownum id, mod(rownum, 10)+1 as fk, rpad('X',10) filter from dual connect by level <= 1000;

create table t2 as

select rownum + 20 id, rpad('X', 10) filter from dual connect by level <= 10;

explain plan for

select * from t1 join t2 on t1.fk = t2.id;

select * from table (dbms_xplan.display);

Plan hash value: 2959412835

---------------------------------------------------------------------------

| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |      |  1000 | 53000 |     8  (13)| 00:00:01 |

|*  1 |  HASH JOIN         |      |  1000 | 53000 |     8  (13)| 00:00:01 |

|   2 |   TABLE ACCESS FULL| T2   |    10 |   200 |     3   (0)| 00:00:01 |

|   3 |   TABLE ACCESS FULL| T1   |  1000 | 33000 |     4   (0)| 00:00:01 |

---------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - access("T1"."FK"="T2"."ID")

Note

-----

- dynamic sampling used for this statement (level=2)

exec dbms_stats.gather_table_stats(user, 'T1');

exec dbms_stats.gather_table_stats(user, 'T2');

explain plan for

select * from t1 join t2 on t1.fk = t2.id;

select * from table (dbms_xplan.display);

Plan hash value: 2959412835

---------------------------------------------------------------------------

| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |      |     1 |    32 |     8  (13)| 00:00:01 |

|*  1 |  HASH JOIN         |      |     1 |    32 |     8  (13)| 00:00:01 |

|   2 |   TABLE ACCESS FULL| T2   |    10 |   140 |     3   (0)| 00:00:01 |

|   3 |   TABLE ACCESS FULL| T1   |  1000 | 18000 |     4   (0)| 00:00:01 |

---------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - access("T1"."FK"="T2"."ID")

###### 1. Re: Join cardinality without statistics interesting result

taking a look at a CBO trace (event 10053) for your example I see the different calculations for the join cardinality:

-- without statistics

Join Card:  1000.000000 = outer (10.000000) * inner (1000.000000) * sel (0.100000)

-- with statistics

Join Card:  0.000000 = outer (10.000000) * inner (1000.000000) * sel (0.000000)

The sel (0.100000) for the execution with dynamic sampling corresponds to the standard formula: Join Selectivity = 1 / greater(num_distinct(t1.fk), num_distinct(t2.id)). In the CBO trace for this execution I also see some information concerning the sampling:

SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0), COUNT(DISTINCT C3), NVL(SUM(CASE WHEN C3 IS NULL THEN 1 ELSE 0 END),0) FROM (SELECT /*+ NO_PARALLEL("T1") FULL("T1") NO_PARALLEL_INDEX("T1") */ 1 AS C1, 1 AS C2, "T1"."FK" AS C3 FROM "T1" "T1") SAMPLESUB

*** 2013-12-13 14:14:19.584

** Executed dynamic sampling query:

level : 2

sample pct. : 100.000000

actual sample size : 1000

filtered sample card. : 1000

orig. card. : 572

block cnt. table stat. : 7

block cnt. for sampling: 7

max. sample block cnt. : 64

sample block cnt. : 7

ndv C3 : 10

scaled : 10.00

nulls C4 : 0

scaled : 0.00

min. sel. est. : -1.00000000

** Dynamic sampling col. stats.:

Column (#2): FK(  Part#: 0

AvgLen: 22 NDV: 10 Nulls: 0 Density: 0.100000

** Using dynamic sampling NULLs estimates.

** Using dynamic sampling NDV estimates.

Scaled NDVs using cardinality = 1000.

** Using dynamic sampling card. : 1000

** Dynamic sampling updated table card.

Table: T1  Alias: T1

Card: Original: 1000.000000  Rounded: 1000  Computed: 1000.00  Non Adjusted: 1000.00

Access Path: TableScan

Cost:  2.00  Resp: 2.00  Degree: 0

Cost_io: 2.00  Cost_cpu: 239850

Resp_io: 2.00  Resp_cpu: 239850

Best:: AccessPath: TableScan

Cost: 2.00  Degree: 1  Resp: 2.00  Card: 1000.00  Bytes: 0

The sampling query does not contain any information about the value ranges of the join columns - so the optimizer has no option other than to use the standard formula for the join selectivity.

After gathering statistics, the CBO knows the HIGH_VALUE and the LOW_VALUE for the join columns and can deduce that the ranges do not intersect - and so the selectivity is set to 0 (resulting in a cardinality of 1).
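The two estimates can be sketched in Python (hypothetical helper names, purely an illustration of the formulas above, not the CBO's actual code):

```python
def join_card_standard(outer_rows, inner_rows, ndv_outer, ndv_inner):
    # Standard formula: Join Selectivity = 1 / greater(num_distinct(outer), num_distinct(inner))
    sel = 1 / max(ndv_outer, ndv_inner)
    return outer_rows * inner_rows * sel

def join_card_with_ranges(outer_rows, inner_rows, ndv_outer, ndv_inner,
                          outer_lo, outer_hi, inner_lo, inner_hi):
    # With LOW_VALUE/HIGH_VALUE known, disjoint join-column ranges imply
    # selectivity 0; the optimizer rounds the resulting cardinality up to 1.
    if outer_hi < inner_lo or inner_hi < outer_lo:
        return 1
    return round(join_card_standard(outer_rows, inner_rows, ndv_outer, ndv_inner))

# Without statistics: 10 * 1000 * (1/10) = 1000, as in the first plan.
print(join_card_standard(10, 1000, 10, 10))                    # 1000.0

# With statistics: t2.id in [21,30], t1.fk in [1,10] -> no overlap -> 1
print(join_card_with_ranges(10, 1000, 10, 10, 21, 30, 1, 10))  # 1
```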

###### 2. Re: Join cardinality without statistics interesting result

Nice explanation

Regards

Jonathan Lewis

@jloracle

###### 3. Re: Join cardinality without statistics interesting result

@Jonathan Lewis: thank you.

For the sake of completeness: I created the trace files on 11.1.0.7 but 12.1.0.1 shows the same behaviour - though in this case the test has to be rearranged because the CTAS operation creates statistics automatically, but with dynamic sampling the value ranges are still unknown.

###### 4. Re: Join cardinality without statistics interesting result

Martin,

Interesting difference appears if you set optimizer_dynamic_sampling to 11 on 12c.

Also when you then vary the offset introduced by the +20 in t2, to give a very small overlap instead of a complete break.

Regards

Jonathan Lewis

###### 5. Re: Join cardinality without statistics interesting result

Jonathan,

strange indeed:

-- using the initial setup in 12.1.0.1

exec dbms_stats.delete_table_stats(user, 'T1');

exec dbms_stats.delete_table_stats(user, 'T2');

alter session set optimizer_dynamic_sampling = 11;

explain plan for

select * from t1 join t2 on t1.fk = t2.id;

---------------------------------------------------------------------------

| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |

---------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |      |     1 |    53 |     7   (0)| 00:00:01 |

|*  1 |  HASH JOIN         |      |     1 |    53 |     7   (0)| 00:00:01 |

|   2 |   TABLE ACCESS FULL| T2   |   327 |  6540 |     3   (0)| 00:00:01 |

|   3 |   TABLE ACCESS FULL| T1   |   572 | 18876 |     4   (0)| 00:00:01 |

---------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - access("T1"."FK"="T2"."ID")

Note

-----

- dynamic statistics used: dynamic sampling (level=AUTO)

And the CBO trace tells me:

***************************************

BASE STATISTICAL INFORMATION

***********************

Table Stats::

Table: T2  Alias: T2  (NOT ANALYZED)

#Rows: 327  #Blks:  4  AvgRowLen:  100.00  ChainCnt:  0.00

Column (#1): ID(NUMBER)  NO STATISTICS (using defaults)

AvgLen: 13 NDV: 10 Nulls: 0 Density: 0.000000

***********************

Table Stats::

Table: T1  Alias: T1  (NOT ANALYZED)

#Rows: 572  #Blks:  7  AvgRowLen:  100.00  ChainCnt:  0.00

Column (#2): FK(NUMBER)  NO STATISTICS (using defaults)

AvgLen: 13 NDV: 18 Nulls: 0 Density: 0.000000

Maria Colgan wrote in her whitepaper http://www.oracle.com/technetwork/database/bi-datawarehousing/twp-statistics-concepts-12c-1963871.pdf :

When set to 11 the Optimizer will automatically decide if dynamic statistics will be useful, and what dynamic sampling level will be used for SQL statements. The optimizer bases its decision, to use dynamic statistics, on the complexity of the predicates used, the existing base statistics, and the total execution time expected for the SQL statement.

So I guess the optimizer decides it would be a waste of time to gather statistics for these tiny tables and the simple join. If my memory serves me well, the sizes are determined by taking a look at the segment header (where I find Highwater:: ... blk#: 7 for T1). And obviously the calculation says that there are ~ 327/4 = 81.75 rows per block. I have no good explanation for the AvgLen: 13, while the NDV could be something like round(Blks * 2.5), but that's just guessing.

On the positive side: the 1 for the hash join is spot on:

Join Card:  572.000000 = outer (327.000000) * inner (572.000000) * sel (0.003058)

>> Join Card adjusted from 572.000000  to: 1.000000, prelen=2

But I don't know where this fitting adjustment comes from...

Regards

Martin Preiss

###### 6. Re: Join cardinality without statistics interesting result

Great stuff (as always), Martin and Jonathan!

###### 7. Re: Join cardinality without statistics interesting result

MartinPreiss wrote:

The sampling query does not contain any information about the value ranges of the join columns - so the optimizer has no option other than to use the standard formula for the join selectivity.

After the statistics gathering the CBO knows the HIGH_VALUE and the LOW_VALUE for the join columns and can guess that there are no intersections - and so the selectivity is set to 0 (resulting in a cardinality of 1).

alter session set optimizer_dynamic_sampling = 11;

So, when you set the optimizer_dynamic_sampling parameter to 11, does the sampling then contain information about the value ranges for the join columns? I ask because your second example gets the correct cardinality.

My second question is: do you recommend using dynamic sampling?

By the way, I have just read the join cardinality chapter in Jonathan Lewis's book (Cost-Based Oracle Fundamentals); I am a little bit confused, which is why I asked these questions.

###### 8. Re: Join cardinality without statistics interesting result

You need to read the Errata and Addenda page on my old website for better information about join cardinality: http://www.jlcomp.demon.co.uk/cbo_book/ch_10.html

Regards

Jonathan Lewis

###### 9. Re: Join cardinality without statistics interesting result

Martin,

There used to be some documentation about defaults for missing stats - somewhere in the 9i manuals, I think.  These have changed over time, though. It strikes me that num_rows/32 looks much more likely to be the default for NDV.

The treatment of the join cardinality in 12c is very odd. The basic treatment behaves as if the number of distinct values = num_rows rather than the reported defaults for NDV; then it's not immediately obvious why that changes, especially for the slightly overlapping ranges of values (which you don't show).

Regards

Jonathan Lewis

###### 10. Re: Join cardinality without statistics interesting result

Would you like to say something about my questions?

Regards

###### 11. Re: Join cardinality without statistics interesting result

NightWing wrote:

Would you like to say something about my questions?

Regards

So, when you set the optimizer_dynamic_sampling parameter to 11, does the sampling then contain information about the value ranges for the join columns? I ask because your second example gets the correct cardinality.

Martin has already said that he doesn't know where the adjustment came from - if he had seen the range values in the trace I think he would (a) have some idea about where the adjustment came from and (b) have said something about it.  I didn't see anything in the trace file that suggested any attempt to capture the ranges - in fact there were indications that Oracle hadn't tried to sample ANY statistics.

My second question is: do you recommend using dynamic sampling?

That's not a reasonable question. All you've done is point out one anomaly (of many) that occurs when you don't gather any statistics.  If you're planning to run an entire production system without statistics then it's probably a good idea to use dynamic sampling; but it's probably not sensible to run an entire production system without statistics (although if you have a data warehouse and all queries are time-consuming then you may have an argument for using dynamic sampling at a level of 5 or higher).

###### 13. Re: Join cardinality without statistics interesting result

Jonathan,

yes, indeed, the defaults are in the Oracle 9 documentation: http://docs.oracle.com/cd/B10501_01/server.920/a96533/stats.htm#25056. And I also think that num_rows/32 is much more likely.
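A quick check of the num_rows/32 guess against the #Rows and NDV values in the CBO trace above (just arithmetic, with an assumed rounding):

```python
def default_ndv(num_rows):
    # assumed default for missing column statistics: NDV ~ num_rows / 32
    return round(num_rows / 32)

print(default_ndv(572))   # 18 - matches "NDV: 18" for T1 (#Rows: 572)
print(default_ndv(327))   # 10 - matches "NDV: 10" for T2 (#Rows: 327)
```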

The results with overlapping ranges are quite surprising:

-- created with:

create table t2

as

select rownum + 10 /*9,8,7*/ id, rpad('X', 10) filter from dual connect by level <= 10;

| Min(Id) T2 | Join Cardinality | Rows |
|-----------:|-----------------:|-----:|
| 11         | 1                | 0    |
| 10         | 100              | 100  |
| 9          | 200              | 200  |
| 8          | 572              | 300  |
| 1          | 572              | 1000 |

For the executions with Min(Id) = 10, 9 and 8 I get the following information in the CBO trace:

-- Min(Id) = 10

SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN

Join Card:  572.000000 = outer (327.000000) * inner (572.000000) * sel (0.003058)

>> Join Card adjusted from 572.000000  to: 100.000000, prelen=2

Join Card - Rounded: 100 Computed: 100.00

-- Min(Id) = 9

SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN

Join Card:  572.000000 = outer (327.000000) * inner (572.000000) * sel (0.003058)

>> Join Card adjusted from 572.000000  to: 200.000000, prelen=2

Join Card - Rounded: 200 Computed: 200.00

-- Min(Id) = 8

SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN

Join Card:  572.000000 = outer (327.000000) * inner (572.000000) * sel (0.003058)

Join Card - Rounded: 572 Computed: 572.00

So the CBO can adjust the join cardinality to the correct values without taking a look at the data? At this point I started to get suspicious - and made a simple 10046 trace. And there I found:

SELECT /* DS_SVC */ /*+ cursor_sharing_exact dynamic_sampling(0) no_sql_tune

no_monitoring optimizer_features_enable(default) result_cache */ SUM(C1)

FROM

(SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "T1#1")  */ 1 AS C1 FROM

"T1" "T1#1", "T2" "T2#0" WHERE ("T1#1"."FK"="T2#0"."ID")) innerQuery

-- and the result is

SUM(C1)

----------

100

That's a cheap trick - but of course it results in a correct adjustment...

I still don't know why the optimizer stops using its marked cards for min(Id) <= 8 (or why the total number of result rows is not adjusted above 572). But it's always amazing how many strange details appear when you start to look a little closer at basic operations.
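For reference, the actual row counts in the table above can be reproduced with a small sketch of the data in Python (t1.fk takes the values 1..10, 100 rows each; t2.id is offset+1 .. offset+10):

```python
def join_rows(offset):
    # t1.fk = mod(rownum, 10) + 1 over 1000 rows -> values 1..10, 100 rows each
    t1_fk = [i % 10 + 1 for i in range(1000)]
    # t2.id = rownum + offset over 10 rows -> offset+1 .. offset+10
    t2_id = {r + offset for r in range(1, 11)}
    return sum(1 for v in t1_fk if v in t2_id)

for offset in (10, 9, 8, 7, 0):
    print(offset + 1, join_rows(offset))   # Min(Id), actual join rows
```

This matches the Rows column: 0, 100, 200, 300 and 1000 as the overlap grows - so the DS_SVC query really is just counting the join result.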

Regards

Martin Preiss

###### 14. Re: Join cardinality without statistics interesting result

JonathanLewis wrote:

but it's probably not sensible to run an entire production system with statistics (although if you have a data warehouse and all queries are time-consuming then you may have an argument for using dynamic sampling at a level of 5 or higher).

Did you mean "without statistics"?
