Skip navigation

Dear DBA

June 2015 Previous month Next month

Here are 3 occurrences of the same SELECT DISTINCT, with durations that range from 15 seconds to 966 seconds!

 

SQL_ID

Cost

Duration (s)

End Time

SQL TEXT

1hdzqy0f9ukcc

2046761

21

24-06-15 02H

SELECT DISTINCT IRI, CT FROM BIGTABLETHATALWAYSGROWS WHERE INI IS NULL AND CT = 012

35vbdxu0xucjh

2046761

15

24-06-15 02H

SELECT DISTINCT IRI, CT FROM BIGTABLETHATALWAYSGROWS WHERE INI IS NULL AND CT = 034

aa316uybjk84s

2046761

966,14

24-06-15 03H

SELECT DISTINCT IRI, CT FROM BIGTABLETHATALWAYSGROWS WHERE INI IS NULL AND CT = 027

 

This SELECT DISTINCT selects the records to be modified by a nightly batch job, and runs for too long.  This partitioned table has 357M rows and weighs over 50Go.  I am tasked to optimize this SELECT DISTINCT.

 

First and foremost, a detail in the SQL code screams for attention – did you spot it?  The Explain Plan mentions it:

SQL> explain plan for SELECT DISTINCT IRI, CT FROM BIGTABLETHATALWAYSGROWS WHERE INI IS NULL AND CT = 040;

Explained.

SQL> SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());

PLAN_TABLE_OUTPUT

Plan hash value: 2894093667

--------------------------------------------------------------------------------| Id  | Operation           | Name           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |

------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT | |     1 |    20 |  2036K  (1)| 06:47:17 |       | |

|   1 | PARTITION LIST ALL | |     1 |    20 |  2036K  (1)| 06:47:17 |     1 |    35 |

|   2 | HASH UNIQUE | |     1 |    20 |  2036K  (1)| 06:47:17 | |       |

|*  3 |    TABLE ACCESS FULL| BIGTABLETHATALWAYSGROWS |     1 |    20 |  2036K  (1)| 06:47:17 |     1 |    35 |

--------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

3 - filter("INI" IS NULL AND TO_NUMBER("CT")=040)

 

Yes, an implicit conversion (040 being converted to a number, because of the lack of single quotes around it) will prevent the use of any index (except for the appropriate function-based index, which, of course, has not been created) because the CT column is VARCHAR.

 

So an extremely simple, yet highly effective, tweak is to add single quotes:

EXPLAIN PLAN FOR

SELECT DISTINCT IRI, CT FROM BIGTABLETHATALWAYSGROWS WHERE INI IS NULL AND CT = '040' ;

---------------------------------------------------------------------------------------------------------

| Id  | Operation              | Name           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |                                                                                                                                                                                                  

---------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT       |                |   711 | 14220 | 42066   (1)| 00:08:25 |       | |

|   1 | HASH UNIQUE           |                |   711 | 14220 | 42066   (1)| 00:08:25 |       | |

|   2 |   PARTITION LIST SINGLE|                |  8496 | 165K| 42065   (1)| 00:08:25 |   KEY | KEY |

|*  3 |    TABLE ACCESS FULL   | BIGTABLETHATALWAYSGROWS |  8496 | 165K| 42065   (1)| 00:08:25 |    16 | 16 |

---------------------------------------------------------------------------------------------------------                                                                                                                                                                                                  

Predicate Information (identified by operation id):

---------------------------------------------------

   3 - filter("INI" IS NULL)

 

From a cost of over 2M, we’re down to 42K.  Good, but not good enough for that SELECT that will return only a fraction of that table’s records.  A Full Table Scan is probably overkill.  Let’s see which index will give us the best performance.

 

There are a few indexes on that big table.  Let’s tell the optimizer to use them with a hint:

EXPLAIN PLAN FOR

SELECT  /*+ INDEX */  DISTINCT…

No can do:

0 | SELECT STATEMENT       |                |     1 | 20 | 41718   (1)| 00:08:21 |       | |

|   1 |  HASH UNIQUE           |                |     1 | 20 | 41718   (1)| 00:08:21 |       | |

|   2 |   PARTITION LIST SINGLE|                |     5 | 100 | 41717   (1)| 00:08:21 |   KEY | KEY |                                                                                                                                                                                                  

|*  3 |    TABLE ACCESS FULL   | BIGTABLETHATALWAYSGROWS |     5 |   100 | 41717 (1)| 00:08:21 |    16 | 16

 

The CBO considers the full table scan more effective than using any of the existing indexes. I tested each of them and each time EXPLAIN PLAN gave me a cost higher than that of the full table scan.

 

So let’s try of few indexes of our own.

 

CREATE INDEX fd4_INI_IRI_CT ON BIGTABLETHATALWAYSGROWS (INI,IRI,CT) LOCAL;

CREATE INDEX fd3_IRI_INI_CT ON BIGTABLETHATALWAYSGROWS (IRI,INI,CT) LOCAL;

CREATE INDEX fd2_INI_CT ON BIGTABLETHATALWAYSGROWS (INI,CT) LOCAL;

CREATE INDEX fd1_CT_INI ON BIGTABLETHATALWAYSGROWS (CT,INI) LOCAL;

 

With index #1 (index on all the columns mentioned in the WHERE clause):

----------------------------------------------------------------------------------------------------------------------

| Id  | Operation                           | Name           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |

----------------------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                    |                |     1 | 20 |     3  (34)| 00:00:01 |       | |

|   1 |  HASH UNIQUE                        |                |     1 | 20 |     3  (34)| 00:00:01 |       | |

|   2 |   PARTITION LIST SINGLE             |                |     5 | 100 |     2   (0)| 00:00:01 |   KEY | KEY |                                                                                                                                                                                     

|   3 |    TABLE ACCESS BY LOCAL INDEX ROWID| BIGTABLETHATALWAYSGROWS |     5 |   100 | 2   (0)| 00:00:01 |    16 | 16 |

|*  4 | INDEX RANGE SCAN                | FD1_CT_INI     | 1 |       |     1 (0)| 00:00:01 |    16 |    16 |

Index Range Scan, good, but the table still has to be accessed because a column of the SELECT is not in that index.  Wow! Cost of 3.  Can’t get it any better, right?  Let’s find out. 

 

With index #2 (index on all the columns mentioned in the WHERE clause, but the order of columns is reversed):

----------------------------------------------------------------------------------------------------------------------

| Id  | Operation                           | Name           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |                                                                                                                                                                                     

----------------------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                    |                |     1 | 20 |     3 (34)| 00:00:01 |       |       |

|   1 |  HASH UNIQUE                        |                |     1 | 20 |     3  (34)| 00:00:01 |       | |

|   2 |   PARTITION LIST SINGLE             |                |     5 | 100 |     2   (0)| 00:00:01 |   KEY | KEY |

|   3 |    TABLE ACCESS BY LOCAL INDEX ROWID| BIGTABLETHATALWAYSGROWS |     5 |   100 | 2   (0)| 00:00:01 |    16 | 16 |

|*  4 |     INDEX RANGE SCAN                | FD2_INI_CT     | 1 |       |     1 (0)| 00:00:01 |    16 |    16 |

 

No difference, so no better.

 

With index # 3 (index on all the columns mentioned in that SELECT, the least selective last):

---------------------------------------------------------------------------------------------------------                                                                                                                                                                                                  

| Id  | Operation              | Name           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |                                                                                                                                                                                                  

---------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT       |                |     1 | 20 |    27   (4)| 00:00:01 |       | |

|   1 |  SORT UNIQUE NOSORT    |                |     1 | 20 |    27   (4)| 00:00:01 |       | |

|   2 |   PARTITION LIST SINGLE|                |     5 | 100 |    26   (0)| 00:00:01 |   KEY | KEY |

|*  3 |    INDEX SKIP SCAN     | FD3_IRI_INI_CT |     5 | 100 |    26   (0)| 00:00:01 |    16 | 16 |

Interesting, because since all the columns mentioned in our SELECT are in that index, no need to access the table.  Still the cost is higher than our previous attempts, which did include a table access by rowid.

So since that INDEX SKIP SCAN seems too expensive, let’s shuffle the order of columns in our index …

 

With index # 4 (index on all the columns mentioned in that SELECT, the least selective last, and the other 2 columns swapped):

---------------------------------------------------------------------------------------------------------

| Id  | Operation              | Name           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |

---------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT       |                |     1 | 20 |     3  (34)| 00:00:01 |       | |

|   1 |  HASH UNIQUE           |                |     1 | 20 |     3  (34)| 00:00:01 |       | |

|   2 |   PARTITION LIST SINGLE|                |     5 | 100 |     2   (0)| 00:00:01 |   KEY | KEY |                                                                                                                                                                                                  

|*  3 |    INDEX RANGE SCAN    | FD4_INI_IRI_CT |     5 | 100 |     2   (0)| 00:00:01 |    16 | 16 |

---------------------------------------------------------------------------------------------------------  

Hmmh, cost of 3, again.  And with no table access.

 

So, we have 3 possibilities with a CBO cost of 3 (indexes fd1, fd2, and fd4).  Which one really is the best?  I’d rather go for the one with the least steps (there are 5 steps in plans 1 and 2, 4 steps in plan 4).  If we ask the CBO to decide, which will it be?

 

EXPLAIN PLAN FOR

SELECT  DISTINCT IRI…

 

----------------------------------------------------------------------------------------------------------------------

| Id  | Operation                           | Name           | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |

----------------------------------------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT                    |                |     1 | 20 |     3  (34)| 00:00:01 |       | |

|   1 |  HASH UNIQUE                        |                |     1 | 20 |     3  (34)| 00:00:01 |       | |

|   2 |   PARTITION LIST SINGLE             |                |     5 | 100 |     2   (0)| 00:00:01 |   KEY | KEY |                                                                                                                                                                                     

|   3 |    TABLE ACCESS BY LOCAL INDEX ROWID| BIGTABLETHATALWAYSGROWS |     5 |   100 | 2   (0)| 00:00:01 |    16 | 16 |

|*  4 | INDEX RANGE SCAN                | FD1_CT_INI     | 1 |       |     1 (0)| 00:00:01 |    16 |    16 |

 

Index #1 seems to be the CBO’s favorite.   Let’s double-check with different values for our bind variable (big partitions, small partitions), and yes, the CBO constantly chooses index #1.


To keep in mind:

  • let's be wary of literals and implicit conversions
  • partitioned (LOCAL) indexes on partitioned tables are a great way to access data when a sufficiently small subset of data is requested.

I am listing the differences as they appear from top to bottom of an AWR report.  The differences are of 3 kinds:

  1. difference in location in the report
  2. new information
  3. information gone


In 11.2.0.3, the "Cache Sizes" lines appear first:

Cache Sizes

 

Begin

End

 

 

Buffer Cache:

304M

304M

Std Block Size:

8K

Shared Pool Size:

608M

608M

Log Buffer:

24,588K

While in 11.2.0.3, you find them further down (after "Memory Statistics).

 

"Load Profile" has 15 lines in 11.2.0.3 and 19 lines in 11.2.0.4, because there are 5 new lines in 11.2.0.4:

SQL Work Area (MB):

Read IO requests

Read IO (MB)

Write IO requests

Write IO (MB)

And W/A MB processed  disappeared in 11.2.0.4.

 

"Shared Pool Statistics" changed location: it is between "Instance Efficiency Percentages" and "Top 5 Timed Foreground Events" in 11.2.0.3, while in 11.2.0.4, it has moved to the bottom of the "Report Summary", right before the "Main Report" section.


The "Top 5 Timed Foreground Events" table of 11.2.0.3 has been replaced with 2 tables:

 

« Top 10 Foreground Events by Total Wait Time”:

Event

Waits

Total Wait Time (sec)

Wait Avg(ms)

% DB time

Wait Class

db file sequential read

178,619

281,8

2

41.8

User I/O

db file scattered read

68,622

168,4

2

25.0

User I/O

DB CPU

 

109,4

 

16.2

 

direct path read

3,885

16,1

4

2.4

User I/O

log file switch (checkpoint incomplete)

33

14,2

430

2.1

Configuration

log file switch completion

147

4,2

29

.6

Configuration

read by other session

89

1,5

17

.2

User I/O

cursor: pin S wait on X

37

1,3

35

.2

Concurrency

db file parallel read

26

,8

31

.1

User I/O

log file sync

187

,7

4

.1

Commit

 

And “Wait Classes by Total Wait Time” :

Wait Class

Waits

Total Wait Time (sec)

Avg Wait (ms)

% DB time

Avg Active Sessions

User I/O

254,484

488

2

72.4

0.1

DB CPU

 

109

 

16.2

0.0

System I/O

43,396

49

1

7.2

0.0

Configuration

184

19

102

2.8

0.0

Concurrency

361

2

6

.3

0.0

Commit

189

1

4

.1

0.0

Application

25

0

14

.1

0.0

Network

27,064

0

0

.1

0.0

Other

344

0

1

.0

0.0

Scheduler

1

0

3

.0

0.0

 

After the “Host CPU” and “Instance CPU” sections, which are identical in both versions, we have in 11.2.0.4  a new section, called "IO Profile":

IO Profile

 

Read+Write Per Second

Read per Second

Write Per Second

Total Requests:

156.3

86.2

70.1

Database Requests:

142.7

76.3

66.4

Optimized Requests:

0.0

0.0

0.0

Redo Requests:

1.5

0.0

1.5

Total (MB):

3.3

2.11.2

Database (MB):

2.6

1.90.7

Optimized Total (MB):

0.0

0.00.0

Redo (MB):

0.5

0.00.5

Database (blocks):

334.2

245.2

89.0

Via Buffer Cache (blocks):

290.1

201.5

88.6

Direct (blocks):

44.1

43.7

0.4

 

For the "Time Model Statistics" table, its number of lines varies, but I cannot determine if it varies because of the change from 11.2.0.3 to 11.2.0.4 or because of the activity recorded by each AWR report.

 

The whole SQL section seems to me identical from one version to the other.

 

Then the " Instance Activity Statistics" has been revamped.  In 11.2.0.3, we have:

While in 11.2.0.4 we have:

This new "Key Instance Activity Stats" section of 11.2.0.4 is a selection of statistics that are key according to Oracle:

Statistic

Total

per Second

per Trans

db block changes

11,620,235

3,225.92

12,882.74

execute count

42,189

11.71

46.77

logons cumulative

100

0.03

0.11

opened cursors cumulative

28,169

7.82

31.23

parse count (total)

14,413

4.00

15.98

parse time elapsed

2,316

0.64

2.57

physical reads

883,276

245.21

979.24

physical writes

320,511

88.98

355.33

redo size

1,847,700,304

512,943.60

2,048,448.23

session cursor cache hits

23,893

6.63

26.49

session logical reads

24,834,290

6,894.29

27,532.47

user calls

28,797

7.99

31.93

user commits

902

0.25

1.00

user rollbacks

0

0.00

0.00

Workarea executions - onepass

0

0.00

0.00

Workarea executions - optimal

8,125

2.26

9.01

 

 

The "Tablespace IO Stats"  table in 11.2.0.4 has 2 extra columns:

 

Tablespace

Reads

Av Rds/s

Av Rd(ms)

Av Blks/Rd

1-bk Rds/s

Av 1-bk Rd(ms)

Writes

Writes avg/s

Buffer Waits

Av Buf Wt(ms)

IxxxxT

137,197

38

1.91

4.43

54,507

19.31

1

15

25

27.20

DxxxxT

89,046

25

1.25

1

62,224

24.72

1

17

40

6.25

 

The "FileIO Stats"  table in 11.2.0.4 has 2 extra columns: "1-bk Rds/s" and "Av 1-bk Rd(ms)":

Tablespace

Filename

Reads

Av Rds/s

Av Rd(ms)

Av Blks/Rd

1-bk Rds/s

Av 1-bk Rd(ms)

Writes

Writes avg/s

Buffer Waits

Av Buf Wt(ms)

DxxxxR

/dxxxx1.dbf

181

0

2.65

1.34

0

2.47

14

0

0

0.00

DxxxxX

/dxxxx1.dbf

29

0

0.34

1.00

0

0.34

14

0

0

0.00

 

And that is all I could spot.  Do not hesitate to point out if something is missing or wrong.