1 2 Previous Next 18 Replies Latest reply: Jan 30, 2012 6:56 PM by jgarry RSS

    serial table scan with direct path read compared to db file scattered read

    Mich Talebzadeh
      Hi,

      The environment
      Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit
      
      8K block size 
      
      db_file_multiblock_read_count is 128
      
      show sga
      
      Total System Global Area 1.6702E+10 bytes
      Fixed Size                  2219952 bytes
      Variable Size            7918846032 bytes
      Database Buffers         8724152320 bytes
      Redo Buffers               57090048 bytes
      
      16GB of SGA with 8GB of db buffer cache.
      
      -- database is built on Solid State Disks
      
      -- SQL trace and wait events
      
      DBMS_MONITOR.SESSION_TRACE_ENABLE ( waits=>true )
      
      -- The underlying table is called tdash. It has 1.7 Million rows based on data in all_objects. NO index 
      
      TABLE_NAME                             Rows Table Size/MB      Used/MB    Free/MB
      ------------------------------ ------------ ------------- ------------ ----------
      TDASH                             1,729,204        15,242       15,186         56
      
      TABLE_NAME                     Allocated blocks Empty blocks Average space/KB Free list blocks
      ------------------------------ ---------------- ------------ ---------------- ----------------
      TDASH                                 1,943,823        7,153              805                0
      Objectives

      To show that when serial scans are performed on database built on Solid State Disks (SSD) compared to Magnetic disks (HDD), the performance gain is far less compared to random reads with index scans on SSD compared to HDD

      Approach

      We want to read the first 100 rows of tdash table randomly into buffer, taking account of wait events and wait times generated. The idea is that on SSD the wait times will be better compared to HDD but not that much given the serial nature of table scans.

      The code used
      ALTER SESSION SET TRACEFILE_IDENTIFIER = 'test_with_tdash_ssdtester_noindex';
      DECLARE
              type array is table of tdash%ROWTYPE index by binary_integer;
              l_data array;
              l_rec tdash%rowtype;
      BEGIN
              SELECT
                      a.*
                      ,RPAD('*',4000,'*') AS PADDING1
                      ,RPAD('*',4000,'*') AS PADDING2
              BULK COLLECT INTO
              l_data
              FROM ALL_OBJECTS a;
      
              DBMS_MONITOR.SESSION_TRACE_ENABLE ( waits=>true );
              FOR rs IN 1 .. 100
              LOOP
                      BEGIN
                              SELECT * INTO l_rec FROM tdash WHERE object_id = l_data(rs).object_id;
                      EXCEPTION
                        WHEN NO_DATA_FOUND THEN NULL;
                      END;
              END LOOP;
      END;
      /
      Server is rebooted prior to any tests

      Whern run as default, the optimizer (although some attribute this to the execution engine) chooses direct path read into PGA in preference to db file scattered read.
      With this choice it takes 6,520 seconds to complete the query. The results are shown below
      SQL ID: 78kxqdhk1ubvq
      Plan Hash: 1148949653
      SELECT *
      FROM
       TDASH WHERE OBJECT_ID = :B1
      
      
      call     count       cpu    elapsed       disk      query    current        rows
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      Parse        1      0.01       0.00          2         47          0           0
      Execute    100      0.00       0.00          1         51          0           0
      Fetch      100     10.88    6519.89  194142802  194831012          0         100
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      total      201     10.90    6519.90  194142805  194831110          0         100
      
      Misses in library cache during parse: 1
      Optimizer mode: ALL_ROWS
      Parsing user id: 96  (SSDTESTER)   (recursive depth: 1)
      
      Rows     Row Source Operation
      -------  ---------------------------------------------------
            1  TABLE ACCESS FULL TDASH (cr=1948310 pr=1941430 pw=0 time=0 us cost=526908 size=8091 card=1)
      
      
      Rows     Execution Plan
      -------  ---------------------------------------------------
            0  SELECT STATEMENT   MODE: ALL_ROWS
            1   TABLE ACCESS   MODE: ANALYZED (FULL) OF 'TDASH' (TABLE)
      
      
      Elapsed times include waiting on following events:
        Event waited on                             Times   Max. Wait  Total Waited
        ----------------------------------------   Waited  ----------  ------------
        Disk file operations I/O                        3        0.00          0.00
        db file sequential read                         2        0.00          0.00
        direct path read                          1517504        0.05       6199.93
        asynch descriptor resize                      196        0.00          0.00
      ********************************************************************************
      
      DECLARE
              type array is table of tdash%ROWTYPE index by binary_integer;
              l_data array;
              l_rec tdash%rowtype;
      BEGIN
              SELECT
                      a.*
                      ,RPAD('*',4000,'*') AS PADDING1
                      ,RPAD('*',4000,'*') AS PADDING2
              BULK COLLECT INTO
              l_data
              FROM ALL_OBJECTS a;
              DBMS_MONITOR.SESSION_TRACE_ENABLE ( waits=>true );
              FOR rs IN 1 .. 100
              LOOP
                      BEGIN
                              SELECT * INTO l_rec FROM tdash WHERE object_id = l_data(rs).object_id;
                      EXCEPTION
                        WHEN NO_DATA_FOUND THEN NULL;
                      END;
              END LOOP;
      END;
      
      call     count       cpu    elapsed       disk      query    current        rows
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      Parse        0      0.00       0.00          0          0          0           0
      Execute      1      3.84       4.03        320      48666          0           1
      Fetch        0      0.00       0.00          0          0          0           0
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      total        1      3.84       4.03        320      48666          0           1
      
      Misses in library cache during parse: 0
      Misses in library cache during execute: 1
      Optimizer mode: ALL_ROWS
      Parsing user id: 96  (SSDTESTER)
      
      Elapsed times include waiting on following events:
        Event waited on                             Times   Max. Wait  Total Waited
        ----------------------------------------   Waited  ----------  ------------
        SQL*Net message to client                       1        0.00          0.00
        SQL*Net message from client                     1        0.00          0.00
      ********************************************************************************
      
      SQL ID: 9babjv8yq8ru3
      Plan Hash: 0
      BEGIN DBMS_OUTPUT.GET_LINES(:LINES, :NUMLINES); END;
      
      
      call     count       cpu    elapsed       disk      query    current        rows
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      Parse        1      0.00       0.00          0          0          0           0
      Execute      1      0.00       0.00          0          0          0           1
      Fetch        0      0.00       0.00          0          0          0           0
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      total        2      0.00       0.00          0          0          0           1
      
      Misses in library cache during parse: 0
      Optimizer mode: ALL_ROWS
      Parsing user id: 96  (SSDTESTER)
      
      Elapsed times include waiting on following events:
        Event waited on                             Times   Max. Wait  Total Waited
        ----------------------------------------   Waited  ----------  ------------
        SQL*Net message to client                       1        0.00          0.00
        SQL*Net message from client                     1        0.00          0.00
      
      
      
      ********************************************************************************
      
      OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
      
      call     count       cpu    elapsed       disk      query    current        rows
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      Parse        1      0.00       0.00          0          0          0           0
      Execute      2      3.84       4.03        320      48666          0           2
      Fetch        0      0.00       0.00          0          0          0           0
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      total        3      3.84       4.03        320      48666          0           2
      
      Misses in library cache during parse: 0
      Misses in library cache during execute: 1
      
      Elapsed times include waiting on following events:
        Event waited on                             Times   Max. Wait  Total Waited
        ----------------------------------------   Waited  ----------  ------------
        SQL*Net message to client                       2        0.00          0.00
        SQL*Net message from client                     2        0.00          0.00
        log file sync                                   1        0.00          0.00
      
      
      OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
      
      call     count       cpu    elapsed       disk      query    current        rows
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      Parse        9      0.01       0.00          2         47          0           0
      Execute    129      0.01       0.00          1         52          2           1
      Fetch      140     10.88    6519.89  194142805  194831110          0         130
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      total      278     10.91    6519.91  194142808  194831209          2         131
      
      Misses in library cache during parse: 9
      Misses in library cache during execute: 8
      
      Elapsed times include waiting on following events:
        Event waited on                             Times   Max. Wait  Total Waited
        ----------------------------------------   Waited  ----------  ------------
        db file sequential read                         5        0.00          0.00
        Disk file operations I/O                        3        0.00          0.00
        direct path read                          1517504        0.05       6199.93
        asynch descriptor resize                      196        0.00          0.00
      
        102  user  SQL statements in session.
         29  internal SQL statements in session.
        131  SQL statements in session.
          1  statement EXPLAINed in this session.
      ********************************************************************************
      Trace file: mydb_ora_16394_test_with_tdash_ssdtester_noindex.trc
      Trace file compatibility: 11.1.0.7
      Sort options: default
      
             1  session in tracefile.
           102  user  SQL statements in trace file.
            29  internal SQL statements in trace file.
           131  SQL statements in trace file.
            11  unique SQL statements in trace file.
             1  SQL statements EXPLAINed using schema:
                 ssdtester.plan_table
                   Schema was specified.
                   Table was created.
                   Table was dropped.
       1531657  lines in trace file.
          6520  elapsed seconds in trace file.
      I then force the query not to use direct path read by invoking
      ALTER SESSION SET EVENTS '10949 trace name context forever, level 1'  -- No Direct path read  ;
      In this case the optimizer uses db file scattered read predominantly and the query takes 4,299 seconds to finish which is around 34% faster than using direct path read (default).

      The report is shown below
      SQL ID: 78kxqdhk1ubvq
      Plan Hash: 1148949653
      SELECT *
      FROM
       TDASH WHERE OBJECT_ID = :B1
      
      
      call     count       cpu    elapsed       disk      query    current        rows
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      Parse        1      0.00       0.00          2         47          0           0
      Execute    100      0.00       0.00          2         51          0           0
      Fetch      100    143.44    4298.87  110348670  194490912          0         100
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      total      201    143.45    4298.88  110348674  194491010          0         100
      
      Misses in library cache during parse: 1
      Optimizer mode: ALL_ROWS
      Parsing user id: 96  (SSDTESTER)   (recursive depth: 1)
      
      Rows     Row Source Operation
      -------  ---------------------------------------------------
            1  TABLE ACCESS FULL TDASH (cr=1944909 pr=1941430 pw=0 time=0 us cost=526908 size=8091 card=1)
      
      
      Rows     Execution Plan
      -------  ---------------------------------------------------
            0  SELECT STATEMENT   MODE: ALL_ROWS
            1   TABLE ACCESS   MODE: ANALYZED (FULL) OF 'TDASH' (TABLE)
      
      
      Elapsed times include waiting on following events:
        Event waited on                             Times   Max. Wait  Total Waited
        ----------------------------------------   Waited  ----------  ------------
        Disk file operations I/O                        3        0.00          0.00
        db file sequential read                    129759        0.01         17.50
        db file scattered read                    1218651        0.05       3770.02
        latch: object queue header operation            2        0.00          0.00
      ********************************************************************************
      
      DECLARE
              type array is table of tdash%ROWTYPE index by binary_integer;
              l_data array;
              l_rec tdash%rowtype;
      BEGIN
              SELECT
                      a.*
                      ,RPAD('*',4000,'*') AS PADDING1
                      ,RPAD('*',4000,'*') AS PADDING2
              BULK COLLECT INTO
              l_data
              FROM ALL_OBJECTS a;
              DBMS_MONITOR.SESSION_TRACE_ENABLE ( waits=>true );
              FOR rs IN 1 .. 100
              LOOP
                      BEGIN
                              SELECT * INTO l_rec FROM tdash WHERE object_id = l_data(rs).object_id;
                      EXCEPTION
                        WHEN NO_DATA_FOUND THEN NULL;
                      END;
              END LOOP;
      END;
      
      call     count       cpu    elapsed       disk      query    current        rows
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      Parse        0      0.00       0.00          0          0          0           0
      Execute      1      3.92       4.07        319      48625          0           1
      Fetch        0      0.00       0.00          0          0          0           0
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      total        1      3.92       4.07        319      48625          0           1
      
      Misses in library cache during parse: 0
      Misses in library cache during execute: 1
      Optimizer mode: ALL_ROWS
      Parsing user id: 96  (SSDTESTER)
      
      Elapsed times include waiting on following events:
        Event waited on                             Times   Max. Wait  Total Waited
        ----------------------------------------   Waited  ----------  ------------
        SQL*Net message to client                       1        0.00          0.00
        SQL*Net message from client                     1        0.00          0.00
      ********************************************************************************
      
      SQL ID: 9babjv8yq8ru3
      Plan Hash: 0
      BEGIN DBMS_OUTPUT.GET_LINES(:LINES, :NUMLINES); END;
      
      
      call     count       cpu    elapsed       disk      query    current        rows
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      Parse        1      0.00       0.00          0          0          0           0
      Execute      1      0.00       0.00          0          0          0           1
      Fetch        0      0.00       0.00          0          0          0           0
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      total        2      0.00       0.00          0          0          0           1
      
      Misses in library cache during parse: 0
      Optimizer mode: ALL_ROWS
      Parsing user id: 96  (SSDTESTER)
      
      Elapsed times include waiting on following events:
        Event waited on                             Times   Max. Wait  Total Waited
        ----------------------------------------   Waited  ----------  ------------
        SQL*Net message to client                       1        0.00          0.00
        SQL*Net message from client                     1        0.00          0.00
      
      
      
      ********************************************************************************
      
      OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
      
      call     count       cpu    elapsed       disk      query    current        rows
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      Parse        1      0.00       0.00          0          0          0           0
      Execute      2      3.92       4.07        319      48625          0           2
      Fetch        0      0.00       0.00          0          0          0           0
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      total        3      3.92       4.07        319      48625          0           2
      
      Misses in library cache during parse: 0
      Misses in library cache during execute: 1
      
      Elapsed times include waiting on following events:
        Event waited on                             Times   Max. Wait  Total Waited
        ----------------------------------------   Waited  ----------  ------------
        SQL*Net message to client                       2        0.00          0.00
        SQL*Net message from client                     2        0.00          0.00
        log file sync                                   1        0.00          0.00
      
      
      OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
      
      call     count       cpu    elapsed       disk      query    current        rows
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      Parse        9      0.01       0.00          2         47          0           0
      Execute    129      0.00       0.00          2         52          2           1
      Fetch      140    143.44    4298.87  110348674  194491010          0         130
      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
      total      278    143.46    4298.88  110348678  194491109          2         131
      
      Misses in library cache during parse: 9
      Misses in library cache during execute: 8
      
      Elapsed times include waiting on following events:
        Event waited on                             Times   Max. Wait  Total Waited
        ----------------------------------------   Waited  ----------  ------------
        db file sequential read                    129763        0.01         17.50
        Disk file operations I/O                        3        0.00          0.00
        db file scattered read                    1218651        0.05       3770.02
        latch: object queue header operation            2        0.00          0.00
      
        102  user  SQL statements in session.
         29  internal SQL statements in session.
        131  SQL statements in session.
          1  statement EXPLAINed in this session.
      ********************************************************************************
      Trace file: mydb_ora_26796_test_with_tdash_ssdtester_noindex_NDPR.trc
      Trace file compatibility: 11.1.0.7
      Sort options: default
      
             1  session in tracefile.
           102  user  SQL statements in trace file.
            29  internal SQL statements in trace file.
           131  SQL statements in trace file.
            11  unique SQL statements in trace file.
             1  SQL statements EXPLAINed using schema:
                 ssdtester.plan_table
                   Schema was specified.
                   Table was created.
                   Table was dropped.
       1357958  lines in trace file.
          4299  elapsed seconds in trace file.
      I note that there are 1,517,504 waits with direct path read with total time of nearly 6,200 seconds. In comparison with no direct path read, there are 1,218,651 db file scattered read waits with total wait time of 3,770 seconds. My understanding is that direct path read can use single or multi-block read into the PGA. However db file scattered reads do multi-block read into multiple discontinuous SGA buffers. So it is possible given the higher number of direct path waits that the optimizer cannot do multi-block reads (contigious buffers within PGA) and hence has to revert to single blocks reads which results in more calls and more waits?.

      Appreciate any advise and apologies for being long winded.

      Thanks,

      Mich
        • 1. Re: serial table scan with direct path read compared to db file scattered read
          Charles Hooper
          I will attempt to answer your question, but you should expect additional input from other members of the forum. Keep in mind that the numbers that you are seeing are slightly distorted due to the fact that the buffer cache was empty when the test started. To understand how this may affect the test results, see the first comment on this page: http://dioncho.wordpress.com/2009/04/22/strong-doubt-on-small-table_small_table_threshold/

          Let's take a look at the TKPROF summary for the SQL statement.
          DB FILE SCATTERED READ:
          call     count       cpu    elapsed       disk      query    current        rows
          ------- ------  -------- ---------- ---------- ---------- ----------  ----------
          Parse        1      0.00       0.00          2         47          0           0
          Execute    100      0.00       0.00          2         51          0           0
          Fetch      100    143.44    4298.87  110348670  194490912          0         100
          ------- ------  -------- ---------- ---------- ---------- ----------  ----------
          total      201    143.45    4298.88  110348674  194491010          0         100
          In the above, notice the 143.44 seconds of CPU utilization reported in the trace file - this is significant when compared to the same statistic found in the direct path read summary. Also, note that there is a difference between the QUERY and DISK columns of 84,142,242 blocks - you can consider that a large portion of that difference is due to the effects of the empty buffer cache, which helped eliminate the reading of roughly 689,293,246,464 bytes from disk (SSD). Each of the blocks required an average of 0.000038957 seconds per block (up to 128 of these could be read in a single read request depending on the current buffer cache contents, extent sizes, and OS limitations). Now, consider for a moment how the results might have changed if there were block belonging to other objects in the buffer cache - it is likely that in your test case many of those 84,142,242 blocks would have been read from disk, which means that the total time might have been closer to 7576 seconds.

          DIRECT PATH READ:
          call     count       cpu    elapsed       disk      query    current        rows
          ------- ------  -------- ---------- ---------- ---------- ----------  ----------
          Parse        1      0.01       0.00          2         47          0           0
          Execute    100      0.00       0.00          1         51          0           0
          Fetch      100     10.88    6519.89  194142802  194831012          0         100  
          ------- ------  -------- ---------- ---------- ---------- ----------  ----------
          total      201     10.90    6519.90  194142805  194831110          0         100
          An object level checkpoint is issued before the start of a direct path read, and that can add to the execution time when a direct path read is performed. In the above, note that there is a difference between the QUERY and DISK columns of just 688,210 blocks - some of that difference is due to the effects of the buffer cache. Each of the blocks read from disk required an average of 0.000033582 per block - note the decreased in the average block read time compared to the db file scattered read summary.

          Regarding the number of blocks read from disk in one read request, there is a bit of a difference. There are an average of about 90.44337 blocks read from disk during a single db file scattered read request, and 127.93561 blocks read from disk during a single direct path read.

          Try the test again with blocks belonging to other objects already in the buffer cache - something approaching a real world situation for this particular SQL statement when it executes in a production database.

          Charles Hooper
          http://hoopercharles.wordpress.com/
          IT Manager/Oracle DBA
          K&M Machine-Fabricating, Inc.
          • 2. Re: serial table scan with direct path read compared to db file scattered read
            Mich Talebzadeh
            Thanks Charles.

            I have read a number of online experiements including yours, Dion Cho, Tom Kyte, Craig Shallahamer and other comments. I think many focus on when direct path read (DPR) kicks in taking into account the threshold parameter smalltable_threshold* and other factors. In my case the table has 1,943,823 blocks so I guess it easily satisfies the condition for DPR. Also question I would like to know is whether the optimizer or the execution engine decides to use DPR?

            With regard to your points, by default (tests are always done after reboot of server), Oracle decides to use DPR so it assumes that it will be quicker to deploy this approach. If that is the case, then there is implicit assumption that there is not going to be enough free buffers to allow a conventional serial scan and read into buffer cache for this size of blocks (without ageing good data). It begs the question whether Oracle makes that decision based on a pre build factor. Otherwise in this case the cache is empty. However, Oracle does not check that.

            You pointed out that DPR is an order of magnitude less CPU intensive compared to DB file scattered read (143 sec compared to 10 sec). So I am just postulating that Oracle decides to reduce CPU cycle in expense of elapsed time (6,519 sec vs compared to 4,298 sec).

            In short is that correct that Oracle will decide in scenarios like this to use sessions's PGA and have the ability to use temp disk space if needed as opposed to flooding the buffer cache with once used data?

            Table scans should always go to the LRU end and Oracle should use a small number of blocks (i.e. 2 to 32) which would be overwritten leaving the remaining cached table blocks from other segments still in cache. So why bother with direct path reads?

            Regards,

            Mich
            • 3. Re: serial table scan with direct path read compared to db file scattered read
              Charles Hooper
              Mitch,

              There were potentially interesting comments in Craig Shallahamer's "Oracle Performance Firefighting" book related to the SMALLTABLE_THRESHOLD hidden parameter. I selected a couple of minor quotes from that book, provided alternate definitions of that parameter from other sources, and roughly 1.5 years ago asked other people to comment on those sections:
              http://hoopercharles.wordpress.com/2010/06/17/_small_table_threshold-parameter-and-buffer-cache-what-is-wrong-with-this-quote/
              Needless to say, the comments attached to the article are much more interesting than the article itself, for instance this comment by Jonathan Lewis: "Even a “long” tablescan will be read in to the mid-point in lots of newer versions of Oracle...".

              I could be wrong, but I do not think that a direct path read, even of a very large table, will necessarily lead to the use of the temp tablespace, where the temp tablespace would not have been used just as much in the case of a db file scattered read.

              I think that one of the implicit assumptions is that you will not execute exactly the same SQL statement, touching the same table blocks, 100 times in rapid succession. Let's try a simple test using 11.2.0.2 accessing an SSD drive:
              First, create the table, gather statistics, and report the number of table blocks:
              CREATE TABLE T1 AS
              SELECT
                *
              FROM
                (SELECT
                   *
                 FROM
                   ALL_OBJECTS
                 WHERE
                   ROWNUM<=60000) V1,
                (SELECT
                   ROWNUM RN
                 FROM
                   DUAL
                 CONNECT BY
                   LEVEL<=200);
               
              EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',ESTIMATE_PERCENT=>100)
              
              SELECT
                NUM_ROWS,
                BLOCKS
              FROM
                USER_TABLES
              WHERE
                TABLE_NAME='T1';
               
              NUM_ROWS     BLOCKS
              -------- ----------
              12000000     179155
              So, the table has 1.2 million rows, and just under 180,000 8KB blocks (1,467,637,760 bytes)

              The test script for the serial direct path read, executed after bouncing the database:
              SET TIMING ON
               
              ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';
               
              SELECT
                COUNT(*)
              FROM
                T1
              WHERE
                RN=1;
               
              ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';
              The output that I received from the serial direct path read test:
              SQL> SELECT
                2    COUNT(*)
                3  FROM
                4    T1
                5  WHERE
                6    RN=1;
               
                COUNT(*)
              ----------
                   60000
               
              Elapsed: 00:00:04.13
              The test script for the db file scattered read, executed after bouncing the database:
              SET TIMING ON
              ALTER SESSION SET EVENTS '10949 TRACE NAME CONTEXT FOREVER, LEVEL 1';
               
              ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';
               
              SELECT
                COUNT(*)
              FROM
                T1
              WHERE
                RN=1;
               
              ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT OFF';
              The output that I received from the db file scattered read test:
              SQL> SELECT
                2    COUNT(*)
                3  FROM
                4    T1
                5  WHERE
                6    RN=1;
              
                COUNT(*)
              ----------
                   60000
              
              Elapsed: 00:00:05.58
              Just looking at the elapsed time output, the direct path read is roughly 35% faster than the db file scattered read approach, and is less likely to suffer latch contention and excessive CPU consumption. It is the runtime engine, not the optimizer, that makes the decision of serial direct path read or db file scattered read, although it might be interesting to test whether or not the optimizer might select to use an existing index in one case if serial direct path reads are disabled, and a full table scan if serial direct path reads are enabled (maybe someone could put together a test case?).

              Let's take a look at the TKPROF summaries:
              The test script for the serial direct path read:
              SELECT
                COUNT(*)
              FROM
                T1
              WHERE
                RN=1
               
              call     count       cpu    elapsed       disk      query    current        rows
              ------- ------  -------- ---------- ---------- ---------- ----------  ----------
              Parse        1      0.00       0.00          0          0          0           0
              Execute      1      0.00       0.00          0          0          0           0
              Fetch        2      0.60       4.10     178493     178498          0           1
              ------- ------  -------- ---------- ---------- ---------- ----------  ----------
              total        4      0.60       4.10     178493     178498          0           1
               
              Misses in library cache during parse: 1
              Optimizer mode: ALL_ROWS
              Parsing user id: 62  
              Number of plan statistics captured: 1
               
              Rows (1st) Rows (avg) Rows (max)  Row Source Operation
              ---------- ---------- ----------  ---------------------------------------------------
                       1          1          1  SORT AGGREGATE (cr=178498 pr=178493 pw=0 time=4105260 us)
                   60000      60000      60000   TABLE ACCESS FULL T1 (cr=178498 pr=178493 pw=0 time=6543 us cost=14339 size=240000 card=60000)
               
              Elapsed times include waiting on following events:
                Event waited on                             Times   Max. Wait  Total Waited
                ----------------------------------------   Waited  ----------  ------------
                SQL*Net message to client                       3        0.00          0.00
                SQL*Net message from client                     3        0.00          0.00
                Disk file operations I/O                        1        0.00          0.00
                db file sequential read                         1        0.00          0.00
                direct path read                             1404        0.01          3.40
              The test script for the db file scattered read:
              SELECT
                COUNT(*)
              FROM
                T1
              WHERE
                RN=1
               
              call     count       cpu    elapsed       disk      query    current        rows
              ------- ------  -------- ---------- ---------- ---------- ----------  ----------
              Parse        1      0.00       0.00          0          0          0           0
              Execute      1      0.00       0.00          0          0          0           0
              Fetch        2      1.40       5.54     178493     178514          0           1
              ------- ------  -------- ---------- ---------- ---------- ----------  ----------
              total        4      1.40       5.55     178493     178514          0           1
               
              Misses in library cache during parse: 1
              Optimizer mode: ALL_ROWS
              Parsing user id: 62  
              Number of plan statistics captured: 1
               
              Rows (1st) Rows (avg) Rows (max)  Row Source Operation
              ---------- ---------- ----------  ---------------------------------------------------
                       1          1          1  SORT AGGREGATE (cr=178514 pr=178493 pw=0 time=5547810 us)
                   60000      60000      60000   TABLE ACCESS FULL T1 (cr=178514 pr=178493 pw=0 time=7386 us cost=14339 size=240000 card=60000)
               
              Elapsed times include waiting on following events:
                Event waited on                             Times   Max. Wait  Total Waited
                ----------------------------------------   Waited  ----------  ------------
                SQL*Net message to client                       3        0.00          0.00
                SQL*Net message from client                     3        0.01          0.02
                Disk file operations I/O                        1        0.00          0.00
                db file sequential read                         1        0.00          0.00
                db file scattered read                       1415        0.00          4.16
              The TKPROF summaries show that direct path read, in this case, is 35.4% faster. Of course, if the SELECT COUNT(*) FROM T1 where performed twice in each test script, then the results might be expected to be different due to caching.

              Charles Hooper
              http://hoopercharles.wordpress.com/
              IT Manager/Oracle DBA
              K&M Machine-Fabricating, Inc.
              • 4. Re: serial table scan with direct path read compared to db file scattered read
                jgarry
                >
                In short is that correct that Oracle will decide in scenarios like this to use sessions's PGA and have the ability to use temp disk space if needed as opposed to flooding the buffer cache with once used data?

                Table scans should always go to the LRU end and Oracle should use a small number of blocks (i.e. 2 to 32) which would be overwritten leaving the remaining cached table blocks from other segments still in cache. So why bother with direct path reads?
                Think about 1000 users doing direct path reads while 100 others update. Would you rather they all flood the LRU, or some use larger amounts of PGA and temp? Is it reasonable to throw hardware at SGA or PGA?

                It's very illuminating to watch exactly what happens on single tests, but you have to think in terms of aggregates and concurrency to make global decisions.
                • 5. Re: serial table scan with direct path read compared to db file scattered read
                  Mich Talebzadeh
                  Your point Joel
                  Think about 1000 users doing direct path reads while 100 others update. Would you rather they all flood the LRU, or some use larger amounts of PGA and temp? Is it reasonable to throw hardware at SGA or PGA?
                  Sure we can use PGA for this purpose. However, PGA is not scalable (unlike db buffer cache) and in so far as I can see unlike db buffer cache there is no true ageing mechanism/ LRU-MRU chain. One can easily run out of PGA memory with mulitple users doing the same thing +(-4030): ORA-04030: out of process memory when trying to allocate 16328 bytes (koh-kghu call ,pmuccst: adt/record)).+ With buffer cache the likelihood is that with many users the process will be slower but will carry on regardless.

                  Regards,

                  Mich
                  • 6. Re: serial table scan with direct path read compared to db file scattered read
                    Jonathan Lewis
                    Mich Talebzadeh wrote:

                    Table scans should always go to the LRU end and Oracle should use a small number of blocks (i.e. 2 to 32) which would be overwritten leaving the remaining cached table blocks from other segments still in cache. So why bother with direct path reads?
                    Here's a link to some follow-up work I did after Dion Cho published his note: http://jonathanlewis.wordpress.com/2011/03/24/small-tables/

                    Among other things, it's worth noting that a "large" tablescan can rotate through 25% of the buffer cache - the "small window at the end of the LRU" ceased to apply after 9.2

                    Regards
                    Jonathan Lewis
                    http://jonathanlewis.wordpress.com
                    Author: <b><em>Oracle Core</em></b>
                    • 7. Re: serial table scan with direct path read compared to db file scattered read
                      Mich Talebzadeh
                      Hi Charles,

                      I am doing your tests for t1 table using my server.
                      Just to clarify my environment is:
                      I did the whole of this test on my server. My server has I7-980 HEX core processor with 24GB of RAM and 1 TB of HDD SATA II for test/scratch backup and archive. The operating system is RHES 5.2 64-bit installed on a 120GB OCZ Vertex 3 Series SATA III 2.5-inch Solid State Drive. 
                      
                      Oracle version installed was 11g Enterprise Edition Release 11.2.0.1.0 -64bit. The binaries were created on HDD. Oracle itself was configured with 16GB of SGA, of which 7.5GB was allocated to Variable Size and 8GB to Database Buffers.
                      
                      For Oracle tablespaces including SYS, SYSTEM, SYSAUX, TEMPORARY, UNDO and redo logs, I used file systems on 240GB OCZ Vertex 3 Series SATA III 2.5-inch Solid State Drive. With 4K Random Read at 53,500 IOPS and 4K Random Write at 56,000 IOPS (manufacturer’s figures), this drive is probably one of the fastest commodity SSDs using NAND flash memory with Multi-Level Cell (MLC). 
                      Now my T1 table created as per your script and has the following rows and blocks (8k block size)
                       SELECT
                        NUM_ROWS,
                        BLOCKS
                      FROM
                        USER_TABLES
                      WHERE
                        TABLE_NAME='T1';
                        NUM_ROWS     BLOCKS
                      ---------- ----------
                        12000000     178952
                      which is pretty identical to yours.
                      Then I run the query as brelow
                      set timing on
                      ALTER SESSION SET TRACEFILE_IDENTIFIER = 'test_bed_T1';
                      ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 8';
                      SELECT
                              COUNT(*)
                      FROM
                              T1
                      WHERE
                              RN=1;
                      
                      which gives
                      
                        COUNT(*)
                      ----------
                           60000
                      
                      Elapsed: 00:00:05.29
                      tkprof output shows
                      
                      call     count       cpu    elapsed       disk      query    current        rows
                      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
                      Parse        1      0.00       0.00          0          0          0           0
                      Execute      1      0.00       0.00          0          0          0           0
                      Fetch        2      0.02       5.28     178292     178299          0           1
                      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
                      total        4      0.02       5.28     178292     178299          0           1
                      
                      Compared to yours:
                      
                      Fetch        2      0.60       4.10     178493     178498          0           1
                      
                      It appears to me that my CPU utilisation is by order of magnitude better but my elapsed time is worse!
                      
                      Now the way I see it elapsed time = CPU time + wait time. Further down I have
                      
                      Rows     Row Source Operation
                      -------  ---------------------------------------------------
                            1  SORT AGGREGATE (cr=178299 pr=178292 pw=0 time=0 us)
                        60000   TABLE ACCESS FULL T1 (cr=178299 pr=178292 pw=0 time=42216 us cost=48697 size=240000 card=60000)
                      
                      
                      Rows     Execution Plan
                      -------  ---------------------------------------------------
                            0  SELECT STATEMENT   MODE: ALL_ROWS
                            1   SORT (AGGREGATE)
                        60000    TABLE ACCESS   MODE: ANALYZED (FULL) OF 'T1' (TABLE)
                      
                      
                      Elapsed times include waiting on following events:
                        Event waited on                             Times   Max. Wait  Total Waited
                        ----------------------------------------   Waited  ----------  ------------
                        SQL*Net message to client                       3        0.00          0.00
                        SQL*Net message from client                     3        0.00          0.00
                        Disk file operations I/O                        3        0.00          0.00
                        direct path read                             1405        0.00          4.68
                      
                      Your direct path reads are
                      
                        direct path read                             1404        0.01          3.40
                      Which indicates to me you have faster disks compared to mine, whereas it sounds like my CPU is faster than yours.

                      With db file scattered read I get
                      Elapsed: 00:00:06.95
                      
                      call     count       cpu    elapsed       disk      query    current        rows
                      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
                      Parse        1      0.00       0.00          0          0          0           0
                      Execute      1      0.00       0.00          0          0          0           0
                      Fetch        2      1.22       6.93     178293     178315          0           1
                      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
                      total        4      1.22       6.94     178293     178315          0           1
                      
                      Rows     Row Source Operation
                      -------  ---------------------------------------------------
                            1  SORT AGGREGATE (cr=178315 pr=178293 pw=0 time=0 us)
                        60000   TABLE ACCESS FULL T1 (cr=178315 pr=178293 pw=0 time=41832 us cost=48697 size=240000 card=60000)
                      
                      
                      Rows     Execution Plan
                      -------  ---------------------------------------------------
                            0  SELECT STATEMENT   MODE: ALL_ROWS
                            1   SORT (AGGREGATE)
                        60000    TABLE ACCESS   MODE: ANALYZED (FULL) OF 'T1' (TABLE)
                      
                      
                      Elapsed times include waiting on following events:
                        Event waited on                             Times   Max. Wait  Total Waited
                        ----------------------------------------   Waited  ----------  ------------
                        SQL*Net message to client                       2        0.00          0.00
                        Disk file operations I/O                        3        0.00          0.00
                        db file sequential read                         1        0.00          0.00
                        db file scattered read                       1414        0.00          5.36
                        SQL*Net message from client                     2        0.00          0.00
                      
                      compared to your
                      
                        db file scattered read                       1415        0.00          4.16
                      On the face of it with this test mine shows 21% improvement with direct path read compared to db scattered file read. So now I can go back to re-visit my original test results:

                      First default with direct path read
                      call     count       cpu    elapsed       disk      query    current        rows
                      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
                      Parse        1      0.01       0.00          2         47          0           0
                      Execute    100      0.00       0.00          1         51          0           0
                      Fetch      100     10.88    6519.89  194142802  194831012          0         100
                      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
                      total      201     10.90    6519.90  194142805  194831110          0         100
                      
                      
                      CPU ~ 11 sec, elapsed ~ 6520 sec
                      
                      wait stats
                      
                        Event waited on                             Times   Max. Wait  Total Waited
                        ----------------------------------------   Waited  ----------  ------------
                        direct path read                          1517504        0.05       6199.93
                      
                      roughly 0.004 sec for each I/O
                      Now with db scattered file read I get
                      call     count       cpu    elapsed       disk      query    current        rows
                      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
                      Parse        1      0.00       0.00          2         47          0           0
                      Execute    100      0.00       0.00          2         51          0           0
                      Fetch      100    143.44    4298.87  110348670  194490912          0         100
                      ------- ------  -------- ---------- ---------- ---------- ----------  ----------
                      total      201    143.45    4298.88  110348674  194491010          0         100
                      
                      CPU ~ 143 sec, elapsed ~ 4299 sec
                      
                      and waits:
                      
                        Event waited on                             Times   Max. Wait  Total Waited
                        ----------------------------------------   Waited  ----------  ------------
                        db file sequential read                    129759        0.01         17.50
                        db file scattered read                    1218651        0.05       3770.02
                      
                      roughly 17.5/129759 = .00013 sec for single block I/O and  3770.02/1218651 = .0030 for multi-block I/O
                      Now my theory is that the improvements comes from the large buffer cache (8320MB) inducing it to do some read aheads (async pre-fetch). Read aheads are like quasi logical I/Os and they will be cheaper compared to physical I/O. When there is large buffer cache and read aheads can be done then using buffer cache is a better choice than PGA?

                      Regards,

                      Mich
                      • 8. Re: serial table scan with direct path read compared to db file scattered read
                        Mich Talebzadeh
                        Thanks Jonathan.

                        My understanding from your notes
                        a) “Large” tablescans don’t increment the touch count – so avoiding promotion to the hot portion of the buffer
                        b) There is a 25% boundary (ca. 3750 in this case) above which a tablescan will start to recycle the buffer it has used
                        c) There is a 10% boundary (ca. 1540 in this case) below which repeating a scan WILL increment the touch count
                        d) There is a 2% boundary (ca. 300 in this case) below which tablescans will always increment the touch count. 
                        So I gather the main change has been (from 9.2 onwards) that large table scans can take around 25% of cache size but they will be trapped where they are without being promoted to hot area. So the recycling has been shifted from 2-32 blocks at the end of LRU chain to a more dynamic figure for big tables <= ~25%

                        Has this schange come about because there is an implicit assummption that SGAs have grown in size in 10, 11 compared to previous releases, because of availability of more memory and the fact that nowadays there is hardly anyone who uses a 32-bit server (4GB SGA limit)?

                        So in my test case I have a table of 1,943,824 blocks with DEFAULT buffer cache of 8320MB.

                        That is is 8320*1024*1024/8192 = 1,064,960 8K blocks. At 25% max this table can have up to 1,064,960 * 0.25 = 266,240 blocks in memory or roughly 14% of the table. So roughly in 7 cycles the whole table can be scanned in. Does that include read ahead (async pre-fetch) blocks as well?

                        Cheers,

                        Mich
                        • 9. Re: serial table scan with direct path read compared to db file scattered read
                          Charles Hooper
                          Mich Talebzadeh wrote:
                          Hi Charles,

                          I am doing your tests for t1 table using my server.
                          Just to clarify my environment is:
                          I did the whole of this test on my server. My server has I7-980 HEX core processor with 24GB of RAM and 1 TB of HDD SATA II for test/scratch backup and archive. The operating system is RHES 5.2 64-bit installed on a 120GB OCZ Vertex 3 Series SATA III 2.5-inch Solid State Drive. 
                          
                          Oracle version installed was 11g Enterprise Edition Release 11.2.0.1.0 -64bit. The binaries were created on HDD. Oracle itself was configured with 16GB of SGA, of which 7.5GB was allocated to Variable Size and 8GB to Database Buffers.
                          
                          For Oracle tablespaces including SYS, SYSTEM, SYSAUX, TEMPORARY, UNDO and redo logs, I used file systems on 240GB OCZ Vertex 3 Series SATA III 2.5-inch Solid State Drive. With 4K Random Read at 53,500 IOPS and 4K Random Write at 56,000 IOPS (manufacturer’s figures), this drive is probably one of the fastest commodity SSDs using NAND flash memory with Multi-Level Cell (MLC). 
                          (snip)
                          >
                          roughly 17.5/129759 = .00013 sec for single block I/O and 3770.02/1218651 = .0030 for multi-block I/O
                          Now my theory is that the improvements comes from the large buffer cache (8320MB) inducing it to do some read aheads (async pre-fetch). Read aheads are like quasi logical I/Os and they will be cheaper compared to physical I/O. When there is large buffer cache and read aheads can be done then using buffer cache is a better choice than PGA?
                          
                          Regards,
                          
                          Mich
                          Mich,

                          I was curious, so I tried my test script on my laptop. The laptop is running Windows 7 with a quad core Core i7 X940 at 2.13 GHz, with 16GB of memory, and two Crucial C300 256GB SSD drives in a RAID 0 (the laptop only supports SATA 2, not SATA 3). It is important to recognize, at least under the Windows operating system, that the Core i7 processors can and will automatically overclock (and underclock) themselves depending on CPU demand, which effectively means that simple measurement of CPU consumption measured in seconds over a short period of time may be slighly misleading. For example, the CPU in my laptop went from a 14% underclock to a 7% overclock in CPU speed when I performed the db file scattered read test.

                          The starting point:
                          SQL> SHOW SGA
                           
                          Total System Global Area 1.0456E+10 bytes
                          Fixed Size                  2262312 bytes
                          Variable Size            1577061080 bytes
                          Database Buffers         8858370048 bytes
                          Redo Buffers               17948672 bytes
                           
                          SQL> SELECT
                            2    NUM_ROWS,
                            3    BLOCKS
                            4  FROM
                            5    USER_TABLES
                            6  WHERE
                            7    TABLE_NAME='T1';
                           
                            NUM_ROWS     BLOCKS
                          ---------- ----------
                            12000000     179151
                          Test script for the serial direct path read:
                          SQL> SELECT
                            2    COUNT(*)
                            3  FROM
                            4    T1
                            5  WHERE
                            6    RN=1;
                           
                            COUNT(*)
                          ----------
                               60000
                           
                          Elapsed: 00:00:02.66
                          So, the direct path read test completed in 2.66 seconds ~ 526.2 MB/s if all of the time was spent reading from disk.

                          TKPROF for the serial direct path read:
                          SELECT
                            COUNT(*)
                          FROM
                            T1
                          WHERE
                            RN=1
                           
                          call     count       cpu    elapsed       disk      query    current        rows
                          ------- ------  -------- ---------- ---------- ---------- ----------  ----------
                          Parse        1      0.03       0.01          4         63          0           0
                          Execute      1      0.00       0.00          0          0          0           0
                          Fetch        2      1.07       2.62     178492     178497          0           1
                          ------- ------  -------- ---------- ---------- ---------- ----------  ----------
                          total        4      1.10       2.64     178496     178560          0           1
                           
                          Misses in library cache during parse: 1
                          Optimizer mode: ALL_ROWS
                          Parsing user id: 64  
                          Number of plan statistics captured: 1
                           
                          Rows (1st) Rows (avg) Rows (max)  Row Source Operation
                          ---------- ---------- ----------  ---------------------------------------------------
                                   1          1          1  SORT AGGREGATE (cr=178497 pr=178492 pw=0 time=2629080 us)
                               60000      60000      60000   TABLE ACCESS FULL T1 (cr=178497 pr=178492 pw=0 time=16012 us cost=69183 size=240000 card=60000)
                           
                          Elapsed times include waiting on following events:
                            Event waited on                             Times   Max. Wait  Total Waited
                            ----------------------------------------   Waited  ----------  ------------
                            SQL*Net message to client                       3        0.00          0.00
                            SQL*Net message from client                     3        0.04          0.06
                            Disk file operations I/O                        2        0.00          0.00
                            db file sequential read                         1        0.00          0.00
                            direct path read                             1279        0.00          1.57
                          If we just look at the direct path read time, that suggests that the laptop was reading at a rate of approximately 891.5 MB/s
                          179151 * 8192 = 1467604992
                          1467604992 / 1024 / 1024 / 1.57 = 891.48 MB/s
                          --

                          Test script for the db file scattered read:
                          SQL> SELECT
                            2    COUNT(*)
                            3  FROM
                            4    T1
                            5  WHERE
                            6    RN=1;
                           
                            COUNT(*)
                          ----------
                               60000
                           
                          Elapsed: 00:00:02.91
                          So, the db file scattered read test completed in 2.91 seconds ~ 480.97 MB/s if all of the time was spent reading from disk.

                          TKPROF for the db file scattered read:
                          SELECT
                            COUNT(*)
                          FROM
                            T1
                          WHERE
                            RN=1
                          
                          call     count       cpu    elapsed       disk      query    current        rows
                          ------- ------  -------- ---------- ---------- ---------- ----------  ----------
                          Parse        1      0.01       0.01          4         63          0           0
                          Execute      1      0.00       0.00          0          0          0           0
                          Fetch        2      1.87       2.88     178492     178513          0           1
                          ------- ------  -------- ---------- ---------- ---------- ----------  ----------
                          total        4      1.88       2.90     178496     178576          0           1
                          
                          Misses in library cache during parse: 1
                          Optimizer mode: ALL_ROWS
                          Parsing user id: 64  
                          Number of plan statistics captured: 1
                          
                          Rows (1st) Rows (avg) Rows (max)  Row Source Operation
                          ---------- ---------- ----------  ---------------------------------------------------
                                   1          1          1  SORT AGGREGATE (cr=178513 pr=178492 pw=0 time=2886779 us)
                               60000      60000      60000   TABLE ACCESS FULL T1 (cr=178513 pr=178492 pw=0 time=14767 us cost=69183 size=240000 card=60000)
                          
                          
                          Elapsed times include waiting on following events:
                            Event waited on                             Times   Max. Wait  Total Waited
                            ----------------------------------------   Waited  ----------  ------------
                            SQL*Net message to client                       3        0.00          0.00
                            SQL*Net message from client                     3        0.06          0.08
                            Disk file operations I/O                        2        0.00          0.00
                            db file sequential read                         1        0.00          0.00
                            db file scattered read                       1415        0.00          0.79
                          And thiis is where the head scratching begins - only 0.79 seconds spent in the db file scattered read wait event (I repeated the test twice and receive similar results both times), which implies a read rate of 1,771.67 MB/s (this is impossibly fast, just in case you are wondering, so it suggests that there is some strange problem in the time accounting because Oracle on Windows only supports direct asynchronous I/O).

                          If nothing else, the above results show that testing results can be occasionally interesting.

                          --

                          Back to your question. "... inducing it to do some read ahead ..." - I do not think that is the case, rather that some of the blocks previously read from disk remain in memory when the next read request for the same block arrives.

                          Charles Hooper
                          http://hoopercharles.wordpress.com/
                          IT Manager/Oracle DBA
                          K&M Machine-Fabricating, Inc.
                          • 10. Re: serial table scan with direct path read compared to db file scattered read
                            Mich Talebzadeh
                            Charles,

                            I believe Oracle wait events timings are accurate to only 1/100 second (I am trying to find the link somewhere) or 10 ms. The issue is that if a wait is <10 ms, it reports a 0 WaitTime effectively - so the WaitTime can be skewed a bit low. How about CPUTime which is always computed as ElapsedTime - WaitTime.


                            To measure device hit ratios I decided to run the UNIX utility IOSTAT on tests on T1 table, because I felt that it is more accurate for short sessions?

                            iostat -x -d <DEVICE_NAME> 10

                            This will provide extended statistics for a given device for 10 seconds. For sake of brevity I only describe the wait time, the service time and % utilization

                            •     await, average response time (ms) of IO requests to a device. The name is a bit confusing as this is the total response time including wait time in the requests
                            queue (let us call it qutim), and service time that device was working servicing the requests (see next column svctm). So the formula is await = qutim + svctm
                            •     svctm, average time (ms) a device was servicing requests. This is a component of total response time of IO requests.
                            •     qutim = await - svctm
                            iostat -x -d sdc 10
                            
                            Direct path
                            
                            Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
                            sdc             258.94     0.20 346.95  0.60 284983.82     5.19   819.99     2.37    6.81   1.52  52.95
                            
                            qutim (queue time/wait time in request queue)  = 6.81 – 1.52 = 5.29 ms
                            
                            db file scattered read
                            
                            Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
                            sdc            1993.90     0.20 348.70  0.60 285268.80     5.20   816.70     0.93    2.66   1.28  44.68
                            
                            qutim (queue time/wait time in request queue)  = 2.66 – 1.28 = 1.38 ms
                            Somehow it appears that db file scattered read produces better stats from OS level. So I wonder why we are getting this discrepancy?

                            Cheers,

                            Mich
                            • 11. Re: serial table scan with direct path read compared to db file scattered read
                              Charles Hooper
                              Mich,

                              Your comment about "Oracle wait events timings are accurate to only 1/100 second" is correct prior to the release of Oracle Database 9.0.1. Examining the raw 10046 trace from an Oracle Database 8.1.0.x database instance might show something that that looks like this:
                              WAIT #10: nam='db file scattered read' ela= 3 p1=3 p2=13011 p3=3
                              WAIT #10: nam='db file sequential read' ela= 0 p1=3 p2=13018 p3=1
                              WAIT #10: nam='db file sequential read' ela= 0 p1=3 p2=13021 p3=1
                              WAIT #10: nam='file open' ela= 0 p1=0 p2=0 p3=0
                              WAIT #10: nam='db file sequential read' ela= 2 p1=4 p2=151050 p3=1
                              WAIT #10: nam='db file sequential read' ela= 0 p1=4 p2=151055 p3=1
                              WAIT #10: nam='db file sequential read' ela= 1 p1=3 p2=165047 p3=1
                              WAIT #10: nam='db file scattered read' ela= 2 p1=4 p2=209661 p3=4
                              Those ela times are in 1/100 of a second as you correctly described.

                              Examining the raw 10046 trace from an Oracle Database 11.2.0.2 database instance might show something that looks like this (this is from my initial test run on SSD):
                              WAIT #379820632: nam='db file scattered read' ela= 3163 file#=4 block#=4142212 blocks=128 obj#=72736 tim=207192430656
                              WAIT #379820632: nam='db file scattered read' ela= 2941 file#=4 block#=4142340 blocks=124 obj#=72736 tim=207192434476
                              WAIT #379820632: nam='db file scattered read' ela= 2955 file#=4 block#=4142468 blocks=128 obj#=72736 tim=207192438323
                              WAIT #379820632: nam='db file scattered read' ela= 2934 file#=4 block#=4142596 blocks=128 obj#=72736 tim=207192442160
                              WAIT #379820632: nam='db file scattered read' ela= 2868 file#=4 block#=4142724 blocks=128 obj#=72736 tim=207192445977
                              WAIT #379820632: nam='db file scattered read' ela= 3025 file#=4 block#=4142852 blocks=128 obj#=72736 tim=207192450214
                              WAIT #379820632: nam='db file scattered read' ela= 2899 file#=4 block#=4142980 blocks=128 obj#=72736 tim=207192454042
                              WAIT #379820632: nam='db file scattered read' ela= 2859 file#=4 block#=4143108 blocks=128 obj#=72736 tim=207192457807
                              Those ela times are in 1/1000000 of a second (although it might not actually be to that level of precision depending on operating system and Oracle Database version). If we look at the second ela time reported above, 2941, that is 0.002941 seconds, but the time delta between the first two lines (207192434476 - 207192430656) is 0.003820 seconds, so there is some unaccounted for time, as shown above, in each db file scattered read request. I do not have the trace file with me from my laptop's execution of the tests, but I seem to recall that the ela times for each db file scattered read were near 200, which implies 0.000200 seconds (I will have to double check later, but I believe that all of the reported ela times were less than 999).

                              --

                              The AIX man page for iostat (http://publib.boulder.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.cmds%2Fdoc%2Faixcmds3%2Fiostat.htm ) indicates that "The iostat command is used to monitor system input/output (I/O) devices (physical and logical) that are loaded, by observing the time for which these devices are active." So, on AIX it appears that the read requests that are satisfied by the operating system's file cache (which I assume would be considered a logical IO) might lead to overstating the raw performance of a device (there is a reasonable chance that I am wrong here).

                              I am not sure if the Linux command behaves the same as the AIX version of the command. The Linux man page for iostat (http://linux.die.net/man/1/iostat ) describes the various columns that are shown on screen, but some of the details are omitted. For example, "rrqm/s: The number of read requests merged per second that were queued to the device." What does merged mean in this context - why is the statistic 7.7 times larger for the db file scattered read test? The r/s column is defined in the manual as "The number of read requests that were issued to the device per second." If I am understanding those two definitions correctly, that means that there were 5.7 times as many read requests queued for service by the SSD drive per second as there were read requests serviced in the case of the db file scattered read test. The svctm column is supposed to indicate the average service time in 1/1000 of a second - based on the output that you provided, the 6.81 ms result for the direct path read test is roughly the average service time I would expect from an otherwise unused single 10,000 RPM hard drive (it would seem that 0.5 ms would be close to the expected result for an SSD drive).

                              In one of my posts in this thread I stated "Oracle on Windows only supports direct asynchronous I/O". I am curious to know if your operating system's file cache might be influencing your test results. In your initialization parameters, what is the value of the FILESYSTEMIO_OPTIONS parameter? See: http://docs.oracle.com/cd/E14072_01/server.112/e10821/os.htm#i9267

                              There is no doubt that Jonathan (or someone else) can shed some more light on the subject. Until then, keep in mind that there are potentially many factors that can influence performance results (CPU speed throttling up and down, Oracle buffer cache, operating system file cache, demand for the memory or drives or CPU by other processes, activity of other sessions, etc.)

                              Charles Hooper
                              http://hoopercharles.wordpress.com/
                              IT Manager/Oracle DBA
                              K&M Machine-Fabricating, Inc.
                              • 12. Re: serial table scan with direct path read compared to db file scattered read
                                Mich Talebzadeh
                                Hi Charles,

                                Just a quick note for now as per your point in quotes:

                                "In one of my posts in this thread I stated "Oracle on Windows only supports direct asynchronous I/O". I am curious to know if your operating system's file cache might be influencing your test results. In your initialization parameters, what is the value of the FILESYSTEMIO_OPTIONS parameter? ......"
                                 show parameter FILESYSTEMIO_OPTIONS
                                
                                NAME                                 TYPE        VALUE
                                ------------------------------------ ----------- ------------------------------
                                filesystemio_options                 string      SETALL
                                
                                
                                SETALL: enable both asynchronous and direct I/O on file system files.
                                cheers,

                                Mich
                                • 13. Re: serial table scan with direct path read compared to db file scattered read
                                  Jonathan Lewis
                                  Mich,

                                  I've just picked this one up by way of the comment you posted on my blog.
                                  I don't have time to look closely at it, but the thing that crossed my mind was that the number of blocks read by scattered reads was much smaller than the direct path - and you started with an empty cache.

                                  It's possible that the first part of the table was loaded into the whole free cache to get started, and then the rest of the table was cycled through the last 25%. So every tablescan started with a scan through the starting part of the table in memory. I'd have to do the arithmetic to check, and you could check the details of where the reads were.

                                  I'll try to find time to follow up tomorrow.


                                  Regards
                                  Jonathan Lewis
                                  http://jonathanlewis.wordpress.com
                                  Author: <b><em>Oracle Core</em></b>
                                  • 14. Re: serial table scan with direct path read compared to db file scattered read
                                    Jonathan Lewis
                                    I've just done a little bit of arithmetic.

                                    Your cache is about: 1,065,000 blocks, so 75% of your cache is about 80,000 buffers
                                    Your table (below HWM) is about 1,941,000 blocks (assuming direct path read blocks /100)

                                    Assume that your first read loads the cache with the first 1,065,000 blocks of the table and then starts to recycle the last 25% of buffers on the replacement list, then the total number of blocks read would be
                                       1 * 1941000  + 99 * (1941000 - 80000)
                                    which is about 115,000,000 block reads.

                                    This is 5% out compared to the figures you supplied - but it suggests that the idea is in the right ballpark.

                                    One correction to consider -
                                    The direct path read may simply read all the blocks in the table in 1MB chunks - and include the space management blocks in those reads. It's possible that the scattered reads into cache don't read any of the space management blocks that are below the "low high water mark (LHWM)". The tablespace definition could affect the actual number of space management block involved, but it could be in the order of 1% or 2% of the total.

                                    Update: another thought, if examination of the trace file shows that this is the pattern of activity: extract the block id from the row id and count the number of different blocks that hold rows: perhaps the method you used to load the table has resulted in batches of unused blocks being scattered through the table (there have been some very odd bugs with space usage in ASSM).

                                    Update 2: You could create a copy of the table, and do a test where you flush the cache, then read the copy to fill the cache once, and then do your 100 queries. This might make the direct and cached reads match more closely.

                                    Regards
                                    Jonathan Lewis
                                    http://jonathanlewis.wordpress.com
                                    Author: <b><em>Oracle Core</em></b>

                                    Edited by: Jonathan Lewis on Jan 14, 2012 10:39 AM

                                    Edited by: Jonathan Lewis on Jan 14, 2012 10:48 AM
                                    1 2 Previous Next