1 2 Previous Next 15 Replies Latest reply: Jun 19, 2014 9:59 PM by Hemant K Chitale RSS

    Continuous I/O on original table segment, despite ORDER BY

    mtefft

      We have a query that retrieves all 80 million rows from a table, with ORDER BY. It is issued by a JDBC client to populate a database on another platform.

       

      Here is the query and its plan (from dbms_xplan.display_cursor)::

      PLAN_TABLE_OUTPUT

      -----------------------------------------------------------------------------------------------------

      SQL_ID  6ymtaj6kjd9ga, child number 0

      -------------------------------------

      SELECT * FROM EIN ORDER BY EIN_ID

       

      Plan hash value: 3320763040

       

      -----------------------------------------------------------------------------------------------------

      | Id  | Operation            | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |

      -----------------------------------------------------------------------------------------------------

      |   0 | SELECT STATEMENT     |      |       |       |       |    20M(100)|          |       |       |

      |   1 |  SORT ORDER BY       |      |    80M|    78G|    87G|    20M  (1)| 69:50:10 |       |       |

      |   2 |   PARTITION RANGE ALL|      |    80M|    78G|       |  3238K  (1)| 10:47:38 |     1 |1048575|

      |   3 |    TABLE ACCESS FULL | EIN  |    80M|    78G|       |  3238K  (1)| 10:47:38 |     1 |1048575|

      -----------------------------------------------------------------------------------------------------

       

      15 rows selected.

       

      My interpretation of this plan is that it will retrieve all rows into memory and TEMP, perform the sort, and then fetches will begin. So, my expectation is: if the client is actually fetching rows, all of the I/O against the original table/partition segment should already be complete; the retrieval should just be from TEMP.

       

      But that is not what we are seeing via ASH or SQL trace. We see 'db file sequential read' against the table partition, above all else. Why would this be?

       

      This is from a tkprof of a 2-minute trace of the session, about 8 hours after the session started and after millions of rows had been fetched:

       

      SQL ID: 6ymtaj6kjd9ga Plan Hash: 3320763040

       

      SELECT *

      FROM

      EIN ORDER BY EIN_ID

       

      call     count       cpu    elapsed       disk      query    current        rows

      ------- ------  -------- ---------- ---------- ---------- ----------  ----------

      Parse        0      0.00       0.00          0          0          0           0

      Execute      0      0.00       0.00          0          0          0           0

      Fetch     1430      4.82     119.81      27309     143000          0       71500

      ------- ------  -------- ---------- ---------- ---------- ----------  ----------

      total     1430      4.82     119.81      27309     143000          0       71500

      . . .

      Elapsed times include waiting on following events:

        Event waited on                             Times   Max. Wait  Total Waited

        ----------------------------------------   Waited  ----------  ------------

        db file sequential read                     14961        0.12        115.68

        SQL*Net more data to client                 10353        0.00          0.30

        SQL*Net message from client                  1431        0.80          4.10

        SQL*Net message to client                    1430        0.00          0.00

        direct path read temp                          90        0.00          0.02

       

      Examination of the trace file itself shows that the File#'s referenced by the 'db file sequential read' correspond to the tablespace of the table partition segment. The Object ID referenced by these operations correspond to the table partition segment.

       

      Both the cached plan and SQL Monitor showed that 87 GB of TEMP was used. The average row length is 1055 so 87 GB is enough to hold the entirety of every row. (i.e. there is no indication that the sort held just the sort key + rowid, for example)..

       

      We have local SSD drives for TEMP but SAN for the tablespace where EIN is. located. So this is not just idle curiosity...

       

      Other items about this table:

      - The ORDER BY column (EIN_ID) is the primary key of this table, supported by a global unique index. No other indexes (except a LOB index).

      - There is a LOB column, which is defined with ENABLE STORAGE IN ROW. There are no rows with length > 4000 so there should be nothing that is actually stored in the LOB segment. The size of the LOB segment is the default (8MB) so that also corresponds to nothing significant out there.

      - The table is range (interval) partitioned on a different column. There are two partitions, but one of them is empty (it was the 'starting partition').

       

      Version information:

      BANNER

      --------------------------------------------------------------------------------

      Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

      PL/SQL Release 11.2.0.3.0 - Production

      CORE    11.2.0.3.0      Production

      TNS for Linux: Version 11.2.0.3.0 - Production

      NLSRTL Version 11.2.0.3.0 - Production

       

      My question: What is Oracle really doing, that is driving all this I/O to the original table segment, after fetches have already started?

      Any insight would be appreciated.

        1 2 Previous Next