3 Replies | Latest reply: Feb 23, 2011 10:05 AM by 687216

    issue with "read by other session" and a parallel MERGE query

    687216
      Hi everyone,

      we have run into an issue with a batch process updating a large table (12 million rows / a few GB, so it's not that large). The process is quite simple - load the 'increment' from a file into a working table (INCREMENT_TABLE) and apply it to the main table using a MERGE. The increment is rather small (usually less than 10k rows), but the MERGE runs for hours (literally) although the execution plan seems quite reasonable (can post it tomorrow, if needed).

      The first thing we've checked is AWR report, and we've noticed this:

      Top 5 Timed Foreground Events
      Event                     Waits        Time(s)   Avg wait (ms)   % DB time   Wait Class
      DB CPU                                  10,086                   43.82
      read by other session     3,968,673     9,179    2               39.88       User I/O
      db file scattered read    1,058,889     2,307    2               10.02       User I/O
      db file sequential read     408,499       600    1                2.61       User I/O
      direct path read            132,430       459    3                1.99       User I/O
      So obviously most of the time was consumed by "read by other session" wait event. There were no other queries running at the server, so in this case "other session" actually means "parallel processes" used to execute the same query. The main table (the one that's updated by the batch process) has "PARALLEL DEGREE 4" so Oracle spawns 4 processes.

      I'm not sure how to fix this. I've read a lot of details about "read by other session" but I'm not sure it's the root cause - in the end, when two processes read the same block, it's quite natural that only one does the physical I/O while the other waits. What really seems suspicious is the number of waits - 4 million waits means 4 million blocks, 8kB each. That's about 32GB - the table has about 4GB, and there are less than 10k rows updated. So 32 GB is a bit overkill (OK, there are indexes etc. but still, that's 8x the size of the table).

      So I'm thinking that the buffer cache is too small - one process reads the data into cache, then it's removed and read again. And again ...

      One of the recommendations I've read was to increase the PCTFREE, to eliminate 'hot blocks' - but wouldn't that make the problem even worse (more blocks to read and keep in the cache)? Or am I completely wrong?

      The database is 11gR2, the buffer cache is about 4GB. The storage is a SAN (but I don't think this is the bottleneck - according to the iostat results it performs much better in case of other batch jobs).
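      To check whether the cache-thrashing theory holds, two quick diagnostic queries against the standard V$ views may help (a sketch only; requires SELECT privilege on the dynamic performance views, and the second query captures waits at the moment it runs):

```sql
-- Current size of the default buffer cache:
SELECT component, current_size / 1024 / 1024 AS size_mb
  FROM v$sga_dynamic_components
 WHERE component = 'DEFAULT buffer cache';

-- Which blocks sessions are currently waiting on for "read by other session"
-- (P1 = file#, P2 = block#); many distinct blocks suggests cache churn
-- rather than a few hot blocks:
SELECT p1 AS file_id, p2 AS block_id, COUNT(*) AS sessions_waiting
  FROM v$session_wait
 WHERE event = 'read by other session'
 GROUP BY p1, p2
 ORDER BY sessions_waiting DESC;
```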
        • 1. Re: issue with "read by other session" and a parallel MERGE query
          Ganesh Srivatsav
          When you are simply merging two tables, there isn't much you can gain from parallelism, because the different sets of data need to be merged again in order to produce the final result.

          I would prefer a hash join in this case as you have a fairly small table against a large table.

          Please post your query and the explain plan so that we can help more accurately.

          G.
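
          A hash join can be requested with hints. A minimal sketch, using the aliases and the few columns actually posted in this thread (p_process_run_key is a PL/SQL variable from the original query), might look like:

```sql
-- Sketch only: hinting a hash join instead of nested loops.
-- LEADING(S) asks the optimizer to start with the small increment table,
-- USE_HASH(D) to hash-join it to the large target.
MERGE /*+ leading(S) use_hash(D) */ INTO T_NOTUNIFIED_CLIENT D
USING (SELECT DIFF_FLAG, NCLIENT_KEY
         FROM TMP_SODW_BB) S
ON (D.NCLIENT_KEY = S.NCLIENT_KEY
    AND D.CURRENT_RECORD = 'Y'
    AND S.DIFF_FLAG IN ('U', 'D'))
WHEN MATCHED THEN
  UPDATE SET D.VALIDITY_TO_DATETIME   = SYSDATE,
             D.UPDATE_PROCESS_RUN_KEY = p_process_run_key;
```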
          • 2. Re: issue with "read by other session" and a parallel MERGE query
            687216
            OK, so the query is something like this (I've removed pieces that are not important - column lists etc.)
              MERGE /*+ parallel(D DEFAULT)*/ INTO T_NOTUNIFIED_CLIENT D /*+ append */
              USING (SELECT
                  DIFF_FLAG,
                  NCLIENT_KEY,
                  CLIENT_KEY,
                  CLIENT_SOURCE_ID,
                  SOURCE_SYSTEM_KEY,
                  SYSTEM_SOURCE_ID
                  FROM TMP_SODW_BB) S
              ON (D.NCLIENT_KEY = S.NCLIENT_KEY AND D.CURRENT_RECORD = 'Y' AND S.DIFF_FLAG IN (FLAG_UPDATED, FLAG_DELETED))
                 WHEN MATCHED
                 THEN
                       UPDATE
                       SET D.VALIDITY_TO_DATETIME = SYSDATE,
                           D.UPDATE_PROCESS_RUN_KEY = p_process_run_key
                 WHEN NOT MATCHED
                 THEN
                       INSERT (
                           ... column list of T_NOTUNIFIED CLIENT ...
                       )
                       VALUES (
                           ... values from S ...
                       );
            What the query does: it takes the rows from the increment that were marked 'updated' or 'deleted' and 'closes' the matching records in the large table. If there's no matching row, it inserts a new one.

            This is the execution plan
            OPERATION                       OBJECT_NAME             OPTIONS         COST
            MERGE STATEMENT                                                         936825 
              MERGE                         T_NOTUNIFIED_CLIENT
                PX COORDINATOR
                  PX SEND                   :TQ10000                QC (RANDOM)     936825
                    VIEW 
                      NESTED LOOPS                                  OUTER           936825
                        PX BLOCK                                    ITERATOR
                          TABLE ACCESS      TMP_SODW_BB             FULL            2 
                        VIEW
                          FILTER 
                            Filter Predicates 
                              OR 
                                DIFF_FLAG='D' 
                                DIFF_FLAG='U' 
                            TABLE ACCESS    T_NOTUNIFIED_CLIENT     FULL            3925 
                              Filter Predicates 
                                AND 
                                  D.NCLIENT_KEY=NCLIENT_KEY 
                                  D.CURRENT_RECORD='Y' 
            The "T_NOTUNIFIED_CLIENT" table is the large one (the one that is updated in the MERGE), TMP_SODW_BB is the small one (contains the increment).

            There is an index on T_NOTUNIFIED_CLIENT.NCLIENT_KEY (a composite one), so I'm a bit surprised by the FULL scan of T_NOTUNIFIED_CLIENT. This seems like the real root cause, as the FULL scan is repeated for each row in the increment, so the cost skyrockets.

            The index seems to be OK - it's marked as VALID, so I'm not sure why it's not used (no, I don't think index scans are always better than full scans, but in this case I believe one would be).
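
            One thing worth checking (a suggested diagnostic, using the standard USER_IND_COLUMNS dictionary view) is whether NCLIENT_KEY is actually the leading column of that composite index, since a range scan driven by the ON predicate is only possible when the joined column leads the index:

```sql
-- List the column order of every index on the target table.
-- For the MERGE join to use a range scan, NCLIENT_KEY should be at
-- COLUMN_POSITION 1 (or be preceded only by columns with equality predicates).
SELECT index_name, column_name, column_position
  FROM user_ind_columns
 WHERE table_name = 'T_NOTUNIFIED_CLIENT'
 ORDER BY index_name, column_position;
```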
            • 3. Re: issue with "read by other session" and a parallel MERGE query
              687216
              OK, so a bit more details - we've managed to significantly decrease the estimated cost and runtime. All we had to do was a small change in the SQL - instead of
              MERGE /*+ parallel(D DEFAULT)*/ INTO T_NOTUNIFIED_CLIENT D /*+ append */
                USING (SELECT
                      ...
                    FROM TMP_SODW_BB) S
                ON (D.NCLIENT_KEY = S.NCLIENT_KEY AND D.CURRENT_RECORD = 'Y' AND S.DIFF_FLAG IN ('U', 'D'))
                ...
              (which is the query listed above) we have done this
              MERGE /*+ parallel(D DEFAULT)*/ INTO T_NOTUNIFIED_CLIENT D /*+ append */
                USING (SELECT
                      ...
                  FROM TMP_SODW_BB WHERE DIFF_FLAG IN ('U', 'D')) S
                ON (D.NCLIENT_KEY = S.NCLIENT_KEY AND D.CURRENT_RECORD = 'Y')
                ...
              i.e. we have moved the condition from the MERGE ON clause to the SELECT. And suddenly, the execution plan is this
              OPERATION                           OBJECT_NAME             OPTIONS             COST 
              MERGE STATEMENT                                                                 239 
                MERGE                             T_NOTUNIFIED_CLIENT
                  PX COORDINATOR
                    PX SEND                       :TQ10000                QC (RANDOM)         239 
                      VIEW 
                        NESTED LOOPS                                      OUTER               239
                          PX BLOCK                                        ITERATOR
                            TABLE ACCESS          TMP_SODW_BB             FULL                2
                              Filter Predicates 
                                OR 
                                  DIFF_FLAG='D'
                                  DIFF_FLAG='U' 
                            TABLE ACCESS          T_NOTUNIFIED_CLIENT       BY INDEX ROWID    3 
                              INDEX               AK_UQ_NOTUNIF_T_NOTUNI    RANGE SCAN        2 
                                Access Predicates 
                                  AND 
                                    D.NCLIENT_KEY(+)=NCLIENT_KEY
                                    D.CURRENT_RECORD(+)='Y' 
                                Filter Predicates 
                                  D.CURRENT_RECORD(+)='Y' 
              Yes, I know the queries are not exactly the same - but we can fix that. The point is that the TMP_SODW_BB table contains 1639 rows in total, and 284 of them match the moved 'IN' condition. Even if we remove the condition altogether (i.e. 1639 rows have to be merged), the execution plan does not change (the cost increases to about 1300, which is proportional to the number of rows).

              But with the original IN condition (which turns into an OR combination of predicates) in the MERGE ON clause, the cost suddenly skyrockets to 990,000 and it's damn slow. It seems like a problem with cost estimation, because once we remove one of the values (so there's only one value in the IN list), it works fine again. So I guess it's an optimizer/estimator issue ...
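
              If it really is a costing problem, one cheap thing to try first is refreshing the optimizer statistics on the freshly loaded increment table, so the optimizer sees its real (small) row count. A sketch using the standard DBMS_STATS package (parameter values are just reasonable defaults, not something prescribed in this thread):

```sql
-- Re-gather statistics on the increment table after each load;
-- AUTO_SAMPLE_SIZE and CASCADE (include indexes) are standard options.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => USER,
    tabname          => 'TMP_SODW_BB',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);
END;
/
```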