11 Replies Latest reply: May 26, 2012 2:56 AM by Jonathan Lewis RSS

    Re-org performance issue

    user12083970
      We performed a re-org of a table over the weekend and since then we are having performance issues. This table is a child table on which the following happens at a high frequency:
      - SQL*Loader loads data in batches (1 thread only) - Writes only
      - Appln program reads through this table (5 threads) - Reads only

      Row Lock Waits & ITL Waits have been introduced on the table index this week after the re-org. Not sure why these were introduced after the re-org. Have increased the pct-free from 10 to 40. This resulted in small marginal performance gain. Not sure what's going on.
      1) Why were the waits not there before the re-org?
      2) What should be the next steps?

      Re-org was done via the move command and subsequent rebuild of indexes.

      Edited by: user12083970 on Apr 26, 2012 11:43 PM
        • 1. Re: Re-org performance issue
          amardeep.sidhu
          Any specific reasons that you went for a reorg ? I mean any major differences in the size before and after the reorg ?
          • 2. Re: Re-org performance issue
            Jonathan Lewis
            user12083970 wrote:
            We performed a re-org of a table over the weekend and since then we are having performance issues. This table is a child table on which the following happens at a high frequency:
            - SQL*Loader loads data in batches (1 thread only) - Writes only
            - Appln program reads through this table (5 threads) - Reads only

            Row Lock Waits & ITL Waits have been introduced on the table index this week after the re-org. Not sure why these were introduced after the re-org. Have increased the pct-free from 10 to 40. This resulted in small marginal performance gain. Not sure what's going on.
            1) Why were the waits not there before the re-org?
            2) What should be the next steps?

            Re-org was done via the move command and subsequent rebuild of indexes.
            Please describe the evidence that you have, preferably with a sample, to justify your statement about the wait events you've reported.
            Please explain what you mean by "the table index" - bearing in mind that you also suggest that there are several indexes on the table.
            Please let us know which version of Oracle.

            Regards
            Jonathan Lewis
            • 3. Re: Re-org performance issue
              user12083970
              Thanks for looking at my post.
              Oracle Version is 11.2.0.1
              The appln was affected by performance issues since before the re-org, but was tolerable. It went worse after the re-org.
              Non-prod tests showed the re-org would reduce the size by around 40%. Thought this would give boost in production.
              There are two child tables with one index each. The parent was not touched since the re-org did nothing to the size of the parent.

              Here are sizes before and after re-org in PROD.
              --First Child Table

              SQL> SELECT SEGMENT_NAME, segment_type, BYTES, BLOCKS, EXTENTS
              2 FROM USER_SEGMENTS
              3 WHERE segment_NAME IN ('ODSDEFAULT_ODS_DETAIL', 'ODSDEFAULT_ODS_DETAILI1');

              SEGMENT_NAME SEGMENT_TYPE BYTES BLOCKS EXTENTS
              ---------------- ------------------ ---------- ---------- ----------
              ODSDEFAULT_ODS_DETAIL TABLE 3.7434E+10 4569600 357

              ODSDEFAULT_ODS_DETAILI1 INDEX 6291456000 768000 60


              SQL> ALTER TABLE ODS_DETAIL MOVE;

              Table altered.

              SQL> ALTER INDEX ODS_DETAILI1 REBUILD;

              Index altered.

              SQL> SELECT SEGMENT_NAME, segment_type, BYTES, BLOCKS, EXTENTS
              2 FROM USER_SEGMENTS
              3 WHERE segment_NAME IN ('ODSDEFAULT_ODS_DETAIL', 'ODSDEFAULT_ODS_DETAILI1');

              SEGMENT_NAME SEGMENT_TYPE BYTES BLOCKS EXTENTS
              ---------------- ------------------ ---------- ---------- ----------

              ODSDEFAULT_ODS_DETAIL TABLE 2.7787E+10 3392000 265

              ODSDEFAULT_ODS_DETAILI1 INDEX 2831155200 345600 27



              --Second Child Table
              SQL> SELECT SEGMENT_NAME, segment_type, BYTES, BLOCKS, EXTENTS
              2 FROM USER_SEGMENTS
              3 WHERE segment_NAME IN ('ODSDEFAULT_ODS_DEPENDENCY', 'ODSDEFAULT_ODS_DEPENDENCYI1');

              SEGMENT_NAME SEGMENT_TYPE BYTES BLOCKS EXTENTS
              ---------------- ------------------ ---------- ---------- ----------
              ODSDEFAULT_ODS_DEPENDENCY TABLE 5242880000 640000 50

              ODSDEFAULT_ODS_DEPENDENCYI1 INDEX 6710886400 819200 64

              SQL> ALTER TABLE ODSDEFAULT_ODS_DEPENDENCY MOVE;

              Table altered.

              SQL> ALTER INDEX ODSDEFAULT_ODS_DEPENDENCYI1 REBUILD;

              Index altered.

              SEGMENT_NAME SEGMENT_TYPE BYTES BLOCKS EXTENTS
              ---------------- ------------------ ---------- ---------- ----------
              ODSDEFAULT_ODS_DEPENDENCY TABLE 4508876800 550400 43

              ODSDEFAULT_ODS_DEPENDENCYI1 INDEX 3774873600 460800 36


              Row Lock and ITL Waits are found from the hourly AWR. Below is a sample from AWR (after some editing).

              Segments by Row Lock Waits
              Subobject Name     Obj.           Type     Row Lock Waits     % of Capture
              ODSDEFAULT_ODS_DEPENDENCYI1     INDEX     2,875          38.94
              ODSDEFAULT_ODS_DETAILI1          INDEX 2,718          36.81


              Segments by ITL Waits
              Subobject Name     Obj.           Type     ITL Waits     % of Capture
              ODSDEFAULT_ODS_DEPENDENCYI1     INDEX     53          75.71

              RowLock waits appear on both the indexes after the re-org.
              The SQL*Loader isjust one thread inserting into these tables at regualar intervals of 1-2 mins depending on incoming load, and the appln reads these tables via a max of 5 threads. Not that all 5 will be working at all times.
              One observation is that now the appln performance depends directly on the size of these two tables. We are regularly removing processed rows and performance is up for the fist few hours and then starts to degrade. We do a CTAS on both the child tables to keep only the working set at that time. But again, why all this drama after the re-org?
              Any insight would be useful at this time. Do you think a reverse key index on those indexes would help. They are composite keys with leading column being the PK of the parent + one additional key having values of either 0,1,2 or 3.
              • 4. Re: Re-org performance issue
                user12083970
                We moved the two tables from ASSM to Manual mode. This has removed the Row Lock and ITL waits, but introduced Buffer Busy waits(BBW). Have increased freelists to 15, but BBW still appear. We now have a scheduler job to remove the completed txns records to archive tables. The performance has improved only marginally. Need some suggestions from experts.
                • 5. Re: Re-org performance issue
                  Jonathan Lewis
                  user12083970 wrote:
                  We moved the two tables from ASSM to Manual mode. This has removed the Row Lock and ITL waits, but introduced Buffer Busy waits(BBW). Have increased freelists to 15, but BBW still appear. We now have a scheduler job to remove the completed txns records to archive tables. The performance has improved only marginally. Need some suggestions from experts.
                  I'm sorry, I missed your reply the first time around.
                  Ufortunately the information you suppliied doesn't tell us very much - the NUMBER of waits it fairly irrelevant as an indicator of the problem - especially when the number if pretty low - it's the time that is more significant.

                  Can you show us the Top 5 Timed event for a bad hour, along with the the section (I forget the exact title) which is the breakdown of buffer busy waits.
                  At the same time it would be useful to see the Load Profile and the summary information from the top page of the AWR report.

                  You say that all you've got is one writer and 5 readers - but ITL waits, row lock waits and buffer busy waits are usually indicators of multiple concurrent processes modifying data.

                  Regards
                  Jonathan Lewis
                  • 6. Re: Re-org performance issue
                    jgarry
                    user12083970 wrote:
                    We performed a re-org of a table over the weekend and since then we are having performance issues. This table is a child table on which the following happens at a high frequency:
                    - SQL*Loader loads data in batches (1 thread only) - Writes only
                    - Appln program reads through this table (5 threads) - Reads only
                    I've seen this sort of scenario where the waits make the batch take longer, so they overlap, hence write contention. I've also seen it made worse as the optimizer insists on index fast full scan, pounding cpu (small index relative to pool size). Follow Jonathan's instructions to hone in on the real problem. If you have access to the EM top activity screen, you might see what is being victimized, and work backwards to what is doing the victimization. Often there are multiple problems masked by aggregation of wait statistics.
                    • 7. Re: Re-org performance issue
                      user12083970
                      Exactly, the appearance of Row lock and ITL waits baffled me since I cannot explain why it happens in our system with one SQL*Loader (conventional path) process and 5 readers(or workers as we call them). The workers only read from the detail & dependency tables. They do not modify them.
                      Currently, with the introduction of the archival process and move to Manual mode, the system looks like this:
                      - 1 SQL*Loader (conventional path) process loading into Detail & Dependency tables regularly
                      - 5 workers processing the txns (we increased them to 7 with not much benefit)
                      - 1 Archival process which archives the Processed Detail/Dependency rows and deletes them from Detail/Dependency tables. Frequency of run is every 5 mins.
                      (Tables and indexes are setup with pctfree 40%, freelists 15, freelist groups 2)

                      I understand the buffer busy waits are a result of some contribution from the introduction of archival process and the switch to manual mode. Maybe, freelist settings are not the optimum.
                      However, due to this new aggressive archival process, the sizes of the Detail & Dependency tables have significantly reduced.
                      Detail table after the re-org occupied around 27GB, now it occupies 629MB(76800 blocks). Index is another 100MB.
                      Dependency table after the re-org occupied around 4.5GB, now it occupies 104MB(12800 blocks). Index is another 100MB.

                      This has given rise to new proposal:
                           - Increase frequency of the archival even more (every 3 mins) to reduce the sizes further
                           - Drop the index on dependency table due to the small size of the table
                           - Cache both the tables in memory (keep pool). Not sure if that would be beneficial to address buffer busy waits.

                      Below are the sections from AWR.

                      Top 5 Timed Foreground Events
                      Event          Waits          Time(s)          Avg wait (ms)     % DB time     Wait Class
                      db file sequential read     811,909          10,193          13          37.44          User I/O
                      DB CPU               -          4,473          -          16.43          -
                      log file sync          68,391          3,492          51          12.83          Commit
                      library cache pin     555,281          2,118          4          7.78          Concurrency
                      DFS lock handle     16,231          1,427          88          5.24          Other



                      Load Profile

                                Per Second     Per Transaction     Per Exec     Per Call
                      DB Time(s):     7.6          0.2     0.00     0.01
                      DB CPU(s):     1.3          0.0     0.00     0.00
                      Redo size:     2,054,161.5     41,907.0          
                      Logical reads:     29,979.2     611.6          
                      Block changes:     10,346.8     211.1          
                      Physical reads:     307.5          6.3          
                      Physical writes: 393.6          8.0          
                      User calls:     722.8          14.8          
                      Parses:          666.5          13.6          
                      Hard parses:     8.7          0.2          
                      W/A MB processed: 3.5          0.1          
                      Logons:          0.8          0.0          
                      Executes:     2,308.9     47.1          
                      Rollbacks:     0.0          0.0          
                      Transactions:     49.0          -          


                      Segments by Buffer Busy Waits
                      Object Name          Obj. Type     Buffer Busy Waits     % of Capture
                      SHMT_RTEI11          INDEX          248     30.62
                      ODSDEFAULT_ODS_DETAIL     TABLE          148     18.27
                      ...


                      Buffer Wait Statistics
                      Class          Waits     Total Wait Time (s)     Avg Time (ms)
                      data block          90,159          776          9
                      undo header     881          7          8
                      file header block     16          2          109
                      undo block          444          1          2
                      1st level bmb     14          1          41
                      free list          176          0          2
                      2nd level bmb     10          0          1
                      bitmap index block     3          0          0



                      Thanks in advance.
                      • 8. Re: Re-org performance issue
                        Jonathan Lewis
                        user12083970 wrote:
                        Exactly, the appearance of Row lock and ITL waits baffled me since I cannot explain why it happens in our system with one SQL*Loader (conventional path) process and 5 readers(or workers as we call them). The workers only read from the detail & dependency tables. They do not modify them.
                        Currently, with the introduction of the archival process and move to Manual mode, the system looks like this:
                        - 1 SQL*Loader (conventional path) process loading into Detail & Dependency tables regularly
                        - 5 workers processing the txns (we increased them to 7 with not much benefit)
                        - 1 Archival process which archives the Processed Detail/Dependency rows and deletes them from Detail/Dependency tables. Frequency of run is every 5 mins.
                        So do we think that maybe a process that is busy deleting data from the tables at the same time as another process is loading the data might be the cause of the buffer busy waits and ITL waits - this description is very different from your original description.
                        Below are the sections from AWR.
                        Top 5 Timed Foreground Events
                        Event                      Waits          Time(s)          Avg wait (ms)     % DB time                   Wait Class
                        db file sequential read                 811,909          10,193          13          37.44          User I/O
                        DB CPU                 -          4,473           -          16.43           -
                        log file sync                      68,391          3,492          51          12.83          Commit
                        library cache pin                 555,281          2,118          4          7.78          Concurrency
                        DFS lock handle                 16,231          1,427          88          5.24          Other
                        a) DFS lock handle usually indicates RAC - but you haven't mentioned it up to this point.
                        b) BBW and ITL don't appear in the top 5, and you have an average single block read time of 13 ms - which is reallly bad and could easily be contributing to BBW time, but you think the BBW and ITL times are the thing to chase ?
                        c) log file sync waits are running at an average of 0.05 seconds - which is reallly bad. Perhaps it's a side effect of very large writes associated with SQL*Load, but you need to check that before you worry about BBW.
                        Load Profile
                        
                                  Per Second     Per Transaction     Per Exec     Per Call
                        Redo size:                      2,054,161.5     41,907.0            
                        Physical reads:      307.5           6.3            
                        Physical writes:            393.6           8.0            
                        User calls:                      722.8           14.8            
                        Parses:            666.5           13.6            
                        Hard parses:      8.7           0.2            
                        Executes:                      2,308.9                     47.1            
                        Transactions:      49.0           -            
                        Only one processing deleting - but 49 transactions per second and 2,308 executions per second. I'd guess that the SQL*Load isn't responsible for those figures - unless there's a use of sequences with the ORDER and NOCACHE clauses in place - so you probably need to look at how you're processing your delete.
                        Segments by Buffer Busy Waits
                        Object Name          Obj. Type     Buffer Busy Waits      % of Capture
                        SHMT_RTEI11           INDEX          248     30.62
                        ODSDEFAULT_ODS_DETAIL     TABLE          148     18.27
                        
                        Buffer Wait Statistics
                        Class          Waits     Total Wait Time (s)     Avg Time (ms)
                        data block          90,159          776          9
                        undo header     881          7          8
                        file header block     16          2          109
                        undo block          444          1          2
                        1st level bmb     14          1          41
                        free list          176          0          2
                        2nd level bmb     10          0          1
                        bitmap index block     3          0          0
                        Apparently only about 800 buffer busy waits captured - out of 91,600: which suggests that most of them were "read by other session", which means you need to go and look at all those single block reads and figure out why they take such a long time. How many indexes are you updating as you load data, how randomly are you hitting indexes as you delete data.

                        And you're asking us about 780 seconds of buffer busy waits when you have 10,000 seconds of random I/O to worry about, and 2,000 seconds of library cache pins (which probably relate to the 2,308 executions per second - and are you increasing that problem by doing some stats collection at the end of each data load)

                        Regards
                        Jonathan Lewis
                        • 9. Re: Re-org performance issue
                          user12083970
                          Thanks for all the suggestions, and sorry for replying after a long time...
                          The workers were increased first and then the Oracle SGA was increased. The SGA increase wiped out all issues.
                          The system now looks like this:
                          - 1 SQL*Loader (conventional path) process loading into Detail & Dependency tables regularly
                          - 8 workers processing the txns
                          - 1 Archival process which archives the Processed Detail/Dependency rows and deletes them from Detail/Dependency tables. Frequency of run is every hour.

                          My original query on why a re-org caused an immediate persistent performance issue (only to be resolved by increase in memory) remains unresolved.
                          T1 - Performance is acceptable
                          T2 - Reorg is done causing perfomance issues
                          T3 - SGA is increased which resolves all issues and performance is better than T1

                          If the solution was as simple as increasing the SGA, how could this be identified earlier? What indicators need to be looked at?

                          Thanks
                          • 10. Re: Re-org performance issue
                            rp0428
                            >
                            My original query on why a re-org caused an immediate persistent performance issue (only to be resolved by increase in memory) remains unresolved.
                            T1 - Performance is acceptable
                            T2 - Reorg is done causing perfomance issues
                            T3 - SGA is increased which resolves all issues and performance is better than T1

                            If the solution was as simple as increasing the SGA, how could this be identified earlier? What indicators need to be looked at?
                            >
                            Your case should show you how important it is to backup important data before making changes to production systems.

                            By 'backup important data' I am talking about metadata for the objects involved in your 'reorg' whatever that means; you never did explain exactly what you did that you are calling a 'reorg'.

                            An important tool for troubleshooting problems, especially problems that manifest themselves AFTER some sort of change, is comparing information from before the change to information from after the change. It is simply best if ALL the changes that occured, intentional or not, can be identified.

                            Without knowing what you did to 'reorg' your table it's difficult to be more specific but here are some things that could have been done, before and after the change, to make it easier to determine exactly what changed. Not all of these are always necessary.

                            1. Document the reasons why you doing a 're-org' on the table. Do you realize you didn't even mention the reason in your thread? Perhaps the re-org wasn't even needed and whatever problem you thought you had could have been addressed in some other way.

                            2. Extract and save the DDL for the entire table (table, indexes, triggers, et al). This would include that storage parameters being used (PCTFREE, PCTUSED, et al). You said you changed PCTFREE from 10 to 40 but are you certain it was 10 before the 'reorg'? Or did you just notice that it was 10 after the reorg and so changed it to 40. Sounds minor but that can be important.
                            What were INITRANS and MAXTRANS before the reorg? I'd wager that you don't know. What are they now? Did they change? If they changed could that have an impact?

                            3. Extract and save information about the number of records, number of blocks and number of extents being used by the table and each index. Good - looks like you did this.

                            4. Extract and save execution plan information for the key queries that use the table and indexes. You should always have this information for all of the key production queries whether you plan any reorg or not. The information is valuable if a performance problem arises regardless of the cause.

                            Now that you presumably have the problem solved how about sharing the information you had about the 'problem' indications, what troubleshooting steps you took, what options/solutions you considered and why you chose a table reorg?
                            • 11. Re: Re-org performance issue
                              Jonathan Lewis
                              user12083970 wrote:

                              My original query on why a re-org caused an immediate persistent performance issue (only to be resolved by increase in memory) remains unresolved.
                              T1 - Performance is acceptable
                              T2 - Reorg is done causing perfomance issues
                              T3 - SGA is increased which resolves all issues and performance is better than T1
                              T1 - if performance was acceptable, why did you do a re-org ?
                              We can't tell you why a re-org actually made things worse - beyond observing that the people who understand these things have often pointed out that re-orgs can cause problems if you don't understand your data and don't know what real problem you're trying to fix.
                              If the solution was as simple as increasing the SGA, how could this be identified earlier? What indicators need to be looked at?
                              As I pointed out when you posted the Top 5 - the things you were complaining about weren't visible as major resource consumers (that doesn't mean they weren't significant to your specific process of course), but the things that were in the Top 5 showed that your system as a whole was suffering from an I/O overload (slow times related to I/O). I think I also pointed out that the things you were complaining about were potential side effects of an I/O overload. Such I/O overload is typically indicative of (a) inefficient SQL accessing and discarding excess data or (b) limited memory resources.

                              If I were you I would go back to your earlier (pre increase of SGA) and check the execution paths of the most expensive "SQL ordered by reads" - which might now be in the top "SQL ordered by CPU".

                              Regards
                              Jonathan Lewis