9 Replies Latest reply: Mar 25, 2014 5:25 AM by user5817878 RSS

    11gR2 vs 12c ON HP-UX 11iv3!!

    user5817878

      Hi folks,

       

      I'm neither a database expert nor am a professional dba, but been using Oracle databases in my test environments for quite some time.

      I have been working on comparing single instance 11.2.0.3 database instance vs. 12.1 database instance for similar workloads. Don't ask me why.

       

      During a 4 hour load test, I see that 12c fares pathetically when compared to 11gR2. Top wait events list "enq: US - Contention" and "latch: enqueue chains".

      On an average my TPS would be around 2500 and suddenly it drops to less than 500. When I check the v$session_wat view, I see that 80% of the sessions are in the above events when the TPS numbers fall. Following is a snippet:

       

            3683 enq: US - contention                                             Other                                                                          0                   -1

            3684 enq: US - contention                                             Other                                                                     733594                   -1

            3685 enq: US - contention                                             Other                                                                     409950                   -1

            3686 buffer busy waits                                                Concurrency                                                               188612                   -1

            3687 enq: US - contention                                             Other                                                                     997039                   -1

            3688 enq: US - contention                                             Other                                                                    3736050                   -1

            3689 enq: US - contention                                             Other                                                                          0                   -1

            3690 latch: enqueue hash chains                                       Other                                                                          0                   -1

            3691 enq: US - contention                                             Other                                                                          0                   -1

            3692 enq: US - contention                                             Other                                                                     952676                   -1

            3693 enq: US - contention                                             Other                                                                    1513398                   -1

            3694 enq: US - contention                                             Other                                                                     109772                   -1

       

      I have moved my redo log files and undo datafile onto SSD based storage, increased redo log groups, but I still see the same wait events.

      11gR2 does not seem to have this kind of suffering. I installed 11.2.0.3 database instance on a different boot disk and run the same workload. I do not see these wait events and find no sudden drops in TPS numbers.

      Undo retention values and undo tablespace sizes are same across both instances.

       

      My configuration is as follows:

      64 Gb RAM, 16 cores, general purpose database on VxFS file systems mounted with CIO and delaylog options.

      DB block size 8KB and FS block size 8KB. DB running in dedicated server mode. No archive logging. No flashback recovery. 6 Redo log groups with 2 (1 GB) members each.

      Values for sessions, open_cursors and processes parameters were bumped-up sufficiently.

       

      Tuning:

      Bumped-up shmmax (accommodate memeory_target of 48Gb), semmnu, nproc, nkthread, maxuprc, and a few other over the recommendation made by OUI.

       

      Workload:

      Swingbench 2.5, OE schema, 10GB seed data, 600 users load,

       

      Currently running the same load on 11.2.0.3 after moving redo log files onto SSD based storage to check throughput numbers. I have the awr and addm reports collected during the stress runs.

      Need some support to identify and resolve the bottleneck. This is a test environment, so feel free to suggest config changes as well.

       

      Thanks in advance,

      Rajesh

        • 1. Re: 11gR2 vs 12c ON HP-UX 11iv3!!
          user5817878

          Anyone got any clue? Here is some more data.

           

          12c Top Wait Events:

           

          EventWaitsTotal Wait Time (sec)Wait Avg(ms)% DB timeWait Class
          enq: US - contention1,819,3141.6M87544.1Other
          log file sync7,730,550645.5K8317.9Commit
          latch: enqueue hash chains4,704,762586.3K12516.3Other
          latch free5,679,146363.8K6410.1Other
          enq: HW - contention91,227157.3K17244.4Configuration
          buffer busy waits300,900116.6K3873.2Concurrency
          DB CPU 57.8K 1.6
          library cache: mutex X2,602,24935K131.0Concurrency
          latch: cache buffers chains47,59434.5K7261.0Concurrency
          enq: SQ - contention56410.1K17943.3Configuration

           

          11g Top Wait Events:

          Wait ClassWaits%Time -outsTotal Wait Time (s)Avg wait (ms)%DB time
          Concurrency12,466,11801,573,44912647.86
          Configuration697,7700812,616116524.72
          Commit21,281,6220788,1693723.98
          DB CPU 82,113 2.50
          Other347,4654823,278670.71
          User I/O1,726,633011,75070.36
          Application3,18303601130.01
          Network42,834,628016900.01
          System I/O0 0 0.00

           

          I have captured how undo retention was being tuned during peak load and they stand as follows. Got them from v$UNDOSTAT view:

           

          Time Period11g  Tuned UNDO Retention12c  Tuned UNDO Retention
          0:1011381831
          0:2017451761
          0:3011441162
          0:4017511768
          0:5011541787
          1:0017602392
          1:1011601179
          1:2017671782
          1:3011691805
          1:4017141790
          1:5011182412
          2:0017252416
          • 2. Re: 11gR2 vs 12c ON HP-UX 11iv3!!
            Richard Harrison .

            Hi,

            Some algorithmn must have changed i guess between 11.2 and 12.1 - have you taken a look at metalink note 1332738.1?

            This has some general workarounds for all versions - you could try some of these in 12 to see if you can get heaviour back to how it was in 11.2?

            Are you on the latest psu?

             

            Cheers,

            Rich




            • 3. Re: 11gR2 vs 12c ON HP-UX 11iv3!!
              jgarry

              One thing that strikes me:  The total wait time of the top wait of each is about the same, even though the wait is different.  If your tps is wildly different, that seems to say the wait interface has some failing that hides the real problem.  Beyond Oracle Wait Interface - Part 2 | Tanel Poders blog: Responsible data management

               

              Also, I must add sillly questions:  Did you load the data in exactly the same way?  Are the same plans being used to access the data?

              • 4. Re: 11gR2 vs 12c ON HP-UX 11iv3!!
                chris_c

                I would take a look at v$transaction and v$sesswait while the load is running, whats strikes me is that most wait events in 12c are undo and redo related so I would look for transactions that are generating more undo/redo than in 11.2. If you see a difference for specific transactions then you need to figure out why, possibilities include new features being enabled or bugs, if you get a clear difference in execution plan or undo/redo generation then raising an SR is the next logical step.

                • 5. Re: 11gR2 vs 12c ON HP-UX 11iv3!!
                  user5817878

                  I just copied wait class table from AWR for 12c. I see that "enq: US contention" & "latch: hash enqueue chains" wait events constitute 60% of the database time while the same wait events constitute less than 0.20% in 11gR2.

                  Created a general purpose database for both versions and seeded them independently. Haven't used execution plans from 11g in 12c or vice versa.

                  • 6. Re: 11gR2 vs 12c ON HP-UX 11iv3!!
                    user5817878

                    Chris,Will capture the stats from the said view in my next run. I have ran the same SwingBench workload against both keeping insert, update and select ratios(approx. 20%, 10% & 70%). Statistics from the runs indicate the same.

                    Not sure if we have a support agreement with Oracle, so raising an SR with Oracle is not an options for me. Not sure but it looked like Oracle 12c was GAed on HP-UX without any beta versions or any kind of early partner drops, so chances of hitting bugs is high.

                    • 7. Re: 11gR2 vs 12c ON HP-UX 11iv3!!
                      user5817878

                      Richard, Thanks for the metalink doc pointer!!  I'm a bit novice here, so pardon if my queries sound silly.

                      It suggests setting values for _rollback_segment_count & _highthreshold_undoretention manually. How do determine that what value is appropriate for my workload?

                      Also, patch for Bug:14226599 is not available for 12.1. Can I safely assume that if it is applicable to 12c, it would have already been part of 12c during its release?

                      • 8. Re: 11gR2 vs 12c ON HP-UX 11iv3!!
                        jgarry

                        Are you saying you haven't looked at execution plans?  That's always a potential problem when trying a benchmark like this.  An assumption must be made that the code is well-tuned, and that requires some kind of plan stability.  I have no idea how swingbench works so maybe someone could enlighten me on how or whether it handles this.  In general, you use AWR (if licensed) or statspack to point you in the direction of where performance problems lie, and then proceed to work on them.  Most problems are SQL related: the assumption that 11g and 12c are going to use the same plans with the same data and code is not a good one on its face (though again, I don't know what swingbench does about this) - and not only that, the optimizer has adaptation features and misfeatures, so might not be the same from run to run or might even depend on what was run before.  When the more broad tools point towards some particular SQL, we have a methodology for that: HOW TO: Post a SQL statement tuning request - template posting

                         

                        You might also look around for white papers like http://www.oracle.com/technetwork/database/bi-datawarehousing/twp-optimizer-with-oracledb-12c-1963236.pdf

                        • 9. Re: 11gR2 vs 12c ON HP-UX 11iv3!!
                          user5817878

                          Thanks for the pointer JGarry. Still figuring out how to migrate execution plans between 11g - 12c and will give it a run once comfortable with the process.

                          I tried the simplest solution that struck me after skimming the doc. I set "OPTIMIZER_ADAPTIVE_REPORTING_ONLY=true" and reran the same workload.

                          Found about 60% performance improvement over previous run. Significant, but its still much less than what I got from 11.2.0.3.

                           

                          Undo space contentions wait events dropped in number but "wait average" time for each wait has gone up. So is the case with "latch: enqueue hash chains" wait event.

                           

                          Below are the top 10 wait events from AWR. Second table is with "OPTIMIZER_ADAPTIVE_REPORTING_ONLY=true"

                           

                          EventWaitsTotal Wait Time (sec)Wait Avg(ms)% DB timeWait Class
                          enq: US - contention1,819,3141.6M87544.1Other
                          log file sync7,730,550645.5K8317.9Commit
                          latch: enqueue hash chains4,704,762586.3K12516.3Other
                          latch free5,679,146363.8K6410.1Other
                          enq: HW - contention91,227157.3K17244.4Configuration
                          buffer busy waits300,900116.6K3873.2Concurrency
                          DB CPU 57.8K 1.6
                          library cache: mutex X2,602,24935K131.0Concurrency
                          latch: cache buffers chains47,59434.5K7261.0Concurrency
                          enq: SQ - contention56410.1K17943.3Configuration

                           

                           

                          EventWaitsTotal Wait Time (sec)Wait Avg(ms)% DB timeWait Class
                          log file sync13,559,9151.1M7832.4Commit
                          enq: US - contention654,454824.7K126025.4Other
                          latch: enqueue hash chains2,201,844406.6K18512.5Other
                          latch free3,554,347271K768.3Other
                          enq: HW - contention86,552212.9K24606.6Configuration
                          log file switch (checkpoint incomplete)6,212134.6K216614.1Configuration
                          buffer busy waits366,184125.9K3443.9Concurrency
                          latch: cache buffers chains143,10878.5K5492.4Concurrency
                          DB CPU 72.3K 2.2
                          library cache: mutex X5,151,61651.3K101.6Concurrency