
    OPTIMIZER_FEATURES_ENABLE change caused massive hard parse and slowness

    3341881

      DB version : 11.2.0.4

      Platform : Oracle Linux 6.8

       

       

      I have a very busy OLTP production database which is a 4-node RAC DB serving the Siebel application.

      This DB was upgraded two months back from 11.2.0.3 to 11.2.0.4, but I kept OPTIMIZER_FEATURES_ENABLE at 11.2.0.3 as a precautionary measure.

      So, two days back I modified the OPTIMIZER_FEATURES_ENABLE parameter from 11.2.0.3 to 11.2.0.4 across all the instances. It's a dynamic parameter, BTW.

       

      Suddenly the application became slow and our business was impacted, so around 50 minutes after I changed OPTIMIZER_FEATURES_ENABLE I had to revert it to 11.2.0.3. We were seeing a lot of hard parses after OPTIMIZER_FEATURES_ENABLE was changed. I have a few questions on this.
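
      For reference, the change and the later revert were done dynamically, along these lines (a sketch; the exact SCOPE/SID clauses here are illustrative, not copied from our system):

          -- Illustrative: dynamic, cluster-wide change of the parameter
          ALTER SYSTEM SET optimizer_features_enable = '11.2.0.4' SCOPE = MEMORY SID = '*';

          -- ... and the revert about 50 minutes later
          ALTER SYSTEM SET optimizer_features_enable = '11.2.0.3' SCOPE = MEMORY SID = '*';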

       

      Question 1.

      When I change the OPTIMIZER_FEATURES_ENABLE parameter, will Oracle flush all cursors from the library cache? Is this why there were so many hard parses?

      The Oracle documentation page on OPTIMIZER_FEATURES_ENABLE doesn't mention any such thing.

       

       

      Another weird thing I noticed is that when I reverted OPTIMIZER_FEATURES_ENABLE to 11.2.0.3, there weren't many hard parses.

      i.e. if changing this parameter to 11.2.0.4 caused a lot of hard parses, why weren't there as many hard parses when I reverted the parameter to 11.2.0.3?

       

      Question 2.

      Unfortunately, I modified the OPTIMIZER_FEATURES_ENABLE parameter from 11.2.0.3 to 11.2.0.4 during business hours (10:35 AM).

      Maybe if I had done it during off-peak hours, hard parsing would still have happened for the most common SQLs, but with less load Oracle would have parsed them without much hassle. Right?

       

       

      Question 3. Was Oracle generating multiple plans as part of adaptive cursor sharing during this period of regression?

       

      I am attaching AWR reports from Instance 1 and Instance 2 covering 10:00 AM to 11:30 AM. At 10:35 AM I changed the parameter to 11.2.0.4, and by 11:25 AM I had reverted it to 11.2.0.3 due to DB slowness.

       

       

      All suggestions/findings welcome.

        • 1. Re: OPTIMIZER_FEATURES_ENABLE change caused massive hard parse and slowness
          SeánMacGC

          Hello, version 11.2.0.4 was the first to introduce adaptive SQL Plan Management (SPM), which, of necessity, will involve more hard parses, initially at least, but should, in theory, result in overall better performance.

           

          v12.1 refined SPM, and v12.2 refines it further, specifically by allowing control of both plans and statistics separately (optimizer_adaptive_plans and optimizer_adaptive_statistics, respectively, with the former defaulting to TRUE and the latter to FALSE).
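
          On 12.2 you can confirm how those two controls are set with a quick query, for example (illustrative):

              -- Illustrative (12.2+): check the split adaptive controls
              SELECT name, value, isdefault
              FROM   v$parameter
              WHERE  name IN ('optimizer_adaptive_plans', 'optimizer_adaptive_statistics');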

          • 2. Re: OPTIMIZER_FEATURES_ENABLE change caused massive hard parse and slowness
            AndrewSayer

            When you change your optimizer environment, you must parse a child cursor that belongs to that optimizer environment.

             

            When you changed your optimizer environment back, your sessions were able to just re-use the child cursors that were already parsed under that optimizer environment.


            If you want to know why Oracle is doing a hard parse for a query that already has a parsed cursor, check v$sql_shared_cursor for the reason.
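
            For example (a sketch; substitute a real SQL_ID for the &sql_id placeholder):

                -- Illustrative: the child cursors for one statement, the optimizer
                -- environment each was parsed under, and whether an optimizer
                -- environment mismatch stopped older children being shared
                SELECT s.child_number,
                       s.optimizer_env_hash_value,
                       s.plan_hash_value,
                       c.optimizer_mismatch
                FROM   v$sql s
                       JOIN v$sql_shared_cursor c
                         ON  c.sql_id       = s.sql_id
                         AND c.child_number = s.child_number
                WHERE  s.sql_id = '&sql_id'
                ORDER  BY s.child_number;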

             

            AWR reports that include both slow and fast periods are not that useful; you want to compare the two different periods, not average them together.

             

            -Edit

            I opened up the attachments. What makes you think parsing was a problem?

             

            Statistic Name                                  Time (s)   % of DB Time
            sql execute elapsed time                    1,383,205.87          99.88
            DB CPU                                        186,735.98          13.48
            parse time elapsed                              1,296.80           0.09
            hard parse elapsed time                         1,107.26           0.08
            hard parse (sharing criteria) elapsed time        899.24           0.06
            connection management call elapsed time            85.81           0.01
            PL/SQL execution elapsed time                      20.01           0.00
            PL/SQL compilation elapsed time                    11.88           0.00
            failed parse elapsed time                           1.05           0.00
            repeated bind elapsed time                          0.61           0.00
            sequence load elapsed time                          0.54           0.00
            hard parse (bind mismatch) elapsed time             0.01           0.00
            DB time                                     1,384,917.25
            background cpu time                            12,709.95
            background elapsed time                         7,440.27
            • 3. Re: OPTIMIZER_FEATURES_ENABLE change caused massive hard parse and slowness
              3341881

              Thank you, Sean and Andrew.

               

              I am attaching another AWR report from instance 1 for today, Jan 25, 11:00 AM to 12:00 PM (a one-hour period). During this time the DB worked just fine despite relatively high load.

               

              What makes you think parsing was a problem?

              Hi Andrew,

              When the DB was facing performance issues, the "sql execute elapsed time" statistic was showing 99.88% in the "Time Model Statistics" section (AWR attached in the previous post).

              So, does this mean that the SQLs were running slow with the newly generated plans after the 'parse storm'?

              • 4. Re: OPTIMIZER_FEATURES_ENABLE change caused massive hard parse and slowness
                Jonathan Lewis

                At first sight it looks as if you've simply overloaded the machine's CPUs (don't be misled by the apparent 50% utilisation - your machine is behaving like 40 CPUs, not the 80 CPUs it's pretending to be).

                 

                If you check the total time spent, a large fraction of it is in latch sleeps on the cache buffers chains, and another large fraction in global cache consistency. You have a high rate of single block reads and a huge rate of logical I/Os, so it's likely that the rate of work (particularly the buffer activity) is producing CPU loading on a few hot spots in the data sets, which results in lots of latch spins (burning more CPU) and long sleeps, which exacerbate any problems of buffer busy waits and global cache coherence.

                 

                You'll note, also, in the SQL ordered by Cluster Wait Time section that the total time of the top few statements is roughly 25% cluster time, 8% CPU time, 10% I/O time - which means 57% LOST time - which is your sessions sitting in the run queue with Oracle thinking they're running (as opposed to waiting in an instrumented wait) when they're not.

                 

                There may be something more subtle going on - for example you have a very large number of "gc buffer busy acquire" waits but a very small number of "gc buffer busy release" waits, which between them don't look possible; and there's also the oddity that the O/S interconnect traffic is zero - which suggests that either you're not using the private interconnect, or there's an instrumentation error in Oracle. However, before getting too far into rocket science I would at this point check a couple of older AWR reports for about the same time period to see if the "workload per execution" for some of the expensive statements has changed - drill into a couple of the big SQL_IDs to check whether they've had multiple execution plans that vary from the 11.2.0.3 interval to the 11.2.0.4 interval - the root problem may simply be that the change in version produced a couple of very expensive changes in plan.
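
                A query along these lines would show that (a sketch; &sql_id is a placeholder for one of the expensive statements):

                    -- Illustrative: work per execution and plan history across AWR snapshots
                    SELECT sn.begin_interval_time,
                           st.plan_hash_value,
                           st.executions_delta                                                    execs,
                           ROUND(st.buffer_gets_delta  / NULLIF(st.executions_delta, 0))          gets_per_exec,
                           ROUND(st.elapsed_time_delta / NULLIF(st.executions_delta, 0) / 1e6, 3) secs_per_exec
                    FROM   dba_hist_sqlstat st
                           JOIN dba_hist_snapshot sn
                             ON  sn.snap_id         = st.snap_id
                             AND sn.dbid            = st.dbid
                             AND sn.instance_number = st.instance_number
                    WHERE  st.sql_id = '&sql_id'
                    ORDER  BY sn.begin_interval_time, st.plan_hash_value;

                    -- ... and the plans themselves, as captured by AWR
                    SELECT * FROM TABLE(dbms_xplan.display_awr('&sql_id'));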

                 

                Regards

                Jonathan Lewis

                 

                 

                P.S. I wouldn't have made the change while the system was live - I would have tried to work out a plan to stop and restart (perhaps one instance at a time), or flush the shared pool at a quiet moment for each instance in turn to make the change. It's possible that the excessive workload was induced by a side effect of a dynamic change when you already had a relatively high version count on some cursors in memory.
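
                e.g., connected to each instance in turn at a quiet moment (illustrative):

                    -- Illustrative: flushes the shared pool of the local instance only
                    ALTER SYSTEM FLUSH SHARED_POOL;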

                 

                P.P.S. You'll note that your connection count jumps significantly from the start to the end of the period. It's possible that a small stutter in the database response time around the time of the change was sufficient to panic the front end into starting a lot more sessions from its connection pool, which, in turn, pushed the system into a catastrophe of continuous contention. (In other words, you just got unlucky.)

                • 5. Re: OPTIMIZER_FEATURES_ENABLE change caused massive hard parse and slowness
                  Jonathan Lewis

                  I see you loaded an appropriate AWR report while I was typing my previous reply.

                  Take a look at the "SQL ordered by" sections (e.g. by Gets, CPU, or Cluster Wait Time): the top few statements in the bad period don't even appear in the good period.

                  This suggests either (a) catastrophic changes in plan or (b) some people who only work at certain times of day on certain operations.

                   

                  Pick a couple of the worst SQL IDs from the CPU/buffer gets sections and see if you can find their execution plans and operation costs in memory and compare with the plans from the bad period.
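
                  For example (a sketch; &sql_id is a placeholder):

                      -- Illustrative: the plan(s) currently in the shared pool for a statement
                      SELECT * FROM TABLE(dbms_xplan.display_cursor('&sql_id', NULL, 'TYPICAL'));

                      -- Illustrative: the plans recorded in AWR during the bad period, for comparison
                      SELECT * FROM TABLE(dbms_xplan.display_awr('&sql_id'));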

                   

                  Regards

                  Jonathan Lewis

                  • 6. Re: OPTIMIZER_FEATURES_ENABLE change caused massive hard parse and slowness
                    Dom Brooks

                    Any time you've got all code going to any instance for any workload/SQL/object, you're asking for trouble if everything isn't already super efficient... RAC exaggerates any inefficiencies. And any time you do an upgrade, you're quite likely to have one or two statements turn nastily inefficient.

                     

                    On most RAC systems I've worked on, where the application code combined with the data model has been inherently inefficient (so that'll be all of them), we've ended up going for node affinity even if it means that other instances end up doing very little... which can be politically awkward.

                    • 7. Re: OPTIMIZER_FEATURES_ENABLE change caused massive hard parse and slowness
                      Jonathan Lewis

                      Dom Brooks wrote:

                       

                       

                      On most RAC systems I've worked on, where the application code combined with the data model has been inherently inefficient (so that'll be all of them)...

                       

                      And this system is running Siebel - which is engineered to be inefficient on Oracle. The _optimizer_max_permutations = 100 setting is almost guaranteed to produce randomly bad join orders; and with the (for Siebel) usual optimizer_mode = first_rows_1 (which isn't in the parameter file in this case, but is probably being set by the application code) it's almost inevitable that everything will be nested loop joins with a huge tendency to pick the wrong indexes and do massive amounts of inappropriate buffer gets and disk reads - aided, of course, by Siebel's insistence that the hundreds of redundant indexes on the system should not be dropped.
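
                      If you want to confirm what the application is actually setting, you can inspect the optimizer environment a parsed cursor was built with (a sketch; &sql_id is a placeholder, and hidden parameters such as _optimizer_max_permutations generally only show up here when they've been set to non-default values):

                          -- Illustrative: the optimizer environment of one child cursor
                          SELECT name, value, isdefault
                          FROM   v$sql_optimizer_env
                          WHERE  sql_id       = '&sql_id'
                          AND    child_number = 0
                          AND    name IN ('optimizer_mode',
                                          'optimizer_features_enable',
                                          '_optimizer_max_permutations');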

                       

                      Siebel's habit of creating tables with more than 255 columns, using only 40% of them, then adding a dozen more on each upgrade - while allowing the business to add custom columns as well - invariably leads to lots of chained/migrated/continued rows that only add to the burden of random access.
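
                      A quick (if crude) system-wide indicator of how much of that continued-row access is going on (illustrative):

                          -- Illustrative: this counter is bumped every time a fetch has to
                          -- follow a chained/migrated/continued row piece
                          SELECT name, value
                          FROM   v$sysstat
                          WHERE  name = 'table fetch continued row';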

                       

                      Regards

                      Jonathan Lewis
