8 Replies Latest reply: Jul 18, 2010 1:17 PM by jeev007

    CPU load goes very high(any node/any time) hanging all the apps (any node)!

    jeev007
      Hi,

      We have production servers running

      Oracle 11g Standard Edition Release 2 (11.2.0.1.0) RAC with ASM on
      RHEL 5.3 (Red Hat Enterprise Linux Server release 5.3 (Tikanga)).
      It is a 2-node RAC setup; Java and Ruby on Rails applications connect to it through the Oracle client (TNSNAMES).

      Under normal conditions the load is below 1 on both nodes. Intermittently, however, the load on one DB server (either node) spikes to 8-10 and all the applications hang. We are then forced to restart the application servers and then the DB server, after which things return to normal. This happens randomly on either node. The load does not rise on the other node at the same time, i.e. if the load on node 1 is 9-10, the load on the other node is 1.2-1.4. The spikes occur at least 3-4 times a day, on either Node 1 or Node 2, and at no predictable time. We have enough memory on our servers (32GB RAM on each server).
      We have noticed the following:

      From a session perspective, sessions are evenly distributed (say 87 sessions on node 1, 85 on node 2).
      Under normal conditions, even though the SGA max size is 10G, Oracle takes almost 32G of memory and free memory is only in the MBs, say 112MB.
      We have a scheduled RMAN full backup that runs every day at 3AM from cron, and we noticed that once RMAN has started, the memory is not released.
      We have collected Statspack reports and session details, but they did not help.
      We don't see any errors as such in the alert logs.
      There are no background/scheduled jobs running on the servers.

      Has anyone faced this issue with 11g RAC? If so, how was it solved? It is very critical as it directly impacts our business. Any help or hints would be appreciated.

      Thanks
      Jaise.
        • 1. Re: CPU load goes very high(any node/any time) hanging all the apps (any node)!
          Billy~Verreynne
          user13346877 wrote:

          It is very critical as it directly impacts our business. Any help or hints would be appreciated.
          Jaise, that means not using this forum, but filing an SR (Service Request) ASAP with Oracle Support (https://support.oracle.com/).

          Advice here is unsolicited, can be WAGs, can be wrong for the version/OS/storage/whatever you use, or can address a problem with similar symptoms that you are not actually experiencing.

          I would be very careful about using a forum like this to fix a critical problem that impacts business. For that I want proper controls, escalation procedures and accountability. I would not base the resolution on "free advice".

          That said, I would still bounce such a problem around a public forum to see if I can gain better insight into it.

          From a technical side, the suggestion is to determine which process is causing the high CPU utilisation. It is unusual, given the Linux kernel scheduling priority, for any ordinary (non-realtime) process to tie the server up to such an extent that everything else hangs. Worst case, everything will slow down, but still respond.

          Unless it is an RT (real-time) process, in which case all bets are off, as it will receive most of the CPU time from the kernel. A typical example is running out of memory, with swap space being thrashed. The swap daemon can easily consume 99% of all CPU resources (even on a multi-CPU server, as there are multiple swap daemons).

          On the Oracle side, AWR reports are very useful for getting a picture of what the database saw as the load, the top wait events, the heavy SQLs (in terms of CPU utilisation) and so on.

          Also, the various logs (kernel log, alert log, src log, storage logs, etc.) need to be checked for potential anomalies.
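
          As a rough illustration of the first step (finding out which process is responsible), here is a minimal Python sketch, not a vetted tool. It assumes Python 3 on a Linux box where /proc/loadavg and `ps -eo pid,pcpu,rss,stat,comm` are available; the function names are made up for this example. It also lists processes in uninterruptible sleep (state D), because the Linux load average counts those as well as runnable processes, so a load of 8-10 does not by itself prove that the CPUs are busy.

```python
import subprocess


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_ps(text):
    """Parse `ps -eo pid,pcpu,rss,stat,comm` output into a list of dicts."""
    procs = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue  # ignore malformed or truncated rows
        pid, pcpu, rss, stat, comm = parts
        procs.append({
            "pid": int(pid),
            "pcpu": float(pcpu),  # ps averages this over the process lifetime
            "rss_kb": int(rss),
            "stat": stat,
            "comm": comm.strip(),
        })
    return procs


def top_cpu(procs, n=5):
    """Return the n processes with the highest %CPU, highest first."""
    return sorted(procs, key=lambda p: p["pcpu"], reverse=True)[:n]


def blocked(procs):
    """Return processes in uninterruptible sleep (state D), usually waiting on I/O."""
    return [p for p in procs if p["stat"].startswith("D")]


if __name__ == "__main__":
    try:
        with open("/proc/loadavg") as f:
            print("load averages:", parse_loadavg(f.read()))
        out = subprocess.check_output(["ps", "-eo", "pid,pcpu,rss,stat,comm"])
        procs = parse_ps(out.decode())
        for p in top_cpu(procs):
            print("cpu", p["pid"], p["pcpu"], p["comm"])
        for p in blocked(procs):
            print("blocked (D state)", p["pid"], p["comm"])
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not collect data:", exc)
```

          Run from cron every minute or so and appended to a file, output like this gives something to compare against the time of each spike. Since ps reports CPU usage averaged over the lifetime of each process, `top` in batch mode is the better source for an instantaneous view.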
          • 2. Re: CPU load goes very high(any node/any time) hanging all the apps (any node)!
            692600
            The above suggestion is right: you should get in touch with Oracle Support for any business-critical issue and use forums for additional insight.
            "In normal time, even though the SGA max size is 10G, oracle takes almost 32G memory and free memory is only in MBs say 112MB free."
            That is probably your starting point for the investigation. Do you know which Oracle process is consuming most of the memory, apart from RMAN? JDBC?
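
            One thing worth checking first, offered as a sketch rather than a diagnosis: on Linux, "free" memory is normally small because the kernel uses otherwise idle RAM for buffers and page cache, so 112MB free does not necessarily mean that Oracle itself is holding 32G. The Python snippet below (it assumes Python 3, and the helper names are made up for illustration) reads /proc/meminfo and reports free memory with and without buffers and cache, plus swap in use. Treating MemFree + Buffers + Cached as reclaimable is only an approximation, because Cached also counts shared memory segments, which can include the Oracle SGA.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo key to its numeric value.

    Most values are in kB; counters such as HugePages_Total are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def memory_summary(info):
    """Summarise memory in MB; free_plus_cache_mb is a rough upper bound."""
    free = info.get("MemFree", 0)
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return {
        "total_mb": info.get("MemTotal", 0) // 1024,
        "free_mb": free // 1024,
        "free_plus_cache_mb": (free + cache) // 1024,
        "swap_used_mb": swap_used // 1024,
    }


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            summary = memory_summary(parse_meminfo(f.read()))
        for name, value in summary.items():
            print(name, value)
    except OSError as exc:
        print("could not read /proc/meminfo:", exc)
```

            If swap_used_mb keeps growing around the time of a spike, that would point towards the swapping scenario described in the first reply; if it stays near zero and free_plus_cache_mb is large, the low "free" figure is probably just cache.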

            Since you mentioned Java-based apps, I assume the application uses JDBC connections to access the database. JDBC has been notorious for its memory consumption since 10g. With each release, Oracle puts a lot of effort into developing JDBC drivers for optimal access to the database. With 10g, they compromised on memory usage in favour of performance; with 11gR1, they tried to optimize both performance and memory usage. It is therefore always good practice to read the JDBC driver documentation before each upgrade, if you use it to access the database, to understand the implications of the upgrade. And don't forget to close JDBC connections during long idle periods.

            The readme for the 11gR2 JDBC driver is linked below. The details for all previous releases are available too.
            http://download.oracle.com/otn/utilities_drivers/jdbc/112/Readme.txt (look for "Reduced Memory Footprint" )

            Cheers.
            • 3. Re: CPU load goes very high(any node/any time) hanging all the apps (any node)!
              user11214836
              Hi,

              As you mention, you have scheduled an RMAN full backup using crontab.
              Please execute:
              exec dbms_stats.gather_fixed_objects_stats;
              Metalink Note ID: 357765.1

              This may help with your performance issue.

              Regards
              • 4. Re: CPU load goes very high(any node/any time) hanging all the apps (any node)!
                jeev007
                Thanks, Billy, for your input. I totally agree with you that, since it is production, we need to contact Oracle Support.

                I have already raised a P1 issue with Oracle and hope it will be addressed soon. In the meantime, I just wanted to check with forum members whether they have faced a similar situation and how they solved it.

                Jaise.
                • 5. CPU load goes very high(any node/any time) hanging all the apps (any node)
                  jeev007
                  Hi,

                  Thanks for your input.

                  We use Oracle 11g Standard Edition. I executed the statistics collection for fixed objects,

                  but got the error below. It seems it will work only with Data Guard.

                  SQL> exec dbms_stats.gather_fixed_objects_stats;
                  BEGIN dbms_stats.gather_fixed_objects_stats; END;

                  *
                  ERROR at line 1:
                  ORA-20011: Approximate NDV failed: ORA-00439: feature not enabled: Data Guard
                  Broker
                  ORA-06512: at "SYS.DBMS_STATS", line 20508
                  ORA-06512: at "SYS.DBMS_STATS", line 20951
                  ORA-06512: at "SYS.DBMS_STATS", line 21498
                  ORA-06512: at line 1


                  SQL>

                  Jaise
                  • 6. Re: CPU load goes very high(any node/any time) hanging all the apps (any node)!
                    736258
                    What is your parallel_max_servers setting?
                    • 7. Re: CPU load goes very high(any node/any time) hanging all the apps (any node)!
                      727876
                      Hi,

                      You say that: "We have a scheduled rman full backup runs on everyday 3AM in cron, and noticed that once the RMAN is started, the memory is not getting released."

                      Does the RMAN job finish successfully?
                      Are there any file system backups running on the server during the same period as the RMAN backup?
                      • 8. Re: CPU load goes very high(any node/any time) hanging all the apps (any node)!
                        jeev007
                        Yes, the RMAN backup completes successfully. No other background processes or scheduled jobs are running at the same time.