10 Replies. Latest reply: Mar 21, 2013 7:06 AM by marksmithusa

ONS process and high paging on AIX

ebh-OC Newbie
Dear all,

I would like to share with you a problem that we've been facing in the past few days, and hope to get some suggestions or solutions from your side:

First of all, we are running Oracle RAC 10g (10.2.0.3) with two database nodes on two dedicated IBM Power servers running AIX 5.3.
Last week we upgraded the memory on both servers, replacing the old 8 GB of RAM with 16 GB, and afterwards increased SGA_TARGET to 8 GB and SGA_MAX_SIZE to 9 GB on both instances. Unfortunately, one of the instances crashed the next day. On investigation we found very high paging activity on the server, so we immediately increased the swap space from 16 GB to 48 GB and restarted the server.
The following day the swap space filled up and the instance crashed again, so we reduced the SGA on that instance to 5 GB. The paging space filled up once more, so we flushed the swap space to another target to avoid the crash.
We noticed that one process on the server was consuming most of the memory (and paging space): ONS, under /oracle/opmn/bin. In ons.log we found a repeating message:
(Passive connection: 0,<IP of localhost>,6200 invalid connect server IP format)
and below that: (hostName: <name of the second server>)
Note that we have changed nothing in our cluster configuration.

I would appreciate any suggestions. Is it normal for the ONS process to consume this much memory (it is the highest-consuming process on the server)? If not, what could the problem be? Or could the newly installed RAM be defective? Note that topas on AIX reports the correct memory size, the installation went smoothly, and the server started normally.

Thank you for your help
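One way to confirm which process is holding the most virtual memory is to rank `ps` output by size. The sketch below is only an illustration: the helper name `top_by_vsz` is hypothetical, and it assumes `ps -eo pid,vsz,comm` style input (a header line followed by PID, size in KB, and command). On AIX, `lsps -s` should give the matching paging-space summary.

```shell
#!/bin/sh
# Hypothetical helper: rank processes by virtual size (second column).
# Reads "PID VSZ COMMAND" lines on stdin and prints the N largest.
top_by_vsz() {
    # Drop the header, normalise whitespace, sort by column 2 descending.
    awk 'NR > 1 { $1 = $1; print }' | sort -k2,2nr | head -n "${1:-5}"
}

# Intended use on the server (not run here):
#   ps -eo pid,vsz,comm | top_by_vsz 5

# Demo on canned input so the sketch runs anywhere:
printf 'PID VSZ COMMAND\n10 500 oracle\n11 9000 ons\n12 700 tnslsnr\n' | top_by_vsz 2
```

Running it at intervals and comparing the figure for the ons process would show whether its footprint really keeps growing.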
  • 1. Re: ONS process and high paging on AIX
    marksmithusa Journeyer
    Well, you're running 10.2.0.3 (which is no longer supported by Oracle) with AIX 5.3 (which is no longer supported by IBM). You're running an uncertified, never mind unsupported configuration. From what I understand from our system admin team, IBM won't even take your call if you have an issue and say you're running 5.3.

    Obviously, I understand that you might be stuck on that version and there's nothing you can do about it. However, if you are able, I would recommend upgrading to 11.2 and then AIX 6 (at least).

    Why are you running the ONS: are you using Fast Application Notification? If not, and it's unlikely that you are, you don't need the ONS process to be running.

    Of course, this post comes with all the usual caveats: don't trust anyone on a forum, test it out yourself in a non-Production environment first, etc, etc.

    What's your PGA usage? That can sometimes be the cause of the memory getting swallowed up on a box (though it would be a coincidence if it started just when you replaced the memory). It is something to check.
  • 2. Re: ONS process and high paging on AIX
    ebh-OC Newbie
    Thank you for your response,

    Actually, we have a maximum of 300 users, about 220 of whom are connected at the same time. Our PGA is set to 989 MB; this value was chosen after several DBAs suggested that an average user consumes about 4 to 5 MB. The crashes actually took place during low-load periods.
    We are running tests today: we are going to reinstall the old RAM and update the firmware of the server's motherboard.
    Is there a way to tell whether Fast Application Notification is being used? I know that load balancing and instance failover are used, but unfortunately we did not install the RAC system ourselves. Meanwhile, we are preparing a new high-availability system with the latest Oracle releases.
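The sizing rule quoted in this reply can be checked with simple arithmetic. The figures are the ones given in the post (220 concurrent users, 300 maximum, 4 to 5 MB per user, 989 MB PGA setting); the function name `pga_estimate_mb` is made up for the illustration.

```shell
#!/bin/sh
# Hypothetical helper: users * MB-per-user, the rule of thumb from the post.
pga_estimate_mb() {
    echo $(( $1 * $2 ))
}

pga_target_mb=989   # PGA setting quoted in the post

echo "220 users: $(pga_estimate_mb 220 4)-$(pga_estimate_mb 220 5) MB (setting: ${pga_target_mb} MB)"
echo "300 users: $(pga_estimate_mb 300 4)-$(pga_estimate_mb 300 5) MB (setting: ${pga_target_mb} MB)"
```

Under that rule of thumb the 989 MB setting sits inside the 880-1100 MB range for 220 users and below the 1200-1500 MB range for 300 users.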
  • 3. Re: ONS process and high paging on AIX
    ebh-OC Newbie
    13/03/18 09:34:33 [2] Passive connection 0,<IP of server 1>,6200 invalid connect server IP format
    3232237319,6200,6113,0

    ONSinfo: !!3232237319!0!6200!0!6113

    hostName: <hostname of server2>
    clusterId: databaseClusterId
    clusterName: databaseClusterName
    instanceId: databaseInstanceId
    instanceName: databaseInstanceName




    This message occurs very frequently in ons.log, and unfortunately I can't find any documentation for it.
    Regards.
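The number 3232237319 in the log excerpt looks like an IPv4 address written as a single 32-bit integer. That reading is an assumption, not something the ONS documentation is quoted as saying here. A minimal sketch to turn such a value back into dotted-quad form (the function name `int_to_ipv4` is made up):

```shell
#!/bin/sh
# Hypothetical helper: convert a 32-bit integer to dotted-quad notation.
# Assumes the shell does 64-bit arithmetic (bash, ksh93, dash on 64-bit).
int_to_ipv4() {
    n=$1
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

int_to_ipv4 3232237319   # prints 192.168.7.7
```

If the result matches the address of one of the two nodes, that would at least identify which peer the "invalid connect server IP format" message refers to.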
  • 4. Re: ONS process and high paging on AIX
    marksmithusa Journeyer
    http://docs.oracle.com/cd/E11882_01/java.112/e16548/apxracfan.htm#CDCBCDBD

    The above link might offer some insight. FAN is event-driven, but someone has to develop the actions that the events trigger.

    The PGA metric is a target: that means Oracle will try and keep its PGA usage within those bounds, but there are quite a few ways to make the PGA balloon and eat up all of the memory (and swap) of the server. It might be worth checking into, just in case.

    But I suspect the issue is a) with your ONS configuration and b) something to do with your memory structures.
  • 5. Re: ONS process and high paging on AIX
    ebh-OC Newbie
    Further investigation has shown that the ons daemon is stopped, and starting it with onsctl changes nothing regarding the memory consumption.
    On the other hand, killing the process at the OS level dramatically reduced the memory consumption without any negative effect on the cluster or the overall system.
    However, the ons process (oracle/ocr/opm/) restarts on its own and its memory consumption starts climbing again. Any idea how to stop that process for good?



    Regards.
  • 6. Re: ONS process and high paging on AIX
    marksmithusa Journeyer
    Check srvctl and see what that has for the entry for the ONS daemon. You might be automatically restarting it after a failure.
  • 7. Re: ONS process and high paging on AIX
    ebh-OC Newbie
    Thank you for your help,

    Actually, we've checked with onsctl ping: even a few seconds after stopping the daemon (onsctl stop), it restarts and onsctl ping returns (ons is running ...).
    In srvctl we always get the following result:

    $ srvctl status nodeapps -n p51
    VIP is running on node: p51
    GSD is running on node: p51
    Listener is running on node: p51
    ONS daemon is running on node: p51


    Someone suggested that I should stop ons on both instances and perform a clean restart, and that this should solve the problem. What do you think about that?


    You said:
    "You might be automatically restarting it after a failure."

    If that's true, any idea how to stop the automatic restart?


    Regards
  • 8. Re: ONS process and high paging on AIX
    marksmithusa Journeyer
    975727 wrote:
    > Thank you for your help,

    You are very welcome :)

    > Someone suggested that I should stop ons on both instances and perform a clean restart and this should solve the problem. What do you think about that?

    I don't think that will solve your problem - it sounds very much like you have auto_start enabled. My 10g RAC is a bit rusty, but you might try doing something like this:

    $ srvctl status nodeapps -n cheese
    VIP is running on node: cheese
    GSD is running on node: cheese
    Listener is running on node: cheese
    ONS daemon is running on node: cheese

    Now check what the name of the ONS resource is on the server(s) you want to look at:

    crs_stat -l | grep ons
    NAME=ora.cheese.ons
    NAME=ora.cracker.ons

    Use crs_stat -f to show detailed information about the ONS process:

    crs_stat -f ora.cheese.ons | grep AUTO_START
    AUTO_START=always

    Notice that AUTO_START is set to 'always' (at least in this example). You can choose 'always', 'restore', or 'never'. You can also use numbers, I think.

    The process to actually change it is rather complicated on 10gR2 (I think) - you have to create a profile, edit that, shut down the component, re-register it with the new profile and then restart.

    The best example of something similar that I can find is this link: http://surachartopun.com/2009/07/change-oracle-asm-resource-autostart.html (obviously, this is for the ASM, not the ONS).

    Barring that, you could ask Oracle for the correct syntax. Unfortunately, I don't have a 10gR2 RAC I can try this out on (we upgraded them all to 11gR2).

    Good luck!
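The AUTO_START check in reply 8 can be wrapped in a small helper that pulls any single attribute out of `crs_stat -f` output. This is only a sketch: the function name `crs_attr` is hypothetical, it assumes the output is plain `NAME=value` lines as in the example above, and it covers only reading the attribute, not the profile edit and re-registration that reply 8 describes.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one attribute from NAME=value lines.
# $1 = attribute name (e.g. AUTO_START); input is read from stdin.
crs_attr() {
    awk -F= -v key="$1" '$1 == key { print $2 }'
}

# Intended use on a cluster node (not run here):
#   crs_stat -f ora.cheese.ons | crs_attr AUTO_START

# Demo on the sample values from the thread:
printf 'NAME=ora.cheese.ons\nAUTO_START=always\n' | crs_attr AUTO_START
```

Checking the attribute on both nodes before and after any profile change gives a quick confirmation that the re-registration took effect.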
  • 9. Re: ONS process and high paging on AIX
    ebh-OC Newbie
    Thank you for that !

    It was very helpful, I will work on it :)
  • 10. Re: ONS process and high paging on AIX
    marksmithusa Journeyer
    You're welcome :). Happy to help (assuming it does help; if it causes you Production problems, then it wasn't me!)

    Good luck and let me know how it goes. It is a pain and it is easier in 11gR2...

    (P.S. try and upgrade ASAP!)
