This discussion is archived
3 Replies Latest reply: Apr 20, 2013 10:20 PM by KR10822864 RSS

Oracle cluster RESTART_COUNT growing

TasslehoffBurrfoot Newbie
Hi all, I have a problem with an Oracle 10 instance configured as an active/passive cluster with CRS; the storage is configured with ASM.
Please be patient, as I'm a simple sysadmin and have almost no experience with Oracle databases, especially with CRS and ASM configurations.

The problem causes the DBMS to be relocated from the active node to the passive node when the CRS RESTART_COUNT parameter reaches the RESTART_ATTEMPTS parameter value.
The two databases hosted on this Oracle instance are really stable and work very well; we are looking for the cause of this problem to prevent unmanaged relocations and interruptions.

Having no experience with Oracle CRS, we monitor these instances with a simple bash script that launches crs_stat -v, parses the results, and sends an alert when a safety threshold is reached. After that, we schedule a manual relocate to reset the RESTART_COUNT parameter.
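
The monitoring approach described above can be sketched in a few lines. This is only an illustration, not the poster's actual script: it assumes that crs_stat -v prints one KEY=VALUE pair per line with a blank line between resources, and the 80% threshold and the EXAMPLE.db resource are made up for the example. The PORTAL.listener values are the ones that appear later in this thread.

```python
def parse_crs_stat(text):
    """Parse blank-line-separated KEY=VALUE blocks into a list of dicts.

    Assumption: this mirrors the layout of `crs_stat -v` output.
    """
    resources = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current resource block.
            if current:
                resources.append(current)
                current = {}
            continue
        key, sep, value = line.partition("=")
        if sep:
            current[key] = value
    if current:
        resources.append(current)
    return resources


def near_limit(resources, ratio=0.8):
    """Return (name, count, attempts) for resources whose RESTART_COUNT
    is at or above ratio * RESTART_ATTEMPTS. The ratio is hypothetical."""
    flagged = []
    for res in resources:
        try:
            count = int(res["RESTART_COUNT"])
            attempts = int(res["RESTART_ATTEMPTS"])
        except (KeyError, ValueError):
            continue  # resource without usable restart counters
        if attempts > 0 and count >= ratio * attempts:
            flagged.append((res.get("NAME", "?"), count, attempts))
    return flagged


# Sample input: EXAMPLE.db is a hypothetical resource used only here.
SAMPLE = """\
NAME=PORTAL.listener
TYPE=application
RESTART_ATTEMPTS=100
RESTART_COUNT=37
TARGET=ONLINE
STATE=ONLINE on orclsrv16

NAME=EXAMPLE.db
TYPE=application
RESTART_ATTEMPTS=100
RESTART_COUNT=85
TARGET=ONLINE
STATE=ONLINE on orclsrv16
"""

for name, count, attempts in near_limit(parse_crs_stat(SAMPLE)):
    print(f"ALERT: {name} RESTART_COUNT={count} of {attempts}")
```

In a real check, the text would come from running crs_stat -v (for example via subprocess) instead of SAMPLE, and the print would be replaced by whatever alerting is already in place.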

This problem does not occur frequently, but it's quite annoying. We are looking for any log that can give us a clue to the real problem and a real solution.

Any suggestions on which logs to check?
I've already checked alert.log and the listener log but found nothing useful. Is there a CRS log?

The Oracle instances (10.2.0.3.0 Enterprise) are installed on RedHat Linux ES 4.0 x64.

Thanks for any info.

Tas
  • 1. Re: Oracle cluster RESTART_COUNT growing
    onedbguru Pro
    Go to the ORACLE_HOME for CRS and search for all of the logs; the one you are looking for is ohasd.log. You can ignore the install logs. Most of the log files have names of 4-5 characters: crsd.log, ohasd.log, ctssd.log, etc.
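
The search suggested above could be done with a short script like the following. It is a rough sketch that walks a directory tree and lists files ending in .log, most recently modified first; the function name find_logs is made up for this example, and nothing in it is specific to Oracle.

```python
import os


def find_logs(root, suffix=".log"):
    """Return paths of files under root ending in suffix, newest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                path = os.path.join(dirpath, name)
                try:
                    mtime = os.path.getmtime(path)
                except OSError:
                    continue  # file vanished or is unreadable
                found.append((mtime, path))
    # Sort by modification time, most recent first.
    found.sort(reverse=True)
    return [path for _mtime, path in found]
```

Calling find_logs on the CRS home directory and printing the first few results shows which logs were written most recently, which is usually where to start reading.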

    I am guessing you are using active/passive for cost savings (ie not using RAC).
  • 2. Re: Oracle cluster RESTART_COUNT growing
    TasslehoffBurrfoot Newbie
    onedbguru wrote:
    Go to the ORACLE_HOME for CRS and search for all of the logs; the one you are looking for is ohasd.log. You can ignore the install logs. Most of the log files have names of 4-5 characters: crsd.log, ohasd.log, ctssd.log, etc.
    First of all, thanks for the reply :)

    I found only crsd.log, at /u01/crs/oracle/product/10.2.0/crs/log/orclsrv16/crsd/crsd.log (orclsrv16 is the server name).
    In that log I found entries written at every RESTART_COUNT increase; here is the last occurrence of the problem:
    -----
    2013-04-16 12:22:23.189: [  CRSAPP][1482860896]0CheckResource error for PORTAL.listener error code = 1
    2013-04-16 12:22:23.194: [  CRSRES][1482860896]0In stateChanged, PORTAL.listener target is ONLINE
    2013-04-16 12:22:23.194: [  CRSRES][1482860896]0PORTAL.listener on orclsrv16 went OFFLINE unexpectedly
    2013-04-16 12:22:23.194: [  CRSRES][1482860896]0StopResource: setting CLI values
    2013-04-16 12:22:23.211: [  CRSRES][1482860896]0Attempting to stop `PORTAL.listener` on member `orclsrv16`
    2013-04-16 12:22:32.740: [  CRSRES][1482860896]0Stop of `PORTAL.listener` on member `orclsrv16` succeeded.
    2013-04-16 12:22:32.740: [  CRSRES][1482860896]0PORTAL.listener RESTART_COUNT=36 RESTART_ATTEMPTS=100
    2013-04-16 12:22:32.740: [  CRSRES][1482860896]0PORTAL.listener Uptime does not exceed uptime_threshold
    2013-04-16 12:22:32.741: [  CRSRES][1482860896]0Restarting PORTAL.listener on orclsrv16
    2013-04-16 12:22:32.757: [  CRSRES][1482860896]0startRunnable: setting CLI values
    2013-04-16 12:22:32.757: [  CRSRES][1482860896]0Attempting to start `PORTAL.listener` on member `orclsrv16`
    2013-04-16 12:22:32.953: [  CRSRES][1482860896]0Start of `PORTAL.listener` on member `orclsrv16` succeeded.
    2013-04-16 12:22:32.955: [  CRSRES][1482860896]0Successfully restarted PORTAL.listener on orclsrv16, RESTART_COUNT=37
    2013-04-16 12:22:32.991: [  CRSRES][1482860896]0PORTAL.listener Updated LAST_RESTART time in ocr
    -----

    It seems that it all starts with that CheckResource error on PORTAL.listener.
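
To see how often these restarts happen, the "Successfully restarted" lines can be pulled out of crsd.log with a small parser. The sketch below is based only on the line layout visible in the excerpt above (timestamp, module in brackets, thread id in brackets, then the message); the regular expressions are a guess at that layout, not an official format description.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt: "<timestamp>: [ MODULE][thread]0<message>"
LINE_RE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*(?P<module>\w+)\]\[(?P<thread>\d+)\]0?(?P<msg>.*)$"
)
RESTART_RE = re.compile(
    r"Successfully restarted (?P<res>\S+) on (?P<node>[^\s,]+), "
    r"RESTART_COUNT=(?P<count>\d+)"
)


def restart_events(lines):
    """Return (timestamp, resource, node, restart_count) for each restart line."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a log line in the assumed layout
        r = RESTART_RE.search(m.group("msg"))
        if r:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, r.group("res"), r.group("node"), int(r.group("count"))))
    return events


# Two lines copied from the excerpt above.
SAMPLE_LINES = [
    "2013-04-16 12:22:32.740: [  CRSRES][1482860896]0PORTAL.listener RESTART_COUNT=36 RESTART_ATTEMPTS=100",
    "2013-04-16 12:22:32.955: [  CRSRES][1482860896]0Successfully restarted PORTAL.listener on orclsrv16, RESTART_COUNT=37",
]

for ts, res, node, count in restart_events(SAMPLE_LINES):
    print(ts, res, node, count)
```

Run over the whole crsd.log, the list of timestamps makes it easier to match each restart against entries in the listener log.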
    PORTAL.listener is one of the HA resources returned by the crsstat command; I presume it's related to the Oracle listener. Is there any way to check that relationship?
    I mean, is there any configuration file, parameter, or other way to confirm that PORTAL.listener is the CRS name for the Oracle listener?

    I checked the listener log and found an error with the same timestamp as the crsd.log error; I am now searching for more details on that error.
    I am guessing you are using active/passive for cost savings (ie not using RAC).
    Well, to be honest, we didn't go looking for CRS: we needed to migrate two databases to a different Oracle server for infrastructure consolidation, and those CRS instances were already available and licensed, with a very low system load, so we "seized the opportunity".
    I would prefer two independent Oracle instances on an A/P cluster with Red Hat Cluster Manager: less integration with Oracle, but easier to manage, IMHO :)

    Edited by: 1000142 on 16-apr-2013 6.12
  • 3. Re: Oracle cluster RESTART_COUNT growing
    KR10822864 Pro
    Any suggestions on which logs to check?
    I've already checked alert.log and the listener log but found nothing useful. Is there a CRS log?

    The Oracle instances (10.2.0.3.0 Enterprise) are installed on RedHat Linux ES 4.0 x64.
    I hope the logs below, from a RAC environment, give you some clues for troubleshooting problems related to the cluster database.

    $ORA_CRS_HOME/crs/log: Contains trace files for the CRS resources; you may find error details here.
    $ORA_CRS_HOME/crs/init: Contains trace files of the CRS daemon during startup; these may show CRS login problems, etc.
    $ORA_CRS_HOME/css/log: The Cluster Synchronization Services (CSS) logs record all actions such as reconfigurations, missed check-ins, connects, and disconnects from the client CSS listener. In some cases, the logger logs messages with the category auth.crit for reboots done by Oracle; these can be used to check the exact time a reboot occurred.
    $ORA_CRS_HOME/css/init: Contains core dumps from the Oracle Cluster Synchronization Service daemon (OCSSd) and the process ID (PID) of the CSS daemon whose death is treated as fatal. If abnormal restarts of CSS have occurred, the core files will have the format of core..etc...
    $ORA_CRS_HOME/srvm/log: Log files for the Oracle Cluster Registry, which contain details at the Oracle cluster level.
    $ORA_CRS_HOME//log: The cluster alert log, which contains diagnostic messages at the Oracle cluster level. This is available from Oracle Database 10g R2.
