This discussion is archived
17 Replies. Latest reply: Oct 22, 2012 9:40 AM by rcc50886

How does 11g CRS start if OCR is in ASM

Chuck1958 Newbie
I've inherited an 11gR2 two-node RAC cluster running on Windows 2008 R2. I'm only somewhat familiar with RAC from a 10g class I took years ago, so my question is this...

CRS needs to read the OCR to start up. In 11gR2, by default the OCR is stored in ASM. ASM however gets started after CRS (by CRS). How is that possible? How does CRS start ASM, if it needs ASM running to read the OCR?

I have a 2nd question too that may be related (or not). When a node crashes (blue screens) or gets restarted via the start button (start/shutdown/restart) it never rejoins the cluster. The only way I've ever gotten either node to rejoin is to power cycle it. Even then I've found I must wait for one node to come back up completely with all Oracle services running before I can start the 2nd node or else it will not join the cluster. Is this a known/common problem? Does it behave like this on Linux too? The error I always get is that CRS is not running. If I run "crsctl start crs" however, it says it IS running. If I run "crsctl stop crs" or "crsctl stat res" it says it's NOT running.

TIS
  • 1. Re: How does 11g CRS start if OCR is in ASM
    damorgan Oracle ACE Director
    The architecture of 11g CRS is different from that of 10g CRS ... hit the docs at http://tahiti.oracle.com.

    Pay attention to the sections on the SCAN address too.
  • 2. Re: How does 11g CRS start if OCR is in ASM
    Chuck1958 Newbie
    SCAN I'm familiar with. Once CRS is up and running SCAN has never been an issue for me. Getting CRS to start often is.

    Where would I find the docs on how CRS starts up and reads the OCR? Is it under RAC, High Availability, Grid Computing, or Clusterware? (Finding things in Oracle's documentation is always a challenge).

    Thanks
  • 3. Re: How does 11g CRS start if OCR is in ASM
    damorgan Oracle ACE Director
    Just search for "Clusterware."

    Look at "top matching books" on the right-hand side of the page.
  • 4. Re: How does 11g CRS start if OCR is in ASM
    Chuck1958 Newbie
    I've done that and found perhaps 1000 references, none of which are what I'm looking for. Same-old Same-old whenever trying to find anything in Oracle's docs.
  • 5. Re: How does 11g CRS start if OCR is in ASM
    Levi-Pereira Guru
    Chuck1958 wrote:
    CRS needs to read the OCR to start up. In 11gR2, by default the OCR is stored in ASM. ASM however gets started after CRS (by CRS). How is that possible? How does CRS start ASM, if it needs ASM running to read the OCR?
    Key: ASM does not depend on the CRS daemon (CRSD) to start. ASM depends on the voting disk, which is accessed directly on the LUN (without going through ASM).

    Startup Sequence:

    init -> OHAS (OLR) -> CSSD/ASM (access ASMDISK/LUN and read Voting Disks) -> CRSD (open DISKGROUP and read OCR)
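    The startup chain above can be modeled as a small dependency graph. This is a simplified illustration only: the labels are informal names for the layers described in this reply (not actual Clusterware resource names), and the sketch uses Python's standard graphlib module (Python 3.9+) to derive a valid start order.

```python
from graphlib import TopologicalSorter

# Simplified model of the 11gR2 startup chain described in this reply.
# Each key lists the components that must already be up before it starts.
# Labels are informal and hypothetical, not real Clusterware resource names.
STARTUP_DEPS = {
    "OHASD (reads OLR)": {"init"},
    "CSSD (reads voting disks on the LUN)": {"OHASD (reads OLR)"},
    "ASM (mounts disk groups)": {"CSSD (reads voting disks on the LUN)"},
    "CRSD (reads OCR from a disk group)": {"ASM (mounts disk groups)"},
}


def start_order(deps):
    """Return a start order in which every dependency comes first."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, component in enumerate(start_order(STARTUP_DEPS), start=1):
        print(step, component)
```

    The point of the model: CRSD, the only layer in it that needs the OCR, comes last, so nothing earlier in the chain needs ASM to already be serving the OCR.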

    http://docs.oracle.com/cd/E11882_01/rac.112/e16794/intro.htm#BABIDEFI

    There are many improvements in version 11.2.

    The short answer is:

    When Clusterware starts, three files are involved:

    OLR - The first file to be opened and read. It is local to the node and records where the voting disks are stored, plus the information needed to start ASM (e.g. the ASM discovery string).

    VOTING DISK - The second file to be opened and read; reading it depends only on the OLR being accessible. ASM starts after CSSD, and ASM does not start if CSSD is offline (i.e. the voting file is missing).

    How are voting disks stored in ASM?
    Voting disks are placed directly on ASM disks. Oracle Clusterware stores the voting disk on a disk within the disk group that holds the voting files.
    Oracle Clusterware does not rely on ASM to access the voting files, which means Clusterware does not need the disk group in order to read and write the voting files on the ASM disk.
    You can see whether an ASM disk holds a voting file through the VOTING_FILE column of v$asm_disk. The fact that voting files do not depend on the disk group to be accessed does not mean we don't need the disk group: the disk group and the voting files are linked by their settings.

    OCR - Finally, the ASM instance starts and mounts all disk groups; then the Clusterware daemon (CRSD) opens and reads the OCR, which is stored in a disk group.

    So, once ASM has started, it does not depend on the OCR or OLR being online. ASM depends on CSSD (the voting disk) being online.

    There is an exclusive mode that starts ASM without CSSD, but it is meant only for restoring the OCR or voting disks.
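    To see how a missing file stops the sequence at a particular layer, here is a rough sketch that walks the OLR -> voting disk -> OCR order described above and stops at the first file the node cannot read. The NodeStorage flags and the component list are hypothetical simplifications for illustration; the exclusive mode mentioned above is not modeled.

```python
from dataclasses import dataclass


@dataclass
class NodeStorage:
    """Hypothetical flags: which of the three files this node can reach."""
    olr_readable: bool
    voting_disk_readable: bool
    ocr_diskgroup_mountable: bool


def components_started(s: NodeStorage) -> list[str]:
    """Follow the OLR -> voting disk -> OCR order and return the layers
    that come up before the first unreadable file stops the sequence."""
    started: list[str] = []
    if not s.olr_readable:
        return started          # no OLR: the lower stack has nothing to bootstrap from
    started.append("OHASD")
    if not s.voting_disk_readable:
        return started          # CSSD stays down, so ASM never starts
    started += ["CSSD", "ASM"]
    if not s.ocr_diskgroup_mountable:
        return started          # ASM is up, but CRSD cannot open the OCR
    started.append("CRSD")
    return started


if __name__ == "__main__":
    print(components_started(NodeStorage(True, False, True)))
```

    The useful takeaway is the asymmetry: a lost voting disk stops the stack below ASM, while a lost OCR disk group leaves ASM running and only CRSD down.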


    http://levipereira.wordpress.com/2012/01/11/explaining-how-to-store-ocr-voting-disks-and-asm-spfile-on-asm-diskgroup-rac-or-rac-extended/

    Edited by: Levi Pereira on Oct 11, 2012 4:54 PM
  • 6. Re: How does 11g CRS start if OCR is in ASM
    863149 Newbie
    Thanks, Levi, for the perfect information!
  • 7. Re: How does 11g CRS start if OCR is in ASM
    Chuck1958 Newbie
    So do you have any idea why I cannot reboot a node and have it rejoin the cluster? Or have it rejoin after a blue screen?

    crsctl start crs always fails with a "CRS-0184: Cannot communicate with the CRS daemon." If I completely power it off and back on though, it starts up and rejoins the cluster. A simple reboot never works.

    The hardware for the cluster is 2 HP Blades.
  • 8. Re: How does 11g CRS start if OCR is in ASM
    Levi-Pereira Guru
    Chuck1958 wrote:
    So do you have any idea why I cannot reboot a node and have it rejoin the cluster? Or have it rejoin after a blue screen?

    crsctl start crs always fails with a "CRS-0184: Cannot communicate with the CRS daemon." If I completely power it off and back on though, it starts up and rejoins the cluster. A simple reboot never works.

    The hardware for the cluster is 2 HP Blades.
    I have faced this problem before and solved it with this MOS note:

    *RAC on Windows: Oracle Clusterware Node Evictions a.k.a. Why do we get a Blue Screen (BSOD) Caused By Orafencedrv.sys? [ID 337784.1]*
  • 9. Re: How does 11g CRS start if OCR is in ASM
    Chuck1958 Newbie
    I don't think this is the same problem. I'm not experiencing node evictions. This problem only occurs when I reboot the server (or it blue screens for a legitimate reason). The node doesn't rejoin the cluster unless I completely power it off and back on.
  • 10. Re: How does 11g CRS start if OCR is in ASM
    Levi-Pereira Guru
    Chuck1958 wrote:
    I don't think this is the same problem. I'm not experiencing node evictions. This problem only occurs when I reboot the server (or it blue screens for a legitimate reason). The node doesn't rejoin the cluster unless I completely power it off and back on.
    Have you investigated what's going on?

    Check the Windows System log in Event Viewer and the Clusterware alert log. You will find your answer by tracing the logs during Clusterware startup on the node that restarted. (Tip: tail.exe for Windows will help with this.)

    P.S.: On Windows, Clusterware takes considerable time to start. I have run into this as well but never investigated the cause.
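    If tail.exe is not at hand, a rough Python stand-in can do the same job while the node restarts. This is a sketch under assumptions: it takes the log path as an argument (the 11.2 Clusterware alert log is usually under the Grid home's log\<hostname> directory, but check your own install), and the CRS-nnnn filter is just a hypothetical convenience helper, not an Oracle tool.

```python
import re
import sys
import time
from collections import deque

# Matches Clusterware message codes such as "CRS-0184".
CRS_CODE = re.compile(r"\bCRS-\d{4}\b")


def last_lines(path, n=50):
    """Return the last n lines of a text file (like `tail -n`)."""
    with open(path, "r", errors="replace") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


def crs_messages(lines):
    """Keep only lines that carry a CRS-nnnn message code."""
    return [line for line in lines if CRS_CODE.search(line)]


def follow(path, poll_seconds=1.0):
    """Yield complete lines as they are appended (like `tail -f`).
    Runs until interrupted (Ctrl+C)."""
    with open(path, "r", errors="replace") as f:
        f.seek(0, 2)                  # start at the current end of the file
        pending = ""
        while True:
            chunk = f.readline()
            if not chunk:
                time.sleep(poll_seconds)
                continue
            pending += chunk
            if pending.endswith("\n"):  # only emit whole lines
                yield pending.rstrip("\n")
                pending = ""


if __name__ == "__main__":
    # Usage (hypothetical path): python tail_crs.py C:\path\to\alert<hostname>.log
    log_path = sys.argv[1]
    for line in last_lines(log_path):
        print(line)
    for line in follow(log_path):
        print(line)
```

    Start it on the rebooted node before Clusterware comes up, then compare what it prints against a power-cycled start to see where the two diverge.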
  • 11. Re: How does 11g CRS start if OCR is in ASM
    rcc50886 Journeyer
    Chuck1958 wrote:
    CRS needs to read the OCR to start up. In 11gR2, by default the OCR is stored in ASM. ASM however gets started after CRS (by CRS). How is that possible? How does CRS start ASM, if it needs ASM running to read the OCR?
    When you start the cluster, it first reads the local copy of the OCR (specific to the local node), also called the OLR. Using the OLR, it starts the resources one by one; as part of that it starts ASM, and eventually it finds the OCR and starts the cluster resources.
  • 12. Re: How does 11g CRS start if OCR is in ASM
    Chuck1958 Newbie
    Thanks for the reply.

    Any idea why a node would fail to join the cluster after a reboot? Note that this is NOT a node eviction problem. Someone needs to perform some maintenance on the node that requires a reboot - perhaps a Windows update for example. When the node reboots it cannot re-join the cluster unless I power cycle the node. This requires manual intervention which is really inconvenient.

    This is reproducible 100% of the time.

    Shutdown/reboot doesn't work. Cluster resources don't restart. CRSCTL STAT RES -T says something like "crs is not running". CRSCTL START says it's already running. CRSCTL STOP says it's not running. Very bizarre.

    Power cycle fixes everything - unless I power cycle both nodes at the same time. If for any reason both nodes go down (which BTW happened to me recently because RAC has a glaring single point of failure) I must wait for one node to come up completely - all resources running - before I power on the 2nd node.
  • 13. Re: How does 11g CRS start if OCR is in ASM
    damorgan Oracle ACE Director
    Many possible reasons but the question to ask is "is it the specific node or any node?" Try this test:

    1. With one node running try to start the second node. Keep the appropriate portion of the log if it fails.
    2. Stop that node and start the second node only so that it is the only one running. Does it start?
    3. With the second node running try to now start the node that is not running. Does it start? Again keep the appropriate portion of the logs with failure info.

    I have seen times when one node never starts and others where any node will start but no additional node will join the cluster. It is important that you distinguish between the two scenarios.
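    One way to keep the three-step test straight is to record each step's outcome and map it to the two scenarios just described. This is only an illustrative sketch: the function name and its labels are hypothetical and simply encode the reasoning in this post, not any Oracle diagnostic.

```python
def classify(second_joins_with_first_up: bool,
             second_starts_alone: bool,
             first_joins_with_second_up: bool) -> str:
    """Map the outcomes of steps 1-3 to the scenarios worth distinguishing.
    Step 1: second node started while the first is running.
    Step 2: second node started while it is the only node running.
    Step 3: first node started while the second is running."""
    if second_joins_with_first_up:
        return "no failure reproduced"
    if not second_starts_alone:
        return "node-specific: this node never starts"
    if not first_joins_with_second_up:
        return "join problem: any node starts alone, but no second node joins"
    return "inconclusive: only this node fails to join; compare the logs"


if __name__ == "__main__":
    print(classify(False, True, False))
```

    Keeping the log excerpts from each failing step alongside these outcomes is what makes the comparison useful.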

    BTW: Any chance someone changed the switch used for the cache fusion interconnect, either by bringing in a new switch or by upgrading its software? I saw this a few years ago when Cisco Nexus 5010s were replaced with 7010s. The network admins swore they were not the cause, but needless to say they were.
  • 14. Re: How does 11g CRS start if OCR is in ASM
    Chuck1958 Newbie
    It happens on any node. With either node up, a reboot of the other node will not work. Windows will come up but CRS will not start any resources. Doesn't matter which node is up and which is restarted. Only a power-cycle will allow a node to join the cluster once it has been restarted.

    If the switch were changed, wouldn't that also prevent a node from joining the cluster after a power cycle?

    Both nodes are HP blades in the same chassis. The switch is built into the chassis.