This discussion is archived
8 Replies Latest reply: Apr 24, 2012 6:44 AM by rjanuary RSS

ASM/ACFS shutdown issues on 11.2.0.3 RAC.

rjanuary Newbie
Currently Being Moderated
We're running into issues trying to shutdown ASM on a 2 node 11.2.0.3 RAC cluster in one of our environments. This is on RHEL 5, kernel 2.6.18-274.17.1.el5
when issuing crsctl stop crs we get the following output
$ sudo /u01/app/11.2.0.3/grid/bin/crsctl stop has
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'opbld10'
CRS-2673: Attempting to stop 'ora.crsd' on 'opbld10'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'opbld10'
.....
CRS-2673: Attempting to stop 'ora.asm' on 'opbld10'
CRS-2677: Stop of 'ora.asm' on 'opbld10' succeeded
CRS-5014: Agent "/u01/app/11.2.0.3/grid/bin/orarootagent.bin" timed out starting process "/u01/app/11.2.0.3/grid/bin/acfsload" for action "stop": details at "(:CLSN00009:)" in "/u01/app/11.2.0.3/grid/log/opbld10/agent/crsd/orarootagent_root/orarootagent_root.log"
CRS-5017: The resource action "ora.drivers.acfs stop" encountered the following error: 
(:CLSN00009:)Utils:execCmd aborted. For details refer to "(:CLSN00108:)" in "/u01/app/11.2.0.3/grid/log/opbld10/agent/crsd/orarootagent_root/orarootagent_root.log".
CRS-2675: Stop of 'ora.drivers.acfs' on 'opbld10' failed
CRS-2679: Attempting to clean 'ora.drivers.acfs' on 'opbld10'
CRS-5014: Agent "/u01/app/11.2.0.3/grid/bin/orarootagent.bin" timed out starting process "/u01/app/11.2.0.3/grid/bin/acfsload" for action "clean": details at "(:CLSN00009:)" in "/u01/app/11.2.0.3/grid/log/opbld10/agent/crsd/orarootagent_root/orarootagent_root.log"
CRS-5017: The resource action "ora.drivers.acfs clean" encountered the following error: 
(:CLSN00009:)Utils:execCmd aborted. For details refer to "(:CLSN00106:)" in "/u01/app/11.2.0.3/grid/log/opbld10/agent/crsd/orarootagent_root/orarootagent_root.log".
CRS-2678: 'ora.drivers.acfs' on 'opbld10' has experienced an unrecoverable failure
CRS-2799: Failed to shut down resource 'ora.drivers.acfs' on 'opbld10'
CRS-2794: Shutdown of Cluster Ready Services-managed resources on 'opbld10' has failed
CRS-2675: Stop of 'ora.crsd' on 'opbld10' failed
CRS-2799: Failed to shut down resource 'ora.crsd' on 'opbld10'
CRS-2795: Shutdown of Oracle High Availability Services-managed resources on 'opbld10' has failed
CRS-4687: Shutdown command has completed with errors.
CRS-4000: Command Stop failed, or completed with errors.
Something appears to prevent the acfs driver from unloading. After some searching I found an old SR for this environment where acfs was not auto loading. The solution was a previous DBA adding ora.drivers.acfs as a weak start dependency of ora.asm.
The steps performed were as follows:
==============
1. On each instance of the cluster, add ora.drivers.acfs as a weak dependency
of ora.asm reaource.

As grid - crsctl modify resource ora.asm -attr
"START_DEPENDENCIES='weak(ora.LISTENER.lsnr,ora.drivers.acfs)'"

2. As Grid - [grid@opbld10 bin]$ /u01/app/11.2.0/grid/bin/crsctl add type
ora.registry.acfs.type -basetype ora.local_resource.type -file
/u01/app/11.2.0/grid/crs/template/registry.acfs.type

3.As Root - [root@opbld10 bin]$ /u01/app/11.2.0/grid/bin/crsctl add resource
ora.registry.acfs -attr ACL=\'owner:root:rwx,pgrp:oinstall:r-x,other::r--\'
-type ora.registry.acfs.type -f

4. As Grid - [grid@opbld10 bin]$ /u01/app/11.2.0/grid/bin/crsctl add type
ora.drivers.acfs.type -basetype ora.local_resource.type -file
/u01/app/11.2.0/grid/crs/template/drivers.acfs.type

5. As Root - [root@opbld10 bin]$ /u01/app/11.2.0/grid/bin/crsctl add resource
ora.drivers.acfs -attr ACL=\'owner:root:rwx,pgrp:oinstall:r-x,other::r--\'
-type ora.drivers.acfs.type -f

6. Reboot the nodes one at a time.

7. ACFS driver should load and acfs mount come up automatically upon reboot
without manually starting them up.
==============

I have verified this to still be in place.
$ /u01/app/11.2.0.3/grid/bin/crsctl status resource ora.asm -p
NAME=ora.asm
.....
START_DEPENDENCIES=weak(ora.LISTENER.lsnr,ora.drivers.acfs)
When comparing with other environments which do not exhibit this behavior I find the dependency does not exist.
$ crsctl status resource ora.asm -p
NAME=ora.asm
.....
START_DEPENDENCIES=weak(ora.LISTENER.lsnr)
If ACFS relies on ASM for it's actual storage, it seems strange to me that ACFS is listed as a dependency for ASM.
Performing more research this morning I found some oracle documentation stating the dependency should exist, however this appears to be in contrast to our working environments.
ACFS Advanced Topics: http://docs.oracle.com/cd/E14072_01/server.112/e10500/asmfs_extra.htm#CACECCDH]

Should this dependency exist? We do currently have an SR open with oracle, but have yet to find resolution. I'm trying to perform as much work on my own as I can to speed the process along. Any insight you can provide would be appreciated.

Edited by: rjanuary on Feb 9, 2012 8:25 AM - Added OS and kernel info.
  • 1. Re: ASM/ACFS shutdown issues on 11.2.0.3 RAC.
    rjanuary Newbie
    Currently Being Moderated
    It should also be noted that the shutdown fails outside of clusterware as well:
    $ sudo /u01/app/11.2.0.3/grid/bin/acfsload stop
    ACFS-9118: oracleacfs.ko driver in use - cannot unload.
    This was after the only acfs volume was umounted and it's diskgroup unmounted within asm.

    during the shutdown, the grid alert log shows the following:
    [client(22949)]CRS-10001:09-Feb-12 10:38 ACFS-9290: Waiting for ASM to shutdown.
    Looking back at the orarootagent.log I see the same:
    2012-02-09 10:39:15.466: [ora.drivers.acfs][1341069632] {1:33342:871} [stop] (:CLSN00010:)ACFS-9290: Waiting for ASM to shutdown.
    Ultimately (after 15 minutes) Itends with aborting the attempt due to timeout:
    2012-02-09 10:53:34.227: [    AGFW][1273952576] {1:33342:871} Created alert : (:CRSAGF00113:) :  Aborting
     the command: stop for resource: ora.drivers.acfs opbld09 1
    2012-02-09 10:53:34.227: [ora.drivers.acfs][1273952576] {1:33342:871} [stop] (:CLSN00110:) clsn_agent::ab
    ort {
    2012-02-09 10:53:34.227: [ora.drivers.acfs][1273952576] {1:33342:871} [stop] abort {
    2012-02-09 10:53:34.227: [ora.drivers.acfs][1273952576] {1:33342:871} [stop] abort command: stop
    2012-02-09 10:53:34.227: [ora.drivers.acfs][1273952576] {1:33342:871} [stop] tryActionLock {
    2012-02-09 10:53:34.283: [ora.drivers.acfs][1341069632] {1:33342:871} [stop] (:CLSN00009:)Utils:execCmd abort request: killing /u01/app/11.2.0.3/grid/bin/acfsload
    2012-02-09 10:53:34.284: [   AGENT][1341069632] {1:33342:871} UserErrorException: Locale is 
    2012-02-09 10:53:34.284: [ora.drivers.acfs][1341069632] {1:33342:871} [stop] clsnUtils::error Exception type=2 string=
    CRS-5014: Agent "/u01/app/11.2.0.3/grid/bin/orarootagent.bin" timed out starting process "/u01/app/11.2.0.3/grid/bin/acfsload" for action "stop": details at "(:CLSN00009:)" in "/u01/app/11.2.0.3/grid/log/opbld09/agent/crsd/orarootagent_root/orarootagent_root.log"
    
    2012-02-09 10:53:34.285: [    AGFW][1408186688] {1:33342:871} Agent sending reply for: RESOURCE_STOP[ora.drivers.acfs opbld09 1] ID 4099:83577
    2012-02-09 10:53:34.286: [   AGENT][1341069632] {1:33342:871} UserErrorException: Locale is 
    2012-02-09 10:53:34.287: [ora.drivers.acfs][1341069632] {1:33342:871} [stop] clsnUtils::error Exception type=2 string=
    CRS-5017: The resource action "ora.drivers.acfs stop" encountered the following error: 
    (:CLSN00009:)Utils:execCmd aborted. For details refer to "(:CLSN00108:)" in "/u01/app/11.2.0.3/grid/log/opbld09/agent/crsd/orarootagent_root/orarootagent_root.log".
    My question is in regards to why this is occuring. Is it due to ACFS being marked as a dependency of ASM, or is it actually due to ASM's active use of the kernel module?

    Edited by: rjanuary on Feb 9, 2012 8:58 AM
  • 2. Re: ASM/ACFS shutdown issues on 11.2.0.3 RAC.
    dsobral Newbie
    Currently Being Moderated
    Hello Rjanuary,

    We are facing the same problem on SLES 11 RAC 11.2.0.3 . We got this error when we tried to apply the lastest PSU on this enviroment and the OPatch try to shutdown the cluster.

    Have you got some awnser from MOS?

    Thanks,

    Edited by: user8980603 on Feb 14, 2012 7:41 AM
  • 3. Re: ASM/ACFS shutdown issues on 11.2.0.3 RAC.
    rjanuary Newbie
    Currently Being Moderated
    Unfortunately, no. The SR has been escallated and I finally saw a small amount of progress yesterday. I hope it continues. Interestingly enough, our issue is also preventing us from patching.

    Looking in the ASM alert log, I feel the reason it's not shutting down is because the disk group containing the OCR and ASM parameter file is not unmounting. I believe it's the OCR that's still in use causing the issues.
    Are you seeing any errors in your ASM alert log?
    if so, do you see any files within ASM?
  • 4. Re: ASM/ACFS shutdown issues on 11.2.0.3 RAC.
    dsobral Newbie
    Currently Being Moderated
    Hello,

    I`m not sure if the OCR is the cause. For me it is something related to ACFS , on my CRS alert we got this error:


    2012-02-14 09:11:51.533
    [ctssd(6341)]CRS-2405:The Cluster Time Synchronization Service on host rac02 is shutdown by user
    2012-02-14 09:11:51.536
    [mdnsd(6125)]CRS-5602:mDNS service stopping by request.
    [client(19979)]CRS-10001:14-Feb-12 09:11 ACFS-9290: Waiting for ASM to shutdown.
    2012-02-14 09:12:00.843
    [cssd(6214)]CRS-1603:CSSD on node rac02 shutdown by user.
    2012-02-14 09:12:00.848
    [ohasd(5279)]CRS-2767:Resource state recovery not attempted for 'ora.cssdmonitor' as its target state is OFFLINE
    2012-02-14 09:12:00.945
    [cssd(6214)]CRS-1660:The CSS daemon shutdown has completed
    [client(20014)]CRS-10001:14-Feb-12 09:12 ACFS-9118: oracleacfs.ko driver in use - cannot unload.
    2012-02-14 09:12:03.199
    [gpnpd(6138)]CRS-2329:GPNPD on node rac02 shutdown.
    *[client(20085)]CRS-10001:14-Feb-12 09:12 ACFS-9118: oracleacfs.ko driver in use - cannot unload.*


    The only way to shutdown or startup again is after a node reboot. Are your errors like these?

    Thanks,

    Edited by: user8980603 on Feb 14, 2012 11:17 AM
  • 5. Re: ASM/ACFS shutdown issues on 11.2.0.3 RAC.
    rjanuary Newbie
    Currently Being Moderated
    That appears to be your grid alert log. This log is stating that it's waiting for ASM to shutdown. What about your ASM logs? Are they giving you a reason for ASM not shutting down?

    This is what I'm seeing in my ASM log:
    Mon Feb 13 19:42:17 2012
    ORA-15032: not all alterations performed
    ORA-15027: active use of diskgroup "GRID" precludes its dismount
    ERROR: ALTER DISKGROUP GRID DISMOUNT  /* asm agent *//* {1:7752:794} */
    the only items on the grid disk group are the OCR and ACFS param file. I've verified that all client databases are shutdown. There are no clients connected to this ASM instance (only a reference back to itself).

    Oracle documentation states the ACFS kernel module is a dependency for ASM.
    In my case I feel ACFS isn't unloading because ASM is still running. ASM is still running because a disk group is mounted. That disk group is still mounted because the OCR is still in use.
  • 6. Re: ASM/ACFS shutdown issues on 11.2.0.3 RAC.
    dsobral Newbie
    Currently Being Moderated
    Hello,

    Unfortunately i think that we are facing different problems , for us the ASM instance goes down successfully the only resource pending is the resource related the oracleacfs, however i will check the ask logs and post for comparison.

    Thanks,
  • 7. Re: ASM/ACFS shutdown issues on 11.2.0.3 RAC.
    rjanuary Newbie
    Currently Being Moderated
    If ASM is shutting down cleanly, you're likely facing an unpublished bug. Oracle Support did confirm there are some unpublished bugs regarding unloading the ACFS driver in 11.2.0.3. You may open an SR mentioning your issues.
  • 8. Re: ASM/ACFS shutdown issues on 11.2.0.3 RAC.
    rjanuary Newbie
    Currently Being Moderated
    I just stumbled across this post through a google search and felt I should update with resolution. The ACFS volume had resources registered in both OHAS and CRS, when it should only be through one mechanism. There were multiple resources added by the previous DBA which created cyclical dependencies preventing the drivers from unloading properly. Once these were removed and the acfs volumes properly registered everything works as designed.

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points