12 Replies. Latest reply: Sep 25, 2012 12:41 PM by 441858

    Clusterware startup issues

    441858
      Oracle 11gR2 (11.2.0.3) OEL 5

      I am having problems starting the various clusterware processes. I think it is due to the permissions of the grid home accidentally getting changed.

      Is there a root.sh type script that can revert the permissions of the grid infrastructure home?

      Thanks.
        • 1. Re: Clusterware startup issues
          phaeus
          Hello,
          do you have any log or error messages?

          One option is to remove the node and add it back from a remaining node that has the correct permissions, because the software is copied over when the node is added. Alternatively, compare the OS permissions against those on the remaining node.

          regards
          Peter
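The second suggestion above, comparing OS permissions against a healthy node, could be sketched roughly as follows. This is only an illustration: `perm_manifest` and `perm_diff` are hypothetical helper names, not Oracle tooling, and the sketch assumes `stat -c` (GNU coreutils) is available on both nodes. It reports differences but changes nothing.

```shell
# Hypothetical helpers, not part of any Oracle tooling.

# Print "mode owner group path" for everything under a directory.
# Paths are relative to that directory so that listings taken on
# two different nodes can be compared line by line.
perm_manifest() {
    local root="$1"
    ( cd "$root" && find . -exec stat -c '%a %U %G %n' {} + | LC_ALL=C sort -k4 )
}

# Show entries whose mode, owner or group differ between two manifests.
# Lines starting with "<" come from the first (reference) manifest,
# lines starting with ">" from the second (suspect) manifest.
perm_diff() {
    diff "$1" "$2"
}

# Example usage (paths are assumptions, adjust to your environment):
#   on the healthy node:  perm_manifest /u01/app/11.2.0.3/grid > /tmp/good.txt
#   on the broken node:   perm_manifest /u01/app/11.2.0.3/grid > /tmp/bad.txt
#   after copying good.txt over:  perm_diff /tmp/good.txt /tmp/bad.txt
```

Files that exist on only one node show up in the diff as well, so node-specific directories (for example those named after the host) will need to be read with that in mind.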
          • 2. Re: Clusterware startup issues
            kuljeet singh -
            JrOraDBA wrote:
            Oracle 11gR2 (11.2.0.3) OEL 5

            I am having problems starting the various clusterware processes. I think it is due to the permissions of the grid home accidentally getting changed.
            What is the error message? Can you post the list of files whose permissions changed recently?
            • 3. Re: Clusterware startup issues
              921598
              Please check the MOS document
              Troubleshoot Grid Infrastructure Startup Issues [ID 1050908.1]
              • 4. Re: Clusterware startup issues
                441858
                I probably could do that but that would just be a workaround.

                Also, I would then always have to rely on the second node to start up the first node.

                I want to get this issue fixed so that it can go back to its original state.

                Thanks.
                • 5. Re: Clusterware startup issues
                  441858
                  Here are some error messages from different log files:

                  crsd.log
                  2012-09-24 14:56:00.451: [GIPCHGEN][1155356992] gipchaEndpointCloseF [gipcmodGipcDisconnect : gipcmodGipc.c : 923]: 
                  EXCEPTION[ ret gipcretDaemonLost (34) ]  failed to close endpoint ctx 0x1fc5e50 [0000000000000010] { gipchaContext : host 'node011', name '3a15-ddb7-cea9-a198', luid 'f9e5c598-00000000', 
                  numNode 1, numInf 2, usrFlags 0x0, flags 0x9 }, endp 0x7fc7f82ae500 [00000000000004d9] { gipchaEndpoint : port 'CLSFRAME_1', peer ':', srcCid 00000000-00000000,  dstCid 00000000-00000000,
                  numSend 0, maxSend 100, groupListType 1, hagroup 0x7fc7f80b41d0, usrFlags 0x4000, flags 0x130 }, flags 0x0
                  2012-09-24 14:56:00.451: [  CRSCCL][1155356992]gipcWait returned from request type 4. GIPCD lost. Exiting.
                  gpnpd.log
                  2012-09-24 13:56:59.654: [  OCRMSG][1104447808]prom_send: Failed to send [12]
                  2012-09-24 13:56:59.654: [  OCRMSG][1104447808]GIPC error [12] msg [gipcretConnectionLost]
                  2012-09-24 13:56:59.654: [  OCRMSG][1104447808]prom_rpc: CLSC send failure..ret code 203
                  2012-09-24 13:56:59.654: [  OCRMSG][1104447808]prom_rpc: possible OCR retry scenario
                  2012-09-24 13:56:59.654: [  OCRAPI][1104447808]procr_open_key_ext: Fail Fast flag is set and OCR server is not found. Exiting the API.
                  2012-09-24 13:56:59.654: [    GPNP][1104447808]procr_open_key_ext: OCR api procr_open_key_ext failed for key SYSTEM.GPnP.profiles.peer.best
                  2012-09-24 13:56:59.654: [    GPNP][1104447808]procr_open_key_ext: OCR current boot level : 7
                  2012-09-24 13:56:59.654: [    GPNP][1104447808]procr_open_key_ext: OCR error code    : 32
                  2012-09-24 13:56:59.654: [    GPNP][1104447808]procr_open_key_ext: OCR error message : PROC-32: Cluster Ready Services on the local node is not running Messaging error [gipcretConnectionLost] [12]
                  2012-09-24 13:56:59.654: [    GPNP][1104447808]clsgpnpco_ocr2profile: [at clsgpnpco.c:578] Result: (59) CLSGPNP_OCR_NOSRV. Failed to open requested OCR Profile.
                  2012-09-24 13:56:59.654: [    GPNP][1104447808]clsgpnpd_reconcile: [at clsgpnpd.c:2449] Result: (59) CLSGPNP_OCR_NOSRV. Did not get a shared profile, OCR provider 0x7fd99c021a70
                  2012-09-24 14:06:23.004: [    GPNP][1104447808]clsgpnpcf_deletef: [at clsgpnpcf.c:657] Result: (7) CLSGPNP_IO. Error deleting profile 'pending.xml' in '/u01/app/11.2.0.3/grid/gpnp/node011/profiles/peer/'
                  2012-09-24 14:06:23.004: [    GPNP][1104447808]clsgpnpcf_deletef: [at clsgpnpcf.c:659] SlfRemove prf
                  Internal Error Information:
                    Category: SLF_SYSTEM(-8)
                    Operation: remove failed
                    Location: SlfRemove
                    Other:
                    Dep: 13
                    Dep Message: Permission denied
                  mdnsd.log
                  2012-09-24 14:56:00.421: [    MDNS][2059355872] mDNSPlatformSendUDP got error 22 (Invalid argument) sending packet to 224.0.0.251 on interface 192.168.xx.xx/eth0/2
                  2012-09-24 14:56:00.421: [    MDNS][2059355872] mDNSPlatformSendUDP got error 22 (Invalid argument) sending packet to 224.0.0.251 on interface 192.168.xx.xx/eth1/3
                  2012-09-24 14:56:00.421: [    MDNS][2059355872] mDNSPlatformSendUDP got error 22 (Invalid argument) sending packet to 224.0.0.251 on interface xx.xx.xxx.xx/bond0/12
                  2012-09-24 14:56:00.437: [ COMMCRS][2059355872]clsc_post: (0x11acb90) code 4, NS err (12603, 12560), transport (530, 101, 0)
                  
                  2012-09-24 14:56:01.522: [ COMMCRS][1086949696]clsc_post: (0x11acb90) code 2, NS err (12603, 12560), transport (530, 101, 0)
                  
                  2012-09-24 14:56:01.525: [ COMMCRS][1086949696]clsc_post: (0x11acb90) code 1, NS err (12603, 12560), transport (530, 101, 0)
                  
                  2012-09-24 14:56:01.527: [ COMMCRS][1086949696]clsc_post: (0x11acb90) code 2, NS err (12603, 12560), transport (530, 101, 0)
                  
                  2012-09-24 14:56:01.529: [ COMMCRS][1086949696]clsc_post: (0x11acb90) code 1, NS err (12603, 12560), transport (530, 101, 0)
                  oraagent_grid.log
                  2012-09-24 14:56:00.872: [ COMMCRS][1124915520]clsc_thrd_spawn: (0x7f357c09a5d0) thrd not active
                  
                  2012-09-24 14:56:00.872: [ COMMCRS][1124915520]clscugblmini: (0x7f357c09a530) Monitor thread spawn failed
                  
                  2012-09-24 14:56:00.872: [ USRTHRD][1124915520] ClusterSubscriber::SubscriberWorker::InternalClusterSubscriber::subscribeInternal EvmConnCreate failed [13]
                  2012-09-24 14:56:00.872: [ USRTHRD][1124915520] ClusterSubscriber::SubscriberWorker::doWork Caught exception while subscribing. Ignoring it and dropping current event
                  2012-09-24 14:56:01.426: [ CRSCOMM][1087174976][FFAIL] Ipc: Couldnt clscreceive message, no message: 11
                  2012-09-24 14:56:01.426: [ CRSCOMM][1087174976] Ipc: Client disconnected.
                  2012-09-24 14:56:01.426: [ CRSCOMM][1087174976][FFAIL] IpcC: Client could not receive message clscerror: 11
                  2012-09-24 14:56:01.426: [ CRSCOMM][1087174976] IpcC: IPC client connection 0x7f35840008c0 to member 0 has been removed
                  2012-09-24 14:56:01.426: [CLSFRAME][1087174976] Removing IPC Member:{Relative|Node:0|Process:0|Type:2}
                  2012-09-24 14:56:01.426: [CLSFRAME][1087174976] Disconnected from OHASD:node011 process: {Relative|Node:0|Process:0|Type:2}
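Since the gpnpd.log excerpt above ends in "Permission denied", one quick way to see how widespread the problem is would be to count such lines per log file. A minimal sketch, assuming the logs live under the grid home's log directory as the paths in this thread suggest; `permission_errors` is a hypothetical helper name.

```shell
# Hypothetical helper: list every file under a log directory that
# contains "permission denied" (case-insensitive), followed by the
# number of matching lines, in the form "path:count".
permission_errors() {
    local logdir="$1"
    # grep -c prints a count for every file, so drop the ":0" entries.
    grep -rci 'permission denied' "$logdir" 2>/dev/null | grep -v ':0$'
}

# Example usage (path taken from the log messages in this thread):
#   permission_errors /u01/app/11.2.0.3/grid/log/node011
```

The function only reads the logs. Files the current user cannot read are silently skipped, which is itself worth noticing when permissions are the suspected cause.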
                  • 6. Re: Clusterware startup issues
                    Mahir M. Quluzade
                    Can you try the following?
                     crsctl stop cluster -all
                     crsctl start cluster -all
                    Regards
                    Mahir M. Quluzade
                    www.mahir-quluzade.com
                    • 7. Re: Clusterware startup issues
                      441858
                      [oracle@Node011 ~]$ crsctl check crs
                      CRS-4638: Oracle High Availability Services is online
                      CRS-4535: Cannot communicate with Cluster Ready Services
                      CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
                      CRS-4534: Cannot communicate with Event Manager
                      • 8. Re: Clusterware startup issues
                        441858
                        I just did a reboot and here is what I see in the logs:
                        [client(12033)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
                        2012-09-25 07:47:21.258
                        [client(12033)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0.3/grid/log/node011/client/crsctl_oracle.log.
                        2012-09-25 07:47:24.476
                        [client(12033)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
                        2012-09-25 07:47:24.476
                        [client(12033)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0.3/grid/log/node011/client/crsctl_oracle.log.
                        2012-09-25 07:47:28.179
                        [client(12096)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
                        2012-09-25 07:47:28.180
                        [client(12096)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0.3/grid/log/node011/client/crsctl_oracle.log.
                        2012-09-25 07:47:31.398
                        [client(12096)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
                        2012-09-25 07:47:31.399
                        [client(12096)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0.3/grid/log/node011/client/crsctl_oracle.log.
                        2012-09-25 07:47:34.973
                        [client(12134)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
                        2012-09-25 07:47:34.974
                        [client(12134)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0.3/grid/log/node011/client/crsctl_oracle.log.
                        2012-09-25 07:47:38.192
                        [client(12134)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
                        2012-09-25 07:47:38.193
                        [client(12134)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0.3/grid/log/node011/client/crsctl_oracle.log.
                        2012-09-25 07:47:41.609
                        [client(12171)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
                        2012-09-25 07:47:41.610
                        [client(12171)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0.3/grid/log/node011/client/crsctl_oracle.log.
                        2012-09-25 07:47:44.836
                        [client(12171)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
                        2012-09-25 07:47:44.837
                        [client(12171)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0.3/grid/log/node011/client/crsctl_oracle.log.
                        2012-09-25 07:47:48.441
                        [client(12221)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
                        2012-09-25 07:47:48.442
                        [client(12221)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0.3/grid/log/node011/client/crsctl_oracle.log.
                        2012-09-25 07:47:51.660
                        [client(12221)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
                        2012-09-25 07:47:51.661
                        [client(12221)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0.3/grid/log/node011/client/crsctl_oracle.log.
                        2012-09-25 07:47:55.218
                        [client(12263)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
                        2012-09-25 07:47:55.218
                        [client(12263)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0.3/grid/log/node011/client/crsctl_oracle.log.
                        2012-09-25 07:47:58.437
                        [client(12263)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
                        2012-09-25 07:47:58.437
                        [client(12263)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0.3/grid/log/node011/client/crsctl_oracle.log.
                        2012-09-25 07:48:01.874
                        [client(12300)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
                        2012-09-25 07:48:01.875
                        [client(12300)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0.3/grid/log/node011/client/crsctl_oracle.log.
                        2012-09-25 07:48:05.097
                        [client(12300)]CRS-2302:Cannot get GPnP profile. Error CLSGPNP_NO_DAEMON (GPNPD daemon is not running).
                        2012-09-25 07:48:05.098
                        [client(12300)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/11.2.0.3/grid/log/node011/client/crsctl_oracle.log.
                        2012-09-25 07:48:08.895
                        [ohasd(11849)]CRS-2112:The OLR service started on node node011.
                        2012-09-25 07:48:08.934
                        [ohasd(11849)]CRS-1301:Oracle High Availability Service started on node node011.
                        2012-09-25 07:48:08.954
                        [ohasd(11849)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
                        2012-09-25 07:48:09.985
                        [/u01/app/11.2.0.3/grid/bin/oraagent.bin(12585)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/11.2.0.3/grid/log/node011/agent/ohasd/oraagent_grid//oraagent_grid.log"
                        2012-09-25 07:48:10.443
                        [/u01/app/11.2.0.3/grid/bin/orarootagent.bin(12589)]CRS-5016:Process "/u01/app/11.2.0.3/grid/bin/acfsload" spawned by agent "/u01/app/11.2.0.3/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/11.2.0.3/grid/log/node011/agent/ohasd/orarootagent_root/orarootagent_root.log"
                        2012-09-25 07:50:11.652
                        [/u01/app/11.2.0.3/grid/bin/oraagent.bin(12690)]CRS-5818:Aborted command 'start' for resource 'ora.mdnsd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0.3/grid/log/node011/agent/ohasd/oraagent_grid//oraagent_grid.log.
                        2012-09-25 07:50:15.655
                        [ohasd(11849)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.mdnsd'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0.3/grid/log/node011/ohasd/ohasd.log.
                        2012-09-25 07:52:18.460
                        [/u01/app/11.2.0.3/grid/bin/oraagent.bin(13854)]CRS-5818:Aborted command 'start' for resource 'ora.mdnsd'. Details at (:CRSAGF00113:) {0:0:2} in /u01/app/11.2.0.3/grid/log/node011/agent/ohasd/oraagent_grid//oraagent_grid.log.
                        2012-09-25 07:52:22.463
                        [ohasd(11849)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.mdnsd'. Details at (:CRSPE00111:) {0:0:2} in /u01/app/11.2.0.3/grid/log/node011/ohasd/ohasd.log.
                        The interesting thing is that the 'crsctl_oracle.log' file does not exist. There are only 'crsctl_grid.log' and 'crsctl_root.log'.
                        • 9. Re: Clusterware startup issues
                          Mahir M. Quluzade
                          The logs show that the ASM disk groups cannot be accessed.

                          Did you remove your OCR and voting disks?

                          Can you check the disk groups with ASMCA?


                          Regards
                          Mahir M. Quluzade
                          • 10. Re: Clusterware startup issues
                            441858
                            The instance is down so I cannot connect.

                            Even when I try to connect to the ASM instance it won't let me.
                            $ sqlplus sys as sysasm
                            
                            SQL*Plus: Release 11.2.0.3.0 Production on Tue Sep 25 09:09:19 2012
                            
                            Copyright (c) 1982, 2011, Oracle.  All rights reserved.
                            
                            Enter password:
                            ERROR:
                            ORA-09925: Unable to create audit trail file
                            Linux-x86_64 Error: 13: Permission denied
                            Additional information: 9925
                            ORA-01017: invalid username/password; logon denied
                            • 11. Re: Clusterware startup issues
                              Valentin Minzatu
                              Is there a chance that you are starting ASM as the wrong OS user, or from an ORACLE_HOME other than the intended one?

                              If neither of the above applies, then I would try creating a file at the OS level in that same directory and see whether that fails as well.
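The file-creation test suggested above could look something like this. It is a sketch only: `can_create_file` is a hypothetical helper, and the audit directory shown in the usage comment is an assumption (the usual default location under the home), so substitute whichever directory ORA-09925 is actually complaining about.

```shell
# Hypothetical helper: try to create and then remove a scratch file
# in the given directory as the current OS user.
# Returns 0 if the file could be created, non-zero otherwise.
can_create_file() {
    local dir="$1" probe
    probe=$(mktemp "$dir/.write_probe.XXXXXX" 2>/dev/null) || return 1
    rm -f "$probe"
}

# Example usage (run as the same OS user and with the same environment
# that is used to start ASM; the audit path is an assumption):
#   id
#   echo "$ORACLE_HOME"
#   can_create_file "$ORACLE_HOME/rdbms/audit" \
#       || echo "cannot create files in audit directory as $(id -un)"
```

Printing `id` and `$ORACLE_HOME` first covers the first question as well, since a wrong user or a wrong home would make the write test point at the wrong place.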
                              • 12. Re: Clusterware startup issues
                                441858
                                Thank you all for your input.

                                I have decided to just redo the whole install.

                                Thanks.