19 Replies. Latest reply: Jun 30, 2014 8:47 AM by Hasan Al Mamun

    Instance shutdown frequently in Oracle RAC 11gr2 OL 6.2 64 bit

    Hasan Al Mamun

      Hi

      I have installed and deployed a 2-node RAC on Oracle Linux 6.2 64-bit. My instances shut down very frequently. In most cases I see this error:

      Status Failed  ORA-12541: TNS:no listener (DBD ERROR: OCIServerAttach)

      Both instances are affected. Sometimes both are down and end users are disrupted. Sometimes an instance restarts automatically; otherwise I have to start the instance(s) explicitly. Could the NAS be the culprit for the instance shutdowns? It is an old NAS, so I have concerns about it.

      Any suggestions, please?

      Hasan Al Mamun

        • 1. Re: Instance shutdown frequently in Oracle RAC 11gr2 OL 6.2 64 bit
          FreddieEssex

          With the information you have provided, it is impossible to tell.

          Check the CRS logs to see what is going on and which errors appear before the listener and instances go down.

          Also ask your system admin to check the OS logs for any issues or errors.
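          A minimal sketch of a first pass over those logs. It assumes the 11gR2 Grid Infrastructure log layout that appears later in this thread (/u01/app/11.2.0/grid/log/<node>/...). The function name is hypothetical. It only filters for CRS-/ORA- message codes and the word "error"; it does not interpret them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print lines that contain a CRS-nnnn or ORA-nnnnn
# message code, or the word "error" (any case), from log text on stdin.
# It only filters; reading the messages in context is still up to you.
scan_cluster_log() {
  grep -E -i 'CRS-[0-9]+|ORA-[0-9]+|error' || true
}

# Example on a node (path taken from the CRS-1605 alert later in this thread):
# scan_cluster_log < /u01/app/11.2.0/grid/log/hodfxdb1-svr/alerthodfxdb1-svr.log
```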

          • 2. Re: Instance shutdown frequently in Oracle RAC 11gr2 OL 6.2 64 bit
            J.A.

            Hi

             

            • Check the CRSD log file and the CSSD log file.
            • Is this NAS certified by Oracle?
            • Run this and send the output:

                 ./cluvfy stage -post nodeadd -n all -verbose

            • It may not be a solution, but what happens if you start only one instance of the DB? That might prevent the instance shutdowns while you look for a solution.

             

            Regards

            • 3. Re: Instance shutdown frequently in Oracle RAC 11gr2 OL 6.2 64 bit
              Billy~Verreynne

              The reason for the shutdown should be in the logs (ASM, instance and CRS/CSS).

              There are typically two root causes for a RAC instance failing and going down:

              - Interconnect failure

              - Shared storage failure

              Your reference to an "old" NAS raises what is, in my view, an age-old problem: RACs being built on a flawed hardware architecture and foundation.

              If either of these is substandard, RAC will be unstable. You should not expect a building on sand instead of solid ground to stand without cracks and crumbling walls. In the same way, you should not expect a robust RAC on a poor Interconnect and subpar cluster storage.

               

              In my view, the following are mandatory architectural requirements for RAC.

              The Interconnect must be private, dedicated and at least 10Gb/s. Not 1Gb/s! Not shared!

              The I/O fabric layer (including NAS) must be private and dedicated, and at least 2Gb/s (fibre) or 10Gb/s (NAS). Not 1Gb/s! Not shared!

              In a modern RAC, the Interconnect should be 40Gb/s (using RDS), and the I/O fabric layer 4-8Gb/s for fibre and 10-40Gb/s for NAS.

              There is nothing special or spectacular about these base requirements. They are based on standard hardware and software for clusters. If your RAC does not meet them, then in my view it is below par. It will be less than robust, and it will fail in terms of performance and scalability.

              The idea that one can build a RAC with an Ethernet switch and a couple of servers shows a complete ignorance of what HPC and RAC are about. It is also a waste of money, because RAC software is not cheap at all.

              So yes, an "old" NAS will be a problem. So would a shared I/O fabric layer, a 1Gb/s Interconnect, etc.

              • 4. Re: Instance shutdown frequently in Oracle RAC 11gr2 OL 6.2 64 bit
                tvCa-Oracle

                Suppose I'm at a client running RAC, more precisely a stretched cluster. How do I TEST the speed of the interconnect without being the ROOT user on any of the servers involved? I only have access as the database user (the same user that actually runs the processes).

                10 Gbit/sec = 1250 MB/sec

                according to my calculations. That's not a low requirement... I seriously doubt they reach anything remotely close to that number.

                The DBs are running Unix (HP).
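                The bandwidth conversion can be checked with shell arithmetic. A minimal sketch: 10 Gbit/s is 1,250,000,000 bytes/s. That is 1250 MB/s in decimal units, or about 1192 MiB/s. A figure of 1280 comes from multiplying by 1024 instead of 1000 (10 × 1024 / 8). These are raw line rates; usable throughput will be lower once protocol overhead is counted. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Convert a link rate in Gbit/s to decimal MB/s.
# SI units: 1 Gbit = 1000 Mbit, 1 MB = 1,000,000 bytes, 8 bits per byte.
gbit_to_mb_per_sec() {
  echo $(( $1 * 1000 / 8 ))
}

# Same rate in MiB/s (1 MiB = 1,048,576 bytes), truncated to an integer.
gbit_to_mib_per_sec() {
  echo $(( $1 * 1000000000 / 8 / 1048576 ))
}

gbit_to_mb_per_sec 10    # 1250
gbit_to_mib_per_sec 10   # 1192
```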

                • 5. Re: Instance shutdown frequently in Oracle RAC 11gr2 OL 6.2 64 bit
                  Billy~Verreynne

                  If you do not have root access, how do you (or your client) expect you to be able to troubleshoot RAC/CRS failures?

                   

                  Questions.

                  1) what is the cluster storage layer?

                  2) what is the Interconnect?

                  3) what exactly does "stretched RAC" mean in this case?

                  • 6. Re: Instance shutdown frequently in Oracle RAC 11gr2 OL 6.2 64 bit
                    Hasan Al Mamun

                    Hi, this is the message I am getting regarding Clusterware in EM:

                    Metric: Voting Disk Alert Log Error
                    Time/Line Number: 2014-06-12 07:01:02.554/4749
                    Severity: Critical
                    Timestamp: Jun 12, 2014 7:04:49 AM
                    Acknowledged: No
                    Acknowledged By: n/a
                    Message: [cssd(9988)]CRS-1605:CSSD voting file is online: ORCL:VOTE1; details in /u01/app/11.2.0/grid/log/hodfxdb1-svr/cssd/ocssd.log. See /u01/app/11.2.0/grid/log/hodfxdb1-svr/alerthodfxdb1-svr.log for details.

                     

                    I have now watched the CRSD and CSSD log files change with tail -f /../../../logile....log, but sorry, I do not understand any of it. I have a disk group VOTE which consists of 5 disks, VOTE1...VOTE5, each with 3 GB allocated, i.e. 15 GB in total for group VOTE. The error message above mentions VOTE1 and says it is online. That should be a good thing, yet it is still flagged as a critical error!

                    I suspect my VOTE1 disk is in trouble. Can I drop the disk from the group? Will it help, or will it make the situation worse?

                    Hasan Al Mamun

                    • 7. Re: Instance shutdown frequently in Oracle RAC 11gr2 OL 6.2 64 bit
                      Billy~Verreynne

                      A 3GB voting disk is a bit of overkill in terms of space. And why 5 voting disks specifically? Is ASM used for OCR and voting disk storage?

                       

                      What does crsctl query css votedisk show?

                      • 8. Re: Instance shutdown frequently in Oracle RAC 11gr2 OL 6.2 64 bit
                        Hasan Al Mamun

                        [root@hodfxdb1-svr oracle]# crsctl query css votedisk

                        ##  STATE    File Universal Id                File Name Disk group

                        --  -----    -----------------                --------- ---------

                        1. ONLINE   02138f751c414f7abf058740fa5bbebb (ORCL:VOTE1) [VOTE]

                        Located 1 voting disk(s).
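                        To make this check repeatable, here is a minimal sketch that counts the voting files reported ONLINE. It relies only on the numbered "N. ONLINE ..." lines seen in the output above. The function name is hypothetical. For the output shown here it returns 1, even though the VOTE disk group has five disks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count voting files reported ONLINE by
# `crsctl query css votedisk`, reading that command's output on stdin.
# Matches only the numbered "N. ONLINE ..." lines of the listing.
count_online_votedisks() {
  grep -cE '^[[:space:]]*[0-9]+\.[[:space:]]+ONLINE' || true
}

# On a cluster node (as the grid owner or root):
# crsctl query css votedisk | count_online_votedisks
```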

                        • 9. Re: Instance shutdown frequently in Oracle RAC 11gr2 OL 6.2 64 bit
                          J.A.

                          Hi Hasan

                          • You can add a new disk, VOTE6, to your VOTE disk group and then drop the VOTE1 disk. I have done this before with an OCR/voting disk group, and it is an online operation.
                          • I also see that you have only one voting disk. I recommend creating at least 3 voting disks for your environment (for redundancy).
                          • When your DB instance crashes, note the time in your instance's alert log. Then check the CSSD and CRSD logs for messages logged at the same time; that is where you should find the reason for the crash.
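                          The time-correlation step can be sketched as a filter on a timestamp prefix. This assumes the log writes timestamps as "YYYY-MM-DD HH:MM:SS", matching the 2014-06-12 07:01:02.554 value in the EM alert earlier in this thread. The instance alert log uses a different date format, so adjust the prefix for it. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print log lines (from stdin) containing a timestamp
# prefix, e.g. "2014-06-12 07:0" selects 07:00-07:09 on June 12, 2014.
# Fixed-string match, so the prefix is not treated as a regex.
lines_at_time() {
  local prefix="$1"
  grep -F -- "$prefix" || true
}

# Example (path taken from the CRS-1605 alert earlier in this thread):
# lines_at_time "2014-06-12 07:0" < /u01/app/11.2.0/grid/log/hodfxdb1-svr/cssd/ocssd.log
```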

                           

                          Regards

                          • 10. Re: Instance shutdown frequently in Oracle RAC 11gr2 OL 6.2 64 bit
                            Hasan Al Mamun

                            Hi J.A.

                             

                            Actually, I created the VOTE disk group with 5 disks, VOTE1...VOTE5, but I am not sure why it shows only one disk. In EM all 5 disks are online, but the query shows only one. I don't know why.

                             

                            Hasan Al Mamun

                            • 11. Re: Instance shutdown frequently in Oracle RAC 11gr2 OL 6.2 64 bit
                              Billy~Verreynne

                              Disks being online does not mean the disks are being used, or are configured to be used.
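                              One likely explanation is offered here as an assumption, since the thread never states the VOTE group's redundancy. When 11gR2 stores voting files in ASM, the number of voting files depends on the disk group's redundancy type: 1 for external, 3 for normal, 5 for high. The number of disks in the group does not matter. A 5-disk group with external redundancy would therefore show exactly one voting file, as crsctl reported. The type appears in the Type column of `asmcmd lsdg`. The lookup below is a hypothetical helper that only encodes that rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of voting files Clusterware 11gR2 places in an
# ASM disk group, by redundancy type (values as shown by asmcmd lsdg).
# Disks beyond this count hold no voting file.
votefiles_for_redundancy() {
  case "$1" in
    EXTERN|EXTERNAL) echo 1 ;;
    NORMAL)          echo 3 ;;
    HIGH)            echo 5 ;;
    *)               echo "unknown redundancy: $1" >&2; return 1 ;;
  esac
}
```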

                              • 12. Re: Instance shutdown frequently in Oracle RAC 11gr2 OL 6.2 64 bit
                                Hasan Al Mamun

                                Hi

                                 

                                My node 1 is now down, and the oracleasm log says the following:

                                 

                                [oracle@hodfxdb1-svr ~]$ tail -f /var/log/oracleasm

                                Disk "VOTE5" does not exist or is not instantiated

                                Instantiating disk "VOTE5"

                                Disk "VOTE1" does not exist or is not instantiated

                                Instantiating disk "VOTE1"

                                Disk "VOTE2" does not exist or is not instantiated

                                Instantiating disk "VOTE2"

                                Disk "VOTE3" does not exist or is not instantiated

                                Instantiating disk "VOTE3"

                                Disk "VOTE4" does not exist or is not instantiated

                                Instantiating disk "VOTE4"

                                 

                                 

                                Here is my oracleasm configuration:

                                 

                                [oracle@hodfxdb1-svr ~]$ oracleasm configure

                                ORACLEASM_ENABLED=true

                                ORACLEASM_UID=oracle

                                ORACLEASM_GID=oinstall

                                ORACLEASM_SCANBOOT=true

                                ORACLEASM_SCANORDER=""

                                ORACLEASM_SCANEXCLUDE=""

                                 

                                 

                                and also:

                                 

                                 

                                [oracle@hodfxdb1-svr disks]$ ls -l

                                total 0

                                brw-rw---- 1 oracle oinstall 8,  17 Jun 23 12:45 DATA1

                                brw-rw---- 1 oracle oinstall 8,  33 Jun 23 12:45 DATA2

                                brw-rw---- 1 oracle oinstall 8,  49 Jun 23 12:45 DATA3

                                brw-rw---- 1 oracle oinstall 8,  65 Jun 23 12:45 DATA4

                                brw-rw---- 1 oracle oinstall 8,  81 Jun 23 12:45 DATA5

                                brw-rw---- 1 oracle oinstall 8,  97 Jun 23 12:45 DATA6

                                brw-rw---- 1 oracle oinstall 8, 113 Jun 23 12:45 FRA1

                                brw-rw---- 1 oracle oinstall 8, 129 Jun 23 12:45 FRA2

                                brw-rw---- 1 oracle oinstall 8, 145 Jun 23 12:45 FRA3

                                brw-rw---- 1 oracle oinstall 8, 161 Jun 23 12:45 FRA4

                                brw-rw---- 1 oracle oinstall 8, 193 Jun 23 12:45 VOTE1

                                brw-rw---- 1 oracle oinstall 8, 209 Jun 23 12:45 VOTE2

                                brw-rw---- 1 oracle oinstall 8, 225 Jun 23 12:45 VOTE3

                                brw-rw---- 1 oracle oinstall 8, 241 Jun 23 12:45 VOTE4

                                brw-rw---- 1 oracle oinstall 8, 177 Jun 23 12:45 VOTE5
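                                The kernel errors further down name a SCSI device (sdb), while this listing shows ASMLib labels with major,minor numbers. Below is a sketch for relating the two. It assumes standard Linux sd numbering on major 8 (16 minors per disk, sda..sdp). It also assumes the ASMLib device nodes carry the same numbers as the underlying partitions. Under those assumptions, 8,17 is sdb1, which is labelled DATA1 here rather than one of the VOTE disks. Confirm this on the node (for example with ls -l /dev/sdb1), because sd names can change between reboots. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a major/minor pair to a /dev/sd* name using the
# standard major-8 SCSI disk numbering: minor / 16 selects the disk letter
# (sda..sdp), minor % 16 is the partition number (0 = whole disk).
sd_name() {
  local major="$1" minor="$2"
  if [ "$major" -ne 8 ]; then
    echo "not an sd major-8 device: $major,$minor" >&2
    return 1
  fi
  local letters=abcdefghijklmnop
  local disk="${letters:$(( minor / 16 )):1}"
  local part=$(( minor % 16 ))
  if [ "$part" -eq 0 ]; then
    echo "sd${disk}"
  else
    echo "sd${disk}${part}"
  fi
}

sd_name 8 17    # sdb1  (DATA1 in the listing above)
sd_name 8 193   # sdm1  (VOTE1 in the listing above)
```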

                                 

                                 

                                And here is /var/log/messages:

                                [root@hodfxdb1-svr ~]# tailf /var/log/messages

                                Jun 24 07:02:35 hodfxdb1-svr kernel: sd 2:0:0:1: [sdb] Unhandled error code

                                Jun 24 07:02:35 hodfxdb1-svr kernel: sd 2:0:0:1: [sdb] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK

                                Jun 24 07:02:35 hodfxdb1-svr kernel: sd 2:0:0:1: [sdb] CDB: Write(10): 2a 00 00 00 10 37 00 00 08 00

                                Jun 24 07:02:35 hodfxdb1-svr kernel: end_request: I/O error, dev sdb, sector 4151

                                Jun 24 08:49:57 hodfxdb1-svr kernel: __ratelimit: 4 callbacks suppressed

                                Jun 24 08:49:57 hodfxdb1-svr kernel: Machine check events logged

                                Jun 24 08:49:57 hodfxdb1-svr kernel: Machine check events logged

                                Jun 24 09:17:52 hodfxdb1-svr kernel: __ratelimit: 1 callbacks suppressed

                                Jun 24 09:17:52 hodfxdb1-svr kernel: Machine check events logged

                                Jun 24 09:17:52 hodfxdb1-svr kernel: Machine check events logged
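                                A minimal sketch for summarizing such syslog excerpts. It counts kernel "end_request: I/O error" lines per device and counts "Machine check events logged" lines. Machine checks are hardware errors reported by the CPU (mcelog can decode them, if installed), so they point at the server itself rather than the storage path. The function name is hypothetical, and the parsing assumes the message format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read syslog text on stdin and print
#   io_error <device> <count>   for each device with end_request I/O errors
#   machine_check <count>       for "Machine check events logged" lines
summarize_kernel_errors() {
  awk '
    /end_request: I\/O error, dev / {
      for (i = 1; i <= NF; i++)
        if ($i == "dev") { dev = $(i + 1); sub(/,$/, "", dev); io[dev]++ }
    }
    /Machine check events logged/ { mce++ }
    END {
      for (d in io) printf "io_error %s %d\n", d, io[d]
      printf "machine_check %d\n", mce + 0
    }'
}

# As root on the node:
# summarize_kernel_errors < /var/log/messages
```

                                For the excerpt above, this would print io_error sdb 1 and machine_check 4.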

                                 

                                I think this is the reason for the instance shutdowns. It's a live system; what can I do now?

                                 

                                 

                                Hasan Al Mamun

                                • 13. Re: Instance shutdown frequently in Oracle RAC 11gr2 OL 6.2 64 bit
                                  Hemant K Chitale

                                  Call your hardware vendor's support engineer (if they are not already aware of the storage outage) and also log an SR with Oracle Support.

                                   

                                  Hemant K Chitale

                                  • 14. Re: Instance shutdown frequently in Oracle RAC 11gr2 OL 6.2 64 bit
                                    ora_tech

                                    Hi,

                                     

                                    Make sure that storage connectivity and all other network connectivity to node 1 are fine.

                                    As Hemant suggested, please log an SR with Oracle.

                                     

                                    thanks,

                                    X A H E E R
