10 Replies Latest reply on Jul 4, 2011 3:52 PM by 805789

    SCSI transport failed: reason 'tran_err': retrying command

    872955
      Hi Guys,

      I am receiving the errors below on my Sun Netra-T2000 server. Does anyone here know what these errors mean and what causes them? Please advise.
      uname -a
      SunOS hostname 5.10 Generic_120011-14 sun4v sparc SUNW,Netra-T2000
      Result from dmesg:
      Jul  2 10:09:00 hostname xntpd[277]: [ID 774427 daemon.notice] time reset (slew) 0.185121 s
      Jul  2 19:01:14 hostname scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g600c0ff0000000000b5b89082c3fd902 (ssd34):
      Jul  2 19:01:14 hostname         SCSI transport failed: reason 'tran_err': retrying command
      Jul  2 19:01:14 hostname scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g600c0ff0000000000b5b8904e28f1800 (ssd31):
      Jul  2 19:01:14 hostname         SCSI transport failed: reason 'tran_err': retrying command
      Jul  2 19:01:14 hostname fp: [ID 517869 kern.info] NOTICE: fp(4): PLOGI to a3 failed state=Packet Transport error, reason=No Connection
      Jul  2 19:01:14 hostname fctl: [ID 517869 kern.warning] WARNING: fp(4)::PLOGI to a3 failed. state=e reason=5.
      Jul  2 19:01:14 hostname scsi: [ID 243001 kern.warning] WARNING: /pci@780/pci@0/pci@8/pci@0/SUNW,qlc@1/fp@0,0 (fcp4):
      Jul  2 19:01:14 hostname         PLOGI to D_ID=0xa3 failed: State:Packet Transport error, Reason:No Connection. Giving up
      Jul  2 19:01:15 hostname qlc: [ID 630585 kern.info] NOTICE: Qlogic qlc(2): Loop OFFLINE
      Jul  2 19:01:15 hostname qlc: [ID 630585 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
      Jul  2 19:01:15 hostname scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g600c0ff0000000000b5b895ba84b2000 (ssd37):
      Jul  2 19:01:15 hostname         SCSI transport failed: reason 'tran_err': retrying command
      Jul  2 19:01:15 hostname qlc: [ID 630585 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
      Jul  2 19:01:15 hostname qlc: [ID 630585 kern.info] NOTICE: Qlogic qlc(2): Loop ONLINE
      Jul  2 19:01:15 hostname scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g600c0ff0000000000b5b8904e28f1800 (ssd31):
      Jul  2 19:01:15 hostname         Error for Command: write(10)               Error Level: Retryable
      Jul  2 19:01:15 hostname scsi: [ID 107833 kern.notice]   Requested Block: 659424                    Error Block: 659424
      Jul  2 19:01:15 hostname scsi: [ID 107833 kern.notice]   Vendor: SUN                                Serial Number: 04E28F18-00 
      Jul  2 19:01:15 hostname scsi: [ID 107833 kern.notice]   Sense Key: Unit Attention
      Jul  2 19:01:15 hostname scsi: [ID 107833 kern.notice]   ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
      Jul  2 19:01:15 hostname scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g600c0ff0000000000b5b89082c3fd902 (ssd34):
      Jul  2 19:01:15 hostname         Error for Command: write(10)               Error Level: Retryable
      Jul  2 19:01:15 hostname scsi: [ID 107833 kern.notice]   Requested Block: 65553                     Error Block: 65553
      Jul  2 19:01:15 hostname scsi: [ID 107833 kern.notice]   Vendor: SUN                                Serial Number: 082C3FD9-02 
      Jul  2 19:01:15 hostname scsi: [ID 107833 kern.notice]   Sense Key: Unit Attention
      Jul  2 19:01:15 hostname scsi: [ID 107833 kern.notice]   ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
      Jul  2 19:01:15 hostname scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g600c0ff0000000000b5b895ba84b2000 (ssd37):
      Jul  2 19:01:15 hostname         Error for Command: write(10)               Error Level: Retryable
      Jul  2 19:01:15 hostname scsi: [ID 107833 kern.notice]   Requested Block: 949281                    Error Block: 949281
      Jul  2 19:01:15 hostname scsi: [ID 107833 kern.notice]   Vendor: SUN                                Serial Number: 5BA84B20-00 
      Jul  2 19:01:15 hostname scsi: [ID 107833 kern.notice]   Sense Key: Unit Attention
      Jul  2 19:01:15 hostname scsi: [ID 107833 kern.notice]   ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
      Jul  2 19:01:16 hostname scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g600c0ff0000000000b5b89082c3fd903 (ssd33):
      Jul  2 19:01:16 hostname         Error for Command: read(10)                Error Level: Retryable
      Jul  2 19:01:16 hostname scsi: [ID 107833 kern.notice]   Requested Block: 98504                     Error Block: 98504
      Jul  2 19:01:16 hostname scsi: [ID 107833 kern.notice]   Vendor: SUN                                Serial Number: 082C3FD9-03 
      Jul  2 19:01:16 hostname scsi: [ID 107833 kern.notice]   Sense Key: Unit Attention
      Jul  2 19:01:16 hostname scsi: [ID 107833 kern.notice]   ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
      Thanks in advance.


      Br,
      Pete
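Pete's question (what these messages mean) is easier to approach once the excerpt is grouped by device. Below is a minimal Python sketch, assuming the dmesg text has been pasted into a string; only the WARNING headers and their tran_err and sense-data lines are kept. `Event` and `parse_events` are hypothetical helper names for illustration, not Solaris tools.

```python
from __future__ import annotations
import re
from dataclasses import dataclass

# Lines copied from the dmesg excerpt above (other lines omitted).
DMESG = """\
Jul  2 19:01:14 hostname scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g600c0ff0000000000b5b89082c3fd902 (ssd34):
Jul  2 19:01:14 hostname         SCSI transport failed: reason 'tran_err': retrying command
Jul  2 19:01:14 hostname scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g600c0ff0000000000b5b8904e28f1800 (ssd31):
Jul  2 19:01:14 hostname         SCSI transport failed: reason 'tran_err': retrying command
Jul  2 19:01:15 hostname scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g600c0ff0000000000b5b895ba84b2000 (ssd37):
Jul  2 19:01:15 hostname         SCSI transport failed: reason 'tran_err': retrying command
Jul  2 19:01:15 hostname scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g600c0ff0000000000b5b8904e28f1800 (ssd31):
Jul  2 19:01:15 hostname scsi: [ID 107833 kern.notice]   Sense Key: Unit Attention
Jul  2 19:01:15 hostname scsi: [ID 107833 kern.notice]   ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Jul  2 19:01:15 hostname scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g600c0ff0000000000b5b89082c3fd902 (ssd34):
Jul  2 19:01:15 hostname scsi: [ID 107833 kern.notice]   Sense Key: Unit Attention
Jul  2 19:01:15 hostname scsi: [ID 107833 kern.notice]   ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Jul  2 19:01:15 hostname scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g600c0ff0000000000b5b895ba84b2000 (ssd37):
Jul  2 19:01:15 hostname scsi: [ID 107833 kern.notice]   Sense Key: Unit Attention
Jul  2 19:01:15 hostname scsi: [ID 107833 kern.notice]   ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Jul  2 19:01:16 hostname scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g600c0ff0000000000b5b89082c3fd903 (ssd33):
Jul  2 19:01:16 hostname scsi: [ID 107833 kern.notice]   Sense Key: Unit Attention
Jul  2 19:01:16 hostname scsi: [ID 107833 kern.notice]   ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
"""

# "<timestamp> ... WARNING: /scsi_vhci/<node> (<ssd instance>):"
WARNING_RE = re.compile(r"^(\w{3}\s+\d+\s+[\d:]+) .*WARNING: /scsi_vhci/\S+ \((ssd\d+)\):\s*$")
SENSE_RE = re.compile(r"Sense Key: (.+?)\s*$")
ASC_RE = re.compile(r"ASC: (0x[0-9a-fA-F]+) \(([^)]*)\)")


@dataclass
class Event:
    time: str                     # syslog timestamp of the WARNING line
    device: str                   # ssd instance, e.g. "ssd31"
    transport_failed: bool = False
    sense_key: str = ""
    asc: str = ""
    asc_text: str = ""


def parse_events(log: str) -> list[Event]:
    """Attach each detail line to the most recent WARNING header."""
    events: list[Event] = []
    for line in log.splitlines():
        header = WARNING_RE.match(line)
        if header:
            events.append(Event(time=header.group(1), device=header.group(2)))
            continue
        if not events:
            continue
        current = events[-1]
        if "reason 'tran_err'" in line:
            current.transport_failed = True
        sense = SENSE_RE.search(line)
        if sense:
            current.sense_key = sense.group(1)
        asc = ASC_RE.search(line)
        if asc:
            current.asc, current.asc_text = asc.group(1), asc.group(2)
    return events


if __name__ == "__main__":
    events = parse_events(DMESG)
    for e in events:
        what = "tran_err" if e.transport_failed else f"{e.sense_key}, ASC {e.asc} ({e.asc_text})"
        print(e.time, e.device, what)
    reset_luns = sorted({e.device for e in events if e.asc == "0x29"})
    print("ASC 0x29 reported by:", ", ".join(reset_luns))
```

On this excerpt, ssd34, ssd31 and ssd37 log tran_err at 19:01:14 and 19:01:15, and ssd31, ssd33, ssd34 and ssd37 then all report Unit Attention with ASC 0x29 ("power on, reset, or bus reset occurred") by 19:01:16. The same sense code on several LUNs within two seconds is consistent with a reset on a shared path, although the log alone does not show where that reset originated.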
        • 1. Re: SCSI transport failed: reason 'tran_err': retrying command
          Your excerpt tells you what the issue is.
          You have multiple bad blocks on a disk drive or drives,
          and you can no longer write anything to those data blocks.

          Use your service contract and open a support case.
          Have technical support analyze the errors
          and they will help you determine what hardware needs to be replaced.
          Then restore from backup whatever data needs to be restored.
          • 2. Re: SCSI transport failed: reason 'tran_err': retrying command
            872955
            Hi rukbat,

            Thanks for the reply. But what do the errors above mean: "SCSI transport failed: reason 'tran_err': retrying command", "failed: State:Packet Transport error, Reason:No Connection. Giving up" and "NOTICE: Qlogic qlc(2): Loop OFFLINE"?


            Thanks.


            Br,
            Pete
            • 3. Re: SCSI transport failed: reason 'tran_err': retrying command
              The system is trying to write data to the disk(s).
              Bad blocks force that write to be retried again and again.
              Solaris will do that only so many times before it gives up.

              Go read your excerpt again.
              The information as to what happened is all there.

              Go contact technical support.
              Get your field service engineer on site and have the system repaired.
              • 4. Re: SCSI transport failed: reason 'tran_err': retrying command
                805789
                Hi;

                Could you please post the output of:

                format
                iostat -E

                Best.

                </SQ>
                • 5. Re: SCSI transport failed: reason 'tran_err': retrying command
                  872955
                  Hi Sergio Quiroz,

                  Below is the output of iostat -E. Thanks in advance.
                  hostname>iostat -E
                  sd1       Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
                  Vendor: FUJITSU  Product: MAW3147NCSUN146G Revision: 1703 Serial No: 0809C0CWG0 
                  Size: 146.80GB <146800115712 bytes>
                  Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
                  Illegal Request: 0 Predictive Failure Analysis: 0 
                  sd4       Soft Errors: 2 Hard Errors: 0 Transport Errors: 0 
                  Vendor: TSSTcorp Product: CD/DVDW TS-T632A Revision: SR03 Serial No:  
                  Size: 0.00GB <0 bytes>
                  Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
                  Illegal Request: 2 Predictive Failure Analysis: 0 
                  sd11      Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
                  Vendor: SEAGATE  Product: ST914602SSUN146G Revision: 0603 Serial No: 0811953JDM 
                  Size: 146.80GB <146800115712 bytes>
                  Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
                  Illegal Request: 0 Predictive Failure Analysis: 0 
                  sd15      Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
                  Vendor: SEAGATE  Product: ST914602SSUN146G Revision: 0603 Serial No: 08119543RN 
                  Size: 146.80GB <146800115712 bytes>
                  Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
                  Illegal Request: 0 Predictive Failure Analysis: 0 
                  ssd30     Soft Errors: 2 Hard Errors: 0 Transport Errors: 0 
                  Vendor: SUN      Product: StorEdge 3510    Revision: 421F Serial No:  
                  Size: 0.10GB <103809024 bytes>
                  Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
                  Illegal Request: 2 Predictive Failure Analysis: 0 
                  ssd31     Soft Errors: 43158 Hard Errors: 15 Transport Errors: 14 
                  Vendor: SUN      Product: StorEdge 3510    Revision: 421F Serial No:  
                  Size: 438.82GB <438823649280 bytes>
                  Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0 
                  Illegal Request: 43158 Predictive Failure Analysis: 0 
                  ssd32     Soft Errors: 2 Hard Errors: 0 Transport Errors: 0 
                  Vendor: SUN      Product: StorEdge 3510    Revision: 421F Serial No:  
                  Size: 145.75GB <145749475328 bytes>
                  Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
                  Illegal Request: 2 Predictive Failure Analysis: 0 
                  ssd33     Soft Errors: 1 Hard Errors: 1 Transport Errors: 0 
                  Vendor: SUN      Product: StorEdge 3510    Revision: 421F Serial No:  
                  Size: 0.21GB <208666624 bytes>
                  Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0 
                  Illegal Request: 1 Predictive Failure Analysis: 0 
                  ssd34     Soft Errors: 2 Hard Errors: 3 Transport Errors: 2 
                  Vendor: SUN      Product: StorEdge 3510    Revision: 421F Serial No:  
                  Size: 0.10GB <103809024 bytes>
                  Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0 
                  Illegal Request: 2 Predictive Failure Analysis: 0 
                  ssd35     Soft Errors: 2 Hard Errors: 0 Transport Errors: 0 
                  Vendor: SUN      Product: StorEdge 3510    Revision: 421F Serial No:  
                  Size: 0.10GB <103809024 bytes>
                  Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
                  Illegal Request: 2 Predictive Failure Analysis: 0 
                  ssd36     Soft Errors: 96 Hard Errors: 0 Transport Errors: 0 
                  Vendor: SUN      Product: StorEdge 3510    Revision: 421F Serial No:  
                  Size: 1024.32GB <1024318439424 bytes>
                  Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
                  Illegal Request: 96 Predictive Failure Analysis: 0 
                  ssd37     Soft Errors: 2 Hard Errors: 2 Transport Errors: 1 
                  Vendor: SUN      Product: StorEdge 3510    Revision: 421F Serial No:  
                  Size: 32.21GB <32211206144 bytes>
                  Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0 
                  Illegal Request: 2 Predictive Failure Analysis: 0 
                  st6       Soft Errors: 0 Hard Errors: 2 Transport Errors: 0 
                  Vendor: HP       Product: C7438A           Revision: ZP76 Serial No:    9
                  Br,
                  Pinpe
                  • 6. Re: SCSI transport failed: reason 'tran_err': retrying command
                    805789
                    Well, it looks like the HDDs are OK. I would attribute these issues to the HBA or to the connection between the switch and the storage (you're using an SE 3510). If you have Oracle Support, please log a Service Request. An Explorer file will help with troubleshooting.

                    Cheers.

                    </SQ>
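The reply above rests on reading the `iostat -E` counters: every device shows `Media Error: 0`, while several StorEdge 3510 LUNs show non-zero hard, transport or `No Device` counts. Below is a minimal Python sketch that pulls those counters out of the pasted output, assuming the text format shown in post 5. `parse_iostat_e` and `path_suspects` are hypothetical names, and the rule in `path_suspects` is an illustrative heuristic, not an Oracle diagnostic procedure.

```python
from __future__ import annotations
import re

# Excerpt of the `iostat -E` output from post 5 (four StorEdge 3510 LUNs).
IOSTAT_E = """\
ssd31     Soft Errors: 43158 Hard Errors: 15 Transport Errors: 14
Vendor: SUN      Product: StorEdge 3510    Revision: 421F Serial No:
Size: 438.82GB <438823649280 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
Illegal Request: 43158 Predictive Failure Analysis: 0
ssd32     Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
Vendor: SUN      Product: StorEdge 3510    Revision: 421F Serial No:
Size: 145.75GB <145749475328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
ssd34     Soft Errors: 2 Hard Errors: 3 Transport Errors: 2
Vendor: SUN      Product: StorEdge 3510    Revision: 421F Serial No:
Size: 0.10GB <103809024 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
ssd37     Soft Errors: 2 Hard Errors: 2 Transport Errors: 1
Vendor: SUN      Product: StorEdge 3510    Revision: 421F Serial No:
Size: 32.21GB <32211206144 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
"""

# A device block starts with "<name>   Soft Errors: ...".
HEADER_RE = re.compile(r"^(\w+)\s+Soft Errors:")
# "Name: <integer>" pairs; the lookahead skips values such as "421F" or "0.10GB".
COUNTER_RE = re.compile(r"([A-Z][A-Za-z ]*?): (\d+)(?=\s|$)")


def parse_iostat_e(text: str) -> dict[str, dict[str, int]]:
    """Map each device name to its integer error counters."""
    stats: dict[str, dict[str, int]] = {}
    counters: dict[str, int] | None = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            counters = stats.setdefault(header.group(1), {})
        if counters is not None:
            for name, value in COUNTER_RE.findall(line):
                counters[name] = int(value)
    return stats


def path_suspects(stats: dict[str, dict[str, int]]) -> list[str]:
    """Heuristic (assumption): transport or No Device errors with zero
    media errors point at the path to the LUN rather than at the media."""
    return [
        dev for dev, c in stats.items()
        if c.get("Media Error", 0) == 0
        and (c.get("Transport Errors", 0) > 0 or c.get("No Device", 0) > 0)
    ]


if __name__ == "__main__":
    stats = parse_iostat_e(IOSTAT_E)
    for dev, c in stats.items():
        print(dev, c)
    print("path suspects:", path_suspects(stats))
```

With this excerpt, ssd31, ssd34 and ssd37 are flagged; they are also three of the four LUNs that logged ASC 0x29 in the original dmesg output. Applied to the full post, ssd33 (No Device: 1) would be flagged as well.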
                    • 7. Re: SCSI transport failed: reason 'tran_err': retrying command
                      872955
                      Hi Sergio Quiroz,

                      What is an HBA, and which switch are you referring to? Thanks, dude.


                      Br,
                      Pinpe
                      • 8. Re: SCSI transport failed: reason 'tran_err': retrying command
                        805789
                        Hi Pinpe:

                        HBA: Host Bus Adapter. You're probably using a Fibre Channel HBA or a SCSI HBA, judging by:

                        Jul 2 19:01:15 hostname scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g600c0ff0000000000b5b8904e28f1800 (ssd31):
                        Jul 2 19:01:15 hostname Error for Command: write(10) Error Level: Retryable
                        Jul 2 19:01:15 hostname scsi: [ID 107833 kern.notice] Requested Block: 659424 Error Block: 659424
                        Jul 2 19:01:15 hostname scsi: [ID 107833 kern.notice] Vendor: SUN Serial Number: 04E28F18-00
                        Jul 2 19:01:15 hostname scsi: [ID 107833 kern.notice] Sense Key: Unit Attention
                        Jul 2 19:01:15 hostname scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
                        Jul 2 19:01:15 hostname scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g600c0ff0000000000b5b89082c3fd902 (ssd34):
                        Jul 2 19:01:15 hostname Error for Command: write(10) Error Level: Retryable
                        Jul 2 19:01:15 hostname scsi: [ID 107833 kern.notice] Requested Block: 65553 Error Block: 65553
                        Jul 2 19:01:15 hostname scsi: [ID 107833 kern.notice] Vendor: SUN Serial Number: 082C3FD9-02
                        Jul 2 19:01:15 hostname scsi: [ID 107833 kern.notice] Sense Key: Unit Attention
                        Jul 2 19:01:15 hostname scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0

                        and (as an example from the output you've posted):

                        ssd31 Soft Errors: 43158 Hard Errors: 15 Transport Errors: 14
                        Vendor: SUN Product: StorEdge 3510 Revision: 421F Serial No:
                        Size: 438.82GB <438823649280 bytes>
                        Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
                        Illegal Request: 43158 Predictive Failure Analysis: 0
                        ssd32 Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
                        Vendor: SUN Product: StorEdge 3510 Revision: 421F Serial No:
                        Size: 145.75GB <145749475328 bytes>
                        Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
                        Illegal Request: 2 Predictive Failure Analysis: 0

                        ssd34 Soft Errors: 2 Hard Errors: 3 Transport Errors: 2
                        Vendor: SUN Product: StorEdge 3510 Revision: 421F Serial No:
                        Size: 0.10GB <103809024 bytes>
                        Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
                        Illegal Request: 2 Predictive Failure Analysis: 0
                        ssd35 Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
                        Vendor: SUN Product: StorEdge 3510 Revision: 421F Serial No:
                        Size: 0.10GB <103809024 bytes>
                        Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
                        Illegal Request: 2 Predictive Failure Analysis: 0

                        Those disks are external and located in a StorEdge 3510, which should be connected to an HBA in the server. Which type? Probably a SCSI HBA.

                        Is the SE 3510 attached directly to the server, or via a SAN (probably with a switch in the middle)? You tell me!

                        I see that ssd31 (Soft Errors: 43158 Hard Errors: 15 Transport Errors: 14) has some hard errors, but those errors do not indicate a faulty disk.

                        Best.

                        </SQ>
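On the question of which HBA type: the physical path in the original log, `/pci@780/pci@0/pci@8/pci@0/SUNW,qlc@1/fp@0,0 (fcp4)`, names it. Each component has the form `name@unit-address`. `SUNW,qlc` is the node used by the Solaris `qlc` driver for QLogic Fibre Channel HBAs (the same driver that logged the `Qlogic qlc(0): Loop OFFLINE` notices), and `fp` is the Fibre Channel port driver below it. The `/scsi_vhci/ssd@g...` paths in the warnings are MPxIO virtual paths and do not show the physical HBA. Below is a minimal Python sketch that splits such a path; `split_device_path` and `parent_of` are hypothetical helpers, and treating the node directly above `fp` as the HBA is an assumption that holds for this path.

```python
from __future__ import annotations
from typing import NamedTuple


class PathNode(NamedTuple):
    name: str     # node name, e.g. "SUNW,qlc" or "fp"
    address: str  # unit address after "@", e.g. "1" or "0,0" ("" if none)


def split_device_path(path: str) -> list[PathNode]:
    """Split a physical device path into (name, unit-address) components."""
    nodes = []
    for component in path.strip("/").split("/"):
        name, _, address = component.partition("@")
        nodes.append(PathNode(name, address))
    return nodes


def parent_of(path: str, child: str) -> PathNode | None:
    """Return the node directly above the first node called `child`, if any.

    Assumption for this sketch: on a Fibre Channel path, the node above
    `fp` (the FC port driver) is the HBA itself.
    """
    nodes = split_device_path(path)
    for parent, node in zip(nodes, nodes[1:]):
        if node.name == child:
            return parent
    return None


if __name__ == "__main__":
    fc_path = "/pci@780/pci@0/pci@8/pci@0/SUNW,qlc@1/fp@0,0"
    for node in split_device_path(fc_path):
        print(node)
    print("HBA node:", parent_of(fc_path, "fp"))
    # The MPxIO virtual path carries no HBA information, so this prints None.
    print(parent_of("/scsi_vhci/ssd@g600c0ff0000000000b5b89082c3fd902", "fp"))
```

For the fp path above, `parent_of` returns `PathNode(name='SUNW,qlc', address='1')`.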
                        • 9. Re: SCSI transport failed: reason 'tran_err': retrying command
                          872955
                          Hi Sergio Quiroz,

                          Please tell me whether I have understood your last post correctly. Are you saying that the error messages could be caused by iffy cables or bad connections between the server (Netra-T2000) and the StorEdge 3510? Thanks.


                          Br,
                          Pinpe