4 Replies Latest reply: May 7, 2013 8:21 AM by 990261

    System Crash Fatal

    990261
      Over the weekend I had an E250 crash. I was able to get it back up and running, but I am not sure why it crashed. I am pretty sure that it has a hardware issue. A user reported that it has been running slowly for about three weeks. Two weeks ago I had to restore the root drive from backup because it crashed and I was getting a BAD SUPER BLOCK: MAGIC NUMBER WRONG error. Today I came in and it was giving BAD SUPER BLOCK: MAGIC NUMBER WRONG on /dev/rdsk/c0t0d0s1, which is the swap partition. I got lucky: since it was the swap partition, I just recreated the filesystem and it booted right up.

      My guess is a bad CPU or a bad memory module. If it is a memory module, how do I know which one?

      Here is what was in the /var/adm/messages log file. Thanks.

      May 4 23:13:53 orca SUNW,UltraSPARC-II: [ID 805541 kern.warning] WARNING: [AFT1] AFAR was derived from UE report, CP event on CPU0 (caused access error on IOBUS31), errID 0x00055553.d7f6661b
      May 4 23:13:53 orca AFSR 0x00000000.01000001<CP> AFAR 0x00000000.3f209748
      May 4 23:13:53 orca AFSR.PSYND 0x0001(Score 95) AFSR.ETS 0x00
      May 4 23:13:53 orca UDBH 0x0000 UDBH.ESYND 0x00 UDBL 0x0000 UDBL.ESYND 0x00
      May 4 23:13:53 orca SUNW,UltraSPARC-II: [ID 728641 kern.info] [AFT2] errID 0x00055553.d7f6661b PA=0x00000000.3f209748
      May 4 23:13:53 orca E$tag 0x00000000.0bc007e4 E$State: Modified E$parity 0x05
      May 4 23:13:53 orca SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x00): 0x80000000.0486a012
      May 4 23:13:53 orca SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] E$Data (0x08): 0x80000000.03c6c01a Bad PSYND=0x0001
      May 4 23:13:53 orca SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x10): 0x80000000.1dc6e012
      May 4 23:13:53 orca SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x18): 0x80000000.20a70012
      May 4 23:13:53 orca SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x20): 0x80000000.06272012
      May 4 23:13:53 orca SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x28): 0x80000000.17c74012
      May 4 23:13:53 orca SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x30): 0x80000000.22e76012
      May 4 23:13:53 orca SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] E$Data (0x38): 0x80000000.1a678012
      May 4 23:13:53 orca pcipsy: [ID 139652 kern.warning] WARNING: uncorrectable error detected by pci0 (upa mid 1f) during
      May 4 23:13:53 orca DVMA read transaction
      May 4 23:13:54 orca pcipsy: [ID 475334 kern.info] Transaction was a block operation.
      May 4 23:13:54 orca pcipsy: [ID 750218 kern.info] AFSR=40000000.3f800000 AFAR=00000000.3f209748,
      May 4 23:13:54 orca double word offset=1, Memory Module U0702 U0802 id 31.
      May 4 23:13:54 orca unix: [ID 836849 kern.notice]
      May 4 23:13:54 orca ^Mpanic[cpu0]/thread=300059eb440:
      May 4 23:13:54 orca unix: [ID 261965 kern.notice] Fatal PCI UE Error
      May 4 23:13:54 orca unix: [ID 100000 kern.notice]
      May 4 23:13:54 orca genunix: [ID 723222 kern.notice] 000002a100077e60 pcipsy:ecc_intr+1ac (3f209740, 400000003f800000, 300007bde78, 3000005f908, 1f, 10242ab4)
      May 4 23:13:54 orca genunix: [ID 179002 kern.notice] %l0-3: 0000000000000008 0000000000004000 0000000000000000 000000000000a568
      May 4 23:13:54 orca %l4-7: 0000000000008b78 0000000000008b40 0000000000000000 0000000100279b90
      May 4 23:13:54 orca genunix: [ID 723222 kern.notice] 000002a100077f50 unix:current_thread+44 (10, 2, 0, 1003f9060, 1005460c0, 1000)
      May 4 23:13:55 orca genunix: [ID 179002 kern.notice] %l0-3: 00000000100072a4 000002a1009a52f1 000000000000000e 0000000000000016
      May 4 23:13:55 orca %l4-7: 0000000000000000 0000000000000000 00000300059eb440 000002a1009a5ba0
      May 4 23:13:55 orca unix: [ID 100000 kern.notice]
      May 4 23:13:55 orca genunix: [ID 672855 kern.notice] syncing file systems...
      May 4 23:13:55 orca genunix: [ID 904073 kern.notice] done
      May 4 23:13:56 orca genunix: [ID 353387 kern.notice] dumping to /dev/dsk/c0t0d0s1, offset 1288699904
      May 4 23:13:56 orca scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
      May 4 23:13:56 orca got SCSI bus reset
      May 4 23:13:57 orca genunix: [ID 408822 kern.info] NOTICE: glm0: fault detected in device; service still available
      May 4 23:13:57 orca genunix: [ID 611667 kern.info] NOTICE: glm0: got SCSI bus reset
      May 4 23:14:13 orca genunix: [ID 409368 kern.notice] ^M100% done: 25399 pages dumped, compression ratio 2.65,

      May 4 23:14:13 orca genunix: [ID 851671 kern.notice] dump succeeded
      May 4 23:15:43 orca genunix: [ID 540533 kern.notice] ^MSunOS Release 5.8 Version Generic_108528-29 64-bit
      May 4 23:15:43 orca genunix: [ID 913632 kern.notice] Copyright 1983-2003 Sun Microsystems, Inc. All rights reserved.
      May 4 23:15:43 orca genunix: [ID 678236 kern.info] Ethernet address = 0:3:ba:3:5:12
      May 4 23:15:43 orca unix: [ID 389951 kern.info] mem = 1048576K (0x40000000)
      May 4 23:15:43 orca unix: [ID 930857 kern.info] avail mem = 1025564672
      May 4 23:15:43 orca rootnex: [ID 466748 kern.info] root nexus = Sun (TM) Enterprise 250 (UltraSPARC-II 400MHz)
      May 4 23:15:43 orca rootnex: [ID 349649 kern.info] pcipsy0 at root: UPA 0x1f 0x4000
      May 4 23:15:43 orca genunix: [ID 936769 kern.info] pcipsy0 is /pci@1f,4000
      May 4 23:15:43 orca rootnex: [ID 349649 kern.info] pcipsy1 at root: UPA 0x1f 0x2000
      May 4 23:15:43 orca genunix: [ID 936769 kern.info] pcipsy1 is /pci@1f,2000
      May 4 23:15:43 orca scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3 (glm0):
      May 4 23:15:43 orca Rev. 5 Symbios 53c875 found.
      May 4 23:15:43 orca scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3 (glm0):
      May 4 23:15:43 orca target1-scsi-options=0x5f8
      May 4 23:15:43 orca scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3 (glm0):
      May 4 23:15:43 orca target2-scsi-options=0x5f8
      May 4 23:15:43 orca scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3 (glm0):
      May 4 23:15:43 orca target3-scsi-options=0x5f8
      May 4 23:15:43 orca scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3 (glm0):
      May 4 23:15:43 orca target4-scsi-options=0x5f8
      May 4 23:15:43 orca scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3 (glm0):
      May 4 23:15:43 orca target5-scsi-options=0x5f8
      May 4 23:15:43 orca scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3 (glm0):
      May 4 23:15:43 orca target6-scsi-options=0x5f8
      May 4 23:15:43 orca pcipsy: [ID 370704 kern.info] PCI-device: scsi@3, glm0
      May 4 23:15:43 orca genunix: [ID 936769 kern.info] glm0 is /pci@1f,4000/scsi@3
      May 4 23:15:43 orca scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3,1 (glm1):
      May 4 23:15:43 orca Rev. 5 Symbios 53c875 found.
      May 4 23:15:43 orca pcipsy: [ID 370704 kern.info] PCI-device: scsi@3,1, glm1
      May 4 23:15:43 orca genunix: [ID 936769 kern.info] glm1 is /pci@1f,4000/scsi@3,1
      May 4 23:15:43 orca scsi: [ID 193665 kern.info] sd0 at glm0: target 0 lun 0
        • 1. Re: System Crash Fatal
          Nik
          Hi.
          It looks like an uncorrectable memory error.
          The system detected this error on Memory Module U0702 U0802.
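For a longer log, the module names can be pulled out automatically rather than by eye. The following is only a minimal sketch: the function name `suspect_modules` and the regular expression are illustrative assumptions, and the sample lines are copied from the log excerpt posted above. It assumes the socket names always follow the words "Memory Module" and look like `U0702`.

```python
import re

# Sample lines copied from the /var/adm/messages excerpt in this thread.
SAMPLE_LOG = """\
May 4 23:13:54 orca pcipsy: [ID 750218 kern.info] AFSR=40000000.3f800000 AFAR=00000000.3f209748,
May 4 23:13:54 orca double word offset=1, Memory Module U0702 U0802 id 31.
May 4 23:13:54 orca unix: [ID 261965 kern.notice] Fatal PCI UE Error
"""

# Assumption: socket names are "U" plus four digits and directly follow
# the words "Memory Module", as in the excerpt above.
MODULE_RE = re.compile(r"Memory Module((?:\s+U\d{4})+)")


def suspect_modules(log_text):
    """Return the sorted socket names mentioned in 'Memory Module' lines."""
    found = set()
    for match in MODULE_RE.finditer(log_text):
        # group(1) holds the whitespace-separated socket names.
        found.update(match.group(1).split())
    return sorted(found)


if __name__ == "__main__":
    print(suspect_modules(SAMPLE_LOG))
```

To scan the real file instead of the sample, the text could be read with `open("/var/adm/messages").read()` and passed to the same function.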

          Regards.
          • 2. Re: System Crash Fatal
            990261
            How am I supposed to know which module, since Bank 0 and Bank 1 both have a U0702 and a U0802?

            ========================= Memory =========================

                   Interlv.  Socket  Size
            Bank   Group     Name    (MB)  Status
            ----   -----     ------  ----  ------
            0      none      U0701   128   OK
            0      none      U0801   128   OK
            0      none      U0901   128   OK
            0      none      U1001   128   OK
            0      none      U0702   128   OK
            0      none      U0802   128   OK
            0      none      U0902   128   OK
            0      none      U1002   128   OK
            1      none      U0702   128   OK
            1      none      U0802   128   OK
            1      none      U0902   128   OK
            1      none      U1002   128   OK
            • 3. Re: System Crash Fatal
              Nik
              Hi.
              Open the cover and you will find only one DIMM labeled U0702 and one labeled U0802.

              Every DIMM has two logical banks, so prtdiag shows every DIMM twice.
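To see which physical sockets are listed under more than one bank, the prtdiag rows can be grouped by socket name. This is a rough sketch only: `banks_per_socket` is a hypothetical helper, the rows are copied from the prtdiag output posted earlier in this thread, and it assumes each data row has exactly five whitespace-separated fields (bank, interleave group, socket name, size, status).

```python
from collections import defaultdict

# Data rows copied from the prtdiag memory table posted in this thread.
PRTDIAG_ROWS = """\
0 none U0701 128 OK
0 none U0801 128 OK
0 none U0901 128 OK
0 none U1001 128 OK
0 none U0702 128 OK
0 none U0802 128 OK
0 none U0902 128 OK
0 none U1002 128 OK
1 none U0702 128 OK
1 none U0802 128 OK
1 none U0902 128 OK
1 none U1002 128 OK
"""


def banks_per_socket(rows_text):
    """Map each socket name to the list of bank numbers it is listed under."""
    sockets = defaultdict(list)
    for line in rows_text.splitlines():
        fields = line.split()
        # Skip headers, separators, and blank lines: a data row is assumed
        # to have five fields and start with a numeric bank.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        bank, _interleave, socket, _size_mb, _status = fields
        sockets[socket].append(int(bank))
    return dict(sockets)


if __name__ == "__main__":
    for socket, banks in banks_per_socket(PRTDIAG_ROWS).items():
        print(socket, banks)
```

Each key in the result is one socket label to look for on the board, regardless of how many banks prtdiag lists it under.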


              Regards.
              • 4. Re: System Crash Fatal
                990261
                Once I removed the cover on an old system, I noticed that the DIMM slots are labeled on the board. I hope this is the fix.