6 Replies Latest reply: Jan 17, 2012 11:49 AM by Aerosmith

    Chassis | major: Jul 30 .... ERROR: Failed to send a fma event(rc=111)

    Aerosmith
      Hi,

      We saw this message on the system console, which we reach via ALOM.

      fmadm faulty
      --------------- ------------------------------------ -------------- ---------
      TIME EVENT-ID MSG-ID SEVERITY
      --------------- ------------------------------------ -------------- ---------
      Jun 07 01:39:14 63a0708d-6bf0-68e8-83fa-9f2bbf6f81b5 FMD-8000-11 Minor

      Host : xxxx
      Platform : SUNW,SPARC-Enterprise-T5120 Chassis_id :


      Description : A Solaris Fault Manager component generated a diagnosis for which
      no message summary exists. Refer to
      http://sun.com/msg/FMD-8000-11 for more information.

      Response : The diagnosis has been saved in the fault log for examination by
      Sun.

      Impact : The fault log will need to be manually examined using fmdump(1M)
      in order to determine if any human response is required.

      Action : Use fmdump -v -u <EVENT-ID> to view the diagnosis result. Run
      pkgchk -n SUNWfmd to ensure that fault management software is
      installed properly.
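The Action text asks for `fmdump -v -u <EVENT-ID>` for each event. Below is a minimal sketch that does this for every event listed by `fmadm faulty`. It assumes a POSIX shell (bash or /usr/xpg4/bin/sh, not the legacy Solaris 10 /bin/sh) and an awk with `split()`; on Solaris 10, use nawk. The helper name `list_event_ids` is hypothetical, and it treats any 8-4-4-4-12 lowercase hex token as an event ID.

```sh
#!/bin/sh
# Sketch only: list_event_ids is a hypothetical helper, not a Solaris tool.
# On Solaris 10, run under a POSIX shell and substitute nawk for awk.

# Print each UUID-shaped token (8-4-4-4-12 lowercase hex) found on stdin,
# once, in order of first appearance.
list_event_ids() {
    awk '{
        for (i = 1; i <= NF; i++) {
            n = split($i, p, "-")
            if (n == 5 && length(p[1]) == 8 && length(p[2]) == 4 &&
                length(p[3]) == 4 && length(p[4]) == 4 &&
                length(p[5]) == 12 && $i ~ /^[0-9a-f-]+$/ && !seen[$i]++)
                print $i
        }
    }'
}

# Only query FMA when the commands exist (i.e. on the Solaris host itself).
if command -v fmadm >/dev/null 2>&1; then
    fmadm faulty | list_event_ids | while read -r id; do
        echo "=== $id ==="
        fmdump -v -u "$id"
    done
fi
```

The separator row of dashes in the `fmadm faulty` table is 36 characters long, the same as a UUID. The check for five dash-separated parts keeps it from being mistaken for an event ID.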

      fmdump -v -u 63a0708d-6bf0-68e8-83fa-9f2bbf6f81b5
      TIME UUID SUNW-MSG-ID
      Jun 07 01:39:14.5924 63a0708d-6bf0-68e8-83fa-9f2bbf6f81b5 FMD-8000-11


      pkgchk -n SUNWfmd

      svcs -l em/fmd:default
      fmri svc:/system/fmd:default
      name Solaris Fault Manager
      enabled true
      state online
      next_state none
      state_time Fri Jun 03 22:41:27 2011
      logfile /var/svc/log/system-fmd:default.log
      restarter svc:/system/svc/restarter:default
      contract_id 49
      dependency require_all/none file://localhost/usr/lib/fm/fmd/fmd (online)
      dependency require_all/none svc:/system/sysevent (online) svc:/system/filesystem/minimal (online) svc:/system/dumpadm (online)
      dependency optional_all/none svc:/network/rpc/bind (online)
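The `svcs -l` listing shows fmd online, so the daemon itself was running. To re-check this from a script, the sketch below pulls the `state` field out of `svcs -l` output. The function name `svc_state` is hypothetical. The FMRI and log path are copied from the listing above.

```sh
#!/bin/sh
# Sketch: svc_state is a hypothetical helper.
# Print the value of the "state" line (not next_state or state_time)
# from `svcs -l` output read on stdin.
svc_state() {
    awk '$1 == "state" { print $2; exit }'
}

if command -v svcs >/dev/null 2>&1; then
    state=$(svcs -l svc:/system/fmd:default | svc_state)
    if [ "$state" != "online" ]; then
        echo "fmd state is '$state'; see /var/svc/log/system-fmd:default.log" >&2
    fi
fi
```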


      How should I troubleshoot this issue? I can provide more details if required. This is Solaris 10 (kernel patch 142900-09, sun4v) on a T5120 server.

      Any help is appreciated. Thanks.
        • 2. Re: Chassis | major: Jul 30 .... ERROR: Failed to send a fma event(rc=111)
          abrante
          Hmm, that looks weird. Was that really all you got from fmdump?

          .7/M.
          • 3. Re: Chassis | major: Jul 30 .... ERROR: Failed to send a fma event(rc=111)
            Aerosmith
            Hi,

            Thank you for the reply.

            fmdump -ev shows entries like the ones below from June 7 until July 12:


            Jul 12 10:16:58.1133 ereport.cpu.ultraSPARC-T2.fbr 0x00a6371b2f800001
            Jul 12 10:17:58.5999 ereport.cpu.ultraSPARC-T2.fbr 0x01878a7703900001
            Jul 12 10:20:02.2716 ereport.cpu.ultraSPARC-T2.fbr 0x03543e74b1100001
            Jul 12 10:21:55.4040 ereport.cpu.ultraSPARC-T2.fbr 0x04f9afa2c8600001
            Jul 12 10:23:54.8911 ereport.cpu.ultraSPARC-T2.fbr 0x06b6cd1b21700001
            Jul 12 10:25:38.0553 ereport.cpu.ultraSPARC-T2.fbr 0x08371c13afd00001
            Jul 12 10:27:09.7635 ereport.cpu.ultraSPARC-T2.fbr 0x098cbdebab000001
            Jul 12 10:29:01.0229 ereport.cpu.ultraSPARC-T2.fbr 0x0b2b34f6d4c00001
            Jul 12 10:31:21.0058 ereport.cpu.ultraSPARC-T2.fbr 0x0d34ac4aa5b00001
            Jul 12 10:33:19.1766 ereport.cpu.ultraSPARC-T2.fbr 0x0eece257c0100001
            Jul 12 10:35:08.3406 ereport.cpu.ultraSPARC-T2.fbr 0x10838b036e800001
            Jul 12 10:36:25.9617 ereport.cpu.ultraSPARC-T2.fbr 0x11a4b2c1a2200001
            Jul 12 10:38:05.6394 ereport.cpu.ultraSPARC-T2.fbr 0x131804cf3a200001
            Jul 12 10:40:03.6569 ereport.cpu.ultraSPARC-T2.fbr 0x14cfa8a4f4200001
            Jul 12 10:41:45.9490 ereport.cpu.ultraSPARC-T2.fbr 0x164cb7fbe4600001
            Jul 12 10:43:43.2433 ereport.cpu.ultraSPARC-T2.fbr 0x1801aa22f2e00001
            Jul 12 10:44:47.8129 ereport.cpu.ultraSPARC-T2.fbr 0x18f2333499000001
            Jul 12 10:46:45.6152 ereport.cpu.ultraSPARC-T2.fbr 0x1aa909e5c8900001
            Jul 12 10:48:55.0681 ereport.cpu.ultraSPARC-T2.fbr 0x1c8b472d98900001
            Jul 12 10:50:24.4020 ereport.cpu.ultraSPARC-T2.fbr 0x1dd810e180100001
            Jul 12 10:52:20.7424 ereport.cpu.ultraSPARC-T2.fbr 0x1f89755ffa500001
            Jul 12 10:53:49.9879 ereport.cpu.ultraSPARC-T2.fbr 0x20d5eab8c4300001
            Jul 12 10:55:31.9664 ereport.cpu.ultraSPARC-T2.fbr 0x2251cef900000001
            Jul 12 10:57:37.8291 ereport.cpu.ultraSPARC-T2.fbr 0x2426ac6d42000001
            Jul 12 10:58:54.1762 ereport.cpu.ultraSPARC-T2.fbr 0x25431525f7300001
            Jul 12 11:00:25.1587 ereport.cpu.ultraSPARC-T2.fbr 0x26960312c5400001
            Jul 12 11:02:07.2747 ereport.cpu.ultraSPARC-T2.fbr 0x28126a6fc6500001
            Jul 12 11:04:03.9572 ereport.cpu.ultraSPARC-T2.fbr 0x29c5152701800001
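Counting these records per hour makes the rate easier to see. The excerpt above holds 25 records in the 10:00 hour and 3 more by 11:04, which is roughly one every one to two minutes. Below is a sketch of that tally. It assumes the column layout shown (date, time, class, ENA). `ereport_rate` is a hypothetical name. Sorting is textual, so hours come out in order only within a single month.

```sh
#!/bin/sh
# Sketch: ereport_rate is a hypothetical helper; use nawk on Solaris 10.
# Input lines look like:
#   Jul 12 10:16:58.1133 ereport.cpu.ultraSPARC-T2.fbr 0x00a6371b2f800001
# Output: "<count> <Mon> <day> <HH>:00 <class>", one line per hour and class.
ereport_rate() {
    awk 'NF >= 4 && $4 ~ /^ereport\./ {
        hour = substr($3, 1, 2)
        count[$1 " " $2 " " hour ":00 " $4]++
    }
    END { for (k in count) print count[k], k }' | sort -k2
}

if command -v fmdump >/dev/null 2>&1; then
    fmdump -e | ereport_rate
fi
```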


            fmdump -eV: messages like the ones below appear for each day from June 7 until July 12. I have pasted only entries from July 07.

            Jul 07 2011 07:38:17.364462080 ereport.cpu.ultraSPARC-T2.fbr
            nvlist version: 0
            class = ereport.cpu.ultraSPARC-T2.fbr
            ena = 0xe7cfe7304c800001
            detector = (embedded nvlist)
            nvlist version: 0
            version = 0x0
            scheme = cpu
            cpuid = 0x0
            cpumask = 0x21
            serial = 0xfa4006de2ed3cc4
            (end detector)

            tstate = 0x9900001606
            htstate = 0x0
            tpc = 0x11f2768
            tl = 0x1
            tt = 0x63
            l2-esr = 0x4000000000
            l2-ear = 0x0
            l2-nd = 0x0
            d-esr = 0x8900000000000000
            dram-esr = 0x40000000000000
            dram-ear = 0x0
            dram-elr = 0x0
            dram-cntr = 0x0
            dram-fbd = 0x8000000000000001
            resource = (embedded nvlist)
            nvlist version: 0
            version = 0x0
            scheme = mem
            unum = MB/CMP0/BR3
            serial =
            memconfig = 0x0
            (end resource)

            __ttl = 0x0
            __tod = 0x4e15a8b9 0x15b94000

            Jul 07 2011 07:39:47.269386080 ereport.cpu.ultraSPARC-T2.fbr
            nvlist version: 0
            class = ereport.cpu.ultraSPARC-T2.fbr
            ena = 0xe91ed16611300001
            detector = (embedded nvlist)
            nvlist version: 0
            version = 0x0
            scheme = cpu
            cpuid = 0x0
            cpumask = 0x21
            serial = 0xfa4006de2ed3cc4
            (end detector)

            tstate = 0x1407
            htstate = 0x0
            tpc = 0x100f42c
            tl = 0x1
            tt = 0x63
            l2-esr = 0x4000000000
            l2-ear = 0x0
            l2-nd = 0x0
            d-esr = 0x8900000000000000
            dram-esr = 0x40000000000000
            dram-ear = 0x0
            dram-elr = 0x0
            dram-cntr = 0x0
            dram-fbd = 0x8000000000000001
            resource = (embedded nvlist)
            nvlist version: 0
            version = 0x0
            scheme = mem
            unum = MB/CMP0/BR3
            serial =
            memconfig = 0x0
            (end resource)

            __ttl = 0x0
            __tod = 0x4e15a913 0x100e8160
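Judging by the record headers, `__tod` holds the event time as two hex words: seconds and nanoseconds since the Unix epoch. The first record's `__tod = 0x4e15a8b9 0x15b94000` decodes to 1310042297.364462080, which is 2011-07-07 12:38:17 UTC. That matches the 07:38:17.364462080 header if the host clock is five hours behind UTC. Below is a sketch of the conversion. `tod_to_epoch` is a hypothetical name. The human-readable line uses perl's `gmtime` because Solaris 10 `date` has no option for formatting an arbitrary epoch value.

```sh
#!/bin/sh
# Sketch: tod_to_epoch is a hypothetical helper.
# $1 = seconds word, $2 = nanoseconds word, both in hex (e.g. 0x4e15a8b9).
# printf's %d accepts C-style 0x constants, so no bc or awk is needed.
tod_to_epoch() {
    printf '%d.%09d\n' "$1" "$2"
}

tod_to_epoch 0x4e15a8b9 0x15b94000    # prints 1310042297.364462080

# Optional human-readable form: perl's gmtime formats an epoch value in UTC.
if command -v perl >/dev/null 2>&1; then
    secs=$(tod_to_epoch 0x4e15a8b9 0x15b94000 | cut -d. -f1)
    perl -e 'print scalar(gmtime($ARGV[0])), " UTC\n"' "$secs"
fi
```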
            • 4. Re: Chassis | major: Jul 30 .... ERROR: Failed to send a fma event(rc=111)
              abrante
              Hmm, looks like some sort of memory issue.


              What's the output of:

              fmadm faulty

              and

              prtdiag -v



              I've seen similar issues caused by bugs and fixed by patches, but this doesn't really look like one of those.

              .7/M.
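One way to test the "memory issue" reading is to tally the `unum` field of each record's `resource` nvlist. In both records pasted above, it is `MB/CMP0/BR3`. Below is a sketch. It assumes the `fmdump -eV` layout shown and that unum values contain no spaces. `unum_tally` is a hypothetical name.

```sh
#!/bin/sh
# Sketch: unum_tally is a hypothetical helper; use nawk on Solaris 10.
# Counts "unum = <location>" lines in `fmdump -eV` output read on stdin
# and prints "<count> <location>", most frequent first.
unum_tally() {
    awk '$1 == "unum" && $2 == "=" { n[$3]++ }
         END { for (u in n) print n[u], u }' | sort -rn
}

if command -v fmdump >/dev/null 2>&1; then
    fmdump -eV | unum_tally
fi
```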
              • 5. Re: Chassis | major: Jul 30 .... ERROR: Failed to send a fma event(rc=111)
                Aerosmith
                The system panicked last night and rebooted itself.

                SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
                EVENT-TIME: 0x4e433df8.0x1e3cbedf (0x14db63c45a758a)
                PLATFORM: SUNW,SPARC-Enterprise-T5120, CSN: -, HOSTNAME: BBB
                SOURCE: SunOS, REV: 5.10 Generic_142900-09
                DESC: Errors have been detected that require a reboot to ensure system
                integrity. See http://www.sun.com/msg/SUNOS-8000-0G for more information.
                AUTO-RESPONSE: Solaris will attempt to save and diagnose the error telemetry
                IMPACT: The system will sync files, save a crash dump if needed, and reboot
                REC-ACTION: Save the error summary below in case telemetry cannot be saved

                panic[cpu0]/thread=2a10001fca0: Unrecoverable hardware error

                I now see that the output of fmadm faulty has changed.

                Here is the new output (I don't have the output from before the panic reboot):
                fmadm faulty
                --------------- ------------------------------------ -------------- ---------
                TIME EVENT-ID MSG-ID SEVERITY
                --------------- ------------------------------------ -------------- ---------
                Dec 31 18:00:00 63a0708d-6bf0-68e8-83fa-9f2bbf6f81b5 FMD-8000-11 Minor

                Host : BBB
                Platform : SUNW,SPARC-Enterprise-T5120 Chassis_id :


                Description : A Solaris Fault Manager component generated a diagnosis for which
                no message summary exists. Refer to
                http://sun.com/msg/FMD-8000-11 for more information.

                Response : The diagnosis has been saved in the fault log for examination by
                Sun.

                Impact : The fault log will need to be manually examined using fmdump(1M)
                in order to determine if any human response is required.

                Action : Use fmdump -v -u <EVENT-ID> to view the diagnosis result. Run
                pkgchk -n SUNWfmd to ensure that fault management software is
                installed properly.


                prtdiag -v

                System Configuration: Sun Microsystems sun4v SPARC Enterprise T5120
                Memory size: 32640 Megabytes

                ================================ Virtual CPUs ================================


                CPU ID Frequency Implementation Status
                ------ --------- ---------------------- -------
                0 1165 MHz SUNW,UltraSPARC-T2 on-line
                1 1165 MHz SUNW,UltraSPARC-T2 on-line
                2 1165 MHz SUNW,UltraSPARC-T2 on-line
                3 1165 MHz SUNW,UltraSPARC-T2 on-line
                4 1165 MHz SUNW,UltraSPARC-T2 on-line
                5 1165 MHz SUNW,UltraSPARC-T2 on-line
                6 1165 MHz SUNW,UltraSPARC-T2 on-line
                7 1165 MHz SUNW,UltraSPARC-T2 on-line
                8 1165 MHz SUNW,UltraSPARC-T2 on-line
                9 1165 MHz SUNW,UltraSPARC-T2 on-line
                10 1165 MHz SUNW,UltraSPARC-T2 on-line
                11 1165 MHz SUNW,UltraSPARC-T2 on-line
                12 1165 MHz SUNW,UltraSPARC-T2 on-line
                13 1165 MHz SUNW,UltraSPARC-T2 on-line
                14 1165 MHz SUNW,UltraSPARC-T2 on-line
                15 1165 MHz SUNW,UltraSPARC-T2 on-line
                16 1165 MHz SUNW,UltraSPARC-T2 on-line
                17 1165 MHz SUNW,UltraSPARC-T2 on-line
                18 1165 MHz SUNW,UltraSPARC-T2 on-line
                19 1165 MHz SUNW,UltraSPARC-T2 on-line
                20 1165 MHz SUNW,UltraSPARC-T2 on-line
                21 1165 MHz SUNW,UltraSPARC-T2 on-line
                22 1165 MHz SUNW,UltraSPARC-T2 on-line
                23 1165 MHz SUNW,UltraSPARC-T2 on-line
                24 1165 MHz SUNW,UltraSPARC-T2 on-line
                25 1165 MHz SUNW,UltraSPARC-T2 on-line
                26 1165 MHz SUNW,UltraSPARC-T2 on-line
                27 1165 MHz SUNW,UltraSPARC-T2 on-line
                28 1165 MHz SUNW,UltraSPARC-T2 on-line
                29 1165 MHz SUNW,UltraSPARC-T2 on-line
                30 1165 MHz SUNW,UltraSPARC-T2 on-line
                31 1165 MHz SUNW,UltraSPARC-T2 on-line

                ======================= Physical Memory Configuration ========================
                Segment Table:
                --------------------------------------------------------------
                Base Segment Interleave Bank Contains
                Address Size Factor Size Modules
                --------------------------------------------------------------
                0x0 32 GB 8 4 GB MB/CMP0/BR0/CH0/D0
                MB/CMP0/BR0/CH1/D0
                4 GB MB/CMP0/BR0/CH0/D1
                MB/CMP0/BR0/CH1/D1
                4 GB MB/CMP0/BR1/CH0/D0
                MB/CMP0/BR1/CH1/D0
                4 GB MB/CMP0/BR1/CH0/D1
                MB/CMP0/BR1/CH1/D1
                4 GB MB/CMP0/BR2/CH0/D0
                MB/CMP0/BR2/CH1/D0
                4 GB MB/CMP0/BR2/CH0/D1
                MB/CMP0/BR2/CH1/D1
                4 GB MB/CMP0/BR3/CH0/D0
                MB/CMP0/BR3/CH1/D0
                4 GB MB/CMP0/BR3/CH0/D1
                MB/CMP0/BR3/CH1/D1


                ================================ IO Devices ================================
                Slot + Bus Name + Model
                Status Type Path
                ----------------------------------------------------------------------------
                MB/NET0 PCIE network-pciex8086,105e
                /pci@0/pci@0/pci@1/pci@0/pci@2/network@0
                MB/NET1 PCIE network-pciex8086,105e
                /pci@0/pci@0/pci@1/pci@0/pci@2/network@0,1
                MB/NET2 PCIE network-pciex8086,105e
                /pci@0/pci@0/pci@1/pci@0/pci@3/network@0
                MB/NET3 PCIE network-pciex8086,105e
                /pci@0/pci@0/pci@1/pci@0/pci@3/network@0,1
                MB/SASHBA PCIE scsi-pciex1000,58 LSI,1068E
                /pci@0/pci@0/pci@2/scsi@0
                MB PCIX usb-pciclass,0c0310
                /pci@0/pci@0/pci@1/pci@0/pci@1/pci@0/usb@0
                MB PCIX usb-pciclass,0c0310
                /pci@0/pci@0/pci@1/pci@0/pci@1/pci@0/usb@0,1
                MB PCIX usb-pciclass,0c0320
                /pci@0/pci@0/pci@1/pci@0/pci@1/pci@0/usb@0,2

                ============================ Environmental Status ============================
                Fan sensors:
                ------------------------------------------------------------
                Location Sensor Status
                ------------------------------------------------------------
                SYS/FANBD0/FM1/F0 TACH ok
                SYS/FANBD0/FM1/F1 TACH ok
                SYS/FANBD1/FM0/F0 TACH ok
                SYS/FANBD1/FM0/F1 TACH ok
                SYS/FANBD1/FM1/F0 TACH ok
                SYS/FANBD1/FM1/F1 TACH ok
                SYS/FANBD1/FM2/F0 TACH ok
                SYS/FANBD1/FM2/F1 TACH ok

                Temperature sensors:
                ------------------------------------------------------------
                Location Sensor Status
                ------------------------------------------------------------
                SYS/MB T_AMB ok
                SYS/MB/CMP0/BR0/CH0/D0 T_AMB ok
                SYS/MB/CMP0/BR0/CH0/D1 T_AMB ok
                SYS/MB/CMP0/BR0/CH1/D0 T_AMB ok
                SYS/MB/CMP0/BR0/CH1/D1 T_AMB ok
                SYS/MB/CMP0/BR1/CH0/D0 T_AMB ok
                SYS/MB/CMP0/BR1/CH0/D1 T_AMB ok
                SYS/MB/CMP0/BR1/CH1/D0 T_AMB ok
                SYS/MB/CMP0/BR1/CH1/D1 T_AMB ok
                SYS/MB/CMP0/BR2/CH0/D0 T_AMB ok
                SYS/MB/CMP0/BR2/CH0/D1 T_AMB ok
                SYS/MB/CMP0/BR2/CH1/D0 T_AMB ok
                SYS/MB/CMP0/BR2/CH1/D1 T_AMB ok
                SYS/MB/CMP0/BR3/CH0/D0 T_AMB ok
                SYS/MB/CMP0/BR3/CH0/D1 T_AMB ok
                SYS/MB/CMP0/BR3/CH1/D0 T_AMB ok
                SYS/MB/CMP0/BR3/CH1/D1 T_AMB ok
                SYS/MB/CMP0 T_TCORE ok
                SYS/MB/CMP0 T_BCORE ok

                Current sensors:
                ------------------------------------------------------------
                Location Sensor Status
                ------------------------------------------------------------
                SYS/PS0 I_IN_MAIN ok
                SYS/PS0 I_IN_LIMIT ok
                SYS/PS0 I_OUT_MAIN ok
                SYS/PS0 I_OUT_LIMIT ok
                SYS/PS1 I_IN_MAIN ok
                SYS/PS1 I_IN_LIMIT ok
                SYS/PS1 I_OUT_MAIN ok
                SYS/PS1 I_OUT_LIMIT ok

                Voltage sensors:
                ------------------------------------------------------------
                Location Sensor Status
                ------------------------------------------------------------
                SYS/MB V_VMEML ok
                SYS/MB V_VMEMR ok
                SYS/MB V_+3V3_STBY ok
                SYS/MB V_VCORE ok
                SYS/MB V_+3V3_MAIN ok
                SYS/MB V_VDDIO ok
                SYS/MB V_+12V0_MAIN ok
                SYS/MB V_VBAT ok
                SYS/PS0 V_IN_MAIN ok
                SYS/PS0 V_OUT_MAIN ok
                SYS/PS1 V_IN_MAIN ok
                SYS/PS1 V_OUT_MAIN ok

                Voltage indicators:
                ------------------------------------------------------------
                Location Indicator Condition
                ------------------------------------------------------------
                SYS/MB VCORE_POK ok
                SYS/MB VMEML_POK ok
                SYS/MB VMEMR_POK ok
                SYS/MB I_USB0 ok
                SYS/MB I_USB1 ok
                SYS/PS0 AC_POK ok
                SYS/PS0 DC_POK ok
                SYS/PS0 CUR_FAULT ok
                SYS/PS0 VOLT_FAULT ok
                SYS/PS0 FAN_FAULT ok
                SYS/PS0 TEMP_FAULT ok
                SYS/PS1 AC_POK ok
                SYS/PS1 DC_POK ok
                SYS/PS1 CUR_FAULT ok
                SYS/PS1 VOLT_FAULT ok
                SYS/PS1 FAN_FAULT ok
                SYS/PS1 TEMP_FAULT ok

                LEDs:
                ------------------------------------------------------------
                Location LED State
                ------------------------------------------------------------
                SYS SERVICE off
                SYS LOCATE off
                SYS ACT steady
                SYS PS_FAULT off
                SYS TEMP_FAULT off
                SYS FAN_FAULT off
                SYS/MB/CMP0/BR0/CH0/D0 SERVICE off
                SYS/MB/CMP0/BR0/CH0/D1 SERVICE off
                SYS/MB/CMP0/BR0/CH1/D0 SERVICE off
                SYS/MB/CMP0/BR0/CH1/D1 SERVICE off
                SYS/MB/CMP0/BR1/CH0/D0 SERVICE off
                SYS/MB/CMP0/BR1/CH0/D1 SERVICE off
                SYS/MB/CMP0/BR1/CH1/D0 SERVICE off
                SYS/MB/CMP0/BR1/CH1/D1 SERVICE off
                SYS/MB/CMP0/BR2/CH0/D0 SERVICE off
                SYS/MB/CMP0/BR2/CH0/D1 SERVICE off
                SYS/MB/CMP0/BR2/CH1/D0 SERVICE off
                SYS/MB/CMP0/BR2/CH1/D1 SERVICE off
                SYS/MB/CMP0/BR3/CH0/D0 SERVICE off
                SYS/MB/CMP0/BR3/CH0/D1 SERVICE off
                SYS/MB/CMP0/BR3/CH1/D0 SERVICE off
                SYS/MB/CMP0/BR3/CH1/D1 SERVICE off
                SYS/HDD0 SERVICE off
                SYS/HDD0 OK2RM off
                SYS/HDD1 SERVICE off
                SYS/HDD1 OK2RM off
                SYS/HDD2 SERVICE off
                SYS/HDD2 OK2RM off
                SYS/HDD3 SERVICE off
                SYS/HDD3 OK2RM off
                SYS/FANBD0/FM1 SERVICE off
                SYS/FANBD1/FM0 SERVICE off
                SYS/FANBD1/FM1 SERVICE off
                SYS/FANBD1/FM2 SERVICE off

                ============================ FRU Status ============================
                Location Name Status
                ------------------------------------------------------
                SYS MB enabled
                SYS/MB RISER0 enabled
                SYS/MB RISER1 enabled
                SYS/MB RISER2 enabled
                SYS/MB SCC_NVRAM enabled
                SYS/MB/CMP0/BR0/CH0 D0 enabled
                SYS/MB/CMP0/BR0/CH0 D1 enabled
                SYS/MB/CMP0/BR0/CH1 D0 enabled
                SYS/MB/CMP0/BR0/CH1 D1 enabled
                SYS/MB/CMP0/BR1/CH0 D0 enabled
                SYS/MB/CMP0/BR1/CH0 D1 enabled
                SYS/MB/CMP0/BR1/CH1 D0 enabled
                SYS/MB/CMP0/BR1/CH1 D1 enabled
                SYS/MB/CMP0/BR2/CH0 D0 enabled
                SYS/MB/CMP0/BR2/CH0 D1 enabled
                SYS/MB/CMP0/BR2/CH1 D0 enabled
                SYS/MB/CMP0/BR2/CH1 D1 enabled
                SYS/MB/CMP0/BR3/CH0 D0 enabled
                SYS/MB/CMP0/BR3/CH0 D1 enabled
                SYS/MB/CMP0/BR3/CH1 D0 enabled
                SYS/MB/CMP0/BR3/CH1 D1 enabled
                SYS HDD0 enabled
                SYS HDD1 enabled
                SYS HDD2 enabled
                SYS HDD3 enabled
                SYS PDB enabled
                SYS SASBP enabled
                SYS DVD enabled
                SYS USBBD enabled
                SYS FANBD0 enabled
                SYS/FANBD0 FM1 enabled
                SYS FANBD1 enabled
                SYS/FANBD1 FM0 enabled
                SYS/FANBD1 FM1 enabled
                SYS/FANBD1 FM2 enabled
                SYS PS0 enabled
                SYS PS1 enabled

                ============================ FW Version ============================
                Version
                ------------------------------------------------------------
                Sun System Firmware 7.1.7.h 2009/02/13 11:42


                ====================== System PROM revisions =======================
                Version
                ------------------------------------------------------------
                OBP 4.29.1 2008/10/29 14:15

                Chassis Serial Number
                ---------------------
                abcd :)
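The ereports name only the branch, `MB/CMP0/BR3`, and the segment table above places four DIMMs on it. The sketch below pulls those DIMM labels out of `prtdiag -v` output for a given branch. `dimms_on_branch` is a hypothetical name. It assumes DIMM labels end in `/D<n>`, as they do on this T5120. In this thread the eventual fix was a motherboard replacement, so treat the list as a place to start looking, not as a diagnosis.

```sh
#!/bin/sh
# Sketch: dimms_on_branch is a hypothetical helper; use nawk on Solaris 10.
# Print each distinct DIMM label under branch $1 (e.g. MB/CMP0/BR3)
# found in `prtdiag -v` output on stdin, in order of first appearance.
dimms_on_branch() {
    awk -v br="$1" '{
        for (i = 1; i <= NF; i++) {
            t = $i
            sub(/^SYS\//, "", t)    # sensor and LED sections prefix labels with SYS/
            if (index(t, br "/") == 1 && t ~ /\/D[0-9]+$/ && !seen[t]++)
                print t
        }
    }'
}

if command -v prtdiag >/dev/null 2>&1; then
    prtdiag -v | dimms_on_branch MB/CMP0/BR3
fi
```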
                • 6. Re: Chassis | major: Jul 30 .... ERROR: Failed to send a fma event(rc=111)
                  Aerosmith
                  The motherboard on this system was replaced, and everything is now fine.