7 Replies Latest reply on Jan 29, 2013 6:42 PM by rukbat

    Analysing show error buffer output E2900 server

      Hi guys, please help me find the faulty memory in an E2900 server using the showerrorbuffer output. I keep getting "incoming read" messages, which denote memory errors, but there are no memory errors in the MEM2 POST or the show outputs. I can find the corresponding CPU from the port. A sample output is below:

      Date: Thu Apr 26 03:03:01 PDT 2012
      Device: /SB4/dx3
      ErrorID: 0x33061ff1
      Port: 1
      Syndrome: 0x86(CE bit 17)
      Direction: incoming read
      First error: true
      TargetAid: 0x10
      Transid: 0x1

      As per Oracle doc 1002710.1, the memory pair is found using the data bit (here CE bit 17), in the same way that data bit 41 (ESYN 0xd) translates to the DIMM pair J16500/J16501. So please help me translate a data bit to a memory pair.
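As a first step toward decoding, the fields of a record like the sample above can be pulled out programmatically. The Python sketch below assumes each record is a block of `Key: value` lines exactly as shown, and that the syndrome field carries a parenthesized `(CE bit N)` annotation as in the sample; the helper names (`parse_record`, `decode_syndrome`, `board_of`) are hypothetical. It only extracts the CE bit and the board name. Translating that bit to a DIMM pair still requires the table in the Oracle decoding document, which is not reproduced here.

```python
import re

# Sample record copied from the post above; the format is assumed to be
# one "Key: value" pair per line.
SAMPLE = """\
Date: Thu Apr 26 03:03:01 PDT 2012
Device: /SB4/dx3
ErrorID: 0x33061ff1
Port: 1
Syndrome: 0x86(CE bit 17)
Direction: incoming read
First error: true
TargetAid: 0x10
Transid: 0x1
"""

# Assumed syndrome layout: hex value followed by "(CE bit N)", as in the sample.
SYNDROME_RE = re.compile(r"0x(?P<syn>[0-9a-fA-F]+)\s*\(CE bit (?P<bit>\d+)\)")


def parse_record(text: str) -> dict:
    """Split each 'Key: value' line on its first colon (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:  # ignore lines that carry no colon
            fields[key.strip()] = value.strip()
    return fields


def decode_syndrome(value: str):
    """Return (syndrome, ce_bit) as integers, or None if the pattern is absent."""
    m = SYNDROME_RE.search(value)
    if m is None:
        return None
    return int(m.group("syn"), 16), int(m.group("bit"))


def board_of(device: str) -> str:
    """'/SB4/dx3' -> 'SB4': assumes the first path component names the board."""
    return device.strip("/").split("/")[0]


if __name__ == "__main__":
    rec = parse_record(SAMPLE)
    syn, bit = decode_syndrome(rec["Syndrome"])
    print(f"board={board_of(rec['Device'])} port={rec['Port']} "
          f"syndrome=0x{syn:x} CE bit={bit} direction={rec['Direction']}")
```

Run against the sample, it prints `board=SB4 port=1 syndrome=0x86 CE bit=17 direction=incoming read`; the CE bit is then the value to look up in the decoding table.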
        • 1. Re: Analysing show error buffer output E2900 server

          Generally, */var/adm/messages* provides more readable information.

          For decoding, try to find this document:
          Document ID: ID70962 Synopsis: Sun Fire[TM] High/Midrange Servers: Decoding data bit to a pair of DIMMs

          • 2. Re: Analysing show error buffer output E2900 server
            Thank you very much for the link ... :)
            • 3. Re: Analysing show error buffer output E2900 server
              Device: /SB4/dx3
              translates to the DIMM pair(J16500/J16501)
              Go to SB4: the DIMM slots on Sunfire MidRange system boards already have printed labels on each board. If you are running multiple CPU/memory boards and your firmware version is new enough to let you disable a board, you would then be able to pull it from the running chassis and swap out the questionable RAM.

              Since you have Metalink access, you can find a drawing of the board layout in the System Handbook.

              The E2900 documentation is at:
              You can glance through the E2900 Service Manual (23MB PDF file):
              for service procedures.

              (Of course it will ALWAYS be safer to just schedule a proper maintenance window when you can shut the whole chassis down to a power-off state.)

              For other people who might read this thread in the future and do NOT have access to the MOS version of the System Handbook, here's a diagram of a Sunfire MidRange CPU/memory board, hosted on a non-Oracle archive of the old Sun System Handbook:
              • 4. Re: Analysing show error buffer output E2900 server
                Sorry Rukbat, you're wrong. This "error" points to DIMMs J14400 or J14401 on SB4.

                However, I'd suggest first counting how many "incoming read" errors you have in the "showerrorbuffer" output. Check /var/adm/messages for any AFT messages such as AFT2 or AFT0; you may be experiencing correctable errors. If you have an AFT1, that means you're experiencing uncorrectable errors, so the DIMMs will need to be replaced.
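The counting step suggested above could be sketched in Python as follows. It assumes that records in a saved copy of the showerrorbuffer output are separated by blank lines and use the same `Key: value` layout as the sample in the original post, and it finds AFT tags in /var/adm/messages with a plain regular expression for `AFT` followed by one digit. The function names are hypothetical, and the rule "any AFT1 means uncorrectable" is taken from this reply, not from an Oracle specification.

```python
import re
import sys
from collections import Counter

# Assumption: records are separated by one or more blank lines.
RECORD_SEP = re.compile(r"\n\s*\n")
# Assumption: AFT tags appear as "AFT" plus one digit, e.g. AFT0, AFT1, AFT2.
AFT_RE = re.compile(r"\bAFT([0-9])\b")


def split_records(text: str) -> list:
    """Parse blank-line-separated blocks of 'Key: value' lines into dicts."""
    records = []
    for block in RECORD_SEP.split(text.strip()):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.strip().partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if fields:
            records.append(fields)
    return records


def count_incoming_reads(records: list) -> Counter:
    """Count 'incoming read' records per (Device, Port) pair."""
    counts = Counter()
    for rec in records:
        if rec.get("Direction") == "incoming read":
            counts[(rec.get("Device"), rec.get("Port"))] += 1
    return counts


def count_aft_tags(messages_text: str) -> Counter:
    """Tally AFT tags found anywhere in the messages text."""
    return Counter("AFT" + digit for digit in AFT_RE.findall(messages_text))


def has_uncorrectable(aft_counts: Counter) -> bool:
    """Per the reply above, any AFT1 indicates uncorrectable errors."""
    return aft_counts["AFT1"] > 0


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file name): python count_errors.py errbuf.txt /var/adm/messages
    with open(sys.argv[1]) as f:
        reads = count_incoming_reads(split_records(f.read()))
    with open(sys.argv[2], errors="replace") as f:
        aft = count_aft_tags(f.read())
    for (device, port), n in reads.most_common():
        print(f"{device} port {port}: {n} incoming read error(s)")
    print(dict(aft), "uncorrectable" if has_uncorrectable(aft) else "no AFT1 seen")
```

Grouping by (Device, Port) keeps repeated errors from the same board and port together, which is the pattern worth raising in a support ticket.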

                If you are still not confident, please open a ticket with Oracle Support.


                SPARC Engineer
                • 5. Re: Analysing show error buffer output E2900 server
                  I hadn't done any analysis of the original poster's errors.
                  Their initial post was a bit too cryptic for me to attempt anything.
                  My reply to their inquiry was to quote their conclusion that it was going to be that particular pair of modules.

                  Yes, they most definitely need to log a ticket for proper analysis.
                  The errors could require service to the system, or they could simply indicate that the box is badly underpatched and thus mean nothing.
                  • 6. Re: Analysing show error buffer output E2900 server
                    Haa!! Now that I read it carefully, you're right, Rukbat. Please accept my apologies! User "midhu" was the one pointing to J16500/J16501.


                    SPARC Engineer
                    • 7. Re: Analysing show error buffer output E2900 server
                      Sergio Quiroz wrote:
                      Haa !! Now that I read carefully ...
                      No problem.
                      A very long, long time ago in a galaxy far, far away and in a previous incarnation, I enjoyed a few years as a hardware support engineer for Sun. We handled many DIMM error support calls on a daily basis. The vast majority of those inquiries needed no parts swapped, just OS and firmware patches applied.

                      My Sunfire Midrange training was so long ago that the documentation for the training classes was in what might be described as an alpha version. The E2900 is nothing more than a slightly newer UltraSPARC-III / UltraSPARC-IV iteration of the US-III V1280.