6 Replies Latest reply on Mar 21, 2017 9:52 AM by Gurudatta N.R-Oracle

    Sunfire X4500 reporting Critical Interrupt : BIOS : Bus Uncorrectable error

    erazmus

      I have two Sunfire X4500 machines reporting "Critical Interrupt : BIOS : Bus Uncorrectable error" when trying to boot the BIOS, followed by a bunch of "OEM record e0" lines, repeating continuously. I'm wondering what the cause is, and whether re-applying the latest firmware would address it? I can find no way of obtaining the firmware (specifically file ilom.X4500-2.0.2.5-r47053.ima) without some type of entitlement.

       

      Suggestions are appreciated.

        • 1. Re: Sunfire X4500 reporting Critical Interrupt : BIOS : Bus Uncorrectable error
          erazmus

          Replying to my own question, I am seeing more in the error log, specifically this:

           

          20472 IPMI Log critical Thu Mar 16 19:33:36 2017 ID = 4ff4 : 03/16/2017 : 19:33:36 : System Event : BIOS : Undetermined system hardware failure

          20471 IPMI Log critical Thu Mar 16 19:33:35 2017 ID = 4ff3 : 03/16/2017 : 19:33:35 : OEM sensor : BIOS : Hyper-Transport Sync Flood Error

          20470 IPMI Log critical Thu Mar 16 19:33:35 2017 ID = 4ff2 : 03/16/2017 : 19:33:35 : System Boot Initiated : BIOS : Automatic boot to diagnostic

          20469 IPMI Log critical Thu Mar 16 19:33:35 2017 ID = 4ff1 : 03/16/2017 : 19:33:35 : Processor : BIOS : Presence detected

          20468 IPMI Log critical Thu Mar 16 19:33:35 2017 ID = 4ff0 : 03/16/2017 : 19:33:35 : System Boot Initiated : BIOS : Initiated by warm reset

           

          I can't get into the BIOS to change any settings - it just cycles through these entries and my previous error. I have two machines, and suspiciously they are both doing the same thing.
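For anyone triaging similar output, the event log lines above can be split into fields with a short script. This is only a minimal sketch, assuming the line layout shown in this post; the regular expression and the names `LINE_RE` and `parse_ipmi_log` are illustrative and not part of any ILOM tooling.

```python
import re

# Sample lines copied from the event log quoted in this post.
SAMPLE = """\
20472 IPMI Log critical Thu Mar 16 19:33:36 2017 ID = 4ff4 : 03/16/2017 : 19:33:36 : System Event : BIOS : Undetermined system hardware failure
20471 IPMI Log critical Thu Mar 16 19:33:35 2017 ID = 4ff3 : 03/16/2017 : 19:33:35 : OEM sensor : BIOS : Hyper-Transport Sync Flood Error
20470 IPMI Log critical Thu Mar 16 19:33:35 2017 ID = 4ff2 : 03/16/2017 : 19:33:35 : System Boot Initiated : BIOS : Automatic boot to diagnostic
20469 IPMI Log critical Thu Mar 16 19:33:35 2017 ID = 4ff1 : 03/16/2017 : 19:33:35 : Processor : BIOS : Presence detected
20468 IPMI Log critical Thu Mar 16 19:33:35 2017 ID = 4ff0 : 03/16/2017 : 19:33:35 : System Boot Initiated : BIOS : Initiated by warm reset
"""

# Hypothetical pattern, written against the layout above (an assumption,
# not a documented format): sequence number, class, severity, timestamp,
# record ID, then date, time, sensor, source, and event separated by " : ".
LINE_RE = re.compile(
    r"^(?P<seq>\d+)\s+IPMI\s+Log\s+(?P<severity>\w+)\s+.*?"
    r"ID\s*=\s*(?P<id>[0-9a-fA-F]+)\s*:\s*"
    r"(?P<date>\d{2}/\d{2}/\d{4})\s*:\s*"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s*:\s*"
    r"(?P<sensor>[^:]+?)\s*:\s*"
    r"(?P<source>[^:]+?)\s*:\s*"
    r"(?P<event>.+?)\s*$"
)


def parse_ipmi_log(text):
    """Return one dict per log line that matches LINE_RE; skip the rest."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            records.append(match.groupdict())
    return records


if __name__ == "__main__":
    # The log lists the newest entry first, so reverse for boot order.
    for rec in reversed(parse_ipmi_log(SAMPLE)):
        print(rec["time"], "|", rec["sensor"], "|", rec["event"])
```

Printing the records oldest first makes the sequence easier to follow: warm reset, processor presence, boot to diagnostic, then the sync flood and the hardware failure event.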

          • 2. Re: Sunfire X4500 reporting Critical Interrupt : BIOS : Bus Uncorrectable error
            Gurudatta N.R-Oracle

            Hi Erazmus,

             

            Good day. Could you kindly share the full ILOM snapshot from the host?

             

            This symptom is common amongst legacy AMD servers from all vendors of that vintage.

            The issue is that AMD designed the Opteron processor with a single power plane that drove both the CPU cores and the memory / I/O hub.

            In summary, when the BIOS is allowed to power-manage the CPU cores, the voltage driving the memory controller degrades, causing correctable and uncorrectable errors.

            Sun, which is now Oracle, recommends disabling power management in the BIOS for AMD servers, as this issue is directly caused by the Opteron processor changing power states.

            AMD attempted to fix the issue in late F stepping dual-core CPUs but then broke the design in 10 stepping quad-core CPUs.

            They designed the core with a split power plane that drove the CPU cores and the memory / I/O hub separately.

            It took AMD time to iron out the bugs associated with driving that split plane with multiple voltages on system boards.

             

            Note: From community notes.

             

            Note: For further investigation, kindly open an SR with Oracle and share the full ILOM snapshot. Be aware that collecting a full snapshot may reboot your node.

             

            Regards

            Gurudatta N.R

            • 3. Re: Sunfire X4500 reporting Critical Interrupt : BIOS : Bus Uncorrectable error
              erazmus

              Thank you for your response. I have discovered that I can successfully get into the BIOS only if I have the minimum number of DIMMs installed - any additional and I get the above behaviour.

               

              I am unable to find a power management setting in the BIOS. I did locate the setting for AMD PowerNow, which was enabled. I have since disabled it, but that does not seem to affect my issue.

               

              Are you able to point me to where in the BIOS I can disable power management?

               

              Here is a link to a full ILOM snapshot from one of the machines:

              https://www.dropbox.com/s/jh2jt6hnt2cree8/SUNSP00144FD35E3F_10.248.42.43_2017-03-17T18-22-43.zip?dl=0

              • 4. Re: Sunfire X4500 reporting Critical Interrupt : BIOS : Bus Uncorrectable error
                Gurudatta N.R-Oracle

                I am getting the following issue when opening the link.

                 

                Could you please send the logs to gurudatta.nadig@oracle.com?

                 

                https://www.dropbox.com/s/jh2jt6hnt2cree8/SUNSP00144FD35E3F_10.248.42.43_2017-03-17T18-22-43.zip?dl=0
                Peer's Certificate issuer is not recognized.
                HTTP Strict Transport Security: true
                HTTP Public Key Pinning: true
                Certificate chain:
                -----BEGIN CERTIFICATE-----
                MIIEATCCAumgAwIBAgIUQyGYCPTIg7CVW6FeED4iwqKxlXQwDQYJKoZIhvcNAQEL
                BQAweTEbMBkGA1UECgwST3JhY2xlIENvcnBvcmF0aW9uMQwwCgYDVQQLDANHSVQx
                FTATBgNVBAcMDFJlZHdvb2QgQ2l0eTELMAkGA1UECAwCQ0ExCzAJBgNVBAYTAlVT
                MRswGQYDVQQDDBJPcmFjbGUgV2ViIEdhdGV3YXkwHhcNMTcwMzE3MDkzODQxWhcN
                MTgwMzE3MDkzODQxWjAaMRgwFgYDVQQDDA93d3cuZHJvcGJveC5jb20wggEiMA0G
                CSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQCq2NBSLqIXiUFanRfh2xf9kE8OOgdM
                uG9lpWmphaLsKbUBO6ckfTNVx5Cf7rktQxDUwINiBwQUdxvPCEIXrcPQncvjr5uB
                YefLOFv51DAmlJ198yt/PSebvb3NK9tOmq/lqNDOCJFnvrhdcLPm9x31T5iz+Rlf
                kn0d5jBgDHUwwfHQm/uk2TXGV/ePDIaE1zRjkr4QO4LkX9K9X60MByUR9unbsZiL
                rngzIAOYnLDzRc6Cp9BXRYKH7+oTLf/vKTqlMLSDxh4PTXkTo4knxcN+Pu8TUjCz
                Zh+F/0LU/DIVLudbYltOAB1rAi6pVB/CU7NX3vPfAiqo/A9UormSklYZAgMBAAGj
                gd8wgdwwCQYDVR0TBAIwADAdBgNVHQ4EFgQU88h+phHPgcgDih3eUvzKfp80qGEw
                gY0GA1UdIwSBhTCBgqF9pHsweTEbMBkGA1UECgwST3JhY2xlIENvcnBvcmF0aW9u
                MQwwCgYDVQQLDANHSVQxFTATBgNVBAcMDFJlZHdvb2QgQ2l0eTELMAkGA1UECAwC
                Q0ExCzAJBgNVBAYTAlVTMRswGQYDVQQDDBJPcmFjbGUgV2ViIEdhdGV3YXmCAQEw
                CwYDVR0PBAQDAgWgMBMGA1UdJQQMMAoGCCsGAQUFBwMBMA0GCSqGSIb3DQEBCwUA
                A4IBAQBZOwFnVw/YA7+wV9VDBL0GAA6eYgkHlyac7QoZKa9RV4OUAUHhDEwPkKe1
                ZpEFoGKqHaUDUDeii8MiK9ZlnBS+4HbN0dewwUncIBEbfmnIiYNNHL0dV0187xkI
                yTJrYX9qAEoNhhv3Nv4mfx1BHrnaReTL0DdwaokFDR4ffy/JHC5zc97U5BtFkpEv
                nXK2Ot2Oo1dNoNV+70iAB//olu8asrHQHS1LdTjE9GjjVOcolTwNZOfqgzVTGVos
                2nEzESMBY+jiwcvJXEbwsSbJKvEku0+ZiESxFnR14DwSiKhBMRfYptuT9Whhshd3
                wb0VEYB+0G2i//g9mHle8F0YDazc
                -----END CERTIFICATE-----

                 

                Regards

                Gurudatta N.R

                • 6. Re: Sunfire X4500 reporting Critical Interrupt : BIOS : Bus Uncorrectable error
                  Gurudatta N.R-Oracle

                  Quick question:

                   

                  Have you removed any CPU from the host?

                   

                  hdtDiag: HT test passed !!

                  hdtDiag: Power cycling for clean start

                  hdtDiag: Power on  00

                     no dbdry cpu 00

                  hdtDiag: Error, HDT command failed, no CFF  cpu 0

                  hdtDiag: Error exiting HDT mode

                   

                  hdtDiag: Error, HDT command failed, no CFF  cpu 0 =====================>

                  hdtDiag: Error exiting HDT mode

                  hdtDiag: exit HDT mode  cpu 00

                   

                  I am unable to find any useful information in the given logs apart from some HD failure entries.

                   

                  Kindly share the output of the following command: ipmitool -H <ILOM IP address> -U root fru
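If the FRU listing needs to be collected from both machines, the ipmitool command above can be wrapped in a small script. This is a rough sketch only: it assumes `ipmitool` is installed on the workstation, that the ILOM accepts IPMI over LAN for the given user, and that ipmitool prompts for the password itself. The function names `build_fru_command` and `collect_fru` are made up for illustration.

```python
import subprocess


def build_fru_command(ilom_host, user="root"):
    """Build the argument list for the FRU query quoted in this reply.

    ilom_host is the service processor address (a placeholder here).
    """
    return ["ipmitool", "-H", ilom_host, "-U", user, "fru"]


def collect_fru(ilom_host, user="root"):
    """Run the FRU query and return its standard output as text.

    Raises subprocess.CalledProcessError if ipmitool exits non-zero,
    and FileNotFoundError if ipmitool is not installed.
    """
    result = subprocess.run(
        build_fru_command(ilom_host, user),
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    return result.stdout


# Example use (the address is a placeholder, not a real host):
#     output = collect_fru("192.0.2.10")
#     with open("fru-192.0.2.10.txt", "w") as handle:
#         handle.write(output)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with the host address and user name.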

                   

                  Regards

                  Gurudatta N.R