4 Replies Latest reply: Feb 14, 2013 11:46 AM by bigdelboy

    M8000 IO Module failure

    midhu
      This morning one of the domains on our M8000 went down, and I found that the I/O unit (IOU#1) assigned to that domain had gone into a faulted state. The error log shows the I/O unit faulted because of a power subsystem failure in it. I powered down that domain, reseated the I/O unit, and ran diagnostics on it. The unit came up without any fault and started working, so I powered the domain back on and handed the server over. The server is now running without any issues. However, I need to provide a root cause analysis for this failure. The firmware level is XCP 1101, which is old, but I could not find any power-related bugs in the bug list for this firmware. What could the reason be? The logs from the time of the failure are given below. I need your help.

      From error logs

      Date: Feb 13 08:14:41 IST 2013 Code: 80006000-33010000-010493af00000000
      Status: Alarm Occurred: Feb 13 08:14:38.195 IST 2013
      FRU: /IOU#1
      Msg: Power subsystem failure(detector=0)
      Diagnostic Code:
      01000000 00000000 00000000
      00000000 00000000 00000000 00000000
      00000000 00000000 00000000 00000000
      UUID: dd45e60d-9280-4df5-b777-2ca3db33db25 MSG-ID: SCF-8004-G4
      Date: Feb 13 09:22:02 IST 2013 Code: 60004500-ffff0000-0300000800030000
      Status: Warning Occurred: Feb 13 09:22:01.249 IST 2013
      FRU: /UNSPECIFIED
      Msg: Externally initiated reset occurred
      Diagnostic Code:
      ffffffff ffff0000 00000000
      58495200 00000000 00000000 00000000
      00000000 00000000 00000000 00000000
      UUID: d41f807a-0f1f-4067-b0fe-20e7e556e3d5 MSG-ID: SCF-8008-3U
      Date: Feb 13 09:44:07 IST 2013 Code: 60004500-ffff0000-0300000800030000
      Status: Warning Occurred: Feb 13 09:44:06.867 IST 2013
      FRU: /UNSPECIFIED
      Msg: Externally initiated reset occurred
      Diagnostic Code:
      ffffffff ffff0000 00000000
      58495200 00000000 00000000 00000000
      00000000 00000000 00000000 00000000
      UUID: 869f198c-e2ff-4a92-9fa9-ffc563f9be60 MSG-ID: SCF-8008-3U

      From monitor logs

      Feb 13 08:14:46 M8K00-XSCF0 Alarm: /IOU#1:SCF:Power subsystem failure(detector=0)
      Feb 13 08:14:49 M8K00-XSCF0 monitor_msg: SCF:DomainID 2 state change (initialize phase started, detail#10)
      Feb 13 08:16:21 M8K00-XSCF0 monitor_msg: SCF:DomainID 2: Reset released



      As per the M-series ASR document, MSG-ID SCF-8004-G4 is raised when a component in a FRU has signaled that it has failed.
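      For the RCA timeline, the error log entries above could be turned into records and ordered by their "Occurred" time. The following is only a minimal sketch, assuming the text layout shown in this post; the names parse_error_log, parse_occurred, and SAMPLE are hypothetical helpers, not part of any XSCF or Oracle tool, and the time zone token (IST) is simply dropped rather than interpreted.

```python
import re
from datetime import datetime

# Two entries copied from the error log in this post
# (Diagnostic Code lines omitted for brevity; the parser skips them anyway).
SAMPLE = """\
Date: Feb 13 08:14:41 IST 2013 Code: 80006000-33010000-010493af00000000
Status: Alarm Occurred: Feb 13 08:14:38.195 IST 2013
FRU: /IOU#1
Msg: Power subsystem failure(detector=0)
UUID: dd45e60d-9280-4df5-b777-2ca3db33db25 MSG-ID: SCF-8004-G4
Date: Feb 13 09:22:02 IST 2013 Code: 60004500-ffff0000-0300000800030000
Status: Warning Occurred: Feb 13 09:22:01.249 IST 2013
FRU: /UNSPECIFIED
Msg: Externally initiated reset occurred
UUID: d41f807a-0f1f-4067-b0fe-20e7e556e3d5 MSG-ID: SCF-8008-3U
"""


def parse_occurred(stamp):
    """Parse e.g. 'Feb 13 08:14:38.195 IST 2013' into a naive datetime.

    Assumption: the fourth token is a time zone name, which is discarded.
    """
    month, day, clock, _tz, year = stamp.split()
    return datetime.strptime(f"{month} {day} {clock} {year}",
                             "%b %d %H:%M:%S.%f %Y")


def parse_error_log(text):
    """Split pasted error-log text into one dict per entry."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Date: (.+?) Code: (\S+)$", line)
        if m:
            # A 'Date: ... Code: ...' line starts a new entry.
            current = {"logged": m.group(1), "code": m.group(2)}
            entries.append(current)
            continue
        if current is None:
            continue
        m = re.match(r"Status: (\S+) Occurred: (.+)$", line)
        if m:
            current["status"] = m.group(1)
            current["occurred"] = parse_occurred(m.group(2))
            continue
        m = re.match(r"(FRU|Msg): (.*)$", line)
        if m:
            current[m.group(1).lower()] = m.group(2)
            continue
        m = re.match(r"UUID: (\S+) MSG-ID: (\S+)$", line)
        if m:
            current["uuid"] = m.group(1)
            current["msg_id"] = m.group(2)
    return sorted(entries, key=lambda e: e["occurred"])


if __name__ == "__main__":
    for e in parse_error_log(SAMPLE):
        print(e["occurred"].isoformat(), e["status"], e["fru"],
              e["msg_id"], e["msg"])
```

      Sorting by "Occurred" rather than by the logged "Date" keeps the timeline tied to when each event actually happened, which makes it easier to line these entries up against the monitor log.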
        • 1. Re: M8000 IO Module failure
          midhu
          Any idea guys??
          • 2. Re: M8000 IO Module failure
            bigdelboy
            midhu wrote:
            Any idea guys??
            I composed a response, decided it was inappropriate, and have written the following instead.

            You would have had a reasonable response, but as the purpose is an RCA, and as 'Any idea guys?' feels like a challenge to hurry up, I have thought about this and declined.

            Raising an Oracle service request may, however, be sensible.
            • 3. Re: M8000 IO Module failure
              midhu
              Hi Bigdel

              I have typed "Any idea guys" just to make the thread active. I am sorry if you felt bad because of that.
              • 4. Re: M8000 IO Module failure
                bigdelboy
                midhu wrote:
                Hi Bigdel

                I have typed "Any idea guys" just to make the thread active. I am sorry if you felt bad because of that.
                I am aware of the bumping. It approximates to 'hurry up' (only approximate).

                And you are pressing because your service management procedures may have an SLA on your RCA.

                However, I had half thought of an answer, and might have provided it, but it sets a bad precedent to respond to a bump.

                .....

                A better way of bumping would have been to add relevant information, and to show that you had spent effort searching around to find it.

                .....

                And I assume it isn't under Oracle Support ... having virtually any M-series server out of Oracle support would almost invariably be extremely bad practice.