4 replies. Latest reply: Feb 14, 2013 9:46 AM by bigdelboy

M8000 IO Module failure

midhu Newbie
This morning one of the domains on our M8000 went down, and I found that the IO unit (IOU#1) assigned to that domain had gone into a faulted state. The IO module faulted because of a power subsystem failure in it. I powered down that domain, reseated the IO module, and ran diagnostics on it. The IO module came up without any fault and started working, so I powered on the domain and handed the server back. The server is now running without any issues. The problem is that I need to provide a root cause analysis (RCA) for this. The system is on firmware XCP 1101, which is old, but I could not find any power-related bugs in the bug list for this firmware. So what could be the reason? The logs from the time of the failure are below. I need your help.

From the error logs:

Date: Feb 13 08:14:41 IST 2013 Code: 80006000-33010000-010493af00000000
Status: Alarm Occurred: Feb 13 08:14:38.195 IST 2013
FRU: /IOU#1
Msg: Power subsystem failure(detector=0)
Diagnostic Code:
01000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
UUID: dd45e60d-9280-4df5-b777-2ca3db33db25 MSG-ID: SCF-8004-G4
Date: Feb 13 09:22:02 IST 2013 Code: 60004500-ffff0000-0300000800030000
Status: Warning Occurred: Feb 13 09:22:01.249 IST 2013
FRU: /UNSPECIFIED
Msg: Externally initiated reset occurred
Diagnostic Code:
ffffffff ffff0000 00000000
58495200 00000000 00000000 00000000
00000000 00000000 00000000 00000000
UUID: d41f807a-0f1f-4067-b0fe-20e7e556e3d5 MSG-ID: SCF-8008-3U
Date: Feb 13 09:44:07 IST 2013 Code: 60004500-ffff0000-0300000800030000
Status: Warning Occurred: Feb 13 09:44:06.867 IST 2013
FRU: /UNSPECIFIED
Msg: Externally initiated reset occurred
Diagnostic Code:
ffffffff ffff0000 00000000
58495200 00000000 00000000 00000000
00000000 00000000 00000000 00000000
UUID: 869f198c-e2ff-4a92-9fa9-ffc563f9be60 MSG-ID: SCF-8008-3U

From the monitor logs:

Feb 13 08:14:46 M8K00-XSCF0 Alarm: /IOU#1:SCF:Power subsystem failure(detector=0)
Feb 13 08:14:49 M8K00-XSCF0 monitor_msg: SCF:DomainID 2 state change (initialize phase started, detail#10)
Feb 13 08:16:21 M8K00-XSCF0 monitor_msg: SCF:DomainID 2: Reset released



According to the M-series ASR documentation, MSG-ID SCF-8004-G4 is reported when a component in a FRU has signaled that it has failed.
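For an RCA write-up it can help to pull the key fields out of the error log entries above and line them up as a timeline. The following is a minimal Python sketch, assuming the entries have been saved as plain text in exactly the layout shown (each entry starting with a `Date:` line). The `parse_error_log` function and the `ErrorLogEntry` record are hypothetical helpers written for this illustration; they are not part of XSCF or any Oracle tool.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class ErrorLogEntry:
    occurred: str  # timestamp from the "Occurred:" field
    status: str    # e.g. "Alarm" or "Warning"
    fru: str       # e.g. "/IOU#1"
    msg: str       # human-readable message text
    msg_id: str    # e.g. "SCF-8004-G4"


# Assumed field layout, taken from the log excerpt in this thread.
STATUS_RE = re.compile(r"Status:\s*(\w+)\s+Occurred:\s*(.+)")
FRU_RE = re.compile(r"FRU:\s*(\S+)")
MSG_RE = re.compile(r"Msg:\s*(.+)")
MSG_ID_RE = re.compile(r"MSG-ID:\s*(\S+)")


def parse_error_log(text: str) -> List[ErrorLogEntry]:
    """Split the log before each line starting with 'Date:' and extract key fields."""
    entries = []
    # Zero-width split before every "Date:" line (requires Python 3.7+).
    for block in re.split(r"^(?=Date:)", text, flags=re.MULTILINE):
        status = STATUS_RE.search(block)
        fru = FRU_RE.search(block)
        msg = MSG_RE.search(block)
        msg_id = MSG_ID_RE.search(block)
        if not (status and fru and msg and msg_id):
            continue  # skip empty or incomplete blocks
        entries.append(ErrorLogEntry(
            occurred=status.group(2).strip(),
            status=status.group(1),
            fru=fru.group(1),
            msg=msg.group(1).strip(),
            msg_id=msg_id.group(1),
        ))
    return entries


# The three error log entries from the original post, copied verbatim.
SAMPLE = """\
Date: Feb 13 08:14:41 IST 2013 Code: 80006000-33010000-010493af00000000
Status: Alarm Occurred: Feb 13 08:14:38.195 IST 2013
FRU: /IOU#1
Msg: Power subsystem failure(detector=0)
Diagnostic Code:
01000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
UUID: dd45e60d-9280-4df5-b777-2ca3db33db25 MSG-ID: SCF-8004-G4
Date: Feb 13 09:22:02 IST 2013 Code: 60004500-ffff0000-0300000800030000
Status: Warning Occurred: Feb 13 09:22:01.249 IST 2013
FRU: /UNSPECIFIED
Msg: Externally initiated reset occurred
Diagnostic Code:
ffffffff ffff0000 00000000
58495200 00000000 00000000 00000000
00000000 00000000 00000000 00000000
UUID: d41f807a-0f1f-4067-b0fe-20e7e556e3d5 MSG-ID: SCF-8008-3U
Date: Feb 13 09:44:07 IST 2013 Code: 60004500-ffff0000-0300000800030000
Status: Warning Occurred: Feb 13 09:44:06.867 IST 2013
FRU: /UNSPECIFIED
Msg: Externally initiated reset occurred
Diagnostic Code:
ffffffff ffff0000 00000000
58495200 00000000 00000000 00000000
00000000 00000000 00000000 00000000
UUID: 869f198c-e2ff-4a92-9fa9-ffc563f9be60 MSG-ID: SCF-8008-3U
"""

if __name__ == "__main__":
    # Print one line per entry: when, severity, FRU, MSG-ID, message.
    for entry in parse_error_log(SAMPLE):
        print(f"{entry.occurred:30} {entry.status:8} {entry.fru:14} "
              f"{entry.msg_id:12} {entry.msg}")
```

Run against the three entries above, this lists the IOU#1 power subsystem alarm followed by the two externally initiated resets, which can then be set against the monitor log timestamps. The regular expressions only cover the fields visible in this excerpt; other XCP versions or message types may format entries differently.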
  • 1. Re: M8000 IO Module failure
    midhu Newbie
    Any idea guys??
  • 2. Re: M8000 IO Module failure
    bigdelboy Pro
    midhu wrote:
    Any idea guys??
    I composed a response .... decided it was inappropriate, and have written the following response instead.

    You would have had a reasonable response ... but as the purpose is an RCA .... and as 'Any idea guys?' feels like a challenge to hurry up, I have thought about this and declined.

    Raising an Oracle service request may, however, be sensible.
  • 3. Re: M8000 IO Module failure
    midhu Newbie
    Hi Bigdel

    I have typed "Any idea guys" just to make the thread active. I am sorry if you felt bad because of that.
  • 4. Re: M8000 IO Module failure
    bigdelboy Pro
    midhu wrote:
    Hi Bigdel

    I have typed "Any idea guys" just to make the thread active. I am sorry if you felt bad because of that.
    I am aware of the bumping. It approximates to 'hurry up' (only approximately).

    And you are pressing because your service management procedures may have an SLA on your RCA.

    However, I had half thought of an answer ..... and may have provided it .... but it sets a bad precedent to respond to a bump.

    .....

    A better way of bumping would have been to add relevant information, and to show you had spent effort searching around to find it.

    .....

    And I assume it isn't under Oracle Support ... and having virtually any M-series system not under Oracle Support would almost invariably be extremely bad practice.
