9 Replies Latest reply: Sep 18, 2012 5:52 AM by Nicolas Wipfli - Oracle

Disk Array 2530-M2 disk replacement procedure via CLI

954225 Newbie
Hi,

I would like to know the procedure for replacing a disk in the Sun/Oracle Storage Disk Array 2530-M2.
According to the manuals, the expected way is via the web-based GUI. The problem is that we haven't installed it; we use only the CLI commands (CAM).

Any pointers in this regard would be a great help.

Thanks,
Savvy
  • 1. Re: Disk Array 2530-M2 disk replacement procedure via CLI
    Nicolas Wipfli - Oracle Expert
    Hello,

    First make sure that the drive is failed (amber LED ON). Then you can pull the drive, wait 1 min, and insert the new drive.
    If the drive is not failed, please indicate why you need to replace it.

    Thanks
    Nicolas
  • 2. Re: Disk Array 2530-M2 disk replacement procedure via CLI
    954225 Newbie
    Ok,

    Actually, I am preparing a hardware troubleshooting guide for disk replacement on the Disk Array 2530-M2.
    Previously we used a Sun 3510 storage box, for which we had a proper disk replacement procedure. We need a corresponding document for the 2530.

    Thanks,
    Sabyasachi
  • 3. Re: Disk Array 2530-M2 disk replacement procedure via CLI
    Nicolas Wipfli - Oracle Expert
    Hello,

    We do not have such a document, because with CAM everything is in Service Advisor.
    That said, if the drive is faulty, you can pull it, wait 1 min, then insert the new one. Normally when the drive is faulty, its amber and blue LEDs are ON.
    A drive that reported a predictive failure analysis (PFA) alarm is a different situation: it is not faulty. Therefore, you need to fail the drive first, which you can do with the command "service -d <deviceid> -c fail -t <tXdriveY>" where:

    /* `service` is under:
    /* Solaris: /opt/SUNWsefms/bin/
    /* Linux: /opt/sun/cam/private/fms/bin/
    /* Windows: C:\Program Files\Sun\Common Array Manager\Component\fms\bin\

    Once the drive is failed, you can proceed with its replacement as indicated above.
    If you have hot spares, failing the drive first triggers a reconstruction to one of them, followed by a copy back to the replacement. To avoid that, you can disable the hot spare drives (see the sscs man page for the command "sscs modify disk") before failing the drive. Then once the replacement is done and the reconstruction to the new drive is ongoing, you can reassign the necessary drives to the hot spare list.

    Regards
    Nicolas
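
As a rough illustration of the reply above, here is a minimal Python sketch that assembles the `service` command line for each platform. The install directories and the `-d`, `-c fail`, `-t` arguments are taken from the post; the function name, the parameter names, and the platform keys are hypothetical, and the sketch assumes the executable is simply called `service` in each directory (not confirmed for Windows). It only builds the command for review and does not run it.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Install directories of `service`, as listed in the reply above.
SERVICE_DIRS = {
    "solaris": PurePosixPath("/opt/SUNWsefms/bin"),
    "linux": PurePosixPath("/opt/sun/cam/private/fms/bin"),
    "windows": PureWindowsPath(
        r"C:\Program Files\Sun\Common Array Manager\Component\fms\bin"
    ),
}


def build_fail_command(platform: str, device_id: str, drive: str) -> list[str]:
    """Return the argv for `service -d <deviceid> -c fail -t <tXdriveY>`.

    `platform`, `device_id`, and `drive` are hypothetical parameter names;
    the real values must come from your own CAM installation.
    """
    if platform not in SERVICE_DIRS:
        raise ValueError(f"unknown platform: {platform!r}")
    # Assumption: the executable is named `service` on every platform.
    executable = str(SERVICE_DIRS[platform] / "service")
    return [executable, "-d", device_id, "-c", "fail", "-t", drive]


# Placeholder values only: print the command for review instead of running it.
print(" ".join(build_fail_command("solaris", "<deviceid>", "<tXdriveY>")))
```

Keeping the result as an argument list rather than a single string avoids quoting problems with the space in the Windows path if the command is later handed to a process launcher.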
  • 4. Re: Disk Array 2530-M2 disk replacement procedure via CLI
    954225 Newbie
    Thanks a lot for your detailed description. It is a great help. As I am a novice, I am listing a sample of the points below. Please help me edit it or add more details.


    The general procedure for replacing an FRU in a 2530-M2 Disk Array is through Service Advisor in the web-based application. Since we do not support that web-based application, we will use the manual procedure.
    At a high level, there are 2 scenarios in which a disk replacement may be required: first, when the disk has actually faulted, and second, when the disk has not faulted.

    Check the color of the LED next to the disk concerned on the disk array. If the LED is either blue or amber, the drive is faulty.
    If the LED is green, the drive is not faulty.

    Case # 1: The Drive is faulty

    1.     If the drive is faulty, pull out the faulty drive from its slot.
    2.     Wait for 1 min.
    3.     Then insert the new drive in the slot from which the faulty drive was removed.

    Case # 2: The drive is not faulty

    Suppose the Drive reported a predictive failure analysis, the drive is not faulty.
    In this case, the drive must be failed before it is replaced.
    Execute the following command:
    service -d <deviceid> -c fail -t <tXdriveY>
    where service is under /opt/SUNWsefms/bin/
    Once the drive is failed, follow the steps in Case # 1 (The Drive is faulty).

    If hot spares are enabled and we wish to avoid the initial reconstruction to one of them and the subsequent copy back, the hot spare drives can be disabled (use "sscs modify disk") before failing the drive.
    Then, once the replacement is done and reconstruction is in progress, we can reassign the necessary drives to the hot spare list.


    Thanks,
    Savvy
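
The ordering of the two cases in this draft can be sketched as a small Python function that returns the steps as a checklist. This is only an illustration of the sequence described in the thread; the function name `replacement_steps` and its parameters are hypothetical, and the step texts paraphrase the posts rather than any official procedure.

```python
def replacement_steps(drive_failed: bool, bypass_hot_spares: bool = False) -> list[str]:
    """Return the ordered replacement steps described in the thread.

    drive_failed: True if the array already reports the drive as failed.
    bypass_hot_spares: True to disable the hot spares before failing a
        drive that is not yet failed (only relevant when drive_failed is False).
    """
    steps = []
    if not drive_failed:
        if bypass_hot_spares:
            steps.append("Disable the hot spare drives (see 'sscs modify disk' in the sscs man page).")
        steps.append("Fail the drive: service -d <deviceid> -c fail -t <tXdriveY>")
    steps += [
        "Pull the failed drive from its slot.",
        "Wait 1 minute.",
        "Insert the new drive into the same slot.",
    ]
    if not drive_failed and bypass_hot_spares:
        steps.append("While the reconstruction to the new drive is running, reassign the hot spare drives.")
    return steps


# Example: a drive with a predictive failure alarm, hot spares bypassed.
for number, step in enumerate(replacement_steps(False, bypass_hot_spares=True), start=1):
    print(f"{number}. {step}")
```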
  • 5. Re: Disk Array 2530-M2 disk replacement procedure via CLI
    954225 Newbie
    Hi,

    Can you please explain what is meant by "Suppose the Drive reported a predictive failure analysis, the drive is not faulty."? Does it mean that in this case the LED will be 'Amber' and not 'Blue'?
    Please explain.
  • 6. Re: Disk Array 2530-M2 disk replacement procedure via CLI
    Nicolas Wipfli - Oracle Expert
    This means that the electronics on the drive detected a predictive failure, hence it is recommended to replace it. However, the amber LED will not be ON, because the drive state is not failed. The blue LED will not be ON either, because the drive is not ready for removal. CAM displays an alarm for the predictive failure.
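
The LED behaviour described in this thread can be summarised in a small Python sketch. The two entries reflect only what the replies state (a failed drive normally shows amber and blue ON; a drive with a predictive failure shows neither); the names `LedState`, `EXPECTED_LEDS`, and `ready_to_pull` are hypothetical, and other drive states are deliberately left out because the thread does not confirm them.

```python
from typing import NamedTuple


class LedState(NamedTuple):
    amber_on: bool  # ON when the drive state is failed
    blue_on: bool   # ON when the drive is ready for removal


# Expected LED combinations, as described in the replies above.
EXPECTED_LEDS = {
    "failed": LedState(amber_on=True, blue_on=True),
    "predictive_failure": LedState(amber_on=False, blue_on=False),
}


def ready_to_pull(leds: LedState) -> bool:
    """A drive can be pulled once it is failed, which the amber LED indicates."""
    return leds.amber_on


for state, leds in EXPECTED_LEDS.items():
    action = "pull and replace" if ready_to_pull(leds) else "fail the drive first"
    print(f"{state}: {action}")
```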
  • 7. Re: Disk Array 2530-M2 disk replacement procedure via CLI
    954225 Newbie
    Ok Sir,

    Do we have a sample output for this? I need to tell the customer the exact condition for each of the 2 scenarios:

    - Good condition: I have the capture for it.
    - Bad condition: I need to know the values of the 'State' and 'Status' of the disk,
    or any other pointer which tells that the disk is faulty. (I think I have the LED combination.)

    And lastly, how do we see the predictive failure alarm (we do not have the CAM web application for this)? My guess is from the list alarm command. Do you have a sample output of this as well?
  • 8. Re: Disk Array 2530-M2 disk replacement procedure via CLI
    Nicolas Wipfli - Oracle Expert
    Please read through our document "Troubleshooting Sun Storage[TM] Array Impending Drive Failures (Doc ID 1103184.1)", which gives all the details on how to treat the PFA alarms.
    When a drive reports a PFA, the array firmware is notified and raises the alarm. CAM simply pulls the array firmware alarms every 5 min and brings them back to the user via the BUI or the CLI (sscs list alarm). For the PFA, you will get an alarm type xx.66.1026.

    Regards
    Nicolas
  • 9. Re: Disk Array 2530-M2 disk replacement procedure via CLI
    Nicolas Wipfli - Oracle Expert
    If you want to see what the alarm looks like, you can run the following command on your CAM server:

    # ras_admin advisor -e <event type>

    /* `ras_admin` is under:
    /* Solaris: /opt/SUNWsefms/bin/
    /* Linux: /opt/sun/cam/private/fms/bin/
    /* Windows: C:\Program Files\Sun\Common Array Manager\Component\fms\bin\

    There are three types of PFA: Risk Low, Risk Medium, and Risk High.

    Example for a 6180 (90.xx.xxxx):

    Event Code : 90.66.1026
    Event Type : 6180.ProblemEvent.REC_IMPENDING_DRIVE_FAILURE_RISK_LOW
    Severity : 0
    Sample Description : Impending drive failure for disk {0} (Low Data Availability
    Risk).
    Probable Cause : A drive is reporting internal errors that could cause the
    drive to fail. If the affected drive is an unassigned
    drive, it will not be available for volume configuration.
    If the affected drive is a Standby hot spare drive, it will
    not be available to take over for a failed drive.
    Recommended Action : Refer to the Service Advisor procedure for recovering from
    this problem.

    Event Code : 90.66.1025
    Event Type : 6180.ProblemEvent.REC_IMPENDING_DRIVE_FAILURE_RISK_MED
    Severity : 0
    Sample Description : Impending drive failure in Virtual Disk {0} (Medium Data
    Availability Risk), affected drive(s): {1}
    Probable Cause : A drive is reporting internal errors that could cause the
    drive to fail. If this drive fails, the volumes in the
    volume group will become degraded.
    Recommended Action : Refer to the Service Advisor procedure for recovering from
    this problem.

    Event Code : 90.66.1024
    Event Type : 6180.ProblemEvent.REC_IMPENDING_DRIVE_FAILURE_RISK_HIGH
    Severity : 0
    Sample Description : Impending drive failure in Virtual Disk {0} (High Data
    Availability Risk), affected drive(s): {1}
    Probable Cause : A drive is reporting internal errors that could cause the
    drive to fail. If this drive fails before you follow the
    recovery steps, the volumes in the volume group will fail
    and all data on the volumes will be lost.
    Recommended Action : Refer to the Service Advisor procedure for recovering from
    this problem.
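
To recognise these alarms in a script, the event codes above can be mapped to their risk level. The following Python sketch is built only from the three sample codes shown for the 6180; the function name `pfa_risk` is hypothetical, and it is an assumption that other arrays use the same `66.1024` to `66.1026` suffixes with a different leading number, as the "xx.66.1026" notation earlier in the thread suggests.

```python
from typing import Optional

# Last field of the event code, taken from the 6180 samples above.
PFA_RISK_BY_CODE = {
    "1026": "low",
    "1025": "medium",
    "1024": "high",
}


def pfa_risk(event_code: str) -> Optional[str]:
    """Return 'low', 'medium', or 'high' for a PFA event code, else None.

    The first field (the array type, 90 for a 6180) is ignored; the
    second field must be 66 and the third must be a known PFA code.
    """
    parts = event_code.strip().split(".")
    if len(parts) != 3 or parts[1] != "66":
        return None
    return PFA_RISK_BY_CODE.get(parts[2])


for code in ("90.66.1026", "90.66.1025", "90.66.1024"):
    print(code, pfa_risk(code))
```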
