This discussion is archived
5 Replies Latest reply: Apr 4, 2013 6:18 AM by 990261

Replace hard drive

990261 Newbie
I need to replace a hard drive in a Sun Fire X2200 that is reporting bad sector errors, and I do not know the best way to go about it. Our UNIX admin had a stroke, so I am left to take over the administration until he is back. The server has two hard drives. I have never replaced a hard drive in a UNIX system; I usually use Ghost to back up Windows machines.

Below are some commands I ran to gather information. For some reason I cannot run the format command right now, so I will have to try it at the console tomorrow. The drive that needs to be replaced is c2d0. Do I need to copy the data to another hard drive, or restore it from a backup? Do I need to boot into single-user mode to do the backup and restore? The system is backed up regularly with Legato NetWorker, but I have never used the software until now; I have just been changing the tapes. I have also ordered an identical server so I can do some testing on it. Thanks in advance.

mil3% dmesg
Feb 8 14:04:55 mil3 Error for command 'read sector' Error Level: Fatal
Feb 8 14:04:55 mil3 gda: [ID 107833 kern.notice] Requested Block 813460600, Error Block: 813460615
Feb 8 14:04:55 mil3 gda: [ID 107833 kern.notice] Sense Key: uncorrectable data error
Feb 8 14:04:55 mil3 gda: [ID 107833 kern.notice] Vendor 'Gen-ATA ' error code: 0x7
Feb 8 14:04:57 mil3 gda: [ID 107833 kern.warning] WARNING: /pci@0,0/pci-ide@5/ide@1/cmdk@0,0 (Disk1):

mil3% iostat -en
  ---- errors ---
  s/w   h/w   trn   tot device
    0     0     0     0 c1d0
    0  5736     0  5736 c2d0
    6     2     0     8 c0t0d0
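
A small filter can pick out the devices with a non-zero hardware-error count from this kind of output. This is only a sketch: the function name `hard_error_devices` is made up for illustration, and it assumes the five-column `iostat -en` layout shown above.

```shell
#!/bin/sh
# List devices whose hardware-error (h/w) count in `iostat -en` output
# is non-zero. Reads iostat output on stdin; prints "device count".
# The function name is hypothetical, not part of any Solaris tool.
hard_error_devices() {
    # Data rows have exactly 5 fields and a numeric second column;
    # this skips the "---- errors ---" banner and the column header.
    awk 'NF == 5 && $2 ~ /^[0-9]+$/ && $2 > 0 { print $5, $2 }'
}

# Sample data copied from the output above. On a live system you would
# run:  iostat -en | hard_error_devices
hard_error_devices <<'EOF'
---- errors ---
s/w h/w trn tot device
0 0 0 0 c1d0
0 5736 0 5736 c2d0
6 2 0 8 c0t0d0
EOF
```

With the sample data this lists c2d0 (5736) and c0t0d0 (2); c1d0 is clean and is not printed.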

mil3% more /etc/vfstab
#device            device              mount             FS      fsck    mount     mount
#to mount          to fsck             point             type    pass    at boot   options
#
fd                 -                   /dev/fd           fd      -       no        -
/proc              -                   /proc             proc    -       no        -
/dev/dsk/c1d0s1    -                   -                 swap    -       no        -
/dev/dsk/c1d0s0    /dev/rdsk/c1d0s0    /                 ufs     1       no        -
/dev/dsk/c1d0s6    /dev/rdsk/c1d0s6    /disk1            ufs     2       yes       -
/dev/dsk/c2d0s7    /dev/rdsk/c2d0s7    /disk2            ufs     2       yes       -
/devices           -                   /devices          devfs   -       no        -
ctfs               -                   /system/contract  ctfs    -       no        -
objfs              -                   /system/object    objfs   -       no        -
swap               -                   /tmp              tmpfs   -       yes       -

mil3% df -h
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c1d0s0         79G    17G    61G    22%    /
/devices                 0K     0K     0K     0%    /devices
ctfs                     0K     0K     0K     0%    /system/contract
proc                     0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
swap                   4.7G   904K   4.7G     1%    /etc/svc/volatile
objfs                    0K     0K     0K     0%    /system/object
/usr/lib/libc/libc_hwcap2.so.1
                        79G    17G    61G    22%    /lib/libc.so.1
fd                       0K     0K     0K     0%    /dev/fd
swap                   4.7G    84K   4.7G     1%    /tmp
swap                   4.7G    32K   4.7G     1%    /var/run
/dev/dsk/c2d0s7        688G   453G   228G    67%    /disk2
/dev/dsk/c1d0s6        835G   656G   170G    80%    /disk1
/view                   79G    17G    61G    22%    /view
  • 1. Re: Replace hard drive
    bigdelboy Pro
    I am reluctant to give absolute advice on this. And I may not have spotted something, or misread something.

    You appear to be running two unmirrored disks of about 1TB capacity.

    Your system disk appears to be unaffected (so you shouldn't need a bare-metal recovery) ... which may ease your problems somewhat.

    I think you will need fmthard instead of format for these disks.

    One approach is to:
    - Stop applications.
    - Take legato backup (BX)
    - Get VTOC of failed disk
    - Set /disk2 filesystem not to mount
    - Possibly disable applications from starting.
    - Power off machine
    - Replace disk 2
    - Restart machine
    - Format (label) new disk.
    - Create Filesystem on new disk.
    - Recover disk2 using legato.
    - start applications.
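
The disk-level steps in the list above might look roughly like the following dry run, which only prints commands and executes none of them. Everything here is a sketch under assumptions: the function names and the `/var/tmp` file name are hypothetical, slice 2 is assumed to be the conventional whole-disk slice, the replacement disk is assumed to be the same size as the old one (otherwise the saved VTOC needs editing before `fmthard`), and on x86 the new disk may first need a Solaris fdisk partition. Note that `newfs` destroys whatever is on the slice.

```shell
#!/bin/sh
# Dry run only: prints the commands for the middle of the plan.
# All function names and the vtoc path are hypothetical.

# Rewrite vfstab text (stdin -> stdout) so the given mount point is no
# longer mounted at boot (field 6, "mount at boot", becomes "no").
vfstab_no_automount() {
    awk -v mp="$1" '
        /^#/     { print; next }   # leave comment lines untouched
        $3 == mp { $6 = "no" }     # field 3 is the mount point
                 { print }'
}

# Print the disk-level commands for a disk (e.g. c2d0), its data slice,
# and its mount point.
print_plan() {
    disk=$1 slice=$2 mnt=$3
    vtoc=/var/tmp/$disk.vtoc
    echo "prtvtoc /dev/rdsk/${disk}s2 > $vtoc    # before the swap"
    echo "# set $mnt to 'no' at boot in /etc/vfstab, power off, swap the disk"
    echo "fmthard -s $vtoc /dev/rdsk/${disk}s2   # after the swap"
    echo "newfs /dev/rdsk/${disk}s${slice}"
    echo "mount /dev/dsk/${disk}s${slice} $mnt"
}

print_plan c2d0 7 /disk2
```

The vfstab filter writes to stdout rather than editing the file in place, so the result can be reviewed (and the original kept as a copy) before anything is changed.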

    ..........

    Can things go wrong with this plan ... possibly ..... it does have some weak spots .... (if not a deadly oversight somewhere.)

    ..........

    Looking at your disk sizes, disk 2 appears somewhat smaller than I might expect ..... this may mean something is being missed.

    ..........

    Some applications may not cope well with time inconsistencies in this backup. Hopefully the backup at (BX) will have everything up to date ... excluding files that cannot be read.

    And it is possible corruption may have crept into the backup.

    ((( So you may have to consider restoring files on both disks to the same earlier point in time for example .... far more complex than here )))

    ..........

    Understanding your applications and its needs may be helpful.

    ..........

    If you have USB 2.0 ports, backing up to an external hard disk may also be possible as a second safety backup medium. However, you may find some files fail to back up.

    ...........

    Single-user mode in itself will not be sufficient for this recovery, as you need to get networking started. However, attempting recovery with the application running may not be good (application dependent).

    .... Feel free to take advice from others on this .... your post was looking lonely without a reply and I was passing .....

    Best wishes to your sysadmin .... if he/she would have run mirrored disks, and even better zfs, she/he may have had less stress!
  • 2. Re: Replace hard drive
    990261 Newbie
    Thanks for the reply. That is useful information. I am not sure why he does not run anything in a RAID configuration. He managed around 50 UNIX systems, and my guess is that none of them have any type of RAID. Most of the UNIX systems are old dinosaurs that are kept around for doing software builds. This system hosts some users' home directories and is an IBM ClearCase server. However, I still need to do some more research on the system to make sure I am not missing anything. One of my concerns is why the drive only shows up as 688GB. It has to be a 1TB hard drive. Would it be worth using rsync to make a copy of the data on another hard drive?

    Also, I was able to get the format command to work.
    # echo | format
    Searching for disks...done


    AVAILABLE DISK SELECTIONS:
    0. c1d0 <DEFAULT cyl 60797 alt 2 hd 255 sec 126>
    /pci@0,0/pci-ide@5/ide@0/cmdk@0,0
    1. c2d0 <DEFAULT cyl 45597 alt 2 hd 255 sec 126>
    /pci@0,0/pci-ide@5/ide@1/cmdk@0,0
    Specify disk (enter its number): Specify disk (enter its number):
  • 3. Re: Replace hard drive
    bigdelboy Pro
    > This system hosts some users' home directories and is an IBM ClearCase server.
    Could make the data on it important .... machines used for testing may be expendable and reconstructable .. loss of a source code repository could sometimes be serious.
    > However, I still need to do some more research on the system to make sure I am not missing anything.
    Sensible.
    > One of my concerns is why the drive only shows up as 688GB. It has to be a 1TB hard drive.
    Possibly not. It shows up with about 3/4 of the number of cylinders of the first drive, so a 1TB + 750GB pair is possible (though unusual).

    [[ Unless the two disks have been hardware raid 0 on the internal raid controller and two volumes created .... I'm over 95% sure this isn't the case ... ]]
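
The cylinder comparison can be checked with a little arithmetic on the geometry that format reports (45597 / 60797 is roughly 0.75). A rough sketch, assuming 512-byte sectors and ignoring the alternate cylinders; `geometry_gb` is a made-up helper name:

```shell
#!/bin/sh
# Estimate disk capacity in decimal GB from format's geometry line:
# cylinders * heads * sectors-per-track * 512 bytes (assumed sector size).
# The function name is hypothetical.
geometry_gb() {
    awk -v c="$1" -v h="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", c * h * s * 512 / 1e9 }'
}

geometry_gb 60797 255 126   # c1d0: prints 1000.1
geometry_gb 45597 255 126   # c2d0: prints 750.1
```

Those figures point to a 1TB disk and a 750GB disk rather than two 1TB disks.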
    > Would it be worth using rsync to make a copy of the data on another hard drive?
    An additional backup of something you have confidence in might be useful.



    If you have a support contract with Sun/Oracle it's worth logging a case, though I suspect you may not.

    I'm still not confident I haven't missed anything.

    iostat -En (with a capital 'E') might show something more about the disks.
  • 4. Re: Replace hard drive
    990261 Newbie
    You were right about the hard drive being a 750GB. I used iostat -En with the capital E. Before, I was just using it with a lowercase e, and it does not show as much detail. I guess I was just kind of baffled that it is a 750GB hard drive. I have only seen that size in 2.5" drives for laptops.

    Once I looked at the iostat -En output I knew it was not running in a RAID. I am still waiting on my test server to get here. I am hoping to receive it tomorrow so I can do some test restores to it. You have been extremely helpful. I am going to talk to some of our developers who use the system to see if they have a better understanding of what it does. I forgot that our UNIX admin was also our ClearCase specialist. I know he keeps support for ClearCase, but that is a larger issue that will have to be dealt with. I doubt he has support on the server, since he has replaced the drives with larger ones. It came with two 250GB hard drives. I am probably going to log a case with EMC for Legato, since he also has maintenance on it. I am probably going to have to go to some training, since I will be taking over as the UNIX admin.

    # iostat -En
    c1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
    Model: ST31000524AS Revision: Serial No: 6VP Size: 1000.20GB <1000202305536 bytes>
    Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
    Illegal Request: 0
    c2d0 Soft Errors: 0 Hard Errors: 6102 Transport Errors: 0
    Model: ST3750330AS Revision: Serial No: 3QK Size: 750.15GB <750147600384 bytes>
    Media Error: 6102 Device Not Ready: 0 No Device: 0 Recoverable: 0
    Illegal Request: 0
  • 5. Re: Replace hard drive
    990261 Newbie
    Here are the steps I actually used. Since the hard drive was just storing data on it and it was constantly getting backed up with EMC Networker, I decided to just make a copy of the drive to a USB drive since there were no drive slots available.

    I first had to get the hard drive partitioned correctly. I decided to use prtvtoc and fmthard so that the partition tables would match.

    # prtvtoc /dev/rdsk/c1t0d0s2 > /var/tmp/c1t0d0s2.vtoc
    I then modified the file to match the settings of my new drive before running the fmthard command.
    # fmthard -s /var/tmp/c1t0d0s2.vtoc /dev/rdsk/c2t0d0s2

    I then started to use tar to move the files so the dates and permissions would stay the same, but it was taking forever, so I decided cpio was the best option. The developers have millions of small files on the drive. I noticed that one of the developers must have had a script written wrong, because one of the directories repeats itself 22 times with the same data in it. I still have to talk to him about that.

    This is what I used to move the files.
    # find ./file_or_dir -print -depth | cpio -pamVd /rmdisk/unnamed_rmdisk/s7/
    After the cpio finished I was ready to replace the drive but the full backup was running so I had to wait.
    I then ran this command to sync the failing drive to the new drive.
    # find . -xdev -depth -print | cpio -pamVd /rmdisk/unnamed_rmdisk/s7/apps/

    Once that was finished I shut down the system and swapped the drives out. It hung during startup saying that I needed to run fsck on the new drive. I was pretty nervous about replacing the hard drive since it was my first on a UNIX system. I always have the backup to fall back to.
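
Before swapping drives after a copy like this, one cheap sanity check is to compare the list of paths on the source with the list on the copy. The sketch below is an illustration, not what was run in this thread: `missing_in_copy` is a hypothetical helper, it assumes `mktemp` and `comm` are available, and it compares names only, not file contents.

```shell
#!/bin/sh
# Report paths present under SRC but missing under DST (relative names).
# Prints nothing when every source path also exists in the copy.
# Compares names only; it does not check sizes or contents.
missing_in_copy() {
    src=$1 dst=$2
    a=$(mktemp) b=$(mktemp)
    # -xdev keeps find on the source filesystem, as in the cpio command.
    (cd "$src" && find . -xdev -print | sort) > "$a"
    (cd "$dst" && find . -print | sort) > "$b"
    comm -23 "$a" "$b"      # lines that appear only in the source listing
    rm -f "$a" "$b"
}

# Usage (the paths are placeholders for the source directory and its copy):
#   missing_in_copy /path/to/source /path/to/copy
```

With millions of small files the two find passes will take a while, but much less time than the copy itself.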
