1 Reply Latest reply: Jan 24, 2013 3:38 AM by 927019

    Need Help with faulty disk on Sun Enterprise 450 Server

    927019
      Hi,

      Recently I came across a disk that seems to be faulty, and I need help. I gathered the information below by running the following commands; any help on how to solve this would be great.




      Code:

      # uname -a

      SunOS XYZ 5.7 Generic_106541-16 sun4u sparc SUNW,Ultra-4



      Code:

      # df -k

      Filesystem kbytes used avail capacity Mounted on
      /proc 0 0 0 0% /proc
      /dev/md/dsk/d0 2056211 791794 1202731 40% /
      fd 0 0 0 0% /dev/fd
      /dev/md/dsk/d6 482455 129619 304591 30% /var
      /dev/md/dsk/d9 17404618 12474839 4755733 73% /oracle
      /dev/md/dsk/d12 15281351 4116289 11012249 28% /archive
      /dev/md/dsk/d15 52211532 16096689 35592728 32% /db01
      /dev/md/dsk/d18 52211532 17333076 34356341 34% /backup
      /dev/md/dsk/d21 2114063 87178 1963464 5% /home
      swap 2325656 1080 2324576 1% /tmp



      Code:

      # ./metadb

      flags first blk block count
      Wm p l 16 1034 /dev/dsk/c0t0d0s7
      a p luo 16 1034 /dev/dsk/c2t0d0s7
      a p luo 16 1034 /dev/dsk/c5t0d0s7
      a p luo 16 1034 /dev/dsk/c4t0d0s7
      a p luo 16 1034 /dev/dsk/c2t1d0s7
      a p luo 16 1034 /dev/dsk/c3t1d0s7
      a p luo 16 1034 /dev/dsk/c4t1d0s7
      a p luo 16 1034 /dev/dsk/c5t1d0s7
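
The replica list above can be summarized with a small filter. This is a minimal sketch, assuming that uppercase flag letters (W, M, D, F, S, R) mark a replica with errors and that DiskSuite needs more than half of the replicas to boot, as I understand its quorum rule. The function name `replica_summary` is made up for illustration, and on Solaris 7 `nawk` would probably be needed in place of `awk`.

```shell
#!/bin/sh
# Summarize `metadb` output: count replicas and those flagged with errors.
# Usage on the server (assumption): ./metadb | replica_summary
replica_summary() {
    awk '
        # Replica lines end in a /dev/dsk/... path; the header line does not.
        $NF ~ /^\/dev\/dsk\// {
            total++
            flags = ""
            # Flag letters come before the first numeric field (first blk).
            for (i = 1; i < NF && $i !~ /^[0-9]+$/; i++) flags = flags $i
            # Assumption: uppercase letters mark a replica with errors.
            if (flags ~ /[WMDFSR]/) bad++
        }
        END {
            # majority = smallest count that is more than half of all replicas
            printf "total=%d good=%d bad=%d majority=%d\n", total, total - bad, bad, int(total / 2) + 1
        }'
}

# Sample copied from the metadb output above.
sample='flags first blk block count
Wm p l 16 1034 /dev/dsk/c0t0d0s7
a p luo 16 1034 /dev/dsk/c2t0d0s7
a p luo 16 1034 /dev/dsk/c5t0d0s7
a p luo 16 1034 /dev/dsk/c4t0d0s7
a p luo 16 1034 /dev/dsk/c2t1d0s7
a p luo 16 1034 /dev/dsk/c3t1d0s7
a p luo 16 1034 /dev/dsk/c4t1d0s7
a p luo 16 1034 /dev/dsk/c5t1d0s7'

printf '%s\n' "$sample" | replica_summary
```

With the sample above, only the c0t0d0s7 replica is counted as bad, so seven good replicas remain against a majority of five.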



      Code:

      # ./metastat -p

      d0 -m d1 d2 1
      d1 1 1 c0t0d0s0
      d2 1 1 c2t3d0s0
      d3 -m d4 d5 1
      d4 1 1 c0t0d0s1
      d5 1 1 c2t3d0s1
      d6 -m d7 d8 1
      d7 1 1 c0t0d0s3
      d8 1 1 c2t3d0s3
      d9 -m d101 d11 1
      d101 1 1 c5t3d0s0
      d11 1 1 c4t3d0s0
      d12 -m d13 1
      d13 1 1 c0t2d0s0
      d15 -m d16 d17 1
      d16 1 3 c2t0d0s0 c3t0d0s0 c2t1d0s0 -i 32b
      d17 1 3 c4t2d0s0 c5t2d0s0 c5t1d0s0 -i 32b
      d18 -m d19 d20 1
      d19 1 3 c4t0d0s0 c5t0d0s0 c0t1d0s1 -i 32b
      d20 1 3 c2t2d0s0 c3t2d0s0 c3t1d0s0 -i 32b
      d21 -m d22 d23 1
      d22 1 1 c0t2d0s1
      d23 1 1 c0t0d0s4


      After checking the metastat output for each metadevice, I can see that all the submirrors on disk c0t0d0 (d1, d4, d7 and d23) are in the "Needs maintenance" state. For example:


      Code:

      d1: Submirror of d0
      State: Needs maintenance
      Invoke: metareplace d0 c0t0d0s0 <new device>
      Size: 4198392 blocks
      Stripe 0:
      Device Start Block Dbase State Hot Spare
      c0t0d0s0 0 No Maintenance
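
Rather than checking each metadevice by hand, the affected submirrors can be listed from the `metastat -p` output. This is a minimal sketch; the function name `on_disk` is hypothetical, and it only matches component names of the form `<disk>sN`. On Solaris 7 `nawk` would probably be needed in place of `awk`.

```shell
#!/bin/sh
# List the metadevices whose components live on a given disk.
# Usage on the server (assumption): ./metastat -p | on_disk c0t0d0
on_disk() {
    awk -v disk="$1" '
        # Skip mirror definitions such as "d0 -m d1 d2 1".
        $2 == "-m" { next }
        {
            # Components start at field 4, e.g. "d1 1 1 c0t0d0s0".
            for (i = 4; i <= NF; i++)
                if ($i ~ ("^" disk "s[0-9]+$")) print $1, $i
        }'
}

# Sample lines copied from the metastat -p output above.
sample='d0 -m d1 d2 1
d1 1 1 c0t0d0s0
d2 1 1 c2t3d0s0
d4 1 1 c0t0d0s1
d7 1 1 c0t0d0s3
d19 1 3 c4t0d0s0 c5t0d0s0 c0t1d0s1 -i 32b
d23 1 1 c0t0d0s4'

printf '%s\n' "$sample" | on_disk c0t0d0
```

For the sample this prints d1, d4, d7 and d23 with their slices, and leaves out d19, whose third component is on c0t1d0, not c0t0d0.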


      The format command below also shows a problem with c0t0d0 (drive type unknown):

      Code:

      # format

      Searching for disks...done
      AVAILABLE DISK SELECTIONS:
      *0. c0t0d0 <drive type unknown*>
      /pci@1f,4000/scsi@3/sd@0,0
      1. c0t1d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
      /pci@1f,4000/scsi@3/sd@1,0
      2. c0t2d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> arc disk
      /pci@1f,4000/scsi@3/sd@2,0
      3. c2t0d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> db01_1
      /pci@4,2000/scsi@1/sd@0,0
      4. c2t1d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> db01_3
      /pci@4,2000/scsi@1/sd@1,0
      5. c2t2d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
      /pci@4,2000/scsi@1/sd@2,0
      6. c2t3d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> OS mir
      /pci@4,2000/scsi@1/sd@3,0
      7. c3t0d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> db01_2
      /pci@4,2000/scsi@1,1/sd@0,0
      8. c3t1d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> back_3m
      /pci@4,2000/scsi@1,1/sd@1,0
      9. c3t2d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> back_2m
      /pci@4,2000/scsi@1,1/sd@2,0
      10. c4t0d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> back_1
      /pci@6,2000/scsi@1/sd@0,0
      11. c4t1d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> back_3
      /pci@6,2000/scsi@1/sd@1,0
      12. c4t2d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> db01_1m
      /pci@6,2000/scsi@1/sd@2,0
      13. c4t3d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> ora_mir
      /pci@6,2000/scsi@1/sd@3,0
      14. c5t0d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> back_2
      /pci@6,2000/scsi@1,1/sd@0,0
      15. c5t1d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> db01_3m
      /pci@6,2000/scsi@1,1/sd@1,0
      16. c5t2d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> db01_2m
      /pci@6,2000/scsi@1,1/sd@2,0
      17. c5t3d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> arch_mir
      /pci@6,2000/scsi@1,1/sd@3,0
      Specify disk (enter its number): 0
      AVAILABLE DRIVE TYPES:
      0. Auto configure
      1. Quantum ProDrive 80S
      2. Quantum ProDrive 105S
      3. CDC Wren IV 94171-344
      4. SUN0104
      5. SUN0207
      6. SUN0327
      7. SUN0340
      8. SUN0424
      9. SUN0535
      10. SUN0669
      11. SUN1.0G
      12. SUN1.05
      13. SUN1.3G
      14. SUN2.1G
      15. SUN2.9G
      16. SUN18G
      17. SUN18G
      18. SUN18G
      19. SUN36G
      20. other
      Specify disk type (enter its number):



      Code:

      # iostat -En

      c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 3
      Vendor: FUJITSU Product: MAJ3182M SUN18G Revision: 0804 Serial No: 02P19623
      Size: 18.11GB <18110967808 bytes>
      Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
      Illegal Request: 0 Predictive Failure Analysis: 0
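
The counters above show only transport errors for c0t0d0, with no media or hard errors. That may point at the path to the disk (seating, backplane, cabling, controller) as much as at the disk itself, though this is not certain. A minimal sketch for picking out devices with non-zero counters from `iostat -En`; the function name `error_summary` is hypothetical, and the second sample device line is invented purely to show a clean disk being skipped.

```shell
#!/bin/sh
# Print only the devices whose error counters are not all zero.
# Usage on the server (assumption): iostat -En | error_summary
error_summary() {
    awk '
        # Device line: "c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 3"
        $2 == "Soft" && $3 == "Errors:" {
            if ($4 + $7 + $10 > 0)
                printf "%s soft=%d hard=%d transport=%d\n", $1, $4, $7, $10
        }'
}

# First line is from the output above; the c0t1d0 line is a made-up example.
sample='c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 3
Vendor: FUJITSU Product: MAJ3182M SUN18G Revision: 0804 Serial No: 02P19623
c0t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0'

printf '%s\n' "$sample" | error_summary
```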


      When I checked /var/adm/messages, I found entries indicating that c0t0d0 is faulty:

      Code:

      # cat /var/adm/messages

      Mar 27 08:43:00 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
      Mar 27 08:43:00 XYZ disk not responding to selection
      Mar 27 08:43:01 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
      Mar 27 08:43:01 XYZ disk not responding to selection



      Code:

      # cat /var/adm/messages.0

      Mar 20 21:10:25 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
      Mar 20 21:10:25 XYZ SCSI transport failed: reason 'incomplete': retrying command
      Mar 20 21:11:30 XYZ unix: /pci@1f,4000/scsi@3 (glm0):
      Mar 20 21:11:30 XYZ Cmd (0x2aca648) dump for Target 0 Lun 0:
      Mar 20 21:11:30 XYZ unix: /pci@1f,4000/scsi@3 (glm0):
      Mar 20 21:11:30 XYZ cdb=[ 0x2a 0x0 0x2 0x1b 0x76 0x54 0x0 0x0 0x1 0x0 ]
      Mar 20 21:11:30 XYZ unix: /pci@1f,4000/scsi@3 (glm0):
      Mar 20 21:11:30 XYZ pkt_flags=0x4000 pkt_statistics=0x61 pkt_state=0x7
      Mar 20 21:11:30 XYZ unix: /pci@1f,4000/scsi@3 (glm0):
      Mar 20 21:11:30 XYZ pkt_scbp=0x0 cmd_flags=0x18e1
      Mar 20 21:11:30 XYZ unix: WARNING: /pci@1f,4000/scsi@3 (glm0):
      Mar 20 21:11:30 XYZ Disconnected tagged cmd(s) (1) timeout for Target 0.0
      Mar 20 21:11:30 XYZ unix: WARNING: ID[SUNWpd.glm.cmd_timeout.6018]
      Mar 20 21:11:30 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
      Mar 20 21:11:30 XYZ SCSI transport failed: reason 'reset': retrying command
      Mar 20 21:11:30 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
      Mar 20 21:11:30 XYZ SCSI transport failed: reason 'timeout': retrying command
      Mar 20 21:11:34 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
      Mar 20 21:11:34 XYZ SCSI transport failed: reason 'incomplete': retrying command
      Mar 20 21:11:38 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
      Mar 20 21:11:38 XYZ disk not responding to selection
      Mar 20 21:11:40 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
      Mar 20 21:11:40 XYZ disk not responding to selection
      Mar 20 21:11:40 XYZ unix: WARNING: md: d1: write error on /dev/dsk/c0t0d0s0
      Mar 20 21:11:40 XYZ unix: WARNING: md: d1: /dev/dsk/c0t0d0s0 needs maintenance
      Mar 20 21:11:40 XYZ unix: WARNING: md: d4: read error on /dev/dsk/c0t0d0s1
      Mar 20 21:11:42 XYZ unix: WARNING: md: d7: write error on /dev/dsk/c0t0d0s3
      Mar 20 21:11:42 XYZ unix: WARNING: md: d4: /dev/dsk/c0t0d0s1 needs maintenance
      Mar 20 21:11:42 XYZ unix: WARNING: md: d7: /dev/dsk/c0t0d0s3 needs maintenance
      Mar 21 01:36:42 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
      Mar 21 01:36:42 XYZ disk not responding to selection
      Mar 21 01:36:42 XYZ unix: WARNING: md: d23: read error on /dev/dsk/c0t0d0s4
      Mar 21 01:36:42 XYZ unix: WARNING: md: d23: /dev/dsk/c0t0d0s4 needs maintenance
      Mar 21 08:00:02 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
      Mar 21 08:00:02 XYZ disk not responding to selection
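
The `cdb=[ ... ]` line in the glm dump above is the SCSI command that timed out, and it can be decoded by hand. This is a minimal sketch that assumes a 10-byte READ/WRITE command (opcode 0x28 or 0x2a), where bytes 2 to 5 are the logical block address and bytes 7 to 8 are the transfer length; the function name `decode_cdb10` is hypothetical.

```shell
#!/bin/sh
# Decode a 10-byte SCSI READ(10)/WRITE(10) command descriptor block.
# Arguments: the ten CDB bytes as printed in the log (hex with 0x prefix).
decode_cdb10() {
    opcode=$(( $1 ))
    # Bytes 2..5 (1-based args 3..6) hold the big-endian logical block address.
    lba=$(( ($3 << 24) | ($4 << 16) | ($5 << 8) | $6 ))
    # Bytes 7..8 (args 8..9) hold the transfer length in blocks.
    len=$(( ($8 << 8) | $9 ))
    case $opcode in
        40) op="READ(10)" ;;    # 0x28
        42) op="WRITE(10)" ;;   # 0x2a
        *)  op="opcode $opcode" ;;
    esac
    echo "$op lba=$lba blocks=$len"
}

# Bytes copied from the cdb=[ ... ] line in the log above.
decode_cdb10 0x2a 0x0 0x2 0x1b 0x76 0x54 0x0 0x0 0x1 0x0
```

For the logged command this gives a one-block write at LBA 35354196. Comparing that address with the slice boundaries in the label of c0t0d0, once it is readable, would show which slice the failed write was aimed at.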




      Code:

      # ./prtdiag

      System Configuration: Sun Microsystems sun4u Sun Enterprise 450 (2 X UltraSPARC-II 400MHz)
      System clock frequency: 100 MHz
      Memory size: 1024 Megabytes
      ========================= CPUs =========================
      Run Ecache CPU CPU
      Brd CPU Module MHz MB Impl. Mask
      --- --- ------- ----- ------ ------ ----
      SYS 1 1 400 4.0 US-II 10.0
      SYS 3 3 400 4.0 US-II 10.0
      ========================= Memory =========================
      Interlv. Socket Size
      Bank Group Name (MB) Status
      ---- ----- ------ ---- ------
      0 none 1901 256 OK
      0 none 1902 256 OK
      0 none 1903 256 OK
      0 none 1904 256 OK
      ========================= IO Cards =========================
      Bus Freq
      Brd Type MHz Slot Name Model
      --- ---- ---- ---- -------------------------------- ----------------------
      SYS PCI 33 4 pciclass,001000 Symbios,53C875
      SYS PCI 33 6 pciclass,001000 Symbios,53C875

      No failures found in System




      When I tried to read the label of the faulty submirror's disk, prtvtoc reported that it was unable to read the disk geometry, but I was able to read the label of the other half of the mirror, as shown below.

      Code:

      # prtvtoc /dev/dsk/c0t0d0s0

      prtvtoc: /dev/rdsk/c0t0d0s0: Unable to read Disk geometry

      # prtvtoc /dev/dsk/c2t3d0s0

      * /dev/dsk/c2t3d0s0 (volume "OS mir") partition map
      *
      * Dimensions:
      * 512 bytes/sector
      * 248 sectors/track
      * 19 tracks/cylinder
      * 4712 sectors/cylinder
      * 7508 cylinders
      * 7506 accessible cylinders
      *
      * Flags:
      * 1: unmountable
      * 10: read-only
      *
      * Unallocated space:
      * First Sector Last
      * Sector Count Sector
      * 9424000 25925424 35349423
      * 35363560 4712 35368271
      *
      * First Sector Last
      * Partition Tag Flags Sector Count Sector Mount Directory
      0 2 00 0 4198392 4198391
      1 3 01 4198392 4198392 8396783
      2 5 01 0 35368272 35368271
      3 7 00 8396784 1027216 9423999
      7 0 00 35349424 14136 35363559
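
When a label has to be recreated on a replacement disk, it helps to know each slice's size and whether it sits on cylinder boundaries. A minimal sketch that derives this from the `prtvtoc` output, assuming 512-byte sectors as stated in the label; the function name `slice_summary` is hypothetical.

```shell
#!/bin/sh
# Convert prtvtoc partition rows to cylinders and KB, and check alignment.
# Usage on the server (assumption): prtvtoc /dev/rdsk/c2t3d0s2 | slice_summary
slice_summary() {
    awk '
        # Header comment such as "* 4712 sectors/cylinder".
        $1 == "*" && $3 == "sectors/cylinder" { spc = $2; next }
        # Partition rows: slice, tag, flags, first sector, sector count, last sector.
        $1 ~ /^[0-9]+$/ && NF >= 6 {
            aligned = (spc > 0 && $4 % spc == 0 && $5 % spc == 0) ? "yes" : "no"
            # 512-byte sectors, so KB = sectors / 2.
            printf "s%d cyl=%d kb=%d aligned=%s\n", $1, (spc > 0 ? $5 / spc : 0), $5 / 2, aligned
        }'
}

# Sample rows copied from the prtvtoc output for c2t3d0 above.
sample='* 4712 sectors/cylinder
* 9424000 25925424 35349423
0 2 00 0 4198392 4198391
1 3 01 4198392 4198392 8396783
2 5 01 0 35368272 35368271
3 7 00 8396784 1027216 9423999
7 0 00 35349424 14136 35363559'

printf '%s\n' "$sample" | slice_summary
```

For the label above, every slice comes out cylinder aligned: slice 0 is 891 cylinders and slice 7 is 3 cylinders.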


      This is what I usually do to fix a failed disk, but I have never done it for the primary (boot) disk:
      1. Copy the labels of the disks behind d2, d5, d8 and d22, and detach the faulty submirrors from their mirrors.
      2. Remove the old metadevices and unconfigure the disk using cfgadm.
      3. Replace the disk, write the copied label to the new disk, recreate the metadevices and attach them to the mirrors.
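
The steps above can be sketched as a dry run that only prints the DiskSuite commands for review and executes nothing. This is a sketch under assumptions, not a tested procedure: the function name `replacement_plan` and the `/var/tmp` label file are made up, slice 7 is taken as the replica slice from the metadb output, the `metadb -d` / `metadb -a` steps are my addition to the list, and the cfgadm step is left out. Each command should be checked against the DiskSuite manual pages for this release before use.

```shell
#!/bin/sh
# DRY RUN: prints a command plan, runs none of it.
# Usage on the server (assumption): ./metastat -p | replacement_plan <failed-disk> <label-source-disk>
replacement_plan() {
    awk -v disk="$1" -v src="$2" '
        # Mirror line such as "d0 -m d1 d2 1": remember the parent of each submirror.
        $2 == "-m" {
            for (i = 3; i <= NF; i++) if ($i ~ /^d[0-9]+$/) parent[$i] = $1
            next
        }
        # Submirror line with a component on the failed disk: keep its definition.
        {
            for (i = 4; i <= NF; i++)
                if ($i ~ ("^" disk "s[0-9]+$")) { n++; subm[n] = $1; def[n] = $0; break }
        }
        END {
            print "prtvtoc /dev/rdsk/" src "s2 > /var/tmp/" disk ".vtoc"
            for (k = 1; k <= n; k++) print "metadetach -f " parent[subm[k]] " " subm[k]
            for (k = 1; k <= n; k++) print "metaclear " subm[k]
            print "metadb -d " disk "s7"
            print "# -- replace the physical disk here --"
            print "fmthard -s /var/tmp/" disk ".vtoc /dev/rdsk/" disk "s2"
            print "metadb -a " disk "s7"
            # metastat -p lines are in md.tab format, so they can be fed back to metainit.
            for (k = 1; k <= n; k++) print "metainit " def[k]
            for (k = 1; k <= n; k++) print "metattach " parent[subm[k]] " " subm[k]
        }'
}

# Sample lines copied from the metastat -p output above.
sample='d0 -m d1 d2 1
d1 1 1 c0t0d0s0
d2 1 1 c2t3d0s0
d3 -m d4 d5 1
d4 1 1 c0t0d0s1
d5 1 1 c2t3d0s1
d6 -m d7 d8 1
d7 1 1 c0t0d0s3
d8 1 1 c2t3d0s3
d21 -m d22 d23 1
d22 1 1 c0t2d0s1
d23 1 1 c0t0d0s4'

printf '%s\n' "$sample" | replacement_plan c0t0d0 c2t3d0
```

One caveat with using c2t3d0 as the label source: the label printed above has no slice 4, while d23 sits on c0t0d0s4, so that slice would have to be added by hand, or the label taken from c0t0d0 itself if it can be read.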

      But since c0t0d0 is the primary disk, which contains partitions such as root and boot, I am not sure whether I can hot swap it. This is a production server, so I can't experiment much on it. I also wasn't sure how to unconfigure and reconfigure the disk, because when I run cfgadm as root it says: cfgadm: Configuration administration not supported.

      After a long time I found the drvconfig command. After running it, the disk that format had shown as "drive type unknown" is recognized again, as shown below. I can now also see the output of prtvtoc /dev/rdsk/c0t0d0s0, which previously failed with the geometry error, and the messages log file has changed as well. However, the write error still remains. Any help would be great.

      # cat /var/adm/messages

      Mar 28 11:09:14 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
      Mar 28 11:09:14 XYZ disk not responding to selection
      Mar 28 11:28:38 XYZ unix: NOTICE: ses: 64-bit driver module not found
      Mar 28 11:28:39 XYZ unix: NOTICE: bwtwo: 64-bit driver module not found
      Mar 28 11:28:39 XYZ unix: NOTICE: audio: 64-bit driver module not found
      Mar 28 11:28:39 XYZ unix: NOTICE: cgthree: 64-bit driver module not found
      Mar 28 11:29:07 XYZ unix: ecpp0 at ebus0: offset 14,3043bc
      Mar 28 11:29:07 XYZ unix: ecpp0 is /pci@1f,4000/ebus@1/ecpp@14,3043bc
      Mar 28 11:29:07 XYZ unix: NOTICE: xbox: 64-bit driver module not found
      Mar 28 11:29:07 XYZ unix: pseudo-device: winlock0
      Mar 28 11:29:07 XYZ unix: winlock0 is /pseudo/winlock@0
      Mar 28 11:29:07 XYZ unix: pseudo-device: lockstat0
      Mar 28 11:29:07 XYZ unix: lockstat0 is /pseudo/lockstat@0
      Mar 28 11:29:07 XYZ unix: NOTICE: xbox: 64-bit driver module not found
      Mar 28 11:29:08 XYZ unix: pseudo-device: llc10
      Mar 28 11:29:08 XYZ unix: llc10 is /pseudo/llc1@0
      Mar 28 11:29:08 XYZ unix: NOTICE: socal: 64-bit driver module not found
      Mar 28 11:29:08 XYZ unix: NOTICE: sf: 64-bit driver module not found
      Mar 28 11:29:08 XYZ unix: NOTICE: soc: 64-bit driver module not found
      Mar 28 11:29:08 XYZ unix: NOTICE: pln: 64-bit driver module not found
      Mar 28 11:29:08 XYZ unix: NOTICE: ssd: 64-bit driver module not found
      Mar 28 11:29:08 XYZ unix: pseudo-device: pm0
      Mar 28 11:29:08 XYZ unix: pm0 is /pseudo/pm@0
      Mar 28 11:29:08 XYZ unix: pseudo-device: tod0
      Mar 28 11:29:08 XYZ unix: tod0 is /pseudo/tod@0
      Mar 28 11:29:08 XYZ unix: pseudo-device: fcp0
      Mar 28 11:29:08 XYZ unix: fcp0 is /pseudo/fcp@0
      Mar 28 12:20:18 XYZ unix: NOTICE: ses: 64-bit driver module not found
      Mar 28 12:20:18 XYZ unix: NOTICE: bwtwo: 64-bit driver module not found
      Mar 28 12:20:18 XYZ unix: NOTICE: audio: 64-bit driver module not found
      Mar 28 12:20:18 XYZ unix: NOTICE: cgthree: 64-bit driver module not found
      Mar 28 12:20:45 XYZ unix: NOTICE: xbox: 64-bit driver module not found
      Mar 28 12:20:45 XYZ last message repeated 1 time
      Mar 28 12:20:45 XYZ unix: NOTICE: socal: 64-bit driver module not found
      Mar 28 12:20:45 XYZ unix: NOTICE: sf: 64-bit driver module not found
      Mar 28 12:20:45 XYZ unix: NOTICE: soc: 64-bit driver module not found
      Mar 28 12:20:45 XYZ unix: NOTICE: pln: 64-bit driver module not found
      Mar 28 12:20:45 XYZ unix: NOTICE: ssd: 64-bit driver module not found


      # format
      Searching for disks...done


      AVAILABLE DISK SELECTIONS:
      0. c0t0d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> OS disk
      /pci@1f,4000/scsi@3/sd@0,0

      Thanks in advance

      Edited by: 924016 on Mar 28, 2012 7:49 AM