About three years ago we had a disk drive fail, so after the long-delayed repair I reconfigured the data disk drives into one RAID-5 metadevice:
$ cat /etc/lvm/md.cf
# metadevice configuration file
# do not hand edit
d45 -r c1t1d0s0 c1t2d0s0 c1t3d0s0 c1t4d0s0 c1t5d0s0 c1t9d0s0 c1t10d0s0 -k -i 256b
(256b -> 256 blocks of 512 bytes, or 128k bytes)
I went RAID-5 because I wanted the application to stay up through the next disk failure until we could schedule a repair activity . . . not fail the application hard and make a crisis du jour. But at the time I was working fast and did not realize that '256b' is larger than the cache of the existing disk drives; the smallest disk cache in the array is just over 105k bytes. Still, the array built fine and works in our application. It is 'fast enough' until I try to load a backup . . . then it performs like a dog. It is the Oracle database restore that takes too long.
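As a quick check of the arithmetic above, here is a minimal Python sketch that converts an interlace string to bytes and compares it with the smallest drive cache. The suffix meanings (b = 512-byte blocks, k = kilobytes, m = megabytes) follow the interlace convention used in md.cf; the cache figure of 105 * 1024 bytes is only a stand-in for "just over 105k bytes", not a measured value.

```python
BLOCK_BYTES = 512  # the "b" suffix counts 512-byte disk blocks

def interlace_bytes(spec):
    """Convert an interlace string such as '256b', '128k' or '1m' to bytes."""
    units = {"b": BLOCK_BYTES, "k": 1024, "m": 1024 * 1024}
    number, suffix = spec[:-1], spec[-1].lower()
    return int(number) * units[suffix]

# Assumption: placeholder for the smallest drive cache ("just over 105k bytes").
SMALLEST_CACHE = 105 * 1024

current = interlace_bytes("256b")
print(f"interlace = {current} bytes, exceeds smallest cache: {current > SMALLEST_CACHE}")
```

With these assumptions the 256b interlace works out to 131072 bytes, which is larger than the assumed cache size.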
I suspect my large stripe/interlace size has cut the number of hardware cache buffers in half, from 64 to 32, and this would explain the poor restore performance. I am looking at alternatives now that 1TB USB drives are in the petty cash range, but I work on a government contract and money is tight, which leads to these questions:
- UFS blocksize is 8192 - is there a known, optimum RAID-5 "interlace" size that is an integer multiple of the UFS block size?
- Larger block sizes mean longer data transfer times to and from cache, but compared to rotational delays this is modest. Is there a credible performance model for random reads and writes to a RAID-5 array that uses rotational delay, data transfer speed, and seek delays to find an optimum?
- Are there Solaris tools that might give us insight into the disk-layer SCSI commands being used? Something that does for SCSI what 'snoop' does for network traffic.
- The disk drives appear to have an option to use the cache for read-modify-write instead of physically hitting the same track over and over again. Are there Solaris tools that would allow us to use basic level SCSI commands to reconfigure the disk drives for more intense cache operation during the restore and return to default performance during normal operation?
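To make the second question concrete, here is a back-of-envelope Python sketch of the kind of model being asked about. It is not a validated RAID-5 model: it only adds an average seek, half a rotation, and the transfer time for one I/O, and it treats a RAID-5 small write as two serial phases (read old data and parity, then write new data and parity). The function names and the drive figures in the example (5 ms seek, 10,000 rpm, 50 MB/s) are illustrative assumptions, not values from this array.

```python
def disk_service_ms(transfer_bytes, seek_ms, rpm, mb_per_s):
    """Estimate one random disk I/O: average seek + half a rotation + transfer."""
    rotational_ms = 0.5 * 60_000.0 / rpm          # half a revolution on average
    transfer_ms = transfer_bytes / (mb_per_s * 1_000_000.0) * 1000.0
    return seek_ms + rotational_ms + transfer_ms

def raid5_small_write_ms(transfer_bytes, seek_ms, rpm, mb_per_s):
    """Simplified read-modify-write: a read phase followed by a write phase.

    Assumes the data and parity disks work in parallel within each phase,
    and ignores any help from the drive cache.
    """
    return 2.0 * disk_service_ms(transfer_bytes, seek_ms, rpm, mb_per_s)

# Hypothetical drive parameters, for illustration only.
for size in (8192, 131072):
    read_ms = disk_service_ms(size, seek_ms=5.0, rpm=10_000, mb_per_s=50.0)
    write_ms = raid5_small_write_ms(size, seek_ms=5.0, rpm=10_000, mb_per_s=50.0)
    print(f"{size:>7} bytes: read ~{read_ms:.2f} ms, small write ~{write_ms:.2f} ms")
```

Plugging in the real seek time, spindle speed, and sustained transfer rate from the drive data sheets would show how much of each I/O is transfer time versus mechanical delay at a given interlace size.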
My current thinking is to take the system down for a series of benchmarks using a semi-log scale "stripe/interlace" of "n" multiples of 8192, the UFS blocksize:
- 1*8192 -> the smallest, tested only if the n=2 and n=5 results suggest the smaller sizes are faster than any others
- 2*8192 -> second test, matches some examples in the documentation
- 5*8192 -> first test (I have some smaller backups to use that should run within a day)
- 10*8192 -> third test, determines if there is any reason to go larger
- 13*8192 -> the largest that would fit in the smallest disk drive cache
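The candidate sizes above can be tabulated with a short Python sketch that expresses each multiple of the UFS block size in bytes and in 512-byte blocks (the form used after -i in md.cf), and checks it against the smallest cache. The cache value of 105 * 1024 bytes is again an assumed placeholder for "just over 105k bytes".

```python
UFS_BLOCK = 8192             # UFS block size from the post
DISK_BLOCK = 512             # size of one "b" unit in the interlace value
SMALLEST_CACHE = 105 * 1024  # assumption: stand-in for "just over 105k bytes"

def candidates(multiples):
    """Return one row per multiple n of the UFS block size."""
    rows = []
    for n in multiples:
        size = n * UFS_BLOCK
        rows.append({
            "n": n,
            "bytes": size,
            "interlace": f"{size // DISK_BLOCK}b",
            "fits_cache": size <= SMALLEST_CACHE,
        })
    return rows

plan = candidates([1, 2, 5, 10, 13])
for row in plan:
    print(f"n={row['n']:>2}  {row['bytes']:>6} bytes  -i {row['interlace']:<5} fits cache: {row['fits_cache']}")
```

Under that assumed cache size, n=13 (208b) is the last multiple that fits and n=14 would not.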
Based on these benchmarks, I would test the range between the two fastest to see if there is a multiple of 8192 that shows a performance advantage. However, these tests take time, measured in days. I'm OK with taking the downtime but wanted to ask the community if anyone has a better plan or insights to share.
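The refinement step could be sketched in Python as below: given elapsed restore times per multiple, pick the two fastest and list the untested multiples between them. The function name and the timings in the example are invented placeholders to show the mechanics; they are not benchmark results.

```python
def refine(timings):
    """Given {multiple: elapsed_time}, return untested multiples between the two fastest."""
    fastest_two = sorted(timings, key=timings.get)[:2]
    low, high = sorted(fastest_two)
    return [n for n in range(low + 1, high) if n not in timings]

# Hypothetical elapsed hours per multiple of 8192; NOT measured data.
example = {2: 9.5, 5: 7.0, 10: 8.0, 13: 11.0}
print(refine(example))
```

If the two fastest sizes turn out to be adjacent multiples, the list is empty and there is nothing left to test in between.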
This is not a 'hair on fire' problem as I will schedule the RAID-5 benchmarks for late May. But I thought I'd ask the community first.
ps. This forum software seems to be at war with our government e-mail system. I will check back but you can also send a note to firstname.lastname@example.org.