I'm running a small SAM-QFS environment and have a strange performance issue on the disk storage side, which somebody here might be able to explain.
Configuration: one 3510, dual controller, RAID-5 9+1, one hot spare and one disk not configured for whatever reason. The R5 logical drive hosts a 150GB LUN for SAM-QFS metadata (mm in SAM-FS speak) and a 1TB LUN for data (mr in SAM-FS speak). Further, there are two small LUNs (2GB, 100GB) for some other purpose. Those two LUNs have nearly no I/O. All disks are SUN146G. Host connection is 2GBit, multipathing enabled and working.
Then the disk cache became too small, and the customer added a 3511 expansion unit with SUN300G disks. One logical drive is a RAID-1, 1+1, used for the NetBackup catalog. The other is a RAID-5, 8+1, providing two LUNs: 260GB SAM-FS metadata (mm) and 1.999TB SAM-FS data (mr).
For SAM-FS, the LUNs form two file systems: one "residing" in the 3510, the other "residing" in the 3511 expansion. Cabling is according to the manual and has been checked several times by several independent people. Operating system is Solaris 10, hardware is a V880.
The problem we observe: SAM-FS I/O on LUNs on disks inside the 3510 is fine. With iostat, I see 100MB/s read and 50MB/s write at the same time. On the SAM-FS file system which is running on the two LUNs in the 3511, the limit seems to be 40MB/s read/write. Both SAM-FS file systems are configured the same with regard to block size.
When I have activity on both SAM-FS file systems, I see 100MB/s+ on the LUN running inside the controller shelf and another 40MB/s on the disk running in the 3511 expansion chassis. So, the controller is easily capable of handling 150MB/s.
Cache settings in the 3510 controller are default I think (wasn't installed by me), batteries are fine.
Is this 40MB/s we experience a limitation of the expansion shelf? I don't think so. Does anybody have any ideas on this? Which parameters should I check or change? Any hint is appreciated. I can also provide further details if needed. Thank you.
Depending on how many files are in the SAMFS file system, sharing the mm and mr devices on the same RAID array can be a pretty horrible idea. In my opinion and experience, it's almost always better to NEVER put more than one LUN on a RAID array. Period. Putting more than one LUN on an array results in IO contention on that array. And large, unnaturally configured (9+1? Why?) RAID arrays will have problems from the start.
What are the block sizes used on the RAID arrays? It wouldn't surprise me to see that the RAID array on the expansion tray has a very large block size. Larger block sizes are, in general, not better. Especially for SAMFS metadata - which IIRC is something like 8k or 16k blocks.
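To put rough numbers on the block size point, here is a minimal Python sketch comparing a small metadata write with a RAID-5 full stripe. The 128 KiB stripe unit is purely an assumption for illustration (I don't know what either array is actually set to), the 16 KiB metadata write is the figure recalled above, and the data-disk counts (9 and 8) come from the layouts described in the question. A write that does not cover a whole stripe generally forces the controller to read old data and old parity before writing, although write-back cache can hide some of that.

```python
# Sketch: how a small metadata write compares with a RAID-5 full stripe.
# STRIPE_UNIT_KIB is an assumed value, not read from either array.

STRIPE_UNIT_KIB = 128      # hypothetical per-disk chunk size
METADATA_IO_KIB = 16       # the "8k or 16k" figure recalled above


def full_stripe_kib(data_disks, stripe_unit_kib=STRIPE_UNIT_KIB):
    """Amount of data that fills one stripe across all data disks."""
    return data_disks * stripe_unit_kib


def needs_read_modify_write(io_kib, data_disks, stripe_unit_kib=STRIPE_UNIT_KIB):
    """True if a stripe-aligned write of io_kib leaves part of a stripe
    unwritten, so parity must be updated from old data and old parity."""
    return io_kib % full_stripe_kib(data_disks, stripe_unit_kib) != 0


for name, data_disks in (("3510, RAID-5 9+1", 9), ("3511, RAID-5 8+1", 8)):
    stripe = full_stripe_kib(data_disks)
    rmw = needs_read_modify_write(METADATA_IO_KIB, data_disks)
    print(f"{name}: full stripe {stripe} KiB, "
          f"{METADATA_IO_KIB} KiB write is partial-stripe: {rmw}")
```

Swap in the real stripe unit once you have read it off the controller; the larger it is, the smaller the fraction of a stripe each metadata update touches.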
I suspect what is happening is most of the metadata updates are going to the mm device on the new array, contending with the IO operations on the file data.
How much space is left on each mm device? What does "iostat -sndxz 2" show when you're having the IO problems?
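If it helps with comparing the mm and mr LUNs side by side, something like the following Python sketch can turn the extended iostat output into MB/s per device. It assumes the usual Solaris `-xn` column order (r/s, w/s, kr/s, kw/s, wait, actv, wsvc_t, asvc_t, %w, %b, device). The device names and numbers in `SAMPLE` are made up for illustration and are not measurements from this system.

```python
# Sketch: summarize Solaris "iostat -xn" style output as MB/s per device.
# SAMPLE is invented illustration data, not output from the system discussed.

SAMPLE = """\
                    extended device statistics
    r/s    w/s     kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b device
  120.0   60.0 102400.0  51200.0  0.0  1.5    0.0    8.3   0  85 c4t0d0
   40.0   40.0  20480.0  20480.0  0.0  3.9    0.0   48.8   0  99 c4t1d0
"""


def parse_iostat(text):
    """Return {device: stats} for every data line; headers are skipped.
    If a device appears in several intervals, the last one wins."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 11:
            continue  # banner or blank line
        try:
            values = [float(f) for f in fields[:10]]
        except ValueError:
            continue  # column header line
        stats[fields[10]] = {
            "read_mb_s": values[2] / 1024,   # kr/s -> MB/s
            "write_mb_s": values[3] / 1024,  # kw/s -> MB/s
            "asvc_t_ms": values[7],          # average active service time
            "busy_pct": values[9],           # %b
        }
    return stats


for device, s in parse_iostat(SAMPLE).items():
    print(f"{device}: {s['read_mb_s']:.1f} MB/s read, "
          f"{s['write_mb_s']:.1f} MB/s write, "
          f"asvc_t {s['asvc_t_ms']} ms, {s['busy_pct']:.0f}% busy")
```

A LUN that sits near 100% busy with a high asvc_t while moving little data is the one to look at first.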