I know this is an old thread, but there is precious little information out there, so maybe this will help the next guy. After studying the mode page settings, it seems that setting ARRE and AWRE to 1 reduces (but does not eliminate) the timeouts. I have played with a LOT of settings. After globally applying the values as shown below, the timeouts have been reduced; before that, the drives were a mixed bag of values.
bash-4.1$ for d in /dev/rdsk/c0t2*d0; do sudo sdparm -S -s AWRE=1 $d; done
bash-4.1$ for d in /dev/rdsk/c0t2*d0; do sudo sdparm -S -s ARRE=1 $d; done
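For anyone repeating this, it may be worth reading the bits back afterward to confirm the saved values actually took. A minimal check, assuming your sdparm build supports the -g (--get) option with a comma-separated field list:
bash-4.1$ for d in /dev/rdsk/c0t2*d0; do sudo sdparm -g AWRE,ARRE $d; done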
If anyone else knows of good ways to reduce these timeouts (on drives that otherwise appear OK), please speak up.
I found a pattern with this: the more disks in the vdev, the higher the chance of hitting this timeout. Previously, a pool with 6x RAIDZ1 vdevs of 4 disks each had no timeouts. It was recently changed to 3x RAIDZ2 vdevs of 8 disks each and now sees sporadic timeouts. Another pool with the same model of disks, configured as a single 20-disk RAIDZ3 vdev, times out regularly under load.
It is either the spindles-per-vdev ratio or the RAIDZ parity level, or some combination of the two.
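One way to narrow that down (a sketch, assuming Solaris/illumos given the /dev/rdsk device paths): watch per-disk service times under load, and check whether the kernel logs the timeouts against one spindle or spreads them across the whole vdev.
# -x extended stats, -n descriptive device names, -z hide idle devices;
# one disk with a much higher asvc_t suggests a bad drive, while uniformly
# high service times across a wide vdev point at the layout instead
bash-4.1$ iostat -xnz 5
# see which devices the kernel has actually been timing out on
bash-4.1$ grep -i timeout /var/adm/messages | tail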