I am attempting to build a NAS with Solaris 11.1 using an LSI SAS9207-8i 6Gb HBA. It is correctly using the mpt_sas driver.
The problem I am having is that after building the server, creating the ZFS pools, and mirroring the boot drive, I run a scrub. At this point rpool is the only pool with data. The scrub always comes back with some sort of error (read, write, or cksum, or it fails one of the drives completely).
Scenario (after a fresh install, with the boot drive mirrored and a second pool created from 22 additional disks):
1. Scrub rpool and repair the errors that are found.
2. Subsequent scrubs are clean.
3. Upload a 400MB ISO.
4. Run a scrub.
5. Errors are found and repaired.
6. Subsequent scrubs are clean.
7. Copy the same ISO to the second pool.
8. Run a scrub.
9. Errors are found and repaired.
10. Subsequent scrubs are clean.
11. Upload a new, 450MB ISO.
12. Run a scrub.
13. Errors are found and repaired.
14. Subsequent scrubs are clean.
15. Copy the same ISO to the second pool.
16. Run a scrub.
17. Errors are found and repaired.
If the uploaded file is larger, say a 1.5GB ISO, one of the disks in rpool will generally fail.
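The cycle above can be sketched as a small shell loop (a sketch only; "tank" is a placeholder name for the 22-disk second pool, and the ISO path is made up):

```shell
#!/bin/sh
# Reproduction cycle sketch: copy a file into each pool, scrub,
# wait for the scrub to finish, then check the per-disk error counts.
ISO=/tmp/test.iso

for pool in rpool tank; do
    cp "$ISO" "/$pool/test.iso"
    zpool scrub "$pool"
    # Poll until the scrub completes before reading its results.
    while zpool status "$pool" | grep -q 'scrub in progress'; do
        sleep 10
    done
    zpool status -v "$pool"   # READ/WRITE/CKSUM columns per device
done
```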
I have swapped out the motherboard, memory, HBA, HBA cable, and disks, run memtest, reinstalled about ten times, and tried uploading different file types (doc, iso, pic), always with the same result. Either ZFS is a bad filesystem (which I highly doubt), I have bad hardware (not likely at this point), or this is software (probably driver) related. I broached the issue with LSI and their response was: "The issue is possible to be a driver problem. We do not have driver support for Solaris 11.1 and this could be an issue in performance with use of an LSI SAS HBA."
The card is listed in the HCL, and I have verified the driver is correct, as the card uses the SAS2308 chip. Any new data (files, etc.) has errors after being uploaded. I also tried an older 3Gb card (SAS3081E-R), but the installer will not boot and only shows constant errors communicating with the disks. At this point I am out of ideas; any help would be appreciated.
This issue is starting to sound familiar to me, so gathering a bit more info would help isolate the problem.
Does the fmdump -eV output include text like this:
driver-assessment = fail
un-decode-info = invalid-sense-data
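Rather than reading the full error log, the two indicators above can be pulled out directly with the standard Solaris fm tooling (a sketch; run as root):

```shell
# Dump the FMA error log verbosely and look for the two telltale strings.
fmdump -eV | egrep 'driver-assessment|un-decode-info'
```

If this prints nothing, the error log does not contain those entries.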
The most recent round of file copies and scrubs left the system unbootable. I am reinstalling the OS now and should have an answer within a couple of hours.
When should I run the command? Right after the install, or should I recreate the failures (ZFS and/or disk) first and then run it?
I think there is a combination of issues: LSI says this driver is not supported on S11.1, and some change in S11.1 is causing very bad stability problems for this config. Please send me the output of this command:
# prtconf -vD > /tmp/prtconf.out
Send it to me directly:
I want to add this info to the S11.1 bug, which I think is 15819341.
Did you ever test this config on Solaris 11? You might consider going back to S11 to test this config or consider switching HBAs.
I was able to install Solaris 10 U11. After recreating all the pools I ran a variety of "stress tests" and was able to upload large amounts of data with no errors found during scrubs. This was definitely my problem. Thanks, Cindys.
Good to hear this is working out for you.
Call me paranoid, but I have seen enough sad stories with problematic hardware or bad configs that you should keep monitoring this config routinely to know it's a keeper. Use zpool status, iostat -En, and the fm* commands routinely, as described here:
Resolving General Hardware Problems
ZFS best practices are here:
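The routine checks mentioned above could be collected into a small script like this (a sketch; suitable for running from cron, and all commands are standard Solaris tooling):

```shell
#!/bin/sh
# Routine health check for this config.
zpool status -x             # prints "all pools are healthy" when clean
iostat -En | grep -i error  # per-device Soft/Hard/Transport error counters
fmadm faulty                # any faults FMA has diagnosed
fmdump -e                   # one-line summary of the FMA error log
```

Rising Transport Errors in the iostat -En output with clean zpool status is itself a warning sign for cabling/HBA/driver trouble.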
Thanks. Unfortunately, I'm not out of the woods yet. After I started using Sol 10 there were no ZFS errors, but later I found large numbers of transport errors with iostat. I was eventually able to generate enough volume with a VM that I started seeing hardware errors and ZFS errors as well.