11 Replies. Latest reply: Mar 13, 2013 11:41 AM by 994707

    Issues with HBA and ZFS

    994707
      I am attempting to build a NAS with Solaris 11.1 using an LSI SAS9207-8i 6Gb HBA. It is correctly using the mpt_sas driver.

      The problem: after building the server, creating the ZFS pools, and mirroring the boot drive, I run a scrub. At this point rpool is the only pool with data. The scrub always reports some sort of error (read, write, or checksum), or it faults one of the drives completely.

      Scenario (after a fresh install, with the boot drive mirrored and a second pool created from 22 additional disks):
      1. Scrub rpool & Repair errors that are found.
      2. Subsequent scrubs are clean.
      3. Upload a 400MB ISO.
      4. Run scrub
      5. Errors are found and repaired
      6. Subsequent scrubs are clean.
      7. Copy same ISO to second pool
      8. Run scrub
      9. Errors are found and repaired
      10. Subsequent scrubs are clean.
      11. Upload a new 450MB ISO.
      12. Run scrub
      13. Errors are found and repaired
      14. Subsequent scrubs are clean.
      15. Copy same ISO to second pool
      16. Run scrub
      17. Errors are found and repaired
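      Checking each scrub in the scenario above comes down to reading the READ/WRITE/CKSUM columns of `zpool status`. A minimal sketch for spotting nonzero counters, assuming the `zpool status` text has been captured into a string (the function name `find_vdev_errors` is hypothetical, and large counts may be abbreviated, e.g. "1.2K", so values are kept as strings):

```python
import re

# A vdev row in the "config:" section of `zpool status` looks like:
#   NAME  STATE  READ  WRITE  CKSUM   (counters may be abbreviated, e.g. "1.2K")
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)"
)


def find_vdev_errors(status_text: str) -> dict:
    """Return {vdev_name: (read, write, cksum)} for rows with any nonzero counter."""
    errors = {}
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header lines such as "state: ONLINE" or "NAME STATE ..." don't match
        name, _state, read, write, cksum = m.groups()
        if any(c != "0" for c in (read, write, cksum)):
            errors[name] = (read, write, cksum)
    return errors
```

      Run it after each scrub; an empty dict means every vdev row showed 0/0/0.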

      If the uploaded file is larger, say a 1.5GB ISO, one of the disks in rpool generally fails.

      I have swapped out the motherboard, memory, HBA, HBA cable, and disks, run memtest, reinstalled about ten times, and tried uploading different file types (doc, iso, pic), always with the same result. Either ZFS is a bad filesystem (which I highly doubt), I have bad hardware (not likely at this point), or this is a software (probably driver) problem. I have raised the issue with LSI, and their response was "The issue is possible to be a driver problem. We do not have driver support for Solaris 11.1 and this could be an issue in performance with use of an LSI SAS HBA."

      The card is listed in the HCL, and I have verified that the driver is correct, since the card uses the SAS2308 chip. Any newly uploaded data (files, etc.) shows errors. I also tried an older 3Gb card, the SAS3081E-R, but the installer will not boot and only shows repeated errors communicating with the disks. At this point I am out of ideas. Any help would be appreciated.
        • 1. Re: Issues with HBA and ZFS
          Cindys-Oracle
          Hello again,

          This issue is starting to sound familiar, so a bit more information would help isolate the problem.

          Does the fmdump -eV output include text like this:

          driver-assessment = fail
          un-decode-info = invalid-sense-data
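
          When fmdump -eV produces many ereports, tallying them is quicker than scrolling. A minimal sketch, assuming the output was saved to a text file (for example with `fmdump -eV > /tmp/ereports.txt`) and read into a string; the function name `summarize_ereports` is hypothetical:

```python
import re
from collections import Counter

# Each ereport in `fmdump -eV` output starts with a line like
#   "Mar 04 2013 16:00:41.538745414 ereport.io.scsi.cmd.disk.tran"
# and carries "key = value" fields such as "driver-assessment = fail".
HEADER = re.compile(r"^\w{3} \d{2} \d{4} [\d:.]+ (ereport\.\S+)\s*$")
FIELD = re.compile(r"^\s*(driver-assessment|un-decode-info)\s*=\s*(\S+)")


def summarize_ereports(text: str) -> Counter:
    """Count ereport classes and the driver-assessment / un-decode-info values."""
    counts = Counter()
    for line in text.splitlines():
        if m := HEADER.match(line.strip()):
            counts[m.group(1)] += 1
        elif m := FIELD.match(line):
            counts[f"{m.group(1)}={m.group(2)}"] += 1
    return counts
```

          A large count for "driver-assessment=fail" or "un-decode-info=invalid-sense-data" is the pattern asked about above.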

          Thanks, Cindy
          • 2. Re: Issues with HBA and ZFS
            994707
            The most recent round of file copies and scrubs left the system unbootable. I am reinstalling the OS now and should have an answer within a couple of hours.

            When should I run the command? Right after the install, or should I reproduce the failures (ZFS and/or disk) first and then run it?
            • 3. Re: Issues with HBA and ZFS
              994707
              Cindy, you were spot on. I ran the command right after the install. There are tons of these, so I copied in only a couple. What does this mean?

              root@solaris:~# zpool status
                pool: rpool
               state: ONLINE
                scan: none requested
              config:

                      NAME                       STATE     READ WRITE CKSUM
                      rpool                      ONLINE       0     0     0
                        c0t5000C500420CDED7d0s1  ONLINE       0     0     0

              errors: No known data errors

              Mar 04 2013 16:00:41.538745414 ereport.io.scsi.cmd.disk.tran
              nvlist version: 0
                   class = ereport.io.scsi.cmd.disk.tran
                   ena = 0xccda04969f00425
                   detector = (embedded nvlist)
                   nvlist version: 0
                        version = 0x0
                        scheme = dev
                        cna_dev = 0x5135168e00000041
                        device-path = /pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@f0/disk@w5000c500420cded5,0
                   (end detector)

                   devid = id1,sd@n5000c500420cded7
                   driver-assessment = fail
                   op-code = 0x2a
                   cdb = 0x2a 0x0 0x3 0x88 0xc 0x5c 0x0 0x7 0x43 0x0
                   pkt-reason = 0x4
                   pkt-state = 0x0
                   pkt-stats = 0x10
                   __ttl = 0x1
                   __tod = 0x51351989 0x201c9a46



              Mar 05 2013 08:45:42.284527395 ereport.io.scsi.cmd.disk.dev.uderr
              nvlist version: 0
                   class = ereport.io.scsi.cmd.disk.dev.uderr
                   ena = 0x16d21f1a8600001
                   detector = (embedded nvlist)
                   nvlist version: 0
                        version = 0x0
                        scheme = dev
                        cna_dev = 0x5136050200000018
                        device-path = /pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@f0/disk@w5000c500420cded5,0
                        devid = id1,sd@n5000c500420cded7
                   (end detector)

                   devid = id1,sd@n5000c500420cded7
                   driver-assessment = retry
                   op-code = 0x2a
                   cdb = 0x2a 0x0 0x2c 0x9 0x9b 0x76 0x0 0x0 0x5 0x0
                   pkt-reason = 0x4
                   pkt-state = 0x0
                   pkt-stats = 0x10
                   stat-code = 0x0
                   un-decode-info = invalid-sense-data

                   __ttl = 0x1
                   __tod = 0x51360516 0x10f58b23
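
              In both ereports above, op-code 0x2a is the SCSI WRITE(10) command, so both failed commands were writes. Its 10-byte CDB layout (from the SCSI Block Commands spec) puts the logical block address in bytes 2-5 and the transfer length in bytes 7-8, both big-endian. A small decoder, sketched as an illustration (the function name `decode_write10` is hypothetical):

```python
def decode_write10(cdb_text: str) -> dict:
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes.

    Layout: byte 0 = opcode 0x2A, byte 1 = flags, bytes 2-5 = LBA (big-endian),
    byte 6 = group number, bytes 7-8 = transfer length in blocks, byte 9 = control.
    """
    cdb = bytes(int(tok, 16) for tok in cdb_text.split())  # int() accepts the "0x" prefix
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


print(decode_write10("0x2a 0x0 0x3 0x88 0xc 0x5c 0x0 0x7 0x43 0x0"))
```

              For the first ereport this gives a write of 0x0743 (1859) blocks starting at LBA 0x03880C5C; for the second, 5 blocks at LBA 0x2C099B76.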
              • 4. Re: Issues with HBA and ZFS
                Cindys-Oracle
                I think there is a combination of issues. LSI says this driver is not supported in S11.1, and some change in S11.1 is causing very bad stability problems for this config. Please send me the output of this command:

                # prtconf -vD > /tmp/prtconf.out

                Send it to me directly:

                cindy.swearingen@oracle.com

                I want to add this info to the S11.1 bug, which I think is 15819341.

                Did you ever test this config on Solaris 11? You might consider going back to S11 to test this config or consider switching HBAs.


                Thanks, Cindy
                • 5. Re: Issues with HBA and ZFS
                  diamaunt
                  I've also had a couple of problems with my system (11.1) and LSI SAS cards (3080) with FW 01.33.00.00 and BIOS 6.36.00.00 in Initiator-Target mode.
                  • 6. Re: Issues with HBA and ZFS
                    994707
                    Info sent. How can I get Solaris 11? The download page seems to offer only 11.1, though I can get 10 easily enough.
                    • 7. Re: Issues with HBA and ZFS
                      Dave Miner
                      If you have a support contract, previous releases can be downloaded via support.oracle.com; see document ID 1277964.1.
                      • 8. Re: Issues with HBA and ZFS
                        994707
                        Unfortunately I don't have a support contract. I've installed 10_U11 and will let you know how it goes.
                        • 9. Re: Issues with HBA and ZFS
                          994707
                          I was able to install Solaris 10 U11. After recreating all the pools I ran a variety of "stress tests" and was able to upload large amounts of data with no errors found during scrubs. This was definitely my problem. Thanks, Cindy.
                          • 10. Re: Issues with HBA and ZFS
                            Cindys-Oracle
                            Good to hear this is working out for you.

                            I may be paranoid, but I have seen enough sad stories with problematic hardware or bad configs that you need to keep monitoring this config routinely to know whether it is a keeper. Use zpool status, iostat -En, and the fm* commands routinely, as described here:

                            http://docs.oracle.com/cd/E26505_01/html/E37384/gbbbb.html#scrolltoc
                            Resolving General Hardware Problems
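
                            For routine monitoring, the useful signal from iostat -En is whether the Soft/Hard/Transport error counters grow between checks. A minimal sketch, assuming each device stanza begins with a line of the form "c0t...d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0" (verify this against your own system's output); the names `error_counters` and `grown` are hypothetical:

```python
import re

# Assumed first line of each device stanza in Solaris `iostat -En` output:
#   c0t5000C500420CDED7d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
STANZA = re.compile(
    r"^(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)\s+Transport Errors:\s*(\d+)"
)


def error_counters(iostat_text: str) -> dict:
    """Map device name -> (soft, hard, transport) error counts."""
    result = {}
    for line in iostat_text.splitlines():
        m = STANZA.match(line.strip())
        if m:
            dev, soft, hard, transport = m.groups()
            result[dev] = (int(soft), int(hard), int(transport))
    return result


def grown(before: dict, after: dict) -> list:
    """Devices where any counter increased between two snapshots."""
    return sorted(
        dev for dev, now in after.items()
        if any(n > b for n, b in zip(now, before.get(dev, (0, 0, 0))))
    )
```

                            Taking a snapshot daily (or after each heavy copy) and comparing it with the previous one flags devices whose transport errors are climbing before ZFS reports anything.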

                            ZFS best practices are here:

                            http://docs.oracle.com/cd/E26505_01/html/E37384/practice-1.html#scrolltoc

                            Thanks, Cindy
                            • 11. Re: Issues with HBA and ZFS
                              994707
                              Thanks. Unfortunately I'm not out of the woods yet. When I started using Sol 10 there were no ZFS errors, but later I found large numbers of transport errors with iostat. Eventually, by generating enough I/O volume with a VM, I started seeing hardware errors and ZFS errors as well.