8 Replies Latest reply: Oct 14, 2013 4:32 PM by AnhQ

    Solaris 11 ZFS NAS Hardware Advice Needed

    9eab956d-95d7-480c-89aa-cb2189aeb92e

      Hi All,

       

      I am seeking some advice on a new server build for my home lab. The server will serve my home network over CIFS and will also act as NAS storage for an ESXi NFS datastore for virtual machines. I will use this server for study and have no problem spending the money (within reason) as long as the server performs quickly. I would like to be able to run 20 or more VMs, which will see little activity but should respond quickly when installing software, cloning, taking snapshots, backing up, etc. Hardware-wise I am looking at:


      • Supermicro X9DRH-ITF Motherboard (primarily for the onboard 10GbE)
      • Intel Xeon E5-2620
      • 32GB RAM

       

      The server will have space for about 18 drives, so I am also looking for recommendations for appropriate RAID cards.

      Can someone please offer some advice on SSD configuration for cache, logs, etc. to work with 9 x 4TB 7200 RPM drives, and any indication as to what sort of performance I can expect. I understand mirroring is fastest, but that simply becomes too cost-prohibitive. I have yet to decide whether dedup is something I should consider; given how cheap disks are, I don't know if it is worthwhile.

      Thank you,


      Adam

        • 1. Re: Solaris 11 ZFS NAS Hardware Advice Needed
          Cindys-Oracle

          Hi Adam,

           

          I have a few comments:

           

          1. The quality of your hardware should always match or exceed the importance of your data.

           

          2. I would recommend a JBOD-mode array and letting ZFS do the redundancy. You say disks are cheap but mirroring is too expensive. I disagree: if you had to replace the data, what would that cost in terms of time and money? (A minimal command sketch follows this list.)

           

          3. Mirrored ZFS configs perform best for small read/write workloads. RAIDZ configs perform best for large I/O workloads like streaming video.

           

          4. SSDs as log devices can help improve performance of a synchronous write workload like NFS. SSDs as cache devices can help improve performance of a mostly read workload.

           

          5. Always have good, recent backups. Regardless of data-center quality or consumer quality gear, stuff happens and everyone needs a backup solution.
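
          For example, here is a minimal sketch of what "JBOD plus ZFS redundancy" can look like. The device names are placeholders, not a recommendation for a specific layout:

              # Hypothetical device names; a small pool of mirrored pairs, with ZFS (not a RAID card) providing redundancy
              zpool create tank \
                  mirror c1t0d0 c1t1d0 \
                  mirror c1t2d0 c1t3d0
              zpool status tank    # ZFS now sees each disk directly and can detect and repair problems itself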

           

          More ZFS best practices are here:

           

          Recommended Oracle Solaris ZFS Practices - Oracle Solaris 11.1 Administration: ZFS File Systems

           

          Thanks, Cindy

          • 2. Re: Solaris 11 ZFS NAS Hardware Advice Needed
            9eab956d-95d7-480c-89aa-cb2189aeb92e

            Thanks Cindy,

             

            1. I don't disagree that the quality of the hardware should be proportionate to the importance of the data. I am considering RAIDZ, as the NAS will be replicated to a second NAS as well as to an off-site backup using cloud storage. With that in mind, I feel that mirroring is an expensive proposition.

             

            2. Yes, JBOD will be the approach that I take. Any recommendations on RAID controllers that support JBOD well under ZFS, onboard cache that may be advantageous, etc.?

             

            3. I will consider mirroring, but I would rather invest in more SSDs or RAM, given that the data I want accessed quickly is only a small subset of my actual data (1TB at most of a 24TB array).

             

            4. I am happy to use SSDs, say in a mirrored config, for reads and/or writes. Happy for any feedback here.

             

            5. I will certainly be backing up.

             

            Thanks,

             

            Adam

            • 3. Re: Solaris 11 ZFS NAS Hardware Advice Needed
              Cindys-Oracle

              If performance is part of your criteria, then I don't think RAIDZ is a good choice (based on your workload description). Thanks, Cindy

              • 4. Re: Solaris 11 ZFS NAS Hardware Advice Needed
                AnhQ

                Adam,

                 

                For that much storage, I would bump up your RAM if possible (doubling your 32GB shouldn't be too much of a $ hit). Basically the more RAM, the more ARC (read cache) available. Also, strongly consider using SAS drives if possible.
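
                If you want to see how much of that RAM the ARC actually ends up using once the box is running, the arcstats kstats are one rough way to check (a sketch; exact statistic names can vary between releases):

                    # Rough sketch: current ARC size and the maximum it is allowed to grow to, in bytes
                    kstat -p zfs:0:arcstats:size
                    kstat -p zfs:0:arcstats:c_max
                    # Hit/miss counters give a feel for how well reads are being cached
                    kstat -p zfs:0:arcstats:hits
                    kstat -p zfs:0:arcstats:misses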

                 

                I would go with the LSI 9207-8e (if external) or LSI 9211-8i (if internal) SAS controllers. Both are fantastic with ZFS. As Cindy mentioned, JBOD is preferred. If using a server chassis with built-in drive bays, you'll want to figure out if the bays are attached to a backplane (probably). You'll connect the backplane(s) to the LSI controller(s).

                 

                In addition to Cindy's link, this is a really good resource for learning more about ZFS and gaining some best practices: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

                 

                In particular, pay close attention to the sections on using L2ARC and dedicated ZIL devices, as well as recommended vdev configurations for optimal performance under your particular workload. Cindy is right in that mirrors will get you better performance for random IO (typical ESXi workload), while raidz may get you better sequential throughput for large files (such as streaming video). If you do mirrors, go with 3-way; if you do raidz, go with raidz2. Both will survive the loss of up to two drives. See this fantastic blog post arguing for mirrors: http://constantin.glez.de/blog/2010/01/home-server-raid-greed-and-why-mirroring-still-best
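
                To make the two layouts concrete, here is a rough sketch of what each could look like with nine 4TB disks (placeholder device names; capacities are raw, before filesystem overhead):

                    # Option A: three 3-way mirrors -- each vdev survives the loss of two disks; roughly 12TB usable
                    zpool create tank \
                        mirror c1t0d0 c1t1d0 c1t2d0 \
                        mirror c1t3d0 c1t4d0 c1t5d0 \
                        mirror c1t6d0 c1t7d0 c1t8d0

                    # Option B: one 9-disk raidz2 -- any two disks in the pool can fail; roughly 28TB usable
                    zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c1t8d0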

                 

                Dedupe: don't do it. Too much of a resource hog and I find that for a home server with typical home stuff, you're not gonna gain much.
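
                If you do want to sanity-check that before deciding, zdb can simulate the dedup table against an existing pool and estimate the ratio (a sketch; 'tank' is a placeholder, run it as root, and note the simulation itself can chew through RAM and time on a big pool):

                    # Simulate deduplication on an existing pool and print an estimated dedup ratio
                    zdb -S tank
                    # Rough rule of thumb: unless the estimated ratio is well above 2x, it is not worth the RAM.
                    # If it ever were, dedup is enabled per dataset, e.g.: zfs set dedup=on tank/somedataset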

                 

                SSDs for L2ARC and ZIL: What you might want to do is create your zpool without a dedicated L2ARC and ZIL to start, measure performance against your target, then go from there. One of the great things about ZFS is that you can add new devices on-the-fly, including your cache devices. If you decide you need read and write caches, start with a 240GB MLC SSD (there are a bunch to choose from in the PDF below). For the ZIL, go with an SLC SSD if possible. They perform better and are more reliable over time, but a lot more spendy. However, since ZFS flushes writes to disk every 5 seconds, your ZIL device (for caching writes) needn't be larger than what your server can sustain in throughput over that 5-second period. Usually an 8GB device is just fine.
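
                A rough sketch of adding those devices to an existing pool later on (placeholder device names; the sizing numbers are back-of-envelope only):

                    # Read cache (L2ARC) can be a single device; losing it never loses data
                    zpool add tank cache c2t0d0
                    # The log device holds in-flight synchronous writes, so mirror it
                    zpool add tank log mirror c2t1d0 c2t2d0
                    # ZIL sizing: ~5 seconds of peak ingest. Even a saturated 10GbE link is
                    # about 1.25 GB/s x 5 s = ~6.25 GB, which is why ~8GB is usually plenty.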

                 

                Lastly, consider taking a look at Nexenta. It is based on the last version of OpenSolaris, has support for numerous commodity components (see their HCL here: http://info.nexenta.com/rs/nexenta/images/nexenta_hardware_supported_list.pdf), and there is a community edition that is free for up to 18TB usable capacity.

                • 5. Re: Solaris 11 ZFS NAS Hardware Advice Needed
                  Svetoslav Gyurov

                  Hi Adam,

                   

                  I'm also in the process of building a home lab server, and I'm still deciding between enterprise-class hardware (either HP ML or HP xw) and a consumer-class PC. Although I'm also looking for power and fast VM response times, I can't neglect the noise level and power consumption, so I think I'll go for the PC; cost is another reason. Ideally I'll be looking for a motherboard that supports Core i7 and Intel Smart Response Technology for SSD caching, or another option would be an LSI MegaRAID controller with support for SSD read/write caching as well.

                   

                  Keep in mind that with ESXi 5.5 the 32GB memory limitation has been lifted, so having support for more than that is always a plus.

                   

                  I still haven't decided on the storage layout. I'll definitely be using a hardware RAID controller, but I still have concerns about whether to let VMware manage all the space or the other way around. A similar setup I saw recently was to install ESXi on a small hard drive (it could be a USB flash drive) along with a single VM running FreeNAS, and FreeNAS was then responsible for managing 2 x 2TB drives. Finally, ZFS volumes are built and shared back to ESX using iSCSI. That way you can have your home NAS and your home ESX host on one physical machine. The only disadvantage is that your VMs won't start automatically after a reboot; you have to rescan and re-initiate the iSCSI connection and then power on the VMs. As for why use a RAID controller when FreeNAS will be running: I want FreeNAS just to share the resources across the network and don't want to put any more load on the machine, as it will serve as the ESX server as well.

                   

                  Although FreeNAS is a good option, I'll consider NexentaStor as well. So far I see that NexentaStor supports deduplication and the NDMP protocol, which gives it an edge over FreeNAS.

                   

                  I guess once I get the hardware I'll spend a week or two building different setups and see which one works best for me.

                   

                  Good luck and share your experience once you build your configuration.

                   

                  Regards,

                  Sve

                  • 6. Re: Solaris 11 ZFS NAS Hardware Advice Needed
                    9eab956d-95d7-480c-89aa-cb2189aeb92e

                    Hi AnhQ,


                    Thanks for the response; your advice has really helped me better understand my requirements and how I might tailor the build.

                    Having read more about mirroring, I now have a better understanding of the performance benefits to be had, and I believe the best solution for me is a mirrored vdev of SSDs for my VM datastore and a RAIDZ2 vdev for my general home network file shares, which is where the vast majority of the data lies. I found this blog post https://calomel.org/zfs_raid_speed_capacity.html to be very useful.


                    Mirrored SSD VDev

                    At this time I plan on mirroring two 500GB SSDs to use for VMs. I do not believe I should need more than 500GB, or later 1TB, worth of VMs (thin provisioned) available on fast storage. Once they are fairly static I don't mind moving them off to slower disk.


                    RAIDZ2 VDev

                    I will have a larger 24TB vdev for all my movies, music, files, etc., where users will read files and my ESXi servers will read software, ISOs, etc. Going by the link above, roughly 500MB/s reads, which is more than ample (ESXi will use a direct 10GbE link). My users will only be reading and writing to the vdev over 1Gb Ethernet, so I can settle for 200MB/s (worst case) and be happy with that. What I would like, however, is some read cache and perhaps write cache; again, I would not think I need a lot of cache... a mixture of RAM and a 256GB or 500GB SSD should be fine.
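
                    To sketch what I have in mind as commands (device names are placeholders and the layout is still subject to change):

                        # Fast pool: two 500GB SSDs mirrored, to be shared to ESXi over NFS
                        zpool create vmpool mirror c1t0d0 c1t1d0

                        # Bulk pool: nine 4TB drives in a single raidz2 for shares, media, ISOs
                        zpool create datapool raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0 c2t8d0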

                     

                    I will read the links you provided to better understand the L2ARC and ZIL factors.


                    I just have a couple of questions:

                         1.   I believe Solaris 11 is free for personal use; if my needs are simple, why not use that? (I believe I will still receive security updates.)

                     

                         2.   Do you have any comments or feedback on using the HighPoint RocketRAID 2760A 24-port RAID card vs. the LSI 9211-8i?

                     

                         3.   Any advice on the SSD read and write caches, given my better-understood requirements above?


                         4. Hardware-wise, I am considering saving some bucks and using a previous-generation Supermicro X8 motherboard with 1 or 2 E5520 CPUs. I would assume this would be more than ample. Do you see any drawbacks here if it were to result in a saving of $600-$700, which could be spent on cache?


                             a. One point I do not quite understand is how people achieve acceptable write performance when using mirroring, especially in multiple 1+1 sets. If the write speed is limited to a single drive's throughput, I assume that even if a pool comprises 10 of these mirrored sets, the write speed is still limited to the slowest drive, which is a single drive.


                     

                    Thank you again,


                    Adam

                    • 7. Re: Solaris 11 ZFS NAS Hardware Advice Needed
                      9eab956d-95d7-480c-89aa-cb2189aeb92e

                      Hey Sve,

                       

                      I responded to AnhQ above, which may also be helpful; there is a good link there on configuration vs. speed.


                      Personally I want a dedicated storage box which is as simple as possible. Essentially this is where all my data is, and therefore I want something simple. I find that if you layer too many technologies, then if/when things fall apart, recovery becomes that much harder, if not impossible. While I haven't had any problems with VMFS per se, in the event of a hardware failure I would like to be able to pull out my disks, slot them into a new server and get to my data ASAP. ZFS and mdadm offer that quite easily; I cannot speak for VMFS.


                      I have always run ESXi on a USB key or an IDE/SATA DOM, and that has worked well for me. In smaller cases that frees up an extra bay for storage.

                      I favour a full OS rather than FreeNAS. Personally I like the freedom it offers, and support for other features is far greater. I don't mind putting in the extra time to set things up.

                      One more option I had planned was to run some core VMs on my NAS using VMware Workstation or Player. You don't get the benefits of ESXi, but for a few core systems that may be fine. That way the lab can go up and down as it pleases without disrupting the "core" VMs.

                       

                      On the iSCSI front, I chose to go with NFS instead, the primary reason being that it's one less layer of abstraction. For me the only real loss is not being able to use VAAI. I chose to run 10GbE NICs, so I offset that extra overhead with bandwidth. I personally have never had an issue, but I have seen some people get burned when the iSCSI connection dies and they are then unable to get to their data. It would most likely never be a problem for me, but why take on the risk if I don't have to?
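
                      For anyone curious, sharing a dataset to ESXi over NFS from Solaris 11 is only a couple of commands. A sketch, assuming the 11.1-style share syntax (it changed between 11.0 and 11.1) and placeholder pool/dataset names:

                          # Create a dataset for the VM store and publish it over NFS
                          zfs create -o mountpoint=/export/vmstore vmpool/vmstore
                          zfs set share.nfs=on vmpool/vmstore
                          # Then add <server>:/export/vmstore as an NFS datastore in ESXi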

                       

                      Cheers,

                       

                      Adam

                      • 8. Re: Solaris 11 ZFS NAS Hardware Advice Needed
                        AnhQ

                        Hey Adam - responses to your additional questions below...

                         

                        1.   I believe Solaris 11 is free for personal use; if my needs are simple, why not use that? (I believe I will still receive security updates.)

                         

                        Two reasons to consider NexentaStor: 1) It's set up as an "appliance", so the OS is streamlined for NAS purposes and anything that is unnecessary is taken out, 2) Nexenta provides a web GUI to work with if you'd rather not manage everything via CLI. The downside is if you go over 18TB usable capacity, it's no longer free.

                         

                        2.   Do you have any comments or feedback on using the HighPoint RocketRAID 2760A 24-port RAID card vs. the LSI 9211-8i?

                         

                        I don't have any experience w/ the RAID card you mentioned, but keep in mind that ZFS prefers to handle RAID on its own, with direct access to your disks (i.e., JBOD), as opposed to going through a hardware RAID controller. If your disks are masked behind a hardware RAID controller, ZFS may not be able to report and fix issues (half the advantage of ZFS, I think), and I would assume you wouldn't be able to attach/detach drives on the fly without having to interface with the RAID controller first. In addition, you definitely want to use something that is on either the Oracle or Nexenta HCLs. LSI also makes the Oracle-branded HBAs, if that sways your decision at all. If you need more than 8 ports, they have a 16-port model (9201-16i), and of course you can get multiple HBAs to cover the # of ports you need.

                         

                        3.   Any advice on the SSD read and write caches, given my better-understood requirements above?


                        It sounds like you are intending to create a storage pool for your VM datastore and a separate one for your other miscellaneous data. If you don't think the VM footprint will be that large, your plan to go with a mirrored SSD pool seems good. As far as cache goes: since SSDs are still fairly expensive, I would start out without dedicated cache devices, test your workload, then decide whether or not you need additional read/write cache.
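
                        When you do that testing, watching the pool while the workload runs tells you a lot (a sketch; 'tank' is a placeholder pool name):

                            # Per-vdev bandwidth and IOPS, refreshed every 5 seconds, while the test workload runs
                            zpool iostat -v tank 5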

                         

                        4. Hardware-wise, I am considering saving some bucks and using a previous-generation Supermicro X8 motherboard with 1 or 2 E5520 CPUs. I would assume this would be more than ample. Do you see any drawbacks here if it were to result in a saving of $600-$700, which could be spent on cache?


                        That generation of Xeons should be fine, but also keep in mind ZFS likes clock speed over cores. If it were between a 6-core and a 4-core and the 4-core was clocked higher, I'd go with the 4-core. Other than power savings, I'm not sure there is that much advantage in going with the newer Xeons, but I could be wrong. Also remember that ZFS loves a lot of RAM, so get the most you can afford. I've got a dual 2.4 GHz 5530, 96 GB RAM machine in production (also Supermicro X8) with a 20-vdev config (18 x 6-disk raidz2, 2 x 500GB L2ARC), serving about 50-60 clients as a media server + VMware datastore, and the CPUs rarely ever break 15% load. I think the max I've ever seen is about 22% or so. IOPS and overall throughput are definitely acceptable, with most sequential writes over NFS able to hit about 90MB/s over Gigabit with sync off. There are about 20 VMs running from it (web + MySQL servers, LDAP, DNS/DHCP, OS X NetBoot, FTP, and some random developer VMs).


                        a. One point I do not quite understand is how people achieve acceptable write performance when using mirroring, especially in multiple 1+1 sets. If the write speed is limited to a single drive's throughput, I assume that even if a pool comprises 10 of these mirrored sets, the write speed is still limited to the slowest drive, which is a single drive.


                        There's performance in terms of random IO (typical VM workloads, general-purpose file servers, databases, render farms -- lots of small transactions), and then there is performance in terms of sequential throughput (streaming video, large file copies, backups, etc. -- a smaller # of transactions but larger files). When people say you'll get better overall performance under load (many clients) with lots of mirrored vdevs, they typically mean with random IO, where the filer needs to transact many small operations vs. fewer large (sequential) writes. Also, the pool's write speed is not capped at a single drive's throughput: ZFS stripes blocks dynamically across all of the top-level vdevs, so each block lands on one mirror but concurrent blocks are spread across all of them, and aggregate throughput scales with the number of vdevs.

                         

                        Understanding I/O: Random vs Sequential | flashdba
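
                        To put rough numbers on your question 4a (illustrative only, assuming roughly 150 MB/s of sequential throughput per 7200 RPM drive):

                            One 2-way mirror vdev: both disks write the same block in parallel, so roughly 150 MB/s sequential.
                            Ten mirror vdevs: ZFS stripes successive blocks across the vdevs, so up to roughly 10 x 150 MB/s = ~1.5 GB/s aggregate.
                            Random IOPS scale with the vdev count in the same way, which is where mirrors really pay off.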

                         

                        One way to improve your write performance with either type of workload (as long as the writes are synchronous, which NFS writes from ESXi are) is to add a dedicated, fast ZIL (log) device. With a dedicated SSD log device, synchronous writes are committed to the SSD first, so the client gets its acknowledgement quickly, and the data is then flushed from memory to the spinning disks with the regular transaction group (every 5 seconds by default).
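
                        One related knob worth knowing about (a sketch; the dataset name is a placeholder): the per-dataset sync property controls whether those synchronous semantics are honoured, which is exactly why an NFS datastore benefits so much from a fast log device.

                            # Check how synchronous writes are handled on the datastore dataset
                            zfs get sync tank/vmstore
                            # 'standard' honours the client's sync requests (ESXi over NFS issues a lot of them);
                            # a fast mirrored log device absorbs those without the data-loss risk of sync=disabled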