5 Replies Latest reply: May 10, 2012 9:47 PM by 934442

    zfs Performance issues and it seems that the ZIL is not being used at all.

    934442
      Hi Everyone,

      I have a 4TB ZFS pool (raidz) with 3x 2TB drives, plus an SSD dedicated to cache and log. The cache seems to be utilised fine, but the log is not. What could be causing this? I dedicated gigabytes to the log, which I now know is a mistake.

      How can I diagnose why the ZIL is not being used?

      If I try to copy a large file to the server, it generally flakes out after 100 to 200 MB.

      Cheers

      NAME PROPERTY VALUE SOURCE
      monster type filesystem -
      monster creation Tue Mar 8 10:04 2011 -
      monster used 7.51T -
      monster available 671G -
      monster referenced 34.6K -
      monster compressratio 1.00x -
      monster mounted yes -
      monster quota none default
      monster reservation none default
      monster recordsize 128K default
      monster mountpoint /monster default
      monster sharenfs off default
      monster checksum on default
      monster compression off local
      monster atime on default
      monster devices on default
      monster exec on default
      monster setuid on default
      monster readonly off default
      monster zoned off default
      monster snapdir hidden default
      monster aclmode discard default
      monster aclinherit restricted default
      monster canmount on default
      monster xattr on default
      monster copies 1 default
      monster version 5 -
      monster utf8only off -
      monster normalization none -
      monster casesensitivity sensitive -
      monster vscan off default
      monster nbmand off default
      monster sharesmb off local
      monster refquota none default
      monster refreservation none default
      monster primarycache all default
      monster secondarycache all default
      monster usedbysnapshots 0 -
      monster usedbydataset 34.6K -
      monster usedbychildren 7.51T -
      monster usedbyrefreservation 0 -
      monster logbias throughput local
      monster dedup on local
      monster mlslabel none -
      monster sync standard local
      monster encryption off -
      monster keysource none default
      monster keystatus none -
      monster rekeydate - default
      monster rstchown on default
      monster shadow none -


      james@solaris:~# zpool status monster
        pool: monster
       state: ONLINE
        scan: scrub canceled on Wed Feb 29 17:50:25 2012
      config:

              NAME          STATE     READ WRITE CKSUM
              monster       ONLINE       0     0     0
                raidz1-0    ONLINE       0     0     0
                  c8t0d0    ONLINE       0     0     0
                  c8t2d0    ONLINE       0     0     0
                  c8t3d0    ONLINE       0     0     0
              logs
                c8t1d0p2    ONLINE       0     0     0
              cache
                c8t1d0p3    ONLINE       0     0     0

      errors: No known data errors

      output of ./zilstat.ksh
      N-Bytes N-Bytes/s N-Max-Rate B-Bytes B-Bytes/s B-Max-Rate ops <=4kB 4-32kB >=32kB
      0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0
        • 1. Re: zfs Performance issues and it seems that the ZIL is not being used at all.
          Cindys-Oracle
          Hi--

          I have several comments:

          1. Your logbias property is set to throughput:

          monster logbias throughput local

           Try setting it to latency. From the zfs(1M) man page:

          If logbias is set to throughput, ZFS does not use the
          pool's separate log devices.

          2. How large are your synchronous writes? I think larger synchronous writes go to the main
          pool, even if you have separate log devices, but try logbias=latency.

           3. Using a device's fdisk partitions (the p* devices) is not recommended:

          logs
          c8t1d0p2 ONLINE 0 0 0
          cache
          c8t1d0p3 ONLINE 0 0 0

           I would remove these devices, create a large slice (s0) for the cache and another slice (s1) for the log, sized according to your zilstat results, and then re-add them (a command sketch follows below).
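
           For reference, the whole sequence might look roughly like this. This is only a sketch: it assumes the SSD is c8t1d0 and that you create s0 for the cache and s1 for the log with format, so adjust the device and slice names to your own layout:

           # zfs set logbias=latency monster
           # zpool remove monster c8t1d0p2
           # zpool remove monster c8t1d0p3
           # (repartition c8t1d0 with format, creating slices s0 and s1)
           # zpool add monster log c8t1d0s1
           # zpool add monster cache c8t1d0s0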

          Thanks,

          Cindy
          • 2. Re: zfs Performance issues and it seems that the ZIL is not being used at all.
            934442
            Hi Cindy,

             Thanks for the help. I have changed to latency and repartitioned the drive with slices: all the free hog space went to s0, and s3 is an 8GB slice.

             I am not sure how big my synchronous writes are. I am using my Solaris box as a file server. I am the only user of the machine, and the files are usually around 3MB-20MB, with lots of them. Right now I am copying 70k files totalling 300GB from many directories to a single one, and it is running at 3.22MB/s over gigabit Ethernet. zilstat is still giving all 0's.

            More info below but thanks heaps for the help. Any other suggestions would be greatly appreciated.

            Cheers

            James


            Should I be worried about this:

            james@solaris:~# zpool add monster cache c8t1d0s0
            invalid vdev specification
            use '-f' to override the following errors:
            /dev/dsk/c8t1d0s0 overlaps with /dev/dsk/c8t1d0s2



            Total disk cylinders available: 6685 + 2 (reserved cylinders)

            Part Tag Flag Cylinders Size Blocks
            0 root wm 1 - 5639 43.20GB (5639/0/0) 90590535
            1 swap wu 0 0 (0/0/0) 0
            2 backup wu 0 - 6684 51.21GB (6685/0/0) 107394525
            3 unassigned wm 5640 - 6684 8.01GB (1045/0/0) 16787925
            4 unassigned wm 0 0 (0/0/0) 0
            5 unassigned wm 0 0 (0/0/0) 0
            6 usr wm 0 0 (0/0/0) 0
            7 unassigned wm 0 0 (0/0/0) 0
            8 boot wu 0 - 0 7.84MB (1/0/0) 16065
            9 alternates wm 0 0 (0/0/0) 0
            • 3. Re: zfs Performance issues and it seems that the ZIL is not being used at all.
              Cindys-Oracle
               Is the OS the Solaris 11 FCS release?

              I think you are running into an existing bug with the overlap warning and you
              should be able to force the add, like this:

              # zpool add -f monster cache c8t1d0s0

               Let me check with someone else about the seemingly unused ZIL with your workload.
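
               In the meantime, one more thing worth watching is per-vdev I/O while a copy is running, to see whether the log device receives any writes at all. A quick sketch (the 5-second interval is arbitrary):

               # zfs get sync,logbias monster
               # zpool iostat -v monster 5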

              Thanks,

              Cindy
              • 4. Re: zfs Performance issues and it seems that the ZIL is not being used at all.
                Cindys-Oracle
                The only other comment is to make sure that you are testing your workload's synchronous write performance
                with the ZIL/logbias=latency configuration.
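
                 For example, you could force synchronous writes on a throwaway child dataset and watch zilstat while writing to it. This is only a sketch, and the dataset name is just an example:

                 # zfs create monster/synctest
                 # zfs set sync=always monster/synctest
                 # dd if=/dev/zero of=/monster/synctest/testfile bs=8k count=10000
                 # zfs destroy monster/synctest

                 Run ./zilstat.ksh in another terminal while the dd is going; non-zero counters there mean the separate log device is being exercised.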

                Thanks,

                Cindy
                • 5. Re: zfs Performance issues and it seems that the ZIL is not being used at all.
                  934442
                  Hey Guys,

                   Thanks so much for the suggestions, but it seems that throughput is still really bad. 3MB/s is kind of crazy. Here is the version info:
                  james@solaris:~$ more /etc/release
                  Oracle Solaris 11 11/11 X86
                  Copyright (c) 1983, 2011, Oracle and/or its affiliates. All rights reserved.
                  Assembled 18 October 2011

                   Interestingly, it seems to be having the most problems with writes. Maybe it is a problem with something other than the ZIL. It is all happening over SMB.
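
                   I guess one way to rule out the SMB/network path is a quick local write test on the server itself. A rough sketch only (the file names are arbitrary, and since dedup is on, a stream of zeros will largely dedup away, so copying a real multi-GB file locally is probably a better gauge):

                   james@solaris:~# time dd if=/dev/zero of=/monster/ddtest bs=1024k count=1024
                   james@solaris:~# time cp /monster/<some existing large file> /monster/cptest
                   james@solaris:~# rm /monster/ddtest /monster/cptest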

                  Cheers