13 Replies Latest reply on Jul 6, 2012 3:05 AM by 933584

    Very low arcsize problem.

      For weeks now I've been scrambling to find the cause of a slowness issue on our newly built Solaris 11 NAS. The server has two SSDs in a mirrored rpool for Solaris 11 and four SATA drives in a mirrored data pool that serves many NFS mounts. It has 16 GB of memory and a recently added Samsung SSD log device for the data pool.
      The server uses IP over InfiniBand as a storage network between itself and an ESXi node hosting approximately 15 NFS-attached VMs.
      We get great performance for approximately three days, then, for no apparent reason, IO crawls: throughput on the VMs drops from the upper 300 MB/s range to 10-15 MB/s. I see the same throughput drop on the Solaris side, which rules out the IB network and ESXi. When this state occurs, free system memory goes from about 2.5 GB (in the good state) to ~13 GB. The only remedy seems to be rebooting Solaris.
      After some research, I found some users (not on this forum) who had seen the same issue. They had traced it to the ARC dropping to a very low size for no good reason. I checked while my server was slow, and sure enough, my ARC size was under 100 MB with over 13 GB free. They had also found that the arc_no_grow flag was being set to 1 and never reset to 0, so the ARC could not take back the memory it had released in a perceived memory-pressure situation. Neither they nor I have anything running on these servers that would create memory pressure.
      I have yet to find a fix or even a workaround. I tried setting "zfs_arc_min" to 4 GB. The change took effect, but just like before, the ARC dropped after a few days; the only difference is that this time it seems to stay around ~200 MB and only very rarely dips into the 90 MB range. At the time of writing, this has kept us up for over seven days. Since this change, IO no longer bottoms out like before, and we still get about 50% of normal IO.
      I don't have dedup or compression enabled anywhere. Setting arc_min to 4 GB at least lets us use the NAS, but it's a shame that 13 GB sits free at all times when it could be used for cache, and we don't get the great performance we know is possible for the first few days after a reboot.

      If anyone has any information on this or has experienced it before, please share.
        • 1. Re: Very low arcsize problem.
          I suspect something is using a lot of memory after a few days, putting ZFS into the state you're seeing. Maybe your backups create a lot of memory pressure at times by reading lots of data into memory?

          You can set up sar (see the sar(1) man page) to see what's going on. Set it to take samples every 10 seconds or so; with a longer interval you might miss something. Once you find out when your memory issues start (assuming that's what's happening), you can start looking for what's causing the problem.
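          sar shows free memory, but as far as I know it does not record the ARC size itself, so it can help to log the ARC alongside it at the same 10-second interval. A minimal sketch, assuming the `zfs:0:arcstats:size` and `zfs:0:arcstats:c` kstat names and the tab-separated `kstat -p` output format; the helper names are hypothetical, and it should be checked against a real system before relying on it:

```python
import subprocess
import time

# Assumed kstat names for the ARC's current size and adaptive target (c).
STATS = ["zfs:0:arcstats:size", "zfs:0:arcstats:c"]
MIB = 1 << 20


def parse_kstat_p(text):
    """Parse `kstat -p` lines of the form "module:instance:name:statistic<TAB>value"."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only "key value" pairs whose value is an integer counter.
        if len(parts) == 2 and parts[1].isdigit():
            values[parts[0]] = int(parts[1])
    return values


def format_sample(stamp, values):
    """Render one sample as a log line with sizes in MiB (truncated)."""
    size = values["zfs:0:arcstats:size"] // MIB
    target = values["zfs:0:arcstats:c"] // MIB
    return f"{stamp} arc_size={size}M c={target}M"


def sample_forever(interval=10):
    """Poll kstat every `interval` seconds until interrupted (Solaris only)."""
    while True:
        out = subprocess.run(["kstat", "-p", *STATS],
                             capture_output=True, text=True, check=True).stdout
        print(format_sample(time.strftime("%H:%M:%S"), parse_kstat_p(out)), flush=True)
        time.sleep(interval)
```

          Call `sample_forever()` on the server and redirect its output to a file; the timestamps can then be lined up against the sar data to see whether the ARC drop coincides with any memory activity.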

          You can also try setting "zfs_arc_max" to a value somewhat smaller than the maximum size you see when things are running well. For example, if your ARC is at 14 GB when you're running well, try setting it to 12 GB and see whether the problem goes away. That's more of a guess-and-hope approach, though, because you'll never know whether the problem will return unless you figure out what it is.
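          Tunables in /etc/system take a byte count; compare the `set zfs:zfs_arc_min = 4294967296` line later in this thread, which is 4 GiB. A small hypothetical helper for turning a size in GiB into such a line, assuming binary gigabytes (2^30 bytes). Changes to /etc/system only take effect after a reboot.

```python
GIB = 1 << 30  # binary gigabyte, 1073741824 bytes (assumption: GiB, not 10^9)


def etc_system_line(tunable, gib):
    """Return a hypothetical /etc/system line setting a zfs tunable to `gib` GiB in bytes."""
    return f"set zfs:{tunable} = {int(gib * GIB)}"


print(etc_system_line("zfs_arc_max", 12))  # set zfs:zfs_arc_max = 12884901888
print(etc_system_line("zfs_arc_min", 4))   # set zfs:zfs_arc_min = 4294967296
```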
          • 2. Re: Very low arcsize problem.
            Thank you for your response and suggestions. Server memory is never affected by any process consuming or needing memory. I have yet to set up any backups or ZFS replication. I was using a Perl script called "arcstat.pl" to watch the ARC the last time this happened. The ARC size went from 10G down to 574M in under 20 minutes. During that time I was watching the memory consumption of all processes with prstat and top. I am going to set up sar as you suggested to help watch things more closely; thank you for that suggestion.

            time      read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
            13:30:40   232    18      7     8    3    10   50     0    1    10G   11G
            13:40:40  2.4K   717     30     9    0   707   90     7    0   8.5G  8.5G
            13:50:40  4.4K  2.2K     48   169    8  2.0K   81    21    1   574M  4.0G
            14:00:40  3.7K  2.2K     58   263   19  1.9K   80    27    4   234M  4.0G
            14:10:40  3.9K  2.3K     59   255   19  2.0K   79    25    4   160M  4.0G
            14:20:40  4.7K  2.8K     60   307   20  2.5K   79    29    3   211M  4.0G
            14:30:40  3.7K  2.1K     55   188   13  1.9K   81    24    2   131M  4.0G
            14:40:40  2.5K  1.5K     61   224   25  1.3K   80    32    7   127M  4.0G
            14:50:40  2.8K  1.7K     59   238   23  1.4K   81    41    8   168M  4.0G
            15:00:40  7.1K  4.2K     58   359   15  3.8K   80    50    3   185M  4.0G
            15:10:40  6.3K  3.9K     61   409   21  3.5K   78    42    4   238M  4.0G
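            To spot the collapse in a longer arcstat.pl log, the rows can be filtered for intervals where `arcsz` is far below the target `c`. A minimal sketch; `parse_size` and `collapsed_rows` are hypothetical helpers, the K/M/G/T suffixes are assumed to be powers of 1024, and the 50% threshold is an arbitrary choice:

```python
# Assumed binary multipliers for arcstat.pl's size suffixes.
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text):
    """Convert an arcstat-style value such as '2.4K', '574M' or '10G' to bytes."""
    if text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)


def collapsed_rows(table, ratio=0.5):
    """Return (time, arcsz, c) for rows where arcsz is below `ratio` * c."""
    lines = table.strip().splitlines()
    header = lines[0].split()
    # Locate columns by header name so extra arcstat fields do not matter.
    t_col, size_col, c_col = (header.index(n) for n in ("time", "arcsz", "c"))
    hits = []
    for line in lines[1:]:
        fields = line.split()
        if parse_size(fields[size_col]) < ratio * parse_size(fields[c_col]):
            hits.append((fields[t_col], fields[size_col], fields[c_col]))
    return hits
```

            Run against the table above, it flags every row from 13:50:40 onward, which is where the ARC falls from 8.5G to 574M while `c` stays at 4.0G.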

            This was on 3/2/12 and the server is still up. arcsz is still floating between 90M and, at most, ~700M according to the logs. Before I set zfs_arc_min, this looked the same except for two things: arcsz floated around 40-50 MB, with "c" around the same, and IO was terrible.

            Here is some detailed arc and memory info if this helps:

            3:46pm up 7 day(s), 16:26, 1 user, load average: 0.06, 0.05, 0.05
            System Memory:
            Physical RAM: 16342 MB
            Free Memory : 12863 MB
            LotsFree: 255 MB

            ZFS Tunables (/etc/system):
            set zfs:zfs_arc_min = 4294967296

            ARC Size:
            Current Size: 154 MB (arcsize)
            Target Size (Adaptive): 4096 MB (c)
            Min Size (Hard Limit): 4096 MB (zfs_arc_min)
            Max Size (Hard Limit): 15318 MB (zfs_arc_max)

            ARC Size Breakdown:
            Most Recently Used Cache Size: 6% 256 MB (p)
            Most Frequently Used Cache Size: 93% 3839 MB (c-p)

            ARC Efficency:
            Cache Access Total: 335961590
            Cache Hit Ratio: 52% 176807570 [Defined State for buffer]
            Cache Miss Ratio: 47% 159154020 [Undefined State for Buffer]
            REAL Hit Ratio: 77% 261160422 [MRU/MFU Hits Only]

            Data Demand Efficiency: 65%
            Data Prefetch Efficiency: 13%

            Anon: --% Counter Rolled.
            Most Recently Used: 46% 82783131 (mru) [ Return Customer ]
            Most Frequently Used: 100% 178377291 (mfu) [ Frequent Customer ]
            Most Recently Used Ghost: 0% 0 (mru_ghost) [ Return Customer Evicted, Now Back ]
            Most Frequently Used Ghost: 0% 0 (mfu_ghost) [ Frequent Customer Evicted, Now Back ]
            Demand Data: 58% 103054838
            Prefetch Data: 8% 14640256
            Demand Metadata: 33% 58571606
            Prefetch Metadata: 0% 540870
            Demand Data: 33% 53709385
            Prefetch Data: 58% 92853092
            Demand Metadata: 5% 9285319
            Prefetch Metadata: 2% 3306224
            Page Summary                Pages                MB  %Tot
            ------------     ----------------  ----------------  ----
            Kernel                     839647              3279   20%
            ZFS File Data               16542                64    0%
            Anon                        28776               112    1%
            Exec and libs                1514                 5    0%
            Page cache                   7189                28    0%
            Free (cachelist)             9940                38    0%
            Free (freelist)           3280164             12813   78%

            Total                     4183772             16342
            Physical                  4183771             16342
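            The `::memstat` figures are page counts converted to megabytes. A quick sketch of that conversion, assuming 4 KiB base pages (x86; SPARC uses 8 KiB), reproduces the MB and %Tot columns above, for example 3280164 free pages, which is 12813 MB and 78% of the 4183772 total pages. The helper names are hypothetical:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages (x86); use 8192 on SPARC


def pages_to_mb(pages, page_size=PAGE_SIZE):
    """Convert a ::memstat page count to whole MiB, truncating the remainder."""
    return pages * page_size // (1 << 20)


def percent_of_total(pages, total_pages):
    """Share of all pages as a whole percentage, rounded to the nearest integer."""
    return round(pages * 100 / total_pages)


def summarize(rows, total_pages):
    """Return {label: (MB, percent)} for a dict of label -> page count."""
    return {label: (pages_to_mb(p), percent_of_total(p, total_pages))
            for label, p in rows.items()}


print(summarize({"Kernel": 839647, "Free (freelist)": 3280164}, 4183772))
```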

            39M 22M sleep 0:14 0.00% fmd
            20M 12M sleep 0:09 0.00% svc.startd
            20M 19M sleep 0:56 0.00% svc.configd
            18M 5192K sleep 0:09 0.00% sshd
            18M 5188K sleep 0:21 0.00% sshd
            18M 4704K sleep 0:03 0.00% sshd
            18M 4660K sleep 0:01 0.00% sshd
            15M 12M sleep 0:02 0.00% arcstat.pl
            15M 3512K sleep 0:00 0.00% rad
            14M 4680K sleep 0:00 0.00% kcfd
            14M 3004K sleep 0:00 0.00% sshd
            13M 4748K sleep 0:20 0.00% nscd
            13M 4680K sleep 0:00 0.00% syseventd
            13M 6776K sleep 0:11 0.00% devfsadm
            12M 1928K sleep 0:19 0.00% nfsmapid
            11M 3300K sleep 0:05 0.00% inetd
            11M 2628K sleep 0:03 0.00% mountd
            11M 1276K sleep 0:00 0.00% net-physical
            11M 2060K sleep 0:00 0.00% syslogd
            11M 2772K sleep 0:00 0.00% picld
            10M 6556K sleep 0:00 0.00% bash
            10M 2208K sleep 0:00 0.00% bash
            10M 1968K sleep 0:00 0.00% asr-notify
            10M 2100K sleep 0:00 0.00% smtp-notify
            • 3. Re: Very low arcsize problem.
              If you're running that Perl script as root, it'd be really interesting to see what the output of 'echo ::memstat | mdb -k' is while your ARC is shrinking away.
              • 4. Re: Very low arcsize problem.

                You could be hitting Bug 7111576, "arc shrinks in the absence of memory pressure". To verify, please check the following:

                echo "::memstat" | mdb -k
                echo "::vmem ! grep zfs" | mdb -k
                echo "::arc" | mdb -k

                To confirm the bug is a match for you, the following must be true:

                - The "memstat" should show plenty of free memory.
                - The "vmem" output should show that the INUSE value is at or very close to the TOTAL value, e.g.:

                ::vmem ! grep zfs
                ffffff0900cc0000 zfs_file_data 32377929728 34351349760 30878 0
                ffffff0900cc8000 zfs_file_data_buf 62935040 62935040 961845 0

                - The "::arc" should show 'arc_no_grow = 1' constantly, so it's worth running this command several times a day just to be certain. It will also show "size" being a lot less than c_max despite the system having lots of free memory.

                - Finally, you can use the following example to do the maths on your system and verify the issue:

                $ egrep "vmem:32:zfs_file_data:mem_inuse|vmem:32:zfs_file_data:mem_total" kstat-p.out
                vmem:32:zfs_file_data:mem_inuse 7017070592
                vmem:32:zfs_file_data:mem_total 17171480576

                vmem_size(zio_arena, VMEM_FREE)
                = vmem_size(zfs_file_data, VMEM_FREE)
                = 17171480576 - 7017070592
                = 10154409984

                vmem_size(zio_arena, VMEM_ALLOC) >> 4
                = vmem_size(zfs_file_data, VMEM_ALLOC) >> 4
                = 7017070592 >> 4
                = 438566912

                ### 438566912 < 10154409984 ? TRUE

                In this example the result is TRUE, thus hitting the bug.
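                The same arithmetic can be scripted against saved `kstat -p` output such as the `kstat-p.out` file above. This is a sketch under assumptions: it treats `mem_inuse` as VMEM_ALLOC and `mem_total - mem_inuse` as VMEM_FREE exactly as the worked example does, it accepts any vmem instance number rather than only 32, and the function names are hypothetical.

```python
def parse_kstat_p(text):
    """Parse `kstat -p` lines ("module:instance:name:statistic value") into {key: int}."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            values[parts[0]] = int(parts[1])
    return values


def find_vmem_stat(stats, arena, statistic):
    """Return vmem:<any instance>:<arena>:<statistic> from parsed kstat data."""
    for key, value in stats.items():
        fields = key.split(":", 3)
        if len(fields) == 4 and fields[0] == "vmem" and fields[2] == arena and fields[3] == statistic:
            return value
    raise KeyError(f"vmem:*:{arena}:{statistic} not found")


def vmem_headroom_check(stats, arena="zfs_file_data"):
    """Return (ALLOC >> 4 < FREE, ALLOC >> 4, FREE) as in the worked example."""
    alloc = find_vmem_stat(stats, arena, "mem_inuse")
    free = find_vmem_stat(stats, arena, "mem_total") - alloc
    return (alloc >> 4) < free, alloc >> 4, free
```

                With the two example lines above it returns `(True, 438566912, 10154409984)`, matching the worked result.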

                If you find that you are hitting the issue, then please log a service request with us and we can provide you with an IDR (Interim Development Relief). The bug is due to be pushed out to the S10u11 and S11u1 releases due out later this year. There won't be an interim public patch which is why we have the IDRs.

                • 5. Re: Very low arcsize problem.
                  Thank you SteveS.
                  I'm 99% sure I'm hitting Bug 7111576. I've always noticed that arc_no_grow stays at 1 when this occurs and never changes unless we reboot the node. I had found Bug 7111576 in a Google search a couple of weeks ago, which led me to check the status of arc_no_grow. Here are my memstat, vmem and arc results.

                  echo "::memstat" | mdb -k
                  Page Summary                Pages                MB  %Tot
                  ------------     ----------------  ----------------  ----
                  Kernel                     840032              3281   20%
                  ZFS File Data               32762               127    1%
                  Anon                        28729               112    1%
                  Exec and libs                1515                 5    0%
                  Page cache                   7180                28    0%
                  Free (cachelist)             9973                38    0%
                  Free (freelist)           3263581             12748   78%

                  Total                     4183772             16342
                  Physical                  4183771             16342

                  echo "::arc" | mdb -k
                  hits = 222812662
                  misses = 199136276
                  demand_data_hits = 129142079
                  demand_data_misses = 67466530
                  demand_metadata_hits = 76002741
                  demand_metadata_misses = 11531493
                  prefetch_data_hits = 17004177
                  prefetch_data_misses = 115446786
                  prefetch_metadata_hits = 663665
                  prefetch_metadata_misses = 4691467
                  mru_hits = 103977792
                  mru_ghost_hits = 0
                  mfu_hits = 226191001
                  mfu_ghost_hits = 0
                  deleted = 211310007
                  mutex_miss = 8636436
                  hash_elements = 138858349
                  hash_elements_max = 138858349
                  hash_collisions = 16428656
                  hash_chains = 1299829
                  hash_chain_max = 10
                  p = 283 MB
                  c = 4096 MB
                  c_min = 4096 MB
                  c_max = 15318 MB
                  size = 163 MB
                  buf_size = 66 MB
                  data_size = 88 MB
                  other_size = 8 MB
                  l2_hits = 0
                  l2_misses = 199136276
                  l2_feeds = 0
                  l2_rw_clash = 0
                  l2_read_bytes = 0 MB
                  l2_write_bytes = 0 MB
                  l2_writes_sent = 0
                  l2_writes_done = 0
                  l2_writes_error = 0
                  l2_writes_hdr_miss = 0
                  l2_evict_lock_retry = 0
                  l2_evict_reading = 0
                  l2_abort_lowmem = 0
                  l2_cksum_bad = 0
                  l2_io_error = 0
                  l2_hdr_size = 0 MB
                  memory_throttle_count = 0
                  meta_used = 74 MB
                  meta_max = 201 MB
                  meta_limit = 0 MB
                  arc_no_grow = 1
                  arc_tempreserve = 0 MB
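                  Because arc_no_grow has to be watched over time, it can help to parse each `::arc` capture rather than eyeball it. The sketch below is a hypothetical heuristic, not an official test for Bug 7111576: it reads the `name = value` lines and flags a capture where `arc_no_grow` is 1 and `size` is under half of `c_max` (an arbitrary threshold). It covers only the `::arc` criterion; the memstat and vmem checks still apply.

```python
import re

# Matches "name = value" lines; trailing units such as "MB" are ignored.
ARC_LINE = re.compile(r"\s*(\w+)\s*=\s*(\d+)")


def parse_arc(text):
    """Parse `echo ::arc | mdb -k` output into {name: int}."""
    stats = {}
    for line in text.splitlines():
        match = ARC_LINE.match(line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def looks_stuck(stats, ratio=0.5):
    """Heuristic: arc_no_grow is 1 and size is below `ratio` * c_max.

    ::arc prints both size and c_max in MB, so the units cancel out.
    """
    return stats.get("arc_no_grow") == 1 and stats["size"] < ratio * stats["c_max"]
```

                  Against the output above (size 163 MB, c_max 15318 MB, arc_no_grow 1) it returns True; running it on captures taken several times a day shows whether the flag ever clears.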

                  As for getting the IDR, this is just a proof-of-concept install and setup. It is housing some heavily used VMs, but its main purpose is to prove the ability to replicate ZFS to another node. Unfortunately, I haven't even gotten to that point because of this bug. If I can iron out the bugs and make it work right, we are going with Solaris 11; if not, I've been told we have to look at other solutions. So if I can't get this fixed, I'll have to move away from Solaris, and apart from this bug, I'm really loving it and I know it is a perfect fit for us. Am I still eligible to get this fix with just a public download used for testing?
                  • 6. Re: Very low arcsize problem.
                    I agree that you've hit the bug.
                    "Am I still eligible to get this fix with just a public download used for testing?"
                    Unfortunately not. You need a support contract with Oracle. The support contract allows you to create a service request, and from that we then provide the IDR. Without an SR we cannot provide the IDR. S11u1 is quite a few months away from general availability and unfortunately won't be available before the end of your testing phase.

                    One suggestion would be to contact your Oracle Service Account Team (SAM), if you have one, or our Pre-Sales/Sales team. If you explain what you're doing and the issue you've hit, and point them to this thread, I'm sure they'll be able to help out. I work in Support, so I can't do anything other than provide technical assistance. If they can provide you with a temporary CSI (Customer Support Identifier), you can get the SR created and we can give you the IDR to get you through the testing and proof of concept.

                    http://www.oracle.com/uk/index.html has a UK number for Sales of 08705 332200 and a link to "Live Chat with Sales" which may help.

                    • 7. Re: Very low arcsize problem.
                      OK, thank you again, Steve, for the great help. We'll try to get the temporary CSI as you suggested. Just knowing this is a real bug with a known fix at least gives us hope.
                      • 8. Re: Very low arcsize problem.
                        I just wanted to know whether anyone here has managed to get the IDR from Oracle. I've had a ticket open with them for two weeks now, and they keep coming back two days later asking for more miscellaneous info, like pkg versions etc.

                        Quite frankly, this is extremely disappointing. I'm hitting this on every Solaris 11 box I have set up so far, each with different hardware and ZFS loadouts.

                        I've been testing the heck out of it, trying to figure out what triggers the bug. One box has 24 GB of RAM and 4 drives in RAID-Z with a 5th SSD as the ZIL/cache drive. Another box has 8 drives in RAID 0+1 and a 9th SSD as the ZIL.

                        I'm thinking it's possible that adding a separate ZIL somehow triggered it. The big 8-drive box ran for a couple of days under heavy IO load without triggering it, and then I hit it the same day I added the ZIL. I will try again with the ZIL removed and see whether I can trigger it.

                        Has anyone else had any experience troubleshooting this?

                        This is a serious showstopper for us and has caused a lot of grief. Initial testing did not catch it; it wasn't until we actually put one of the boxes into production and kept the load sustained for several days that it manifested. I can't believe this hasn't caused a bigger uproar among customers. I thought I might have a unique hardware loadout, but then the bug hit the other two boxes as well. That is interesting.

                        Edited by: TomS on May 31, 2012 6:01 PM

                        One more edit: I noticed a surefire way of checking whether it has tripped is to try setting arc_no_grow manually. When the machine is operating normally, you can run
                        >echo "arc_no_grow/W 0" | mdb -kw
                        arc_no_grow:    0x1             =       0x0
                        and flip the bit from 1 to 0 or vice versa.

                        After the bug is hit, the same command appears to execute properly, but arc_no_grow is permanently stuck at 1, and meta_used and meta_max never change:
                        meta_used                 =        73 MB
                        meta_max                  =       615 MB
                        meta_limit                =         0 MB
                        arc_no_grow               =         1
                        Edited by: TomS on May 31, 2012 6:05 PM

                        Edited by: TomS on May 31, 2012 6:08 PM
                        • 9. Re: Very low arcsize problem.
                          We've been running idr145.4.p5p for some time, and it helped to get the L2ARC running, with a dramatic increase in performance, but unfortunately it also makes our server panic every few weeks. Luckily this happens at night, when the backup is running.

                          Tomorrow we'll start testing idr145.7.p5p. The panics should be over.
                          • 10. Re: Very low arcsize problem.
                            Victor de Solar

                            Bug 7111576, "arc shrinks in the absence of memory pressure", is fixed in SRU 8.5, which has been available to the "public" since 22.06.2012.

                            Go and get it now!

                            • 11. Re: Very low arcsize problem.
                              I am confused. When you say 'available for public', do you mean available as a free download, or is a support contract required? I have an S11 install where I just noticed the ARC is gimped, which is a killer for me.
                              • 12. Re: Very low arcsize problem.
                                Victor de Solar

                                Unfortunately, SRUs (Support Repository Updates) are available from My Oracle Support to contracted customers only.
                                If you don't have a support contract, you cannot download the SRU.

                                • 13. Re: Very low arcsize problem.
                                  Yup, the SRU came out a few weeks after we got the IDR. I still asked for and received the IDR even though they told me the SRU would be out in June, mostly because this bug was grinding our production to a halt. Anyhow, glad they got it fixed.