5 Replies Latest reply: Apr 17, 2013 4:10 AM by 1001955

    Performance with Dedup on HP ProLiant DL380p Gen8

    1001955
      Hi all,

      It is not that I haven't been warned. I just don't understand why write performance on the newly created pool is so horrible...

      Hopefully I'll get some more advice here. Some basic figures:

      The machine is an HP ProLiant DL380p Gen8 with two Intel Xeon E5-2665 CPUs and 128 GB of RAM.
      The storage pool consists of 14 x 900 GB 10k SAS disks behind two HP H221 SAS HBAs in two HP D2700 storage enclosures.
      The system is Solaris 11.1.

      root@server12:~# zpool status -D datenhalde
      pool: datenhalde
      state: ONLINE
      scan: none requested
      config:

      NAME                        STATE     READ WRITE CKSUM
      datenhalde                  ONLINE       0     0     0
        mirror-0                  ONLINE       0     0     0
          c11t5000C5005EE0F5D5d0  ONLINE       0     0     0
          c12t5000C5005EDBBB95d0  ONLINE       0     0     0
        mirror-1                  ONLINE       0     0     0
          c11t5000C5005EE20251d0  ONLINE       0     0     0
          c12t5000C5005ED658F1d0  ONLINE       0     0     0
        mirror-2                  ONLINE       0     0     0
          c11t5000C5005ED80439d0  ONLINE       0     0     0
          c12t5000C5005EDB23F1d0  ONLINE       0     0     0
        mirror-3                  ONLINE       0     0     0
          c11t5000C5005EDA2315d0  ONLINE       0     0     0
          c12t5000C5005ED6E049d0  ONLINE       0     0     0
        mirror-4                  ONLINE       0     0     0
          c11t5000C5005EDBB289d0  ONLINE       0     0     0
          c12t5000C5005EDB9479d0  ONLINE       0     0     0
        mirror-5                  ONLINE       0     0     0
          c11t5000C5005EDD8385d0  ONLINE       0     0     0
          c12t5000C5005ED72855d0  ONLINE       0     0     0
        mirror-6                  ONLINE       0     0     0
          c11t5000C5005ED8759Dd0  ONLINE       0     0     0
          c12t5000C5005EE3AB59d0  ONLINE       0     0     0
      spares
        c11t5000C5005ED6CEADd0    AVAIL
        c12t5000C5005EDA2CD5d0    AVAIL

      errors: No known data errors

      DDT entries 5354008, size 292 on disk, 152 in core

      bucket             allocated                        referenced
      ______   ______________________________   ______________________________
      refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
      ------   ------   -----   -----   -----   ------   -----   -----   -----
           1    3,22M    411G    411G    411G    3,22M    411G    411G    411G
           2    1,28M    163G    163G    163G    2,93M    374G    374G    374G
           4     440K   54,9G   54,9G   54,9G    2,12M    271G    271G    271G
           8     140K   17,5G   17,5G   17,5G    1,39M    177G    177G    177G
          16    36,1K   4,50G   4,50G   4,50G     689K   85,9G   85,9G   85,9G
          32    6,26K    798M    798M    798M     277K   34,4G   34,4G   34,4G
          64    1,92K    244M    244M    244M     136K   16,9G   16,9G   16,9G
         128       56   6,52M   6,52M   6,52M    10,5K   1,23G   1,23G   1,23G
         256      222   27,5M   27,5M   27,5M    71,0K   8,80G   8,80G   8,80G
         512        2    256K    256K    256K    1,38K    177M    177M    177M
          1K        4    384K    384K    384K    6,00K    612M    612M    612M
          4K        1     512     512     512    4,91K   2,45M   2,45M   2,45M
         16K        1    128K    128K    128K    24,9K   3,11G   3,11G   3,11G
        512K        1    128K    128K    128K     599K   74,9G   74,9G   74,9G
       Total    5,11M    652G    652G    652G    11,4M   1,43T   1,43T   1,43T

      root@server12:~# zpool list
      NAME         SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
      datenhalde  5,69T   662G  5,04T  11%  2.22x  ONLINE  -

      root@server12:~# ./arc_summery.pl
      System Memory:
      Physical RAM: 131021 MB
      Free Memory : 18102 MB
      LotsFree: 2047 MB

      ZFS Tunables (/etc/system):

      ARC Size:
      Current Size: 101886 MB (arcsize)
      Target Size (Adaptive): 103252 MB (c)
      Min Size (Hard Limit): 64 MB (zfs_arc_min)
      Max Size (Hard Limit): 129997 MB (zfs_arc_max)

      ARC Size Breakdown:
      Most Recently Used Cache Size: 100% 103252 MB (p)
      Most Frequently Used Cache Size: 0% 0 MB (c-p)

      ARC Efficency:
      Cache Access Total: 124583164
      Cache Hit Ratio: 70% 87975485 [Defined State for buffer]
      Cache Miss Ratio: 29% 36607679 [Undefined State for Buffer]
      REAL Hit Ratio: 103% 128741192 [MRU/MFU Hits Only]

      Data Demand Efficiency: 91%
      Data Prefetch Efficiency: 29%

      CACHE HITS BY CACHE LIST:
      Anon: --% Counter Rolled.
      Most Recently Used: 74% 65231813 (mru) [ Return Customer ]
      Most Frequently Used: 72% 63509379 (mfu) [ Frequent Customer ]
      Most Recently Used Ghost: 0% 0 (mru_ghost) [ Return Customer Evicted, Now Back ]
      Most Frequently Used Ghost: 0% 0 (mfu_ghost) [ Frequent Customer Evicted, Now Back ]
      CACHE HITS BY DATA TYPE:
      Demand Data: 15% 13467569
      Prefetch Data: 4% 3555720
      Demand Metadata: 80% 70648029
      Prefetch Metadata: 0% 304167
      CACHE MISSES BY DATA TYPE:
      Demand Data: 3% 1281154
      Prefetch Data: 23% 8429373
      Demand Metadata: 73% 26879797
      Prefetch Metadata: 0% 17355


      root@server12:~# echo "::arc" | mdb -k
      hits = 88823429
      misses = 37306983
      demand_data_hits = 13492752
      demand_data_misses = 1281335
      demand_metadata_hits = 71470790
      demand_metadata_misses = 27578897
      prefetch_data_hits = 3555720
      prefetch_data_misses = 8429373
      prefetch_metadata_hits = 304167
      prefetch_metadata_misses = 17378
      mru_hits = 66467881
      mru_ghost_hits = 0
      mfu_hits = 64253247
      mfu_ghost_hits = 0
      deleted = 41770876
      mutex_miss = 172782
      hash_elements = 18446744073676992500
      hash_elements_max = 18446744073709551615
      hash_collisions = 12375174
      hash_chains = 18446744073698514699
      hash_chain_max = 9
      p = 103252 MB
      c = 103252 MB
      c_min = 64 MB
      c_max = 129997 MB
      size = 102059 MB
      buf_size = 481 MB
      data_size = 100652 MB
      other_size = 924 MB
      l2_hits = 0
      l2_misses = 28860232
      l2_feeds = 0
      l2_rw_clash = 0
      l2_read_bytes = 0 MB
      l2_write_bytes = 0 MB
      l2_writes_sent = 0
      l2_writes_done = 0
      l2_writes_error = 0
      l2_writes_hdr_miss = 0
      l2_evict_lock_retry = 0
      l2_evict_reading = 0
      l2_abort_lowmem = 0
      l2_cksum_bad = 0
      l2_io_error = 0
      l2_hdr_size = 0 MB
      memory_throttle_count = 0
      meta_used = 1406 MB
      meta_max = 1406 MB
      meta_limit = 0 MB
      arc_no_grow = 1
      arc_tempreserve = 0 MB
      root@server12:~#

      Write performance is really, really slow.

      Read/write within this pool:
      root@server12:/datenhalde/s12test/Bild-DB/Testaktion# /usr/gnu/bin/dd if=Test.tif of=Test2.tif
      1885030+1 records in
      1885030+1 records out
      965135496 bytes (965 MB) copied, 145,923 s, 6,6 MB/s

      Read from this pool, write to /tmp:

      root@server12:/datenhalde/s12test/Bild-DB/Testaktion# /usr/gnu/bin/dd if=Test.tif of=/tmp/Test2.tif
      1885030+1 records in
      1885030+1 records out
      965135496 bytes (965 MB) copied, 9,51183 s, 101 MB/s
      root@server12:/datenhalde/s12test/Bild-DB/Testaktion# /usr/gnu/bin/dd if=FS2013_Fashionation_Beach_06.tif of=FS2013_Test.tif


      I just do not get this. Why is it that slow? Am I missing any tunable parameters? From the figures above, the DDT should use 5354008 * 152 bytes = 776 MB of RAM. That should fit easily.
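
      Just to double-check that arithmetic with bc (the entry count and per-entry sizes are taken from the zpool status -D output above):

      echo "5354008 * 152 / 1024 / 1024" | bc    # 776  -> MB needed to hold the whole DDT in core
      echo "5354008 * 292 / 1024 / 1024" | bc    # 1490 -> MB for the DDT on disk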

      Sorry for the longish post, but I really need some help here, because the real data, which has a much higher dedup ratio, still has to be copied to that pool.
      Compression is no real alternative, because most of the data will be already-compressed images and I don't expect to see great compression ratios.

      TIA and kind regards,
      Tom

      Edited by: vigtom on 16.04.2013 07:51
        • 1. Re: Performance with Dedup on HP ProLiant DL380p Gen8
          1001955
          And why does "zdb -S datenhalde" dump an 8.9 GB core instead of showing any info?

          zdb -S datenhalde
          error: No such hold 4000 on refcount 7ff3ede8d170
          Abort (core dumped)

          -rw------- 1 root root 9566626126 Apr 16 17:25 core

          TIA,
          Tom
          • 2. Re: Performance with Dedup on HP ProLiant DL380p Gen8
            Cindys-Oracle
            Hi Tom,

            The zdb -S command must be run on a quiet pool. I see this isn't mentioned in the ZFS Admin Guide, so I will add it.
            We have bug 15760285 about ZFS dedup observability, which covers the zdb -S quiescent-pool issue.

            I don't know why this pool's performance is so slow. Your stats look fine to me, but I'm no performance expert. I know one recommendation is to increase the maximum ARC metadata size. Would that help the cache hit rates?

            Outside of additional tuning, a quick test would be to disable dedup to see if this is the cause.
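
            Something along these lines should work for that test (note that dedup only affects newly written blocks, so data that is already deduplicated stays that way):

            zfs set dedup=off datenhalde
            zfs get dedup datenhalde

            Setting it on the top-level dataset means the child file systems inherit it unless they override it.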

            Thanks, Cindy
            • 3. Re: Performance with Dedup on HP ProLiant DL380p Gen8
              1001955
              Hi Cindy,

              Thanks for answering :)

              Isn't the tunable parameter "arc_meta_limit" obsolete in Solaris 11?

              Before Solaris 11 you could tune arc_meta_limit by setting something reasonable in /etc/system with "set zfs:zfs_arc_meta_limit=...." which - at boot - is copied into arc_c_max overriding the default setting.
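
              For example (the 8 GB value here is just an illustration, not a recommendation):

              set zfs:zfs_arc_meta_limit=0x200000000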

              On this Solaris 11.1 system, c_max is already maxed out without any tuning:

              kstat -p zfs:0:arcstats:c_max
              zfs:0:arcstats:c_max    136312127488

              This is also reflected by the parameter "meta_limit = 0". Am I missing something here?

              When looking at the output of echo "::arc" | mdb -k, I see the values of "meta_used", "meta_max" and "meta_limit". I read these as "memory used for metadata right now", "maximum memory used for metadata so far" and "theoretical limit on memory used for metadata", with a value of "0" meaning "unlimited". Right?

              What exactly is "arc_no_grow = 1" saying here?

              Sorry for maybe asking some silly questions. This is all a bit frustrating ;)

              When I disable dedup on the pool, write performance increases almost instantly. I did not test it long enough to get real figures. I'll probably do that (possibly even with Solaris 10) tomorrow.

              Would Oracle be willing to help me out under a support plan when running Solaris 11.1 on a machine that is certified for Solaris 10 only?

              Thanks again and kind regards,
              Tom
              • 4. Re: Performance with Dedup on HP ProLiant DL380p Gen8
                Cindys-Oracle
                You're not missing a thing. I am. :-)

                There was a bug with arc_no_grow, but it was fixed before S11.1 was released. It should be set to 1 most of the time.
                Did you cap the ARC at all, or is it at the default value?

                Another easy fix would be to add more memory to see if that helps. We run on mirrored pools but not with dedup so I'm not familiar with the latest tips, but more memory is probably best.

                I'm not familiar with all support policies either so I can't answer your support question.

                Thanks, Cindy
                • 5. Re: Performance with Dedup on HP ProLiant DL380p Gen8
                  1001955
                  Hi Cindy,
                  > arc_no_grow should be set to 1 most of the time.
                  Is that so? Why? Doesn't it say that the ARC is not allowed to grow?

                  > Did you cap the ARC at all, or is it at the default value?
                  I left everything at the defaults.

                  > Another easy fix would be to add more memory to see if that helps. We run on mirrored pools but not with dedup so I'm not familiar with the latest tips, but more memory is probably best.
                  Hmm... I already have 128 GB of RAM in this machine, and by my calculation that should be more than enough to hold the DDT in memory. Genuine HP RAM is expensive, so adding more is not easily done just for a test...

                  I am setting this machine up with Solaris 10 right now and will test the performance on a pool without dedup. We'll see what numbers that shows. Do you know of anyone who would be willing to dig deeper into this dedup problem and help me find a solution?
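
                  While the copy runs I will keep an eye on the disks with something like:

                  zpool iostat -v datenhalde 5
                  iostat -xn 5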

                  TIA and kind regards,
                  Tom