5 Replies   Latest reply: Apr 17, 2013 2:10 AM by 1001955

Performance with Dedup on HP ProLiant DL380p Gen8

1001955 Newbie
Hi all,

It is not that I haven't been warned. I just don't understand why write performance on the newly created pool is so horrible...

Hopefully I'll get some more advice here. Some basic figures:

The machine is an HP ProLiant DL380p Gen8 with two Intel Xeon E5-2665 CPUs and 128 GB of RAM.
The storage pool consists of 14 900 GB 10k SAS disks on two HP H221 SAS HBAs in two HP D2700 storage enclosures.
The system is Solaris 11.1.

root@server12:~# zpool status -D datenhalde
pool: datenhalde
state: ONLINE
scan: none requested
config:

NAME                        STATE     READ WRITE CKSUM
datenhalde                  ONLINE       0     0     0
  mirror-0                  ONLINE       0     0     0
    c11t5000C5005EE0F5D5d0  ONLINE       0     0     0
    c12t5000C5005EDBBB95d0  ONLINE       0     0     0
  mirror-1                  ONLINE       0     0     0
    c11t5000C5005EE20251d0  ONLINE       0     0     0
    c12t5000C5005ED658F1d0  ONLINE       0     0     0
  mirror-2                  ONLINE       0     0     0
    c11t5000C5005ED80439d0  ONLINE       0     0     0
    c12t5000C5005EDB23F1d0  ONLINE       0     0     0
  mirror-3                  ONLINE       0     0     0
    c11t5000C5005EDA2315d0  ONLINE       0     0     0
    c12t5000C5005ED6E049d0  ONLINE       0     0     0
  mirror-4                  ONLINE       0     0     0
    c11t5000C5005EDBB289d0  ONLINE       0     0     0
    c12t5000C5005EDB9479d0  ONLINE       0     0     0
  mirror-5                  ONLINE       0     0     0
    c11t5000C5005EDD8385d0  ONLINE       0     0     0
    c12t5000C5005ED72855d0  ONLINE       0     0     0
  mirror-6                  ONLINE       0     0     0
    c11t5000C5005ED8759Dd0  ONLINE       0     0     0
    c12t5000C5005EE3AB59d0  ONLINE       0     0     0
spares
  c11t5000C5005ED6CEADd0    AVAIL
  c12t5000C5005EDA2CD5d0    AVAIL

errors: No known data errors

DDT entries 5354008, size 292 on disk, 152 in core

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    3,22M    411G    411G    411G    3,22M    411G    411G    411G
     2    1,28M    163G    163G    163G    2,93M    374G    374G    374G
     4     440K   54,9G   54,9G   54,9G    2,12M    271G    271G    271G
     8     140K   17,5G   17,5G   17,5G    1,39M    177G    177G    177G
    16    36,1K   4,50G   4,50G   4,50G     689K   85,9G   85,9G   85,9G
    32    6,26K    798M    798M    798M     277K   34,4G   34,4G   34,4G
    64    1,92K    244M    244M    244M     136K   16,9G   16,9G   16,9G
   128       56   6,52M   6,52M   6,52M    10,5K   1,23G   1,23G   1,23G
   256      222   27,5M   27,5M   27,5M    71,0K   8,80G   8,80G   8,80G
   512        2    256K    256K    256K    1,38K    177M    177M    177M
    1K        4    384K    384K    384K    6,00K    612M    612M    612M
    4K        1     512     512     512    4,91K   2,45M   2,45M   2,45M
   16K        1    128K    128K    128K    24,9K   3,11G   3,11G   3,11G
  512K        1    128K    128K    128K     599K   74,9G   74,9G   74,9G
 Total    5,11M    652G    652G    652G    11,4M   1,43T   1,43T   1,43T

root@server12:~# zpool list
NAME         SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
datenhalde  5,69T   662G  5,04T  11%  2.22x  ONLINE  -

root@server12:~# ./arc_summery.pl
System Memory:
Physical RAM: 131021 MB
Free Memory : 18102 MB
LotsFree: 2047 MB

ZFS Tunables (/etc/system):

ARC Size:
Current Size: 101886 MB (arcsize)
Target Size (Adaptive): 103252 MB (c)
Min Size (Hard Limit): 64 MB (zfs_arc_min)
Max Size (Hard Limit): 129997 MB (zfs_arc_max)

ARC Size Breakdown:
Most Recently Used Cache Size: 100% 103252 MB (p)
Most Frequently Used Cache Size: 0% 0 MB (c-p)

ARC Efficency:
Cache Access Total: 124583164
Cache Hit Ratio: 70% 87975485 [Defined State for buffer]
Cache Miss Ratio: 29% 36607679 [Undefined State for Buffer]
REAL Hit Ratio: 103% 128741192 [MRU/MFU Hits Only]

Data Demand Efficiency: 91%
Data Prefetch Efficiency: 29%

CACHE HITS BY CACHE LIST:
Anon: --% Counter Rolled.
Most Recently Used: 74% 65231813 (mru) [ Return Customer ]
Most Frequently Used: 72% 63509379 (mfu) [ Frequent Customer ]
Most Recently Used Ghost: 0% 0 (mru_ghost) [ Return Customer Evicted, Now Back ]
Most Frequently Used Ghost: 0% 0 (mfu_ghost) [ Frequent Customer Evicted, Now Back ]
CACHE HITS BY DATA TYPE:
Demand Data: 15% 13467569
Prefetch Data: 4% 3555720
Demand Metadata: 80% 70648029
Prefetch Metadata: 0% 304167
CACHE MISSES BY DATA TYPE:
Demand Data: 3% 1281154
Prefetch Data: 23% 8429373
Demand Metadata: 73% 26879797
Prefetch Metadata: 0% 17355


root@server12:~# echo "::arc" | mdb -k
hits = 88823429
misses = 37306983
demand_data_hits = 13492752
demand_data_misses = 1281335
demand_metadata_hits = 71470790
demand_metadata_misses = 27578897
prefetch_data_hits = 3555720
prefetch_data_misses = 8429373
prefetch_metadata_hits = 304167
prefetch_metadata_misses = 17378
mru_hits = 66467881
mru_ghost_hits = 0
mfu_hits = 64253247
mfu_ghost_hits = 0
deleted = 41770876
mutex_miss = 172782
hash_elements = 18446744073676992500
hash_elements_max = 18446744073709551615
hash_collisions = 12375174
hash_chains = 18446744073698514699
hash_chain_max = 9
p = 103252 MB
c = 103252 MB
c_min = 64 MB
c_max = 129997 MB
size = 102059 MB
buf_size = 481 MB
data_size = 100652 MB
other_size = 924 MB
l2_hits = 0
l2_misses = 28860232
l2_feeds = 0
l2_rw_clash = 0
l2_read_bytes = 0 MB
l2_write_bytes = 0 MB
l2_writes_sent = 0
l2_writes_done = 0
l2_writes_error = 0
l2_writes_hdr_miss = 0
l2_evict_lock_retry = 0
l2_evict_reading = 0
l2_abort_lowmem = 0
l2_cksum_bad = 0
l2_io_error = 0
l2_hdr_size = 0 MB
memory_throttle_count = 0
meta_used = 1406 MB
meta_max = 1406 MB
meta_limit = 0 MB
arc_no_grow = 1
arc_tempreserve = 0 MB
root@server12:~#

Write performance is really, really slow:

Read/write within this pool:
root@server12:/datenhalde/s12test/Bild-DB/Testaktion# /usr/gnu/bin/dd if=Test.tif of=Test2.tif
1885030+1 records in
1885030+1 records out
965135496 bytes (965 MB) copied, 145,923 s, 6,6 MB/s

Read from this pool and write to the root pool:

root@server12:/datenhalde/s12test/Bild-DB/Testaktion# /usr/gnu/bin/dd if=Test.tif of=/tmp/Test2.tif
1885030+1 records in
1885030+1 records out
965135496 bytes (965 MB) copied, 9,51183 s, 101 MB/s
root@server12:/datenhalde/s12test/Bild-DB/Testaktion# /usr/gnu/bin/dd if=FS2013_Fashionation_Beach_06.tif of=FS2013_Test.tif


I just do not get this. Why is it so slow? Am I missing any tunable parameters? From the figures above, the DDT should use about 5354008 * 152 = 776 MB of RAM. That should fit easily.
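
For reference, that is just the DDT entry count reported by "zpool status -D" multiplied by the in-core size per entry:

root@server12:~# echo "5354008 * 152" | bc
813809216

That is about 776 MB, against an ARC that is currently around 100 GB.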

Sorry for the longish post, but I really need some help here, because the real data, with a much higher dedup ratio, still has to be copied to this pool.
Compression is no real alternative, because most of the data will be already-compressed images and I don't expect to see great compression ratios.

TIA and kind regards,
Tom

Edited by: vigtom on 16.04.2013 07:51
  • 1. Re: Performance with Dedup on HP ProLiant DL380p Gen8
    1001955 Newbie
    And why is a "zdb -S datenhalde" dumping a 8.9G core instead of showing any infos?

    zdb -S datenhalde
    error: No such hold 4000 on refcount 7ff3ede8d170
    Abort (core dumped)

    -rw------- 1 root root 9566626126 Apr 16 17:25 core

    TIA,
    Tom
  • 2. Re: Performance with Dedup on HP ProLiant DL380p Gen8
    cindys Pro
    Hi Tom,

    The zdb -S command must be run on a quiet pool. I see this isn't listed in the ZFS Admin Guide, so I will add it.
    We have bug 15760285, about ZFS dedup observability, which covers the zdb -S quiescent-pool issue.
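
    For example (only a sketch, assuming the pool can be taken offline briefly and that zdb -e accepts the exported pool here), exporting the pool is one way to guarantee it is quiet:

    # zpool export datenhalde
    # zdb -e -S datenhalde
    # zpool import datenhalde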

    I don't know why this pool's performance is so slow. Your stats look fine to me, but I'm no perf expert. I know one recommendation is to increase the max ARC metadata size. Would this help the cache hit rates?

    Outside of additional tuning, a quick test would be to disable dedup to see if this is the cause.
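
    For example (just a sketch; dedup is a dataset property, so turning it off only affects newly written data and leaves the existing DDT in place):

    # zfs set dedup=off datenhalde
    # zfs get dedup datenhalde

    Then rerun the dd copy on the same dataset and compare the throughput.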

    Thanks, Cindy
  • 3. Re: Performance with Dedup on HP ProLiant DL380p Gen8
    1001955 Newbie
    Hi Cindy,

    thanks for answering :)

    Isn't the tunable parameter "arc_meta_limit" obsolete in Solaris 11?

    Before Solaris 11 you could tune arc_meta_limit by setting something reasonable in /etc/system with "set zfs:zfs_arc_meta_limit=....", which at boot is copied into arc_c_max, overriding the default setting.
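
    For example, an /etc/system entry along these lines (the value here is purely illustrative) would take effect at the next boot:

    set zfs:zfs_arc_meta_limit=0x400000000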

    On this Solaris 11.1 system, c_max is already maxed out without any tuning ("kstat -p zfs:0:arcstats:c_max" -> "zfs:0:arcstats:c_max 136312127488"). This also seems to be reflected by the parameter "meta_limit = 0". Am I missing something here?

    When looking at the output of "echo "::arc" | mdb -k" I see the values "meta_used", "meta_max" and "meta_limit". I understand these as "memory used for metadata right now", "maximum memory used for metadata in the past" and "theoretical limit on memory used for metadata", with a value of "0" meaning "unlimited". Right?
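
    Assuming those counters are also exposed via kstat under zfs:0:arcstats (I have not verified the exact statistic names on 11.1), they should be readable without mdb, e.g.:

    root@server12:~# kstat -p zfs:0:arcstats:arc_meta_used
    root@server12:~# kstat -p zfs:0:arcstats:arc_meta_limit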

    What exactly is "arc_no_grow = 1" saying here?

    Sorry if I'm asking some silly questions. This is all a bit frustrating ;)

    When dedup is disabled on the pool, write performance increases almost instantly. I did not test long enough to get real figures; I'll probably do that (possibly even with Solaris 10) tomorrow.

    Would Oracle be willing to help me out under a support plan when running Solaris 11.1 on a machine which is certified for Solaris 10 only?

    Thanks again and kind regards,
    Tom
  • 4. Re: Performance with Dedup on HP ProLiant DL380p Gen8
    cindys Pro
    You're not missing a thing. I am. :-)

    There was a bug with arc_no_grow, but it was fixed before S11.1 was released. It should be set to 1 most of the time.
    Did you cap the ARC at all, or is it at the default value?
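
    A quick way to check for an existing cap, for example:

    # grep -i arc /etc/system

    If nothing ZFS/ARC-related shows up there, you are running with the defaults.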

    Another easy fix would be to add more memory to see if that helps. We run on mirrored pools but not with dedup so I'm not familiar with the latest tips, but more memory is probably best.

    I'm not familiar with all support policies either so I can't answer your support question.

    Thanks, Cindy
  • 5. Re: Performance with Dedup on HP ProLiant DL380p Gen8
    1001955 Newbie
    Hi Cindy,
    > arc_no_grow should be set to 1 most of the time.
    Is that so? Why? Doesn't it mean that the ARC is not allowed to grow?
    > Did you cap the ARC at all, or is it at the default value?
    I left everything at the defaults.
    > Another easy fix would be to add more memory to see if that helps. We run on mirrored pools but not with dedup so I'm not familiar with the latest tips, but more memory is probably best.
    Hmm... I already have 128 GB of RAM in this machine, and by my calculation that should be more than enough to hold the DDT in memory. Genuine HP RAM is expensive, so this is not easily done just for a test...

    I am setting this machine up with Solaris 10 right now and will test performance on a pool without dedup. We'll see what numbers that shows. Do you know anyone who would be willing to dig deeper into this dedup problem and help me find a solution?

    TIA and kind regards,
    Tom
