This discussion is archived
9 Replies Latest reply: Jul 18, 2012 3:59 AM by robinsc

High Small read request on Exadata

robinsc Explorer
Hi everyone,
We have seen some interesting values for metrics on both our own Exadata and at another company when running the query dcli -g cell_group -l cellmonitor cellcli -e "list metrichistory where metricvalue '>500000' and Name like 'CD_IO_TM_R_SM_RQ'"

We have got values ranging from half a second up to 8 seconds for a cell single block read. I have an SR open on this but am not getting much traction from development, so I was hoping to gather additional data by asking the folks here to run the same query on their machines and post the data. This would help establish a baseline for the expected value of this metric (which, it seems to me, should not exceed a couple of hundred milliseconds). I am also trying to establish whether activating IORM and setting an objective really had an effect on this or not.

Before setting an IORM objective
exass03: CD_IO_TM_R_SM_RQ CD_08_exass03 *7,474,522* us/request 2012-03-30T23:09:49+05:30

After setting IORM objective=auto
exass01: CD_IO_TM_R_SM_RQ CD_00_exass01 548,574 us/request 2012-04-07T12:36:40+05:30
exass01: CD_IO_TM_R_SM_RQ CD_01_exass01 525,864 us/request 2012-04-07T12:36:40+05:30
exass01: CD_IO_TM_R_SM_RQ CD_02_exass01 1,342,631 us/request 2012-04-07T12:36:40+05:30
exass01: CD_IO_TM_R_SM_RQ CD_02_exass01 503,974 us/request 2012-04-08T16:55:33+05:30

Edited by: robinsc on May 22, 2012 1:05 PM - fixed spelling mistakes
  • 1. Re: High Small read request on Exadata
    Marc Fielding Journeyer
    Hi robinsc,

    Can you share a bit about your workload? What is the impact of having slow reads? Are you maxing out the I/O capacity of the hard drives?

    And have you considered a min_latency target? It will trade some throughput but it might be acceptable if you need your small reads to be consistently fast. Another option to consider would be more active flash cache management to help get your latency-sensitive (and hopefully repeatable) reads into cache.

    Marc
  • 2. Re: High Small read request on Exadata
    912595 Expert
    Did you try to check "cell single block physical read" in the AWR section? If I remember correctly, a good baseline for this wait event is from .01 to .1 ms. Please check AWR and post it here.
  • 3. Re: High Small read request on Exadata
    robinsc Explorer
    At the other site we tracked down an instance of a query exceeding its expected SLA. It basically did an index lookup for a goodly number of rows. When looking up one of the rows it waited for over a second for a single block read to complete, and that pushed the whole operation over the SLA. However, what we are trying to understand is whether the variances we are seeing are expected and typical, or atypical. As I remarked, at two sites with X2-2 Exadata racks using high performance disks we have observed similar behavior. Oracle support was of the opinion that the disks were not at 100% utilization when the issue occurred. Though we have a good flash hit ratio, we can never guarantee a 100% hit rate, so our issue boils down to predictability. If Exadata single block reads truly are that variable we could go back to the business and renegotiate for a different SLA, but no one has given us a definitive answer as to the expected wait histogram of cell single block physical reads.
    That's why I was asking if this forum could gather additional data points.
    I really hope you and everyone else here could give it a shot... run the cellcli query and check the output over your existing metrichistory.....
    Hope you can do that for me .... pretty please :)
    If others experience the same as we do, it would at least give everyone a heads-up on what to expect.
    Thanks
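    To put the predictability concern in numbers: if each single block read independently has a small probability p of being slow, a lookup that issues n such reads sees at least one slow read with probability 1 - (1 - p)^n. Below is a minimal Python sketch of that arithmetic. The function name and the values of p and n are hypothetical placeholders, not measurements from either site, and independence between reads is an assumption.

```python
def prob_at_least_one_slow(p_slow: float, n_reads: int) -> float:
    """Probability that at least one of n_reads independent reads is slow."""
    if not 0.0 <= p_slow <= 1.0:
        raise ValueError("p_slow must be between 0 and 1")
    if n_reads < 0:
        raise ValueError("n_reads must not be negative")
    # Complement of "every read is fast".
    return 1.0 - (1.0 - p_slow) ** n_reads


# Hypothetical inputs: one slow read per million, for lookups of varying size.
P_SLOW = 1e-6
for n_reads in (100, 10_000, 1_000_000):
    chance = prob_at_least_one_slow(P_SLOW, n_reads)
    print(f"{n_reads:>9} reads: {100.0 * chance:.4f}% chance of a slow read")
```

    The point of the sketch is only that a rare per-read event becomes much more likely per query as the number of block reads grows, which is why the tail of the wait histogram matters more for an SLA than the average.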
  • 4. Re: High Small read request on Exadata
    robinsc Explorer
    Hi there, the problem here is that in AWR the slow reads tend to get overwhelmed, since the number of occurrences is low, yet each occurrence will cause an SLA miss. So we actually found the issue by looking at the dba_hist views rather than the AWR report, trying to find out why a particular query from a particular session ran slow.
    We then correlated via cellcli to find the same.

    select WAIT_TIME_MILLI, WAIT_COUNT from v$event_histogram where event='cell single block physical read'


    WAIT_TIME_MILLI     WAIT_COUNT
    1     7982705
    2     152638
    4     46952
    8     90880
    16     94920
    32     20834
    64     5633
    128     1630
    256     296
    512     101
    1024     16
    2048     11
    4096     1
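    One way to read this histogram is to count how many waits land in the slow tail. Below is a minimal Python sketch using the numbers above. It assumes that a bucket labelled B holds waits shorter than B ms that did not fit in a smaller bucket, so the waits of at least 512 ms are those in the buckets labelled 1024 and up. The helper name is made up for illustration.

```python
# (WAIT_TIME_MILLI, WAIT_COUNT) pairs copied from the query output above.
HISTOGRAM = [
    (1, 7982705), (2, 152638), (4, 46952), (8, 90880), (16, 94920),
    (32, 20834), (64, 5633), (128, 1630), (256, 296), (512, 101),
    (1024, 16), (2048, 11), (4096, 1),
]


def waits_at_or_above(histogram, threshold_ms):
    """Count waits that lasted at least threshold_ms.

    Assumes each bucket B holds waits in the range [B/2, B) ms, so only
    buckets labelled above threshold_ms qualify. threshold_ms should be a
    bucket boundary (a power of two) for the count to be exact.
    """
    return sum(count for bucket, count in histogram if bucket > threshold_ms)


total_waits = sum(count for _, count in HISTOGRAM)

for threshold_ms in (128, 512, 1024):
    slow = waits_at_or_above(HISTOGRAM, threshold_ms)
    print(f"waits >= {threshold_ms:>4} ms: {slow:>4} of {total_waits} "
          f"({100.0 * slow / total_waits:.5f}%)")
```

    The counts in the tail are tiny relative to the total, which is consistent with the slow reads disappearing in averaged reports.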
  • 5. Re: High Small read request on Exadata
    tychos Expert
    Hi robinsc,
    It looks like there were only two moments when some cell disk had a high response time:
    2012-04-07T12:36
    2012-04-08T16:55
    Both were on the same host, "exass01" (which seems like a non-default cell name).
    Did you check the node to rule out a hardware failure?
    Regards,
    Tycho
  • 6. Re: High Small read request on Exadata
    robinsc Explorer
    The first one was on exass03, and this is the default naming that ACS generated for us. It seems "ss" stands for storage server, instead of the "cel" they used to use.
  • 7. Re: High Small read request on Exadata
    848750 Newbie
    Are there overlapping Smart Scans in flight?
  • 8. Re: High Small read request on Exadata
    Marc Fielding Journeyer
    Hi robinsc,

    Apologies for the late reply.

    Here are some numbers for yesterday on a production system that tends to saturate I/O frequently. This system doesn't have IORM in use at all, and we can see that >1s response times do exist.

    $ dcli -g cell_group -l cellmonitor cellcli -e "list metrichistory where metricvalue '>500000' and Name like 'CD_IO_TM_R_SM_RQ'" > cellcli.out
    $ cat cellcli.out | grep 2012-05-29T | tr -d , | sort -nk4 | tail

    dm1c06: CD_IO_TM_R_SM_RQ CD_00_dm1c06 2848684 us/request 2012-05-29T21:56:25+00:00
    dm1c08: CD_IO_TM_R_SM_RQ CD_00_dm1c08 3032456 us/request 2012-05-29T19:17:41+00:00
    dm1c09: CD_IO_TM_R_SM_RQ CD_01_dm1c09 3070021 us/request 2012-05-29T04:23:11+00:00
    dm1c07: CD_IO_TM_R_SM_RQ CD_00_dm1c07 3166033 us/request 2012-05-29T18:56:32+00:00
    dm1c11: CD_IO_TM_R_SM_RQ CD_00_dm1c11 3213968 us/request 2012-05-29T02:26:13+00:00
    dm1c07: CD_IO_TM_R_SM_RQ CD_00_dm1c07 3407311 us/request 2012-05-29T04:31:20+00:00
    dm1c05: CD_IO_TM_R_SM_RQ CD_00_dm1c05 3642966 us/request 2012-05-29T16:30:08+00:00
    dm1c10: CD_IO_TM_R_SM_RQ CD_01_dm1c10 3701307 us/request 2012-05-29T03:32:23+00:00
    dm1c07: CD_IO_TM_R_SM_RQ CD_01_dm1c07 3843500 us/request 2012-05-29T23:01:37+00:00
    dm1c10: CD_IO_TM_R_SM_RQ CD_01_dm1c10 4809542 us/request 2012-05-29T02:31:22+00:00
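    If anyone wants to summarize this kind of output per cell rather than eyeball the sorted tail, here is a minimal Python sketch. It assumes the six-field line layout shown above (cell, metric name, cell disk, value, unit, timestamp); the function names are hypothetical, and the sample is a few lines copied from the listing.

```python
# A few lines copied from the dcli output above.
SAMPLE_OUTPUT = """\
dm1c07: CD_IO_TM_R_SM_RQ CD_00_dm1c07 3166033 us/request 2012-05-29T18:56:32+00:00
dm1c07: CD_IO_TM_R_SM_RQ CD_01_dm1c07 3843500 us/request 2012-05-29T23:01:37+00:00
dm1c10: CD_IO_TM_R_SM_RQ CD_01_dm1c10 3701307 us/request 2012-05-29T03:32:23+00:00
dm1c10: CD_IO_TM_R_SM_RQ CD_01_dm1c10 4809542 us/request 2012-05-29T02:31:22+00:00
"""


def parse_line(line):
    """Return (cell, disk, microseconds, timestamp), or None if the line
    does not match the assumed six-field layout."""
    fields = line.split()
    if len(fields) != 6 or fields[4] != "us/request":
        return None
    try:
        # Values may carry thousands separators, e.g. 548,574.
        micros = int(fields[3].replace(",", ""))
    except ValueError:
        return None
    return fields[0].rstrip(":"), fields[2], micros, fields[5]


def worst_per_cell(text):
    """Map each cell to its slowest sample as (microseconds, disk, timestamp)."""
    worst = {}
    for line in text.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        cell, disk, micros, timestamp = parsed
        if cell not in worst or micros > worst[cell][0]:
            worst[cell] = (micros, disk, timestamp)
    return worst


for cell, (micros, disk, timestamp) in sorted(worst_per_cell(SAMPLE_OUTPUT).items()):
    print(f"{cell}: {micros / 1e6:.3f} s on {disk} at {timestamp}")
```

    Keeping the value in microseconds until printing avoids floating point comparisons when picking the maximum.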

    SQL> select WAIT_TIME_MILLI, WAIT_COUNT from v$event_histogram where event='cell single block physical read';

    WAIT_TIME_MILLI WAIT_COUNT
    --------------- ----------
    1 9102589811
    2 276214328
    4 67696074
    8 54400628
    16 51409920
    32 16854159
    64 3894203
    128 1800957
    256 973878
    512 482764
    1024 253573
    2048 106873
    4096 21850
    8192 3137
    16384 117
    32768 6

    So while there aren't many of them, slow single-block read requests do exist on this system.

    If you have a strict per-query response time SLA though, I'd highly recommend trying out a MIN_LATENCY objective and seeing if you can tolerate the throughput impact.

    Cheers,

    Marc
  • 9. Re: High Small read request on Exadata
    robinsc Explorer
    I got an answer back from Oracle support. Even their test system has slow response times, so it seems that setting an alert on this metric is a recipe for late night emergency panic calls.

    We do seem to be getting more even response times with io_objective set to auto.
