5 Replies. Latest reply: Jul 19, 2013 8:08 AM by jsavit

Problem with I/O in LDOM

955397 Newbie

Hi there,

 

OS in LDOM: Solaris 10u10

OS in primary: Solaris 11 11/11

 

For the past week the system in an LDOM has been reacting unacceptably slowly; before that it ran for a year without problems. Every command or application routine slows the system down so much that a simple 'du -sh /*' takes hours to complete.

No apps running in LDOM, all zones are halted.

No error messages. What I see:

 

ldom# iostat -xd 5 300
                    extended device statistics
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
vdc0      0.8    5.6    1.1  228.0  0.0  7.6 3185.7   0 100

 

At the same time the underlying disks in the primary domain:

primary domain# iostat -xd 5 300
                    extended device statistics
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd7       0.0  286.8    0.0 3645.2  0.0  5.6   19.5   0  95
sd8       0.0  287.6    0.0 3641.5  0.0  5.2   18.0   0  88

 

There's a big difference between the amount of data written in the LDOM and the amount written in the primary domain.

The vdisk in the LDOM is an exported volume in a ZFS pool in the primary domain, made up of two mirrored disks. There are no other datasets in that pool.


Tracing with iosnoop from the DTrace Toolkit, I saw heavy write I/O load from the process zpool-ldom10. In the LDOM I see only a few write calls, from the processes zpool-rpool and sched.
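
For anyone who wants to reproduce the comparison: one-liners roughly like these (a sketch, not my exact iosnoop invocation; the aggregation keys are just for illustration) show which processes request writes in the guest and which processes drive the physical I/O in the primary:

# in the LDOM: bytes requested per writing process, over 30 seconds
ldom# dtrace -n 'syscall::write:entry,syscall::pwrite:entry { @[execname] = sum(arg2); } tick-30s { exit(0); }'

# in the primary: bytes issued to the backend disks, per process and device
primary domain# dtrace -n 'io:::start { @[execname, args[1]->dev_statname] = sum(args[0]->b_bcount); } tick-30s { exit(0); }'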

 

Does anyone have an idea what causes this extra traffic, and how I can trace it further?

 

Thank you & regards,

 

Michael

  • 1. Re: Problem with I/O in LDOM
    jsavit Newbie

    Hi Michael,

     

    Could you figure out who is driving the writes from the guest domain? Maybe run 'prstat -Lm' to see who has a lot of system calls. The zpool-* processes are just workers handling load created by somebody else; there must be somebody asking for the I/O to be done. Also, could you send the output of 'zpool list' and 'zpool status'? Is this RAIDZ or a mirror? Is it close to full? Are there scrub or resilver operations going on? There are many things that can affect ZFS performance. It would also be good to see the domain definition, as in 'ldm list -l $MYDOMAIN'.
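
    Something along these lines would collect the basics (just a sketch; substitute your actual domain and pool names):

    ldom# prstat -Lm 5                      # per-thread microstates in the guest; look for high SCL/SYS
    primary domain# zpool list              # pool capacity and usage
    primary domain# zpool status            # pool layout, errors, scrub/resilver activity
    primary domain# ldm list -l $MYDOMAIN   # full domain definition, including vdisk backends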

     

    I hope that's helpful to get started.... Jeff

  • 2. Re: Problem with I/O in LDOM
    955397 Newbie

    Hi Jeff,

     

    thank you for answering.

    No, I couldn't figure it out because the problem disappeared in the meantime. I still have no idea why.

     

    I traced the read and write system calls with DTrace, and what I saw confused me: in the global zone I had a lot of I/O from the zpool-* process, while at the same time in the non-global zone I saw only 2%-5% of the syscalls. That's what puzzled me.
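
    Roughly this kind of one-liner gives that per-zone breakdown (a sketch, not the exact script I used; run it in the instance whose zones you want to compare):

    ldom# dtrace -n 'syscall::read:entry,syscall::write:entry { @[zonename, execname] = count(); } tick-30s { exit(0); }'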

     

    The pool is healthy, and I/O from the global zone to another dataset in the same pool shows no problems. The pool was 60% full, and it's a two-way mirror.

     

    It was not a ZFS problem in general. I just had no idea where to look further, because:

    "The zpool-* processes are just workers handling load created by somebody else. There must be somebody asking for I/O to be done..."

    Yes, but I couldn't figure out which process in the zone was asking for the I/O.

     

    But for now everything works fine. Maybe the problem will "strike back" in the future; if it does, you will read more here :-)

     

    Regards,

     

    Michael

  • 3. Re: Problem with I/O in LDOM
    jsavit Newbie

    Hi Michael,

     

    I'm glad the problem went away - too bad we don't know why it happened the first time. If it reoccurs, please check back here, or better yet, on the part of the forums that deals with ZFS. Another thing to do is subscribe to the zfs-discuss@solaris-zfs.java.net mailing list; you can post questions there, where very ZFS-knowledgeable people are likely to see them.

     

    regards, Jeff

  • 4. Re: Problem with I/O in LDOM
    955397 Newbie

    Hi Jeff,

     

    the problem occurred again two weeks later.

    It seems we ran into a bug:

    -------------

    ZFS ARC can shrink down without memory pressure result in slow performance [ID 1404581.1]

    -------------

     

    I created an SR and got help to discover what was happening here. Currently I'm updating to 11.1. Hopefully the problem will fade away then.


    Regards,


    Michael


  • 5. Re: Problem with I/O in LDOM
    jsavit Newbie

    Sorry that the problem reoccurred, and there are definitely a lot of fixes in Solaris 11.1 - make sure you go to a recent SRU and check out the README information.  I'm still concerned about the very high service times, though. Are the backend disks zvols within a local (on the control domain) ZFS pool?  Check that the pool isn't too full, as that will affect performance.

     

    Download arcstat.pl and arc_summary.pl if you don't have them; they give very good information about the current and recent sizes of the ARC and about the cache hit and miss statistics. Please post how things turn out.
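
    For example (a rough sketch; adjust the path to wherever you saved the scripts):

    primary domain# kstat -p zfs:0:arcstats:size zfs:0:arcstats:c zfs:0:arcstats:c_max   # current ARC size vs. target and maximum
    primary domain# ./arcstat.pl 5                                                       # one summary line every 5 seconds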

     

    regards, Jeff
