user13465954 wrote: And this has to do with Java ... How exactly?
I'll be investigating and reading up further on dtrace but in the meantime if anyone could point me in the direction of a dtrace command which could enlighten me as to the IO problem it would be appreciated.
Moved from the Java Programming forum to the Dtrace Forum
Can you show iostat output from when the database shows normal response time and from when it shows bad response time?
I've found the following syntax useful:
~/DTraceToolkit/DTraceToolkit-0.99/iosnoop -o -m /u06 -vN
(Note: this was on a V490 and LUNs on a 6140 SAN.)
It's my understanding that:
kstat -m cpu_stat | grep "iowait "
will show the number of iowaits per CPU (that is, the number of I/Os waiting on each CPU at the moment the command is run). I see mostly 0's on my V490 Oracle DB server, but if I run the command a dozen times I can usually catch a 1 on one or two of the 8 CPUs in the box.
I do not remember where I found the above command but it came with this explanation:
"This field is incremented whenever biowait() is called and decremented before return. biowait() is documented in biowait(9f)."
It's hard to diagnose IO performance issues across the internet, but here are some things you need to be looking at:
1. When you're seeing your IO performance problem, what does the output of "iostat -sndxz 2" look like?
2. How are the LUNs laid out on your storage?
3. Do multiple LUNs share the same physical disks (bad for performance)?
4. Are your IO operations aligned with the LUN blocksize?
5. What kind(s) of LUNs do you have? RAID-1? RAID-5?
6. What kind of disks? SATA? FC? SAS?
It's not that hard to take a supposedly-high-performance disk system and make it run really slow. Something like a lot of really small random writes to several LUNs built with large block sizes all sharing the same drives in a RAID-5 array is a real good way to do just that.
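To make the RAID-5 small-write point concrete, here is a back-of-the-envelope sketch. The drive count and per-drive IOPS below are illustrative assumptions, not figures from this thread; the write penalty of 4 is the classic RAID-5 cost for a small random write (read old data, read old parity, write new data, write new parity).

```python
# Back-of-the-envelope RAID-5 small random-write throughput.
# Assumptions (illustrative): 8 spindles, ~150 random IOPS each,
# small-write penalty of 4 back-end I/Os per front-end write.

def raid5_random_write_iops(drives: int, iops_per_drive: float) -> float:
    """Effective small random-write IOPS for a RAID-5 set."""
    WRITE_PENALTY = 4  # read data, read parity, write data, write parity
    return drives * iops_per_drive / WRITE_PENALTY

print(raid5_random_write_iops(8, 150))  # 300.0
```

So an 8-drive RAID-5 group that looks like 1200 IOPS of raw spindle capacity delivers only around 300 small random-write IOPS, and it gets worse if several LUNs share those same drives.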
Thanks for the replies and the example commands; I appreciate it's almost impossible to diagnose over the board.
Still need to do some serious digging and reading on dtrace to find the reason for the slow performance but will post a solution if I ever find it.
Thanks, I'll try and go through them one by one.....
The %b/busy column in iostat is commonly misunderstood. There is a good document in the My Oracle Support knowledge base explaining why busy sits at 100%. It's hard to generalise, but if %b is 100% with few other signs of trouble, it's normally the application (the database in this case) not utilising the hardware, e.g. doing things in serial instead of in parallel.
What Does %b (or %Busy) Actually Mean in the Output of iostat(1M)? [ID 1003635.1]
The 'dtrace: xxxx dynamic variable drops' messages mean the internal dtrace log buffers are overfilling. Off the top of my head, the easiest ways to address this are to reduce the amount of data you are logging, increase the buffer size, or increase the rate at which these buffers are flushed. Try looking at the following two documents. You may be interested in bufsize, dynvarsize and cleanrate.
No idea where 720 IOPS came from. A very rough rule of thumb is that if you are getting 500 IOPS from a standard disk you are doing well. With some workloads you may only see 150 IOPS. If you look at the specs for some of the arrays, you're talking 100k IOPS if not more.
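The per-disk numbers above fall out of the mechanics: random IOPS for a spinning disk are roughly 1 / (average seek time + average rotational latency). The seek and RPM figures below are generic datasheet-style assumptions, not measurements from this thread.

```python
# Rough per-spindle random IOPS from mechanical latencies.
# Illustrative datasheet-style figures; real workloads vary widely.

def disk_iops(rpm: float, avg_seek_ms: float) -> float:
    """Approximate random IOPS: 1 / (avg seek + avg rotational latency)."""
    rot_latency_ms = 60_000 / rpm / 2  # half a revolution on average
    return 1000 / (avg_seek_ms + rot_latency_ms)

print(round(disk_iops(15_000, 3.5)))  # 182 -- a fast 15k FC/SAS disk
print(round(disk_iops(7_200, 8.5)))   # 79  -- a 7.2k SATA disk
```

That's why a single SATA spindle under a random workload can land nearer 100-150 IOPS than 500, and why array-level figures in the tens of thousands come from cache and many spindles, not from any one disk.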
Stuff changes all the time, but last time I looked, the response times in iostat are the amount of time an I/O is on the wire PLUS the time to complete that I/O in software PLUS the time to schedule the next I/O. So take the numbers iostat reports and compare them to those the array is reporting. Are they the same (array saturated?), different (software issue?), or something else?
Anyway, play with dtrace and see what you can work out.
All rather interesting for a nerd.