High iowait is not something to be concerned about; it is simply idle CPU time during which there was at least one pending I/O request. It is not a very good indicator of, well, anything really. Do you actually see heavy I/O load on the disks during this period when you look with 'sar' or 'iostat'?
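To make the point about iowait concrete, here is a minimal sketch of how the percentage is derived. It assumes a Linux host, where the aggregate `cpu` line of /proc/stat lists cumulative ticks for user, nice, system, idle, iowait, irq, softirq and steal. The function name and the two sample lines are made up for illustration and are not taken from the poster's machines.

```python
def cpu_shares(before: str, after: str) -> dict:
    """Percentage of CPU time per state between two 'cpu' lines of /proc/stat."""
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]
    # Skip the leading "cpu" label and keep the first eight counters.
    a = [int(x) for x in before.split()[1:9]]
    b = [int(x) for x in after.split()[1:9]]
    deltas = [y - x for x, y in zip(a, b)]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples must be taken at two different times")
    return {name: 100.0 * d / total for name, d in zip(names, deltas)}


# Hypothetical samples taken a few seconds apart (values are illustrative only).
sample_1 = "cpu 1000 0 500 8000 500 0 0 0 0 0"
sample_2 = "cpu 1100 0 550 8600 750 0 0 0 0 0"
shares = cpu_shares(sample_1, sample_2)
print(shares["idle"], shares["iowait"])
```

In this made-up interval the CPU did no work for 85% of the time; the kernel merely books part of that idle time under iowait because an I/O request happened to be outstanding. That is why the disk statistics from 'sar' or 'iostat' are the better thing to look at.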
It is a little surprising that a checkpoint on one machine would impact the other but I have some questions/comments on your setup if I may:
1. Your checkpointing parameters seem very strangely set. You only checkpoint every 10 hours or after many GB of log generation. How quickly do these datastores generate log data under normal load? You should be aiming to checkpoint every 5-10 minutes typically.
2. How many concurrent connections do you typically have to each datastore under normal conditions (as reported by ttStatus)? Hopefully it is <<150...
3. You say that these machines have just 2 cores but your setup (LogBufParallelism setting, number of connections, use of parallel replication) suggests a setup for a machine with a much higher number of cores...
4. Are you using RETURN RECEIPT or RETURN TWOSAFE replication?
1) We ran performance tests just today. After 7 hours of load the 'session' database wrote 361 log files (30 checkpoints), i.e. one checkpoint every 14 minutes.
2) 39 and 38 connections. We do not use multithreading; every process opens only one connection to each database.
3) I was wrong. The server has 2 CPUs with 4 cores each, 8 cores in total.
4) We don't use RETURN RECEIPT or RETURN TWOSAFE.
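The figures in answer 1 can be checked with a few lines of arithmetic. This is only a sketch: the helper name is made up, and the log file size is not stated in the thread, so it reports counts of log files rather than volume.

```python
def checkpoint_stats(hours: float, log_files: int, checkpoints: int) -> dict:
    """Derive average checkpoint interval and log generation rate for a test run."""
    return {
        "minutes_per_checkpoint": hours * 60 / checkpoints,
        "log_files_per_hour": log_files / hours,
        "log_files_per_checkpoint": log_files / checkpoints,
    }


# Numbers reported for the 7 hour load test on the 'session' datastore.
stats = checkpoint_stats(hours=7, log_files=361, checkpoints=30)
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

That works out to one checkpoint every 14 minutes on average, at roughly 52 log files per hour and about 12 log files per checkpoint.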
I can publish charts of the CPU load. I don't see high I/O load on the disks.
http://www.digilo.com/images/cpu.png - CPU chart. As you can see, iowait does not scale with load.
Edited by: Vladimir Romanov on 15.03.2012 20:27
Okay. I don't see any reason to be concerned about iowait. It's a normal thing and doesn't tell us much about anything. The only 'issue' as far as I can tell is your statement that when a checkpoint occurs on the replication subscriber machine you see a CPU peak on the replication primary machine. This seems very unlikely, so can I ask whether you have checked for any correlation between the CPU peaks on the primary and other activity on the primary (for example, maybe one of the datastores on the primary is checkpointing at that time)? A checkpoint is a fairly CPU-intensive operation and can impact the application if the system is not well balanced.
One concern I have is that you have two datastores apparently sharing the same filesystem for both checkpoints and log files. This is definitely not recommended. Can you tell me:
1. What kind of disk storage are you using here? I'm hoping it is something like a RAID-10 stripe across several 15k rpm disks together with a cached hardware RAID controller and not just plain internal disks...
2. Are the checkpoint and log files for both datastores really in the same filesystem as it appears from the DSN definitions?
3. What kind of load (log files per hour) does the 'spr' datastore generate? Is this workload being run concurrently with the workload on the 'session' datastore during your tests?
Edited by: ChrisJenkins on Mar 16, 2012 11:09 AM