I recommend you read the note *CSS Timeout Computation in Oracle Clusterware [ID 294430.1]* on MOS.
This note will help you:
- Define the misscount parameter
- Define the default calculations for the misscount parameter
- Describe Cluster Synchronization Service (CSS) heartbeats and their interrelationship
- Describe the cases where the default calculation may be too sensitive
CSS Timeout Computation in Oracle Clusterware
The CSS misscount parameter represents the maximum time, in seconds, that a network heartbeat can be missed before entering into a cluster reconfiguration to evict the node.
I need the timeout parameter for the I/O timeouts.
See my logfile:
[cssd(11473)]CRS-1615:No I/O has completed after 50% of the maximum interval. Voting file ORCL:VOTE1 will be considered not functional in 99760 milliseconds
[cssd(11473)]CRS-1614:No I/O has completed after 75% of the maximum interval. Voting file ORCL:VOTE1 will be considered not functional in 49760 milliseconds
[cssd(11473)]CRS-1613:No I/O has completed after 90% of the maximum interval. Voting file ORCL:VOTE1 will be considered not functional in 19760 milliseconds
[cssd(11473)]CRS-1649:An I/O error occured for voting file: ORCL:VOTE1; details at (:CSSNM00059:) in /crs/log/host1/cssd/ocssd.log.
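As a rough check on the log above, the three warnings can be used to back out the "maximum interval" they refer to. The Python sketch below is only an illustration: the regular expression and the function names are my own, and it assumes all three warnings belong to the same stalled I/O.

```python
import re

# The three warning lines from the log above.
LOG = """\
[cssd(11473)]CRS-1615:No I/O has completed after 50% of the maximum interval. Voting file ORCL:VOTE1 will be considered not functional in 99760 milliseconds
[cssd(11473)]CRS-1614:No I/O has completed after 75% of the maximum interval. Voting file ORCL:VOTE1 will be considered not functional in 49760 milliseconds
[cssd(11473)]CRS-1613:No I/O has completed after 90% of the maximum interval. Voting file ORCL:VOTE1 will be considered not functional in 19760 milliseconds
"""

# Hypothetical pattern written for these messages only.
PATTERN = re.compile(r"after (\d+)% of the maximum interval\..*? in (\d+) milliseconds")


def parse_warnings(text):
    """Return (percent_elapsed, remaining_ms) for each warning line."""
    return [(int(pct), int(ms)) for pct, ms in PATTERN.findall(text)]


def estimate_interval_ms(warnings):
    """Estimate the maximum interval from the first and last warning.

    Between two warnings the elapsed share grows by (p2 - p1) percent while
    the remaining time shrinks by (r1 - r2) ms, so the full interval is
    (r1 - r2) * 100 / (p2 - p1).
    """
    (p1, r1), (p2, r2) = warnings[0], warnings[-1]
    return (r1 - r2) * 100 / (p2 - p1)


warnings = parse_warnings(LOG)
print(estimate_interval_ms(warnings) / 1000)  # 200.0 (seconds)
```

So the interval behind these messages works out to 200 seconds. As far as I know that matches the usual default for the CSS disktimeout setting, but that is an assumption on my part; the value actually in effect can be checked with `crsctl get css disktimeout`.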
This note will help you with the I/O timeout, but I believe that is not your problem.
The synchronization services component (CSS) of the Oracle Clusterware maintains two heartbeat mechanisms: (1) the disk heartbeat to the voting device and (2) the network heartbeat across the interconnect. Together they establish and confirm valid node membership in the cluster. Both of these heartbeat mechanisms have an associated timeout value. The disk heartbeat has an internal I/O timeout interval (DTO, Disk TimeOut), in seconds, within which an I/O to the voting disk must complete. The misscount parameter (MC), as stated above, is the maximum time, in seconds, that a network heartbeat can be missed. The disk heartbeat I/O timeout interval is directly related to the misscount parameter setting.
Modifying the default value of misscount not only influences the timeout interval for the i/o to the voting disk, but also influences the tolerance for missed network heartbeats across the interconnect.
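To make the two timeouts a bit more concrete, here is a toy Python model of the checks described above. It is not how CSS is implemented: the class and function names are made up, the values 30 and 200 seconds are example inputs I am assuming (check yours with `crsctl get css misscount` and `crsctl get css disktimeout`), and it treats the two values as independent inputs without modelling how Clusterware derives one from the other.

```python
from dataclasses import dataclass


@dataclass
class CssTimeouts:
    misscount_s: float     # MC: longest tolerated gap in network heartbeats
    disk_timeout_s: float  # DTO: longest time a voting disk I/O may take


def check_heartbeats(timeouts, net_gap_s, disk_io_age_s):
    """Return a list of the timeouts that have been exceeded (toy model)."""
    problems = []
    if net_gap_s > timeouts.misscount_s:
        problems.append("network heartbeat missed longer than misscount")
    if disk_io_age_s > timeouts.disk_timeout_s:
        problems.append("voting disk I/O exceeded disk timeout")
    return problems


# Example values only (assumed, not read from a real cluster).
example = CssTimeouts(misscount_s=30, disk_timeout_s=200)

# A voting disk I/O stalled for 120 s while the interconnect is healthy.
print(check_heartbeats(example, net_gap_s=1, disk_io_age_s=120))
# The same stall lasting 201 s.
print(check_heartbeats(example, net_gap_s=1, disk_io_age_s=201))
```

With these example values, the first call reports nothing and the second reports only the disk timeout. That is the point of the sketch: a storage stall and an interconnect problem are judged against different limits.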
Misscount should NOT be modified to work around the issues listed below:
QLogic HBA cards with a Link Down Timeout greater than the default misscount.
Bad cables to the SAN/storage array that affect I/O latencies
SAN switch (like Brocade) failover latency greater than the default misscount
EMC Clariion Array when trespassing the SP to the backup SP greater than default misscount
EMC PowerPath path error detection and I/O repost and redirect greater than default misscount
Poor SAN network configuration that creates latencies in the I/O path.
"So I configured external redundancy for my OCR diskgroup, where my voting file is located. When one of the mirrored storage systems gets switched off, the system hangs for nearly 2 minutes."
As you are using external redundancy, Oracle does not know that there is a mirrored disk behind it.
Perhaps the OS or the storage layer is holding I/O when you stop the mirroring, due to a misconfiguration. I believe this problem is related to the OS or the storage, not to Oracle Clusterware.
If you perform this test with a diskgroup (external redundancy) that stores data, you will get the same result.
Did you solve your problem? We have the same issue on our 4-node RAC when doing a failover in the SAN virtualisation appliance.
Greetings from Tyrol