4 Replies. Latest reply: Mar 29, 2013 1:19 AM by 907779

    Modify timeout for voting files

    Christian
      Hey, is there a way to modify the default timeout for the voting files? The idea: I have a (soft) grid storage, so I configured external redundancy for my OCR diskgroup, where my voting file is located.

      When one of the mirrored storage units gets switched off, the system hangs for nearly 2 minutes. The voting timeout is set to 99 seconds, right? This should be the default setting, at least according to the CRS alert log.

      Is it possible to modify this value?

      At the moment, the database gets shut down once the timeout expires.

      Christian
        • 1. Re: Modify timeout for voting files
          Levi Pereira
          Hi,
          I recommend you read the note *CSS Timeout Computation in Oracle Clusterware [ID 294430.1]* on MOS.

          This note will help you:

          - Define the misscount parameter
          - Define the default calculations for the misscount parameter
          - Describe Cluster Synchronization Service (CSS) heartbeats and their interrelationship
          - Describe the cases where the default calculation may be too sensitive

          CSS Timeout Computation in Oracle Clusterware
          The CSS misscount parameter represents the maximum time, in seconds, that a network heartbeat can be missed before entering into a cluster reconfiguration to evict the node.

          Regards,
          Levi Pereira
          • 2. Re: Modify timeout for voting files
            Christian
            Hey Levi,

            I need the timeout parameter for the I/O timeouts.

            See my logfile:
            [cssd(11473)]CRS-1615:No I/O has completed after 50% of the maximum interval. Voting file ORCL:VOTE1 will be considered not functional in 99760 milliseconds
            2011-12-20 20:57:04.614
            [cssd(11473)]CRS-1614:No I/O has completed after 75% of the maximum interval. Voting file ORCL:VOTE1 will be considered not functional in 49760 milliseconds
            2011-12-20 20:57:34.610
            [cssd(11473)]CRS-1613:No I/O has completed after 90% of the maximum interval. Voting file ORCL:VOTE1 will be considered not functional in 19760 milliseconds
            2011-12-20 20:57:41.140
            [cssd(11473)]CRS-1649:An I/O error occured for voting file: ORCL:VOTE1; details at (:CSSNM00059:) in /crs/log/host1/cssd/ocssd.log.
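The messages above can be used to estimate the maximum interval itself. The following is a rough Python sketch, assuming the percentage in each CRS-1615/1614/1613 message is measured against the full voting-file I/O timeout; the function name `estimate_max_interval_ms` is made up for this illustration. Under that assumption the three messages each point to an interval of roughly 200 seconds, so the 99 seconds in the first message would be the time remaining at the 50% mark rather than the timeout itself.

```python
import re

# The three warning lines from the alert log above (timestamps omitted).
LOG_LINES = [
    "[cssd(11473)]CRS-1615:No I/O has completed after 50% of the maximum interval. "
    "Voting file ORCL:VOTE1 will be considered not functional in 99760 milliseconds",
    "[cssd(11473)]CRS-1614:No I/O has completed after 75% of the maximum interval. "
    "Voting file ORCL:VOTE1 will be considered not functional in 49760 milliseconds",
    "[cssd(11473)]CRS-1613:No I/O has completed after 90% of the maximum interval. "
    "Voting file ORCL:VOTE1 will be considered not functional in 19760 milliseconds",
]

PATTERN = re.compile(
    r"after (\d+)% of the maximum interval.*? in (\d+) milliseconds"
)


def estimate_max_interval_ms(line):
    """Estimate the full interval from 'elapsed percent' and 'remaining ms'.

    Assumption: remaining = total * (100 - elapsed_percent) / 100.
    Returns None if the line is not one of the percentage warnings.
    """
    match = PATTERN.search(line)
    if match is None:
        return None
    elapsed_percent = int(match.group(1))
    remaining_ms = int(match.group(2))
    if elapsed_percent >= 100:
        return None  # nothing left to extrapolate from
    return remaining_ms * 100 / (100 - elapsed_percent)


estimates = [estimate_max_interval_ms(line) for line in LOG_LINES]
for estimate in estimates:
    print(f"estimated maximum interval: {estimate / 1000:.1f} s")
```

The estimates differ slightly from each other, presumably because each message is written a little after the threshold is actually crossed.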
            • 3. Re: Modify timeout for voting files
              Levi Pereira
              Hi,
              This note helps with the I/O timeout, but I believe that is not your problem.

              See:
              The synchronization services component (CSS) of the Oracle Clusterware maintains two heartbeat mechanisms 1.) the disk heartbeat to the voting device and 2.) the network heartbeat across the interconnect which establish and confirm valid node membership in the cluster. Both of these heartbeat mechanisms have an associated timeout value. The disk heartbeat has an internal i/o timeout interval (DTO Disk TimeOut), in seconds, where an i/o to the voting disk must complete. The misscount parameter (MC), as stated above, is the maximum time, in seconds, that a network heartbeat can be missed. The disk heartbeat i/o timeout interval is directly related to the misscount parameter setting.

              Modifying the default value of misscount not only influences the timeout interval for the i/o to the voting disk, but also influences the tolerance for missed network heartbeats across the interconnect.
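To see what the two timeouts are currently set to, the values can be read with `crsctl get css misscount` and `crsctl get css disktimeout` on a cluster node. Below is a minimal Python sketch that only reads the values and changes nothing. It assumes `crsctl` is on the PATH of the user running it; the helper names `build_get_command` and `query_css_parameter` are hypothetical and not part of any Oracle tooling.

```python
import shutil
import subprocess

# The two CSS timeouts discussed in the note.
CSS_PARAMETERS = ("misscount", "disktimeout")


def build_get_command(parameter):
    """Return the argv for a read-only query of one CSS parameter."""
    if parameter not in CSS_PARAMETERS:
        raise ValueError(f"unsupported CSS parameter: {parameter}")
    return ["crsctl", "get", "css", parameter]


def query_css_parameter(parameter):
    """Run the query and return crsctl's raw output.

    Returns None when crsctl is not found on the PATH (for example when
    this is run on a machine that is not a cluster node).
    """
    command = build_get_command(parameter)
    if shutil.which("crsctl") is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout.strip()


if __name__ == "__main__":
    for name in CSS_PARAMETERS:
        print(name, "->", query_css_parameter(name))
```

The output is returned unparsed on purpose, since the exact message text may differ between Clusterware versions.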

              Misscount should NOT be modified to work around the issues listed below:
              QLogic HBA cards with a Link Down Timeout greater than the default misscount.
              Bad cables to the SAN/storage array that affect I/O latencies
              SAN switch (like Brocade) failover latency greater than the default misscount
              EMC Clariion Array when trespassing the SP to the backup SP greater than default misscount
              EMC PowerPath path error detection and I/O repost and redirect greater than default misscount
              Poor SAN network configuration that creates latencies in the I/O path.
              You wrote: "So I configured external redundancy for my ocr diskgroup where my votingfile is located. When one of the mirrored storaged gets switched off, the system hungs for nearly 2 minutes."
              As you are using external redundancy, Oracle does not know that there is a mirrored disk behind it.
              Perhaps the OS or the storage is holding I/O when you stop the mirroring, due to a misconfiguration. I believe this problem is related to the OS or the storage, not to Oracle Clusterware.
              If you perform this test with a diskgroup (external redundancy) that stores data, you will get the same result.


              Regards,
              Levi Pereira
              • 4. Re: Modify timeout for voting files
                907779
                Hi Christian,

                Did you solve your problem? We have the same issue on our 4-node RAC while doing a failover in the SAN virtualisation appliance.

                Greetings from Tyrol
                Stefan