8 Replies Latest reply on Jan 4, 2012 1:39 PM by andy.colvin

    ASM High redundancy on Quarter rack should be possible

      I was reading the whitepaper "Best Practices For Database Consolidation On Exadata Database Machine" (http://www.oracle.com/technetwork/database/features/availability/exadata-consolidation-522500.pdf) and came across this statement on page 5:

      "There is also the option of deploying a smaller pool consisting of an Exadata X2-2 quarter rack for entry-level consolidation of non mission critical systems. This option is discouraged for business critical requirements because it does not fully support ASM high redundancy disk groups (full support for ASM high redundancy disk groups requires a minimum of Exadata X2-2 half rack)."

      Since a quarter rack has 3 storage cells, it seems to me that high redundancy should be possible. Can someone shed some light on the origin and meaning of this statement?
        • 1. Re: ASM High redundancy on Quarter rack should be possible
          You can't have high redundancy on all diskgroups in a quarter rack because of the requirements for OCR/voting disks when they reside within ASM. For OCR and voting disks to be placed in a high redundancy ASM diskgroup, you must have 5 failgroups. Because a quarter rack only has 3 cells, one of your diskgroups (most likely DBFS_DG) will have to be normal redundancy to meet the OCR and voting disk requirements.

          It is possible to have DATA and RECO be high redundancy on a quarter rack, but they cannot hold OCR and voting disks.
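To make the arithmetic concrete, here is a minimal Python sketch of the failgroup check described above. The minimums for a diskgroup holding voting disks (3 for normal, 5 for high) are the figures discussed in this thread; the minimums for ordinary diskgroups (2 and 3) are the standard ASM ones. The function and table names are made up for illustration and are not part of any Oracle tool.

```python
# Minimum failgroups when the diskgroup holds voting disks (figures from this thread).
MIN_FAILGROUPS_WITH_VOTING = {"normal": 3, "high": 5}

# Minimum failgroups for a diskgroup that holds only ordinary files.
MIN_FAILGROUPS_DATA_ONLY = {"normal": 2, "high": 3}


def can_host(cells: int, redundancy: str, holds_voting: bool) -> bool:
    """Return True if `cells` failgroups (one per storage cell) are enough."""
    table = MIN_FAILGROUPS_WITH_VOTING if holds_voting else MIN_FAILGROUPS_DATA_ONLY
    return cells >= table[redundancy]


QUARTER_RACK_CELLS = 3  # X2-2 quarter rack

print(can_host(QUARTER_RACK_CELLS, "high", holds_voting=False))   # DATA / RECO: True
print(can_host(QUARTER_RACK_CELLS, "high", holds_voting=True))    # voting diskgroup: False
print(can_host(QUARTER_RACK_CELLS, "normal", holds_voting=True))  # DBFS_DG fallback: True
```

With only 3 cells, the high redundancy voting-disk case is the only one that fails, which matches the whitepaper's "does not fully support" wording.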
          • 2. Re: ASM High redundancy on Quarter rack should be possible
            Just to understand what happens in Exadata: when I install the grid home on a quarter rack, it should show me a minimum of 36 candidate grid disks, belonging to 3 failgroups, on which to place the OCR. (I assume I can create some smaller grid disks specifically for OCR and voting if I want.) Will I be able to choose high redundancy and then select 5 grid disks from among the three failgroups, or will it prevent me from doing so?

            I would then be protected against failure of 2 full cells or 4 individual grid disks, but not 4 individual cells. I assume that ASM high redundancy is supposed to protect against failure of 4 individual cells simultaneously (which is a very, very rare occurrence, I would think :) ).

            • 3. Re: ASM High redundancy on Quarter rack should be possible
              ASM HIGH redundancy protects against 2 simultaneous failures (because there are 3 copies of each extent). I'm not sure where you're getting 4 failures - ASM can only handle a max of 2 simultaneous failures with HIGH redundancy - no more than 2.
              • 4. Re: ASM High redundancy on Quarter rack should be possible
                Dan is correct. Even though you have 5 failure groups with a high redundancy diskgroup containing OCR and voting (3 copies of the OCR, 5 voting disks), the standard rules for ASM redundancy are still in play. That means that the high redundancy diskgroup can only sustain the loss of 2 failure groups.

                That is why you need to keep the failgroups at the cell level. If you had 3 failgroups that were all on one cell, and that cell went down, your diskgroup would be dismounted as well.
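A rough Python sketch of that point, assuming the simple rule stated in this thread (normal redundancy survives the loss of 1 failgroup, high survives 2). The failgroup and cell names are hypothetical, and real ASM behaviour depends on actual extent placement, so treat this as a model only.

```python
TOLERATED = {"normal": 1, "high": 2}  # failgroup losses a diskgroup can survive


def survives_cell_loss(failgroup_to_cell: dict, redundancy: str, lost_cell: str) -> bool:
    """True if the diskgroup stays mounted after every failgroup on `lost_cell` goes offline."""
    lost = [fg for fg, cell in failgroup_to_cell.items() if cell == lost_cell]
    return len(lost) <= TOLERATED[redundancy]


# Five failgroups squeezed onto three cells (hypothetical layout).
packed = {"fg1": "cell1", "fg2": "cell1", "fg3": "cell1", "fg4": "cell2", "fg5": "cell3"}
# One failgroup per cell.
per_cell = {"fg1": "cell1", "fg2": "cell2", "fg3": "cell3"}

print(survives_cell_loss(packed, "high", "cell1"))    # False: 3 failgroups lost at once
print(survives_cell_loss(per_cell, "high", "cell1"))  # True: only 1 failgroup lost
```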

                Edited by: andy.colvin on Jan 4, 2012 7:40 AM - modified statement about number of copies of OCR
                • 5. Re: ASM High redundancy on Quarter rack should be possible
                  I think I see where 4 comes from now. Seems like a review of failgroups may be in order.

                  In ASM, a failgroup isn't a complete copy of the data in the diskgroup. A failgroup in ASM is a grouping of disks that must have mirror copies in some other failgroup. It doesn't mean that every ASM extent is represented in every failgroup except in special cases. For example, if you have an ASM HIGH redundancy diskgroup with 3 failgroups, then you have a special case where each failgroup has a complete copy of all the data in the diskgroup. If you have an ASM HIGH redundancy diskgroup with 8 failgroups, then you still have protection against 2 simultaneous failures, but any 3 failgroups of that diskgroup will only have a portion of the data from the diskgroup. You'll need at least 6 failgroups of the 8 to remain completely intact in order to make sure you have access to all data.

                  In Exadata, each storage cell (with 12 disks each) is automatically placed in its own failgroup. In non-Exadata configurations, failgroups must be manually managed with ASM if non-default behavior is required.

                  Hope that helps.
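For anyone who wants to see the 8-failgroup case worked through, here is a small Python model. It assumes a deliberately pessimistic toy placement (one extent for every possible set of 3 failgroups); real ASM picks mirror locations through disk partnering, so this is not how ASM actually lays out extents. It only shows why 2 failures are always safe with 3 copies and why a third failure can lose data.

```python
from itertools import combinations

FAILGROUPS = range(8)   # 8 failgroups, as in the example above
COPIES = 3              # HIGH redundancy keeps 3 copies of each extent

# Toy placement: one extent per possible set of 3 distinct failgroups.
extents = list(combinations(FAILGROUPS, COPIES))


def data_available(failed: set) -> bool:
    """True if every extent still has at least one copy on a surviving failgroup."""
    return all(any(fg not in failed for fg in placement) for placement in extents)


def worst_case_ok(num_failures: int) -> bool:
    """True if the data survives every possible combination of `num_failures` failgroups."""
    return all(data_available(set(f)) for f in combinations(FAILGROUPS, num_failures))


for n in (1, 2, 3):
    print(n, "failed failgroups ->", "safe" if worst_case_ok(n) else "data loss possible")
```

In this model, 1 or 2 failed failgroups are always safe, while 3 failed failgroups can take out every copy of some extent.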
                  • 6. Re: ASM High redundancy on Quarter rack should be possible
                    1. In 10g RAC, external redundancy had a minimum of 1 candidate disk, normal had 2, and high had 3. I assume the same is true for 11g RAC.

                    2. The 11g ASM OCR diskgroup requires 2 candidate disks for external, 3 for normal, and 5 for high. These differ from the requirements for other diskgroups such as RECO and DATA.

                    3. It is recommended but not mandatory for each OCR candidate disk to be in a separate failgroup. I assume the contention here is that the configuration meets the criteria for "*ASM high redundancy disk groups*" only if all 5 OCR candidate disks are in different failgroups. I haven't found this explicitly stated in the docs yet, so this is just my understanding.

                    4. If all 5 candidate disks used to create the diskgroup on which the OCR will reside are in separate failgroups, then any 4 cells can fail without causing any loss of availability of the OCR and voting disks.

                    I would like confirmation that it is not mandatory for each OCR candidate disk to be in a separate failgroup.
                    I also find it a little confusing that Oracle chose to use the same terms (high, normal, and external) for OCR and voting disks, since they clearly differ from standard diskgroup high, normal, and external redundancy.

                    They should probably have named them something else to prevent confusion.
                    • 7. Re: ASM High redundancy on Quarter rack should be possible
                      The standard 11gR2 docs state: "To add OCR to an Oracle ASM disk group, ensure that the Oracle Clusterware stack is running and run the following command as root:

                      # ocrconfig -add +new_disk_group
                      You can run this command more than once if you add multiple OCR locations. You can have up to five OCR locations. However, each successive run must point to a different disk group."

                      which implies I can have up to 5 different diskgroups (not candidate disks) for the OCR/voting disks. Each diskgroup can be at a different redundancy level.
                      I have seen that I need *5* candidate disks to make a high redundancy diskgroup that holds the OCR and voting disks, whereas in my prior 10g experience I only needed *3* candidate disks for the same. Can someone confirm whether requiring 5 disks for high redundancy is an 11gR2-specific requirement that applies only to the OCR/voting diskgroup, or whether the same is true for high redundancy in any diskgroup?
                      • 8. Re: ASM High redundancy on Quarter rack should be possible
                        I agree that it can be a little confusing at first. I had to read the documentation several times to be sure that I was getting it right when I did my first 11.2 RAC install in our lab. It is certainly different from the 10.2/11.1 RAC configuration. In the old days, you used raw or block devices to hold the OCR and voting disks, and they were always separate from each other. With 11.2, Oracle wants to place those files within ASM. One of the key contributing factors to this was Exadata, along with furthering the reach of ASM (just conjecture, not stated as fact). On a normal 11.2 RAC build (non-Exadata), I always separate the OCR/voting diskgroup from the others, in order to keep the differences clear. Also, most non-Exadata RAC builds do not use ASM redundancy for the DATA and RECO diskgroups.

                        If you have a diskgroup that contains voting disks along with other types of files, only the voting disks hold to the rule of 3 or 5 copies for normal or high redundancy. The other datafiles will still only have 2 copies for normal redundancy, and 3 copies for high redundancy.

                        I guess we need to split this up into 2 categories, because OCR and voting disks are treated differently. They are normally kept together because that is what Oracle's installer does. Most shops don't change the OCR and voting disk configurations once a cluster has been configured properly. OCR can have multiple copies in different diskgroups, but voting disks have to be in the same diskgroup. When you place OCR in a high redundancy diskgroup, there are 3 copies of the OCR (I was incorrect in my assertion above that there are 5 copies of the OCR). In order to keep the cluster up and running in the event of the diskgroup containing OCR going offline, you can create up to 4 additional (5 total) OCR locations. That doesn't help you on Exadata, because the diskgroups all share the same physical disks. If one diskgroup goes offline, then the others likely will as well (unless you have a mixture of normal and high redundancy diskgroups, and lost 2 cells, etc).

                        For voting disks, all voting disks are placed in the same diskgroup. There is no function like the "ocrconfig -add +new_disk_group" command to add another location containing voting disks. You can only use "crsctl replace votedisk" to make changes to the voting disk configuration. Storing voting disks in an ASM diskgroup is where the number of failgroups comes into play. I'm just going to quote the documentation on this because it's easier (http://goo.gl/eMrQM). Emphasis added is mine:
                        Storing Voting Disks on Oracle ASM

                        Oracle ASM manages voting disks differently from other files that it stores. If you choose to store your voting disks in Oracle ASM, then Oracle ASM stores all the voting disks for the cluster in the disk group you choose. You cannot use voting disks stored in Oracle ASM and voting disks not stored in Oracle ASM in the same cluster.

                        Once you configure voting disks on Oracle ASM, you can only make changes to the voting disks' configuration using the crsctl replace votedisk command. This is true even in cases where there are no working voting disks. Despite the fact that crsctl query css votedisk reports zero vote disks in use, Oracle Clusterware remembers the fact that Oracle ASM was in use and the replace verb is required. Only after you use the replace verb to move voting disks back to non-Oracle ASM storage are the verbs add css votedisk and delete css votedisk again usable.

                        The number of voting files you can store in a particular Oracle ASM disk group depends upon the redundancy of the disk group.

                        External redundancy: A disk group with external redundancy can store only one voting disk

                        Normal redundancy: A disk group with normal redundancy stores three voting disks

                        High redundancy: A disk group with high redundancy stores five voting disks

                        By default, Oracle ASM puts each voting disk in its own failure group within the disk group. A failure group is a subset of the disks in a disk group. Failure groups define disks that share components, such that if one fails then other disks sharing the component might also fail. An example of what you might define as a failure group would be a set of SCSI disks sharing the same SCSI controller. Failure groups are used to determine which Oracle ASM disks to use for storing redundant data. For example, if two-way mirroring is specified for a file, then redundant copies of file extents must be stored in separate failure groups.

                        If voting disks are stored on Oracle ASM with normal or high redundancy, and the storage hardware in one failure group suffers a failure, then if there is another disk available in a disk group in an unaffected failure group, Oracle ASM recovers the voting disk in the unaffected failure group.

                        A normal redundancy disk group must contain at least two failure groups but if you are storing your voting disks on Oracle ASM, then a normal redundancy disk group must contain at least three failure groups. A high redundancy disk group must contain at least three failure groups. However, Oracle recommends using several failure groups. A small number of failure groups, or failure groups of uneven capacity, can create allocation problems that prevent full use of all of the available storage.

                        You must specify enough failure groups in each disk group to support the redundancy type for that disk group.

                        Using the crsctl replace votedisk command, you can move a given set of voting disks from one Oracle ASM disk group into another, or onto a certified file system. If you move voting disks from one Oracle ASM disk group to another, then you can change the number of voting disks by placing them in a disk group of a different redundancy level as the former disk group.

                        You cannot directly influence the number of voting disks in one disk group.

                        You cannot use the crsctl add | delete votedisk commands on voting disks stored in Oracle ASM disk groups because Oracle ASM manages the number of voting disks according to the redundancy level of the disk group.

                        You cannot add a voting disk to a cluster file system if the voting disks are stored in an Oracle ASM disk group. Oracle does not support having voting disks in Oracle ASM and directly on a cluster file system for the same cluster at the same time.
                        The cool thing that it mentions is that if a failgroup containing a voting disk goes offline, the cluster will automatically create a new copy in a failgroup that doesn't currently have a voting file (if such a failgroup exists).
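As a rough illustration of the voting disk counts and the relocation behaviour, here is a small Python sketch. The counts per redundancy level come from the quoted documentation. The strict-majority rule (a node must be able to reach more than half of the voting disks) is standard Oracle Clusterware behaviour rather than something stated in the quoted passage. The cell names and the `fail_one` helper are hypothetical and only model the idea; they do not reproduce what ASM does internally.

```python
VOTING_DISKS = {"external": 1, "normal": 3, "high": 5}  # from the quoted documentation


def tolerated_losses(redundancy: str) -> int:
    """Voting disks that can be lost while a strict majority stays reachable."""
    total = VOTING_DISKS[redundancy]
    majority = total // 2 + 1
    return total - majority


def fail_one(voting_fgs: set, all_fgs: set, failed: str) -> set:
    """Model losing failgroup `failed`: move its voting disk to a spare failgroup, if any."""
    survivors = voting_fgs - {failed}
    spares = sorted(all_fgs - voting_fgs - {failed})
    if failed in voting_fgs and spares:
        survivors.add(spares[0])
    return survivors


quarter = {"cell1", "cell2", "cell3"}             # 3 cells, normal redundancy
half = {f"cell{i}" for i in range(1, 8)}          # 7 cells, high redundancy
half_voting = {f"cell{i}" for i in range(1, 6)}   # voting disks on 5 of the 7 cells

print(tolerated_losses("normal"), tolerated_losses("high"))
print(len(fail_one(quarter, quarter, "cell1")))   # no spare failgroup to relocate to
print(len(fail_one(half_voting, half, "cell1")))  # relocated to a spare failgroup
```

On the quarter rack there is no spare failgroup, so a lost cell leaves 2 of 3 voting disks; on the half rack the lost voting disk can be re-created on one of the two unused cells, bringing the count back to 5.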