    Disk controller and failgroup and LUN




      I have been reading about ASM failure groups (failgroups) and I am confused. Please clarify the following.


      1. Suppose one of the disks (/devices/diska1) in failure_group_1 fails due to a hardware issue. We cannot use the second disk in that group until the repair is done. Is my understanding right?



        FAILGROUP failure_group_1 DISK

          '/devices/diska1' NAME diska1,

          '/devices/diska2' NAME diska2

        FAILGROUP failure_group_2 DISK

          '/devices/diskb1' NAME diskb1,

          '/devices/diskb2' NAME diskb2;


      2. This sentence is confusing: "An example of a failure group is a set of SCSI disks sharing the same SCSI controller."


      What does it mean?


      3. We use a SAN. I don't know much about storage. The OS people usually talk about a "LUN", but I don't know what that is.

      How can I check (or whom should I ask) whether each disk has a separate controller? Could anyone please explain briefly?




        Re: Disk controller and failgroup and LUN

          1. If a disk in failgroup1 fails, the diskgroup itself will still be available. The database will be unaware of the failure and simply continue.


          If the failed disk's repair time is exceeded, ASM marks that disk as bad and ejects it from the failgroup (the disk is dropped).
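
          As a rough sketch of where that repair time lives: it is the diskgroup attribute disk_repair_time. The diskgroup name "data" and the value '6h' below are placeholders, not values from this thread, and the attribute assumes a diskgroup compatibility of 11.1 or higher.

```sql
-- Hypothetical diskgroup name "data"; pick a window that suits your repair process.
ALTER DISKGROUP data SET ATTRIBUTE 'disk_repair_time' = '6h';

-- For an offline disk, REPAIR_TIMER shows the seconds left before ASM drops it.
SELECT name, failgroup, mode_status, repair_timer
FROM   v$asm_disk
WHERE  mode_status = 'OFFLINE';
```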


          If the failure is a cable fault and the disk itself is fine, and the cable is repaired before the repair time is exceeded, you can simply online the failed disk. It will be checked, brought online, and resynchronized.
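
          Bringing the disk back online could look roughly like this. The diskgroup name "data" is an assumption (the question does not name the diskgroup); diska1 and failure_group_1 are the names used in the question.

```sql
-- Hypothetical diskgroup name "data"; diska1 is the disk name from the question.
ALTER DISKGROUP data ONLINE DISK diska1;

-- Or bring back every offline disk in the failgroup at once.
ALTER DISKGROUP data ONLINE DISKS IN FAILGROUP failure_group_1;
```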


          If the diskgroup, with the failed disk in failgroup1, is unmounted, it cannot be mounted normally afterwards. For example, if a disk fails and the cluster or ASM is bounced or restarted, diskgroups with broken failgroups cannot auto-mount. In this case you need to force the mount manually, which will work as long as that diskgroup has a single intact failgroup. After that you can proceed to fix the broken failgroup(s).
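
          A forced mount, sketched for a diskgroup hypothetically named "data" (the name is an assumption), is issued from the ASM instance:

```sql
-- Hypothetical diskgroup name "data". Run in the ASM instance as SYSASM.
ALTER DISKGROUP data MOUNT FORCE;
```

          As far as I know, FORCE is meant only for the case where disks really are missing; if all disks are available, use a plain MOUNT instead.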



          2. If a bunch of disks share the same SCSI controller (think of it as a piece of h/w with a cable connecting a bunch of disks), and the controller fails, all disks on that controller are no longer accessible (even though the disks themselves may still be fine).


          So you do not want to use those disks in different failgroups or even diskgroups, as a single SCSI controller failure will cause all the disks to fail, and with them all the failgroups in which these disks are used. Ideally a SCSI controller failure should have minimal impact, which means it should cause only a single failgroup to fail, not multiple failgroups.
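
          Applied to the disks in the question, that means one failgroup per controller. The sketch below assumes that diska1 and diska2 hang off one controller and diskb1 and diskb2 off another, and uses a hypothetical diskgroup name "data"; neither assumption comes from the original post.

```sql
-- Hypothetical diskgroup name "data".
-- Assumption: the diska* disks share controller A, the diskb* disks share controller B.
CREATE DISKGROUP data NORMAL REDUNDANCY
  FAILGROUP failure_group_1 DISK
    '/devices/diska1' NAME diska1,
    '/devices/diska2' NAME diska2
  FAILGROUP failure_group_2 DISK
    '/devices/diskb1' NAME diskb1,
    '/devices/diskb2' NAME diskb2;
```

          With this layout, losing controller A takes out only failure_group_1, and the mirror copies in failure_group_2 keep the diskgroup available.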



          3. A SAN creates logical devices for servers to use. These logical devices are called LUNs. A LUN can be, for example, a RAID5 or RAID10 configuration using 10 physical SAN disks.


          The LUN is seen by a server, via the storage fabric layer (e.g. Fibre Channel), as if the LUN were a physical SCSI disk. Typically the fabric layer will provide multiple I/O paths to the LUN. The server will see each of the I/O paths to a single LUN as a separate SCSI disk.


          From the server side, you do not know whether that SAN LUN is RAIDed, or how many actual SAN disks and SAN controllers are used for that LUN. Nor do you need to know that from the server side, as it is not relevant to how the server uses the LUN via multiple I/O paths.

          Re: Disk controller and failgroup and LUN

            1. To simplify things, you can think of it like this:


            For every byte on the disks of failure_group_1, a copy is held on the disks of failure_group_2. If you lose one disk in one of the failure groups, you will be fine, since you have a copy in the other failure group. If you lose two disks in different failure groups, you will lose data and will have to do a disaster recovery from your backups.
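
            To see which disks sit in which failure group, a query along these lines can be run in the ASM instance. The diskgroup name 'DATA' is an assumption (the question does not give one).

```sql
-- Hypothetical diskgroup name "data" (stored in uppercase in the view).
SELECT d.failgroup, d.name, d.path, d.mode_status
FROM   v$asm_disk d
JOIN   v$asm_diskgroup g ON g.group_number = d.group_number
WHERE  g.name = 'DATA'
ORDER  BY d.failgroup, d.name;
```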


            2. "An example of a failure group is a set of SCSI disks sharing the same SCSI controller." What this says is that, with respect to redundancy, the disks in a failure group are logically tied together, so the physical dependencies among the disks (such as a shared controller) should be taken into account.


            3. Generally, you can treat a LUN as a failure group, but it depends on how the storage team has configured the SAN.