Verifying I/O Activity Balance Across Disks in ASM


    Verifying I/O Activity Balance Across Disks in Oracle Automatic Storage Management

     

    by Pini Dibask 

     

    This article describes Oracle Automatic Storage Management's mirroring and striping capabilities, focusing on how to identify unbalanced I/O operations.

     

    Since it was introduced in 2003 with Oracle Database 10g, the Oracle Automatic Storage Management (Oracle ASM) feature has become a part of almost every Oracle DBA's life. Oracle ASM is the recommended storage management solution from Oracle.

     

    Oracle ASM has some great capabilities and advantages, such as ease of management, high performance, and low overhead. It also solves many storage challenges: it allows adding and removing disks online, and it automatically rebalances the data across the disks to avoid a bottleneck on a specific disk, which helps keep performance consistent.

     

    Oracle ASM provides striping and mirroring capabilities that are similar to RAID 10. However, Oracle ASM doesn't behave exactly like RAID, and it can be even smarter because it's aware of the different Oracle Database file types. This means it can stripe data differently for different file types, whereas traditional RAID, which stripes at the block level, isn't aware of Oracle Database file types at all. Oracle ASM is also more flexible in terms of mirroring, because it allows different types of mirroring for different files.

     

    Striping

     

    The goal of striping is to maximize the storage subsystem throughput by balancing the I/O load across several disks, which results in better performance. Balancing the load also reduces I/O latency, because it removes the bottleneck that forms when one specific disk serves most of the requests.

     

    For small I/O operations that benefit from lower latency, such as writing redo log entries to the redo log files, Oracle ASM stripes the data in small chunks of 128 KB (fine-grained striping). For other files, such as data files, Oracle ASM stripes the data in bigger chunks that are equal to the allocation unit (AU) size (coarse-grained striping).
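
    To make the difference between the two granularities concrete, here is a deliberately simplified model. It assumes plain round-robin striping, where stripe k of a file lives on disk k mod n, and a 1 MB allocation unit; Oracle ASM's real extent placement is more involved, and the function name disks_touched is hypothetical. It shows which disks a single I/O touches for a given stripe size.

```python
KIB = 1024
MIB = 1024 * KIB


def disks_touched(offset: int, length: int, stripe_size: int, n_disks: int) -> set[int]:
    """Return the disk indexes that a single I/O hits.

    Simplified model (an assumption, not Oracle ASM's real extent map):
    stripe k of the file is stored on disk k % n_disks.
    """
    if length <= 0:
        return set()
    first_stripe = offset // stripe_size
    last_stripe = (offset + length - 1) // stripe_size
    return {k % n_disks for k in range(first_stripe, last_stripe + 1)}


# A 1 MB I/O with 128 KB (fine-grained) stripes is spread over all 4 disks...
print(sorted(disks_touched(0, 1 * MIB, 128 * KIB, 4)))  # [0, 1, 2, 3]
# ...while with 1 MB (coarse-grained, AU-sized) stripes it lands on a single disk.
print(sorted(disks_touched(0, 1 * MIB, 1 * MIB, 4)))    # [0]
```

    With the smaller stripe, one request is split across several disks that can serve their parts in parallel.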

     

    Oracle ASM writes in a round-robin fashion across the disks in the disk group—this is why small disks will be filled up faster than large disks, which can cause more frequent rebalance operations. Therefore, a known best practice is to use disks with the same characteristics (such as identical size and performance).

     

    Mirroring

     

    Oracle ASM mirrors the data based on file extents. Each file extent contains one or more allocation units. Oracle ASM allows us to choose, for each Oracle ASM disk group, the desired redundancy level. We can choose to work with the following:

     

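    • Normal redundancy: Each file extent has one mirror copy in addition to the primary copy (also called two-way mirroring).
    • High redundancy: Each file extent has two mirror copies in addition to the primary copy (also called three-way mirroring).
    • External redundancy: Oracle ASM does not mirror the data (no mirroring).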
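
    A rough back-of-the-envelope consequence of the redundancy levels above is that each level multiplies the raw space a file consumes by its number of extent copies. The sketch below assumes that usable capacity is simply raw capacity divided by that number. This ignores the extra free space Oracle ASM needs to restore redundancy after a disk failure, which the USABLE_FILE_MB column of V$ASM_DISKGROUP takes into account. The names COPIES and approx_usable_gb are hypothetical.

```python
# Number of copies of each file extent (primary plus mirrors) per redundancy level.
COPIES = {"EXTERNAL": 1, "NORMAL": 2, "HIGH": 3}


def approx_usable_gb(raw_gb: float, redundancy: str) -> float:
    """Rough usable capacity: raw capacity divided by the number of extent copies.

    Ignores the free space reserved for re-mirroring after a failure, so the
    real usable figure is lower for NORMAL and HIGH redundancy.
    """
    return raw_gb / COPIES[redundancy.upper()]


for level in ("external", "normal", "high"):
    print(level, approx_usable_gb(1200, level))
# external 1200.0
# normal 600.0
# high 400.0
```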

     

    In addition to configuring the redundancy level per disk group, we can set the redundancy level for specific files by using Oracle ASM templates, which is another great advantage of Oracle ASM.

     

    Oracle ASM places the mirror copies of each file extent in separate failure groups. If all the disks in one failure group are lost, the database continues to function properly, because copies of the affected file extents exist in the other failure group or groups. If you choose external redundancy, Oracle ASM does not mirror anything, so failure groups play no role in protecting the data.

    The external redundancy option is commonly used in environments whose storage solution already protects the data and distributes it across multiple disks by using RAID. The most common RAID options for Oracle Database are RAID 1+0 (also called RAID 10) and RAID 5. Both provide data protection and striping across disks. In RAID 10, each separate set of disks (usually pairs) is mirrored individually, and striping occurs on top of the mirrored sets. RAID 5 distributes the data blocks as well as the parity blocks across all disks; after a disk failure, the parity block is used to calculate the missing data. RAID 10 is considered to be the best RAID option for Oracle Database in terms of performance.
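
    RAID 5 parity is computed with a bitwise XOR across the data blocks of a stripe. Therefore any single missing block equals the XOR of the surviving blocks and the parity block. The minimal sketch below uses toy 4-byte blocks. The block contents and the helper name xor_blocks are made up for illustration, and real RAID 5 also rotates the parity block across disks, which is not modeled here.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe of three data blocks, each on its own disk.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)  # stored on a fourth disk

# Simulate losing the disk that holds data[1], then rebuild it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])  # True
```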

     

    When disks are added ("ALTER DISKGROUP ... ADD" clause), dropped ("ALTER DISKGROUP ... DROP" clause), or resized ("ALTER DISKGROUP ... RESIZE" clause), Oracle ASM rebalances the disk group so that the file extents are equally distributed across its disks and the storage stays balanced.

     

    Figure 1 contrasts external redundancy with no Oracle ASM mirroring, normal two-way mirroring provided by Oracle ASM, and high-redundancy three-way mirroring provided by Oracle ASM. Each square within the disks represents an Oracle ASM file extent.

     


    Figure 1. Mirroring choices with Oracle ASM
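
    The rule behind Figure 1 is that every copy of an extent lands in a different failure group. The following code is a hypothetical round-robin placement written only to illustrate that constraint. Oracle ASM's real allocator is more sophisticated, and the function name place_extents and the failure group and disk names are assumptions.

```python
def place_extents(failgroups: dict[str, list[str]], n_extents: int, copies: int) -> list[list[str]]:
    """Hypothetical placement: copy c of extent e goes to failure group (e + c) mod F.

    Within a failure group, disks are used round-robin. Requires copies <= number
    of failure groups, so no two copies of an extent share a failure group.
    """
    names = sorted(failgroups)
    if copies > len(names):
        raise ValueError("need at least as many failure groups as copies")
    used = {fg: 0 for fg in names}  # extents placed in each failure group so far
    layout = []
    for e in range(n_extents):
        placement = []
        for c in range(copies):
            fg = names[(e + c) % len(names)]
            disks = failgroups[fg]
            placement.append(disks[used[fg] % len(disks)])
            used[fg] += 1
        layout.append(placement)
    return layout


fgs = {"FG1": ["DISK1", "DISK2"], "FG2": ["DISK3", "DISK4"]}
for extent, disks in enumerate(place_extents(fgs, 4, copies=2)):
    print(extent, disks)
# 0 ['DISK1', 'DISK3']
# 1 ['DISK4', 'DISK2']
# 2 ['DISK1', 'DISK3']
# 3 ['DISK4', 'DISK2']
```

    Losing both disks of FG1 still leaves one copy of every extent in FG2, which is the property normal redundancy relies on.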

     

    Verifying that I/O Activity Is Equally Balanced Across the Disks

     

    So far, we've reviewed the basics of the Oracle ASM striping and mirroring mechanisms. As mentioned, Oracle ASM ensures that file extents are equally distributed across the disks in a disk group, and it writes in a round-robin fashion in order to balance the load across several disks.

     

    Although Oracle ASM does its best to equally balance the load across the disks, you might wonder how you can verify that the actual I/O activity (that is, I/O read requests and I/O write requests) is equally balanced across the disks in the Oracle ASM disk group.

     

    To check whether the I/O activity is balanced across the disks, you can use the V$ASM_DISK_STAT dynamic performance view, which provides performance statistics for each disk allocated to the Oracle ASM instance. Among other columns, this view holds the total number of I/O read requests (READS column) and the total number of I/O write requests (WRITES column) for each disk.

     

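    The following formula can be used to calculate the I/O activity balance across the disks in an Oracle ASM disk group:

     

    IO_BALANCED = 100 * (1 – total_deviation / total_io)

    where total_io is the total number of I/O requests (reads + writes) in the disk group, avg_io = total_io / number of disks is the average number of I/O requests per disk, and total_deviation is the sum, over all disks, of the absolute difference between each disk's I/O requests and avg_io.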

     

    With the clamping that the query later in this article applies, the result is a number in the range [0, 100]. A score of 0 represents a disk group that is not balanced at all (all the load goes to one specific disk), and 100 represents a disk group that is totally balanced (that is, the I/O requests are equally distributed across the disks in the Oracle ASM disk group). The raw formula itself can drop below 0 when the load is heavily skewed in a disk group with more than two disks, which is why the query replaces negative values with 0. As a rough estimate, I would say that we should expect a result higher than 80.
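
    For reference, here is the same formula in mathematical notation for a disk group with n member disks. Here x_i is the number of I/O requests (reads + writes) reported for disk i, and T > 0 is the total; the symbols are my own notation. The block also shows the worst case, which explains why clamping is needed.

```latex
T = \sum_{i=1}^{n} x_i, \qquad
\bar{x} = \frac{T}{n}, \qquad
\mathrm{IO\_BALANCED} = 100 \left( 1 - \frac{\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert}{T} \right)

% Worst case: all T requests go to disk 1, and x_2 = \dots = x_n = 0
\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert
  = \left( T - \frac{T}{n} \right) + (n - 1) \frac{T}{n}
  = \frac{2(n - 1)}{n} T
\quad \Longrightarrow \quad
\mathrm{IO\_BALANCED}_{\min} = 100 \left( 1 - \frac{2(n - 1)}{n} \right)
```

    For n = 2 the minimum is exactly 0. For n > 2 it is negative (for example, -60 for five disks), so the unclamped formula only stays within [0, 100] for two-disk groups.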

     

    Examples

     

    Let's say you have five disks allocated to your "DATA" Oracle ASM disk group. The total number of I/O requests in the disk group is 500 and each disk is associated with 100 I/O requests. The average number of I/O requests per disk is 100 (500/5), and the deviation from the average is 0. In this scenario, the disk group I/O activity is totally balanced and we will get an output of

     

    100 * (1 – 0/500) = 100 * (1 – 0) = 100 * 1 = 100

     

    Or, let's say you have five disks allocated to your "DATA" Oracle ASM disk group. The total number of I/O requests in the disk group is 1,000. Two disks are associated with 50 I/O requests each, and each of the other three disks is associated with 300 I/O requests. The average number of I/O requests per disk is 200 (1,000/5) and the total deviation from the average is 600. In this scenario, the disk group I/O activity is not distributed very well across the disks and we will get an output of

     

    100 * (1 – 600/1,000) = 100 * (1 – 0.6) = 100 * 0.4 = 40

     

    Finally, let's say you have five disks allocated to your "DATA" Oracle ASM disk group. The total number of I/O requests in the disk group is 1,000. One disk is associated with 1,000 I/O requests, and the rest of the disks have no activity at all (zero I/O requests). The average number of I/O requests per disk is 200 (1,000/5), and the total deviation from the average is 1,600 (800 for the busy disk plus 200 for each of the four idle disks). In this scenario, the disk group I/O activity is not balanced at all, and the raw formula gives

     

    100 * (1 – 1,600/1,000) = 100 * (1 – 1.6) = 100 * (–0.6) = –60

    Because the result is negative, the query clamps it to 0, the lowest possible score.
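
    The scenarios above can be checked with a short Python sketch of the formula. The function name io_balanced is my own. It mirrors the logic of the query in this article, including returning 100 for an idle disk group and clamping negative results to 0.

```python
def io_balanced(io_per_disk: list[int]) -> float:
    """I/O balance score (0-100) for one disk group.

    io_per_disk holds reads + writes for each member disk. An idle group
    scores 100, and negative raw scores are clamped to 0.
    """
    total_io = sum(io_per_disk)
    if total_io == 0:
        return 100.0
    avg_io = total_io / len(io_per_disk)
    deviation = sum(abs(io - avg_io) for io in io_per_disk)
    return 100 * max(0.0, 1 - deviation / total_io)


print(io_balanced([100, 100, 100, 100, 100]))  # 100.0
print(io_balanced([50, 50, 300, 300, 300]))    # 40.0
print(io_balanced([1000, 0, 0, 0, 0]))         # 0.0 (clamped; the unclamped value is negative)
```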

     

    Query Text

     

    The following script can assist in identifying unbalanced I/O operations:

     

    -- I/O balance score per Oracle ASM disk group (100 = fully balanced, 0 = fully skewed)
    SELECT dg.group_number "GROUP#",
           dg.name,
           -- Idle disk group (no I/O yet): report 100 to avoid dividing by zero.
           -- Otherwise compute 100 * (1 - sum_io / total_io), clamping negative values to 0.
           DECODE (total_dg.total_io, 0, 100,
                   100 * DECODE (SIGN (1 - df.sum_io / total_dg.total_io),
                                 -1, 0,
                                 1 - df.sum_io / total_dg.total_io)) "IO_BALANCED"
      FROM (SELECT d.group_number group_number,
                   -- Sum of absolute deviations from the group's average I/O per disk
                   SUM (ABS ((d.reads + d.writes) - tot.avg_io)) sum_io
              FROM v$asm_disk_stat d,
                   (SELECT group_number,
                           -- Average I/O requests (reads + writes) per member disk
                           (SUM (reads) + SUM (writes)) / COUNT (*) avg_io
                      FROM v$asm_disk_stat
                     WHERE header_status = 'MEMBER'
                  GROUP BY group_number) tot
             WHERE d.header_status = 'MEMBER'
               AND tot.group_number = d.group_number
          GROUP BY d.group_number) df,
           (SELECT group_number,
                   -- Total I/O requests (reads + writes) per disk group
                   SUM (reads) + SUM (writes) total_io
              FROM v$asm_disk_stat
             WHERE header_status = 'MEMBER'
          GROUP BY group_number) total_dg,
           v$asm_diskgroup dg
     WHERE df.group_number = total_dg.group_number
       AND df.group_number = dg.group_number
     ORDER BY dg.group_number;

     

    Note: The V$ASM_DISK_STAT dynamic performance view provides performance statistics per disk allocated to the Oracle ASM instance, and these statistics are cumulative since instance startup. Therefore, if you've added one or more disks to the Oracle ASM disk group while the instance has been up and running, the query is likely to report that the I/O activity is not balanced across the disks in the disk group. This happens because, for some disks, the I/O counts that V$ASM_DISK_STAT reports cover a longer period than for other disks.
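
    One way to work around the cumulative counters is to take two snapshots of READS and WRITES a known interval apart and score only the difference. The sketch below assumes the snapshots have already been fetched, for example by querying V$ASM_DISK_STAT twice, into dictionaries keyed by (group_number, disk_number). The function names and the sample numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical snapshot format: (group_number, disk_number) -> (reads, writes)
# for MEMBER disks, as read from V$ASM_DISK_STAT at one point in time.
Snapshot = dict[tuple[int, int], tuple[int, int]]


def score(io_per_disk: list[int]) -> float:
    """0-100 balance score, same formula as the article's query."""
    total_io = sum(io_per_disk)
    if total_io == 0:
        return 100.0  # idle disk group
    avg_io = total_io / len(io_per_disk)
    deviation = sum(abs(io - avg_io) for io in io_per_disk)
    return 100 * max(0.0, 1 - deviation / total_io)  # clamp negatives to 0


def balance_between(before: Snapshot, after: Snapshot) -> dict[int, float]:
    """Score each disk group using only the I/O issued between two snapshots.

    Assumes the instance was not restarted in between (counters only grow).
    A disk missing from `before` is treated as starting from zero.
    """
    per_group: dict[int, list[int]] = defaultdict(list)
    for (group, disk), (reads, writes) in after.items():
        old_reads, old_writes = before.get((group, disk), (0, 0))
        per_group[group].append((reads - old_reads) + (writes - old_writes))
    return {group: score(ios) for group, ios in per_group.items()}


# Disk 3 joined group 1 just before the first snapshot, so its cumulative
# counters are far lower than those of disks 0-2.
before = {(1, 0): (9000, 1000), (1, 1): (9000, 1000), (1, 2): (9000, 1000), (1, 3): (0, 0)}
after = {(1, 0): (9400, 1100), (1, 1): (9400, 1100), (1, 2): (9400, 1100), (1, 3): (400, 100)}

print(score([10500, 10500, 10500, 500]))  # 53.125 (cumulative counters)
print(balance_between(before, after))     # {1: 100.0} (activity in the interval)
```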

    Sample Output #1—Balanced Oracle ASM Disk Groups

     

    Figure 2 shows an example of an environment where there are two disk groups in the Oracle ASM instance: DATA and FRA. As you can see, the I/O activity is distributed equally across the Oracle ASM disks in each disk group. The conclusion here is that the disk groups (DATA and FRA) are totally balanced.

     


    Figure 2. Example output when the load is balanced

     

    Sample Output #2—Unbalanced Oracle ASM Disk Groups

     

    Figure 3 shows an example of an environment where there are three disk groups in the Oracle ASM instance: DATA_EXA2, DBFS_DG, and RECO_EXA2. As you can see, the I/O activity is distributed evenly enough across the disks in the DATA_EXA2 disk group (partially balanced), but the load is not balanced well across the disks in the DBFS_DG disk group, and it is not balanced at all in the RECO_EXA2 disk group.

     


    Figure 3. Example output when the load is not balanced

     

    As these examples show, although Oracle ASM ensures that file extents are equally distributed across all the disks in an Oracle ASM disk group, this doesn't guarantee that the actual I/O activity will be fully balanced across those disks. In some cases, most of the "hot" allocation units of the segments (that is, the allocation units that are accessed the most) reside on a specific disk instead of being spread evenly across the disks of the disk group.

     

    My Oracle Support Doc ID 367445.1 explains very well how you can check the storage balance across disks in Oracle ASM disk groups; however, even if the storage is balanced, this doesn't guarantee that the actual I/O activity will be balanced. So the next section provides a few tips that will reduce the chances of having unbalanced I/O operations in the Oracle ASM disks.

     

    Pro Tips

     

    • Make sure that all the disks in the same Oracle ASM disk group are the same size. Oracle ASM writes in a round-robin fashion to the disks in an Oracle ASM disk group; therefore, small Oracle ASM disks will fill up faster than larger disks, which results in unbalanced I/O activity across the Oracle ASM disks.
    • Have at least four Oracle ASM disks allocated to each disk group.
    • All disks in the same Oracle ASM disk group should have the same performance characteristics. Allocating a 7,200 RPM disk and a 15,000 RPM disk to the same disk group, for example, is not a very good idea, because the performance will be limited by the slowest disk (that is, by the 7,200 RPM disk).
    • Check the storage balance using My Oracle Support Doc ID 367445.1 and, if needed, execute a manual rebalance command ("ALTER DISKGROUP ... REBALANCE").
    • If you are using Oracle Database version 11.1.0.7 and there is unbalanced storage across the Oracle ASM disks in the disk group, you might be experiencing Bug 7699985, which was fixed in Oracle Database 11g Release 2. If so, upgrade to Oracle Database 11g Release 2 or, even better, to Oracle Database 12c.
    • Use the query in this article to monitor the I/O activity balance across the disks.
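
    The slowest-disk point about mixing disks with different performance characteristics can be illustrated with a small model. It assumes the I/O is spread evenly across the disks, as Oracle ASM striping aims to do, and that each disk serves requests at a fixed rate. The IOPS figures and the function name batch_seconds are made-up, illustrative values, not measurements of real 7,200 or 15,000 RPM drives.

```python
def batch_seconds(total_requests: int, iops_per_disk: list[float]) -> float:
    """Time to finish a batch when requests are split evenly across the disks.

    Simplified model: every disk gets the same share and all disks work in
    parallel, so the batch finishes when the slowest disk finishes its share.
    """
    share = total_requests / len(iops_per_disk)
    return max(share / iops for iops in iops_per_disk)


# Four equally fast disks vs. the same group with one slower disk (illustrative IOPS).
print(batch_seconds(800, [200, 200, 200, 200]))  # 1.0
print(batch_seconds(800, [200, 200, 200, 100]))  # 2.0
```

    A single slow disk doubles the batch time here, because the three fast disks sit idle while it works through its equal share.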

     

    Conclusion

     

    Unbalanced I/O operations (both reads and writes) on a disk group's disks can result in I/O bottlenecks and inefficient usage of the storage resources that might lead to poor performance. Unbalanced I/O might indicate that the data is not equally spread across the Oracle ASM disks inside the disk group.

     

    In this article, I provided a query that can assist in identifying unbalanced I/O operations. You can also check whether the unbalanced I/O operations are a result of unbalanced storage across the disks in the Oracle ASM disk group by using My Oracle Support Doc ID 367445.1 (see the "See Also" section). If that's the case, consider issuing a manual rebalance command ("ALTER DISKGROUP ... REBALANCE").

     

    See Also

     

     

    About the Author

     

    Pini Dibask is a senior Oracle Database Architect with 10 years of experience. During these years, Pini has worked as an Oracle DBA and DBA team leader. He mainly specializes in performance tuning, high availability, data protection, and other database areas.  Pini is an active member of the Oracle Technology Network community forums, and is currently the Oracle Domain Expert at Dell Software group.
