Happy New Year Folks!
Hoping someone has a clear head today and can help answer the following.
Set up: Oracle 11g 2 node RAC on Windows 2008 running ASM
An alert has been raised on both ASM instances as follows: Disk Group OCR_VOTE requires rebalance because the space usage imbalance between disks is high.
I have manually run a rebalance on both instances but this has not cleared the alert. I have also tried to run the rebalance via EM but that doesn't work either.
I have connected as SYS with SYSASM and run the following:
ALTER DISKGROUP OCR_VOTE REBALANCE POWER 5;
This appears to work as I get the output Diskgroup Altered but the alert remains.
Any ideas what I am doing wrong, or is this one of those annoying alerts that never clear?
Any help would be appreciated.
Not familiar with SAN usage on Windows. By fileshare, do you mean the SMB protocol between file server and database server?
And how is it configured RAID/redundancy wise? Is RAID used on the file server to provide a LUN via SMB that the Windows server uses as an NTFS file system? Or are the raw disks on the file server presented as is (as a SCSI device) to the Windows server (no RAID)?
What does ASM use as raw devices? The actual SMB device? A "raw file" on the SMB mounted/mapped drive? Something else?
The ASM diskgroup for voting and OCR files is by default required to be a high redundancy diskgroup (3 way "mirror"). One needs to explicitly override this at install time if the LUN provided by the SAN is already redundant (RAID'ed) and external redundancy is used.
How is your OCR_VOTE diskgroup configured (in terms of disks, failgroups and redundancy)?
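If EM is not showing much, queries along these lines against the ASM instance should answer that. This is only a sketch: it assumes the diskgroup is mounted and is named OCR_VOTE as in the alert.

```sql
-- Redundancy type (EXTERN, NORMAL or HIGH) and space totals for the diskgroup
select name, type, state, total_mb, free_mb, usable_file_mb
from v$asm_diskgroup
where name = 'OCR_VOTE';

-- Disks and failgroups that make up the diskgroup
select d.failgroup, d.name, d.path, d.total_mb, d.free_mb
from v$asm_disk d, v$asm_diskgroup g
where d.group_number = g.group_number
  and g.name = 'OCR_VOTE'
order by d.failgroup, d.name;
```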
Are there any outstanding rebalancing processes in ASM (what does select * from v$asm_operation show)?
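A slightly more targeted form of that check, as a sketch: gv$asm_operation covers all ASM instances in the cluster in one go, so it does not need to be run on each node separately. No rows returned means no rebalance is running at that moment.

```sql
-- One row per ASM operation currently running on any instance
select inst_id, group_number, operation, state, power, sofar, est_work, est_minutes
from gv$asm_operation;
```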
I'm not familiar with Windows SAN usage either!
I didn't set up the DB, so I'll need to go back to the client to answer the questions about how the LUNs and the OCR_VOTE diskgroup are configured, as EM is not giving me a lot of information.
I can confirm that there are no records in the v$asm_operation view - I have run the job via EM and via SQL*Plus connecting to both ASM instances.
I'll try and report back later with more info.....
Does not make much sense to me... Windows Server has a SAN s/w component that supports LUNs, multiple I/O paths and so on - all the usual stuff one would expect with a SAN (running Fibre or Infiniband as I/O fabric layer). Why is this not used?
It sounds, from your description, like a typical head-up-behind Windows-Duh! approach (hack) to storage.
If the storage layer is reporting unexpected stuff (with respect to things like total size/available space), ASM could very well be confused. For example, if high redundancy is used for a diskgroup and the disks/LUNs/whatever used are reporting inconsistent/different sizes, it would want to balance the diskgroup.
What is the output of the following SQL*Plus script?
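break on diskgroup skip 1
-- column list after "DISKGROUP" is assumed; the original select list was truncated
select
  nvl(g.name, '<not mounted>') as "DISKGROUP",
  d.path, d.failgroup, d.total_mb, d.free_mb
from v$asm_disk d,
     v$asm_diskgroup g
where d.group_number = g.group_number (+)
  and 'MOUNTED' = g.state (+)
order by 1, 2;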
Unfortunately this is all the info I have regarding the setup. The client does not have any design docs, and I don't have the access or knowledge to go digging.
Here is the output of the query:
It seems like the OCR_VOTE diskgroup is a normal redundancy diskgroup with 2 fail groups? Or was it created with external redundancy?
Either way, there is an imbalance, as device ORCLDISKOCR0 is 10237 MB in size and device ORCLDISKVOTING0 is 20234 MB in size. This means that irrespective of normal redundancy (fail group "mirrors") or external redundancy (disks "striped"), there will be a problem due to the difference in size.
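To put numbers on the imbalance, the used percentage of each disk can be compared. A rough sketch, again assuming the diskgroup name OCR_VOTE; pct_used is just a column alias chosen here:

```sql
-- Space used per disk, as a percentage of that disk's size
select d.path, d.total_mb, d.free_mb,
       round(100 * (d.total_mb - d.free_mb) / nullif(d.total_mb, 0), 1) as pct_used
from v$asm_disk d, v$asm_diskgroup g
where d.group_number = g.group_number
  and g.name = 'OCR_VOTE'
order by d.path;
```

In a well balanced diskgroup the percentages should be close to each other. A large gap between the two disks is presumably what the alert is complaining about.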
ASM recommendations are disks of the same speed, and disks of the same size, in a diskgroup.
ASM does support diskgroups consisting of different sized disks. How it handles them depends, however, on the redundancy configured for that diskgroup.
Read support note 460155.1 - it explains issues around free space and extent allocation using disks with different capacity, and how ASM approaches the balancing issue on such diskgroups.
I would however look at only using disks that are the same size in a diskgroup - as that is the recommendation.
As far as I know an automatic disk rebalance occurs when the underlying disk configuration changes, in which case ASM needs to rebalance for performance and data redundancy reasons. If you use disks with different sizes then performance and use of free space will not be optimal.
What I suspect in your situation is unreliable communication with your storage array, which could be a bad cable, bad storage controller, bad firmware, bad software driver or device. Or perhaps the storage provided by your SAN is dynamic, which will confuse ASM.