user9175193 wrote: And WHY is RMAN "not an option in this environment"?
In one of our setups we have a database server (11.2) that holds only the binaries and configuration (not the datafiles); all ASM disks are on the SAN. The SAN takes daily snapshots, so the datafiles are backed up. My question is: how do I proceed if the database server fails and needs to be reinstalled? Specifically, how do I tell Grid Infrastructure that a diskgroup with disks on the SAN already exists? I found md_backup and md_restore, but I am not sure how or when to use md_restore.
I have searched for some time, but I am quite confused by the different approaches. Please note that RMAN is not an option in this environment.
Thanks in advance
When rejecting solutions out-of-hand, you need to specify why, so others will know the parameters of an acceptable solution ... or be able to explain to you why your rejection is invalid.
How does this work?
Have you ever tested a recovery with it?
The SAN has daily snapshots, so the datafiles are backed up.
I agree with Ed about RMAN. Given the information you have provided, I would consider reviewing my entire backup and recovery plan ASAP.
Maybe you are OK, but it always raises a flag when somebody says SAN snapshot without a fair amount of detail.
When somebody tells me RMAN isn't an option, my knee-jerk reaction is that the database is set up wrong.
Please provide more details.
Snapshots Are NOT Backups
Even if your storage supports snapshots, do you put the database into backup mode?
=> Alter database begin backup;
=> Storage snapshot
=> Alter database end backup;
Alternatively, you could shut down the database, which would have a similar effect, but if you use storage snapshots I assume your database needs to be online all the time...
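A minimal sketch of the three online steps as a script, assuming two hypothetical callables: `run_sql` (executes one statement as SYSDBA, e.g. through sqlplus) and `take_snapshot` (triggers the array snapshot; no vendor interface is named in the thread). What matters is the ordering: `END BACKUP` sits in a `finally` clause, so a failed snapshot never leaves the datafiles in backup mode. The log switch at the end is an addition of this sketch, not part of the reply. Note that `BEGIN BACKUP` is only accepted in ARCHIVELOG mode.

```python
"""Sketch: wrap a storage snapshot in Oracle online backup mode.

Assumptions (hypothetical, not an Oracle or storage-vendor API):
- run_sql(stmt) executes one SQL statement as SYSDBA.
- take_snapshot() asks the array to snapshot all database LUNs and returns an id.
Both are injected so the ordering logic can be tested without a database.
"""
from typing import Callable


def snapshot_in_backup_mode(run_sql: Callable[[str], None],
                            take_snapshot: Callable[[], str]) -> str:
    """Enter backup mode, snapshot, and always leave backup mode again."""
    run_sql("ALTER DATABASE BEGIN BACKUP")
    try:
        snapshot_id = take_snapshot()
    finally:
        # Leave backup mode even if the snapshot failed; otherwise the first
        # change to every block keeps logging the whole block (extra redo).
        run_sql("ALTER DATABASE END BACKUP")
    # Archive the current online log so the redo needed to make the
    # snapshot consistent is available as an archived log.
    run_sql("ALTER SYSTEM ARCHIVE LOG CURRENT")
    return snapshot_id


if __name__ == "__main__":
    issued = []
    print(snapshot_in_backup_mode(issued.append, lambda: "snap-001"))
    print(issued)
```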
If you don't do the above, there is no guarantee your backup concept works at all. Even though a storage snapshot takes very little time, it may capture a block at exactly the moment DBWR or LGWR is writing it, so the copy of that block in the snapshot may be corrupt.
Furthermore, a snapshot alone is not necessarily a backup. If your storage does not physically copy the block afterwards but relies on the unchanged original (e.g. copy-on-write technology), then a physical problem with that block affects your backup as well.
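To see why copy-on-write matters, here is a toy model (not any vendor's implementation): the snapshot keeps only the old contents of blocks changed after it was taken, and every unchanged block is read from the same physical block as the live volume. A media failure on such a shared block therefore damages the live data and the snapshot at once.

```python
"""Toy copy-on-write volume; illustrates the sharing, not a real array."""


class CowVolume:
    def __init__(self, blocks):
        self.physical = list(blocks)  # the single physical copy of each block
        self.snapshots = []           # per snapshot: {block_no: preserved old data}

    def take_snapshot(self) -> int:
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, block_no, data):
        # Copy-on-write: save the old contents for every snapshot that has not
        # preserved this block yet, then overwrite the block in place.
        for snap in self.snapshots:
            snap.setdefault(block_no, self.physical[block_no])
        self.physical[block_no] = data

    def read_snapshot(self, snap_id, block_no):
        # Unchanged blocks were never copied: they come from the shared block.
        return self.snapshots[snap_id].get(block_no, self.physical[block_no])

    def media_failure(self, block_no):
        # Simulate a physical defect on the shared block.
        self.physical[block_no] = None


if __name__ == "__main__":
    vol = CowVolume(["a", "b", "c"])
    snap = vol.take_snapshot()
    vol.write(0, "A")        # changed block: snapshot keeps its own old copy
    vol.media_failure(1)     # unchanged block: lost in the snapshot too
    print([vol.read_snapshot(snap, i) for i in range(3)])
```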
Last but not least, I hope you have ARCHIVELOG mode enabled. Otherwise your online snapshot backup won't work, or at least no consistent copy will be available.
Now regarding your question:
You only need to restore/reinstall GI and the database software. When you rely on storage snapshots, it is best practice to keep OCR and voting files in a separate diskgroup, which allows you to simply create a new diskgroup for the new installation.
As soon as ASM is running with the new software, you should be able to mount the other diskgroups (containing your data and FRA). All that is left is to re-register the database with srvctl so that it is started automatically (or to use an OCR/OLR backup that already contains all this information).
And by the way, RMAN can register storage snapshots.
To be clearer:
I am a DBA taking over a database, and at this point I am evaluating the disaster recovery plan. I am gathering information as I go; so far I have been told that snapshots take care of backups, and I am trying to find a way to restore from them. RMAN, at least in the form of classical backups, is not an option because of the time it takes: the database should be up and running again in about 10 minutes, and it holds more than 40 TB of data (changing quickly, so full backups are needed).
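A quick check of the figures in the post shows why a classical restore cannot meet this target: restoring 40 TB within 10 minutes requires a sustained rate of about 67 GB/s, before any recovery work starts. The sketch assumes decimal units (1 TB = 1000 GB).

```python
"""Back-of-the-envelope: sustained restore rate needed to meet a recovery time target.

Uses the post's figures (40 TB, 10 minutes) and decimal units. Redo apply is
ignored here; it would only add time on top.
"""


def required_gb_per_s(data_tb: float, rto_minutes: float) -> float:
    return data_tb * 1000 / (rto_minutes * 60)


if __name__ == "__main__":
    print(f"{required_gb_per_s(40, 10):.1f} GB/s")  # 66.7 GB/s
```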
I didn't know RMAN can register snapshots; I'll check with the SAN team to see what is done in that respect.
Allow me to rephrase the question: the database server is down, but all disks on the SAN are OK. Basically, I need to reinstall just the software and reconnect the disks. This is where I don't know how to proceed: do I create an empty ASM instance with empty diskgroups and then somehow attach all the disks? Where (if at all) do I use md_restore?
Thanks for your answers.
My advice for your problem: if you only have 10 minutes, use Data Guard with Fast-Start Failover.
If the server fails, you will never get a system rebuilt within 10 minutes.
Even if you set up a standby system and try to mount the snapshot there, you are still left with the rezoning and everything else, which will definitely take more than 10 minutes.
And what do you do if your storage fails or has a problem? Do you think the storage people can get the snapshot rezoned in less than 10 minutes, let alone when somebody first has to inform them that something needs to be done?
And by the way, what happens if the storage fails and the snapshot is from the previous night? Even if the storage team manages to get the snapshot up and running in 1 minute and your database restarts...
To bring the data up to the current point in time, all archived redo logs must be applied on top. If your database really changes that fast, do you think applying all of them will take less than 10 minutes when your snapshot is from the night and it is now 5 pm?
I don't think classical backup and recovery (whether snapshot-based or RMAN) will give you what is needed.
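The roll-forward argument can be turned into a rough estimate. The redo generation and apply rates below are hypothetical placeholders, not figures from the thread: with 15 hours of redo between a 02:00 snapshot and a 17:00 failure, even an apply rate five times the generation rate leaves three hours of media recovery, far beyond a 10-minute target.

```python
"""Rough redo-apply estimate for 'restore last night's snapshot, then roll forward'.

Assumption: both rates are hypothetical placeholders; measure your own
(e.g. archived log volume per hour) before trusting any result.
"""
from datetime import datetime


def redo_apply_minutes(snapshot_at: datetime, failure_at: datetime,
                       redo_gb_per_hour: float, apply_gb_per_hour: float) -> float:
    hours = (failure_at - snapshot_at).total_seconds() / 3600
    redo_gb = hours * redo_gb_per_hour        # archived redo to apply
    return redo_gb / apply_gb_per_hour * 60   # minutes of media recovery


if __name__ == "__main__":
    minutes = redo_apply_minutes(datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 1, 17, 0),
                                 redo_gb_per_hour=100, apply_gb_per_hour=500)
    print(f"{minutes:.0f} minutes")  # 180 minutes with these assumed rates
```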
OK, that changes things. You should have mentioned at the beginning of your post that you shut down the database before taking the snapshot and that you are not using ARCHIVELOG mode.
I would also advise dismounting the diskgroups before taking the snapshots.
However, when did you last try ARCHIVELOG mode? 11g improved its performance a lot.
Note: in this case the snapshot is only a point-in-time view. In case of an error you will lose data (everything since your last snapshot).
Now regarding your questions:
=> You will install a new system with GI and database software. You will need a new diskgroup (e.g. INFRA) for the new OCR and Voting disks (they should not be contained in the other diskgroups like DATA and FRA, so no need to snapshot the old INFRA).
=> Then simply present the LUNs of the DATA and FRA diskgroups to the new server. ASM will be able to mount the diskgroups as soon as all disks are available (and have the right permissions). There is no need for md_backup or md_restore: md_backup saves the diskgroup metadata (disk names, failure groups, attributes, templates, directory structure) so a diskgroup can be re-created, but since you still have the disks, this information is still intact on them.
=> Re-register the database and its services with Grid Infrastructure (srvctl add database / srvctl add service etc.). Just make sure to point to the correct spfile (you can find it, e.g., with ASMCMD).
=> Start database...
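The re-registration steps map onto a handful of commands. This sketch assumes 11.2 Grid Infrastructure on a standalone server (Oracle Restart); the database name, Oracle home, spfile path and service name in the example are hypothetical placeholders, and the script only prints the commands unless `execute=True`. Check the options against `srvctl add database -h` on your release before running anything.

```python
"""Sketch: commands behind 'find spfile, re-register, start'.

Assumptions: Oracle Restart 11.2; every name and path passed in is a
hypothetical placeholder. Commands are printed, not run, by default.
"""
import shlex
import subprocess


def find_spfile_command(diskgroup: str) -> list[str]:
    # Run as the Grid owner with ORACLE_SID set to the ASM instance.
    return ["asmcmd", "find", f"+{diskgroup}", "spfile*"]


def reregister_commands(db_unique_name: str, oracle_home: str, spfile: str,
                        diskgroups: list[str], services: list[str]) -> list[list[str]]:
    cmds = [[
        "srvctl", "add", "database", "-d", db_unique_name,
        "-o", oracle_home,
        "-p", spfile,                # spfile located with asmcmd find
        "-a", ",".join(diskgroups),  # diskgroups the database depends on
    ]]
    for svc in services:
        cmds.append(["srvctl", "add", "service", "-d", db_unique_name, "-s", svc])
    cmds.append(["srvctl", "start", "database", "-d", db_unique_name])
    return cmds


def run(cmds: list[list[str]], execute: bool = False) -> None:
    for cmd in cmds:
        print(shlex.join(cmd))
        if execute:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run([find_spfile_command("DATA")])
    run(reregister_commands("ORCL", "/u01/app/oracle/product/11.2.0/dbhome_1",
                            "+DATA/ORCL/spfileORCL.ora", ["DATA", "FRA"], ["orcl_app"]))
```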
That is, if your snapshot actually worked. I just saw another forum thread where a diskgroup was corrupted by a snapshot (luckily only the FRA).
And just as a reminder: a snapshot is not a backup. Depending on how the storage takes the snapshot, you should take precautions to copy it to separate disks and verify that it is usable.
So if I understand correctly: I install Grid Infrastructure, then create an ASM instance with just one disk in a diskgroup with a different name. The other (old) disks should then be visible and assemble into the complete diskgroups by themselves?
I am in the process of testing, so I just want to confirm this to avoid unnecessary reinstallations.
The asm_diskstring parameter (set in the ASM spfile) defines which disks ASM can see.
If ASM finds valid disks, it will be able to mount the diskgroup.
There are just some important points:
a.) Every disk should be seen only once (watch out for multipathing)
b.) All disks of a diskgroup must be present to mount it
c.) You should not have two diskgroups with the same name.
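A toy model of points a.) to c.), assuming shell-style wildcard matching for `asm_diskstring` and passing each diskgroup's expected member disks in explicitly. Real ASM reads the header fields and the membership from the disks themselves and also needs read/write permission on every device; `DiskHeader` and `check_diskgroups` are hypothetical illustrations, not an Oracle interface.

```python
"""Toy model of the ASM mount preconditions a.) to c.)."""
from collections import Counter
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class DiskHeader:
    path: str        # OS device path the disk was found under
    diskgroup: str   # diskgroup name stored in the ASM disk header
    disk_name: str   # ASM disk name stored in the header


def check_diskgroups(diskstring: list[str], headers: list[DiskHeader],
                     expected: dict[str, list[str]], mounted: set[str]) -> dict[str, str]:
    # Discovery: only paths matching one of the diskstring patterns are seen.
    found = [h for h in headers if any(fnmatchcase(h.path, p) for p in diskstring)]
    result = {}
    for dg, members in expected.items():
        if dg in mounted:                                   # point c.)
            result[dg] = f"name clash with mounted diskgroup {dg}"
            continue
        counts = Counter(h.disk_name for h in found if h.diskgroup == dg)
        dupes = sorted(n for n, c in counts.items() if c > 1)
        missing = sorted(set(members) - set(counts))
        if dupes:                                           # point a.)
            result[dg] = "disk seen more than once: " + ", ".join(dupes)
        elif missing:                                       # point b.)
            result[dg] = "missing disks: " + ", ".join(missing)
        else:
            result[dg] = "OK"
    return result
```

For example, if both `/dev/mapper/asm*` and `/dev/sd*` are in the diskstring, a multipathed LUN is discovered twice and the model reports its diskgroup as not mountable.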