Forum Stats

  • 3,825,923 Users
  • 2,260,580 Discussions
  • 7,896,735 Comments

Discussions

Comparing Recovery Strategy

User_OCZ1T
User_OCZ1T Member Posts: 1,923 Gold Badge
edited Jan 13, 2020 2:45PM in Recovery Manager (RMAN)

Hi, we are using version 12.1.2.0 of Oracle Exadata. We have RMAN L1 backup scheduled on weekdays and LO runs each sunday. We do have physical standby exists which is running with zero lag and is there for disaster recovery. We have a critical database and the business need , to recover the DB object in case of any logical error and the recovery should happen within ~4hrs. We have flashback ON to cater this need as because in earlier versions even to recover a single object from RMAN backup we were supposed to restore the whole database which was time consuming so was not a viable option for us.

Now currently , we are seeing significant I/O overhead due to flashback write(flashback buf free by RVWR waits) during peak hours and those affecting application query(mainly writes/DMLS) performance. As its doing additional logging for flashback affecting application performance, So we are evaluating if we can turn off the flashback and use the RMAN backup to recover the DB objects in case of mishap without breaching sla.

1)So want to understand from experts if its advisable here to replace FLASHBACK with the recovery using RMAN? Is this a standard practice?

2)What would be the Approx. turnaround time difference between flashback VS RMAN recovery option(say for an object having size ~10TB)?

3)In current scenario,  we can't think of any good reason for take whole database back to past point in time.And in that case i think ofcourse flashback will be significantly faster than RMAN restore. but want to understand from experts if there is any such real need(or any standard  practice) in which we have to take the whole database back to a point in time?

Joerg.Sobottka

Best Answer

  • Dude!
    Dude! Member Posts: 22,829 Black Diamond
    edited Jan 9, 2020 6:13PM Answer ✓

    You can backup the standby database instead of the primary database to off-load I/O. If you don't have performance issues during RMAN backups, then you obviously don't need to.

    There are several flashback technologies. Flashback database and restore point use flashback logs in the Fast Recovery Area. Flashback drop relies on the Recycle bin. Flashback table, transaction, query, and version rely on undo. Flashback data archive collects undo and creates a data archive.

    The undo retention is 900 seconds by default. You can increase the value to several hours or a few days if space permits, but then you should configure undo retention guarantee; I suggest to discuss this in the General Database forum, since this is beyond RMAN.

    User_OCZ1T
«1

Answers

  • Dude!
    Dude! Member Posts: 22,829 Black Diamond
    edited Jan 8, 2020 8:31PM

    RMAN, by default, does not check for logical corruption, unless you specify "check logical", in which case errors will be logged.

    Keep in mind that you cannot restore and recover only parts of a database, unlike restoring individual files in a conventional file system backup. All data in a database needs to be consistent or you cannot open it. Hence you must restore and recover all data up to a certain point in time.

    In 12c you can recover a table, which is actually recovered into an auxiliary instance and then exported and imported. The same applies to point in time tablespace recovery in previous versions (TSPITR), requiring an auxiliary database to use export/import to restore data. You cannot recover (apply level 1 or archivelog) without restoring the database first (level 0, image copy). RMAN recovery can only redo and recover the database forward while rebuilding undo to undo uncommitted transaction along the way when necessary. Flashback does the opposite, sort of archiving undo, and does not require a restore of the database.

    No one can tell how much time you save using Flashback vs. Restore and Recovery, because it depends on the volume of data that needs to be restored and recovered. If you need a fast recovery option with RMAN, you might consider RMAN incremental merge, meaning you use incremental level 1 RMAN backups to recover RMAN image copies, and do this every 4 hours if you need to. This can happen in the background and you can simply switch the database to the RMAN copy in case of an emergency, which is only a matter of seconds.

    Or, find out if there's anything you can do to reduce I/O. Are you using multiple RMAN channels? Check you are not exceeding hardware limitations that result in I/O race conditions and processing overhead, slowing down performance more than necessary.

    User_OCZ1T
  • User_OCZ1T
    User_OCZ1T Member Posts: 1,923 Gold Badge
    edited Jan 9, 2020 4:53AM

    Thank you. We do use ~25 channels for RMAN backup.

    Actually we were thinking that the option flashback is only used in our case to recover the object from any data error caused by wrong DML/Code etc. So with the new option available on 12.1 as we can recover the only table or handful of tables without the need of restoring whole database, so should we switch off the flashback as its adding overhead to the database and also affecting application queries during peak hours. But as you said , about the time require for such table recovery from RMAN backup VS flashback, it only depends on the size of the table/data needs to be recovered. So perhaps we need to test both the strategies, to get to a conclusion.

    And yes the option flashback whole database , i think we wont need that in any scenario(atleast not in case of data error).

  • Joerg.Sobottka
    Joerg.Sobottka Senior Consultant and Oracle Ace Member Posts: 597 Bronze Trophy
    edited Jan 9, 2020 5:16AM

    Wow, a lot of questions in one post. I try to cover some of them.

    First is, every Backup and Recovery Strategy starts as a Recovery Strategy. So you have some special SLAs you must fullfil and you have to think about, how can you fullfil them.

    For flashback. like you have written, you need to write in addition to the archive logs also flashback logs to the FRA. This is an additional I/O and may can lead to problems, if you are in a bottleneck situation.

    Restoring (and recovering) big files also means a lot I/O and if you have to get things over network, you get an additional bottleneck situation there, so you are hitten twice.

    Like you have said, what is the SLA and the situation behind from business/application intention? You have standby databases in place, maybe you can have another, additional, one with a time lag on applying with 4 hours?

    You could also think about doing Level 1 backups every hour and restore a database file copy (rolling forward image copies) with that.

    Or you just take RMAN to restore a database table to another schema...

    It's really depending on the business/application needs....

    Regards

    Jörg

    User_OCZ1T
  • Dude!
    Dude! Member Posts: 22,829 Black Diamond
    edited Jan 9, 2020 5:38AM

    What is the decision to use 25 channels based on? More channels do not essentially translate to better performance and can cause the opposite effect depending on your hardware.

    Flashback has limitations, for example, cannot be used to undo purged objects, or undo structural changes. From what I understand, the new feature in 12c to recover tables still needs to create an auxiliary database and use data pump export and import under the hood, but automates the process.

    Enabling or disabling features to address performance is usually not the best strategy. Instead, figure out what you need and then see what can be done to maintain performance.

  • User_OCZ1T
    User_OCZ1T Member Posts: 1,923 Gold Badge
    edited Jan 9, 2020 9:24AM

    "What is the decision to use 25 channels based on? More channels do not essentially translate to better performance and can cause the opposite effect depending on your hardware."

    Actually we have that setup since long back and currently we dont see any issue during L1 backup though. Or you can say the peak load is not coinciding with the RMAN L1 backup. So we dont have any issue with that and the backup does finish within ~1hrs.

    "Flashback has limitations, for example, cannot be used to undo purged objects, or undo structural changes. From what I understand, the new feature in 12c to recover tables still needs to create an auxiliary database and use data pump export and import under the hood, but automates the process."

    Yes so its the system related tables/schema s which will be created rather the all the objects as its used to happen eralier version(before 12.1). So my thought is , it should be quick enough as compared to earlier.

    "Enabling or disabling features to address performance is usually not the best strategy. Instead, figure out what you need and then see what can be done to maintain performance."

    Agreed. I think its the best and fully tested feature for quick recovery. But as we know there exists really lot of overhead associated and seeing many notes to re-evaluate the need before turning this feature on. As it also have UNDO related overhead which we did see in our case and its matching to the symptoms explained in these below threads. So we are kind of thinking re-evaluating the flashback option. Do you think its still not the good approach to turn off the flashback? Do you encounter any scenario in which we may be in trouble without this(flashback) turned on but only RMAN backup available with us?

    https://blog.dbi-services.com/flashback-logging-overhead-db-file-sequential-read-on-undo-tablespace/

    https://grokbase.com/t/freelists.org/oracle-l/11akah0b65/severe-performance-issues-when-flashback-is-on

    https://activeoracle.wordpress.com/2015/06/22/flashback-on-and-insert-spending-time-on-undo/

  • Dude!
    Dude! Member Posts: 22,829 Black Diamond
    edited Jan 9, 2020 12:51PM

    You can also look at it form another perspective: Flashback database cannot by itself recover from media failure, hence it is no replacement or substitute for RMAN. Flashback is a nice to have but not a must have similar to running in archivelog mode or having a reasonable RMAN backup and recovery strategy. Flashback can complement a backup strategy if you need a convenient and quick option to recover from logical errors, and user errors.

    We do have physical standby exists which is running with zero lag and is there for disaster recovery. We have a critical database and the business need , to recover the DB object in case of any logical error and the recovery should happen within ~4hrs. We have flashback ON to cater this need as because in earlier versions even to recover a single object from RMAN backup we were supposed to restore the whole database which was time consuming so was not a viable option for us. 

    Logically corrupt blocks are usually a result of a failure in the Oracle software, or memory corruption.

    Like I mentioned before, RMAN, by default, does not check for logical corruption. If this is a must have, then you could use RMAN backup "check logical" and monitor the V$DATABASE_BLOCK_CORRUPTION view. This arguably adds some administration overhead when problems are found. Flashback database is more convenient to address logical corruption or user errors on the fly, but it is also subject to the Flashback retention target. If you discover logical corruption outside the retention period, Flashback won't safe you.

    Perhaps you're better off with Flashback Data Archive to track and store transactional changes to a specific table over its lifetime, and forget about Flashback Database if you cannot afford the additional performance and disk space requirements. This may also serve some "audit requirements" purpose. You need to evaluate the requirement to protect from logical corruption as your justification for Flashback, which is "expensive", or rely on RMAN to recover from such scenarios, in addition to monitoring the V$DATABASE_BLOCK_CORRUPTION view.

    Since you are also running a standby database, have you considered that the solution to your performance problem could be to backup the standby database instead of your primary database?

    User_OCZ1T
  • User_OCZ1T
    User_OCZ1T Member Posts: 1,923 Gold Badge
    edited Jan 9, 2020 2:52PM

    Yes, so RMAN with 12.1 feature seems must to have and making flashback as a good to have kind of feature and we need to check if we can afford the additional time RMAN will take to perform object recovery in case of data error so we can turn off flashback.

    In dev database we have flashback off but when i tried executing the 'select as of timestamp'  query ona table it worked. And also 'flashback table before drop' worked fine. So does it mean that these options only need undo and don't really need flashback to be turned on at database level? And in that case can we just increase the undo retention value to higher(24hr or so) and set it as guaranteed undo retention true and then this setup will eventually help us recovering data from undo and also in a quick time. And we don't have to take the additional performance hit by turning on the flashback on database level. Is my understanding correct here?

    And does this mean flashback is only help us to do the flashback whole database in quick time while other options (select as of timestamp and flashback table) can we'll be done just on undo data with enough undo retention even flashback option is turned off at dB level?

  • User_OCZ1T
    User_OCZ1T Member Posts: 1,923 Gold Badge
    edited Jan 9, 2020 3:03PM

    "Since you are also running a standby database, have you considered that the solution to your performance problem could be to backup the standby database instead of your primary database?"

    Not able to understand this point fully. By the way ,we don't see performance overhead of RMAN backup anyway on primary. It's the flashback which is hitting performance. So do you mean we should turn on flashback in the physical standby rather on the primary? I am not sure if it's possible but if it's possible then  , is it advisable to  do that way? Wont it impact the standby performance during peak and can cause large lag between primary and standby ?

  • Joerg.Sobottka
    Joerg.Sobottka Senior Consultant and Oracle Ace Member Posts: 597 Bronze Trophy
    edited Jan 9, 2020 3:49PM

    "In dev database we have flashback off but when i tried executing the 'select as of timestamp'  query ona table it worked. "

    Flashback database is something different then flashback table entries.

    While flashback database uses flashback logs and archive logs to put the database back in time (all database files) the flashback data uses UNDO tablespace and the "before images" of blocks in there. So as soon as the database overwrites the undo, you can't use select as of timestamp anymore for that block at the specified timestamp.

    User_OCZ1T
  • Dude!
    Dude! Member Posts: 22,829 Black Diamond
    edited Jan 9, 2020 6:13PM Answer ✓

    You can backup the standby database instead of the primary database to off-load I/O. If you don't have performance issues during RMAN backups, then you obviously don't need to.

    There are several flashback technologies. Flashback database and restore point use flashback logs in the Fast Recovery Area. Flashback drop relies on the Recycle bin. Flashback table, transaction, query, and version rely on undo. Flashback data archive collects undo and creates a data archive.

    The undo retention is 900 seconds by default. You can increase the value to several hours or a few days if space permits, but then you should configure undo retention guarantee; I suggest to discuss this in the General Database forum, since this is beyond RMAN.

    User_OCZ1T