6 Replies Latest reply: Aug 14, 2013 9:33 AM by Laura Sallwasser

    Questions About Incremental checkpoint on the Primary and lagging Redo Transport to Standby

    Laura Sallwasser

      Hello:

      This is a four-instance primary on 10.2.0.4 in maximum performance mode, with a single-instance standby using real-time apply (RTA).  Both are on AIX 5.3 TL 12.  The primary is very large (about 20 TB), and the DBA recently grew the online redo logs from 1 GB to 5 GB.  The larger size is useful during busy times: the primary generates about 4 logs/hour with 5 GB logs, instead of around 100/hour when they were 1 GB.

       

      When activity slows we can see just one archived log every four hours.  There are multiple incremental checkpoints between log switches.  The problem is that the service destination for the standby is not selected until the primary completes all the incremental checkpoints and switches to the next log.  This means that the standby can lag the primary by many hours. 

       

      I found that the redo transport service is using the archiver; LGWR ASYNC is not specified in the related log_archive_dest_n setting.  If the redo transport service uses LGWR, will it ship the contents of the incremental checkpoints to the standby immediately?  I have been looking for notes or articles on this, but so far no luck.  I would like to have some details on this before I ask the DBA to change the service definition for the standby, resize the primary online and standby redo logs, or change any of the *log* related parameters.  For example:  there are some bugs with setting archive_lag_target that could affect this 10gR2 database, and we are very limited by the vendor when it comes to patching or upgrading it.

       

      Thank you for your help,

       

      Laura Sallwasser

      DBA

        • 1. Re: Questions About Incremental checkpoint on the Primary and lagging Redo Transport to Standby
          mseberg

          Hello;

           

          "If real-time apply is enabled, Data Guard recovers redo data directly from the current standby redo log file as it is being filled up by the RFS process."

           

          See section 5.3.2.2, LGWR SYNC Archival Processing, of Oracle® Data Guard Concepts and Administration 10g Release 2 (10.2), B14239-05, and section 6.2.1, Using Real-Time Apply to Apply Redo Data Immediately, in the same document.

           

          Best Regards

           

          mseberg

          • 2. Re: Questions About Incremental checkpoint on the Primary and lagging Redo Transport to Standby
            Laura Sallwasser

            Hello

             

            mseberg:

             

            RTA is enabled; there is an MRP0 process on the standby node, and the standby views show it does apply logs.  The problem is that the standby redo log is being filled after the primary completes all the incremental checkpoints and switches to the next log.  This means that the standby can lag the primary by many hours. For example:  on the primary, log 10 starts at 00:05, and switches to log 11 at 04:00.  In between 00:05 and 04:00, there are multiple incremental checkpoints, which is expected.  However, log 10 is not selected for standby redo log until 04:00, so its contents are not added to the standby redo log until after 04:00.  The DBA added the service supporting redo transport without using LGWR ASYNC, so redo transport is using the Archiver processes to send redo to the standby redo logs.  If the redo transport service uses LGWR, will it ship the contents of the incremental checkpoints to the standby immediately?
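The timing in the example above can be worked through as a small calculation. This is only a minimal sketch, assuming that with ARCn transport nothing from a log is shipped until that log is archived at the switch, and ignoring transfer and apply time; the function name `worst_case_lag` and the calendar date are hypothetical.

```python
from datetime import datetime, timedelta


def worst_case_lag(log_start: datetime, log_switch: datetime) -> timedelta:
    """Return how far behind the standby can fall for one online log.

    Assumption: redo written just after log_start is not shipped until
    the log is archived at log_switch (archiver-based transport).
    """
    if log_switch < log_start:
        raise ValueError("log_switch must not be earlier than log_start")
    return log_switch - log_start


# Times from the post: log 10 starts at 00:05 and switches at 04:00.
# The calendar date is arbitrary and only needed to build datetime values.
day = datetime(2013, 8, 14)
start = day.replace(hour=0, minute=5)
switch = day.replace(hour=4, minute=0)

lag = worst_case_lag(start, switch)
print(lag)  # 3:55:00
```

So the oldest redo in log 10 waits almost four hours before it reaches the standby, which matches the "many hours" of lag described.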

             

            Thank you,

            Laura Sallwasser

            • 3. Re: Questions About Incremental checkpoint on the Primary and lagging Redo Transport to Standby
              mseberg

              Hello again;

               

              If real-time apply is working, you should have instantly up-to-date data.

              My gut feeling is that your standby redo logs (SRLs) might be the wrong size, in which case RTA won't work. Are the SRLs the same size as the primary online redo logs?

              LGWR should not be an issue.

               

              Apply Services

               

              Have you looked at this document?

               

              Data Guard Real-Time Apply FAQ (Doc ID 828274.1)

               

              Also this document lists a few Oracle 10 bugs you might run into:

               

              PHYSICAL: Managed Recovery(MRP) is requesting applied/old archivelogs (real-time apply only) (Doc ID 1159718.1)

               

              Best Regards

               

              mseberg

              • 4. Re: Questions About Incremental checkpoint on the Primary and lagging Redo Transport to Standby
                Laura Sallwasser

                Hello:

                The standby redo logs are the same size as the primary online redo logs, and there is one extra group of SRLs (more than the primary) per thread, too.  If the logs were not the same size, the standby would stop applying content, and it has not.  There is nothing wrong with the standby database as such.  RTA is working according to the note you sent me, and according to the views I have from the standby.  View v$managed_standby shows that process MRP0 exists and is applying logs.  View v$dataguard_status logs messages such as "Successfully opened standby log 23:, etc." and "media recovery waiting for thread 1 sequence 165477" - etc.  View v$standby_log lists all the logs, and their size is 5 GB.
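The layout described above (SRLs the same size as the online logs, plus one extra SRL group per thread) can be expressed as a simple check. This is a hedged sketch only: the function name `check_standby_redo_logs`, the dictionary layout, and the count of three online groups per thread are assumptions for illustration, not values taken from this system.

```python
GIB = 1024 ** 3


def check_standby_redo_logs(online, standby):
    """Compare standby redo logs (SRLs) against online redo logs.

    Both arguments map a thread number to a list of log sizes in bytes.
    Returns a list of human-readable problems; an empty list means the
    layout matches "same size, one extra SRL group per thread".
    """
    problems = []
    for thread, online_sizes in sorted(online.items()):
        srl_sizes = standby.get(thread, [])
        expected_size = max(online_sizes)
        if any(size != expected_size for size in srl_sizes):
            problems.append(f"thread {thread}: SRL size differs from online log size")
        if len(srl_sizes) < len(online_sizes) + 1:
            problems.append(f"thread {thread}: fewer SRL groups than online groups + 1")
    return problems


# Four threads (one per primary instance), 5 GB logs as in the post.
# Three online groups per thread is a hypothetical figure.
online_logs = {t: [5 * GIB] * 3 for t in range(1, 5)}
standby_logs = {t: [5 * GIB] * 4 for t in range(1, 5)}

print(check_standby_redo_logs(online_logs, standby_logs))  # []
```

In practice the sizes would come from v$log on the primary and v$standby_log on the standby.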

                 

                The MRP is not asking for old logs, so note 1159718.1 is not an issue here.  I keep referring to LGWR for redo transport because without it, ARCn is the default redo transport vehicle.  ARCn is slower than LGWR, but there is one LGWR and many ARCn processes, and Oracle can spawn more.

                 

                Is there any way to ensure that the same data written to the primary online redo is immediately sent to the standby?  There are multiple incremental checkpoints across four hours on the primary in between log switches.  The same log does not get selected for standby until after it becomes an archived log on the primary.  How do I find out why this happens?  Isn't the content of the online redo, including the incremental checkpoints, supposed to be carried to the standby immediately?

                 

                Thanks again,

                 

                Laura Sallwasser

                • 5. Re: Questions About Incremental checkpoint on the Primary and lagging Redo Transport to Standby
                  Andrey Goryunov

                  Hello Laura,

                   

                  To enable real-time apply on the standby, LGWR should be used for redo transport, the standby redo logs must be in order (they should exist for each primary thread), and managed recovery should be started with the ... current logfile ... clause. In that case, all changes coming from the primary are saved in the standby redo logs and applied to the standby.

                   

                  As you mentioned, ARCH transport is still being used to send redo to the standby, so even though standby redo logs exist, real-time apply won't be used. Most likely the standby redo logs are only used to store redo during a log switch on the primary (as v$dataguard_status showed). And if real-time apply were working properly, there would be no messages like "media recovery waiting ...".

                   

                  Regarding bugs, MOS is definitely the place to check, but I think most of them were fixed in 10.2.0.3. Close testing and monitoring would still be required if you want to enable real-time apply: watch the values in v$dataguard_stats and how the standby logs are used in v$standby_log, and of course the alert.log will contain a lot of information.

                   

                  Thanks,

                  Andrey

                  • 6. Re: Questions About Incremental checkpoint on the Primary and lagging Redo Transport to Standby
                    Laura Sallwasser

                    Hello Andrey:

                     

                    RTA is enabled correctly.  There are no messages about media recovery waiting.  I read the alert logs and the process logs daily.

                     

                    The issue is using LGWR for redo transport instead of ARC.  RTA is not dependent on which method I choose; RTA just means that the standby's log apply services recover redo from the standby redo logs at the same time the logs are being written to.  Since this is 10gR2, LGWR and ARC can both write to the standby redo logs.

                     

                    The question is how often the standby redo log is written to.  If I choose ARC, it happens only on archive events.  If I choose LGWR, it writes on commits from the primary.

                     

                    Archive events were not happening frequently enough on the primary during certain hours of the day.  The standby fell behind the primary.  I've fixed this now by ensuring a minimum of 2 log switches per hour and by using LGWR to transport redo to the standby.
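The post does not say how the minimum of 2 log switches per hour was enforced or what the destination setting looks like. As a rough illustration only, the sketch below assumes the ARCHIVE_LAG_TARGET parameter is used (which the original question noted has known bugs on 10gR2) and that the standby is destination 2; the service name `standby_tns`, the DB_UNIQUE_NAME `stby`, and the helper names are hypothetical.

```python
def archive_lag_target_seconds(min_switches_per_hour: int) -> int:
    """Convert a minimum switch rate into an ARCHIVE_LAG_TARGET value.

    ARCHIVE_LAG_TARGET is expressed in seconds; Oracle documents the
    valid values as 0 (disabled) or an integer from 60 to 7200.
    """
    if min_switches_per_hour <= 0:
        raise ValueError("need at least one switch per hour")
    target = 3600 // min_switches_per_hour
    if target < 60:
        raise ValueError("ARCHIVE_LAG_TARGET cannot be below 60 seconds")
    return target


def lgwr_async_destination(service: str, db_unique_name: str) -> str:
    """Build a LOG_ARCHIVE_DEST_n value that uses LGWR ASYNC transport."""
    return (
        f"SERVICE={service} LGWR ASYNC "
        f"VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) "
        f"DB_UNIQUE_NAME={db_unique_name}"
    )


# Hypothetical names; destination number 2 is also an assumption.
lag_target = archive_lag_target_seconds(2)
dest = lgwr_async_destination("standby_tns", "stby")

statements = [
    f"ALTER SYSTEM SET ARCHIVE_LAG_TARGET={lag_target} SCOPE=BOTH SID='*';",
    f"ALTER SYSTEM SET LOG_ARCHIVE_DEST_2='{dest}' SCOPE=BOTH SID='*';",
]
for statement in statements:
    print(statement)
```

Two switches per hour works out to a target of 1800 seconds. The generated statements are meant for review by a DBA, not for running unchecked against a production primary.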

                     

                    Regards,

                     

                    L Sallwasser