42 Replies Latest reply on Mar 27, 2009 7:30 AM by 26741
      • 30. Re: Long Chkpoints,Is unable to write archive log fast enough the problem?
        75407
        Okay:

        Bucket and shovel Pompey Mathematics Called for.

        One every 3 hours implies you're generating 2 GB of redo every 3 hours,
        i.e. about 600 MB per hour, 300 MB per 30 minutes, or 200 MB every 20 minutes.
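
As a rough check on the back-of-envelope figures above, the same arithmetic can be written out. This is only a sketch: it assumes "2 GB" means 2048 MB and that redo is generated at a steady rate, neither of which is stated in the thread, and the function name is made up for illustration.

```python
def redo_mb_per_interval(log_size_mb, switch_interval_min, target_interval_min):
    """Scale the redo written between two log switches to a shorter interval.

    Assumes a steady redo generation rate (an assumption; real load is bursty).
    """
    return log_size_mb * target_interval_min / switch_interval_min


LOG_SIZE_MB = 2048         # assumed: "2 GB" taken as 2048 MB
SWITCH_INTERVAL_MIN = 180  # one log switch every 3 hours, as reported in the thread

for minutes in (60, 30, 20):
    mb = redo_mb_per_interval(LOG_SIZE_MB, SWITCH_INTERVAL_MIN, minutes)
    print(f"{minutes:>3} min: about {mb:.0f} MB of redo")
```

The exact figures come out a little above the rounded 600 / 300 / 200 MB, which does not change the conclusion.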


        Rule of thumb: switching logs every 20 to 30 minutes is generally considered good practice if running a standby.


        So even if the overall response time hit is the same, it could go from a 20x increase in response time every 3 hours down to a 3x increase in response time every 20 minutes.

        Additionally, a 200 MB archived log copy should not stress the I/O storage systems and caches too much, and should reduce contention at that time, which is our objective. It's about 10% of a 2 GB one.



        Therefore if I were in your shoes I would (if I felt I had to do something):

        set ARCHIVE_LAG_TARGET= 1200
        set FAST_START_MTTR_TARGET = 300
        set LOG_CHECKPOINT_TIMEOUT = 0
        set LOG_CHECKPOINT_INTERVAL = 0
        set FAST_START_IO_TARGET = 0

        The first log switch after this is done may still be a big one. Thereafter they should be every 20 minutes.
        The reason for setting FAST_START_MTTR_TARGET explicitly is that I think it helps make a redo log size advisor available, and it has the side effect of limiting the number of dirty buffers and the amount of redo since the last checkpoint. The last 3 settings are necessary if you choose to set FAST_START_MTTR_TARGET. Monitor throughput and response time before and after the change, and be prepared to revert.

        Any change you make is your own responsibility.
        user628400 wrote:
        I ran that script and see that log switch occurs once in 3 hrs. But during backup it occurs 3 log switches occur.
        I expect the reason you have extra log switches during backup is:
        1) Some are being generated manually by the backup script.
        2) Whilst tablespaces are in backup mode the amount of redo generated will be much, much greater until they are taken out of backup mode. They should only be in backup mode long enough for you to split the mirror, and they should be taken out of backup mode as soon as possible after the mirror split has succeeded. (I do not anticipate they need to be in backup mode when resilvering occurs at the end of the backup.)


        Good luck. Sometimes these things work, sometimes we have to learn some more factors - bigdelboy
        • 31. Re: Long Chkpoints,Is unable to write archive log fast enough the problem?
          26741
          But during backup it occurs 3 log switches occur.
          I am not surprised. When you do Split Mirror backups you are well advised to use SWITCH LOGFILE at least, and preferably ARCHIVE LOG, for at minimum the 1 current redo log. Some DBAs (including me) put in 2 or 3 SWITCH/ARCHIVE commands or ARCHIVE LOG ALL commands.
          The ArchiveLog file of the current redo log must be generated after the DATABASE/TABLESPACE END BACKUP and before the ArchiveLog volume is split. (Otherwise your Database Mirror volume would have the datafiles in Backup mode but your ArchiveLog volume wouldn't have the corresponding Redo that captures the "END BACKUP" command when you copy both volumes to tape.)

          BTW, if this is EMC, are you sure that you don't have ALTER SYSTEM SUSPEND and ALTER SYSTEM RESUME commands in your backup script?
          A SUSPEND and RESUME is required on certain storage implementations, and I saw this required on EMC CX storage a few years ago.
          Alternatively, the EMC command set itself SUSPENDs all I/O.
          For example, see
          http://www.emcstorageinfo.com/2007/08/emc-bcv-operation-on-host-running.html


          A SUSPEND would mean that user transactions would freeze and any queries using temporary segments would also freeze. Similarly, queries trying to access buffers not in the DB_CACHE would wait for the RESUME.
          • 32. Re: Long Chkpoints,Is unable to write archive log fast enough the problem?
            mohitanchlia
            Thanks. Yes, we are indeed using EMC storage but don't use SUSPEND/RESUME.

            Is there any sample script that I can look at?
            • 33. Re: Long Chkpoints,Is unable to write archive log fast enough the problem?
              26741
              Ask your EMC Engineer or Sales person for the paper
              "EMC CLARiiON Database Storage Solutions: Oracle9i with SnapView in SAN Environments"
              • 34. Re: Long Chkpoints,Is unable to write archive log fast enough the problem?
                75407
                Hermant has given a link to a sample script in this previous post which I fully endorse.

                Hermant's comment really means that we should recheck:
                (1) That log switches / checkpoints causing slow response times occur at some point other than at the backup.
                (2) That the poor response time was only in the period between 09:08:11 and 09:09:12 (and not in the period immediately following that).
                user628400 wrote:
                This is from Posted: 24-Mar-2009 12:37
                I see below messages in alert log and I see performance being degraded between 09:08:11 and 09:09:12. I can exactly pinpoint: ....
                If (1) and (2) cannot be confirmed then reducing the size of archived logs might not help us.

                It is worth checking the alert log to confirm:

                (A) The time of the first and last:
                - alter tablespace XXXX begin backup command

                and (B) the time of the first and last:
                - alter tablespace XXXX end backup command

                And rechecking that there are no SUSPEND/RESUME statements recorded in between.
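
The alert log check described above could be scripted along these lines. This is a minimal sketch, not a tested tool: it assumes the alert log writes a timestamp line such as `Tue Mar 24 09:08:11 2009` before each entry and records the statements as plain text on the following line. The sample lines, tablespace names and function name are invented for illustration.

```python
import re
from datetime import datetime

# Assumed timestamp line format, e.g. "Tue Mar 24 09:08:11 2009".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$")

EVENTS = {
    "begin backup": re.compile(r"^alter tablespace \S+ begin backup", re.I),
    "end backup": re.compile(r"^alter tablespace \S+ end backup", re.I),
    "suspend/resume": re.compile(r"^alter system (suspend|resume)", re.I),
}


def backup_window(lines):
    """Return (first, last) timestamp per event type, or None if never seen."""
    current = None
    seen = {name: [] for name in EVENTS}
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP.match(line):
            # Remember the most recent timestamp; it applies to the next entries.
            current = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            continue
        for name, pattern in EVENTS.items():
            if current is not None and pattern.match(line):
                seen[name].append(current)
    return {name: (min(ts), max(ts)) if ts else None for name, ts in seen.items()}


# Invented sample lines, only to show the shape of the input.
SAMPLE = """\
Tue Mar 24 09:08:11 2009
alter tablespace TBS_A begin backup
Tue Mar 24 09:08:12 2009
alter tablespace TBS_B begin backup
Tue Mar 24 09:08:40 2009
alter tablespace TBS_A end backup
Tue Mar 24 09:08:41 2009
alter tablespace TBS_B end backup
"""

for event, window in backup_window(SAMPLE.splitlines()).items():
    print(event, window)
```

On a real system you would pass the lines of the actual alert log file instead of `SAMPLE`; a `None` for `suspend/resume` would mean no such statement was recorded in the lines scanned.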

                Referring to Hermant's link (http://www.emcstorageinfo.com/2007/08/emc-bcv-operation-on-host-running.html), I would expect EMC commands such as those given below to issue Oracle commands 'under the hood':

                symioctl begin backup -type oracle -nop
                - Expected to issue a set of alter tablespace xxxx begin backup commands (and possibly alter system archive log commands).

                symioctl freeze -type oracle -nop
                - Expected to include an alter system suspend that would be recorded in the alert log.
                • 35. Re: Long Chkpoints,Is unable to write archive log fast enough the problem?
                  26741
                  See page 21 of the white paper on 10g on SAN, if using CLARiiON,


                  for example at http://singapore.emc.com/collateral/hardware/white-papers/h1290-clariion-db-stor-sol-ldv.pdf
                  • 36. Re: Long Chkpoints,Is unable to write archive log fast enough the problem?
                    mohitanchlia
                    All I see is
                    alter system checkpoint
                    alter system archive log current
                    alter system backup control ..
                    begin backup - all tablespaces are individually taken in backup mode by issuing alter tablespace <TBS> begin backup
                    Take backup.
                    end backup - for all tablespaces

                    Should we be using suspend/resume instead?
                    • 37. Re: Long Chkpoints,Is unable to write archive log fast enough the problem?
                      75407
                      user628400 wrote:
                      Should we be using suspend/resume instead?
                      - Suspend/Resume should be used in addition to the other steps, not in place of them.
                      - If you add suspend/resume you may cause a performance hit. Depending on your EMC model, it may be unnecessary, recommended, or mandatory in order to be 100% sure of having a usable Oracle database backup.
                      user628400 wrote:
                      Take Backup.
                      - At this point I would expect you to be splitting the mirror off. The actual backup, whatever that is, should continue in parallel while the tablespaces are taken out of backup mode. (It may be that your Take backup command already spawns off a different process to do this ... I'd just like to be certain.)



                      I am thinking there is a chance we may need to check that your backup/recovery regime is satisfactory and conforms to good practice. This thread/forum may not be the best place for this; it may be too big a job for it. I anticipate that if Hermant, who is better in this area than I am and who has already given good references, were to comment further efficiently, he might need to see or know:
                      - The model of EMC storage in use.
                      - A detailed description of the script used for the backup, or at least the EMC commands/scripts being called (especially around Take Backup).
                      In all cases ensure machine names, passwords and user names are changed when posting.

                      Must go now - bigdelboy
                      • 38. Re: Long Chkpoints,Is unable to write archive log fast enough the problem?
                        26741
                        Should we be using suspend/resume instead?
                        As "bigdelboy" and I have asserted repeatedly, you should be checking the type of EMC storage you are using and talking to your EMC Vendor and Engineers, and particularly to EMC Support.
                        • 39. Re: Long Chkpoints,Is unable to write archive log fast enough the problem?
                          mohitanchlia
                          I understand, but irrespective of what storage we use, what's the advantage or disadvantage of it? Should we think of using anyways before we put tablespace in backup mode?

                          BTW: We are using EMC Symmetrix
                          • 40. Re: Long Chkpoints,Is unable to write archive log fast enough the problem?
                            26741
                            Should we think of using anyways before we put tablespace in backup mode?
                            That indicates that you HAVEN'T read the document. Why would a SUSPEND be issued before the Backup mode? A SUSPEND, if used, is AFTER the BEGIN BACKUP.

                            Or, take the trouble to search for SUSPEND in the Oracle documentation.

                            http://download.oracle.com/docs/cd/B19306_01/server.102/b14231/start.htm#sthref614

                            http://download.oracle.com/docs/cd/B19306_01/backup.102/b14191/osbackup.htm#BRADV204
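
The ordering discussed in this thread (BEGIN BACKUP first, an optional SUSPEND only around the mirror split, END BACKUP straight after the split, then archiving the current redo log before the archive log volume is split) can be laid out as a small sketch. The function name and the comment-style placeholder steps are hypothetical; the real split commands depend on the storage vendor and are not shown here.

```python
def split_mirror_backup_steps(tablespaces, use_suspend=False):
    """Return the statements of a split mirror backup in the order discussed.

    Storage-side actions are placeholders (lines starting with "--"), because
    the actual commands depend on the storage model and vendor tooling.
    """
    steps = [f"ALTER TABLESPACE {ts} BEGIN BACKUP" for ts in tablespaces]
    if use_suspend:
        # SUSPEND, if used at all, comes after BEGIN BACKUP.
        steps.append("ALTER SYSTEM SUSPEND")
    steps.append("-- split the datafile mirror (storage vendor command)")
    if use_suspend:
        steps.append("ALTER SYSTEM RESUME")
    # Leave backup mode as soon as the split has succeeded.
    steps += [f"ALTER TABLESPACE {ts} END BACKUP" for ts in tablespaces]
    # The redo covering END BACKUP must be archived before the archive volume is split.
    steps.append("ALTER SYSTEM ARCHIVE LOG CURRENT")
    steps.append("-- split the archive log volume (storage vendor command)")
    return steps


for step in split_mirror_backup_steps(["TBS_A", "TBS_B"], use_suspend=True):
    print(step)
```

Whether `use_suspend` should be on at all is exactly the question to put to EMC for the storage model in use.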
                            • 41. Re: Long Chkpoints,Is unable to write archive log fast enough the problem?
                              mohitanchlia
                              I did read the documents earlier, but I didn't understand the point of suspending transactions after all the I/O-intensive activities, like BEGIN BACKUP, are done.
                              • 42. Re: Long Chkpoints,Is unable to write archive log fast enough the problem?
                                26741
                                SUSPEND is not about I/O intensity.

                                It has been available for storage implementations where the mirror split across LUNs is not atomic. The database is normally on multiple filesystems, with each filesystem on different LUNs. If the LUNs are not really as at the same point in time (e.g., the storage splits one LUN pair but allows writes to continue to another LUN before splitting the second one), then the split is not atomic. An Oracle block that happens to span the LUNs while being written becomes "fractured". The SUSPEND is there to prevent any writes to the storage while the LUN pairs are being split.

                                If your script or storage implementation uses SUSPEND, then you WOULD see application response time issues during the backup. That is what I have been saying so far.
                                You have found that the script does not issue an ALTER SYSTEM SUSPEND from the Oracle instance. Just confirm with EMC that the storage also doesn't "freeze" I/O while the mirror splits are occurring.
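
The non-atomic split described above can be pictured with a toy model. It is purely illustrative and makes several simplifying assumptions: a single block whose two halves sit on two LUNs, exactly one write in flight, and made-up names throughout. It does not model any real storage array or Oracle's own handling of blocks in backup mode.

```python
def split_mirror(suspend_writes):
    """Split two LUN pairs one after the other and return the mirror's view of a block."""
    # One block whose two halves live on different LUNs (illustrative only).
    primary = {"lun_a": "v1", "lun_b": "v1"}
    mirror = {}

    def write_block(version):
        # A block write touches both halves.
        for lun in primary:
            primary[lun] = version

    mirror["lun_a"] = primary["lun_a"]  # first LUN pair is split
    if not suspend_writes:
        write_block("v2")               # the database keeps writing between the two splits
    mirror["lun_b"] = primary["lun_b"]  # second LUN pair is split a moment later
    if suspend_writes:
        write_block("v2")               # with SUSPEND, the write only lands after RESUME
    return mirror


def is_fractured(mirror):
    """A block is fractured if its halves come from different versions."""
    return len(set(mirror.values())) > 1


print("without suspend:", split_mirror(False), is_fractured(split_mirror(False)))
print("with suspend:   ", split_mirror(True), is_fractured(split_mirror(True)))
```

Without the suspend, the mirror ends up with one old half and one new half of the same block; with it, both halves on the mirror belong to the same version.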