  • 15. Re: Redo Logs Sizing
    yxes2013 Newbie
    UNDO_RETENTION has NO impact on ORA-01555 errors
    You mean you can never avoid this error? Then what is the better way to tune to avoid it? Can I set RETENTION GUARANTEE, or will it cause more problems?


    Thanks
  • 16. Re: Redo Logs Sizing
    sb92075 Guru
    yxes2013 wrote:
    UNDO_RETENTION has NO impact on ORA-01555 errors
    You mean you can never avoid this error? Then what is the better way to tune to avoid it?
    If/when the problem is caused by a COMMIT inside a LOOP,
    then the only solution is to COMMIT once, outside the LOOP, after it completes (see the sketch at the end of this reply).
    Can I set retention guarantee is ENABLED? or It will cause more problems.


    Thanks
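    A minimal PL/SQL sketch of the two shapes (table and column names are made up for illustration):

        -- Anti-pattern: COMMIT inside the loop releases undo that the
        -- driving cursor may still need, inviting ORA-01555
        BEGIN
          FOR r IN (SELECT id FROM big_table WHERE processed = 'N') LOOP
            UPDATE big_table SET processed = 'Y' WHERE id = r.id;
            COMMIT;  -- the problem
          END LOOP;
        END;
        /

        -- Better: COMMIT once, after the loop completes
        BEGIN
          FOR r IN (SELECT id FROM big_table WHERE processed = 'N') LOOP
            UPDATE big_table SET processed = 'Y' WHERE id = r.id;
          END LOOP;
          COMMIT;
        END;
        /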
  • 17. Re: Redo Logs Sizing
    yxes2013 Newbie
    Thanks,
    if/when problem is caused by COMMIT inside LOOP,
    then only solution is to only COMMIT once outside the LOOP after it completes.
    This is Oracle Applications EBS :( so I can't control when it commits outside the loop. Is there an init.ora parameter to set all processes to AUTOCOMMIT ON?
  • 18. Re: Redo Logs Sizing
    Maran Viswarayar Pro
    UNDO Retention should be set to the longest running query execution time...

    UNDO_RETENTION=1800 seems too low, as your query is executing for 2913 seconds... try setting a higher value and increase your undo tablespace size.
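    A minimal sketch of checking and raising it (3600 is an illustrative value; size it from your own longest query; RETENTION GUARANTEE is 10g+ syntax and the tablespace name is an assumption):

        -- Longest query observed since instance start, in seconds
        SELECT MAX(maxquerylen) FROM v$undostat;

        -- Raise the retention target (seconds; SCOPE=BOTH assumes an spfile)
        ALTER SYSTEM SET undo_retention = 3600 SCOPE=BOTH;

        -- Optional, and only with ample undo space: guarantee the retention
        -- ALTER TABLESPACE undotbs1 RETENTION GUARANTEE;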
  • 19. Re: Redo Logs Sizing
    Rob_J Journeyer
    If UNDO_RETENTION has no impact on ORA-01555 error, what error would you get in this scenario?

    UNDO_RETENTION = 1800
    Query running time = 1900

    User runs SQL and requires a data block to be read from UNDO which was updated 1900 seconds ago. In the meantime, other users are using the system and the expired undo (anything over 1800 seconds) has been overwritten. The user's query will error, and I would say it will be an ORA-01555. Correct me if I'm wrong.

    As Maran Viswarayar pointed out, your UNDO_RETENTION should be set to the longest running query length; that's what it's there for. Yes, ORA-01555 can happen when there are commits inside a loop, but I believe it can happen in this case, too.

    Actions:

    1. Increase the size of your REDO logs (one way is sketched after this list)
    2. Take another statspack report for the same time period on a comparable day (in terms of load, processes running, etc) and see how it looks
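    A minimal sketch of step 1 (file paths, group numbers, and the 1024M size are assumptions for illustration):

        -- Add new, larger redo log groups
        ALTER DATABASE ADD LOGFILE GROUP 4 ('/u01/oradata/redo04.log') SIZE 1024M;
        ALTER DATABASE ADD LOGFILE GROUP 5 ('/u01/oradata/redo05.log') SIZE 1024M;
        ALTER DATABASE ADD LOGFILE GROUP 6 ('/u01/oradata/redo06.log') SIZE 1024M;

        -- Switch and checkpoint until the old groups show INACTIVE...
        ALTER SYSTEM SWITCH LOGFILE;
        ALTER SYSTEM CHECKPOINT;
        SELECT group#, status FROM v$log;

        -- ...then drop each old group once it is INACTIVE
        ALTER DATABASE DROP LOGFILE GROUP 1;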

    Question: your statspack report is from 10pm to midnight; when are your peak loads? Is this an overnight batch-job window, with the main load during the day? If so, we will also need a statspack report for that daytime period.
  • 20. Re: Redo Logs Sizing
    yxes2013 Newbie
    Thanks Rob,

    Yes, we have overnight jobs, rush ones (intended for year-end processing), that should have finished by 10pm, but instead the system started performing slowly at that time (10pm), and the batch jobs did not finish until morning.

    I cannot understand why no one here seems to see any problem in the statspack report, when the server got really slow at those times.

    Which part of the report shows disk I/O contention? That is what I felt was the cause.

    Can you also guide me on which part of the statspack report shows CPU usage and memory usage?

    Can you teach me how to read/interpret the statspack report, please?


    Thanks a lot,

    Jan
  • 21. Re: Redo Logs Sizing
    Aman.... Oracle ACE
    The report covers 119 minutes, and any report longer than about 30 minutes is not really very useful for analysis. So it would be better if you limit the report duration to no more than 30 minutes, and also take it when the database is reported as slow.
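    A minimal sketch of taking a tighter pair of snapshots by hand (assuming statspack is already installed under the PERFSTAT schema):

        -- Take a snapshot when the slowdown starts
        EXEC statspack.snap;
        -- ...wait about 30 minutes while the problem is live, then
        EXEC statspack.snap;
        -- Generate the report between the two snapshot IDs it prompts for:
        -- SQL> @?/rdbms/admin/spreport.sql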

    Aman....
  • 22. Re: Redo Logs Sizing
    Rob_J Journeyer
    Have you increased the size of your REDO logs yet? That is without a doubt causing you performance issues, so it should be your number 1 priority.

    The top timed event is CPU usage, so unless you are constrained by CPU, the time period you pulled the statspack report from doesn't show too much of a problem; it just points out a SQL query which could be looked at. I suspect you will find performance issues if you pull out a statspack report from a time period when the system is being used during the day. Please confirm:

    1. Have you increased your REDO logs?
    2. When is the performance "slow"?

    You would look at the top timed events first and see what's in there. That section doesn't show any I/O issues, only CPU. Your total CPU available in seconds is:

    12 (cores) X 7140 (seconds in 119mins) = 85680.

    You were using 74,717 of those, which is around 87% CPU utilisation. That's quite high. You should check for CPU queuing at that time.

    There was 1 SQL statement causing 38% of that load so that would be next to look at.
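    If you want to chase it outside the report, a hedged sketch (assuming 10g or later, where V$SQLAREA has SQL_ID; CPU_TIME and ELAPSED_TIME are in microseconds):

        -- Top 5 SQL statements by CPU since they were loaded
        SELECT *
          FROM (SELECT sql_id, executions,
                       ROUND(cpu_time / 1e6)     AS cpu_secs,
                       ROUND(elapsed_time / 1e6) AS elapsed_secs,
                       SUBSTR(sql_text, 1, 60)   AS sql_text
                  FROM v$sqlarea
                 ORDER BY cpu_time DESC)
         WHERE ROWNUM <= 5;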

    First, though, please change the REDO log sizes and then look at the stats again.
  • 23. Re: Redo Logs Sizing
    Jonathan Lewis Oracle ACE Director
    You have 12 CPUs, and the report is for 2 hours.
    That gives you 86,400 CPU seconds to use.


    Your CPU time reported is 74,000 seconds - so a very high percentage.

    High levels of CPU usage often lead to exaggerated latch contention.

    You are doing db file scattered reads in less than 0.5 ms - so you are constantly tablescanning (or doing index fast full scans) from a file system cache - which burns up more CPU outside Oracle. You are reporting 43,700 blocks read from disc per second - which is a bit excessive.

    You don't have a problem with redo logs - you're only generating 54K per second and you've lost about 100 seconds in total on log activity - which may be slowed down by CPU starvation.

    Check the SQL by physical reads (which is likely to be the same as the SQL by buffer gets) and see if you can make it choose a more efficient execution plan - this may be sufficient to ensure that it finishes quickly enough that the ORA-01555 doesn't have a chance to occur.
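    The statspack report has that section, but as a hedged sketch you can also pull it live (again assuming 10g+ for SQL_ID; on 9i use ADDRESS/HASH_VALUE instead):

        -- Top 5 SQL statements by physical reads
        SELECT *
          FROM (SELECT sql_id, disk_reads, buffer_gets, executions,
                       SUBSTR(sql_text, 1, 60) AS sql_text
                  FROM v$sqlarea
                 ORDER BY disk_reads DESC)
         WHERE ROWNUM <= 5;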

    As others have pointed out, committing inside a cursor loop can easily cause ORA-01555 even if you have an enormous amount of space available for undo to grow - so I'd look for that as a logic error as well - but eliminating the massive use of resources may be the easy first option.

    Regards
    Jonathan Lewis
  • 24. Re: Redo Logs Sizing
    jgarry Guru
    yxes2013 wrote:
    Thanks Rob,

    Yes, we have overnight jobs, rush ones (intended for year-end processing), that should have finished by 10pm, but instead the system started performing slowly at that time (10pm), and the batch jobs did not finish until morning.
    This is exactly why I size my redo (edit: and undo) for the unusual situations.
    >
    I cannot understand why no one here seems to see any problem in the statspack report, when the server got really slow at those times.
    Statspack is for tuning more normal situations; you keep a sample report as a baseline to compare against when normal things go bad, and you use it to try to narrow down what is going wrong. But you already know that: you have the alert log telling you checkpoints don't complete and redo switches way too often. At this point, all you can do is some tuning for next year when you do this (make the redo much larger, modify undo retention), followed by tuning for normal operations (put in the timeout to switch the logs, as sketched below, plus take a baseline statspack whenever you change anything or when something in normal operations and batches goes wrong). This is an iterative process, and you may have to make new decisions about undo retention.
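    A minimal sketch of that log-switch timeout (ARCHIVE_LAG_TARGET is in seconds; the 1800 value is an assumption, and SCOPE=BOTH assumes an spfile):

        -- Force a log switch at least every 30 minutes
        ALTER SYSTEM SET archive_lag_target = 1800 SCOPE=BOTH;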

    >
    Which part of the report shows disk I/O contention? That is what I felt was the cause.
    You have to fix the SQL, whether or not it is the problem sb says it is. Since you can't do that, all you can do is throw more redo at it and hope for the best. Do you have a test environment to replicate the problem? Or can you just tell management it might take overnight next year, too? (I have some periodic inventory processing that has wild swings in how long it takes, so that was the answer for me in that case.)

    >
    Can you also guide me which part of the statspack show CPU usage? and Memory Usage?

    Can you teach me how to read/interpret the statspack report please?
    http://jonathanlewis.wordpress.com/statspack-examples/

    There are examples here in this forum where people walk through it, too.

  • 25. Re: Redo Logs Sizing
    yxes2013 Newbie
    Thanks a lot jgarry,
    You have to fix the SQL, whether or not it is the problem sb says it is. Since you can't do that, all you can do is throw more redo at it and hope for the best.
    Do you have a test environment to replicate the problem? Or can you just tell management it might take overnight next year, too?
    (I have some periodic inventory processing that has wild swings in how long it takes, so that was the answer for me in that case.)
    We have a test environment to replicate the issue, but it will take effort to build the test data, and I do not know how to monitor it better aside from using statspack, which has nothing to say. I am afraid my test will just be in vain.

    The issue usually occurs during month-end, when lots of POs, invoices, and reports are being run. No invoices means no delivery and no income.
    Since all the batch programs seem to access the same big table, I guess there might be I/O contention.
    What do you suggest if 20 copies of the same program access the same tables, updating 20 client accounts and generating 20 reports simultaneously?
    And each account manager is asking to prioritize his/her account for the delivery deadline. Do I need to cache the entire table in memory, or partition the table by account and put the partitions
    on separate disks so that each has its own I/O? Can you check whether creating additional indexes will improve performance? How do I know which columns need indexes based on the statspack report? Do I need to display each program's EXPLAIN PLAN? My instinct tells me that I must create an index to improve performance.
    These kinds of batch jobs cannot afford to be delayed, since they are critical for product delivery to lots of clients. Management says the problem which caused the delay yesterday caused cancelled orders
    worth P5 million, plus the penalty for not being able to meet the service level agreement.
    That is why management asks: why can Oracle not perform well? I am afraid they might consider SAP :(

    Thanks
  • 26. Re: Redo Logs Sizing
    yxes2013 Newbie
    Thanks aman,
    The report covers 119 minutes, and any report longer than about 30 minutes is not really very useful for analysis. So it would be better if you limit the report duration to no more than 30 minutes, and also take it when the database is reported as slow.
    Statspack is being run every hour because that is what I read in the docs as the normal stats-gathering interval. So you mean that if I generate a statspack report from 10pm-11pm, it will still not produce a desirable analysis?


    Thanks
  • 27. Re: Redo Logs Sizing
    yxes2013 Newbie
    Thanks Jon,
    As others have pointed out, committing inside a cursor loop can easily cause ORA-01555 even if you have an enormous amount of space available for undo to grow - so I'd look for that as a logic error as well - but eliminating the massive use of resources may be the easy first option.
    Actually, this ORA-01555 seldom happens. It is just a side effect of the root cause of the issue, which is slow performance. Had the program finished fast, the ORA-01555 might not have occurred.
  • 28. Re: Redo Logs Sizing
    jgarry Guru
    yxes2013 wrote:
    Thanks a lot jgarry,
    You have to fix the SQL, whether or not it is the problem sb says it is. Since you can't do that, all you can do is throw more redo at it and hope for the best.
    Do you have a test environment to replicate the problem? Or can you just tell management it might take overnight next year, too?
    (I have some periodic inventory processing that has wild swings in how long it takes, so that was the answer for me in that case.)
    We have a test environment to replicate the issue, but it will take effort to build the test data, and I do not know how to monitor it better aside from using statspack, which has nothing to say. I am afraid my test will just be in vain.

    The issue usually occurs during month-end, when lots of POs, invoices, and reports are being run. No invoices means no delivery and no income.
    Since all the batch programs seem to access the same big table, I guess there might be I/O contention.
    What do you suggest if 20 copies of the same program access the same tables, updating 20 client accounts and generating 20 reports simultaneously?
    You ought to be able to test running the reports concurrently versus serially. Even going back 30 years I saw improvement from that.

    Coming into the current decade, Oracle has some sophisticated features for concurrency, but they do come at a cost. For example, since each user has their own view of the data, Oracle will make copies of the buffers they access. This can add up quickly in the SGA (which I would guess is one reason so many things are now being pushed, like direct reads to the PGA, which may still require undo access to be completely consistent to an SCN, and so on; Jonathan Lewis has written some excellent pieces, and even complete books, about these kinds of things). So it can help to have a large SGA; but then, if lots of stuff is in the SGA, the optimizer may decide to thrash the CPU reading through all of it in memory.

    I've found the EM performance screens very helpful in visualizing this kind of thing, if you can get the appropriate optional packs. In particular, if a batch is updating something while someone else is, the red spikes for TX contention stand out like, well, sore thumbs.

    If you have updates jamming out redo I/O, while direct reads are expecting lots of read I/O, and the CPU is too busy servicing memory processing to get to the I/O... well, that's why system tuning is iterative; squeezing any of those balloons can have unpredictable effects on the others. See Cary Millsap's writings and optimization book for more info on that (google Method-R).
    And each account manager is asking to prioritize his/her account for the delivery deadline. Do I need to cache the entire table in memory, or partition the table by account and put the partitions
    on separate disks so that each has its own I/O? Can you check whether creating additional indexes will improve performance? How do I know which columns need indexes based on the statspack report? Do I need to display each program's EXPLAIN PLAN?
    Yes, spreading I/O among as many spindles as possible can be quite positive; you especially want to avoid contention between redo, archiving, undo, and regular data files (for my systems, I see undo being the most used of the random-access files; YMMV). Oracle is very good about automatically caching blocks that you are using, though again, month-end processing may be different. Lots has been written on this, and all you can say briefly is: It Depends.

    As far as indexes go, you can only figure that out by looking at the plans of all data access; sometimes trade-offs and decisions have to be made. It is certainly possible for one crummy report to take up way more resources than it should, and fixing such a hog can work wonders. But you only know that if you know your data and access paths; a sketch of checking a plan follows.
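    A minimal sketch of checking one statement's plan (DBMS_XPLAN is 9iR2+; the table and bind are just an EBS-flavoured example for illustration):

        EXPLAIN PLAN FOR
          SELECT * FROM ra_customer_trx_all WHERE trx_number = :b1;

        SELECT * FROM TABLE(dbms_xplan.display);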
    These kinds of batch jobs cannot afford to be delayed, since they are critical for product delivery to lots of clients. Management says the problem which caused the delay yesterday caused cancelled orders
    worth P5 million, plus the penalty for not being able to meet the service level agreement.
    Yikes! Well, Cary does emphasize that the business must define performance priorities. I know I've had projects where it made sense to do things way out in left field to hit performance targets, where left field means lots of expensive programming, sometimes throwing out conventional wisdom.
    That is why management asks: why can Oracle not perform well? I am afraid they might consider SAP :(
    From all I've seen online, if you think support of Oracle performance is difficult to deal with, you ain't seen nuthin' yet. But I wouldn't know anything about it.

    It sounds like what you really need is some solid performance tuning experts to come in and evaluate things. It looks from here like there is quite a valley between your level of expertise (not making any judgement! Many of us have been there, or even live there permanently :-) ) and what the organization needs.
  • 29. Re: Redo Logs Sizing
    yxes2013 Newbie
    Thanks everyone, and jgarry, for the clear explanation,

    Sorry for being repetitive; I am just sharing my issues to relieve stress. :)
    AutoInvoice Transfer Program has been bogging down the Company's business, a bad start for 2013. Please help with this serious problem.

    Update as of this hour: there are still 139 invoices not yet downloaded due to the system slowdown. This was experienced early last night and, until now, no speed progress has been achieved. A lot of trucks are still pending in the warehouse awaiting invoices and dispatch documents. As of the moment, herewith are the pending trips not yet dispatched due to the given problem. All trucks are still pending in the warehouse waiting for printed invoices and gate passes, which cannot come out due to the system slowdown. This was raised around 8pm, when they encountered the system slowdown again. We have a lot of scheduled deliveries today; almost all are dispatched late already, and it is best effort whether they can finish all the drops we load onto them, given the delays we have encountered. Until now, more than 15 trucks are still in the warehouse, bound for the North, South, and Central areas, and mostly ABC accounts. This is a serious problem that we need to address immediately. Sales will blame us if we are not able to deliver their hard-earned POs on time. Aside from this, we are anticipating late-delivery penalties from the accounts, which we cannot charge to the trucker outright if the reason is late dispatch.
    I still really do not know what to do. :(
    You ought to be able to test running the reports concurrently versus serially. Even going back 30 years I saw improvement from that.
    Yes, I guess this is the best solution (maybe if you are using cheap software), but the company bought this expensive product precisely for this reason.

    The bottom line really is: there is a program A; when you run it alone (1 copy) the performance is OK. But if you run it 20 times simultaneously against the same table, of course, what do you expect? It will slow down even if you have big memory or lots of CPUs.

    The simple analogy is: suppose you have a movie file on your hard disk. If you watch it alone, the performance is good. But if you share your disk with 20 office-mates and all of you watch the movie
    simultaneously, what will happen? I/O contention, right? Does more memory and CPU help? I cannot understand why this is not showing in the statspack report.
    Does it actually require checking the V$ views?

    In this regard, how do I tune this kind of scenario?

    Does raising an SR with Support help, providing them the statspack report?
    It sounds like what you really need is some solid performance tuning experts to come in and evaluate things. It looks from here like there is quite a valley between your level of expertise
    (not making any judgement! Many of us have been there, or even live there permanently) and what the organization needs.
    Who is more expert than the people who responded here? Especially SB ;)

    Supposing the issue repeats now: can you tell me what SELECT statement to run to list the programs that are holding locks, so I can abort them, or something like that? Or the events
    causing the performance issue?
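    For reference, a hedged sketch of one such query (assuming 10g or later, where V$SESSION exposes BLOCKING_SESSION; on 9i you would join V$LOCK instead):

        -- Who is blocking whom right now?
        SELECT blocker.sid     AS blocking_sid,
               blocker.serial# AS blocking_serial#,
               blocker.program AS blocking_program,
               waiter.sid      AS waiting_sid,
               waiter.event    AS waiting_on
          FROM v$session waiter, v$session blocker
         WHERE waiter.blocking_session = blocker.sid;

        -- If you decide a blocker must go (use with great care):
        -- ALTER SYSTEM KILL SESSION '<sid>,<serial#>';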


    Thanks a lot,

    Jan
