This discussion is archived
1 2 3 4 Previous Next 54 Replies Latest reply: Jan 15, 2013 4:29 PM by jgarry Go to original post RSS
  • 30. Re: Performance Issue
    Rob_J Journeyer
    Currently Being Moderated
    yxes2013 wrote:
    I thank you all,


    1. Are the queries optimal?
    I dont know :( do I need to show you the explain plan. How do I know if its optimal?
    Yes, that is what we would have to do to look at the SQL queries in more detail.

    >
    2. If so, how much data are they dealing with? KBs, MBs, GBs?
    They said if just around 200 invoices. Compared to others I see in metalink that logs issue whiich has 10,000 PO's invioces.
    How large are the tables that you are dealing with? What we need to know is how much data are you reading from disk, into the SGA (possibly PGA if you are doing sorts, too) and how does that compare to the size you have for the db_cache, which controls how much data is available for the database blocks in the SGA.
    3. Can you increase your SGA to handle more appropriately the volume required by the queries?
    Is it already 1.7 which is the limit of 32 bit Oracle. which we have.
    It might be set to limit at 1.7GB, which is why you have the SGA_MAX_SIZE set, but your db_cache_size is 288MB. I could be wrong but I think without having the benefit of automatic memory management by using the sga_target parameter, the db_cache_size on 9i will be an upper limit?
    4. Also note that the queries in your good and bay days are different so the load pattern is not comparable.
    It maybe that that queries has finished fast and they have done other programs being caught in that time range. Do I need generate statspack starting 8am to 5pm? so that everything is captured?
    You could do this. It all depends on how your DB is used. Can users just log on and run reports/custom queries whenever they like? Or is it just one application with predictable, repeated actions which are run over and over each day? If the former, your performance will not be as predictable because users could run many versions of queries, like in a data warehouse. If the latter, then you'd normally expect to see the same queries being run over and over in a repeated fashion throughout the day, like in an OLTP system. The only change in that might be at month end, or specific periods where the users need to use another part of the system more often.

    With the program being EXCEL.exe it souds like there are some kind of adhoc queries being run, but that's just a completely wild guess. You should be able to know about how the application works and tell us.
    5. What did you base the 1800 to 3600 decision on? You need to look at how long your longest running query is, in seconds, and set the undo to at least that value. How long is the query which runs running for before you get the ORA-01555 error?
    The logical thing is just to test it by increasing, but it seems does not matter as "sbt" has mentioned.
    It does matter because you need to tell Oracle to attempt to keep the undo data going back X seconds (X being your longest running query time). If you have a query which takes 3 hours, you need to ensure that that query is able to create a consistent view of the data as it was at the start time of the query. This could be up to 3 hours in the past towards the end of the query running time. So, what query are you running and how long does it take?
    6. Your db_cache is tiny, 288MB, how much data are you selecting? If more than 288MB, can you increase your cache?

    How do I increase my cache size?
    Based on the above, and the fact that you seem to need to increase performance ASAP initally, I would personally increase the db_cache_size from 288MB to something larger like 1024MB. That leaves you with space for the other areas of the SGA within your 1.7GB limit. The reason for doing this is because 288MB is tiny and you can see from the statspack reports taht there are queries running which are doing a lot of reads.

    Then I would look at the SQL queries in more detail and attempt to reduce the number of reads that they were doing and make them as optimal as possible, starting with the worst ones on the statspack report and working my way through.

    In addition to that, I would see what the users were actually doing. Are they generating large report style queries whenever they like? If so, can they be run outside of peak hours so they run overnight and in hours when the system is not as busy?

    Oh, and one more thing. As jgarry pointed out, the pga is larger than your db_cache. Is there a specific reason for this, such as are you running long queries which sort lots of data?

    Edited by: Rob_J on Jan 9, 2013 9:42 AM
  • 31. Re: Performance Issue
    jgarry Guru
    Currently Being Moderated
    Rob_J wrote:
    yxes2013 wrote:
    I thank you all,


    1. Are the queries optimal?
    I dont know :( do I need to show you the explain plan. How do I know if its optimal?
    Yes, that is what we would have to do to look at the SQL queries in more detail.
    2. If so, how much data are they dealing with? KBs, MBs, GBs?
    They said if just around 200 invoices. Compared to others I see in metalink that logs issue whiich has 10,000 PO's invioces.
    How large are the tables that you are dealing with? What we need to know is how much data are you reading from disk, into the SGA (possibly PGA if you are doing sorts, too) and how does that compare to the size you have for the db_cache, which controls how much data is available for the database blocks in the SGA.
    3. Can you increase your SGA to handle more appropriately the volume required by the queries?
    Is it already 1.7 which is the limit of 32 bit Oracle. which we have.
    It might be set to limit at 1.7GB, which is why you have the SGA_MAX_SIZE set, but your db_cache_size is 288MB. I could be wrong but I think without having the benefit of automatic memory management by using the sga_target parameter, the db_cache_size on 9i will be an upper limit?
    Yes, this brings up the question of whether AMM was turned off, or simply never turned on. Edit: Oh duh, can't turn on what doesn't exist yet.

    >
    4. Also note that the queries in your good and bay days are different so the load pattern is not comparable.
    It maybe that that queries has finished fast and they have done other programs being caught in that time range. Do I need generate statspack starting 8am to 5pm? so that everything is captured?
    You could do this. It all depends on how your DB is used. Can users just log on and run reports/custom queries whenever they like? Or is it just one application with predictable, repeated actions which are run over and over each day? If the former, your performance will not be as predictable because users could run many versions of queries, like in a data warehouse. If the latter, then you'd normally expect to see the same queries being run over and over in a repeated fashion throughout the day, like in an OLTP system. The only change in that might be at month end, or specific periods where the users need to use another part of the system more often.

    With the program being EXCEL.exe it souds like there are some kind of adhoc queries being run, but that's just a completely wild guess. You should be able to know about how the application works and tell us.
    5. What did you base the 1800 to 3600 decision on? You need to look at how long your longest running query is, in seconds, and set the undo to at least that value. How long is the query which runs running for before you get the ORA-01555 error?
    The logical thing is just to test it by increasing, but it seems does not matter as "sbt" has mentioned.
    It does matter because you need to tell Oracle to attempt to keep the undo data going back X seconds (X being your longest running query time). If you have a query which takes 3 hours, you need to ensure that that query is able to create a consistent view of the data as it was at the start time of the query. This could be up to 3 hours in the past towards the end of the query running time. So, what query are you running and how long does it take?
    6. Your db_cache is tiny, 288MB, how much data are you selecting? If more than 288MB, can you increase your cache?

    How do I increase my cache size?
    Based on the above, and the fact that you seem to need to increase performance ASAP initally, I would personally increase the db_cache_size from 288MB to something larger like 1024MB. That leaves you with space for the other areas of the SGA within your 1.7GB limit. The reason for doing this is because 288MB is tiny and you can see from the statspack reports taht there are queries running which are doing a lot of reads.
    I would want to make sure this isn't a question about how to modify the init.ora...

    >
    Then I would look at the SQL queries in more detail and attempt to reduce the number of reads that they were doing and make them as optimal as possible, starting with the worst ones on the statspack report and working my way through.

    In addition to that, I would see what the users were actually doing. Are they generating large report style queries whenever they like? If so, can they be run outside of peak hours so they run overnight and in hours when the system is not as busy?

    Oh, and one more thing. As jgarry pointed out, the pga is larger than your db_cache. Is there a specific reason for this, such as are you running long queries which sort lots of data?
    There are probably a number of things wrong, we must differentiate between configuration, system tuning, and SQL tuning. Of course they are interrelated, but this is why I advocate a step zero in answer to Method-R and similar methodologies - they assume things are in the ballpark, so step zero is to get in the ballpark. As we saw with the redo log size, and now with buffers and pga, not there yet, and the tailgate party isn't fun. And we have to be careful about not changing too many things at once, with a production system that has ongoing performance complaints. We'll get through it, but there is no single "look at this report and see what to do." Each balloon we squeeze will affect the others, change the major problem significantly. That's why I ask why things are set the way they are, there might be some additional data points, including the possibility some drudge slammed in an installation and just accepted inappropriate defaults, so we have to do some initial evaluation detective work. Though I am tempted to support your buffer cache suggestion as a "get in the dang ballpark," though I might suggest AMM instead (if it hasn't been rejected already for some reason Edit: like I'm totally confused myself about versions features).

    Edited by: jgarry on Jan 9, 2013 10:50 AM
  • 32. Re: Performance Issue
    jgarry Guru
    Currently Being Moderated
    Ah heck, is sga_target a 10g thing? The first thing to go is memory...
  • 33. Re: Performance Issue
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated
    jgarry wrote:
    Ah heck, is sga_target a 10g thing? The first thing to go is memory...
    The automatic bit is 10g, the variable bit and being able to move memory around is 9i.

    One detail that seems to be that the sga_max_size has been set to about 1.7GB, but the components total about 700MB, completely wasing 0.9GB. There's also 64MB allocated to the java pool - of which only a few MB are in use, so there's a bit more that could goto the buffer cache if necessary. So there's some scope for improved caching - except the caching (just like the redo) doesn't really seem to be a problem apart from the insanely ridiculous tablescans. I'd have to check, but I thought the very specific 1.7GB figure for 32 bit Oracle was a Windows thing anyway, and you could squeeze a little bit more (up to 2GB) for Unix systems - which is what this one is.

    Regards
    Jonathan Lewis
  • 34. Re: Performance Issue
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated
    Jonathan Lewis wrote:
    yxes2013 wrote:

    This is the statspack report yesterday from 2pm-3pm where the performance is good. And the next is the statspack report today from 2pm-2:30pm where the performance problem occurs.
    Please help me find the difference or the problem on the 2nd statspack. Your help is very much appreciated. Thanks a lot.

    http://www.datafilehost.com/download-49750b9f.html (normal performance)
    http://www.datafilehost.com/download-f6c8ebbe.html (bad performance)
    The other check is simply to find something that gets bigger when the performance gets worse. The only thing I can see on a quick glance through the reports is the number of direct path writes (which report as physical writes - for a total of about 800,000 blocks written. These writes may be enough to interfere with your extravagent rate of reads.

    Since the redo size DROPs, these writes may be to the temporary tablespace (sort/hash) or they may be the result of NOLOGGING operations such as direct path loads or table moves or index rebuilds. When you look at the tablespace I/O stats, they look like workarea operations - so you may be able to find the driving SQL in v$sql_workarea.
    And here's the result of a couple more minutes of my spare time - compare the PGA histogram:

    Good day:
        Low    High
    Optimal Optimal    Total Execs Optimal Execs 1-Pass Execs M-Pass Execs
    ------- ------- -------------- ------------- ------------ ------------
         8K     16K         60,488        60,488            0            0
        16K     32K            569           569            0            0
        32K     64K             86            86            0            0
        64K    128K             55            55            0            0
       128K    256K              9             9            0            0
       256K    512K             27            27            0            0
       512K   1024K            802           802            0            0
         1M      2M          7,297         7,297            0            0
         2M      4M              6             6            0            0
         4M      8M              9             9            0            0
         8M     16M              4             4            0            0
        16M     32M              3             3            0            0
        32M     64M             12            12            0            0
              -------------------------------------------------------------
    Bad day
       Low    High
    Optimal Optimal    Total Execs Optimal Execs 1-Pass Execs M-Pass Execs
    ------- ------- -------------- ------------- ------------ ------------
         8K     16K        466,990       466,990            0            0
        16K     32K            226           226            0            0
        32K     64K             55            55            0            0
        64K    128K             45            45            0            0
       128K    256K              3             3            0            0
       256K    512K             12            12            0            0
       512K   1024K            354           354            0            0
         1M      2M             12            12            0            0
         2M      4M             11            11            0            0
         4M      8M              1             1            0            0
         8M     16M              3             3            0            0
        16M     32M              4             4            0            0
        32M     64M             24            24            0            0
       128M    256M             13             0           13            0
       256M    512M             18             0            0           18
              -------------------------------------------------------------
    It's almost certainly the big sorts hitting the fan that wreck the performance of the constant tablescans and inefficient - but file-system cached - queries.

    Regards
    Jonathan Lewis
  • 35. Re: Performance Issue
    jgarry Guru
    Currently Being Moderated
    So the 100% in-memory sorts is misleading because the bad ones are buried in the long tail of the statistic (31 out of 467k)? Heh.

    I'm wondering if it would be too strange to set a fair sized sort_area_size and set a manual workarea_size_policy for those excessive passes sessions - would that maybe throttle those down while letting others work? Feel free to slap me down if I got that backwards. Again, what the OS says about things might have a bearing.

    I'd suggest a separate dss system for those anyways.
  • 36. Re: Performance Issue
    yxes2013 Newbie
    Currently Being Moderated
    Thanks everyone,

    I will try to check your recommendations asap.

    Is there no suggestions to check the V$LOCKS and V$BLOCKERS? I am suspecting that there is some process locking the resources
    hence all the programs goes slow. But it does not affect those users who are doing transactional entry (data encoding).

    How about a defective hardware? like network cards, disk controllers, etc. Is there a command in linux that show hardware error?


    Edited by: yxes2013 on Jan 9, 2013 3:20 PM
  • 37. Re: Performance Issue
    yxes2013 Newbie
    Currently Being Moderated
    Hi All,

    Why is the redo offlining?

    I got this in the dump trace.
    Dump file /u02/oracle/oaproddb/9.2.0/admin/OAPROD_oel5/bdump/oaprod_smon_18781.trc
    Oracle9i Enterprise Edition Release 9.2.0.6.0 - Production
    With the Partitioning, OLAP and Oracle Data Mining options
    JServer Release 9.2.0.6.0 - Production
    ORACLE_HOME = /u02/oracle/oaproddb/9.2.0
    System name:    Linux
    Node name:      oel5.unitrade.local
    Release:        2.6.18-238.el5xen
    Version:        #1 SMP Tue Jan 4 16:18:33 EST 2011
    Machine:        i686
    Instance name: OAPROD
    Redo thread mounted by this instance: 1
    Oracle process number: 7
    Unix process pid: 18781, image: oracle@oel5.unitrade.local (SMON)
    
    *** 2013-01-09 18:23:47.489
    *** SESSION ID:(6.1) 2013-01-09 18:23:47.433
    SMON offlining US=31
    SMON offlining US=32
    SMON offlining US=33
    SMON offlining US=34
    SMON offlining US=35
    SMON offlining US=36
    SMON offlining US=37
    SMON offlining US=38
    SMON offlining US=39
    SMON offlining US=40
    SMON offlining US=41
    SMON offlining US=42
    SMON offlining US=43
    SMON offlining US=44
    SMON offlining US=45
    SMON offlining US=46
    SMON offlining US=47
    *** 2013-01-09 18:24:22.405
    SMON: found rollback segment # 33 at address a2b919c8 to take offline.
    SMON: found rollback segment # 36 at address a2b91b78 to take offline.
    *** 2013-01-10 06:24:47.017
    SMON offlining US=15
    SMON offlining US=16
    Can you suggest what other logs to look at?

    Thanks a lot
  • 38. Re: Performance Issue
    Justin Cave Oracle ACE
    Currently Being Moderated
    yxes2013 wrote:
    Is there no suggestions to check the V$LOCKS and V$BLOCKERS? I am suspecting that there is some process locking the resources
    hence all the programs goes slow. But it does not affect those users who are doing transactional entry (data encoding).
    Why do you suspect that? The AWR reports that you have posted don't indicate that any meaningful amount of time is being spent waiting for a lock to be released.
    How about a defective hardware? like network cards, disk controllers, etc. Is there a command in linux that show hardware error?
    Again, why do you suspect that? It seems unlikely that hardware would be defective sometimes and work correctly at other times.

    In general, speculating about possible causes of performance problems without data is not particularly helpful. It is much more useful to look at things like AWR reports that tell you where your system is spending time and to derive conclusions. If you are seeing some data that leads you to believe that either locking or defective hardware is the problem, by all means share that data with us. Otherwise, though, you're far, far more likely to spend time chasing pointless guesses rather than addressing the real source of the problem.

    Justin
  • 39. Re: Performance Issue
    jgarry Guru
    Currently Being Moderated
    See MOS      Smon Offlining Us=13 [ID 726877.1]

    This is expected behavior based on the last 12 hours of transaction activity.
  • 40. Re: Performance Issue
    yxes2013 Newbie
    Currently Being Moderated
    Thanks,

    So nothing to worry about it?

    How about this:
    Errors in file /u02/oracle/oaproddb/9.2.0/admin/OAPROD_oel5/udump/oaprod_ora_23797.trc:
    ORA-00600: internal error code, arguments: [504], [0x50042E0C], [640], [7], [shared pool], [1], [0], [0x50042ED4]
    Fri Nov 11 11:53:51 2011
    Errors in file /u02/oracle/oaproddb/9.2.0/admin/OAPROD_oel5/udump/oaprod_ora_23797.trc:
    ORA-00600: internal error code, arguments: [504], [0x50042E0C], [640], [7], [shared pool], [1], [0], [0x50042ED4]
    Fri Nov 11 11:54:15 2011
    Thanks,
  • 41. Re: Performance Issue
    jgarry Guru
    Currently Being Moderated
    >

    >
    How about a defective hardware? like network cards, disk controllers, etc. Is there a command in linux that show hardware error?
    I dunno, does dmesg tell anything? Google for show linux hardware errors, you need to be familiar with /var/log
  • 42. Re: Performance Issue
    yxes2013 Newbie
    Currently Being Moderated
    Hi all,

    I hate this error so much :(

    ORA-1652: unable to extend temp segment by 16 in tablespace TEMP
    ORA-1654: unable to extend index MRP.MRP_ATP_SCHEDULE_TEMP_N2 by 16 in tablespace             APPS_TS_TX_IDX
    ORA-1654: unable to extend index MRP.MRP_ATP_SCHEDULE_TEMP_N2 by 16 in tablespace             APPS_TS_TX_IDX
    ORA-1654: unable to extend index MRP.MRP_ATP_SCHEDULE_TEMP_N2 by 16 in tablespace             APPS_TS_TX_IDX
    ORA-1654: unable to extend index MRP.MRP_ATP_SCHEDULE_TEMP_N2 by 16 in tablespace             APPS_TS_TX_IDX
    ORA-1654: unable to extend index MRP.MRP_ATP_SCHEDULE_TEMP_N2 by 16 in tablespace             APPS_TS_TX_IDX
    ORA-1654: unable to extend index MRP.MRP_ATP_SCHEDULE_TEMP_N2 by 16 in tablespace             APPS_TS_TX_IDX
    ORA-1654: unable to extend index MRP.MRP_ATP_SCHEDULE_TEMP_N2 by 16 in tablespace             APPS_TS_TX_IDX
    ORA-1654: unable to extend index MRP.MRP_ATP_SCHEDULE_TEMP_N2 by 16 in tablespace             APPS_TS_TX_IDX
    ORA-1654: unable to extend index MRP.MRP_ATP_SCHEDULE_TEMP_N2 by 16 in tablespace             APPS_TS_TX_IDX
    ORA-1654: unable to extend index MRP.MRP_ATP_SCHEDULE_TEMP_N2 by 16 in tablespace             APPS_TS_TX_IDX
    ORA-1654: unable to extend index MRP.MRP_ATP_SCHEDULE_TEMP_N2 by 16 in tablespace             APPS_TS_TX_IDX
    ORA-1654: unable to extend index MRP.MRP_ATP_SCHEDULE_TEMP_N2 by 16 in tablespace             APPS_TS_TX_IDX
    ORA-1654: unable to extend index MRP.MRP_ATP_SCHEDULE_TEMP_N2 by 16 in tablespace             APPS_TS_TX_IDX
    ORA-1654: unable to extend index MRP.MRP_ATP_SCHEDULE_TEMP_N2 by 16 in tablespace             APPS_TS_TX_IDX
    Even if I set the datafile autoextend on, this is still occurring? So I keep and adding datafile manually.

    Edited by: yxes2013 on Jan 9, 2013 3:55 PM
  • 43. Re: Performance Issue
    jgarry Guru
    Currently Being Moderated
    You might get some performance problems if you don't have an mrp index you need. I don't know about your system, but I've seen MRP systems do strange things. Are you sure it isn't just a coalesce issue? See MOS      TROUBLESHOOTING GUIDE (TSG) - UNABLE TO CREATE / EXTEND Errors [ID 1025288.6]
  • 44. Re: Performance Issue
    jgarry Guru
    Currently Being Moderated
    yxes2013 wrote:
    Thanks,

    So nothing to worry about it?

    How about this:
    Errors in file /u02/oracle/oaproddb/9.2.0/admin/OAPROD_oel5/udump/oaprod_ora_23797.trc:
    ORA-00600: internal error code, arguments: [504], [0x50042E0C], [640], [7], [shared pool], [1], [0], [0x50042ED4]
    Fri Nov 11 11:53:51 2011
    Errors in file /u02/oracle/oaproddb/9.2.0/admin/OAPROD_oel5/udump/oaprod_ora_23797.trc:
    ORA-00600: internal error code, arguments: [504], [0x50042E0C], [640], [7], [shared pool], [1], [0], [0x50042ED4]
    Fri Nov 11 11:54:15 2011
    Thanks,
    See MOS for ora-600 tool, resolving shared pool issues, and generally ask Oracle support about ora-600, and this is why you want to be on something supported in this century.

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points