1 reply. Latest reply: Jun 16, 2011 2:49 PM by PktAces

    DB outage repeatedly with high load average and swap

    852524
      Hi,

      Recently one of our production databases had an outage three days in a row, with extremely high memory consumption and load average.
      Rebooting the server manually fixed the issue temporarily.



      4312/6068 WRK:BSSVPD_084708A0_DConnector Thu Jun 02 10:39:47.714001 dbperfrq.c477
      OCI0000179 - Error - ORA-03114: not connected to ORACLE

      4312/6068 WRK:BSSVPD_084708A0_DConnector Thu Jun 02 10:39:47.714002 Jdb_drvm.c1128
      JDB9900401 - Failed to execute db request
      4312/6068 WRK:BSSVPD_084708A0_DConnector Thu Jun 02 10:39:47.714003 Jtp_cm.c1347
      JDB9900255 - Database connection to F00941 (System - 812) has been lost.
      4312/6068 WRK:BSSVPD_084708A0_DConnector Thu Jun 02 10:39:47.714004 Jtp_cm.c1301
      JDB9900256 - Database connection to (System - 812) has been re-established.
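To line these application-side errors up with what the database logged, the jde.log lines can be paired into (timestamp, source location, error code, message) records. This is a minimal Python sketch, not a JDE tool. It assumes every header line has the layout shown in the excerpt (`PID/TID WRK:<worker> <weekday> <month> <day> <time> <file.cLINE>`) and is followed by a message line that starts with a code such as `OCI0000179`. The function name `parse_jde_log` is hypothetical, and because the log carries no year, 2011 is assumed.

```python
import re
from datetime import datetime

# Header layout as seen in the excerpt (an assumption, not a documented JDE format):
#   4312/6068 WRK:<worker> Thu Jun 02 10:39:47.714001 dbperfrq.c477
HEADER = re.compile(
    r"^(?P<pid>\d+)/(?P<tid>\d+)\s+WRK:(?P<worker>\S+)\s+"
    r"(?P<ts>\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+(?P<src>\S+)$"
)
# Message lines start with a code such as OCI0000179 or JDB9900255.
CODE = re.compile(r"^(?P<code>[A-Z]{3}\d{7})\s+-\s+(?P<text>.*)$")


def parse_jde_log(text, year=2011):
    """Return (datetime, source, code, message) tuples from jde.log text.

    The year is not present in the log lines, so it has to be supplied.
    """
    records = []
    pending = None  # (timestamp, source) of the most recent header line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = HEADER.match(line)
        if m:
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %a %b %d %H:%M:%S.%f")
            pending = (ts, m["src"])
            continue
        m = CODE.match(line)
        if m and pending is not None:
            records.append((pending[0], pending[1], m["code"], m["text"]))
            pending = None
    return records


excerpt = """
4312/6068 WRK:BSSVPD_084708A0_DConnector Thu Jun 02 10:39:47.714001 dbperfrq.c477
OCI0000179 - Error - ORA-03114: not connected to ORACLE
4312/6068 WRK:BSSVPD_084708A0_DConnector Thu Jun 02 10:39:47.714003 Jtp_cm.c1347
JDB9900255 - Database connection to F00941 (System - 812) has been lost.
"""
for ts, src, code, msg in parse_jde_log(excerpt):
    print(ts.isoformat(), src, code, msg)
```

Run over the full excerpt above, this yields the codes OCI0000179, JDB9900401, JDB9900255 and JDB9900256, all stamped 10:39:47 on Jun 02, so they can be sorted together with alert.log entries.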



      After we restarted the app and web servers, the error no longer occurred, so the issue was related to the application. However, I noticed some interesting things on the database:

      Load average was around 13;
      Swap usage was high;
      Many sessions were waiting on the “i/o slave wait” event;
      System wait was around 30%.


      After the app servers were restarted, the database looked the same, but the application was working. So the root cause this time was not the database, but I still think something is wrong on the database side.

      I killed all JDEUSER sessions waiting on “i/o slave wait” at the OS level, and the load average dropped to 0.9.

      I also found an ORA-600 in the alert.log:
      Jun 02 10:37:06 CDT 2011
      Errors in file /u01/app/oracle/admin/jdeprod/udump/jdeprod_ora_13420.trc:
      ORA-00600: internal error code, arguments: [12333], [7], [20], [0], [], [], [], []
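The ORA-600 was logged at 10:37:06, shortly before the ORA-03114 disconnect that jde.log records at 10:39:47. A quick Python check of the gap, assuming both servers' clocks agree and use the same time zone (the jde.log line does not state one) and assuming the year 2011 for the jde.log stamp. Closeness in time alone does not prove the two events are related.

```python
from datetime import datetime

# alert.log stamp with the zone name dropped: strptime's %Z does not reliably accept "CDT".
ora600 = datetime.strptime("Jun 02 10:37:06 2011", "%b %d %H:%M:%S %Y")
# jde.log stamp; the year is not in the log and is assumed to be 2011.
disconnect = datetime.strptime("2011 Thu Jun 02 10:39:47.714001", "%Y %a %b %d %H:%M:%S.%f")

gap = disconnect - ora600
print(f"ORA-03114 logged {gap.total_seconds():.1f} s after the ORA-600")
```

The difference comes to about 2 min 42 s (161.7 s).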
      =======================================================

      The RCA from the server team:

      12:00:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
      03:50:01 PM    367716   7827956     95.51    563812   2599880   1462444    633996     30.24    287248
      04:00:01 PM    431260   7764412     94.74    563900   2605568   1468020    628420     29.98    281724
      04:10:01 PM    131692   8063980     98.39    564016   2608012   1468592    627848     29.95    281352
      04:20:01 PM     82264   8113408     99.00    564100   2447888   1309820    786620     37.52    426156
      04:30:01 PM     66100   8129572     99.19    564180   2304184   1148188    948252     45.23    379012
      04:40:01 PM     90668   8105004     98.89    564316   2173488    970752   1125688     53.70    431488
      04:50:01 PM     21268   8174404     99.74    564472   2099072    779104   1317336     62.84    371484
      05:00:01 PM     21212   8174460     99.74    564576   2047020    615240   1481200     70.65    365060
      05:10:02 PM     29184   8166488     99.64    564692   1977860    429964   1666476     79.49    382568
      05:20:01 PM     42808   8152864     99.48    564784   1845364    275564   1820876     86.86    402448
      05:30:02 PM     43612   8152060     99.47    564848   1886040    327448   1768992     84.38    354360
      05:40:01 PM     86024   8109648     98.95    564924   1785620    113432   1983008     94.59    369176
      05:50:01 PM      7712   8187960     99.91    565052   1689948         0   2096440    100.00    309796
      06:00:03 PM      6648   8189024     99.92    565144   1678152         0   2096440    100.00    295364
      06:10:01 PM      7724   8187948     99.91    541200   1611300         0   2096440    100.00    304304
      06:20:04 PM     10584   8185088     99.87    383288   1586604         4   2096436    100.00    300632
      06:30:07 PM     10604   8185068     99.87    145044   1597684         0   2096440    100.00    295972
      06:40:01 PM      7556   8188116     99.91    129016   1494016         8   2096432    100.00    320080
      06:50:19 PM      7492   8188180     99.91      9624   1284268        16   2096424    100.00    287952
      07:00:25 PM      7328   8188344     99.91      2900   1132864         0   2096440    100.00    173928
      07:10:21 PM      6076   8189596     99.93      1900   1073344         0   2096440    100.00    137616
      07:20:03 PM      9224   8186448     99.89      2372   1075284         0   2096440    100.00    137708
      07:32:19 PM      8444   8187228     99.90      3792   1087332         0   2096440    100.00    137660
      Average:       287111   7908561     96.50    520113   2604597   1371749    724691     34.57    298045
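The sar figures show free swap shrinking from 1462444 KB at 03:50 PM to 0 at 05:50:01 PM, with %memused around 99% from 04:20 PM on. For longer captures it is easier to let a script find that point. A minimal Python sketch, assuming output in the whitespace-separated `sar -r` layout shown here (12-hour timestamps, a header line starting with the start time, a trailing `Average:` row); the function names `parse_sar_r` and `first_swap_exhausted` are hypothetical.

```python
def parse_sar_r(text):
    """Parse `sar -r` style output into one dict per sample.

    Assumes whitespace-separated columns, a header line whose first two
    tokens are the start time (e.g. '12:00:01 AM'), 12-hour timestamps,
    and a trailing 'Average:' row, which is skipped.
    """
    lines = [line.split() for line in text.strip().splitlines()]
    columns = lines[0][2:]  # drop the '12:00:01', 'AM' tokens of the header
    rows = []
    for tokens in lines[1:]:
        if not tokens or tokens[0] == "Average:":
            continue
        row = {"time": " ".join(tokens[:2])}  # e.g. '05:50:01 PM'
        row.update(zip(columns, map(float, tokens[2:])))
        rows.append(row)
    return rows


def first_swap_exhausted(rows):
    """Return the time of the first sample with %swpused at 100, or None."""
    for row in rows:
        if row["%swpused"] >= 100.0:
            return row["time"]
    return None


sample = """\
12:00:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
05:40:01 PM 86024 8109648 98.95 564924 1785620 113432 1983008 94.59 369176
05:50:01 PM 7712 8187960 99.91 565052 1689948 0 2096440 100.00 309796
Average: 287111 7908561 96.50 520113 2604597 1371749 724691 34.57 298045
"""
rows = parse_sar_r(sample)
print(first_swap_exhausted(rows))                    # 05:50:01 PM
print(rows[0]["kbmemfree"] + rows[0]["kbmemused"])   # 8195672.0 (total RAM seen by sar, KB)
print(rows[0]["kbswpfree"] + rows[0]["kbswpused"])   # 2096440.0 (total swap, KB)
```

Across every row of the table, kbmemfree + kbmemused comes to 8195672 KB and kbswpfree + kbswpused to 2096440 KB, so the server had roughly 7.8 GiB of usable RAM and 2 GiB of swap. Over the same window the kernel also gave up most of its buffer cache (kbbuffers fell from about 565000 KB to under 4000 KB).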

      Jun 1 19:15:09 ausdfsjdedb01 kernel: Swap cache: add 27359401, delete 27325008, find 142293993/147426876, race 0+34
      Jun 1 19:15:09 ausdfsjdedb01 kernel: 53517 pages of slabcache
      Jun 1 19:15:10 ausdfsjdedb01 kernel: 4470 pages of kernel stacks
      Jun 1 19:15:10 ausdfsjdedb01 kernel: 50255 lowmem pagetables, 400677 highmem pagetables
      Jun 1 19:15:11 ausdfsjdedb01 kernel: 32 bounce buffer pages, 32 are on the emergency list
      Jun 1 19:15:11 ausdfsjdedb01 kernel: Free swap: 0kB
      Jun 1 19:15:11 ausdfsjdedb01 kernel: 2359296 pages of RAM
      Jun 1 19:15:11 ausdfsjdedb01 kernel: 1310718 pages of HIGHMEM
      Jun 1 19:15:11 ausdfsjdedb01 kernel: 310378 reserved pages
      Jun 1 19:15:11 ausdfsjdedb01 kernel: 2032077 pages shared
      Jun 1 19:15:11 ausdfsjdedb01 kernel: 34430 pages swap cached
      Jun 1 19:15:11 ausdfsjdedb01 kernel: Out of Memory: Killed process 8015 (oracle).
      Jun 1 19:16:11 ausdfsjdedb01 kernel: Mem-info:
      Jun 1 19:16:11 ausdfsjdedb01 kernel: Zone:DMA freepages: 26 min: 0 low: 0 high: 0
      Jun 1 19:16:11 ausdfsjdedb01 kernel: Zone:Normal freepages: 2106 min: 1279 low: 16704 high: 24544
      Jun 1 19:16:11 ausdfsjdedb01 kernel: Zone:HighMem freepages: 126 min: 255 low: 21120 high: 31680
      Jun 1 19:16:11 ausdfsjdedb01 kernel: Free pages: 2258 ( 126 HighMem)
      Jun 1 19:16:11 ausdfsjdedb01 kernel: ( Active: 941883/1284, inactive_laundry: 341, inactive_clean: 697, free: 2258 )
      Jun 1 19:16:11 ausdfsjdedb01 kernel: aa:0 ac:0 id:0 il:0 ic:0 fr:26
      Jun 1 19:16:11 ausdfsjdedb01 kernel: aa:603357 ac:6480 id:717 il:340 ic:697 fr:2106
      Jun 1 19:16:11 ausdfsjdedb01 kernel: aa:328968 ac:3033 id:613 il:0 ic:0 fr:126
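The OOM report counts memory in pages rather than KB. Converting it, assuming 4 KiB pages (the DMA/Normal/HighMem zones suggest a 32-bit x86 kernel, where that is the base page size), reconciles it with the sar totals and shows how much memory the page tables were using. Treating the "lowmem pagetables" and "highmem pagetables" counters as page counts is an assumption, not something the log states.

```python
PAGE_KIB = 4  # assumed base page size in KiB (32-bit x86)

# Counters copied from the OOM report above (assumed to be page counts).
RAM_PAGES = 2359296
RESERVED_PAGES = 310378
LOWMEM_PAGETABLES = 50255
HIGHMEM_PAGETABLES = 400677

usable_kib = (RAM_PAGES - RESERVED_PAGES) * PAGE_KIB
pagetable_kib = (LOWMEM_PAGETABLES + HIGHMEM_PAGETABLES) * PAGE_KIB

print(f"usable RAM:  {usable_kib} KiB")  # same figure as kbmemfree + kbmemused in sar
print(f"page tables: {pagetable_kib} KiB "
      f"({pagetable_kib / 1024**2:.2f} GiB, "
      f"{100 * pagetable_kib / usable_kib:.1f}% of usable RAM)")
```

This gives 8195672 KiB of usable RAM, exactly the kbmemfree + kbmemused total in the sar table, which supports the 4 KiB assumption, and about 1.72 GiB (roughly 22% of usable RAM) held in page tables at the time of the OOM kill. One possible reading, not confirmed in the thread, is that many Oracle processes were each mapping the SGA through small pages.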

      ======================================================================
      But now the issue is back: the database hangs during login.
      Any help would be appreciated. Thanks in advance.

      Edited by: Kryp on Jun 3, 2011 7:10 AM
        • 1. Re: DB outage repeatedly with high load average and swap
          PktAces
          You don't mention which OS (or OS version) you are on, nor the database version.

          On our AIX 5.3 / Enterprise Edition 10.2.0.2 systems, I've seen issues with the SGA being swapped out when the server parameters weren't set up to lock the SGA into large pages of memory so that it can't be swapped out.

          Otherwise, it looks like the database needs more memory allocated to the SGA.
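The suggestion above is about AIX. The original poster's kernel log suggests a Linux server, where a comparable approach would be backing the SGA with large (huge) pages, which are not swapped out and need far fewer page-table entries per process than 4 KiB pages. The Python sketch below is a back-of-the-envelope comparison only. The SGA size, process count, page sizes and 8-byte entry size are all assumptions for illustration, the function names are hypothetical, and the kernel settings needed to reserve large pages on that system are not covered.

```python
import math

PTE_BYTES = 8                 # assumed: 64-bit page-table entries (x86 PAE or x86-64)
SMALL_PAGE = 4 * 1024         # assumed base page size: 4 KiB
LARGE_PAGE = 2 * 1024 * 1024  # assumed large page size: 2 MiB


def pagetable_bytes(sga_bytes, page_size, processes):
    """Upper-bound page-table cost of `processes` processes each mapping the whole SGA.

    Assumes every process touches every SGA page, one entry per page,
    no sharing of page tables, and ignores allocation granularity.
    """
    entries_per_process = math.ceil(sga_bytes / page_size)
    return entries_per_process * PTE_BYTES * processes


def large_pages_needed(sga_bytes, page_size=LARGE_PAGE):
    """Number of large pages to reserve so the whole SGA fits in them."""
    return math.ceil(sga_bytes / page_size)


sga = 1536 * 1024**2  # hypothetical 1.5 GiB SGA
procs = 300           # hypothetical number of dedicated server processes

print(pagetable_bytes(sga, SMALL_PAGE, procs) / 1024**2, "MiB of page tables with 4 KiB pages")
print(pagetable_bytes(sga, LARGE_PAGE, procs) / 1024**2, "MiB of page tables with 2 MiB pages")
print(large_pages_needed(sga), "large pages to reserve")
```

With these made-up numbers, 300 processes mapping a 1.5 GiB SGA through 4 KiB pages would need about 900 MiB of page tables, versus under 2 MiB with 2 MiB pages, and 768 large pages would have to be reserved. That is the same order of magnitude as the 400677 highmem pagetable pages in the kernel log, although the real process count and SGA size are not given in the thread.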