4 Replies Latest reply: Feb 14, 2014 12:18 AM by 972061 RSS

    DB look like hang, Only sys user can connect to DB but can't query anything

    972061

      Dear all,

       

      On 13/Feb/2014@8.02AM13/Feb/2014@8.02AM, I'm found problem about oracle database (10.2.0.4) on Solaris10(Sparc) can connect only user sys (user system and all can't)    

      sqlplus promp is not immediately return can't connect. It's pending long time and return "connection lost"    

      I'm check on alert_SID.log and not found any error/warning    

          

      Problem Information:    

      1. Not involve about tns configuration or network. Cause of this problem still occur when logon by other user (not sys) at DB Server machine too.   

           and we can connect sys as sysdba from client by tns alias   

      2. After connect to database by sys user. We can't query anything (such as select * from dba_tablespaces is freezing)   

      3. User system wih role DBA can't connect too.   

      4. I'm check udump, bdump, cdump, adump for useful trace information. And not found anything durning this error   

          

      My Action:    

      - Restart listener, It's still problem   

      - Shutdown database by immediate. It's hanging after step "Stopping background process MMNL"   

         so, I'm shutdown abort and startup. Problem is clear   

          

      Thu Feb 13 08:28:16 2014   

      Shutting down instance: further logons disabled   

      Thu Feb 13 08:28:16 2014   

      Stopping background process CJQ0   

      Thu Feb 13 08:28:16 2014   

      Stopping background process QMNC   

      Thu Feb 13 08:28:18 2014   

      Stopping background process MMNL    <-- this is last step show on alert.log

          

      Thu Feb 13 08:30:08 2014   

      Shutting down instance (abort)   

      License high water mark = 2926   

      Instance terminated by USER, pid = 23752   

      Thu Feb 13 08:30:24 2014   

      Starting ORACLE instance (normal)   

          

      Other action before problem.    

      We have resize datafile on 7:49AM with complete state and I think not cause of problem.   

      Thu Feb 13 07:49:12 2014   

      /* OracleOEM */ ALTER DATABASE DATAFILE '/data/appdb/ar_data01.dbf' RESIZE  32767M   

      Thu Feb 13 07:49:43 2014   

      Completed: /* OracleOEM */ ALTER DATABASE DATAFILE '/data/appdb/ar_data01.dbf' RESIZE  32767M   

          

      My investigate   

      1 First I found about this note "No Database User Can Login Except Sys And System because Resource Manager Internal_Quiesce Plan Enabled (Doc ID 396970.1)"   

      But after check scheduler window. It's not enable.   

      I'm already test change Devlop Database to "QUIESCE mode". It's can login only sys. Other user can't login. But not same my situation. Because user sys   

      can query and opeartion normal (my situation user sys is hang after query too)   

          

      2 I'm check statistic of server (Solaris10) resource during hang. It's not peak anything (cpu, mem, iostat is OK)   

      I'm check /var/adm/message, /var/log/syslog. It's not show any information   

          

      Please help me investigate. I'm already open SR to Oracle support. But they ask me very simple missing point such as Tns configuration, connection string.. (-_-).    

          

      Thank you    

        • 1. Re: DB look like hang, Only sys user can connect to DB but can't query anything
          rp0428

          Could be a lot of things. Logs getting full and waiting for the archive process but a remote location not being available.

           

          Post info about the status of the logging and archiving configuration

          • 2. Re: DB look like hang, Only sys user can connect to DB but can't query anything
            972061

            Hi rp0428

             

            On lgwr trace in udump before hang. I found only "Warning: log write time xxx" which is normally happend long time ago by not effect system.

            *** 2014-02-13 06:11:31.274

            Warning: log write time 1890ms, size 5KB

            *** 2014-02-13 06:13:13.772

            Warning: log write time 1810ms, size 9KB

             

            About archive log, I'm check destination and free space of archive log. All of it's ok. We not change anything about config

            SQL> show parameter archive

            NAME                                 TYPE                             VALUE
            ------------------------------------ -------------------------------- ------------------------------
            archive_lag_target                   integer                          0
            log_archive_config                   string
            log_archive_dest                     string
            log_archive_dest_1                   string                           LOCATION=/orac/appdb/archive
            log_archive_dest_state_1             string                           enable
            log_archive_duplex_dest              string
            log_archive_format                   string                           %t_%s_%r.dbf
            log_archive_local_first              boolean                          TRUE
            log_archive_max_processes            integer                          2
            log_archive_min_succeed_dest         integer                          1
            log_archive_start                    boolean                          FALSE
            log_archive_trace                    integer                          0
            remote_archive_enable                string                           true
            standby_archive_dest                 string                           ?/dbs/arch
            However we found some post recommend increase number of log_archive_max_process

             

            More about archive, I have manual shell script force apply archive log every 5 Minute by crontab (by "alter system switch logfile" and "alter system archive log all;")

            This script starting around 1 month ago with no problem.

            Do you thinkg It's possible to 1 time apply 2 statement not complete in 5 minute and next interval try to apply and hang?

            However I think if that should be any alert/warning in alert_SID.log in this case.

             

            Please recommend if need more information:

             

            Thank you

            • 3. Re: DB look like hang, Only sys user can connect to DB but can't query anything
              Suntrupth

              Did you try gathering a systemstate dump at the time of the issue? This might have given you some clue.

               

               

              Did you check if there were any active transactions in your DB prior to the shutdown?

               

               

              "Stopping background process MMNL" - usually are just informational in nature and not the cause of a hang.

               

               

              "I have manual shell script force apply archive log every 5 Minute ( by

              "alter system switch logfile" and "alter system archive log all;")"

               

               

              - Any reason why you force log switches in such small intervals? Oracle recommends to switch logs with about 15-20 minutes intervals.

               

               

              Did you check if AWR snapshots were captured at the time of the issue? If yes, generate a report and check possible issues.

               

               

              Regards,

              Suntrupth

              • 4. Re: DB look like hang, Only sys user can connect to DB but can't query anything
                972061

                Hi Suntrupth,

                 

                1. I'm never know about systemstate dump before. I'm just learning after problem happen and plan to keep this if found again follow by

                (How to Collect Diagnostics for Database Hanging Issues (Doc ID 452358.1)) <-- Still not clear how to, This doc keep dump systemstate by other session. It's can keep current problem?

                 

                2. I can't check anything cause of after login to sys as sysdba at DB Local machine. It's can't query anything. It's freeze after issue statement.

                 

                3. I'm log switch every 5 min cause of I'm need send archive log to DR-Site.

                I'm use concept of DataGuard by create clone of Prod by RMAN and set shell script to manual apply log every 5 minute. Use shell script check new log and rysnc send log to DR-Site destination.

                Use shell script at destination to apply archive log. So I will have DR which is sync with Production not less than 5 minute

                 

                4. About auto hourly AWR Snapshot I'm found it's incomplete record during DB Hang (7.00-8.30AM). But I think cause we have shutdown abort around 8.28AM.

                So, It's can't keep end time. Please see

                 

                Troubleshooting: AWR Snapshot Collection issues (Doc ID 1301503.1)

                select * from DBA_HIST_SNAPSHOT order by snap_id desc;
                select * from WRM$_SNAPSHOT order by 5 desc;
                SNAP_IDINSTANCE_NUMBERBEGIN_INTERVAL_TIMEFLUSH_ELAPSEDSTATUSERROR_COUNTSNAP_FLAG
                40966113/02/2014 05.00.52.248000000 AM+00 00:00:01.800000000
                40967113/02/2014 06.00.53.157000000 AM+00 00:00:01.900000000
                40968113/02/2014 07.00.54.083000000 AM100
                40969113/02/2014 08.30.36.000000000 AM+00 00:00:06.400000000

                 

                select * from wrm$_snap_error where dbid = (select dbid from v$database)  order by snap_id;

                I'm not found record which is prefer show error number.

                 

                I'm don't know about how to generate report from snapshot. (In this case mean I will generate report from snap_id 40968? )

                Please recommend method or article about this.

                 

                Thank you very much

                Best regards,