6 Replies Latest reply: Mar 28, 2014 4:23 PM by orausern

    A scheduler job getting hung causing critical issues

    orausern

      Hi Experts,

       

      We are on Oracle 11.2.0.2 on Linux. We have some jobs that are scheduled to run at night, such as at 1 am. These jobs copy data from some of the main OLTP tables into archive tables in the same schema, and the copy SQL statements use the PARALLEL hint. Most of the time the job finishes in 1 to 2 minutes, but yesterday it hung for more than 5 hours. It caused an outage because the tables got locked, and we had to stop the job manually to fix it. This issue has happened twice so far: once in a less critical environment and once in production. Today the same job finished in 20 seconds. Could this be due to a bug in the Scheduler?


      I will be thankful for your inputs.

      OrauserN

        • 1. Re: A scheduler job getting hung causing critical issues
          GregV

          Hi,

           

           

          If it usually works then it's not likely it's a Scheduler bug.

          Have you checked the run details of your job to see when it actually started?

           

          SELECT log_date, job_name, status, actual_start_date, run_duration, additional_info
          FROM DBA_SCHEDULER_JOB_RUN_DETAILS
          WHERE job_name = '&your_job'
          ORDER BY log_date DESC;

           

          If it ran at the expected time then the problem is probably in your copy procedure.

          • 2. Re: A scheduler job getting hung causing critical issues
            orausern

            Thank you GregV. It did start at the expected time, but it hung. It completed in 20 seconds today but ran for more than 7 hours yesterday, causing an outage. We cannot find any obvious reason and are at a loss to understand what happened. Please see the timings below:

             

            LOG_DATE                            | JOB_NAME       | STATUS    | ACTUAL_START_DATE                  | OWNER | RUN_DURATION        | ADDITIONAL_INFO
            3/28/2014 3:00:37.631963 AM -04:00  | scheduled_job1 | SUCCEEDED | 3/28/2014 2:00:00.479827 AM -05:00 | SCOTT | +00 00:00:20.000000 |
            3/27/2014 10:32:51.039442 AM -04:00 | scheduled_job1 | STOPPED   | 3/27/2014 2:00:00.437910 AM -05:00 | SCOTT | +00 07:32:51.000000 | REASON="Stop job called by user: 'SYS'"
            3/26/2014 3:00:29.910672 AM -04:00  | scheduled_job1 | SUCCEEDED | 3/26/2014 2:00:00.236187 AM -05:00 | SCOTT | +00 00:00:30.000000 |
            3/25/2014 3:00:31.671558 AM -04:00  | scheduled_job1 | SUCCEEDED | 3/25/2014 2:00:00.166233 AM -05:00 | SCOTT | +00 00:00:31.000000 |
            3/24/2014 3:00:28.187225 AM -04:00  | scheduled_job1 | SUCCEEDED | 3/24/2014 2:00:01.056921 AM -05:00 | SCOTT | +00 00:00:27.000000 |
            3/23/2014 3:00:28.808667 AM -04:00  | scheduled_job1 | SUCCEEDED | 3/23/2014 2:00:00.933849 AM -05:00 | SCOTT | +00 00:00:28.000000 |
            3/22/2014 3:00:24.646177 AM -04:00  | scheduled_job1 | SUCCEEDED | 3/22/2014 2:00:00.842851 AM -05:00 | SCOTT | +00 00:00:24.000000 |
            3/21/2014 3:00:22.093396 AM -04:00  | scheduled_job1 | SUCCEEDED | 3/21/2014 2:00:00.652885 AM -05:00 | SCOTT | +00 00:00:21.000000 |
            • 3. Re: A scheduler job getting hung causing critical issues
              GregV

              There can be several reasons for this unusual duration.

              Were there other heavy processes running at the same time that night? Were there some locks?

               

              If you have the license, you can run an AWR report for 27 March between 2 am and 8 am to see what was going on.
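
              To pick the snapshots for such a report, a query along these lines may help. It is only a sketch: the time window is taken from the suggestion above and is an assumption, and it may need widening or shifting, since the run details show the job being stopped at 10:32 AM -04:00.

              ```sql
              -- List the AWR snapshots that overlap the window of the stuck run.
              -- The window (27 March 2014, 02:00 to 08:00) is an assumption; adjust it
              -- to the database time zone and to the actual duration of the job.
              SELECT snap_id,
                     dbid,
                     instance_number,
                     begin_interval_time,
                     end_interval_time
              FROM   dba_hist_snapshot
              WHERE  end_interval_time   >= TIMESTAMP '2014-03-27 02:00:00'
              AND    begin_interval_time <= TIMESTAMP '2014-03-27 08:00:00'
              ORDER  BY instance_number, snap_id;
              ```

              The lowest and highest SNAP_ID returned can then be entered as the begin and end snapshots when running the standard report script, $ORACLE_HOME/rdbms/admin/awrrpt.sql, from SQL*Plus.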

              • 4. Re: A scheduler job getting hung causing critical issues
                orausern

                Thank you again GregV. Can you help me with more details? How can we find out about locks from AWR? So far I have used AWR mainly for top SQL and the like. Actually it was this session that was causing the locks: it caused nearly 300 other sessions to hang. We are not sure whether this session itself got stuck because of some other activity. Since it is all in the past now, how do we go about finding that out? What in AWR can help us figure it out?

                • 5. Re: A scheduler job getting hung causing critical issues
                  GregV

                  AWR will not help with the lock, but it will show you what other processes were running at that time. Perhaps one of them was slowing down or blocking your job. Check the AWR report for anything that was not supposed to run or that took unusually long.
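
                  As a possible complement to the report itself, the Active Session History samples kept alongside the AWR snapshots record a BLOCKING_SESSION for sampled waiting sessions. The sketch below assumes the same license, that the AWR retention period still covers 27 March, and a time window that is only a guess based on the run details. The history view keeps only a subset of samples (roughly one every 10 seconds), so short waits may not appear.

                  ```sql
                  -- Sampled sessions that were blocked during the window, and who blocked them.
                  -- The window is an assumption; adjust it to the database time zone.
                  SELECT sample_time,
                         session_id,
                         session_serial#,
                         sql_id,
                         event,
                         blocking_session,
                         blocking_session_serial#
                  FROM   dba_hist_active_sess_history
                  WHERE  sample_time BETWEEN TIMESTAMP '2014-03-27 02:00:00'
                                         AND TIMESTAMP '2014-03-27 11:00:00'
                  AND    blocking_session IS NOT NULL
                  ORDER  BY sample_time;
                  ```

                  Rows where SESSION_ID is the job's own session would suggest the job itself was waiting on someone else; rows where BLOCKING_SESSION is the job's session would correspond to the sessions it was holding up.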

                  Unfortunately, if you don't have any logging in the procedure called by your Scheduler job, it will be hard to get more information about the problem.
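
                  A minimal sketch of what such logging could look like is shown here. The table JOB_STEP_LOG and the procedure LOG_STEP are hypothetical names invented for illustration; they are not part of the poster's schema. The autonomous transaction lets each log row be committed without committing or rolling back the copy itself.

                  ```sql
                  -- Hypothetical log table; name and columns are illustrative only.
                  CREATE TABLE job_step_log (
                    logged_at  TIMESTAMP WITH TIME ZONE DEFAULT SYSTIMESTAMP,
                    job_name   VARCHAR2(128),
                    step       VARCHAR2(200)
                  );

                  -- Hypothetical helper: writes one log row in its own transaction.
                  CREATE OR REPLACE PROCEDURE log_step (
                    p_job_name IN VARCHAR2,
                    p_step     IN VARCHAR2
                  ) AS
                    PRAGMA AUTONOMOUS_TRANSACTION;  -- commit the log row independently of the caller
                  BEGIN
                    INSERT INTO job_step_log (job_name, step)
                    VALUES (p_job_name, p_step);
                    COMMIT;
                  END log_step;
                  /

                  -- Example use around an existing copy step (the copy itself is elided).
                  BEGIN
                    log_step('scheduled_job1', 'copy started');
                    -- existing copy statements go here
                    log_step('scheduled_job1', 'copy finished');
                  END;
                  /
                  ```

                  With calls like these around each copy statement, the last row written before a hang shows which step the job had reached and when.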

                  • 6. Re: A scheduler job getting hung causing critical issues
                    orausern


                    Thanks a lot GregV. This completely answers the questions I had. Thanks a lot for the excellent help!