I know that SQL trace is not enabled by default. These jobs are run by developers, and they don't set SQL trace while running any query.
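(For reference, if we wanted a trace for a future run, a session could be traced with DBMS_MONITOR. A minimal sketch; the SID and serial# values are placeholders for the job's session:

-- Enable SQL trace (with wait events) for one session; 1234/56789 are placeholders.
EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(session_id => 1234, serial_num => 56789, waits => TRUE, binds => FALSE);
-- ... let the job run, then turn it off ...
EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE(session_id => 1234, serial_num => 56789);
)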
You asked the question below.
>Now how can I find out the exact root cause for these long running issues.
If you choose to ignore my response, then you are still left with the question above unanswered.
The choice is yours & I don't care which you choose.
Unless & until you know where the time is actually being spent, you have only a random chance of guessing what needs to change to reduce the elapsed time.
Was it raining?
No joke here.
Many decades ago, rain actually crashed my production VAX/VMS system.
It rained so hard that it disrupted the microwave link carrying the DECnet traffic, and VMS crashed due to a bug in the network transport layer.
I've seen several cases of unprotected cabling run outdoors being affected by lightning. In one case, I had told them about the previous incident, where the I/O multiplexer of an 11/70 got blown out. In another, gung-ho Marines ran an Ethernet cable across a field between buildings, just like they did in Iraq; the field became a swamp in the rain.
But of course, the OP's problem is that there may be no way to know the exact root cause unless something recorded what was going on during the problem time, such as AWR or Statspack. It could be something as silly as a previous run that never completed, as obscure as someone's process having died and left a row locked, or as obvious as what Top.Gun suggested (I've missed the latter when just looking at top would have shown a runaway process, duh).
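The "row left locked" case, at least, is easy to check for while the job is stuck. A minimal sketch, assuming you can query v$session:

-- Sessions currently blocked by someone else's lock, and who holds it.
SELECT sid, serial#, username, event, blocking_session, seconds_in_wait
FROM   v$session
WHERE  blocking_session IS NOT NULL;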
"1. sql loader job, this is used to load data to table from a falt file, it usually run only for 2 hours, but it ran for 11+ hours"
There should be a log file for this SQL*Loader job/script. You can actually see how much data got loaded (or how far it has gotten). Status can be found by querying the target table or by looking at the log file.
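For instance, while the load is still running you could poll the target table. A sketch; MY_TARGET_TABLE stands in for whatever table the control file loads into:

-- Rough progress check: row count in the table the loader writes to.
-- (Compare against the expected record count from the flat file / log file.)
SELECT COUNT(*) FROM my_target_table;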
There are two issues here, and they may be related; investigate at the server level as well as the DB level.
Two scenarios: 1. SQL Loader 2. Stats job.
Let's look at each issue one by one.
1. SQL Loader: First of all, look for a change in data volume. It is possible that the data volume has changed and you are unaware of it. Find that out first.
Next, investigate at the I/O level, from the storage down to the data files; that could be the culprit. Take the help of ASH and ADDM, and of server-level tools like sar, vmstat, ps, top, etc. (see the ASH sketch after this list).
2. Stats job: First of all, check Active Session History (ASH) to see what the reason was.
It is possible that other jobs were running at that time and consumed your database's resources. Again, take the help of ASH first, and then dig deeper with the dba_hist_* views.
It is also possible that the server was overloaded at that time. Find out from server-level commands and from ASH.
Also check /var/log/messages, and look at everything from hardware latency up to the table level.
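For both scenarios, a starting ASH query might look like this. A sketch only: it assumes the Diagnostics Pack is licensed, and the timestamps are placeholders to be replaced with the problem run's window:

-- Top SQL and wait events during the problem window, from ASH.
SELECT sql_id, event, COUNT(*) AS samples
FROM   v$active_session_history
WHERE  sample_time BETWEEN TIMESTAMP '2015-01-01 02:00:00'
                       AND TIMESTAMP '2015-01-01 13:00:00'
GROUP  BY sql_id, event
ORDER  BY samples DESC;
-- For windows that have already aged out of memory, run the same
-- query against dba_hist_active_sess_history instead.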
The methods above should give you a clear picture of what the real cause was.
Shivendra Narain Nirala
But I don't have much knowledge of AWR.
How can I narrow down the root cause through an AWR report?
Well, in that case, please follow the links below:
Or, if you have Oracle Support credentials, then you can also find docs related to it.
Shivendra Narain Nirala
>How can I narrow down the root cause through an AWR report?
Compare the AWR for a "good" run window with the AWR for a "bad/poor" run window.
Compare the Reads and Elapsed Time statistics for SQL statements.
Then you can use dbms_xplan.display_awr to get the execution plans and see if they changed and how they changed.
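A minimal sketch of both steps; the sql_id below is a placeholder, and the script path assumes a standard installation:

-- Compare a "good" window with a "bad" window (AWR Compare Periods report);
-- the script prompts for the two snapshot ranges:
-- SQL> @?/rdbms/admin/awrddrpt.sql

-- Then pull the execution plan(s) captured in AWR for a suspect statement:
SELECT * FROM TABLE(dbms_xplan.display_awr('abcd1234efgh5'));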
Alternatively, the performance variance could be the result of some other change or of concurrent load on the database / server / storage.
Hemant K Chitale