It’s happened a couple times, so I will document it here for the benefit of the dear DBAs out there.
An impdp job, importing about 20 GBs, has been running for 24 hours. Whatever your hardware, that’s too much time. No error message in the impdp log (which is stuck on SCHEMA_EXPORT/TABLE/INDEX/INDEX), and there is still plenty of free space in the tablespaces being filled by the import. What gives?
So no error in the impdp log, no error in the alert.log, tablespaces have free space, so where to find anymore clues? Let’s find the impdp Oracle sessions and find out what they are up to.
First, in order to identify the Oracle sessions created by datapump, let’s run
SELECT TO_CHAR(sysdate,'DD MON, yyyy hh24:mi:ss') now, STATE,DATAPUMP_SESSIONS,JOB_NAME,OWNER_NAME,OPERATION FROM DBA_DATAPUMP_JOBS WHERE STATE='EXECUTING' ;
NOW STATE DATAPUMP_SESSIONS JOB_NAME OWNER_NAME OPERATION
-------------------------------------- ------------------------------ --------------------------------------- ------------------------------ ------------------------------
15 JUL., 2015 13:37:50 EXECUTING 3 PRODINTOBENCH IMPDPuser IMPORT
The DATAPUMP_SESSIONS column tells us how many sessions Oracle had to create for that datapump job (if you parallelize your import, that number will increase accordingly).
Then let’s SELECT OWNER_NAME , JOB_NAME ,SESSION_TYPE from DBA_DATAPUMP_SESSIONS
OWNER_NAME JOB_NAME SESSION_TYPE
-------------- -------------------- --------------
IMPDPuser PRODINTOBENCH DBMS_DATAPUMP
IMPDPuser PRODINTOBENCH MASTER
IMPDPuser PRODINTOBENCH WORKER
When no datapump jobs are running, a SELECT FROM DBA_DATAPUMP_SESSIONS will return no lines. When one is running, you will get a minimum of 3 lines (the 3 lines above). There can be more “WORKERS” if you parallelize your datapump job (with DEGREE=8, for example, you will have 8 “workers”). DBA_DATAPUMP_SESSIONS has a SADDR column, that you can link to the same column in V$SESSION -- goody!
SELECT V.STATUS, V.SID,V.SERIAL#,IO.BLOCK_CHANGES,EVENT, MODULE FROM V$SESS_IO IO,V$SESSION V WHERE IO.SID=V.SID AND V.SADDR IN (SELECT SADDR FROM DBA_DATAPUMP_SESSIONS) ORDER BY SID;
STATUS | SID | SERIAL# | BLOCK | EVENT | MODULE |
ACTIVE | 197 | 337 | 5396 | wait for unread message on broadcast channel | udi@remserv (TNS V1-V3) |
ACTIVE | 579 | 101 | 8253 | wait for unread message on broadcast channel | Data Pump Master |
ACTIVE | 768 | 22709 | 2240201 | direct path write temp | Data Pump Worker |
I add the BLOCK_CHANGES column from the V$SESS_IO view to see which session is doing the IOs (which one is doing the real work). If I re-run my SELECT periodically, I see BLOCK_CHANGES increase, for one session:
STATUS | SID | SERIAL# | BLOCK | EVENT | MODULE |
ACTIVE | 197 | 337 | 5396 | wait for unread message on broadcast channel | udi@remserv (TNS V1-V3) |
ACTIVE | 579 | 101 | 8253 | wait for unread message on broadcast channel | Data Pump Master |
ACTIVE | 768 | 22709 | 2242588 | direct path read temp | Data Pump Worker |
ACTIVE | 197 | 337 | 5396 | wait for unread message on broadcast channel | udi@remserv (TNS V1-V3) |
ACTIVE | 579 | 101 | 8253 | wait for unread message on broadcast channel | Data Pump Master |
ACTIVE | 768 | 22709 | 2263775 | direct path read | Data Pump Worker |
As long as one session is doing the dirty work, my datapump job is progressing. The other 2 sessions (“Data Pump Master “ and “udi”) spend most of their time on the “wait for unread message on broadcast channel” event.
So in the examples, above, the impdp job is not stalled. But that SELECT is the key to identifying the roadblock stalling my job.
SELECT v.status, v.SID,v.serial#,io.block_changes,event, module,v.sql_id FROM v$sess_io io,v$session v WHERE io.SID=v.SID AND v.saddr IN (SELECT saddr FROM dba_datapump_sessions) ORDER BY io.BLOCK_CHANGES
STATUS | SID | SERIAL# | BLOCK | EVENT | MODULE |
ACTIVE | 197 | 337 | 5396 | wait for unread message on broadcast channel | udi@remserv(TNS V1-V3) |
ACTIVE | 579 | 101 | 8253 | wait for unread message on broadcast channel | Data Pump Master |
ACTIVE | 768 | 22709 | 2206601 | statement suspended, wait error to be cleared | Data Pump Worker |
Now we’ve got something: “statement suspended, wait error to be cleared” means that session is stuck on some issue and is resumable. It’s the same event as when a tablespace has been maxed out. But as stated previously, my tablespaces are in fine shape. What else can prevent my import from progressing? The storage!
In my case, it turns out that one of the filesystems hosting my index files filled up. Oracle can tell when its tablespaces are filling up, but not that when the FS is filled up, hence the absence of error everywhere.
/dev/dbserv_bench 76.00 0.00 100%
What is awesome about resumable errors, and AUTOEXTEND tablespaces, is that once the sysadmins add more storage, my impdp job resumes where it left off with no human interaction:
/dev/dbserv_bench 84.00 7.03 92%
STATUS | SID | SERIAL# | BLOCK | EVENT | MODULE |
ACTIVE | 197 | 337 | 5396 | wait for unread message on broadcast channel | udi@remserv (TNS V1-V3) |
ACTIVE | 579 | 101 | 8253 | wait for unread message on broadcast channel | Data Pump Master |
ACTIVE | 768 | 22709 | 2244006 | direct path read | Data Pump Worker |
SCHEMA_EXPORT/TABLE/INDEX/INDEX
SCHEMA_EXPORT/TABLE/CONSTRAINT/CONSTRAINT
Job " IMPDPuser "." PRODINTOBENCH" completed with 1 error at 15:01:56.
Appendix A: here’s an example of a PARALLEL=4 expdp in progress:
STATUS SID BLOCK_CHANGES EVENT MODULE
-------- ---------- ------------- -------------------------------------------------- ----------------------
ACTIVE 200 595 Datapump dump file I/O Data Pump Worker
ACTIVE 7 683 direct path read Data Pump Worker
ACTIVE 398 700 direct path read Data Pump Worker
ACTIVE 774 3299 wait for unread message on broadcast channel Data Pump Master
ACTIVE 972 5558 wait for unread message on broadcast channel ude@remserv (TNS V1-V3)
ACTIVE 1176 34381 direct path read Data Pump Worker
17-07-15 18:38:42 MYADMIN > /
STATUS SID BLOCK_CHANGES EVENT MODULE
-------- ---------- ------------- -------------------------------------------------- ----------------------
ACTIVE 200 714 direct path read Data Pump Worker
ACTIVE 7 719 Datapump dump file I/O Data Pump Worker
ACTIVE 398 736 direct path read Data Pump Worker
ACTIVE 774 3299 wait for unread message on broadcast channel Data Pump Master
ACTIVE 972 5558 wait for unread message on broadcast channel ude@remserv (TNS V1-V3)
ACTIVE 1176 34417 direct path read Data Pump Worker
17-07-15 18:40:01 MYADMIN > /
STATUS SID BLOCK_CHANGES EVENT MODULE
-------- ---------- ------------- -------------------------------------------------- ----------------------
ACTIVE 398 1482 direct path read Data Pump Worker
ACTIVE 7 1588 direct path read Data Pump Worker
ACTIVE 200 1597 direct path read Data Pump Worker
ACTIVE 774 3299 wait for unread message on broadcast channel Data Pump Master
ACTIVE 972 5558 wait for unread message on broadcast channel ude@remserv (TNS V1-V3)
ACTIVE 1176 35391 direct path read Data Pump Worker
Appendix B: here’s an example of a PARALLEL=4 impdp in progress. This one runs into a typical “full tablespace” issue:
STATUS THESESSION BLOCK_CHANGES MODULE EVENT
-------- ---------- ------------- ---------------------------- -----------------------------------------------
ACTIVE 598,40311 5270 udi@dbserv (TNS V1-V3) wait for unread message on broadcast channel
ACTIVE 582,8077 15080 Data Pump Worker wait for unread message on broadcast channel
ACTIVE 634,50998 18786 Data Pump Master wait for unread message on broadcast channel
ACTIVE 599,14675 22362 Data Pump Worker wait for unread message on broadcast channel
ACTIVE 607,49998 100170 Data Pump Worker wait for unread message on broadcast channel
ACTIVE 631,24303 10231592 Data Pump Worker statement suspended, wait error to be cleared
In that case, a telling error appears in your impdp log:
ORA-39171: Job is experiencing a resumable wait.
ORA-01691: unable to extend lob segment QA.SYS_LOB0000374957C00039$$ by 16 in tablespace QADATA
After increasing the size of the said tablespace, the import automatically resumes:
STATUS THESESSION BLOCK_CHANGES MODULE EVENT
-------- ---------- -------------- ---------------------------- ---------------------------------------------
ACTIVE 598,40311 5270 udi@dbserv (TNS V1-V3) wait for unread message on broadcast channel
ACTIVE 582,8077 15080 Data Pump Worker wait for unread message on broadcast channel
ACTIVE 634,50998 18786 Data Pump Master wait for unread message on broadcast channel
ACTIVE 599,14675 22362 Data Pump Worker wait for unread message on broadcast channel
ACTIVE 607,49998 100170 Data Pump Worker wait for unread message on broadcast channel
ACTIVE 631,24303 10855394 Data Pump Worker i/o slave wait
STATUS THESESSION BLOCK_CHANGES MODULE EVENT
-------- ---------- -------------- ---------------------------- ---------------------------------------------
ACTIVE 598,40311 5270 udi@dbserv (TNS V1-V3) wait for unread message on broadcast channel
ACTIVE 582,8077 15080 Data Pump Worker wait for unread message on broadcast channel
ACTIVE 634,50998 18792 Data Pump Master wait for unread message on broadcast channel
ACTIVE 599,14675 22362 Data Pump Worker wait for unread message on broadcast channel
ACTIVE 607,49998 238646 Data Pump Worker i/o slave wait
ACTIVE 631,24303 11832492 Data Pump Worker wait for unread message on broadcast channel