
Idle workers in IMPDP with parallel option

876123 · Sep 30 2011, edited Oct 12 2011
Hello gurus,

I took a Data Pump export of a 29 GB partitioned table with the following parameters, and it was extremely fast (under 10 minutes):

parallel=4
filesize=500MB
directory=PUMP_DIR
estimate=statistics
dumpfile=exp%U.dmp
reuse_dumpfiles=y
logfile=export.log
tables=user1.t1
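
(For reference, those parameters were in a parfile; the file name and login below are placeholders:)

expdp system/<password> parfile=export.par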

The export started 4 parallel workers which were active the whole time, hence the high speed.

However, when I ran a Data Pump import of the same dump into an empty table on the same database (different schema), performance was very poor (55 minutes):

parallel=4
directory=PUMP_DIR
dumpfile=exp%U.dmp
logfile=import.log
tables=user1.t1
remap_schema=user1:user2
table_exists_action=append
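
(Again via a parfile; the file name and login are placeholders:)

impdp system/<password> parfile=import.par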

I noticed that the parallel workers were idle the whole time (regardless of the parallelism degree I used) and the whole import was serialized.

Can someone give me some insight into why the parallel workers were idle during the IMPDP?
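
In case it helps, this is roughly how I watched the workers, by attaching to the running job (the job name below is the default for a table-mode import and may differ on your system):

impdp system/<password> attach=SYS_IMPORT_TABLE_01
Import> status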


r00tb0x (http://www.r00tb0x.com)
This post has been answered by Dean Gagne-Oracle on Oct 11 2011

Comments

Dean Gagne-Oracle
Data Pump parallelism works differently in export and import. There are also two forms of parallelism that you can see in both export and import; Data Pump will determine the best type of parallelism to use.

Worker parallelism - this is where you will see multiple workers working on different objects.

Parallel execution (PQ) slaves - this is where you see PQ slaves working on the same table, loading different data.
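
If you want to see which form you are getting while a job runs, something like this works (a sketch, assuming you have access to the DBA views; exact columns vary by version):

-- Data Pump master and worker sessions
SELECT d.job_name, s.sid, s.program, s.status
FROM   dba_datapump_sessions d, v$session s
WHERE  d.saddr = s.saddr;

-- PQ slaves currently running, with their query coordinator's SID
SELECT qcsid, sid, server_set
FROM   v$px_session;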

------------------
export parallelism:

For export - metadata is always exported serially. When an export job is started, one worker is started to fetch the names of all of the tables, partitions, and subpartitions. This is known as the estimate phase. Once this is complete, this same worker will start exporting the remaining metadata (types, users, tablespaces, etc., whatever is in the job).
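
You can see the estimate phase on its own if you are curious (a sketch; the login is a placeholder):

expdp user1/<password> directory=PUMP_DIR tables=user1.t1 estimate=statistics estimate_only=y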

The Data Pump job will determine which method is best for unloading the data: direct path or external tables. If direct path is selected, the data is exported using just one worker process. If external tables is selected, it can use one worker process or n PQ slaves. So, let's say the next table's data is going to be unloaded using PQ slaves, the number of PQ slaves is 3, and you said parallel=4. The first worker is busy unloading metadata, so all you have left for parallelism is 3. When that table data object gets scheduled, you will see one more worker started and 3 PQ slaves. That second worker is just a coordinator, so it does not count towards the parallelism number. You will have the 1st worker doing work, the 3 PQ slaves doing work, and the 2nd worker just coordinating. So, parallelism would be 4.

Now if Data Pump set all of your tables to use direct path, you would see:

worker 1 - unloading metadata
worker 2 - unloading table 1
worker 3 - unloading table 2
worker 4 - unloading table 3

When worker 2 finishes, it will get the next table to unload.

You could also see a mix. So, if you are just looking at workers, you won't always see all of them busy.

import parallelism:

This works differently. Metadata is "always" created serially; I'll explain the "always" below. When a job is started, one worker creates all metadata up until all of the tables are created. At that point, the data is loaded in parallel using either multiple workers or multiple PQ slaves, again depending on the best/fastest method that Data Pump determines. Once the data is complete, the remaining metadata is loaded using just worker 1. So, if you are looking at a job after the data is loaded, you may see idle workers. If you are looking at a job when PQ slaves are very busy, you may see idle workers.

Now, about "always" created serially:
- Indexes are created serially, but built using PQ slaves. When indexes are being built, you will see idle workers, but you may see PQ slaves running.
- Package bodies are created in parallel. You would see multiple workers creating package bodies at the same time.

Other than those two exceptions, metadata objects are created serially.
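
As an aside, if you want to look at or control the index builds yourself, you can extract the DDL instead of running it (a sketch; the sqlfile name is a placeholder):

impdp user2/<password> directory=PUMP_DIR dumpfile=exp%U.dmp sqlfile=indexes.sql include=INDEX

That writes the CREATE INDEX statements into indexes.sql in PUMP_DIR instead of executing them.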

Hope this helps.

Dean
876123
I noticed that the table is held under an exclusive DML lock by the single worker process that is doing the impdp; hence the other workers are idle.

Is this normal behaviour or some kind of bug?
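
For reference, this is roughly how I spotted the lock (a sketch, assuming DBA access; lmode 6 is an exclusive lock):

SELECT l.sid, l.type, l.lmode
FROM   v$lock l, dba_objects o
WHERE  l.id1 = o.object_id
AND    l.type = 'TM'
AND    o.owner = 'USER2'
AND    o.object_name = 'T1';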

r00tb0x
Dean Gagne-Oracle
Answer
Based on what you are seeing, I'm going to assume that you are doing a data-only import, or at minimum, that your tables already exist and you are importing partitioned tables. If that is true, then you are hitting a known situation. Here is what is happening:

This is true for partitioned and subpartitioned tables, and only if the Data Pump job that loads the data did not also create the table. The last part of the preceding sentence is what makes this true: if you run a Data Pump job that creates just the tables, then run another import job that loads the data, even if both use the same dumpfile, you will hit this situation. The situation is:

When the Data Pump job that is loading data into a partitioned or subpartitioned table did not create the table, Data Pump can't be sure that the partitioning key is the same. Because of this, when Data Pump loads the data, it takes out a table lock, and this blocks any other parallel workers from loading into that table. If the Data Pump job created the table, then only a partition or subpartition lock is taken out, and other workers are free to take out locks on different partitions/subpartitions in the same table.

What I'm assuming you are seeing is that all of the workers are trying to load data into the same table but different partitions and one of the workers has the table lock. This will block other workers.

There is a 'fix' for this, but it is a minimal fix. Data Pump still can't load into multiple partitions in parallel, since it still needs to take out the table lock; instead, only one partition/subpartition will be scheduled at a time. You may not see the maximum parallelism being used in these cases, but you won't see the other workers waiting for an exclusive lock on the same table.

I don't recall the patch number and I don't recall what version it went into, but Oracle Support should be able to help you on that.
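
In the meantime, one workaround is to let the loading job create the table itself, so that only partition-level locks are taken (a sketch; assumes dropping the pre-created table is acceptable, and the login is a placeholder):

SQL> DROP TABLE user2.t1 PURGE;

impdp system/<password> parallel=4 directory=PUMP_DIR dumpfile=exp%U.dmp logfile=import.log tables=user1.t1 remap_schema=user1:user2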

If the same Data Pump job is creating the tables, or if these tables are not partitioned/subpartitioned, then I have not heard of this issue.

Thanks

Dean
Marked as Answer by 876123 · Sep 27 2020
876123
Thanks Dean, I confirmed with Oracle that this was the case. Upgrading to 11.2.0.2 solves the issue.
Dean Gagne-Oracle
Awesome!!! Glad that worked out for you.

Dean