
Sqoop export to Oracle DB fails for non-trivial file sizes

Christos Iraklis Tsatsoulis · Dec 5 2016 (edited Jan 3 2017)

Hi all,

We have been trying to export data from HDFS to Oracle DB using Sqoop in the latest Oracle Big Data Lite VM (4.6.0), to no avail so far.

The export succeeds whenever the HDFS file is small (a few tens of records), but fails with larger datasets (~300K records, 83 MB).

Here is what we have tried:

sqoop export --direct --connect jdbc:oracle:thin:@localhost:1521:orcl --username TEST --password test --table FROM_HDFS --export-dir /user/oracle/datathon --fields-terminated-by '|' --lines-terminated-by '\n'
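For reference, the input directory holds a single pipe-delimited file of about 83 MB (~300K records); this can be confirmed with standard HDFS commands:

hdfs dfs -ls /user/oracle/datathon

hdfs dfs -cat /user/oracle/datathon/* | wc -l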

And here is part of the output of the export command:

Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.

Please set $ACCUMULO_HOME to the root of your Accumulo installation.

16/12/05 08:17:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0

[...]

**************************************************

*** Using Data Connector for Oracle and Hadoop ***

**************************************************

16/12/05 08:17:32 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

16/12/05 08:17:32 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

16/12/05 08:17:32 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts

16/12/05 08:17:32 INFO tool.CodeGenTool: Beginning code generation

16/12/05 08:17:33 INFO manager.SqlManager: Executing SQL statement: SELECT "STTP","USERSESSIONID","REGISTEREDUSERID","PATHURL","ACTION","DATETIME","QUERY","CATEGORYID","SHOPID","PRODUCTID","REFURL","SKU_ID","RESULTS" FROM FROM_HDFS WHERE 0=1

16/12/05 08:17:33 INFO manager.SqlManager: Executing SQL statement: SELECT "STTP","USERSESSIONID","REGISTEREDUSERID","PATHURL","ACTION","DATETIME","QUERY","CATEGORYID","SHOPID","PRODUCTID","REFURL","SKU_ID","RESULTS" FROM "FROM_HDFS" WHERE 1=0

16/12/05 08:17:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce

Note: /tmp/sqoop-oracle/compile/51217b350ac880636c1682972c9d9bcf/FROM_HDFS.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

16/12/05 08:17:36 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-oracle/compile/51217b350ac880636c1682972c9d9bcf/FROM_HDFS.jar

16/12/05 08:17:36 INFO mapreduce.ExportJobBase: Beginning export of FROM_HDFS

16/12/05 08:17:36 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar

16/12/05 08:17:37 INFO manager.SqlManager: Executing SQL statement: SELECT "STTP","USERSESSIONID","REGISTEREDUSERID","PATHURL","ACTION","DATETIME","QUERY","CATEGORYID","SHOPID","PRODUCTID","REFURL","SKU_ID","RESULTS" FROM "FROM_HDFS" WHERE 1=0

[...]

16/12/05 08:17:41 INFO input.FileInputFormat: Total input paths to process : 1

16/12/05 08:17:41 INFO mapreduce.JobSubmitter: number of splits:4

16/12/05 08:17:41 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative

16/12/05 08:17:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1480943411756_0003

16/12/05 08:17:41 INFO impl.YarnClientImpl: Submitted application application_1480943411756_0003

16/12/05 08:17:41 INFO mapreduce.Job: The url to track the job: http://bigdatalite.localdomain:8088/proxy/application_1480943411756_0003/

16/12/05 08:17:41 INFO mapreduce.Job: Running job: job_1480943411756_0003

16/12/05 08:17:48 INFO mapreduce.Job: Job job_1480943411756_0003 running in uber mode : false

16/12/05 08:17:48 INFO mapreduce.Job:  map 0% reduce 0%

16/12/05 08:18:26 INFO mapreduce.Job:  map 100% reduce 0%

16/12/05 08:18:27 INFO mapreduce.Job: Job job_1480943411756_0003 failed with state FAILED due to: Task failed task_1480943411756_0003_m_000002

Job failed as tasks failed. failedMaps:1 failedReduces:0

16/12/05 08:18:27 INFO mapreduce.Job: Counters: 12

    Job Counters

        Failed map tasks=1

        Killed map tasks=3

        Launched map tasks=4

        Data-local map tasks=4

        Total time spent by all maps in occupied slots (ms)=141597

        Total time spent by all reduces in occupied slots (ms)=0

        Total time spent by all map tasks (ms)=141597

        Total vcore-seconds taken by all map tasks=141597

        Total megabyte-seconds taken by all map tasks=144995328

    Map-Reduce Framework

        CPU time spent (ms)=0

        Physical memory (bytes) snapshot=0

        Virtual memory (bytes) snapshot=0

16/12/05 08:18:27 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead

16/12/05 08:18:27 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 49.9247 seconds (0 bytes/sec)

16/12/05 08:18:27 INFO mapreduce.ExportJobBase: Exported 0 records.

16/12/05 08:18:27 ERROR tool.ExportTool: Error during export: Export job failed!
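The job summary above reports no root cause; the stack trace of the failed map task can be retrieved from YARN with something like:

yarn logs -applicationId application_1480943411756_0003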

The target table FROM_HDFS has already been created in the database, and the appropriate privileges (and an unlimited tablespace quota) have been granted.
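For completeness, the table was created roughly along these lines; the column names match the job log above, while the data types shown here are only placeholders and the tablespace name is illustrative:

CREATE TABLE from_hdfs (
  sttp             VARCHAR2(4000),
  usersessionid    VARCHAR2(4000),
  registereduserid VARCHAR2(4000),
  pathurl          VARCHAR2(4000),
  action           VARCHAR2(4000),
  datetime         VARCHAR2(4000),
  query            VARCHAR2(4000),
  categoryid       VARCHAR2(4000),
  shopid           VARCHAR2(4000),
  productid        VARCHAR2(4000),
  refurl           VARCHAR2(4000),
  sku_id           VARCHAR2(4000),
  results          VARCHAR2(4000)
);

ALTER USER test QUOTA UNLIMITED ON users; -- tablespace name illustrative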

Running with the --verbose option, we can see that the necessary JDBC driver is located successfully:

16/12/05 08:24:20 DEBUG mapreduce.JobBase: Adding to job classpath: file:/u01/orahivedp-2.1.1/jlib/ojdbc7.jar

Connection tests with sqoop eval and sqoop list-tables work properly:

sqoop eval --connect jdbc:oracle:thin:@localhost:1521:orcl --username TEST --password test --query "select 1 from dual"

[...]

------------------------

| 1                    |

------------------------

| 1                    |

------------------------

sqoop list-tables --connect jdbc:oracle:thin:@localhost:1521:orcl --username TEST --password test

[...]

FROM_HDFS
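For reference, a row count before and after each run can be taken the same way:

sqoop eval --connect jdbc:oracle:thin:@localhost:1521:orcl --username TEST --password test --query "select count(*) from FROM_HDFS"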

We have tried the following variations of the export:

  • Both with and without the --direct argument: with --direct, no records are exported at all; without it, some records make it to the database table, but the exact number varies, seemingly at random (the non-direct invocation is sketched after this list)
  • Using ojdbc6.jar and ojdbc7.jar (same behavior with both)
  • Exports to MySQL work OK, but only with the --direct argument
  • Increasing the number of mappers up to 8 (same behavior)
  • Targeting both internal and external database tables (same behavior)
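For reference, the non-direct invocation mentioned above differs only in dropping --direct; the mapper count shown is the maximum we tried:

sqoop export --connect jdbc:oracle:thin:@localhost:1521:orcl --username TEST --password test --table FROM_HDFS --export-dir /user/oracle/datathon --fields-terminated-by '|' --lines-terminated-by '\n' --num-mappers 8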

System info:

Sqoop 1.4.6

Oracle Database 12c Enterprise Edition 12.1.0.2.0 (64-bit)

(as provided in Oracle Big Data Lite VM 4.6.0)

Any ideas?

Many thanks in advance.

