Hi all,
We have been trying to export data from HDFS to an Oracle database using Sqoop on the latest Oracle Big Data Lite VM (4.6.0), with no success so far.
The export succeeds whenever the HDFS file is small (a few tens of records), but fails with larger datasets (~300K records, 83 MB).
Here is what we have tried:
sqoop export --direct \
  --connect jdbc:oracle:thin:@localhost:1521:orcl \
  --username TEST --password test \
  --table FROM_HDFS \
  --export-dir /user/oracle/datathon \
  --fields-terminated-by '|' \
  --lines-terminated-by '\n'
And here is part of the output:
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/12/05 08:17:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0
[...]
**************************************************
*** Using Data Connector for Oracle and Hadoop ***
**************************************************
16/12/05 08:17:32 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
16/12/05 08:17:32 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.
16/12/05 08:17:32 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts
16/12/05 08:17:32 INFO tool.CodeGenTool: Beginning code generation
16/12/05 08:17:33 INFO manager.SqlManager: Executing SQL statement: SELECT "STTP","USERSESSIONID","REGISTEREDUSERID","PATHURL","ACTION","DATETIME","QUERY","CATEGORYID","SHOPID","PRODUCTID","REFURL","SKU_ID","RESULTS" FROM FROM_HDFS WHERE 0=1
16/12/05 08:17:33 INFO manager.SqlManager: Executing SQL statement: SELECT "STTP","USERSESSIONID","REGISTEREDUSERID","PATHURL","ACTION","DATETIME","QUERY","CATEGORYID","SHOPID","PRODUCTID","REFURL","SKU_ID","RESULTS" FROM "FROM_HDFS" WHERE 1=0
16/12/05 08:17:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-oracle/compile/51217b350ac880636c1682972c9d9bcf/FROM_HDFS.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/12/05 08:17:36 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-oracle/compile/51217b350ac880636c1682972c9d9bcf/FROM_HDFS.jar
16/12/05 08:17:36 INFO mapreduce.ExportJobBase: Beginning export of FROM_HDFS
16/12/05 08:17:36 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/12/05 08:17:37 INFO manager.SqlManager: Executing SQL statement: SELECT "STTP","USERSESSIONID","REGISTEREDUSERID","PATHURL","ACTION","DATETIME","QUERY","CATEGORYID","SHOPID","PRODUCTID","REFURL","SKU_ID","RESULTS" FROM "FROM_HDFS" WHERE 1=0
[...]
16/12/05 08:17:41 INFO input.FileInputFormat: Total input paths to process : 1
16/12/05 08:17:41 INFO mapreduce.JobSubmitter: number of splits:4
16/12/05 08:17:41 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
16/12/05 08:17:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1480943411756_0003
16/12/05 08:17:41 INFO impl.YarnClientImpl: Submitted application application_1480943411756_0003
16/12/05 08:17:41 INFO mapreduce.Job: The url to track the job: http://bigdatalite.localdomain:8088/proxy/application_1480943411756_0003/
16/12/05 08:17:41 INFO mapreduce.Job: Running job: job_1480943411756_0003
16/12/05 08:17:48 INFO mapreduce.Job: Job job_1480943411756_0003 running in uber mode : false
16/12/05 08:17:48 INFO mapreduce.Job: map 0% reduce 0%
16/12/05 08:18:26 INFO mapreduce.Job: map 100% reduce 0%
16/12/05 08:18:27 INFO mapreduce.Job: Job job_1480943411756_0003 failed with state FAILED due to: Task failed task_1480943411756_0003_m_000002
Job failed as tasks failed. failedMaps:1 failedReduces:0
16/12/05 08:18:27 INFO mapreduce.Job: Counters: 12
Job Counters
Failed map tasks=1
Killed map tasks=3
Launched map tasks=4
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=141597
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=141597
Total vcore-seconds taken by all map tasks=141597
Total megabyte-seconds taken by all map tasks=144995328
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
16/12/05 08:18:27 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
16/12/05 08:18:27 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 49.9247 seconds (0 bytes/sec)
16/12/05 08:18:27 INFO mapreduce.ExportJobBase: Exported 0 records.
16/12/05 08:18:27 ERROR tool.ExportTool: Error during export: Export job failed!
The target table FROM_HDFS already exists in the database, and the appropriate privileges (and unlimited quota) have been granted.
Running with the --verbose option, we can see that the necessary JDBC driver is located successfully:
16/12/05 08:24:20 DEBUG mapreduce.JobBase: Adding to job classpath: file:/u01/orahivedp-2.1.1/jlib/ojdbc7.jar
Connection tests with sqoop eval and sqoop list-tables work fine:
sqoop eval --connect jdbc:oracle:thin:@localhost:1521:orcl --username TEST --password test --query "select 1 from dual"
[...]
------------------------
| 1 |
------------------------
| 1 |
------------------------
sqoop list-tables --connect jdbc:oracle:thin:@localhost:1521:orcl --username TEST --password test
[...]
FROM_HDFS
We have tried the following variations of the export:
- Both with and without the --direct argument: without --direct, some records make it to the database table (the exact number varies, seemingly at random); with --direct, no records are exported at all
- Using ojdbc6.jar and ojdbc7.jar (same behavior with both)
- Exports to MySQL work, but only with the --direct argument
- Increasing the number of mappers up to 8 (same behavior)
- Both internal and external database target tables (same behavior)
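Since only the large file fails, one thing we still plan to check is whether a handful of malformed records (a stray '|' or an embedded newline) are tripping some of the mappers. A rough sketch of that check, assuming the export file is plain text and using the 13-field count implied by the column list in the generated SELECT above (the function name and local filename are ours, purely illustrative):

```shell
# Flag any record whose pipe-delimited field count differs from 13,
# the number of columns in FROM_HDFS per the generated SELECT statement.
check_fields() {
  awk -F'|' 'NF != 13 { print FILENAME " line " NR ": " NF " fields" }' "$1"
}

# Usage, after pulling the file down locally, e.g.:
#   hdfs dfs -get /user/oracle/datathon/part-00000 ./datathon.txt
#   check_fields datathon.txt
```

Any line this prints would be a record the OraOop mappers might choke on.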
System info:
Sqoop 1.4.6
Oracle Database 12c Enterprise Edition 12.1.0.2.0 (64-bit)
(as provided in Oracle Big Data Lite VM 4.6.0)
Any ideas?
Many thanks in advance.
Message was edited by: Christos Iraklis Tsatsoulis