2 Replies Latest reply on Jan 3, 2017 9:09 AM by Christos Iraklis Tsatsoulis

    Sqoop export to Oracle DB fails for non-trivial file sizes

    Christos Iraklis Tsatsoulis

      Hi all,

       

      We have been trying to export data from HDFS to Oracle DB using Sqoop in the latest Oracle Big Data Lite VM (4.6.0), to no avail so far.

       

      The export is successful whenever the HDFS file is small (some tens of records), but fails with larger datasets (~ 300K records, 83 MB).

       

      Here is what we have tried:

       

      sqoop export --direct --connect jdbc:oracle:thin:@localhost:1521:orcl --username TEST --password test --table FROM_HDFS --export-dir /user/oracle/datathon --fields-terminated-by '|' --lines-terminated-by '\n'

       

      And here is part of the output:

       

      Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.

      Please set $ACCUMULO_HOME to the root of your Accumulo installation.

      16/12/05 08:17:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.8.0

      [...]

      **************************************************

      *** Using Data Connector for Oracle and Hadoop ***

      **************************************************

      16/12/05 08:17:32 INFO oracle.OraOopManagerFactory: Oracle Database version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

      16/12/05 08:17:32 INFO oracle.OraOopManagerFactory: This Oracle database is not a RAC.

      16/12/05 08:17:32 INFO Configuration.deprecation: mapred.map.max.attempts is deprecated. Instead, use mapreduce.map.maxattempts

      16/12/05 08:17:32 INFO tool.CodeGenTool: Beginning code generation

      16/12/05 08:17:33 INFO manager.SqlManager: Executing SQL statement: SELECT "STTP","USERSESSIONID","REGISTEREDUSERID","PATHURL","ACTION","DATETIME","QUERY","CATEGORYID","SHOPID","PRODUCTID","REFURL","SKU_ID","RESULTS" FROM FROM_HDFS WHERE 0=1

      16/12/05 08:17:33 INFO manager.SqlManager: Executing SQL statement: SELECT "STTP","USERSESSIONID","REGISTEREDUSERID","PATHURL","ACTION","DATETIME","QUERY","CATEGORYID","SHOPID","PRODUCTID","REFURL","SKU_ID","RESULTS" FROM "FROM_HDFS" WHERE 1=0

      16/12/05 08:17:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce

      Note: /tmp/sqoop-oracle/compile/51217b350ac880636c1682972c9d9bcf/FROM_HDFS.java uses or overrides a deprecated API.

      Note: Recompile with -Xlint:deprecation for details.

      16/12/05 08:17:36 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-oracle/compile/51217b350ac880636c1682972c9d9bcf/FROM_HDFS.jar

      16/12/05 08:17:36 INFO mapreduce.ExportJobBase: Beginning export of FROM_HDFS

      16/12/05 08:17:36 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar

      16/12/05 08:17:37 INFO manager.SqlManager: Executing SQL statement: SELECT "STTP","USERSESSIONID","REGISTEREDUSERID","PATHURL","ACTION","DATETIME","QUERY","CATEGORYID","SHOPID","PRODUCTID","REFURL","SKU_ID","RESULTS" FROM "FROM_HDFS" WHERE 1=0

      [...]

      16/12/05 08:17:41 INFO input.FileInputFormat: Total input paths to process : 1

      16/12/05 08:17:41 INFO mapreduce.JobSubmitter: number of splits:4

      16/12/05 08:17:41 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative

      16/12/05 08:17:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1480943411756_0003

      16/12/05 08:17:41 INFO impl.YarnClientImpl: Submitted application application_1480943411756_0003

      16/12/05 08:17:41 INFO mapreduce.Job: The url to track the job: http://bigdatalite.localdomain:8088/proxy/application_1480943411756_0003/

      16/12/05 08:17:41 INFO mapreduce.Job: Running job: job_1480943411756_0003

      16/12/05 08:17:48 INFO mapreduce.Job: Job job_1480943411756_0003 running in uber mode : false

      16/12/05 08:17:48 INFO mapreduce.Job:  map 0% reduce 0%

      16/12/05 08:18:26 INFO mapreduce.Job:  map 100% reduce 0%

      16/12/05 08:18:27 INFO mapreduce.Job: Job job_1480943411756_0003 failed with state FAILED due to: Task failed task_1480943411756_0003_m_000002

      Job failed as tasks failed. failedMaps:1 failedReduces:0

       

      16/12/05 08:18:27 INFO mapreduce.Job: Counters: 12

          Job Counters

              Failed map tasks=1

              Killed map tasks=3

              Launched map tasks=4

              Data-local map tasks=4

              Total time spent by all maps in occupied slots (ms)=141597

              Total time spent by all reduces in occupied slots (ms)=0

              Total time spent by all map tasks (ms)=141597

              Total vcore-seconds taken by all map tasks=141597

              Total megabyte-seconds taken by all map tasks=144995328

          Map-Reduce Framework

              CPU time spent (ms)=0

              Physical memory (bytes) snapshot=0

              Virtual memory (bytes) snapshot=0

      16/12/05 08:18:27 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead

      16/12/05 08:18:27 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 49.9247 seconds (0 bytes/sec)

      16/12/05 08:18:27 INFO mapreduce.ExportJobBase: Exported 0 records.

      16/12/05 08:18:27 ERROR tool.ExportTool: Error during export: Export job failed!
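The console output above only shows the generic "Export job failed!" message; the actual exception should be in the logs of the failed map task. One way to pull it out (a sketch, reusing the application id from the log above; the grep pattern is just an illustration, the real exception may be worded differently):

```shell
# Fetch the aggregated YARN logs for the failed application and search
# for the root-cause exception thrown by the failed map task.
# Run as the user that submitted the Sqoop job.
yarn logs -applicationId application_1480943411756_0003 \
  | grep -B 2 -A 20 "Caused by"
```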

       

      Target table FROM_HDFS has already been created in the database, and the appropriate privileges (and unlimited quota) have been given.
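For reference, the privileges/quota setup we mean was along these lines (a sketch only; the exact statements and tablespace name may differ in your environment):

```shell
# Illustrative grant/quota setup for the TEST schema used above;
# "users" is an assumed tablespace name.
sqlplus / as sysdba <<'EOF'
GRANT CONNECT, RESOURCE TO test;
ALTER USER test QUOTA UNLIMITED ON users;
EOF
```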

       

      Running with the --verbose option, we can see that the necessary JDBC driver is successfully located:


       

      16/12/05 08:24:20 DEBUG mapreduce.JobBase: Adding to job classpath: file:/u01/orahivedp-2.1.1/jlib/ojdbc7.jar

       

       

      Connection tests with sqoop eval & sqoop list-tables work properly:

       

      sqoop eval --connect jdbc:oracle:thin:@localhost:1521:orcl --username TEST --password test --query "select 1 from dual"

      [...]

      ------------------------

      | 1                    |

      ------------------------

      | 1                    |

      ------------------------

       

      sqoop list-tables --connect jdbc:oracle:thin:@localhost:1521:orcl --username TEST --password test

      [...]

      FROM_HDFS

       

       

       

      We have tried the following variations of the export:

       

      • Both with and without the --direct argument: without --direct, some records make it to the database table (the exact number varies, seemingly at random); with --direct, no records are exported at all
      • Using ojdbc6.jar and ojdbc7.jar (same behavior in both cases)
      • Exports to MySQL work fine, but only with the --direct argument
      • Increasing the number of mappers up to 8 (same behavior)
      • Both internal and external database target tables (same behavior)
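One further variation that might be worth trying (a sketch; the batching property values below are arbitrary examples, not tuned settings) is a plain JDBC export without --direct but with explicit statement batching, which can sometimes surface a clearer per-statement error:

```shell
# Non-direct (plain JDBC) export with explicit batching; the
# records-per-statement / statements-per-transaction values are
# arbitrary examples.
sqoop export \
  -D sqoop.export.records.per.statement=100 \
  -D sqoop.export.statements.per.transaction=100 \
  --connect jdbc:oracle:thin:@localhost:1521:orcl \
  --username TEST --password test \
  --table FROM_HDFS \
  --export-dir /user/oracle/datathon \
  --fields-terminated-by '|' \
  --lines-terminated-by '\n' \
  --batch
```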

       

      System info:

      Sqoop 1.4.6

      Oracle Database Enterprise Edition 12.1.0.2.0 (64-bit)

      (as provided in Oracle Big Data Lite VM 4.6.0)

       

      Any ideas?

       

      Many thanks in advance.

       
