1 Reply Latest reply: Sep 25, 2013 6:16 PM by dvohra21

    java.io.FileNotFoundException in DistributedCache.addCacheFile

    Joseph Hwang

      This is my map-side join code. I use the Eclipse IDE with the hadoop-1.0.3-plugin.

       

      ===  Mapper  ====

      public class MapperWithMapsideJoin extends Mapper<LongWritable, Text, Text, Text> {

          private Hashtable<String, String> joinMap = new Hashtable<String, String>();

          private Text outputKey = new Text();

         

          public void setup(Context context) throws IOException, InterruptedException {

              Path[] cacheFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());

              if(cacheFiles != null && cacheFiles.length > 0) {

                  String line;

                  String[] tokens;

                 

                  System.out.println("Mapper : " + cacheFiles[0].toString()); // check the localized cache file location

                 

                  BufferedReader br = new BufferedReader(new FileReader(cacheFiles[0].toString()));

                  while ((line = br.readLine()) != null) {

                      tokens = line.toString().replaceAll("\"", "").split(",");

                      joinMap.put(tokens[0], tokens[1]);

                  }

                  br.close();

              } else {

                  System.out.println("cache files is null");

              }

          }

         

          public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

              if (key.get() >0) {

                  String[] columns = value.toString().split(",");

                  if(columns != null && columns.length >0) {

                      outputKey.set(joinMap.get(columns[8]));

                      context.write(outputKey, value);

                  }

              }

          }

      }
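The table loading in setup() and the lookup in map() can be exercised outside Hadoop. The following is a minimal sketch in plain Java, not the code from the job: the class name `JoinTableSketch`, the methods `load` and `lookup`, the `UNKNOWN` fallback, and the sample rows are all hypothetical. It applies the same quote stripping and comma split as the mapper, and additionally skips rows with fewer than two fields and substitutes a fallback for missing keys, since passing a null from `joinMap.get` to `outputKey.set` would likely raise a NullPointerException.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original job: mirrors the parsing done in setup().
class JoinTableSketch {

    // Builds the code -> name table from CSV lines such as "AA","American Airlines".
    static Map<String, String> load(List<String> lines) {
        Map<String, String> table = new HashMap<String, String>();
        for (String line : lines) {
            // Same transformation as the mapper: drop double quotes, then split on commas.
            String[] tokens = line.replaceAll("\"", "").split(",");
            if (tokens.length < 2) {
                continue; // skip malformed rows instead of failing on tokens[1]
            }
            table.put(tokens[0], tokens[1]);
        }
        return table;
    }

    // Returns the joined value, or the given fallback when the key is absent.
    static String lookup(Map<String, String> table, String key, String fallback) {
        String value = table.get(key);
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        // Made-up sample rows for illustration only.
        Map<String, String> table = load(Arrays.asList(
                "\"AA\",\"American Airlines\"",
                "malformed-row"));
        System.out.println(lookup(table, "AA", "UNKNOWN")); // prints American Airlines
        System.out.println(lookup(table, "ZZ", "UNKNOWN")); // prints UNKNOWN
    }
}
```

This does not address the FileNotFoundException itself; it only makes the parsing step easier to test in isolation.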

       

      === Driver ===

      public class MapsideJoin {

       

          public static void main(String[] args) throws Exception {

              // TODO Auto-generated method stub

              Configuration conf = new Configuration();

              Job job = new Job(conf, "MapsideJoin");

             

              // Apply the distributed cache

              DistributedCache.addCacheFile(new Path("/home/user01/carriers.csv").toUri(), job.getConfiguration());

              System.out.println("Cache : " + job.getConfiguration().get("mapred.cache.files")); // check the cache file location

             

              FileInputFormat.addInputPath(job, new Path("/home/user01/input"));

              FileOutputFormat.setOutputPath(job, new Path("/home/user01/output"));

             

              job.setJarByClass(MapsideJoin.class);

              job.setMapperClass(MapperWithMapsideJoin.class);

              job.setNumReduceTasks(0);

             

              job.setInputFormatClass(TextInputFormat.class);

              job.setOutputFormatClass(TextOutputFormat.class);

             

              job.setOutputKeyClass(Text.class);

              job.setOutputValueClass(Text.class);

             

              job.waitForCompletion(true);

          }

      }
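When reading the `Cache :` output of this driver, it may help to notice that the path passed to `addCacheFile` carries no URI scheme. As far as I understand, Hadoop resolves scheme-less paths against the configured default filesystem, so the same string can mean a local file or an HDFS file depending on the configuration. The sketch below uses only `java.net.URI` to show the difference between the forms; the class name `CacheUriSketch`, the method `describe`, and the `hdfs://namenode:9000/...` address are hypothetical and not taken from the job.

```java
import java.net.URI;

// Hypothetical helper: reports which scheme, if any, a cache file location carries.
class CacheUriSketch {

    static String describe(String location) {
        URI uri = URI.create(location);
        String scheme = uri.getScheme();
        // A null scheme means the location does not name a filesystem explicitly.
        return scheme == null ? "no scheme (depends on the default filesystem)" : scheme;
    }

    public static void main(String[] args) {
        // The first location is the one used in the driver; the other two are illustrative.
        System.out.println(describe("/home/user01/carriers.csv"));
        System.out.println(describe("file:///home/user01/carriers.csv"));
        System.out.println(describe("hdfs://namenode:9000/user/user01/carriers.csv"));
    }
}
```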

       

      In the driver class, the cache file location is correct. It prints:

       

      Cache : /home/user01/carriers.csv

       

      But in the mapper class, the cache file location is not correct. It prints the following wrong file location:

       

      Mapper : /tmp/hadoop-user01/mapred/local/archive/385054838454132541_2125749813_1888223920/file/home/user01/carriers.csv

       

      So the line

       

      BufferedReader br = new BufferedReader(new FileReader(cacheFiles[0].toString()));

       

      throws this FileNotFoundException:

       

      java.io.FileNotFoundException: /tmp/hadoop-user01/mapred/local/archive/385054838454132541_2125749813_1888223920/file/home/user01/carriers.csv (No such Files and Directories)

       

      How can I get the proper location string of the cache files? Any help will be appreciated. Thanks in advance.
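One way to narrow this down, offered as a diagnostic sketch rather than a fix: before opening the reader, check whether the localized path exists and find the deepest ancestor that does. That shows whether the whole archive directory is missing or only the file inside it. The class name `CachePathCheck` and the method `deepestExisting` are hypothetical; the code relies only on `java.io.File`.

```java
import java.io.File;

// Hypothetical diagnostic helper, not part of the original job.
class CachePathCheck {

    // Walks up from the given path and returns the first file or directory
    // that exists, or null if not even the root exists.
    static File deepestExisting(File path) {
        File current = path.getAbsoluteFile();
        while (current != null && !current.exists()) {
            current = current.getParentFile();
        }
        return current;
    }

    public static void main(String[] args) {
        // Pass the path printed by the mapper as the first argument.
        File cached = new File(args.length > 0 ? args[0] : "carriers.csv");
        System.out.println("exists: " + cached.exists());
        System.out.println("deepest existing ancestor: " + deepestExisting(cached));
    }
}
```

Calling `deepestExisting(new File(cacheFiles[0].toString()))` inside setup() and printing the result would show how far down the localized path actually reaches on the machine running the task.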