
Access a file in hadoop file system

Ask Time:2013-11-19T06:11:56         Author:Ben


I have to access several files in a Hadoop file system, e.g. /user/.../data/somefile.txt, but I have no idea how to access them. My code is shown below, but it doesn't work. I tried paths such as "hdfs://user/...." and "hdfs://localhost:50070/user/...", and I also tried using a URI somehow (although I don't really know how that works).

I was provided Hadoop version 1.2.1 for this task, and I'm working on Ubuntu in a virtual machine with Eclipse (without the Hadoop plug-in). I've never worked with Hadoop before, so it would be great if you could help me.
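For readers trying the same URI forms, it may help to see how they are split into parts. The sketch below uses only `java.net.URI` from the standard library, not Hadoop itself. The path `/user/example/data/somefile.txt` and the class name `HdfsUriParts` are hypothetical stand-ins (the real path is elided in the question). As far as I know, 50070 is the default NameNode web UI port in Hadoop 1.x rather than the file system port configured in `fs.default.name`, so treat that remark as an assumption to check against your own cluster.

```java
import java.net.URI;

final class HdfsUriParts {
    // Summarize how java.net.URI splits a location string into its parts.
    static String describe(String location) {
        URI uri = URI.create(location);
        return "scheme=" + uri.getScheme()
                + " host=" + uri.getHost()
                + " port=" + uri.getPort()
                + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        // With two slashes, the first segment is read as the host,
        // so "user" is no longer part of the path.
        System.out.println(describe("hdfs://user/example/data/somefile.txt"));

        // Here host and port are explicit and the path is kept whole.
        // Whether 50070 is the right port depends on the cluster setup.
        System.out.println(describe("hdfs://localhost:50070/user/example/data/somefile.txt"));
    }
}
```

In the first form the host comes out as `user`, which is probably not what was intended; a fully qualified HDFS location generally needs the NameNode host (and port) before the path.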

    JobConf conf = new JobConf(TotalWordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path("/user/.../data/textfile.txt"));
    FileOutputFormat.setOutputPath(conf, new Path("/user/.../output"));

    LineProcessor.initializeStopWords();

    JobClient.runJob(conf);

Running the code above, I get an error like this:

ERROR security.UserGroupInformation: PriviledgedActionException as:ds2013 cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/user/.../data/textfile.txt
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/user/.../data/textfile.txt
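The `file:` prefix in the message suggests that the scheme-less path was resolved against the local file system rather than HDFS. One plausible cause, consistent with the follow-up at the end of this page, is that the cluster configuration was not picked up when the program was launched from Eclipse, so the default file system stayed at Hadoop's built-in `file:///`. The sketch below is only a rough standard-library model of that resolution step, not Hadoop's actual code. The class name `QualifyPathSketch`, the address `hdfs://localhost:9000/`, and the path `/user/example/data/textfile.txt` are hypothetical placeholders.

```java
import java.net.URI;

final class QualifyPathSketch {
    // Rough model: a path without a scheme inherits scheme and authority
    // from the default file system URI (fs.default.name in Hadoop 1.x).
    static String qualify(String defaultFs, String path) {
        return URI.create(defaultFs).resolve(path).toString();
    }

    public static void main(String[] args) {
        String path = "/user/example/data/textfile.txt"; // hypothetical path

        // Default when no cluster configuration is loaded: local file system.
        System.out.println(qualify("file:///", path));

        // Hypothetical cluster setting; the real value comes from core-site.xml.
        System.out.println(qualify("hdfs://localhost:9000/", path));
    }
}
```

The first call yields a `file:` location like the one in the error, while the second yields an `hdfs:` location. This is why the same scheme-less path can work when the job is started in an environment that loads the cluster configuration and fail when it is not loaded.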

I also tried something like this:

 DistributedCache.addCacheFile((new Path("/user/.../data/textfile.txt")).toUri(), conf);
 Path[] paths = DistributedCache.getLocalCacheFiles(conf);
 Path cachePath = paths[0];
 BufferedReader stopListReader = new BufferedReader(new FileReader(cachePath.toString()));

But it can't find the file.

Exception in thread "main" java.io.FileNotFoundException: File /user/.../data/textfile.txt does not exist.

Author: Ben. Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/20059027/access-a-file-in-hadoop-file-system
Ben :

Thanks, guys, for your help. The problem was that you simply can't run the program from within Eclipse as I did. When I run the jar from the terminal, it finds the paths.
2013-11-21T08:53:03