I have to access several files in a Hadoop file system (HDFS), e.g. /user/.../data/somefile.txt
I have no idea how to access these files. My code, shown below, doesn't work. I tried variations like "hdfs://user/...", "hdfs://localhost:50070/user/..." and building a URI somehow (although I don't really understand how that works).
I was given Hadoop version 1.2.1 for this task, and I'm working on Ubuntu in a virtual machine with Eclipse (without the Hadoop plug-in).
I've never worked with Hadoop before, so it would be great if you could help me.
JobConf conf = new JobConf(TotalWordCount.class);
conf.setJobName("wordcount");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(Map.class);
conf.setCombinerClass(Reduce.class);
conf.setReducerClass(Reduce.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path("/user/.../data/textfile.txt"));
FileOutputFormat.setOutputPath(conf, new Path("/user/.../output"));
LineProcessor.initializeStopWords();
JobClient.runJob(conf);
Running the code above, I get an error like this:
ERROR security.UserGroupInformation: PriviledgedActionException as:ds2013 cause:org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/user/.../data/textfile.txt
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/user/.../data/textfile.txt
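While experimenting, I noticed with plain java.net.URI that in my "hdfs://user/..." attempt the part right after the "//" seems to be parsed as a host, not as a directory, so maybe that attempt was malformed. A small sketch of what I mean (localhost:9000 is only a guess at a default namenode address, not my actual config):

```java
import java.net.URI;

public class HdfsUriCheck {
    public static void main(String[] args) {
        // "user" lands in the host slot here, so it is not part of the path.
        URI wrong = URI.create("hdfs://user/data/textfile.txt");
        System.out.println(wrong.getHost() + " | " + wrong.getPath());
        // prints: user | /data/textfile.txt

        // With an explicit namenode host:port, the whole /user/... path survives.
        URI full = URI.create("hdfs://localhost:9000/user/data/textfile.txt");
        System.out.println(full.getHost() + " | " + full.getPath());
        // prints: localhost | /user/data/textfile.txt
    }
}
```

So maybe I need the namenode host and port in front of the path, but I don't know which ones apply to my setup.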
I also tried something like this:
DistributedCache.addCacheFile((new Path("/user/.../data/textfile.txt")).toUri(), conf);
Path[] paths = DistributedCache.getLocalCacheFiles(conf);
Path cachePath = paths[0];
BufferedReader stopListReader = new BufferedReader(new FileReader(cachePath.toString()));
But it can't find the file:
Exception in thread "main" java.io.FileNotFoundException: File /user/.../data/textfile.txt does not exist.
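As far as I can tell, java.io.FileReader only resolves paths against the local Linux filesystem, never against HDFS, which might explain this FileNotFoundException: the file exists in HDFS, but no such path exists on the local disk. A tiny sketch of that behavior (the path below is made up purely for illustration):

```java
import java.io.File;
import java.io.FileReader;
import java.io.FileNotFoundException;

public class LocalPathCheck {
    public static void main(String[] args) {
        // java.io.File looks at the local filesystem only; an HDFS-only
        // path will not be found here.
        File hdfsStylePath = new File("/user/no-such-local-dir/data/textfile.txt");
        System.out.println(hdfsStylePath.exists());

        try {
            // FileReader fails the same way my DistributedCache attempt did.
            new FileReader(hdfsStylePath);
        } catch (FileNotFoundException e) {
            System.out.println("FileNotFoundException, as in my code");
        }
    }
}
```

So I suspect my BufferedReader/FileReader line is looking in the wrong place entirely, but I don't know what the right way to read the cached file is.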