I'm using Hadoop streaming to run some Python code. I have noticed that if there is an error in my Python code (in mapper.py, for example), I won't be notified about the error. Instead, the mapper program will fail to run, and the job will be killed after a few seconds. Viewing the logs, the only error I see is that mapper.py failed to run or was not found, which is clearly not the case.
My question is: is there a specific log file I can check to see the actual errors that occur in the mapper.py code? (For example, one that would tell me whether an import statement failed.)
Thank you!
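One way to surface such errors yourself, independent of which log file Hadoop writes to, is to wrap the mapper's body in a try/except that prints the full traceback to stderr and exits nonzero. Hadoop streaming captures each task attempt's stderr, so the traceback (including a failed import inside the handler) becomes visible in the task logs. This is only a sketch; the word-count logic in map_line is a placeholder for whatever the real mapper does:

```python
#!/usr/bin/env python
"""Hypothetical mapper.py sketch: any uncaught Python error is written to
stderr with a full traceback, which Hadoop streaming saves in the task
attempt's stderr log instead of silently killing the task."""
import sys
import traceback

def map_line(line):
    # Placeholder per-line logic; here we emit (word, 1) pairs.
    return [(word, 1) for word in line.split()]

def main():
    for line in sys.stdin:
        for key, value in map_line(line):
            sys.stdout.write("%s\t%d\n" % (key, value))

if __name__ == "__main__":
    try:
        main()
    except Exception:
        # Full traceback goes to the task's stderr log instead of vanishing.
        traceback.print_exc(file=sys.stderr)
        sys.exit(1)  # nonzero exit marks the task attempt as failed
```

Exiting nonzero matters: streaming treats a nonzero exit code as a task failure, so the job fails fast with the traceback attached rather than retrying a mapper that can never succeed.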
edit: The command used:
bin/hadoop jar contrib/streaming/hadoop-streaming.jar \
    -file /hadoop/mapper.py -mapper /hadoop/mapper.py \
    -file /hadoop/reducer.py -reducer /hadoop/reducer.py \
    -input /hadoop/input.txt -output /hadoop/output
and the post I am referencing, for which I'd like to see the errors:
Hadoop and NLTK: Fails with stopwords
vinaut :
About the log question, see if this helps:

MapReduce: Log file locations for stdout and std err

I suppose that if the Python file fails to run, the interpreter should print the error to stderr, and you would see it in the stderr log of that node.
2013-09-30T18:51:36