I set up a 2-node Hadoop cluster, and running start-df.sh and start-yarn.sh works nicely (i.e. all expected services are running, no errors in the logs).
However, when I actually try to run an application, several tasks fail:
15/04/01 15:27:53 INFO mapreduce.Job: Task Id :
attempt_1427894767376_0001_m_000008_2, Status : FAILED
I checked the yarn and datanode logs, but nothing is reported there.
In the userlogs, the syslogs files on the slave node all contain the following error message:
2015-04-01 15:27:21,077 INFO [main] org.apache.hadoop.ipc.Client:
Retrying connect to server:
slave.domain.be./127.0.1.1:53834. Already tried 9 time(s);
retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
sleepTime=1000 MILLISECONDS)
2015-04-01 15:27:21,078 WARN [main]
org.apache.hadoop.mapred.YarnChild:
Exception running child :
java.net.ConnectException: Call From
slave.domain.be./127.0.1.1 to
slave.domain.be.:53834 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused at
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
So the problem is that the slave cannot connect to itself..
I checked whether there is a process running on the slave node listening at port 53834, but there is none.
However, all 'expected' ports are being listened on (50020,50075,..). Nowhere in my configuration I have used port 53834. It's always a different port on different runs.
Any ideas on fixing this issue?