I am new to Docker and am trying to build a Hadoop cluster with Docker Swarm. I first built it with Docker Compose and it worked perfectly. However, I would like to add other services such as Hive, Spark, and HBase in the future, so Swarm seems like a better idea.
When I ran it with a version 3.7 YAML file, the namenode and datanodes started successfully. But when I visited the web UI, no nodes were listed in the "Datanodes" tab (nor in the "Overview" tab). It seems the datanodes failed to connect to the namenode. I checked the ports on each node with netstat -tuplen, and both 7946 and 4789 were open.
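For reference, here is a minimal sketch of how that port check could be scripted so it gives the same answer on every node. The function name check_swarm_ports is a hypothetical helper of my own, not part of Docker or netstat, and it only looks for the two ports mentioned above (7946 and 4789) in whatever netstat-style text it receives on stdin.

```shell
# Hypothetical helper: reads netstat-style output on stdin and reports
# whether the two Swarm ports checked above appear as local ports.
check_swarm_ports() {
  input=$(cat)
  status=0
  for port in 7946 4789; do
    # Match ":<port>" or ".<port>" followed by whitespace, so that
    # longer port numbers such as 47890 do not count as a match.
    if printf '%s\n' "$input" | grep -Eq "[:.]${port}[[:space:]]"; then
      echo "port ${port}: found"
    else
      echo "port ${port}: missing"
      status=1
    fi
  done
  return $status
}

# Usage on each node:
#   netstat -tuplen | check_swarm_ports
```

This only shows that something is bound to those ports on the host. It does not prove that traffic between the nodes actually gets through, for example past a cloud firewall.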
Here is the YAML file I used:
version: "3.7"
services:
  namenode:
    image: flokkr/hadoop:latest
    hostname: namenode
    networks:
      - hbase
    command: ["hdfs","namenode"]
    ports:
      - target: 50070
        published: 50070
      - target: 9870
        published: 9870
    environment:
      - NAMENODE_INIT=hdfs dfs -chmod 777 /
      - ENSURE_NAMENODE_DIR=/tmp/hadoop-hadoop/dfs/name
    env_file:
      - ./compose-config
    deploy:
      mode: replicated
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.role == manager
  datanode:
    image: flokkr/hadoop:latest
    networks:
      - hbase
    command: ["hdfs","datanode"]
    env_file:
      - ./compose-config
    deploy:
      mode: global
      restart_policy:
        condition: on-failure
volumes:
  namenode:
  datanode:
networks:
  hbase:
    name: hbase
Basically, I just updated the YAML file from this repo to version 3.7 and tried to run it on GCP. Here is my repo in case you want to reproduce the problem.
This is the port status on the manager node:
And on the worker node:
Thank you for your help!