I want to connect to a remote Spark master running in Docker, from Python on my local machine:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .master('spark://spark-spark-master:7077') \
    .appName('spark-yarn') \
    .getOrCreate()
Running this code fails with a Connection refused error.
Running telnet ip 7077
in my terminal gives the error:
telnet: Unable to connect to remote host: Connection refused
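The same reachability check that telnet performs can be scripted from the local machine with the standard library; a minimal sketch, where the host string is a placeholder to replace with the server's actual address:

```python
import socket

def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Replace "ip" with the Spark server's actual address before running, e.g.:
# print(is_port_open("ip", 7077))
```

A False result here is equivalent to telnet's "Unable to connect" and means the refusal happens at the TCP level, before Spark is involved at all.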
This is confusing, because the port is open on the server itself and the server is accepting connections on port 7077.
Running docker container ls
on the server shows:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9086cf2f26dc bde2020/spark-master:3.0.1-hadoop3.2 "/bin/bash /master.sh" 2 weeks ago Up 2 weeks 6066/tcp, 8080/tcp, 0.0.0.0:7077->7077/tcp spark_spark-master.1.qyie2bq52hbrfg2ttioz6ljwq
5133adc223ef bde2020/spark-worker:3.0.1-hadoop3.2 "/bin/bash /worker.sh" 2 weeks ago Up 2 weeks 8081/tcp spark_spark-worker.ylnj52bj78as9hxr6zdo1lgo3.kwzys14lm3uid0qclyv0jn95o
da2841b1d757 bde2020/hadoop-nodemanager:2.0.0-hadoop3.2.1-java8 "/entrypoint.sh /run…" 2 months ago Up 2 months (healthy) 8042/tcp hadoop_nodemanager.ylnj52bj78as9hxr6zdo1lgo3.o9gznaa9u57wuyf21fl9ya4hi
49a3cbb8073a bde2020/hadoop-resourcemanager:2.0.0-hadoop3.2.1-java8 "/entrypoint.sh /run…" 2 months ago Up 2 months 8088/tcp hadoop_resourcemanager.1.7kwgmhxz74brj6xs218k81ptk
10b22205a879 bde2020/hadoop-historyserver:2.0.0-hadoop3.2.1-java8 "/entrypoint.sh /run…" 2 months ago Up 2 months (healthy) 8188/tcp hadoop_historyserver.1.p3c3ouxmayxt4rvhrjlq7ti4t
775209433ea8 bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8 "/entrypoint.sh /run…" 2 months ago Up 2 months (healthy) 0.0.0.0:9000->9000/tcp, 0.0.0.0:9870->9870/tcp hadoop_namenode.1.bbt0n4ne76ddwqtmejlsf590m
5d14d16020e5 bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8 "/entrypoint.sh /run…" 2 months ago Up 2 months (healthy) 9864/tcp hadoop_datanode.ylnj52bj78as9hxr6zdo1lgo3.e9drmfbdqicux6ltkk9gv2uh5
83f7b3290995 traefik:v2.2 "/entrypoint.sh --ap…" 2 months ago Up 2 months 80/tcp traefik_traefik.1.ha8o6dc3ewtmppkn4pauugkj
The docker-compose.yml file for spark is:
version: '3.6'
services:
  spark-master:
    image: bde2020/spark-master:3.0.1-hadoop3.2
    networks:
      - workbench
    ports:
      - target: 7077
        published: 7077
        mode: host
    deploy:
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.hostname == johnsnow
      labels:
        - "traefik.enable=true"
        - "traefik.docker.network=workbench"
        - "traefik.http.services.spark-master.loadbalancer.server.port=8080"
    env_file:
      - ./hadoop.env
    environment:
      - INIT_DAEMON_STEP=setup_spark
      - "constraint:node==spark-master"
  spark-worker:
    image: bde2020/spark-worker:3.0.1-hadoop3.2
    networks:
      - workbench
    environment:
      - SPARK_MASTER_URL=spark://spark_spark-master:7077
    deploy:
      mode: global
      restart_policy:
        condition: on-failure
      labels:
        - "traefik.enable=true"
        - "traefik.docker.network=workbench"
        - "traefik.http.services.spark-worker.loadbalancer.server.port=8081"
    env_file:
      - ./hadoop.env
    environment:
      - INIT_DAEMON_STEP=setup_spark
      - "constraint:node==spark-worker"
networks:
  workbench:
    external: true
Why is this error occurring?
Edit: the spark_spark-master entries from the output of docker network inspect -v workbench:
},
"0bce286b736b1368738aa7504e1219dd12d855f7d79fc8f17d6a04b98ebe0ec1": {
"Name": "spark_spark-master.1.2tdsfzhbyl1wr8omjh518lhwg",
"EndpointID": "ae7e503e911039c3f02469a968536035b88c0213bcf09a1def49eaf7853b9085",
"MacAddress": "02:42:0a:00:01:07",
"IPv4Address": "10.0.1.7/24",
"IPv6Address": ""
},
...
"spark_spark-master": {
"VIP": "10.0.1.70",
"Ports": [],
"LocalLBIndex": 257,
"Tasks": [
{
"Name": "spark_spark-master.1.2tdsfzhbyl1wr8omjh518lhwg",
"EndpointID": "ae7e503e911039c3f02469a968536035b88c0213bcf09a1def49eaf7853b9085",
"EndpointIP": "10.0.1.7",
"Info": {
"Host IP": "ip"
}
}
]
},
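Note that both the task IP (10.0.1.7) and the service VIP (10.0.1.70) sit inside the overlay subnet shown above, which is routable only from containers attached to workbench, not from an outside machine. A quick stdlib check of that membership:

```python
import ipaddress

# Subnet taken from the inspect output above (IPv4Address "10.0.1.7/24").
overlay = ipaddress.ip_network("10.0.1.0/24")

# Both addresses from the inspect output fall inside the overlay subnet.
task_ip = ipaddress.ip_address("10.0.1.7")
vip = ipaddress.ip_address("10.0.1.70")
print(task_ip in overlay, vip in overlay)  # True True
```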
Comments:
… the workbench network? What is the output of docker network inspect -v workbench?
… ip? Should that be spark_spark-master or some other hostname instead of an IP?
… workbench? If I create it with docker network create workbench and then run your docker-compose.yml, everything works for me. It looks like workbench uses the IP range 10.0.1.X. Are the ip routes set correctly for this network (ip route)?
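Following the comments, since the local machine is not attached to the workbench overlay, the master URL would need to target the Docker host itself (port 7077 is published with mode: host on the node johnsnow), not the swarm service name. A hedged sketch of building that URL; the hostname here is an assumption to replace with the server's real DNS name or IP:

```python
# Hypothetical values: replace with the Docker host's real DNS name or IP.
docker_host = "johnsnow"   # the swarm node the master is pinned to
spark_port = 7077          # published on the host via "mode: host"

master_url = f"spark://{docker_host}:{spark_port}"
print(master_url)  # spark://johnsnow:7077

# The SparkSession would then be built with .master(master_url). Note the
# driver machine must also be reachable from the cluster, or executors will
# fail to connect back even after the master connection succeeds.
```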