
I want to connect to a remote Spark master running in Docker from Python on my local machine:

from pyspark.sql import SparkSession

# Build a session against the standalone master published on port 7077
spark = SparkSession \
    .builder \
    .master('spark://spark-spark-master:7077') \
    .appName('spark-yarn') \
    .getOrCreate()

I get a Connection Refused error when running this code.

Running telnet ip 7077 in my terminal gives the error:

telnet: Unable to connect to remote host: Connection refused

This is confusing, because the port is open on the server itself and the server is accepting connections on port 7077.
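To rule out a tooling quirk, the same check can be done from the client machine with a plain TCP socket. This is just a sketch; HOST below is a placeholder for whatever address I use to reach the server:

import socket

# Placeholder for the server address used in the telnet test above
HOST = 'server-ip-or-hostname'
PORT = 7077

try:
    # Attempt a plain TCP handshake, which is all telnet does here
    with socket.create_connection((HOST, PORT), timeout=5):
        print(f'TCP connection to {HOST}:{PORT} succeeded')
except OSError as exc:
    print(f'TCP connection to {HOST}:{PORT} failed: {exc}')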

Running docker container ls on the server shows:

CONTAINER ID        IMAGE                                                    COMMAND                  CREATED             STATUS                  PORTS                                            NAMES
9086cf2f26dc        bde2020/spark-master:3.0.1-hadoop3.2                     "/bin/bash /master.sh"   2 weeks ago         Up 2 weeks              6066/tcp, 8080/tcp, 0.0.0.0:7077->7077/tcp       spark_spark-master.1.qyie2bq52hbrfg2ttioz6ljwq
5133adc223ef        bde2020/spark-worker:3.0.1-hadoop3.2                     "/bin/bash /worker.sh"   2 weeks ago         Up 2 weeks              8081/tcp                                         spark_spark-worker.ylnj52bj78as9hxr6zdo1lgo3.kwzys14lm3uid0qclyv0jn95o
da2841b1d757        bde2020/hadoop-nodemanager:2.0.0-hadoop3.2.1-java8       "/entrypoint.sh /run…"   2 months ago        Up 2 months (healthy)   8042/tcp                                         hadoop_nodemanager.ylnj52bj78as9hxr6zdo1lgo3.o9gznaa9u57wuyf21fl9ya4hi
49a3cbb8073a        bde2020/hadoop-resourcemanager:2.0.0-hadoop3.2.1-java8   "/entrypoint.sh /run…"   2 months ago        Up 2 months             8088/tcp                                         hadoop_resourcemanager.1.7kwgmhxz74brj6xs218k81ptk
10b22205a879        bde2020/hadoop-historyserver:2.0.0-hadoop3.2.1-java8     "/entrypoint.sh /run…"   2 months ago        Up 2 months (healthy)   8188/tcp                                         hadoop_historyserver.1.p3c3ouxmayxt4rvhrjlq7ti4t
775209433ea8        bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8          "/entrypoint.sh /run…"   2 months ago        Up 2 months (healthy)   0.0.0.0:9000->9000/tcp, 0.0.0.0:9870->9870/tcp   hadoop_namenode.1.bbt0n4ne76ddwqtmejlsf590m
5d14d16020e5        bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8          "/entrypoint.sh /run…"   2 months ago        Up 2 months (healthy)   9864/tcp                                         hadoop_datanode.ylnj52bj78as9hxr6zdo1lgo3.e9drmfbdqicux6ltkk9gv2uh5
83f7b3290995        traefik:v2.2                                             "/entrypoint.sh --ap…"   2 months ago        Up 2 months             80/tcp                                           traefik_traefik.1.ha8o6dc3ewtmppkn4pauugkj

The docker-compose.yml file for spark is:

version: '3.6'
services:
  spark-master:
    image: bde2020/spark-master:3.0.1-hadoop3.2
    networks:
      - workbench
    ports:
      - target: 7077
        published: 7077
        mode: host
    deploy:
      restart_policy:
        condition: on-failure
      placement:
        constraints:
          - node.hostname == johnsnow
      labels:
        - "traefik.enable=true"
        - "traefik.docker.network=workbench"
        - "traefik.http.services.spark-master.loadbalancer.server.port=8080"
    env_file:
      - ./hadoop.env
    environment:
    - INIT_DAEMON_STEP=setup_spark
    - "constraint:node==spark-master"

  spark-worker:
    image: bde2020/spark-worker:3.0.1-hadoop3.2
    networks:
      - workbench
    environment:
      - SPARK_MASTER_URL=spark://spark_spark-master:7077
    deploy:
      mode: global
      restart_policy:
        condition: on-failure
      labels:
        - "traefik.enable=true"
        - "traefik.docker.network=workbench"
        - "traefik.http.services.spark-worker.loadbalancer.server.port=8081"
    env_file:
      - ./hadoop.env
    environment:
    - INIT_DAEMON_STEP=setup_spark
    - "constraint:node==spark-worker"
networks:
  workbench:
    external: true
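For reference, this is roughly how I expect a driver running outside the swarm would have to address the master. The host name and driver address here are placeholders, not values from my setup:

from pyspark.sql import SparkSession

# Sketch only: 'docker-host.example.com' stands for an address of the Docker
# host that resolves from my local machine, and 'my-local-ip' for an address
# of my machine that the cluster can reach back on.
spark = SparkSession \
    .builder \
    .master('spark://docker-host.example.com:7077') \
    .appName('spark-yarn') \
    .config('spark.driver.host', 'my-local-ip') \
    .getOrCreate()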

Why is this error occurring?

Edit: the spark_spark-master entries from the output of docker network inspect -v workbench (the full output is pretty big):

 },
            "0bce286b736b1368738aa7504e1219dd12d855f7d79fc8f17d6a04b98ebe0ec1": {
                "Name": "spark_spark-master.1.2tdsfzhbyl1wr8omjh518lhwg",
                "EndpointID": "ae7e503e911039c3f02469a968536035b88c0213bcf09a1def49eaf7853b9085",
                "MacAddress": "02:42:0a:00:01:07",
                "IPv4Address": "10.0.1.7/24",
                "IPv6Address": ""
            },

...

"spark_spark-master": {
                "VIP": "10.0.1.70",
                "Ports": [],
                "LocalLBIndex": 257,
                "Tasks": [
                    {
                        "Name": "spark_spark-master.1.2tdsfzhbyl1wr8omjh518lhwg",
                        "EndpointID": "ae7e503e911039c3f02469a968536035b88c0213bcf09a1def49eaf7853b9085",
                        "EndpointIP": "10.0.1.7",
                        "Info": {
                            "Host IP": "ip"
                        }
                    }
                ]
            },
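As a note on this output: the service VIP 10.0.1.70 and the task IP 10.0.1.7 are addresses on the overlay network. A minimal sketch to check what spark_spark-master resolves to from a given machine (my local machine versus a container attached to workbench):

import socket

NAME = 'spark_spark-master'

try:
    # Uses whatever resolver the current environment provides: Docker's
    # embedded DNS inside a container on the overlay network, or the
    # regular system resolver on my local machine.
    print(NAME, '->', socket.gethostbyname(NAME))
except socket.gaierror as exc:
    print(f'could not resolve {NAME}: {exc}')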
  • Can you share some details about the workbench network? What is the output of docker network inspect -v workbench?
    – werner
    Commented Dec 18, 2020 at 13:31
  • What are you using for ip? Should that be spark_spark-master or some other hostname instead of an IP?
    – Alex Watt
    Commented Dec 19, 2020 at 6:22
  • @AlexWatt yes I am using spark-spark-master
    – Snow
    Commented Dec 19, 2020 at 18:12
  • @werner I can output the content of spark master alone, since the whole file is pretty big
    – Snow
    Commented Dec 19, 2020 at 18:16
  • How do you create the external network workbench? If I create it with docker network create workbench and then run your docker-compose.yml, everything works for me. It looks like workbench uses the IP range 10.0.1.X. Are the ip routes set correctly for this network (ip route)?
    – werner
    Commented Dec 20, 2020 at 14:40
