10

I couldn't figure out the difference between the Spark driver and the application master. Basically, in terms of responsibilities when running an application, who does what?

In client mode, the client machine hosts the driver and the app master runs on one of the cluster nodes. In cluster mode, the client hosts neither; the driver and the app master run on the same node (one of the cluster nodes).

What exactly are the operations that the driver performs, and which does the app master perform?

References:

2 Answers 2

10

As per the Spark documentation:

Spark Driver:

The driver (aka driver program) is responsible for converting a user application into smaller execution units called tasks and then scheduling them to run on executors through a cluster manager. The driver is also responsible for executing the Spark application and returning the status/results to the user.

The Spark driver contains various components: DAGScheduler, TaskScheduler, SchedulerBackend and BlockManager. They are responsible for translating user code into actual Spark jobs executed on the cluster.

Whereas the Application Master:

The Application Master is responsible for the execution of a single application. It asks the Resource Scheduler (Resource Manager) for containers and executes specific programs on the containers it obtains. The Application Master is just a broker that negotiates resources with the Resource Manager and then, once it has been granted some containers, makes sure that tasks (which are picked from the scheduler queue) are launched on those containers.
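To make the broker role concrete, here is a toy sketch in plain Python. It is not the YARN API: `ToyResourceManager` and `ToyApplicationMaster` are hypothetical names invented for this illustration, and the only thing modelled is the negotiation, where the AM asks for containers and receives as many as the RM can grant.

```python
# Toy model only: these classes are hypothetical and do not exist in YARN or Spark.

class ToyResourceManager:
    """Tracks how many containers are still free in the cluster."""

    def __init__(self, free_containers):
        self.free = free_containers

    def allocate(self, requested):
        # Grant as many containers as are available, never more than requested.
        granted = min(requested, self.free)
        self.free -= granted
        return granted


class ToyApplicationMaster:
    """Negotiates containers for a single application."""

    def __init__(self, rm):
        self.rm = rm
        self.containers = 0

    def request_containers(self, wanted):
        # Ask the RM and keep whatever it grants; return the running total.
        self.containers += self.rm.allocate(wanted)
        return self.containers


rm = ToyResourceManager(free_containers=3)
am = ToyApplicationMaster(rm)
print(am.request_containers(2))  # 2
print(am.request_containers(2))  # 3, because only one container was left
```

Note that the AM in this sketch never looks at the user's code; it only deals with resources, which is the point of the separation described above.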

In a nutshell, the driver program translates your custom logic into jobs, stages and tasks, and the application master makes sure to get enough resources from the RM and also checks the status of your tasks running in containers.
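As a rough illustration of the driver side, the sketch below splits a linear chain of operations into stages at shuffle boundaries and counts one task per partition per stage. This is a simplified toy model in plain Python, not Spark's actual DAGScheduler; the function names, the `WIDE_OPS` set and the fixed partition count are assumptions made for the example.

```python
# Toy model only: a simplification of how a driver might cut a job into stages.
# Wide (shuffle) operations start a new stage; narrow ones are pipelined together.
WIDE_OPS = {"reduceByKey", "groupByKey", "join", "repartition"}


def split_into_stages(ops):
    """Group a linear chain of operation names into stages at shuffle boundaries."""
    stages, current = [], []
    for op in ops:
        if op in WIDE_OPS and current:
            stages.append(current)
            current = []
        current.append(op)
    if current:
        stages.append(current)
    return stages


def count_tasks(stages, num_partitions):
    """One task per partition per stage, assuming a fixed partition count."""
    return len(stages) * num_partitions


ops = ["textFile", "flatMap", "map", "reduceByKey", "filter"]
stages = split_into_stages(ops)
print(stages)  # [['textFile', 'flatMap', 'map'], ['reduceByKey', 'filter']]
print(count_tasks(stages, num_partitions=4))  # 8
```

The tasks counted here are what the driver hands to executors; the containers those executors live in are what the application master obtains from the RM.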

As already stated in the references you provided, the only difference between client and cluster mode is:

In client mode, the driver runs on the machine where the Spark application/job was submitted, and the AM runs on one of the cluster nodes.

(AND)

In cluster mode, the driver runs inside the application master, which means the application master has much more responsibility.
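The two modes are selected with the `--deploy-mode` flag of `spark-submit`. The sketch below only builds the command line and summarises where each component runs, as described above; the helper names and the application file `my_app.py` are hypothetical, and nothing is actually submitted.

```python
# A minimal sketch: builds a spark-submit command line for YARN without running it.
# The helper names and "my_app.py" are hypothetical, chosen for illustration.

def spark_submit_args(app, deploy_mode, driver_memory=None):
    """Return the argv list for submitting `app` to YARN in the given deploy mode."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    args = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode]
    if driver_memory:
        args += ["--driver-memory", driver_memory]
    args.append(app)
    return args


def where_things_run(deploy_mode):
    """Summarise where the driver and the AM run in each mode."""
    if deploy_mode == "client":
        return {"driver": "submitting machine", "application_master": "cluster node"}
    if deploy_mode == "cluster":
        return {"driver": "inside the application master", "application_master": "cluster node"}
    raise ValueError("deploy_mode must be 'client' or 'cluster'")


print(" ".join(spark_submit_args("my_app.py", "client")))
print(" ".join(spark_submit_args("my_app.py", "cluster", driver_memory="4g")))
print(where_things_run("cluster"))
```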

References:

https://luminousmen.com/post/spark-anatomy-of-spark-application#:~:text=The%20Driver(aka%20driver%20program,status%2Fresults%20to%20the%20user.

https://www.edureka.co/community/1043/difference-between-application-master-application-manager#:~:text=The%20Application%20Master%20is%20responsible,class)%20on%20the%20obtained%20containers.

4
  • So in cluster mode, do the driver and AM run on the same node or on different nodes (both being executor nodes in the cluster)? If on the same node, is there any significance to that (doesn't it increase the load on a single container)? I ask because client mode lets them run on separate nodes (client: driver, executor node: AM).
    – newbie
    Commented Sep 16, 2020 at 10:58
  • 1
    I have updated the answer. In cluster mode the driver runs inside the application master. Doesn't it increase the load on a single container? No, not at all: they are very lightweight threads, and most of the heavy work is done in the executors. Commented Sep 16, 2020 at 11:06
  • 1
    If you have any stage that collects data to the master container, for example with collect() or collectAsList(), then it does have an impact and you need to tune and increase the driver memory. Otherwise, most of the time it is not a problem at all. Commented Sep 16, 2020 at 11:08
  • In cluster mode, the Spark UI on port 8080 shows a running application and a running driver. How are they related to each other? I tried to kill the application, but the driver was not killed.
    – Monu
    Commented Sep 14, 2022 at 14:11
0

In cluster mode, the Application Master is created first in a container on a node within the YARN cluster. The driver is launched separately in a container on a node within the YARN cluster and communicates with the Application Master to coordinate the execution of the Spark application. Both the Application Master and the driver interact with the YARN ResourceManager for resource allocation and management within the cluster.
