There are different modes in which we can execute a Spark program. The mode is selected through the options passed to the spark-submit command, for example when submitting to YARN on a Hadoop cluster.
Mode 1 – Local
In this mode, both the driver and the executors run on the same machine. Any logs written by the driver or the executors appear in the same console.
spark-submit --master local[*] \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=30 \
  --conf spark.dynamicAllocation.initialExecutors=1 \
  --jars JARS_PATH \
  sparkrunner.py <program arguments>
Here, the * in local[*] tells Spark to use as many worker threads as there are logical cores on the machine.
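As a minimal sketch of what such a program might look like (the app name and logic below are illustrative, not taken from the original sparkrunner.py), the print calls run on the driver while the mapped function runs on the executors; in local mode, output from both lands in the same console:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("local-mode-demo").getOrCreate()
sc = spark.sparkContext

# Driver-side log line: appears in the submitting console.
print("Driver running with master:", sc.master)

# Executor-side work: in local mode the executors run in the same
# JVM as the driver, so executor output also shows up in this console.
total = sc.parallelize(range(10)).map(lambda x: x * x).sum()
print("Sum of squares:", total)

spark.stop()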
Mode 2 – YARN Client
In this mode, the driver runs on the machine where the spark-submit command is issued, while the executors run on worker machines in the cluster.
spark-submit --master yarn --deploy-mode client \
  --driver-memory 5G --executor-memory 10G --executor-cores 2 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=30 \
  --conf spark.dynamicAllocation.initialExecutors=1 \
  --jars JARS_PATH \
  sparkrunner.py <program arguments>
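A quick way to confirm the mode from inside the program is to read the spark.submit.deployMode setting, which spark-submit populates. A minimal sketch (the app name is a placeholder, and the fallback value is only for runs started outside spark-submit):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("deploy-mode-check").getOrCreate()
sc = spark.sparkContext

# In client mode this prints "client", and the output appears directly
# in the terminal that ran spark-submit, since the driver runs there.
print("Deploy mode:", sc.getConf().get("spark.submit.deployMode", "client"))
print("Master:", sc.master)

spark.stop()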
Mode 3 – YARN Cluster
In this mode, we have no control over where the driver and executor programs run. Submission is fire-and-forget: YARN launches the driver on one of the cluster machines (inside the ApplicationMaster container) and the executors on other machines.
spark-submit --master yarn --deploy-mode cluster \
  --driver-memory 5G --executor-memory 10G --executor-cores 2 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=30 \
  --conf spark.dynamicAllocation.initialExecutors=1 \
  --jars JARS_PATH \
  sparkrunner.py <program arguments>
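Since the driver itself runs inside the cluster in this mode, its console output does not come back to the terminal that ran spark-submit. Once the application finishes, the aggregated driver and executor logs can be retrieved with the standard YARN CLI, for example:

yarn logs -applicationId <application id>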