Spark job configuration and submission
When a Spark job is launched, it creates a SparkConf object and passes it to the constructor of SparkContext. The SparkConf object contains a near-exhaustive list of customizable parameters that can tune a Spark job according to the available cluster resources. The SparkConf object becomes immutable once it is passed to the SparkContext constructor, so it is important to identify and set all the required SparkConf parameters before creating a SparkContext object.
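The sequence described above can be sketched as follows in Scala; the application name, master URL, and memory setting are hypothetical values chosen for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Set every parameter on SparkConf first: once SparkContext is
// constructed, the configuration it holds can no longer be changed.
val conf = new SparkConf()
  .setAppName("ConfigExample")          // hypothetical application name
  .setMaster("local[2]")                // hypothetical master URL for local testing
  .set("spark.executor.memory", "2g")   // example tuning parameter

val sc = new SparkContext(conf)

// Reading a value back confirms what the running context is using
println(sc.getConf.get("spark.executor.memory"))
```

Calling a setter on `conf` after `new SparkContext(conf)` has no effect on the running job, which is why all tuning must happen before the constructor is invoked.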
There are different ways in which a Spark job can be configured. Spark's conf directory provides the default configurations used to execute a Spark job. The SPARK_CONF_DIR environment variable can be used to override the default location of the conf directory, which is usually SPARK_HOME/conf. Some of the configuration files expected in this directory are spark-defaults.conf, spark-env.sh, and log4j.properties. Spark uses Log4j as its logging mechanism, which can be configured by modifying the log4j...
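As an illustration, a minimal spark-defaults.conf might look like the following; the specific values are hypothetical and should be tuned to the actual cluster:

```
# spark-defaults.conf: one property per line, name and value
# separated by whitespace
spark.master             spark://master:7077
spark.executor.memory    2g
spark.serializer         org.apache.spark.serializer.KryoSerializer
spark.eventLog.enabled   true
```

Properties set here act as defaults and can still be overridden per job, for example on the spark-submit command line or programmatically through SparkConf.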