Applying basic transformations to data with Apache Spark
In this recipe, we will discuss the basics of Apache Spark. We will use Python as our primary programming language and the PySpark API to perform basic transformations on a dataset of Nobel Prize winners.
How to do it...
- Import the libraries: Import the required libraries and create a `SparkSession` object:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import transform, col, concat, lit

spark = (SparkSession.builder
    .appName("basic-transformations")
    .master("spark://spark-master:7077")
    .config("spark.executor.memory", "512m")
    .getOrCreate())

spark.sparkContext.setLogLevel("ERROR")
```
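The master URL above assumes a standalone Spark cluster reachable at `spark-master:7077`. If no such cluster is available, a minimal local-mode sketch (our addition, not part of the recipe) lets you follow along on a single machine:

```python
# Local-mode alternative (assumption: no standalone cluster is running).
# local[*] runs the executors as threads, one per available CPU core.
spark = (SparkSession.builder
    .appName("basic-transformations")
    .master("local[*]")
    .getOrCreate())
```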
- Read file: Read the `nobel_prizes.json` file using the `read` method of `SparkSession`:

```python
df = (spark.read.format("json")
    .option("multiLine", "true")   # the JSON records span multiple lines
    .load("nobel_prizes.json"))    # adjust the path to your environment
```
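After loading, it is worth confirming what Spark inferred before transforming anything. The following quick check is our addition rather than part of the recipe:

```python
df.printSchema()             # inspect the schema inferred from the JSON
df.show(5, truncate=False)   # preview the first five rows
```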
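The functions imported in step 1 (`transform`, `concat`, `lit`) point at the kind of transformation this recipe builds toward. As a hedged sketch, assuming each record contains `year`, `category`, and a `laureates` array of structs with `firstname` and `surname` fields (the shape of the Nobel Prize API's JSON; your file may differ), a derived full-name field could be added to every array element like this:

```python
from pyspark.sql.functions import transform, concat, lit  # imported in step 1

# Assumption: "laureates" is an array of structs with "firstname" and
# "surname" fields. transform() applies the lambda to every element of
# the array, and withField() adds a derived "full_name" field to each struct.
df_names = df.withColumn(
    "laureates",
    transform(
        "laureates",
        lambda x: x.withField(
            "full_name", concat(x["firstname"], lit(" "), x["surname"])
        ),
    ),
)

df_names.select("year", "category", "laureates.full_name").show(5, truncate=False)
```

Note that `Column.withField` requires Spark 3.1 or later; on older versions the struct would have to be rebuilt with `struct()` instead.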