Applying basic transformations to data with Apache Spark
In this recipe, we will discuss the basics of Apache Spark. We will use Python as our primary programming language and the PySpark API to perform basic transformations on a dataset of Nobel Prize winners.
How to do it...
- Import the libraries: Import the required libraries and create a `SparkSession` object:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import transform, col, concat, lit

spark = (SparkSession.builder
    .appName("basic-transformations")
    .master("spark://spark-master:7077")
    .config("spark.executor.memory", "512m")
    .getOrCreate())
spark.sparkContext.setLogLevel("ERROR")
```
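The master URL above assumes a standalone Spark cluster is reachable at `spark://spark-master:7077`. If you are following along on a single machine without such a cluster, a local-mode session (a standard Spark option, not part of the original setup) behaves the same way for the purposes of this recipe:

```python
from pyspark.sql import SparkSession

# Local-mode alternative: "local[*]" runs Spark in-process,
# using all available CPU cores instead of a remote master.
spark = (SparkSession.builder
    .appName("basic-transformations")
    .master("local[*]")
    .getOrCreate())
spark.sparkContext.setLogLevel("ERROR")
```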
- Read file: Read the `nobel_prizes.json` file using the `read` method of `SparkSession`:

```python
df = (spark.read.format("json")
    .option("multiLine", "true")
    .load("nobel_prizes.json"))
```
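With the DataFrame loaded, the functions imported in step 1 (`col`, `concat`, `lit`, and `transform`) can be put to work. The sketch below is illustrative rather than the recipe's exact code: it assumes the file follows the public Nobel Prize API layout, with top-level `year` and `category` fields and a `laureates` array of structs containing `firstname` and `surname`; adjust the column names to match your copy of the data.

```python
from pyspark.sql.functions import transform, col, concat, lit

# Combine flat columns: build a single "prize" label from the
# category and year columns (year is cast in case it is numeric).
labelled = df.withColumn(
    "prize",
    concat(col("category"), lit(" ("), col("year").cast("string"), lit(")")),
)

# transform() applies a function to every element of an array column.
# Here each laureate struct becomes a "firstname surname" string.
labelled = labelled.withColumn(
    "laureate_names",
    transform(
        col("laureates"),
        lambda person: concat(person["firstname"], lit(" "), person["surname"]),
    ),
)

labelled.select("prize", "laureate_names").show(5, truncate=False)
```

Note that `transform` here is the higher-order array function from `pyspark.sql.functions` (available since Spark 3.1), not a DataFrame method; it lets you work on each element of the nested `laureates` array without exploding it first.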