Reading JSON data with Apache Spark
In this recipe, we will learn how to ingest and load JSON data with Apache Spark. We will also cover some common data engineering tasks with JSON data.
How to do it...
- Import libraries: Import the required libraries and create a `SparkSession` object:

  ```python
  from pyspark.sql import SparkSession
  from pyspark.sql.functions import *

  spark = (SparkSession.builder
      .appName("read-json-data")
      .master("spark://spark-master:7077")
      .config("spark.executor.memory", "512m")
      .getOrCreate())
  spark.sparkContext.setLogLevel("ERROR")
  ```

- Load the JSON data into a Spark DataFrame: The `read` method of the `SparkSession` object can be used to load JSON data from a file or a directory. The `multiLine` option is set to `true` to parse records that span multiple lines. We need to pass the path to the JSON file as a parameter:

  ```python
  df = ...
  ```