You're reading from Data Engineering with Python

Product type: Book
Published in: Oct 2020
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781839214189
Edition: 1st Edition
Author (1)
Paul Crickard

Paul Crickard authored a book on the Leaflet JavaScript module. He has been programming for over 15 years and has focused on GIS and geospatial programming for 7 years. He spent 3 years working as a planner at an architecture firm, where he combined GIS with Building Information Modeling (BIM) and CAD. Currently, he is the CIO at the 2nd Judicial District Attorney's Office in New Mexico.

Processing data with PySpark

Before processing data with PySpark, let's run one of the Spark samples to see how Spark works. Later examples will skip this boilerplate and focus on data processing. The following screenshot shows a Jupyter notebook containing the Pi Estimation example from the Spark website at http://spark.apache.org/examples.html:

Figure 14.6 – The Pi Estimation example in a Jupyter notebook
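For readers unfamiliar with the technique, the Pi Estimation sample is a Monte Carlo estimate: it draws random points in the unit square and counts how many fall inside the quarter circle of radius 1. The following is a minimal sketch of that idea in plain Python, without Spark, so the logic can be checked on its own. The function name `estimate_pi` and its `seed` parameter are illustrative assumptions and do not come from the Spark sample.

```python
import random


def estimate_pi(num_samples, seed=None):
    """Estimate pi by sampling random points in the unit square.

    Hypothetical helper for illustration; the Spark sample spreads the
    same per-point test across the cluster instead of using a loop.
    """
    rng = random.Random(seed)  # a private generator keeps runs reproducible
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        # The point lies inside the quarter circle when x^2 + y^2 < 1.
        if x * x + y * y < 1:
            inside += 1
    # The quarter circle covers pi/4 of the unit square, so scale by 4.
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print(estimate_pi(100_000, seed=1))
```

The estimate gets closer to pi as `num_samples` grows, which is why the work is a natural fit for distributing across many workers.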

The example from the website will not run without some modifications. In the following points, I will walk through the cells:

  1. The first cell imports findspark and runs the init() method. This was explained in the preceding section as the preferred method to include PySpark in Jupyter notebooks. The code is as follows:
    import findspark
    findspark.init()
  2. The next cell imports the pyspark library and SparkSession. It then creates the session by passing the head node of the Spark cluster. You can get the URL from the Spark web UI...
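The two cells described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the book's exact code: the helper name `create_session`, the `master_url` parameter, and the default application name are hypothetical, and the master URL itself has to be copied from your own Spark web UI. The imports are placed inside the function so that the helper can be defined even where Spark is not installed.

```python
def create_session(master_url, app_name="PiEstimate"):
    """Create (or reuse) a SparkSession attached to the given master.

    Hypothetical helper for illustration; master_url is whatever URL
    your Spark web UI reports for the head node.
    """
    # Deferred imports: findspark and pyspark are only needed when called.
    import findspark

    findspark.init()  # locate the Spark install and add PySpark to sys.path

    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(master_url)   # head node of the cluster
        .appName(app_name)    # name shown in the Spark web UI
        .getOrCreate()
    )


# Example call, left commented out because it needs a reachable cluster:
# spark = create_session("<master URL copied from the Spark web UI>")
# spark.stop()
```

To experiment without a cluster, passing `"local[*]"` as the master URL runs Spark in a single local process. That is an assumption made here for convenience and is not the setup the chapter describes.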