Now that you have seen the fundamental underpinnings of Spark, let's take a broader look at the architecture, context, and ecosystem in which Spark operates. This is a catch-all chapter covering a diverse set of essential topics that will give you a broader understanding of Spark as a whole. By the end of it, you will understand who is using Spark, and how and where it is being used. This chapter will cover the following topics:
The Datasets accompanying this book and the IDEs for data wrangling
A quick description of what a data scientist expects from Spark
The Data Lake architecture and Spark's position within it
The evolution of the Spark architecture through to 2.0
The Parquet data storage mechanism
So, with a good fundamental knowledge of the Spark framework in hand, let's start focusing on these three topics: data science DevOps, data wrangling, and of course the mechanisms in Apache Spark, including DataFrames, machine learning, and working with big...