In this chapter, we covered everything from creating an RDD to manipulating the data within it. We looked at the transformations and actions available on an RDD and walked through various code examples to explain the differences between the two. Finally, we moved on to the advanced topic of pair RDDs, where we demonstrated how to create a pair RDD and apply some advanced transformations to it.
We are now ready to explore the ETL process and the types of external storage systems that Spark can read data from and write data to, including external filesystems, Apache Hadoop HDFS, Apache Hive, Amazon S3, and so on. We'll also look at connectors to some of the most popular databases, and at how to optimally load data from storage systems and store it back.
But before moving on to the next chapter, take a break; you definitely deserve it!