We explored the Parquet format in Chapter 7, Spark 2.0 Concepts. To recap, Parquet is an interoperable storage format whose main goals are space efficiency and query efficiency. It was developed by Twitter and Cloudera, who based its design on Google's Dremel, and it implements Dremel's nested storage format. It is now a top-level Apache project. Parquet stores data in a columnar format and has an evolvable schema. The columnar layout enables query optimization (a query can read only the columns it needs, rather than bringing every column into memory and discarding the ones not needed) and storage optimization (encoding at the column level gives a much higher compression ratio). Another interesting feature is that Parquet can store nested Datasets. This feature can be leveraged in curated data lakes to store subject-based data. In addition to the ability to restrict column...
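To make the two benefits of a columnar layout concrete, here is a minimal toy sketch in plain Python. It is not Parquet's on-disk format or its API: the sample records, field names, and helper functions (`to_columns`, `run_length_encode`, `project`) are all hypothetical, and run-length encoding stands in for the family of column-level encodings a real columnar format might apply. It only illustrates why grouping values by column helps both compression and column-restricted reads.

```python
from itertools import groupby

# Hypothetical sample records; field names and values are made up for illustration.
rows = [
    {"country": "US", "year": 2016, "amount": 10},
    {"country": "US", "year": 2016, "amount": 20},
    {"country": "US", "year": 2016, "amount": 30},
    {"country": "CA", "year": 2016, "amount": 40},
    {"country": "CA", "year": 2016, "amount": 50},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists (a columnar layout)."""
    return {name: [record[name] for record in records] for name in records[0]}


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def project(columns, wanted):
    """Return only the requested columns; the other columns are never read."""
    return {name: columns[name] for name in wanted}


columns = to_columns(rows)

# Storage side: values of one column sit next to each other, so repeats collapse.
encoded = {name: run_length_encode(values) for name, values in columns.items()}

# For comparison, the same values interleaved row by row have no adjacent repeats.
row_stream = [value for record in rows for value in record.values()]
row_runs = run_length_encode(row_stream)

# Query side: a query that needs only "amount" touches a single column.
selected = project(columns, ["amount"])

column_runs = sum(len(runs) for runs in encoded.values())
print("runs, columnar layout:", column_runs)
print("runs, row layout:", len(row_runs))
print("projected columns:", list(selected))
```

With Spark itself, the query-side benefit is typically obtained by selecting columns right after reading, for example `spark.read.parquet(path).select("amount")`, where `path` and the column name are placeholders.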
You're reading from Fast Data Processing with Spark 2 - Third Edition
Holden Karau is a software development engineer and is active in open source. She has worked on a variety of search, classification, and distributed systems problems at IBM, Alpine, Databricks, Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics degree in computer science. Other than software, she enjoys playing with fire and hula hoops, and welding.