Reader small image

You're reading from  Data Engineering with AWS - Second Edition

Product typeBook
Published inOct 2023
PublisherPackt
ISBN-139781804614426
Edition2nd Edition
Right arrow
Author (1)
Gareth Eagar
Gareth Eagar
author image
Gareth Eagar

Gareth Eagar has over 25 years of experience in the IT industry, starting in South Africa, working in the United Kingdom for a while, and now based in the USA. Having worked at AWS since 2017, Gareth has broad experience with a variety of AWS services, and deep expertise around building data platforms on AWS. While Gareth currently works as a Solutions Architect, he has also worked in AWS Professional Services, helping architect and implement data platforms for global customers. Gareth frequently speaks on data related topics.
Read more about Gareth Eagar

Right arrow

Identifying data transformations and optimizations

In a typical data analytics project, we ingest data from multiple data sources and then perform transforms on those datasets to optimize them for the required analytics.

In Chapter 7, Transforming Data to Optimize for Analytics, we will do a deeper dive into typical transformations and optimizations, but we will provide a high-level overview of the most common transformations here.

File format optimizations

CSV, XML, JSON, and other types of plaintext files are commonly used to store structured and semi-structured data. These file formats are useful when manually exploring data, but there are much better, binary-based file formats to use for computer-based analytics. A common binary format that is optimized for read-heavy analytics (such as by compressing data and adding in useful metadata to optimize data reads) is the Apache Parquet format. A common transformation is to convert plaintext files into an optimized format...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Data Engineering with AWS - Second Edition
Published in: Oct 2023Publisher: PacktISBN-13: 9781804614426

Author (1)

author image
Gareth Eagar

Gareth Eagar has over 25 years of experience in the IT industry, starting in South Africa, working in the United Kingdom for a while, and now based in the USA. Having worked at AWS since 2017, Gareth has broad experience with a variety of AWS services, and deep expertise around building data platforms on AWS. While Gareth currently works as a Solutions Architect, he has also worked in AWS Professional Services, helping architect and implement data platforms for global customers. Gareth frequently speaks on data related topics.
Read more about Gareth Eagar